How to prevent XPath injection attacks

Written by

Marcelo Oliveira

May 10, 2023


Web applications are vulnerable to many kinds of attacks, but they’re particularly susceptible to code injection. One such attack, XPath injection, targets websites that use user-supplied information to query data stored in XML format. Any site that builds XPath queries over XML data from user input might be vulnerable to this attack.

XPath is a query language that websites can use to search their XML data stores. Used properly, XPath queries are a legitimate means of locating data held in elements and attributes. Websites that store data in XML integrate user input (for example, a username and password) into XPath queries to locate the required data.
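To make this concrete, here is a minimal sketch using the lxml library (installed later in this article) against a trimmed, inline copy of the sample user data. The inline string is an assumption standing in for a users_info.xml file. The first query selects every username, and the second uses a predicate in square brackets to select the account number of the user named jennifer.

```python
from lxml import etree

# Inline stand-in for an XML data store (two records, trimmed for brevity)
USERS_XML = """<users>
  <userinfo><user_id>1</user_id><username>jennifer</username><account_no>757456</account_no></userinfo>
  <userinfo><user_id>2</user_id><username>kimkimani</username><account_no>870965</account_no></userinfo>
</users>"""

tree = etree.ElementTree(etree.fromstring(USERS_XML))

# Text content of every <username> element
usernames = tree.xpath("/users/userinfo/username/text()")

# Filter userinfo elements with a predicate, then step down to <account_no>
account = tree.xpath("/users/userinfo[username='jennifer']/account_no/text()")

print(usernames)
print(account)
```

The predicate is where login code typically inserts user input, which is exactly the spot an injection attack targets.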

However, as with other injection attacks (such as the well-known SQL injection attack), a malicious actor can exploit this process by supplying malformed input. An attacker can learn how the XML data is structured and access data they would not normally be able to reach. Attackers who obtain this information can elevate their access privileges (an “escalation of privilege” attack), possibly compromising the application and sensitive data, and then use that information or access to attack the rest of the organization.

Beyond deleting or corrupting data, attackers who exfiltrate data may be able to run application commands, gain a foothold for further attacks on the server, and ultimately cause damage across the entire organization. The consequences of these attacks can damage or destroy the reputation of our applications and a company’s credibility with its clients.

Fortunately, we can protect our sites against XPath injection attacks. In this article, we’ll explore XPath injection vulnerabilities and learn how to prevent them from compromising application data.

Identifying and correcting XPath injection vulnerabilities

To see how we can prevent XPath injections, we’ll need to examine one. Then, we’ll learn how best to protect our apps and data from this attack.

Exploring a vulnerable authentication app

Let’s use an application requiring users to log in with a username and password.

Create a folder named xpath and inside this folder, create a file named users_info.xml with the XML data below:

<users>
  <userinfo>
    <user_id>1</user_id>
    <username>jennifer</username>
    <password>4uyFh6v0</password>
    <account_no>757456</account_no>
  </userinfo>
  <userinfo>
    <user_id>2</user_id>
    <username>kimkimani</username>
    <password>6wreD1678</password>
    <account_no>870965</account_no>
  </userinfo>
  <userinfo>
    <user_id>3</user_id>
    <username>michael</username>
    <password>gf5dG63g</password>
    <account_no>345689</account_no>
  </userinfo>
  <userinfo>
    <user_id>4</user_id>
    <username>robert</username>
    <password>yH8wej3h</password>
    <account_no>096417</account_no>
  </userinfo>
</users>

Inside the xpath folder, create a new file, name it main.py, and paste the following code into the file:

# Import the required modules
from lxml import etree

# Parse the XML data
tree = etree.parse("users_info.xml")

def login(username, password):
    expression = "/users/userinfo[username='" + username + "' and password='" + password + "']"

    results = tree.xpath(expression)

    # if no results, which means login details did not match
    if len(results) < 1:
        print("Incorrect credentials! Check your username or password.")

    # if login details match
    else:
        for result in results:
            print(f'Login successful! username: {result[1].text}, password: {result[2].text}')

# Get the user-provided input
username = input("Enter your username: ")
password = input("Enter your password: ")

login(username, password)

The code loads the XML data and then executes an XPath query to select the XML data that matches the user-provided credentials. If it finds a match, it authenticates the user. If it doesn’t find a match, the app displays the message, “Incorrect credentials! Check your username or password.”

Before testing this code, install the Python lxml package for processing XML data. Run this command to install it:

pip install lxml

To test the code, run it and enter a username and password when prompted. Alternatively, to skip the prompts, replace the two input() lines and the final login(username, password) call at the bottom of main.py with a direct call such as:

login(username="kimkimani", password="6wreD1678")

To execute the file, run the command below in the terminal.

python3 main.py

A success message is displayed. You can also try to log in with incorrect credentials, which results in a failure message.

This code is vulnerable to XPath injection attacks because it concatenates user input directly into the XPath query, so an attacker can supply input that changes the structure of the query itself.

A typical XPath query is a text string specifying the elements you want to select from XML data.

In the code above, the query "/users/userinfo[username='" + username + "' and password='" + password + "']" selects all userinfo elements that match the user-supplied username and password.

However, supplying a username and password containing quote characters or XPath keywords modifies the structure of the query and allows an attacker to bypass the login system. For example, supplying ' or '1'='1 as both username and password results in the following XPath query:

/users/userinfo[username='' or '1'='1' and password='' or '1'='1']

The injected quote closes each string literal early, leaving the predicate username='' or '1'='1' and password='' or '1'='1'. Because and binds more tightly than or, XPath evaluates this as username='' or ('1'='1' and password='') or '1'='1'. The final comparison '1'='1' is always true, so the predicate matches every userinfo element regardless of the actual values of the username and password elements.
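To see this in action, the minimal sketch below reproduces main.py’s concatenation in a hypothetical build_query helper and runs it against an inline copy of the user data (a stand-in for users_info.xml, with account numbers omitted). It injects ' or '1'='1, which produces a syntactically valid query, as both username and password and lists the usernames of the matched records.

```python
from lxml import etree

# Inline stand-in for users_info.xml (account numbers omitted)
USERS_XML = """<users>
  <userinfo><user_id>1</user_id><username>jennifer</username><password>4uyFh6v0</password></userinfo>
  <userinfo><user_id>2</user_id><username>kimkimani</username><password>6wreD1678</password></userinfo>
  <userinfo><user_id>3</user_id><username>michael</username><password>gf5dG63g</password></userinfo>
  <userinfo><user_id>4</user_id><username>robert</username><password>yH8wej3h</password></userinfo>
</users>"""

tree = etree.ElementTree(etree.fromstring(USERS_XML))

def build_query(username, password):
    # Hypothetical helper: the same unsafe string concatenation used in main.py
    return "/users/userinfo[username='" + username + "' and password='" + password + "']"

payload = "' or '1'='1"
query = build_query(payload, payload)
matched = [node.findtext("username") for node in tree.xpath(query)]

print(query)
print(matched)  # all four usernames
```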

Suppose an attacker wants to gain access without a correct username or password. Instead of valid credentials, the attacker could insert any of the following XPath injections into the login form as both the username and password:

'or'1'='1
' or 1=1 or 'a'='a
' or ''='
' or '1'='1
a' or true() or '
text' or '1'='1
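The sketch below (an illustration, reusing the hypothetical build_query helper and an inline stand-in for users_info.xml) runs a set of payloads through main.py’s concatenation and counts how many records each one matches. Each payload here is written so that it yields a well-formed query; a payload that leaves an unbalanced quote makes lxml raise an XPath error instead of bypassing the login, which the sketch catches and reports as None.

```python
from lxml import etree

# Inline stand-in for users_info.xml (account numbers omitted)
USERS_XML = """<users>
  <userinfo><user_id>1</user_id><username>jennifer</username><password>4uyFh6v0</password></userinfo>
  <userinfo><user_id>2</user_id><username>kimkimani</username><password>6wreD1678</password></userinfo>
  <userinfo><user_id>3</user_id><username>michael</username><password>gf5dG63g</password></userinfo>
  <userinfo><user_id>4</user_id><username>robert</username><password>yH8wej3h</password></userinfo>
</users>"""

tree = etree.ElementTree(etree.fromstring(USERS_XML))

def build_query(username, password):
    # Hypothetical helper: the same unsafe string concatenation used in main.py
    return "/users/userinfo[username='" + username + "' and password='" + password + "']"

# Each payload closes the string literal and appends an always-true "or" clause
PAYLOADS = [
    "'or'1'='1",
    "' or 1=1 or 'a'='a",
    "' or ''='",
    "' or '1'='1",
    "a' or true() or '",
    "text' or '1'='1",
]

match_counts = {}
for payload in PAYLOADS:
    try:
        match_counts[payload] = len(tree.xpath(build_query(payload, payload)))
    except etree.XPathError:
        # Unbalanced quotes produce an invalid expression rather than a bypass
        match_counts[payload] = None
    print(f"{payload!r}: {match_counts[payload]}")
```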

Each of the above inputs bypasses authentication because it closes the string literal early and appends an or clause that is always true, so the resulting predicate matches every userinfo element. Also, if attackers know or can guess a username (for example, kimkimani), they can enter the following input:

kimkimani' or '1'='1

When this input is entered as both the username and password, the resulting predicate is username='kimkimani' or '1'='1' and password='kimkimani' or '1'='1'. As before, the trailing '1'='1' is always true, so every userinfo element matches and the login system allows the attacker to log in. Entering the input only as the username is enough to impersonate kimkimani: with any password, say wrong, the predicate becomes username='kimkimani' or ('1'='1' and password='wrong'), which still matches kimkimani’s record.
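Both cases can be checked with the following sketch, which again uses a hypothetical build_query helper that mirrors main.py and an inline stand-in for users_info.xml. The password wrong is an arbitrary placeholder for any incorrect password.

```python
from lxml import etree

# Inline stand-in for users_info.xml (account numbers omitted)
USERS_XML = """<users>
  <userinfo><user_id>1</user_id><username>jennifer</username><password>4uyFh6v0</password></userinfo>
  <userinfo><user_id>2</user_id><username>kimkimani</username><password>6wreD1678</password></userinfo>
  <userinfo><user_id>3</user_id><username>michael</username><password>gf5dG63g</password></userinfo>
  <userinfo><user_id>4</user_id><username>robert</username><password>yH8wej3h</password></userinfo>
</users>"""

tree = etree.ElementTree(etree.fromstring(USERS_XML))

def build_query(username, password):
    # Hypothetical helper: the same unsafe string concatenation used in main.py
    return "/users/userinfo[username='" + username + "' and password='" + password + "']"

payload = "kimkimani' or '1'='1"

# Payload in both fields: the trailing '1'='1' makes every record match
both_fields = [n.findtext("username") for n in tree.xpath(build_query(payload, payload))]

# Payload only in the username field, with an incorrect password:
# username='kimkimani' or ('1'='1' and password='wrong') still matches kimkimani
username_only = [n.findtext("username") for n in tree.xpath(build_query(payload, "wrong"))]

print(both_fields)
print(username_only)
```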

Furthermore, depending on the XML document’s structure, an attacker can send an injection based on the node positions as follows:

'or position()=4 or'

This input allows attackers to bypass authentication based on a record’s position in the XML document’s structure. The position() function is a built-in XPath function that returns the context position: here, the position of the current userinfo element among the userinfo elements being filtered. The injected predicate is true for the fourth userinfo element (robert’s record), so the attacker is logged in as that user.
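The following sketch shows the query this input produces and which record it selects. As before, build_query is a hypothetical helper mirroring main.py, and the inline XML stands in for users_info.xml.

```python
from lxml import etree

# Inline stand-in for users_info.xml (account numbers omitted)
USERS_XML = """<users>
  <userinfo><user_id>1</user_id><username>jennifer</username><password>4uyFh6v0</password></userinfo>
  <userinfo><user_id>2</user_id><username>kimkimani</username><password>6wreD1678</password></userinfo>
  <userinfo><user_id>3</user_id><username>michael</username><password>gf5dG63g</password></userinfo>
  <userinfo><user_id>4</user_id><username>robert</username><password>yH8wej3h</password></userinfo>
</users>"""

tree = etree.ElementTree(etree.fromstring(USERS_XML))

def build_query(username, password):
    # Hypothetical helper: the same unsafe string concatenation used in main.py
    return "/users/userinfo[username='" + username + "' and password='" + password + "']"

payload = "'or position()=4 or'"
query = build_query(payload, payload)

# position() counts userinfo elements in document order, starting at 1
matched = [n.findtext("username") for n in tree.xpath(query)]

print(query)
print(matched)
```

Changing 4 to another number selects a different record, so an attacker can walk through accounts one position at a time.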

Preventing XPath injection vulnerabilities

While there is no shortage of ways to exploit insecure XML data, we have several ways to mitigate these vulnerabilities in our websites and applications. Let’s explore a few.

Sanitize inputs

One strategy for preventing XPath injection attacks is to restrict the characters that users can input. Although it’s not foolproof, this method blocks many of the inputs that could form the basis of injection queries.

To implement this approach, create a file, regex.py, inside the xpath folder and paste the following code into the file.

# Import the required modules
from lxml import etree
import re

# Parse the XML data
tree = etree.parse("users_info.xml")

# Matches any character that is not a letter or a digit
SUSPICIOUS = re.compile(r'[^a-zA-Z0-9]')

# login function
def login(username, password):
    if SUSPICIOUS.search(username) or SUSPICIOUS.search(password):
        print("Suspected characters detected! Login failed!")
        return False
    else:
        expression = "/users/userinfo[username='" + username + "' and password='" + password + "']"
        results = tree.xpath(expression)

        # if no results, which means login details did not match
        if len(results) < 1:
            print("Incorrect login details!")

        # if login details match
        else:
            for result in results:
                print("Login successful for user", result[1].text)

login(username="kimkimani", password="6wreD1678")

This code uses a regular expression (regex) to detect any character that is not a letter or a digit. If the username or password contains one, a “login failed” message appears in the terminal and the function immediately returns False to the caller, so the XPath query is never built. We can test the code with the characters used for the XPath injections discussed in the section above.

While this approach enhances our data security, input filtering only blocks what our patterns anticipate, and it’s easy to get those patterns wrong. Furthermore, passwords often contain (and password policies often require) special characters, so blocking them would also reject legitimate users.
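A short sketch of that trade-off, using only Python’s standard re module and the same letters-and-digits rule as regex.py. The passes_filter name is hypothetical, and Tr0ub4dor&3 is a made-up example of a legitimate password containing a special character.

```python
import re

# Same rule as regex.py: reject any character that is not a letter or a digit
SUSPICIOUS = re.compile(r"[^a-zA-Z0-9]")

def passes_filter(value):
    # Hypothetical helper: True when the value contains only letters and digits
    return SUSPICIOUS.search(value) is None

print(passes_filter("' or '1'='1"))  # False: injection attempt blocked
print(passes_filter("6wreD1678"))    # True: alphanumeric password allowed
print(passes_filter("Tr0ub4dor&3"))  # False: a legitimate password is blocked too
```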

So, let’s explore some alternative methods of avoiding XPath injection vulnerabilities.

Use parameterized (prepared) XPath queries

To demonstrate this approach, create the file params.py inside the xpath folder and paste in the following code:

# Import the required modules
from lxml import etree

# Parse the XML data
tree = etree.parse("users_info.xml")

def login(username, password):
    expression = "/users/userinfo[username=$username and password=$password]"

    results = tree.xpath(expression, username=username, password=password)

    # if no results, which means login details did not match
    if len(results) < 1:
        print("Invalid credentials! Check your username or password.")
        return False
    # if login details match
    else:
        for result in results:
            print(f'Login successful! username: {result[1].text}, password: {result[2].text}')

login(username="kimkimani", password="6wreD1678")

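Parameterized XPath queries help us avoid concatenating user input into the XPath query by passing it as a parameter instead. Here, $username and $password are XPath variables, and lxml binds the keyword arguments passed to xpath() to them, so the input is always compared as a string value rather than parsed as part of the expression. Apart from making the queries more secure, parameterized XPath queries are more flexible and reusable. However, they only protect the values passed as variables: if any other part of the expression is still built from user input, that part remains injectable. So, let’s look at another preventative measure.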
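To confirm that variable binding neutralizes the earlier payloads, the sketch below runs the same parameterized expression as params.py against an inline stand-in for users_info.xml. The find_user name is hypothetical.

```python
from lxml import etree

# Inline stand-in for users_info.xml (account numbers omitted)
USERS_XML = """<users>
  <userinfo><user_id>1</user_id><username>jennifer</username><password>4uyFh6v0</password></userinfo>
  <userinfo><user_id>2</user_id><username>kimkimani</username><password>6wreD1678</password></userinfo>
  <userinfo><user_id>3</user_id><username>michael</username><password>gf5dG63g</password></userinfo>
  <userinfo><user_id>4</user_id><username>robert</username><password>yH8wej3h</password></userinfo>
</users>"""

tree = etree.ElementTree(etree.fromstring(USERS_XML))

QUERY = "/users/userinfo[username=$username and password=$password]"

def find_user(username, password):
    # lxml binds these keyword arguments to $username and $password,
    # so the input is compared as a plain string, never parsed as XPath
    return tree.xpath(QUERY, username=username, password=password)

valid = find_user("kimkimani", "6wreD1678")
injected = find_user("' or '1'='1", "' or '1'='1")

print(len(valid), len(injected))  # 1 0
```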

Use precompiled XPath queries

Precompiled XPath queries are compiled once, up front, and are never constructed from user-supplied data; user input can only reach them as variable values at call time. This makes them the most robust way to prevent injection attacks. To see how precompiled XPath queries help, create a file named compiled.py inside the xpath folder and add this code:

import lxml.etree as et

xml = et.parse("users_info.xml")
# Compile the query once; $username and $password are bound at call time
find = et.XPath("/users/userinfo[username=$username and password=$password]")
results = find(xml, username="'or'1'='1", password="'or'1'='1")
print(results)

The code uses the etree.XPath class to compile an XPath query, which it then stores in the find variable. We can execute this compiled query multiple times by calling find, without having to recompile the query each time. Because the injection strings are bound as plain values, the query looks for a user literally named 'or'1'='1, finds none, and prints an empty list. This option is the most effective, as we don’t have to worry about any characters we should have escaped.
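The following sketch compiles the query once and reuses it for several login attempts, including two of the earlier injection inputs. The inline XML stands in for users_info.xml, and the FIND_USER name and the attempt list are illustrative assumptions.

```python
from lxml import etree

# Inline stand-in for users_info.xml (account numbers omitted)
USERS_XML = """<users>
  <userinfo><user_id>1</user_id><username>jennifer</username><password>4uyFh6v0</password></userinfo>
  <userinfo><user_id>2</user_id><username>kimkimani</username><password>6wreD1678</password></userinfo>
  <userinfo><user_id>3</user_id><username>michael</username><password>gf5dG63g</password></userinfo>
  <userinfo><user_id>4</user_id><username>robert</username><password>yH8wej3h</password></userinfo>
</users>"""

tree = etree.ElementTree(etree.fromstring(USERS_XML))

# Compile once at startup; the expression text never changes afterwards
FIND_USER = etree.XPath("/users/userinfo[username=$username and password=$password]")

attempts = [
    ("kimkimani", "6wreD1678"),     # valid credentials
    ("kimkimani", "wrong"),         # wrong password
    ("'or'1'='1", "'or'1'='1"),     # always-true injection attempt
    ("'or position()=4 or'", "x"),  # position-based injection attempt
]

counts = []
for username, password in attempts:
    # Calling the compiled object evaluates it; variables are bound per call
    matches = FIND_USER(tree, username=username, password=password)
    counts.append(len(matches))
    print(f"{username!r}: {len(matches)} match(es)")
```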

Best practices for spotting and avoiding XPath injection vulnerabilities

In addition to using the preventative measures above, of which precompiled queries are the best practice, we should implement the following best practices for spotting and avoiding XPath injection vulnerabilities.

Check packages and code for vulnerabilities

Use a tool like Snyk Open Source Advisor to check the open source packages we use for known vulnerabilities. By providing a comprehensive health overview of each package, Snyk Advisor helps us feel more confident in the security of our web app.

Similarly, we can use Snyk Code to scan our source code and identify potential security vulnerabilities, including XPath injection. Snyk Code quickly flags potential vulnerabilities in our code so that we can take steps to fix them.

The screenshot below shows the results of performing a source code scan of the project folder using Snyk Code.

Avoid concatenating user input into XPath queries

Don’t concatenate user-supplied input directly into XPath queries. This is one of the most common causes of XPath injection vulnerabilities. Instead, use parameterized XPath queries, which use placeholders (XPath variables) for user-supplied input. This makes code more readable and maintainable and removes the need to escape user-supplied input for the query, which prevents XPath injection attacks.

Be conscious of special characters

If we can’t use parameterized queries, properly escape any special characters in user-supplied input before using it in an XPath query.
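XPath 1.0 has no escape sequence for a quote inside a string literal, so one common approach is to delimit the value with whichever quote character it doesn’t contain and fall back to the concat() function when it contains both. The sketch below assumes that approach; xpath_literal and find_by_username are hypothetical helpers, not lxml functions, and the o'brien record is made-up sample data.

```python
from lxml import etree

def xpath_literal(value):
    """Return an XPath 1.0 expression that evaluates to the string `value`."""
    if "'" not in value:
        return "'" + value + "'"
    if '"' not in value:
        return '"' + value + '"'
    # Both quote types present: split on the apostrophe and rejoin with concat()
    parts = value.split("'")
    return "concat(" + ", \"'\", ".join("'" + part + "'" for part in parts) + ")"

# Tiny inline data set (hypothetical record with an apostrophe in the name)
USERS_XML = """<users>
  <userinfo><username>kimkimani</username></userinfo>
  <userinfo><username>o'brien</username></userinfo>
</users>"""
tree = etree.ElementTree(etree.fromstring(USERS_XML))

def find_by_username(username):
    # The escaped literal replaces the raw concatenation used in main.py
    return tree.xpath("/users/userinfo[username=" + xpath_literal(username) + "]")

print(xpath_literal("' or '1'='1"))          # "' or '1'='1"
print(len(find_by_username("' or '1'='1")))  # 0: treated as a literal name
print(len(find_by_username("o'brien")))      # 1
```

Escaping is easy to get subtly wrong, which is why variable binding remains the better default.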

Implement prepared statements and binding variables

Using prepared statements and binding variables in XPath queries can help prevent an attacker from injecting arbitrary XPath queries into an application's input, as it treats the input as a variable value rather than part of the query itself.

Review code regularly

In addition to using tools like Snyk Advisor and Snyk Code, regularly reviewing and testing code for potential XPath injection vulnerabilities is essential. This can help identify and fix potential vulnerabilities before attackers exploit them. We can use manual code review, automated code analysis tools like Snyk Code, and penetration testing to identify and correct code vulnerabilities.

Conclusion

XPath injection can severely harm websites, data, and organizations’ reputations, and it can open the door to further breaches. However, with a little extra effort, we can protect our applications by using tools like Snyk Advisor and Snyk Code, following best practices to spot and avoid XPath injection vulnerabilities in our code, and patching vulnerabilities with care.

We can secure our applications against XPath injection attacks by properly sanitizing user-supplied input, using parameterized XPath queries, and especially by using precompiled XPath queries, which aren’t constructed from user-supplied data and are therefore the most robust defense against injection attacks. Given the consequences of XPath injection attacks, these preventative measures are essential and shouldn’t be overlooked.
