
Secure Java URL encoding and decoding

Written by:

Jura Gorohovsky


August 14, 2023


URL encoding is a method that ensures your URL only contains valid characters so that the receiving server can correctly interpret it. According to the RFC 3986 standard, URIs (which are a superset of URLs) only contain a limited set of characters consisting of digits, letters, and a few graphic symbols, all within the ASCII character set.

If a URL contains characters outside this limited set, the characters must be percent-encoded. Percent-encoding converts each byte (octet) of a character's encoded form, typically UTF-8, into the % escape character followed by two hexadecimal digits. For example, a space becomes %20, and é, which takes two bytes in UTF-8, becomes %C3%A9. The same process applies to reserved delimiters such as &, /, ?, and #: they are valid ASCII characters, but they must be percent-encoded when they appear as data rather than in their expected structural positions in the URL.
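To make the byte-level mechanics concrete, here is a minimal sketch of RFC 3986-style percent-encoding, assuming UTF-8 as the character encoding. It keeps only the RFC 3986 unreserved characters (letters, digits, -, ., _, and ~) and escapes every other byte. The class and method names are hypothetical; in real code you would rely on the standard APIs covered later in this article.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating RFC 3986 percent-encoding; not a library API.
final class PercentEncodingSketch {

    // RFC 3986 "unreserved" characters never need escaping.
    private static boolean isUnreserved(int b) {
        return (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z')
                || (b >= '0' && b <= '9')
                || b == '-' || b == '.' || b == '_' || b == '~';
    }

    static String encode(String value) {
        StringBuilder out = new StringBuilder();
        // Work on the UTF-8 bytes, because percent-encoding escapes octets, not characters.
        for (byte raw : value.getBytes(StandardCharsets.UTF_8)) {
            int b = raw & 0xFF; // treat the byte as unsigned (0..255)
            if (isUnreserved(b)) {
                out.append((char) b);
            } else {
                // "%" followed by two uppercase hexadecimal digits
                out.append('%').append(String.format("%02X", b));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("a b&c"));     // a%20b%26c
        System.out.println(encode("caf\u00e9")); // caf%C3%A9
    }
}
```

Note how the single character é turns into two escapes, one per UTF-8 byte.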

In comparison, URL decoding is a method that converts a percent-encoded URL back to its original form, restoring any nonstandard characters along the way.

It’s important to understand that encoding is not encryption. Encryption modifies information using a secret key so that the original information isn't available to anyone except the intended recipient. In contrast, URL encoding doesn't hide any part of a URL from an outside observer. Its purpose is to ensure that the receiving server can interpret the URL easily and unambiguously, and to prevent the data a user supplies to the client from altering the structure of the URL that the client constructs and sends.
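A quick way to see that encoding hides nothing: anyone who sees a percent-encoded value can restore it with a standard decoder, without any key. A minimal sketch using java.net.URLEncoder and java.net.URLDecoder, both discussed later in this article; the class name and the sample value are illustrative:

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class EncodingIsNotEncryption {

    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    static String decode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String sensitiveLooking = "password=hunter2&user=alice";
        // Encoding only changes the representation...
        String encoded = encode(sensitiveLooking);
        System.out.println(encoded); // password%3Dhunter2%26user%3Dalice
        // ...and anyone can reverse it; no secret key is involved.
        System.out.println(decode(encoded).equals(sensitiveLooking)); // true
    }
}
```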

Failure to encode a URL can result in various issues. For instance, your application may be unable to compose the URL to send it to the server. Additionally, the server receiving the URL may be unable to parse it correctly, leading to an error response. Another risk is that an unencoded URL can be tampered with, exposing your application to potential security threats.

Each programming language provides one or more APIs for encoding and decoding URLs. This article focuses on Java: why URL encoding and decoding are important, and how to approach them properly.

What is URL encoding and decoding in Java?

When talking specifically about Java, URL encoding and decoding is important for the following use cases:

  • Processing free-form data that a visitor enters in an HTML form, such as a search form.

  • Constructing calls to an external API from code by adding query parameters to a base URL.

  • Constructing calls to an API gateway used for further request routing to internal services.

A URL has the following structure:

[Figure: URL structure. General form: scheme://host:port/path?query#fragment]

Normally, you don't need to encode the entire URL. There are exceptions: the path section may contain spaces, for example from the names of user-uploaded files, and internationalized hostnames have their own ASCII encoding called Punycode. In most cases, however, you control the host and path sections, which means you only need to encode the parts of a URL that represent variable data (that is, parameter values in a query string), one by one.
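For the two exceptions mentioned above, the JDK already has tools. A minimal sketch, assuming a hypothetical /files/ path and sample names: the multi-argument java.net.URI constructors quote illegal characters such as spaces in the path, and java.net.IDN converts an internationalized hostname to its Punycode (ASCII) form.

```java
import java.net.IDN;
import java.net.URI;
import java.net.URISyntaxException;

final class PathAndHostEncoding {

    // The (scheme, host, path, fragment) constructor quotes illegal path characters,
    // e.g. a space becomes %20. A "/" inside fileName would still act as a path separator.
    static String fileUrl(String host, String fileName) throws URISyntaxException {
        return new URI("https", host, "/files/" + fileName, null).toASCIIString();
    }

    // Converts a Unicode hostname to its ASCII-compatible (Punycode) form.
    static String asciiHost(String unicodeHost) {
        return IDN.toASCII(unicodeHost);
    }

    public static void main(String[] args) throws URISyntaxException {
        System.out.println(fileUrl("example.com", "my report.pdf"));
        // https://example.com/files/my%20report.pdf
        System.out.println(asciiHost("b\u00fccher.example"));
        // xn--bcher-kva.example
    }
}
```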

One specific instance when you do need to encode the entire URL is when that URL is passed as a parameter in a query string of a different URL.
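As an illustration, assume a hypothetical login page at https://example.com/login that takes the page to return to in a next parameter. The whole return URL has to be encoded as one value; otherwise its own ? and & would be read as part of the outer URL, and b=c would become a separate parameter of the login URL. A minimal sketch:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class NestedUrlParameter {

    // Encodes an entire URL so it travels as a single query parameter value.
    static String loginUrlWithReturnTo(String returnTo) {
        return "https://example.com/login?next="
                + URLEncoder.encode(returnTo, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(loginUrlWithReturnTo("https://example.com/search?q=a&b=c"));
        // https://example.com/login?next=https%3A%2F%2Fexample.com%2Fsearch%3Fq%3Da%26b%3Dc
    }
}
```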

Implementing safe and secure URL encoding and decoding in Java

To better understand URL encoding and decoding in Java, look at a pair of classes that are commonly used in Java applications to encode and decode query string parameters.

How to encode URLs in Java

To apply percent-encoding to values of query string parameters in a Java application, you normally use the java.net.URLEncoder class and its encode() method.

Here's exactly what the encode() method does:

  • It leaves the alphanumeric characters a through z, A through Z, and 0 through 9, as well as the special characters ., -, *, and _, unchanged.

  • It converts the space character into a plus sign +.

  • It percent-encodes all other characters, using the bytes produced by the specified character encoding.

This method was created to prepare data from HTML forms for submission by converting it to the application/x-www-form-urlencoded MIME format, which works for encoding URL query parameter values.
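The following sketch exercises each of these rules with StandardCharsets.UTF_8; the class and helper names are illustrative:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class UrlEncoderRules {

    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(enc("aZ09.-*_"));  // aZ09.-*_      (left intact)
        System.out.println(enc("a b"));       // a+b           (space becomes +)
        System.out.println(enc("~!\u00e9"));  // %7E%21%C3%A9  (everything else is percent-encoded)
    }
}
```

Note that ~, which RFC 3986 treats as unreserved, is still percent-encoded by URLEncoder, because the form-encoding rules it follows don't include ~ in the safe set.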

There are three overloads of the encode() method:

1. encode(String s, String enc) allows you to explicitly set the encoding scheme as a string (UTF-8 is recommended). You can use this overload, but note that it throws a checked UnsupportedEncodingException, which means your code needs to handle it with a throws clause or a try/catch block. Passing the encoding name as a string literal also comes with the risk of introducing typos:

String url;
try {
    url = "https://example.com/search?q=" +
          URLEncoder.encode(parameterValue, "UTF-8");
} catch (UnsupportedEncodingException e) {
    throw new RuntimeException(e);
}

2. encode(String s, Charset charset) has been available since Java 10 and is the recommended overload. You pass a constant for UTF-8 (StandardCharsets.UTF_8), which eliminates the risk of typos in the encoding name, and the method doesn't throw any checked exceptions, so you don't need to handle them for your code to compile:

String url = "https://example.com/search?q=" +
             URLEncoder.encode(parameterValue, StandardCharsets.UTF_8);

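3. encode(String s) is the oldest overload and has been deprecated since Java 1.4 (it is still marked as deprecated in OpenJDK 17). You shouldn't use it, because it relies on the default charset of the platform that the Java virtual machine (JVM) is running on. Before JDK 18 made UTF-8 the default, that charset depended on the operating system and locale and was not guaranteed to be UTF-8.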
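To see why the charset matters, here is a small sketch that encodes the same word with two explicit charsets. The deprecated single-argument overload would silently pick one behavior or the other based on the JVM's default charset; the class name is illustrative:

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetMatters {

    static String enc(String s, Charset charset) {
        return URLEncoder.encode(s, charset);
    }

    public static void main(String[] args) {
        String word = "caf\u00e9";
        System.out.println(enc(word, StandardCharsets.UTF_8));      // caf%C3%A9
        System.out.println(enc(word, StandardCharsets.ISO_8859_1)); // caf%E9
        // The deprecated encode(String) uses whatever this prints:
        System.out.println(Charset.defaultCharset());
    }
}
```

A server that decodes as UTF-8 would not turn %E9 back into é, which is exactly the kind of mismatch that an explicit StandardCharsets.UTF_8 avoids.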

URLEncoder.encode() has a quirk: it encodes a space as the plus character instead of %20. This is because it implements the application/x-www-form-urlencoded format used by HTML forms, where + stands for a space. Outside form-encoded data, for example in a URL path, a + is not guaranteed to be interpreted as a space, so developers sometimes post-process the output of encode() to replace the plus character with %20:

return URLEncoder.encode(parameter, StandardCharsets.UTF_8).replace("+", "%20");
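If you need this in several places, a small helper keeps the workaround in one spot. A minimal sketch (the name encodeWithPercent20 is hypothetical); the replacement is safe because URLEncoder.encode() turns a literal + in the input into %2B, so any + left in its output came from a space:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class QueryValueEncoding {

    // Form-encode, then turn "+" (which can only come from a space) into "%20".
    // A literal "+" in the input is already encoded as "%2B", so it is not affected.
    static String encodeWithPercent20(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }

    public static void main(String[] args) {
        System.out.println(encodeWithPercent20("a b+c")); // a%20b%2Bc
    }
}
```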

For instance, say you need to search for repositories using the GitHub REST API. GitHub has extensive search qualifiers that help filter search results by language, repository size, and visibility. For example, a search for "user:defunkt forks:>100" returns all repositories by the user defunkt that have more than 100 forks. You can use all these qualifiers in API calls, but you need to wrap them in the q query parameter:

String searchQuery = "language:Java stars:100..1000 pushed:>2018-01-01 is:public";
String url = "https://api.github.com/search/repositories?q=" +
           URLEncoder.encode(searchQuery, StandardCharsets.UTF_8) +
           "&per_page=10&sort=stars&order=desc";
HttpResponse<String> response = sendGetRequest(url);

In this code, searchQuery holds a set of search qualifiers that finds all public Java repositories with 100 to 1,000 stars that were pushed to after January 1, 2018. The value in this sample is hardcoded, but it could also come from a file, a database, or direct user input through a web or mobile application.

When constructing url, this code concatenates three strings:

  1. The API's base URL, the query string delimiter ?, the q parameter, and its assignment delimiter =: "https://api.github.com/search/repositories?q=".

  2. The searchQuery, which is the value of the parameter q, is percent-encoded using URLEncoder.encode().

  3. The remaining set of query parameters doesn't need to be encoded because they're hardcoded and do not contain any illegal characters.

When you execute this code, the resulting URL is https://api.github.com/search/repositories?q=language%3AJava+stars%3A100..1000+pushed%3A%3E2018-01-01+is%3Apublic&per_page=10&sort=stars&order=desc. The value of q is encoded to language%3AJava+stars%3A100..1000+pushed%3A%3E2018-01-01+is%3Apublic, where : is replaced with %3A, spaces are replaced with +, and > is replaced with %3E.
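When several values are variable, encoding every parameter name and value in one helper keeps you from forgetting one. A minimal sketch, assuming the parameters fit in a Map; the buildUrl name is hypothetical, and LinkedHashMap is used only so the parameters keep their insertion order. With the values from the GitHub example, it produces the same URL as the code above:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class QueryStringBuilder {

    // Encodes each name and value separately, then joins them with "=" and "&".
    static String buildUrl(String baseUrl, Map<String, String> params) {
        String query = params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
        return query.isEmpty() ? baseUrl : baseUrl + "?" + query;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "language:Java stars:100..1000 pushed:>2018-01-01 is:public");
        params.put("per_page", "10");
        params.put("sort", "stars");
        params.put("order", "desc");
        System.out.println(buildUrl("https://api.github.com/search/repositories", params));
    }
}
```

Encoding the fixed values as well costs nothing, and it means a later change that makes sort or order user-controlled won't silently reintroduce the problem.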

How to decode URLs in Java

Explicitly decoding URL query parameters occurs less often because many frameworks, including Spring Boot, handle decoding automatically.

If you're not relying on a framework, then the process should depend on what you're going to do next.

Chances are that you're receiving the URL to decide what actions to perform, such as querying data or rerouting the request to a different service. If so, your processing logic probably analyzes each query parameter separately. In this case, you may want to start by parsing the URL, extracting the query string, and decoding parameter values individually.

For decoding, java.net.URLDecoder.decode() can be used to decode percent-encoded characters:

String encodedUrl = "https://www.google.com/search?q=it%27s+my+party&newwindow=1&sxsrf=APwXEdeEqrxGIrZCgLpZFvGUSzgPweokog%3A1682563238731";
URI uri = URI.create(encodedUrl);
List<Map.Entry<String, String>> queryParamsAndValues = Arrays.stream(uri.getRawQuery().split("&"))
      // Split on the first "=" only, so values containing "=" stay intact
      .map(param -> param.split("=", 2))
      // Keep the name as is; decode the value (or use "" if there is none)
      .map(pair -> Map.entry(pair[0],
              pair.length > 1 ? URLDecoder.decode(pair[1], StandardCharsets.UTF_8) : ""))
      .toList();

Here, encodedUrl contains a Google search URL with query parameters percent-encoded by the browser. This code creates a new uri object of type URI to extract the entire query string. This object provides a method called getRawQuery() that returns only the query string with all the parameter values still encoded:

q=it%27s+my+party&newwindow=1&sxsrf=APwXEdeEqrxGIrZCgLpZFvGUSzgPweokog%3A1682563238731

Then the code splits the raw query by the & delimiter, resulting in an array of individual parameter/value pairs. Each pair is transformed so the parameter is left as is while the value is decoded. Finally, all the transformed pairs are collected into a list:

q -> it's my party
newwindow -> 1
sxsrf -> APwXEdeEqrxGIrZCgLpZFvGUSzgPweokog:1682563238731

Once you have all the parameters separated from each other but mapped to their respective values, you can apply whatever logic you need to validate them and define your application's next steps.
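If your logic needs more than a flat list, a reusable parser can group values by parameter name, which also makes repeated parameters (such as a second user=) easy to spot. A minimal sketch, assuming UTF-8 and the &/= conventions of application/x-www-form-urlencoded; QueryParser and parseQuery are hypothetical names:

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class QueryParser {

    // Parses a raw (still encoded) query string into name -> list of decoded values.
    // Repeated names such as "tag=a&tag=b" keep every value, so duplicates stay visible.
    static Map<String, List<String>> parseQuery(String rawQuery) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return result;
        }
        for (String pair : rawQuery.split("&")) {
            if (pair.isEmpty()) continue; // tolerate "a=1&&b=2"
            String[] parts = pair.split("=", 2);
            String name = URLDecoder.decode(parts[0], StandardCharsets.UTF_8);
            String value = parts.length > 1 ? URLDecoder.decode(parts[1], StandardCharsets.UTF_8) : "";
            result.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
        return result;
    }

    public static void main(String[] args) {
        URI uri = URI.create("https://example.com/search?q=it%27s+my+party&tag=a&tag=b&flag");
        Map<String, List<String>> params = parseQuery(uri.getRawQuery());
        System.out.println(params.get("q"));   // [it's my party]
        System.out.println(params.get("tag")); // [a, b]
    }
}
```

Keep in mind that URLDecoder.decode() throws an IllegalArgumentException for malformed escape sequences such as a trailing %, so treat that exception as invalid input rather than letting it propagate unhandled.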

Best practices for URL handling in Java

When working with URLs in Java, there are several best practices to keep in mind to ensure proper handling and avoid potential issues.

Don't skip URL encoding

If you skip encoding your URLs, expect the unexpected — be it runtime exceptions or confused responses from the servers you're trying to reach.

For example, take another look at our GitHub API call scenario and see what happens if you don't encode your search parameters:

String searchQuery = "language:Java stars:100..1000 pushed:>2018-01-01 is:public";
String url = "https://api.github.com/search/repositories?q=" +
           searchQuery +
           "&per_page=10&sort=stars&order=desc";
HttpResponse<String> response = sendGetRequest(url);

It's important to note that in this code sample, sendGetRequest(url) wraps building an HTTP request with Java 11's request builder API:

String auth = getAuthToken();
HttpRequest request = HttpRequest.newBuilder()
       .uri(new URI(url))
       .version(HttpClient.Version.HTTP_2)
       .header("Content-Type", "application/json")
       .header("Authorization", auth)
       .timeout(Duration.ofSeconds(30))
       .GET()
       .build();

HttpClient client = HttpClient.newHttpClient();
return client.send(request, HttpResponse.BodyHandlers.ofString());

If you execute this code without encoding searchQuery, it will fail at runtime because the URI constructor can't create a URI object from a string that contains a non-encoded space:

java.net.URISyntaxException: Illegal character in query at index 58: https://api.github.com/search/repositories?q=language:Java stars:100..1000 pushed:>2018-01-01 is:public&per_page=10&sort=stars&order=desc
   at java.base/java.net.URI$Parser.fail(URI.java:2974)
   at java.base/java.net.URI$Parser.checkChars(URI.java:3145)
   at java.base/java.net.URI$Parser.parseHierarchical(URI.java:3233)
   at java.base/java.net.URI$Parser.parse(URI.java:3175)
   at java.base/java.net.URI.<init>(URI.java:623)
   at org.example.CallGitHubAPI.sendGetRequest(CallGitHubAPI.java:67)

But what if you're stubborn and want to send this request anyway? You could create a URL object (as opposed to a URI), which is more lenient about the characters it accepts, open a connection with it, and read the response stream with a Scanner:

String searchQuery = "language:Java stars:100..1000 pushed:>2018-01-01 is:public";
String url = "https://api.github.com/search/repositories?q=" +
           searchQuery +
           "&per_page=10&sort=stars&order=desc";

URL urlFromNonEncodedString;
Scanner inputStream = null;
try {
  urlFromNonEncodedString = new URL(url);
  inputStream = new Scanner(urlFromNonEncodedString.openConnection().getInputStream());
  System.out.println(inputStream.useDelimiter("\\A").next());
} catch (IOException e) {
  throw new RuntimeException(e);
}
finally {
  if (inputStream != null) inputStream.close();
}

This actually works, in the sense that the API request reaches its destination. However, the server responds with 400 Bad Request, which indicates a client-side error such as malformed request syntax:

Exception in thread "main" java.lang.RuntimeException: java.io.IOException: Server returned HTTP response code: 400 for URL: https://api.github.com/search/repositories?q=language:Java stars:100..1000 pushed:>2018-01-01 is:public&per_page=10&sort=stars&order=desc
   at org.example.CallGitHubAPI.callWithoutEncoding(CallGitHubAPI.java:65)
   at org.example.CallGitHubAPI.callGitHubAPI(CallGitHubAPI.java:37)
   at org.example.Main.main(Main.java:16)
Caused by: java.io.IOException: Server returned HTTP response code: 400 for URL: https://api.github.com/search/repositories?q=language:Java stars:100..1000 pushed:>2018-01-01 is:public&per_page=10&sort=stars&order=desc
   at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1997)
   at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1589)
   at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:224)
   at org.example.CallGitHubAPI.callWithoutEncoding(CallGitHubAPI.java:62)

As you can see, given a nontrivial value like the one used in the q parameter, skipping URL encoding will come back to haunt you sooner or later. Even if you're lucky enough to get an OK response from the server, you can't be sure that it contains what you expected, or that you haven't just accidentally taken advantage of a server vulnerability.

Speaking of vulnerabilities, not encoding URL parameters can easily introduce them, as discussed further in the article "Developers, Please encode your URLs".

Rotem Bar does a great job showing how non-encoded URLs can be tampered with in interservice communication using REST.

One of the scenarios that Rotem describes where URL tampering is possible is a service that does the following:

  1. Takes parameters from a URL coming from the client.

  2. Creates a new URL to call a different service.

Here's a code sample reproducing this scenario:

private static URI forwardRequestToAnotherService(String key, String user) {
    if (!validateUser(user)) return null;
    String newUrl = new StringBuilder()
            .append("https://my.internal.api.com/info?key=")
            .append(key)
            .append("&user=")
            .append(user)
            .toString();
    return URI.create(newUrl);
}

The service takes two parameters, key and user, and does a good job of validating the user. However, key isn't validated and is passed through as is when creating the new URL. None of the query parameters in this new URL are encoded.

When this service receives legitimate input, it creates a new URL as expected:

forwardRequestToAnotherService("55s72502010a", "legituser")
// Output: https://my.internal.api.com/info?key=55s72502010a&user=legituser

But what if a malicious actor submits a URL with a tampered key parameter:

forwardRequestToAnotherService("55s72502010a&user=admin#", "legituser")
// Output: https://my.internal.api.com/info?key=55s72502010a&user=admin#&user=legituser

Here's what happens:

  1. The key value passed in from the client reads 55s72502010a&user=admin#. The service appends the key to the URL without encoding.

  2. The service validates the user parameter as usual, and legituser is still considered legit because, well, it is legit.

  3. Because the original key parameter includes a & character that acts as a separator of parameters in the URL query string, the resulting URL receives two parameters — key=55s72502010a and user=admin — which are two substrings of the client-submitted key. As you can see, even though the service has validated the user legituser, the resulting URL impersonates a different user with potentially elevated permissions, and that user has bypassed validation.

  4. Because the original key parameter ends with a # character, which serves as the fragment separator in URLs, the &user=legituser part that the service appends is pushed into the fragment section. The fragment is not part of the query string, so it is effectively ignored.

As a result, you get a URL representing an unrelated and unverified user, which qualifies as a privilege escalation attack.
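You can check this with java.net.URI itself. The sketch below (class and method names are illustrative) splits the tampered URL into its raw query and raw fragment: the query carries user=admin, while user=legituser is stranded in the fragment, which is not part of the HTTP request target sent to the next service.

```java
import java.net.URI;

final class TamperedUrlInspection {

    // Splits a URL into its raw query and raw fragment.
    static String[] queryAndFragment(String url) {
        URI uri = URI.create(url);
        return new String[] { uri.getRawQuery(), uri.getRawFragment() };
    }

    public static void main(String[] args) {
        String[] parts = queryAndFragment(
                "https://my.internal.api.com/info?key=55s72502010a&user=admin#&user=legituser");
        System.out.println("query:    " + parts[0]); // key=55s72502010a&user=admin
        System.out.println("fragment: " + parts[1]); // &user=legituser
    }
}
```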

So how do you defend against this attack? One way would be to validate the key value in addition to validating the user. However, you may want to delegate that validation to a different service down the request chain. If so, the easier way would be to simply encode parameter values when constructing a new URL:

private static URI forwardRequestToAnotherService(String key, String user) {
    if (!validateUser(user)) return null;
    String newUrl = new StringBuilder()
            .append("https://my.internal.api.com/info?key=")
            .append(URLEncoder.encode(key, StandardCharsets.UTF_8))
            .append("&user=")
            .append(URLEncoder.encode(user, StandardCharsets.UTF_8))
            .toString();
    return URI.create(newUrl);
}

With this modification in place, see what happens if someone tries to submit the malicious key value:

forwardRequestToAnotherService("55s72502010a&user=admin#", "legituser")
// Output: https://my.internal.api.com/info?key=55s72502010a%26user%3Dadmin%23&user=legituser

The encoded key value (key=55s72502010a%26user%3Dadmin%23) prevents the attempted privilege escalation because & is percent-encoded as %26 and # as %23. The resulting URL goes off to the next service with the correct username. The next service may return an error after trying and failing to make sense of the tampered key parameter, but an error response beats an attack anytime.
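A small, self-contained sketch can confirm that the encoded value survives the trip intact. It assumes that the receiving service splits the raw query on & and = and then decodes each value; buildForwardUrl and receivedParams are hypothetical names, and the validateUser() check is left out for brevity:

```java
import java.net.URI;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

final class EncodedForwardingCheck {

    // Hypothetical sketch: builds the forwarded URL with every value encoded.
    // The validateUser() check from the article is omitted here for brevity.
    static URI buildForwardUrl(String key, String user) {
        return URI.create("https://my.internal.api.com/info?key="
                + URLEncoder.encode(key, StandardCharsets.UTF_8)
                + "&user=" + URLEncoder.encode(user, StandardCharsets.UTF_8));
    }

    // Stand-in for the receiving service: split the raw query, then decode each value.
    static Map<String, String> receivedParams(URI uri) {
        Map<String, String> params = new HashMap<>();
        for (String pair : uri.getRawQuery().split("&")) {
            String[] parts = pair.split("=", 2);
            String value = parts.length > 1 ? URLDecoder.decode(parts[1], StandardCharsets.UTF_8) : "";
            params.put(parts[0], value);
        }
        return params;
    }

    public static void main(String[] args) {
        URI uri = buildForwardUrl("55s72502010a&user=admin#", "legituser");
        Map<String, String> params = receivedParams(uri);
        System.out.println(params.get("user"));   // legituser
        System.out.println(params.get("key"));    // 55s72502010a&user=admin#
        System.out.println(uri.getRawFragment()); // null (no fragment was created)
    }
}
```

The malicious characters arrive as part of the key value, where downstream validation can reject them, instead of rewriting the query structure.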

Use standard libraries to encode and decode URLs

Specifically in Java, for the purposes of percent-encoding query parameters, you are probably using URLEncoder.encode() and URLDecoder.decode() most of the time, as these are reliable tools. They're available in any Java project out of the box.

Another option is the Open Web Application Security Project (OWASP) Java Encoder library. Although its core purpose is contextual output encoding (for HTML, JavaScript, CSS, and other contexts), it also contains the Encode.forUriComponent() method for encoding URL parameter values and REST path segments. This library is a good choice if your Java application needs to safely display client-submitted URLs and their parts in various UI contexts.

That said, depending on the framework you're using, you may not need to explicitly decode at all or may have to do it differently:

  • In Spring or Spring Boot, applying the @RequestParam annotation to a controller method parameter binds it to a query parameter whose value has already been decoded.

  • Inside a GWT application, you probably use methods defined in GWT's own `com.google.gwt.http.client.URL` class.

  • Android provides its own URI builder, android.net.Uri.Builder, whose appendQueryParameter() method encodes query parameters for you.

If you're using third-party libraries to encode and decode URLs, it's a good idea to scan them for known vulnerabilities, and if vulnerabilities are found, update the libraries as soon as security fixes become available.

Validate and sanitize URLs before interacting with them in your app

URL encoding is just one part of the equation. If your web application takes data from users via URL parameters and then renders that data back in its pages, you should ensure that the data is harmless before putting it back into the browser context.

This means you should validate all user data on the server side, including data that arrives via URLs, checking both its syntax and its semantics. Use allowlists (whitelists) rather than denylists (blacklists), because denylists are more prone to errors and omissions.
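As an illustration of allowlisting, the sketch below accepts a key only if it matches an explicit pattern, and accepts a redirect target only if it is an https URL on a known host. The key format and the host list are assumptions invented for this example; substitute the rules your application actually expects. Note that the malicious key from the earlier example fails the check:

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

final class AllowlistValidation {

    // Hypothetical rule: keys are 1 to 32 lowercase letters or digits.
    private static final Pattern KEY_PATTERN = Pattern.compile("[a-z0-9]{1,32}");

    // Hypothetical allowlist of hosts that redirects may point to.
    private static final Set<String> ALLOWED_HOSTS = Set.of("example.com", "www.example.com");

    static boolean isValidKey(String key) {
        // matches() requires the whole string to fit the pattern.
        return key != null && KEY_PATTERN.matcher(key).matches();
    }

    static boolean isAllowedRedirect(String target) {
        if (target == null) return false;
        try {
            URI uri = new URI(target);
            // Accept only absolute https URLs whose host is on the allowlist.
            return "https".equalsIgnoreCase(uri.getScheme())
                    && uri.getHost() != null
                    && ALLOWED_HOSTS.contains(uri.getHost().toLowerCase(Locale.ROOT));
        } catch (URISyntaxException e) {
            return false; // unparseable input is rejected, not repaired
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidKey("55s72502010a"));                     // true
        System.out.println(isValidKey("55s72502010a&user=admin#"));         // false
        System.out.println(isAllowedRedirect("https://example.com/home"));  // true
        System.out.println(isAllowedRedirect("https://evil.example/home")); // false
        System.out.println(isAllowedRedirect("javascript:alert(1)"));       // false
    }
}
```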

Additionally, if your application allows users to submit HTML code, you should implement HTML sanitization. One library commonly used for this purpose in Java is the OWASP Java HTML Sanitizer.

Finding insecure URLs with Snyk Security Extension for IntelliJ IDEA

URL encoding and decoding should be part of a wider set of security practices, such as encoding and escaping data and input validation. Writing secure code is vital, and revealing security flaws as early as possible in the development cycle minimizes both the cost of fixing them and the impact of security incidents.

As a Java developer who works in IntelliJ IDEA, you may benefit from installing the Snyk Security extension.

Snyk Security finds issues, highlights them in the code editor, and helps you fix known security vulnerabilities in the following:

  • Your own code

  • Direct and transitive open source dependencies that you're pulling into your project

  • Your Docker images

  • Your infrastructure-as-code templates

The Snyk Security extension helps you identify issues like unsanitized input from URL parameters, potentially leading to cross-site scripting (XSS), command injection, server-side request forgery (SSRF), or open redirect vulnerabilities. For each detected vulnerability, it shows how various open source Java projects fixed similar issues in the past:


Here's the full list of security inspections that Snyk Security runs on your Java code.

Conclusion

Failing to encode URL parameters may seem like a minor omission, but as you've seen from this article, it may have a serious impact on both the reliability and security of your Java applications. Percent-encoding URL parameters helps deliver expected results to legitimate users and blocks malicious actors from bypassing access controls and performing attacks.

While URL encoding and decoding are just parts of a wider range of best security practices, sticking with stable APIs and smart developer tools like the Snyk Security extension for IntelliJ IDEA helps you get it right, consistently ship secure code, and avoid spending painful hours resolving and following up on production security incidents.
