
Preventing YAML parsing vulnerabilities with snakeyaml in Java

Brian Vermeer, March 30, 2021

What is YAML?

YAML is a human-readable language to serialize data that’s commonly used for config files. The word YAML is an acronym for “YAML ain’t a markup language” and was first released in 2001. You can compare YAML to JSON or XML as all of them are text-based structured formats.

How are YAML, JSON, and XML different?

While similar to those languages, YAML is designed to be more readable than JSON and less verbose than XML. Each of the three has its own syntax for structure and nesting; YAML uses whitespace indentation for this.

YAML files are often used to configure applications, application servers, or clusters. It is a very common format in Spring Boot applications and, of course, to configure Kubernetes. However, similarly to JSON and XML, you can use YAML to serialize and deserialize data.

Although YAML looks like an excellent alternative to XML and JSON, many people aren’t big fans of the structure. Because the language is line-based and uses indentation to represent structure and nesting, indentation often causes problems in complex data structures. A single missing (or extra) space in a complex, data-heavy structure can make parsing fail or silently change the nesting, and finding the culprit in a large YAML file is difficult.

Most importantly, parsing YAML in your Java application with an outdated version of snakeyaml might get you into trouble.

TL;DR

Versions of snakeyaml below 1.26 contain a Denial of Service vulnerability.
We highly recommend that you update snakeyaml to version 1.26 or higher to prevent this problem.

Parsing YAML files in Java with snakeyaml

To parse YAML files in your Java application, you can use the well-known library snakeyaml. It is a lightweight, straightforward library that converts YAML to Java objects and back again.

Let’s focus on reading YAML into our Java program. You can basically do this in two different ways. The first way is the generic way of reading YAML input with snakeyaml. In the snippet below I will read a YAML file from my resources folder that is on my classpath.

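// filename is the name of a YAML resource on the classpath
InputStream is = getClass().getClassLoader().getResourceAsStream(filename);
Yaml yaml = new Yaml();
// without a target type, load() returns plain maps, lists, and scalars; this assumes the root is a mapping
LinkedHashMap<String, Object> lhm = yaml.load(is);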

By loading the YAML like this, the result will be a LinkedHashMap<String, Object> representing the YAML file in a structured way. The values can be of any type (strings, numbers, lists, or nested maps), because it is a generic structure not bound to a specific class.
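Because the generic result is just nested maps, lists, and scalars, every lookup needs a type check. The following is a minimal sketch of such a lookup helper using only the JDK. The class GenericYamlAccess, the methods find and findAs, and the keys server, port, and hosts are hypothetical names for illustration; the sample map is built by hand as a stand-in for what yaml.load(...) might return.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class GenericYamlAccess {

    // Follows a path of keys through nested maps. Returns empty as soon as a
    // key is missing or an intermediate value is not a map.
    static Optional<Object> find(Map<?, ?> root, String... path) {
        Object current = root;
        for (String key : path) {
            if (!(current instanceof Map)) {
                return Optional.empty();
            }
            current = ((Map<?, ?>) current).get(key);
            if (current == null) {
                return Optional.empty();
            }
        }
        return Optional.of(current);
    }

    // Returns the value at the path only if it has the expected type.
    static <T> Optional<T> findAs(Class<T> type, Map<?, ?> root, String... path) {
        return find(root, path).filter(type::isInstance).map(type::cast);
    }

    // Hand-built stand-in for a loaded YAML document (assumption, not parser output).
    static Map<String, Object> sample() {
        Map<String, Object> server = new LinkedHashMap<>();
        server.put("port", 8080);
        server.put("hosts", List.of("a.example", "b.example"));
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("server", server);
        return root;
    }

    public static void main(String[] args) {
        Map<String, Object> config = sample();
        System.out.println(findAs(Integer.class, config, "server", "port")); // Optional[8080]
        System.out.println(findAs(String.class, config, "server", "port"));  // Optional.empty
        System.out.println(find(config, "server", "missing"));               // Optional.empty
    }
}
```

Returning Optional instead of casting directly keeps a wrongly typed or missing value from surfacing later as a ClassCastException or NullPointerException.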

The second way of reading YAML is more specific. You can parse your YAML input to a particular object. Snakeyaml will try to bind the YAML keys to the object’s fields by naming convention. If the YAML file doesn’t fit the structure of the target object, parsing ends in an exception. In the snippet below I will parse my YAML input to a type Person.

InputStream is = getClass().getClassLoader().getResourceAsStream("person.yaml");
Yaml yaml = new Yaml(new Constructor(Person.class));
Person person = yaml.load(is);

Person.java

public class Person {
   private String firstname;
   private String lastname;
   //getters and setters
}

person.yaml

firstname: "Matt"
lastname: "Murdock"
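To make the idea of binding by naming convention concrete, here is a rough sketch using plain reflection from the JDK. It is not how snakeyaml is implemented (the library works with JavaBean properties, among other things); the class BindingSketch and the method bind are hypothetical, and the sketch sets fields directly as a simplification. It shows the behaviour described above: matching keys are copied into the object, and a key that matches no field is rejected.

```java
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

final class BindingSketch {

    static class Person {
        private String firstname;
        private String lastname;
        String getFirstname() { return firstname; }
        String getLastname() { return lastname; }
    }

    // Copies each map entry into the field with the same name.
    // Unknown keys and mismatched value types raise IllegalArgumentException.
    static <T> T bind(Map<String, Object> values, Class<T> type) {
        try {
            T target = type.getDeclaredConstructor().newInstance();
            for (Map.Entry<String, Object> entry : values.entrySet()) {
                Field field;
                try {
                    field = type.getDeclaredField(entry.getKey());
                } catch (NoSuchFieldException e) {
                    throw new IllegalArgumentException(
                        "No field '" + entry.getKey() + "' on " + type.getSimpleName(), e);
                }
                field.setAccessible(true);
                field.set(target, entry.getValue());
            }
            return target;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create or populate " + type, e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> yamlLike = new LinkedHashMap<>();
        yamlLike.put("firstname", "Matt");
        yamlLike.put("lastname", "Murdock");
        Person person = bind(yamlLike, Person.class);
        System.out.println(person.getFirstname() + " " + person.getLastname()); // Matt Murdock
    }
}
```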

Both ways of parsing YAML to an object work perfectly fine. If you are absolutely sure about what the input should be, you can convert your YAML input to a specific object. If this is not the case, you might prefer the more generic way and search the resulting map manually.

Billion laughs attack in YAML

One feature of YAML is that you can create anchors. You can reuse these anchors in different places so you do not have to repeat yourself. In the simplified example below, I create two variables: var1 and var2. By using anchors, var2 has the same value as var1.

var1: &anchor value
var2: *anchor

Let’s take this to the extreme and create the famous billion laughs attack for YAML. By applying this concept in a nested way, I can actually make a billion laughs.

lol1: &lol1 ["lol","lol","lol","lol","lol","lol","lol","lol","lol"]
lol2: &lol2 [*lol1,*lol1,*lol1,*lol1,*lol1,*lol1,*lol1,*lol1,*lol1]
lol3: &lol3 [*lol2,*lol2,*lol2,*lol2,*lol2,*lol2,*lol2,*lol2,*lol2]
lol4: &lol4 [*lol3,*lol3,*lol3,*lol3,*lol3,*lol3,*lol3,*lol3,*lol3]
lol5: &lol5 [*lol4,*lol4,*lol4,*lol4,*lol4,*lol4,*lol4,*lol4,*lol4]
lol6: &lol6 [*lol5,*lol5,*lol5,*lol5,*lol5,*lol5,*lol5,*lol5,*lol5]
lol7: &lol7 [*lol6,*lol6,*lol6,*lol6,*lol6,*lol6,*lol6,*lol6,*lol6]
lol8: &lol8 [*lol7,*lol7,*lol7,*lol7,*lol7,*lol7,*lol7,*lol7,*lol7]
lol9: &lol9 [*lol8,*lol8,*lol8,*lol8,*lol8,*lol8,*lol8,*lol8,*lol8]
lolz: &lolz [*lol9]

As you can see, lol1 is a list of nine "lol" strings. The variable lol2 is a list of nine references to lol1. Repeating this principle up to lol9 means that fully expanding lolz yields 9^9, or roughly 387 million, "lol" strings. With ten entries per line instead of nine, you reach 10^9: a billion laughs.
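The growth is easy to verify without expanding anything. The sketch below, using only the JDK, models the anchors as shared list references and counts how many strings a full expansion would contain. Each line of the YAML example above lists nine entries, so nine levels give 9^9; ten entries per level would give 10^9. The class ExpansionCount and its methods are hypothetical names, and the nested lists are built by hand rather than parsed from YAML.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

final class ExpansionCount {

    // Builds `levels` nested lists. Each level holds `width` references to the
    // SAME list one level down, mirroring how *lolN aliases point at one anchor.
    static List<Object> buildNested(int levels, int width) {
        List<Object> current = new ArrayList<>(Collections.nCopies(width, "lol"));
        for (int level = 2; level <= levels; level++) {
            current = new ArrayList<>(Collections.nCopies(width, current));
        }
        return current;
    }

    // Counts the strings a full expansion would contain. Each distinct list is
    // visited once and memoized by identity, so the count itself stays cheap.
    // An identity map is used on purpose: hashing these lists by content would
    // itself walk the whole expanded structure.
    static long expandedLeaves(Object node, Map<Object, Long> memo) {
        if (!(node instanceof List)) {
            return 1;
        }
        Long cached = memo.get(node);
        if (cached != null) {
            return cached;
        }
        long total = 0;
        for (Object child : (List<?>) node) {
            total += expandedLeaves(child, memo);
        }
        memo.put(node, total);
        return total;
    }

    static long expandedLeaves(Object root) {
        return expandedLeaves(root, new IdentityHashMap<>());
    }

    public static void main(String[] args) {
        System.out.println(expandedLeaves(buildNested(9, 9)));  // 387420489, i.e. 9^9
        System.out.println(expandedLeaves(buildNested(9, 10))); // 1000000000, i.e. 10^9
    }
}
```

The structure holds only a few dozen list slots, yet any code that copies, serializes, or hashes it element by element has to touch hundreds of millions of entries.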

With anchors, you can create a YAML bomb! The enormous number of (nested) objects that such a YAML input expands to will cause a memory overload.

With snakeyaml, and specifically versions below 1.26, this can be a problem. If you parse YAML in the generic way as described in the first example, this YAML bomb will end in a java.lang.OutOfMemoryError on the Java heap space. This typically means your application crashes and is no longer available: a Denial of Service.

However, if you parse your YAML to a specific object as in the second example, this might seem less of an issue. The snakeyaml library tries to match each key to a field in your object. Because keys such as lol1 and lol2 do not match any field of Person, you will get a YAML exception. Although that might seem like a good solution, it is not foolproof.

Say we have a type Person with a firstname and lastname like before. But besides that, it can also contain children, represented by a collection of type Person.

public class Person {
   private String firstname;
   private String lastname;
   private List<Person> children;
   //getters and setters
}

Now I have the same problem as the original billion laughs attack. I can create a similar YAML file using anchors that goes through several layers. The YAML file could look like this. Note that you are not obligated to fill in all the fields of a type.

firstname: X
lastname: XX
children:
   - children: &a [{firstname: a},{firstname: a ..]
   - children: &b [{children: *a},{children: *a} ..]
   - children: &c [{children: *b},{children: *b} ..]
   - children: &d [{children: *c},{children: *c} ..]
   - children: &e [{children: *d}, {children: *d} ..]
...

I created multiple children of the root parent, and all the children’s children point to a previous anchor, creating a snowball effect. So regardless of whether I parse the YAML file generically or parse it specifically to the Person object, I will end up with a heap overload.

Fixing billion laughs YAML attacks in Java

The solution to this problem is much easier than you might think, and much less painful than finding the missing whitespace in a YAML file. You only have to update your snakeyaml version to 1.26 or higher. The folks at snakeyaml did a great job fixing this issue by limiting the number of aliases for non-scalar nodes to a maximum of 50. When you parse a YAML bomb like the one described earlier with the newer version of snakeyaml, you just get an exception reporting that this limit was exceeded. This also means your heap will not overflow, and your application keeps running.
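The idea behind such a limit can be modelled in a few lines. The sketch below is not snakeyaml's implementation (the library applies the limit while it processes the document, and exposes it on LoaderOptions through setMaxAliasesForCollections); it is a JDK-only illustration with hypothetical names (AliasLimitSketch, countAliases, buildBomb). It treats the first visit to a list as its definition and every repeated visit to the same instance as an alias, and it aborts once the alias count exceeds the maximum.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

final class AliasLimitSketch {

    // Walks the graph once. The first visit to a list counts as its definition;
    // every later visit to the same instance counts as an alias and is not
    // descended into again. Exceeding maxAliases aborts the walk.
    static int countAliases(Object root, int maxAliases) {
        Set<Object> defined = Collections.newSetFromMap(new IdentityHashMap<>());
        int[] aliases = {0};
        walk(root, defined, aliases, maxAliases);
        return aliases[0];
    }

    private static void walk(Object node, Set<Object> defined, int[] aliases, int maxAliases) {
        if (!(node instanceof List)) {
            return; // scalars are not counted
        }
        if (!defined.add(node)) {
            if (++aliases[0] > maxAliases) {
                throw new IllegalStateException(
                    "More than " + maxAliases + " aliases to collections");
            }
            return;
        }
        for (Object child : (List<?>) node) {
            walk(child, defined, aliases, maxAliases);
        }
    }

    // Root list [lol1, lol2, ..., lolN] where each level after the first holds
    // `width` references to the previous level.
    static List<Object> buildBomb(int levels, int width) {
        List<Object> root = new ArrayList<>();
        List<Object> previous = new ArrayList<>(Collections.nCopies(width, "lol"));
        root.add(previous);
        for (int level = 2; level <= levels; level++) {
            previous = new ArrayList<>(Collections.nCopies(width, previous));
            root.add(previous);
        }
        return root;
    }

    public static void main(String[] args) {
        System.out.println(countAliases(buildBomb(3, 9), 50)); // 18
        try {
            countAliases(buildBomb(9, 9), 50);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A nine-level bomb with nine references per level uses 72 collection aliases, so a limit of 50 stops it long before any expansion happens, while ordinary configuration files with a handful of aliases pass untouched.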

This once again shows how important it is to keep track of the libraries you depend on. In this case, updating to the newer version solves the problem. Snyk can help you with this if you connect your code repository. In addition, keep scanning the libraries you depend on with Snyk Open Source so that such a vulnerability does not take you by surprise.
