How to effectively detect and mitigate Trojan Source attacks in JavaScript codebases with ESLint
November 10, 2021
On November 1st, 2021, a paper titled Trojan Source: Invisible Vulnerabilities was publicly disclosed. It described how malicious actors may use Unicode bidirectional control characters to slip malicious source code into an otherwise benign codebase. The attack relies on reviewers mistaking the obfuscated malicious source code for comments.
What is a Trojan Source attack?
Traditional code editors and code review practices fail to flag bidirectional characters in source code. This allows actors to inject malicious code that looks benign. The vulnerability was made public on November 1st, 2021 and assigned CVE-2021-42574.
The following snippet shows a Trojan Source attack in JavaScript source code as it is displayed in VS Code:
```
1  // running internal logic for privileged users:
2  var accessLevel = "user";
3  if (accessLevel != "user") { // Check if admin
4    console.log("You are an admin.");
5  }
```
Did you catch the issue with the above source code? If not, try examining that code snippet a little closer.
Here's what's happening: this is a Stretched String type of attack. Line 3 makes it look like the conditional expression checks whether the accessLevel variable is not equal to the value "user". There's a comment at the end of the line about an admin check. It may look harmless, but the truth is quite different.
In fact, the Unicode bidirectional characters on line 3 hide the actual string value that accessLevel is compared against. Here is the real line 3 as the JavaScript engine parses it, with the invisible characters omitted:
```javascript
if (accessLevel != "user // Check if admin") {
```
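To make the hidden characters tangible, the sketch below rebuilds line 3's string literal using JavaScript escape sequences instead of raw bidirectional characters. It assumes the RLO, LRI and PDI ordering used in the paper's Stretched String example. A real attack may use a different sequence, so treat this as an illustrative reconstruction.

```javascript
// Bidirectional control characters, written as escapes so they stay visible in source.
const RLO = "\u202E"; // RIGHT-TO-LEFT OVERRIDE
const LRI = "\u2066"; // LEFT-TO-RIGHT ISOLATE
const PDI = "\u2069"; // POP DIRECTIONAL ISOLATE

// Assumed reconstruction of the literal on line 3: the "comment" lives inside the string.
const hiddenLiteral = "user" + RLO + " " + LRI + "// Check if admin" + PDI + " " + LRI;

const accessLevel = "user";

// The comparison is against the whole literal, not against "user",
// so the condition is true and the admin branch runs.
const conditionIsTrue = accessLevel != hiddenLiteral;
console.log(conditionIsTrue); // true

// List the non-ASCII code points hiding in the literal.
const hiddenCodePoints = [...hiddenLiteral]
  .filter((ch) => ch.codePointAt(0) > 0x7f)
  .map((ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"));
console.log(hiddenCodePoints); // [ 'U+202E', 'U+2066', 'U+2069', 'U+2066' ]
```

In an editor that doesn't reveal these characters, the literal renders as if it ended right after `user`. That rendering is exactly what the attack depends on.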
The paper describes several ways of abusing these and other deceptive Unicode characters to inject malicious code into source. They include Commenting-Out, Stretched String, Invisible Functions, and Homoglyph Function attacks. The researchers provide JavaScript examples of all of these attacks in their trojan-source repository on GitHub.
The paper's systematic use of bidirectional control characters is a novel approach. However, this sort of attack isn't actually new and has been raised before in mailing lists and issue trackers. For example, a Go issue from 2017 discussed disallowing RTL/LTR characters. A Bugzilla entry from 2011 goes further back, titled [BiDi] Misleading display of bidirectional strings when RLO, LRO or PDF is used.
How do you fix Trojan Source attacks?
The authors of the academic paper suggest that the fix lies in two places. Code editors and IDEs should render such Unicode characters visibly, and compilers should warn users about them.
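As a rough illustration of what "rendering these characters visibly" can mean, the sketch below replaces each bidirectional control character with a visible placeholder such as `<U+202E RLO>`. The `revealBidi` function name and the placeholder format are assumptions for illustration. Editors like VS Code use their own highlighting.

```javascript
// Hypothetical helper, for illustration only: replace each Unicode bidirectional
// control character with a visible placeholder, similar in spirit to how
// updated editors highlight them.
const BIDI_NAMES = new Map([
  ["\u202A", "LRE"], // LEFT-TO-RIGHT EMBEDDING
  ["\u202B", "RLE"], // RIGHT-TO-LEFT EMBEDDING
  ["\u202C", "PDF"], // POP DIRECTIONAL FORMATTING
  ["\u202D", "LRO"], // LEFT-TO-RIGHT OVERRIDE
  ["\u202E", "RLO"], // RIGHT-TO-LEFT OVERRIDE
  ["\u2066", "LRI"], // LEFT-TO-RIGHT ISOLATE
  ["\u2067", "RLI"], // RIGHT-TO-LEFT ISOLATE
  ["\u2068", "FSI"], // FIRST STRONG ISOLATE
  ["\u2069", "PDI"], // POP DIRECTIONAL ISOLATE
]);

// Matches exactly the nine characters listed above.
const BIDI_PATTERN = /[\u202A-\u202E\u2066-\u2069]/g;

function revealBidi(text) {
  return text.replace(BIDI_PATTERN, (ch) => {
    const hex = ch.codePointAt(0).toString(16).toUpperCase();
    return `<U+${hex} ${BIDI_NAMES.get(ch)}>`;
  });
}

const line3 = 'if (accessLevel != "user\u202E \u2066// Check if admin\u2069 \u2066") {';
console.log(revealBidi(line3));
// if (accessLevel != "user<U+202E RLO> <U+2066 LRI>// Check if admin<U+2069 PDI> <U+2066 LRI>") {
```

Once the placeholders are shown, it's obvious that the "comment" sits inside the string literal.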
Detecting Trojan Source attacks in source code
Your code editing and code review processes may run on platforms or tools that don't highlight these dangerous bidirectional Unicode characters. That means your codebase may already contain them.
So how do you find out whether your source code contains bidirectional Unicode characters? To help with that, I created an npm package called anti-trojan-source. It scans the files in a directory, or text read from standard input (STDIN), for any such Unicode characters.
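The core of such a scanner is small. The following sketch is not the anti-trojan-source implementation. It defines a hypothetical `findBidiChars` function that reports the line, column and name of each bidirectional control character in a piece of text. It also includes a hypothetical `main` wrapper that reads either the file paths it is given or STDIN.

```javascript
const fs = require("fs");

// Bidirectional control characters covered by CVE-2021-42574.
const BIDI_NAMES = {
  "\u202A": "LRE", "\u202B": "RLE", "\u202C": "PDF", "\u202D": "LRO", "\u202E": "RLO",
  "\u2066": "LRI", "\u2067": "RLI", "\u2068": "FSI", "\u2069": "PDI",
};

// Hypothetical helper: return { line, column, name } for every bidi character in `text`.
// Lines and columns are 1-based; columns count UTF-16 code units.
function findBidiChars(text) {
  const findings = [];
  text.split(/\r?\n/).forEach((lineText, lineIndex) => {
    for (let i = 0; i < lineText.length; i++) {
      const name = BIDI_NAMES[lineText[i]];
      if (name) findings.push({ line: lineIndex + 1, column: i + 1, name });
    }
  });
  return findings;
}

// Hypothetical CLI wrapper: scan each path given, or STDIN (file descriptor 0) if none.
// Sets a non-zero exit code when anything is found so a CI step can fail on it.
function main(paths) {
  const sources = paths.length > 0
    ? paths.map((p) => ({ label: p, text: fs.readFileSync(p, "utf8") }))
    : [{ label: "<stdin>", text: fs.readFileSync(0, "utf8") }];
  let found = false;
  for (const { label, text } of sources) {
    for (const f of findBidiChars(text)) {
      found = true;
      console.log(`${label}:${f.line}:${f.column} ${f.name}`);
    }
  }
  process.exitCode = found ? 1 : 0;
}
```

In a script, calling `main(process.argv.slice(2))` prints one `file:line:column NAME` entry per finding. Run against the line 3 payload, `findBidiChars` should report RLO, LRI, PDI and LRI, with the RLO at column 25.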
You can use npx to scan files as follows:
```shell
npx anti-trojan-source --files='src/**/*.js'
```
Or if you’d like to use it as a library in a JavaScript project:
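```javascript
import { hasTrojanSource } from 'anti-trojan-source'

// The \u escapes spell out the invisible bidi characters (RLO, LRI, PDI)
// so the malicious payload is visible in this example.
const isDangerous = hasTrojanSource({
  sourceText: 'if (accessLevel != "user\u202E \u2066// Check if admin\u2069 \u2066") {'
})
```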
Preventing Trojan Source attacks in JavaScript with ESLint
Editor's note: Since this post's original publication, Trojan Source rules have been added to Snyk Code. Learn more by reading our How to prevent Trojan Source attacks with Snyk Code blog.
Finding existing issues is useful, but it's even better to proactively safeguard your codebase so that no Trojan Source attacks make their way into your source code at all. In the JavaScript community, we often rely on ESLint and its plugins to enforce code quality and code style standards.
With eslint-plugin-anti-trojan-source, you can add an ESLint plugin that makes sure potentially malicious code containing bidirectional Unicode characters doesn't get merged by mistake. This applies both to your developers and to your continuous integration and build systems.
Here is an example ESLint configuration in a JavaScript project's package.json:
```json
{
  "eslintConfig": {
    "plugins": [
      "anti-trojan-source"
    ],
    "rules": {
      "anti-trojan-source/no-bidi": "error"
    }
  }
}
```
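To show roughly how a rule like `no-bidi` can work, here is a hypothetical minimal ESLint rule. It is not the source of eslint-plugin-anti-trojan-source. It uses only documented rule APIs (`context.report` and `SourceCode#getLocFromIndex`). Because it scans the raw file text, it catches bidi characters in code, strings and comments alike.

```javascript
// A hypothetical, minimal ESLint rule for illustration; it is NOT the source of
// eslint-plugin-anti-trojan-source. It reports every bidirectional control
// character found anywhere in the file (code, strings, or comments).
const BIDI_PATTERN = /[\u202A-\u202E\u2066-\u2069]/g;

const noBidiRule = {
  meta: {
    type: "problem",
    docs: { description: "disallow Unicode bidirectional control characters" },
    schema: [],
  },
  create(context) {
    // Newer ESLint versions expose context.sourceCode; older ones use context.getSourceCode().
    const sourceCode = context.sourceCode ?? context.getSourceCode();
    return {
      Program() {
        // matchAll needs the global flag; it iterates over a copy of the regex.
        for (const match of sourceCode.text.matchAll(BIDI_PATTERN)) {
          const codePoint = match[0].codePointAt(0).toString(16).toUpperCase();
          context.report({
            loc: sourceCode.getLocFromIndex(match.index),
            message: `Unexpected bidirectional control character U+${codePoint}.`,
          });
        }
      },
    };
  },
};

module.exports = noBidiRule;
```

With flat config, one way to try a local rule like this is to register it under an ad-hoc plugin object, for example `plugins: { local: { rules: { "no-bidi": noBidiRule } } }` together with `rules: { "local/no-bidi": "error" }`. The plugin name `local` is arbitrary. For real projects, the published plugin configured as shown above is the simpler choice.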
And an example output for a vulnerable snippet of code that slipped into the codebase:
```
$ npm run lint

/Users/lirantal/projects/repos/@gigsboat/cli/index.js
  1:1  error  Detected potential trojan source attack with unicode bidi introduced in this comment: ' begin admins only ' anti-trojan-source/no-bidi if (isAdmin) {
  1:1  error  Detected potential trojan source attack with unicode bidi introduced in this comment: ' end admin only anti-trojan-source/no-bidi }

/Users/lirantal/projects/repos/@gigsboat/cli/lib/helper.js
  2:1  error  Detected potential trojan source attack with unicode bidi introduced in this code: '"user" // Check if admin
```
How is the ecosystem mitigating Trojan Source attacks?
IDEs such as VS Code have released versions that highlight these Unicode characters. This lets programmers notice them and act with proper context when reviewing and editing code. Similarly, GitHub now shows a warning when a file it displays contains bidirectional Unicode characters.
However, GitHub doesn't highlight every type of Trojan Source attack. For example, GitHub shows no warning at all for the JavaScript example of the case the paper dubs Invisible Functions. What's actually going on there?
The function declaration on line 7 of that example is actually written with a zero-width space Unicode character (U+200B) in its name. That makes it look like a legitimate isAdmin() function.
We can verify this by printing the code with a tool like bat, a clone of the UNIX cat tool with syntax highlighting and Git integration.
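Checks that only look for bidirectional characters won't catch this variant, because none are involved. A complementary check can flag zero-width code points such as U+200B (zero-width space). The sketch below is a hedged example: the `findInvisibleChars` name and the exact character list are assumptions, and the list is not exhaustive. Legitimate text will produce false positives that need review, such as emoji sequences that rely on U+200D or a byte order mark (U+FEFF) at the start of a file.

```javascript
// Hypothetical helper: flag zero-width / invisible code points that can make two
// names look identical. The character list is an assumption, not exhaustive.
const INVISIBLE_NAMES = {
  "\u200B": "ZERO WIDTH SPACE",
  "\u200C": "ZERO WIDTH NON-JOINER",
  "\u200D": "ZERO WIDTH JOINER",
  "\u2060": "WORD JOINER",
  "\uFEFF": "ZERO WIDTH NO-BREAK SPACE",
};

function findInvisibleChars(text) {
  const findings = [];
  for (let i = 0; i < text.length; i++) {
    const name = INVISIBLE_NAMES[text[i]];
    if (name) findings.push({ index: i, name });
  }
  return findings;
}

// Two names that render identically in most editors but are different strings.
const visibleName = "isAdmin";
const lookalikeName = "is\u200BAdmin";

console.log(visibleName === lookalikeName); // false
console.log(lookalikeName.length); // 8
console.log(findInvisibleChars(lookalikeName)); // [ { index: 2, name: 'ZERO WIDTH SPACE' } ]
```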
Should compilers and runtimes mitigate Trojan Source attacks?
What about compilers and language runtimes? Many, including Node.js, have decided against updating their compilers to reject these Unicode characters. This effectively shifts the risk to code editors and to humans, who need to be more careful when reading code and performing code reviews.
That said, some languages, such as Zig, have considered emitting a compiler error when Unicode bidirectional characters are detected in source code. Under that approach, an explicit comment would bypass the error.
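The same idea can be approximated in JavaScript tooling today: fail by default, and allow an explicit opt-out. The sketch below is hypothetical. The `checkSource` function and the `// allow-bidi` marker are made up for illustration and are not a Zig or Node.js feature. Any line containing bidirectional characters is reported unless that line also carries the marker.

```javascript
// Hypothetical pre-commit style check: bidi characters are errors unless the
// line explicitly opts out with an "// allow-bidi" comment (an assumed convention).
const BIDI_PATTERN = /[\u202A-\u202E\u2066-\u2069]/; // non-global, so .test() is stateless
const ALLOW_MARKER = "// allow-bidi";

function checkSource(text) {
  const errors = [];
  text.split(/\r?\n/).forEach((line, index) => {
    if (BIDI_PATTERN.test(line) && !line.includes(ALLOW_MARKER)) {
      errors.push(`line ${index + 1}: bidirectional control character without ${ALLOW_MARKER}`);
    }
  });
  return errors;
}

const source = [
  'const label = "\u2067مرحبا\u2069"; // allow-bidi',
  'if (accessLevel != "user\u202E \u2066// Check if admin\u2069 \u2066") {',
].join("\n");

for (const error of checkSource(source)) console.log(error);
// line 2: bidirectional control character without // allow-bidi
```

This check is line-based and doesn't distinguish comments from strings. The opt-out is only valuable because it stays visible in code review, where a reviewer can question why a line needs it.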
Resources on Trojan Source attacks
I hope you found this post useful for understanding Trojan Source attacks and how they can appear in the JavaScript ecosystem. To learn more about these attacks, I recommend checking out the following resources:
How to prevent Trojan Source attacks with Snyk Code blog post
The official Trojan Source website: https://www.trojansource.codes
The official Trojan Source repository with code examples and proof-of-concepts: https://github.com/nickboucher/trojan-source
The official Trojan Source announcement blog article: https://www.lightbluetouchpaper.org/2021/11/01/trojan-source-invisible-vulnerabilities