How to prevent Trojan Source attacks with Snyk Code
Frank Fischer
November 17, 2021
Earlier this month, a group of researchers at the University of Cambridge published an academic paper, with an accompanying website, on a new type of potential vulnerability that could appear in source code. They called it Trojan Source.
The basic idea of the vulnerability is that Unicode characters in code, while they add nice options for variable names or comments (e.g., using emojis to express your feelings in comments), unfortunately also allow potentially malicious semantics to be hidden in plain (human) sight. For example, an attacker might make code appear to contain valid authentication logic, while in reality (that is, for the compiler or interpreter) it performs a completely different action. Or what seems to be active code is in reality only a comment and is therefore ignored at runtime. All of this is done by using Unicode characters to fool the human eye.
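To make the idea more concrete: the main attack described in the paper relies on Unicode bidirectional control characters, which change the order in which text is displayed without changing the order in which the compiler reads it. The following Python snippet is a minimal sketch of how such characters could be located in a piece of source text. It is not how Snyk Code or ESLint implement their checks; the function name `find_bidi_controls` and the sample string are made up for illustration.

```python
import unicodedata

# Bidirectional formatting characters: embeddings, overrides, isolates,
# and the characters that terminate them.
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
    "\u2066", "\u2067", "\u2068", "\u2069",
}


def find_bidi_controls(source):
    """Return (line, column, character name) for each bidi control character.

    Hypothetical helper for illustration; lines and columns are 1-based.
    """
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                findings.append((line_no, col_no, unicodedata.name(ch)))
    return findings


# Made-up sample: the string literal hides control characters that an
# editor may render in a misleading order.
sample = 'role = "user"\nif role == "admin\u202e \u2066# only admins\u2069 \u2066":\n    pass\n'

for line_no, col_no, name in find_bidi_controls(sample):
    print(f"line {line_no}, column {col_no}: {name}")
```

A scan like this only reports where the characters are. Deciding whether a given occurrence is legitimate (for example, right-to-left text inside a string) or malicious still needs context.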
If you are interested in the intricacies of this vulnerability, we recommend checking out the original website or the paper. And if you want to see the vulnerability in action in JavaScript, my colleague Liran Tal wrote a blog article on how to effectively detect and mitigate Trojan Source attacks in JavaScript codebases with ESLint.
In this post, we’re going to use Snyk Code to find and fix Trojan Source in source code. Let’s go!
“Can Snyk Code fix Trojan Source?”
This was the question we received the same day the website was published (a nod to how well-informed and connected Snyk customers tend to be). To paraphrase the internal discussion we had: when we asked the team, the first reaction was that (1) it seemed like a feasible supply chain attack, but less of a risk for your own code, since the attacker needs source code access, and (2) the attack works on a syntactic level.
Note: As you can see in the article from Liran, Trojan Source is an attack on the syntactic level, which is why ESLint can help JavaScript developers. Unfortunately, other ecosystems such as Java, Ruby, C#, and more may not have the benefit of new tooling to mitigate this threat vector, which makes Snyk Code an even more important tool to add to your toolbox.
Given the reaction we saw in the community, and since our customers requested it, we agreed to add rules to address Trojan Source issues. As this is not a high-severity issue (only a high-visibility one due to the recent press), we did not expedite a fix, but instead added it as a task to the current sprint. But we did not want to stop there. While adding this rule, we also added coverage for a class of related vulnerabilities based on confusable Unicode characters. And finally, we made sure the rule set covered all supported languages in Snyk Code.
It’s important to note that it took only 10 days to develop and release the Trojan Source (and related) rules for all supported languages within Snyk Code, and that was without using an expedited process, just the current sprint. It is a perfect example of the agility of Snyk Code: decision, coding, testing on hundreds of thousands of projects in all supported languages, optimizing, and releasing in 10 days. It is also a testament to the power of the Snyk Code ML-enhanced engine that within a few development hours we not only covered Trojan Source, but also handled some related issues.
Snyk Code finds Trojan Source and other dangerous variants
Trojan Source would most likely present itself as an open source supply chain attack, but it could also be introduced by copy-pasting directly into application code (as seen in Liran’s blog). Therefore, Snyk Code scans your codebase for Trojan Source-type issues.
Note: Snyk is also a powerful tool for securing the open source supply chain, allowing you to easily monitor and fix issues around security of dependencies and license compliance.
As mentioned above, it took the team a few days to develop the detection for Trojan Source, but Snyk Code can detect more variants of the attack than most other tools. For example, Snyk Code detects not only bidirectional control characters, but also confusable Unicode characters used in method and variable names that can potentially hide the true semantics of the code. This means that, today, Snyk Code detects a larger set of the problems described by Trojan Source than any other checker we know of.
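To illustrate what a confusable-character problem in an identifier looks like, here is a rough Python sketch that flags identifiers mixing letters from more than one script, such as a Cyrillic "а" inside an otherwise Latin name. This is a simple heuristic chosen for illustration, not Snyk Code's actual rule: it derives the script from the first word of each character's Unicode name, and the names `scripts_in` and `find_mixed_script_identifiers` are hypothetical. A production check would more likely build on the confusables data published with Unicode Technical Standard #39.

```python
import re
import unicodedata

# Rough identifier pattern: a letter or underscore followed by word characters.
# It also matches words inside strings and comments, which a real checker
# would separate out using the language's tokenizer.
IDENTIFIER = re.compile(r"[^\W\d]\w*")


def scripts_in(identifier):
    """Approximate the scripts used by the letters of an identifier.

    Assumption: the first word of a letter's Unicode name (LATIN, CYRILLIC,
    GREEK, ...) is a good enough stand-in for its script.
    """
    scripts = set()
    for ch in identifier:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split(" ")[0])
    return scripts


def find_mixed_script_identifiers(source):
    """Return (line, identifier, scripts) for identifiers mixing scripts."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for match in IDENTIFIER.finditer(line):
            scripts = scripts_in(match.group())
            if len(scripts) > 1:
                findings.append((line_no, match.group(), sorted(scripts)))
    return findings


# "\u0430" is CYRILLIC SMALL LETTER A, which looks like a Latin "a".
suspicious = "is_\u0430dmin = True"
print(find_mixed_script_identifiers(suspicious))
```

The heuristic deliberately ignores identifiers written entirely in one non-Latin script, since those are legitimate in many codebases. It only reports names where scripts are mixed, which is where look-alike substitutions tend to hide.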
Note: Snyk Code supports Java, JavaScript, TypeScript, PHP, and Python, as well as C#, Ruby, and Go in public beta. Trojan Source could also target other languages (e.g., Bash scripts), but Snyk Code can’t scan unsupported languages.
Snyk Code is the Achilles’ heel of Trojan Source
As you can see above, Snyk Code provides a strong engine capable of a wide variety of scans. In general, it learns from the knowledge of the global developer community using a unique human-guided algorithm. On top of that, we can react quickly to market changes and evolving issues such as Trojan Source. As a result, Snyk Code finds not only Trojan Source but also related vulnerabilities, in addition to everything in its existing knowledge base, with industry-leading accuracy. Issues are explained using the original source code, with additional help that includes examples of how open source projects solved similar issues, so developers can understand and fix them.
All this, with unparalleled speed and from the comfort of your favorite IDE. Oh, and you can use it for free, which seems like more than enough reason to give it a try!