Defending Against Glassworm: The Invisible Malware That's Rewriting Supply Chain Security
In October 2025, security researchers at Koi Security discovered Glassworm, the first self-propagating worm targeting VS Code extensions that employs invisible Unicode characters to conceal malicious code in plain sight. With 35,800+ installations compromised and active C2 infrastructure still operational, Glassworm represents another step in the evolution of supply chain attacks that breaks our traditional code review security model.
This article examines the Glassworm attack, tracing its lineage to the original Trojan Source vulnerability (CVE-2021-42574), and demonstrates how developers and security teams can detect and prevent these invisible character attacks using anti-trojan-source, an open source command-line detection tool that now includes category-based Unicode analysis and enhanced reporting capabilities.
Key takeaways:
Invisible Unicode characters can hide executable malicious code that's literally invisible to code reviewers
Traditional code review, some diff views, and syntax highlighting completely fail to detect these attacks
The attack is self-propagating through stolen credentials, turning each victim into a new infection vector
Detection requires specialized tools that analyze Unicode characters by category, not just explicit lists
The evolution of invisible character attacks
The aptly named academic paper Trojan Source is where the current wave began. To be historically accurate, though, Unicode characters and glyphs had been used for decades to hide content and confuse systems and users, so the technique is not novel at its core.
In November 2021, researchers Nicholas Boucher and Ross Anderson from the University of Cambridge published their seminal paper Trojan Source: Invisible Vulnerabilities, revealing a critical vulnerability (CVE-2021-42574) that affects virtually every modern programming language.
The core problem was inherent to Unicode bidirectional (bidi) text control characters that can visually reorder source code in a way that deceives human reviewers while preserving the logical execution order that compilers and interpreters follow.
Consider this C code:
bool isAdmin = false;
/*
begin admins only
*/ if (isAdmin) {
printf("You are an admin.\n");
/*
end admins only
*/ }
What you see above appears to be code protected by an access check. However, hidden Unicode control characters can cause the comment to appear as if it were code, thereby altering the logic flow completely. The compiler sees one thing; the human reviewer sees another.
Attack techniques:
Early Returns: Make a `return` statement appear to be within a comment
Commenting-Out: Make comments visually appear as executable code
Stretched Strings: Make string literals visually appear as code
Homoglyphs: Use visually identical characters from different scripts (CVE-2021-42694)
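The bidi-based techniques above all depend on a small set of control characters, so a first line of defense can be sketched in a few lines of JavaScript. This is a minimal illustration, not how any particular tool is implemented; `findBidiControls` is a hypothetical helper name, and the sample string is made up for the demonstration.

```javascript
// Bidirectional control characters used by Trojan Source:
// U+202A-U+202E (embeddings and overrides) and U+2066-U+2069 (isolates).
const BIDI_CONTROLS = /[\u202A-\u202E\u2066-\u2069]/gu;

// Hypothetical helper: returns one entry per bidi control character found.
function findBidiControls(sourceText) {
  const findings = [];
  sourceText.split(/\r?\n/).forEach((line, index) => {
    for (const match of line.matchAll(BIDI_CONTROLS)) {
      const hex = match[0].codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
      findings.push({ line: index + 1, column: match.index + 1, codePoint: `U+${hex}` });
    }
  });
  return findings;
}

// Made-up sample: U+202E (RIGHT-TO-LEFT OVERRIDE) hidden inside a comment.
const bidiSample = 'let isAdmin = false;\n/* \u202E begin admins only */ if (isAdmin) {';
const bidiFindings = findBidiControls(bidiSample);
console.log(bidiFindings);
```

A check like this covers only the bidi controls; homoglyphs and other invisible characters need separate handling.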
The supply chain amplification effect
The Trojan Source researchers identified the critical multiplier effect in modern software development:
"If an adversary successfully commits targeted vulnerabilities into open source code by deceiving human reviewers, downstream software will likely inherit the vulnerability."
This is exactly what we're seeing play out in real-world attacks today.
Glassworm: The next evolution (2025)
Four years after Trojan Source was disclosed, attackers have weaponized these techniques in the wild. In October 2025, Glassworm emerged as the first self-propagating worm using invisible Unicode characters to compromise VS Code extensions on the OpenVSX marketplace.
What makes Glassworm different:
Invisible code injection: Uses Unicode variation selectors that produce zero visual output. Not obfuscated, not minified, but literally invisible to the human eye
Self-propagating: Steals NPM, GitHub, and OpenVSX credentials to automatically compromise additional packages and extensions, creating exponential growth
Unkillable infrastructure:
Solana blockchain as primary C2 (immutable, can't be taken down)
Google Calendar as backup C2 (legitimate service, bypasses security controls)
Direct IP connections with dynamic encryption
Complete RAT capabilities: Turns infected developer workstations into criminal infrastructure with SOCKS proxies, hidden VNC servers, and P2P command channels
Understanding the Glassworm attack
Stage 1: The invisible payload
When security researchers examined the compromised CodeJoy VS Code extension (version 1.8.3), they found what appeared to be empty lines between line 2 and line 7:
import * as vscode from 'vscode';
// [MASSIVE GAP HERE - APPEARS EMPTY]
export function activate(context: vscode.ExtensionContext) {
But that gap wasn't empty. It contained executable JavaScript code encoded in unprintable Unicode characters, specifically Unicode variation selectors that don't produce any visual rendering.
To a developer doing code review: blank lines or whitespace.
To static analysis tools: nothing to analyze.
To the JavaScript interpreter: executable malicious code.
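To make the "blank to a reviewer, data to the interpreter" point concrete, here is a minimal sketch that flags lines carrying variation selectors and reports whether the line would otherwise look empty. The helper name `findVariationSelectorLines` and the demo string are assumptions for illustration; the demo holds harmless repeated selectors, not an actual payload.

```javascript
// Variation selectors: U+FE00-U+FE0F plus the supplement U+E0100-U+E01EF.
const VARIATION_SELECTORS = /[\uFE00-\uFE0F\u{E0100}-\u{E01EF}]/gu;

// Hypothetical helper: reports every line that contains variation selectors
// and whether nothing visible remains once they are removed.
function findVariationSelectorLines(sourceText) {
  const report = [];
  sourceText.split(/\r?\n/).forEach((line, index) => {
    const selectors = line.match(VARIATION_SELECTORS) ?? [];
    if (selectors.length === 0) return;
    const visible = line.replace(VARIATION_SELECTORS, '').trim();
    report.push({
      line: index + 1,
      count: selectors.length,
      looksBlank: visible.length === 0,
    });
  });
  return report;
}

// Harmless demo: line 2 renders as empty but holds 40 code points.
const hiddenRun = String.fromCodePoint(0xE0100).repeat(40);
const demoSource = `import * as vscode from 'vscode';\n${hiddenRun}\nexport function activate() {}`;
const selectorReport = findVariationSelectorLines(demoSource);
console.log(selectorReport);
```

A long run of selectors on a line that looks blank is a strong signal, since legitimate uses (such as emoji presentation) attach a single selector to a visible base character.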
Stage 2: Blockchain-Based C2
The invisible code connects to the Solana blockchain and reads transaction memos from a hardcoded wallet address. Inside the memo? A JSON object with a base64-encoded link to download the next stage:
{"link":"aHR0cDovLzIxNy42OS4zLjIxOC9xUUQlMkZKb2kzV0NXU2s4Z2dHSGlTdg=="}
Decoded: http://217.69.3.218/qQD%2FJoi3WCWSk8ggGHiSv%3D%3D
Note that blockchain transactions are immutable and can't be deleted or modified. Crypto wallets are pseudonymous, there’s no hosting provider or registrar involved, and attackers can post new transactions with new payload URLs at minimal cost.
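For analysts triaging a memo like the one quoted above, the indicator can be extracted offline without contacting the server. The following is a minimal sketch that assumes the memo is a JSON object with a base64-encoded `link` field, as shown in this article; `extractNextStageUrl` is a hypothetical helper name.

```javascript
// Memo content as quoted in this article.
const memo = '{"link":"aHR0cDovLzIxNy42OS4zLjIxOC9xUUQlMkZKb2kzV0NXU2s4Z2dHSGlTdg=="}';

// Hypothetical helper: returns a parsed URL object, or null when the memo
// does not have the assumed shape. It only decodes; it never fetches.
function extractNextStageUrl(memoText) {
  let parsed;
  try {
    parsed = JSON.parse(memoText);
  } catch {
    return null; // not JSON
  }
  if (typeof parsed?.link !== 'string') return null;
  const decoded = Buffer.from(parsed.link, 'base64').toString('utf8');
  try {
    return new URL(decoded); // throws on anything that is not a valid URL
  } catch {
    return null;
  }
}

const nextStage = extractNextStageUrl(memo);
console.log(nextStage?.protocol, nextStage?.hostname);
```

The extracted host can then be added to blocklists or searched for in proxy and DNS logs.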
Stage 3: Credential harvesting
The downloaded payload (encrypted with AES-256-CBC, keys passed in HTTP response headers) targets the following sensitive credentials:
NPM authentication tokens — to publish malicious packages
GitHub tokens — to compromise repositories
OpenVSX credentials — to inject more extensions
Git credentials — to push malicious code
49 different cryptocurrency wallet extensions — MetaMask, Phantom, Coinbase, etc.
But there's more: a backup C2 using Google Calendar. The malware fetches a calendar event with a base64-encoded URL in the title:
aHR0cDovLzIxNy42OS4zLjIxOC9nZXRfem9tYmlfcGF5bG9hZC9xUUQlMkZKb2kzV0NXU2s4Z2dHSGlUdg==
Notice the path: /get_zombi_payload/. Yes, "zombi" as in zombie botnet.
The next and final stage is ZOMBI, a full remote access trojan (RAT). This payload transforms infected developer workstations into nodes in a criminal infrastructure network. It relies on a SOCKS proxy to route traffic inside corporate networks, along with other networking mechanisms such as WebRTC, BitTorrent, and hidden VNC.
The attacker can use your browser with your logged-in sessions, access your email and Slack, read your source code, steal additional credentials, and pivot to other systems on your network — all while you see nothing happening on your screen.
Detecting invisible characters with anti-trojan-source
The anti-trojan-source project is an open source security tool specifically designed to detect Unicode-based attacks, including Glassworm-style invisible character injection. Originally created to detect Trojan Source attacks, it has evolved to provide comprehensive protection through both explicit character detection and category-based Unicode analysis.
What anti-trojan-source detects
277 Explicit confusable characters:
Bidirectional Unicode controls (U+202A-U+202E, U+2066-U+2069)
Zero-width characters (U+200B, U+200C, U+200D)
Variation selectors (U+FE00-U+FE0F) — the base 16 selectors
Extended Variation Selectors (U+E0100-U+E01EF) — 240 additional characters used in Glassworm
No-break space (U+00A0)
Word joiner, soft hyphen, and other invisible characters
Category-based detection (Future-proof):
All Unicode Format characters (Cf category) — catches invisible formatting characters by category
All Unicode Control characters (Cc category) — except commonly-used whitespace (TAB, LF, CR)
This category-based approach means anti-trojan-source can detect new invisible characters added to Unicode in the future, without requiring explicit updates to the character list.
Detection capabilities
1. Simple boolean detection
import { readFileSync } from 'node:fs'
import { hasConfusables } from 'anti-trojan-source'
// Read the file as UTF-8 text so invisible code points are preserved
const code = readFileSync('suspicious-file.js', 'utf-8')
const isDangerous = hasConfusables({ sourceText: code })
if (isDangerous) {
  console.error('⚠️ Invisible characters detected!')
}
2. Detailed findings
import { hasConfusables } from 'anti-trojan-source'
const findings = hasConfusables({
  sourceText: code,
  detailed: true
})
findings.forEach(finding => {
  console.log(`Line ${finding.line}:${finding.column}`)
  console.log(` Character: ${finding.codePoint} ${finding.name}`)
  console.log(` Category: ${finding.category}`)
  console.log(` Context: ${finding.snippet}`)
})
Example output:
Line 3:45
Character: U+E0100 VARIATION SELECTOR-17
Category: Variation Selector
Context: const value = getUserInput()
Line 12:8
Character: U+200B ZERO WIDTH SPACE
Category: Cf (Format)
Context: if (isAdmin) {
3. CLI usage example with multiple output modes
# Simple mode (exit code 1 if found)
npx anti-trojan-source --files='src/**/*.js'
# Verbose mode with detailed information
npx anti-trojan-source --files='src/**/*.js' --verbose
# JSON mode for programmatic processing
npx anti-trojan-source --files='src/**/*.js' --json
Anti-trojan-source implementation guide
Install the command-line tool in your project:
npm install -D anti-trojan-source
Or use it directly with npx (no installation required):
npx anti-trojan-source --files='**/*.js'
Then, scan a single file:
npx anti-trojan-source src/index.js
Or scan multiple files with shell-style globbing:
npx anti-trojan-source --files='src/**/*.{js,ts,jsx,tsx}'
The anti-trojan-source CLI supports a verbose mode for detailed output, which is useful when you need to understand exactly what was detected and where:
npx anti-trojan-source --files='src/**/*.js' --verbose
Example output:
[x] Detected cases of trojan source in the following files:
|
- src/utils.js
Line 12:34 - U+200B ZERO WIDTH SPACE [Cf (Format)]
Snippet: const value = getUserInput()
Line 45:10 - U+202E RIGHT-TO-LEFT OVERRIDE [Cf (Format)]
Snippet: if (isAdmin) { // Check permissions
Line 78:22 - U+E0100 VARIATION SELECTOR-17 [Variation Selector]
Snippet: const token = process.env.API_KEY
A JSON output mode is also supported for programmatic analysis, which is useful for automation, CI/CD integration, and custom reporting:
npx anti-trojan-source --files='src/**/*.js' --json
Example output:
[
{
"file": "src/utils.js",
"findings": [
{
"line": 12,
"column": 34,
"codePoint": "U+200B",
"name": "ZERO WIDTH SPACE",
"category": "Cf (Format)",
"snippet": "const value = getUserInput()"
},
{
"line": 45,
"column": 10,
"codePoint": "U+202E",
"name": "RIGHT-TO-LEFT OVERRIDE",
"category": "Cf (Format)",
"snippet": "if (isAdmin) { // Check permissions"
}
]
}
]
CI/CD integration
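In a pipeline, the JSON report can be reduced to one line per finding before it is logged or attached to a pull request. The following is a minimal sketch that assumes the report has the shape shown in the example output above; `formatFindings` is a hypothetical helper and not part of anti-trojan-source, and the inline `sampleReport` stands in for the parsed CLI output.

```javascript
// Stand-in for JSON.parse() of the CLI's --json output (shape assumed
// from the example in this article).
const sampleReport = [
  {
    file: 'src/utils.js',
    findings: [
      { line: 12, column: 34, codePoint: 'U+200B', name: 'ZERO WIDTH SPACE' },
      { line: 45, column: 10, codePoint: 'U+202E', name: 'RIGHT-TO-LEFT OVERRIDE' },
    ],
  },
];

// Hypothetical helper: one "file:line:column codePoint name" string per finding.
function formatFindings(entries) {
  return entries.flatMap(({ file, findings = [] }) =>
    findings.map((f) => `${file}:${f.line}:${f.column} ${f.codePoint} ${f.name}`)
  );
}

const summaryLines = formatFindings(sampleReport);
summaryLines.forEach((line) => console.log(line));
```

The `file:line:column` prefix is a common convention that many editors and CI log viewers turn into clickable locations.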
The following is an example of how to create a GitHub Action that you can embed as part of your CI process:
Create the file .github/workflows/security-scan.yml:
name: Unicode Security Scan
on:
  pull_request:
  push:
    branches: [main, master, develop]
jobs:
  scan-invisible-chars:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - name: Scan for invisible Unicode attacks
        run: npx anti-trojan-source --files='**/*.{js,ts,jsx,tsx,py,java,go,rs}' --json
      - name: Fail on detection
        if: failure()
        run: |
          echo "::error::Invisible Unicode characters detected in source code"
          echo "::error::This could indicate a Trojan Source or Glassworm-style attack"
          exit 1
Advanced category-based detection
Understanding Unicode categories
anti-trojan-source doesn't just look for specific characters but also analyzes characters by their Unicode category, making it future-proof against new attacks.
Format category (Cf):
Characters that affect formatting but don't produce visual output
Includes bidirectional controls, invisible separators, and zero-width characters
Examples: U+200B (ZERO WIDTH SPACE), U+202E (RIGHT-TO-LEFT OVERRIDE)
Control category (Cc):
Non-printable control characters
Ranges: U+0000-U+001F and U+007F-U+009F
Exception: TAB (U+0009), LF (U+000A), and CR (U+000D) are on an allow-list as legitimate
Why category-based detection matters
A traditional approach would be similar to the following JavaScript code:
const dangerousChars = ['\u200B', '\u200C', '\u200D', /* ... explicit list */];
// Problem: New Unicode versions add new characters
// Problem: Attackers can use unlisted characters
Whereas a category-based approach would benefit from:
Detecting ALL Format (Cf) characters, even ones not explicitly listed
Detecting ALL Control (Cc) characters except whitelisted whitespace
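The category-based idea can be sketched with JavaScript's Unicode property escapes, which match characters by General_Category instead of by an explicit list. This is an illustration of the approach, not the tool's actual implementation; `findByCategory` is a hypothetical helper. Note that variation selectors belong to the Mn (Nonspacing Mark) category, not Cf or Cc, so a category check like this one complements explicit ranges and does not replace them.

```javascript
// \p{Cf} matches Format characters, \p{Cc} matches Control characters.
const FORMAT_OR_CONTROL = /[\p{Cf}\p{Cc}]/gu;
// Whitespace controls that are legitimate in source files.
const ALLOWED_CONTROLS = new Set(['\t', '\n', '\r']);

// Hypothetical helper: reports every Cf or Cc character outside the allow-list.
function findByCategory(sourceText) {
  const findings = [];
  for (const match of sourceText.matchAll(FORMAT_OR_CONTROL)) {
    const char = match[0];
    if (ALLOWED_CONTROLS.has(char)) continue;
    const category = /\p{Cf}/u.test(char) ? 'Cf (Format)' : 'Cc (Control)';
    const hex = char.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
    findings.push({ index: match.index, codePoint: `U+${hex}`, category });
  }
  return findings;
}

// Made-up sample: a ZERO WIDTH SPACE (Cf) and a BEL control character (Cc),
// alongside allowed TAB and LF characters.
const categorySample = 'if (isAdmin)\u200B {\n\treturn true;\u0007\n}';
const categoryFindings = findByCategory(categorySample);
console.log(categoryFindings);
```

Because the regular expression engine resolves `\p{Cf}` from its Unicode data, characters added to that category in later Unicode versions are matched once the runtime is updated, with no change to the code.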
Going forward
Do not rely solely on visual code review, even though some tools, such as VS Code and GitHub's online diff viewer, have added some support for flagging suspicious Unicode in text. Do not trust “it looks fine” and don’t assume your tooling automatically shows invisible characters; instead, proactively scan for them.
We recommend you scan all code before merging pull requests. Integrate with Snyk (it’s free), which automatically runs static code analysis and continuously monitors your project from GitHub-sourced repositories and other integrations.
Utilize the anti-trojan-source command-line utility to specifically detect potentially dangerous characters in your source code.
Ready to future-proof your development pipeline against the next evolution of supply chain threats? Download the white paper: Navigating the modern software supply chain: Securing open source, AI-generated code, & SBOM compliance.
Resources
Tools:
The anti-trojan-source CLI: https://github.com/lirantal/anti-trojan-source
ESLint Plugin: https://github.com/lirantal/eslint-plugin-anti-trojan-source
Research:
Trojan Source Paper: https://trojansource.codes/trojan-source.pdf
Glassworm Analysis: https://www.koi.ai/blog/glassworm-first-self-propagating-worm-using-invisible-code-hits-openvsx-marketplace
CVE-2021-42574: Trojan Source vulnerability disclosure
CVE-2021-42694: Homoglyph attacks
Standards:
Unicode Standard: https://unicode.org/standard/standard.html
Unicode Categories: https://www.unicode.org/reports/tr44/#General_Category_Values