Defending Against Glassworm: The Invisible Malware That's Rewriting Supply Chain Security

In October 2025, security researchers at Koi Security discovered Glassworm, the first self-propagating worm targeting VS Code extensions that employs invisible Unicode characters to conceal malicious code in plain sight. With 35,800+ installations compromised and active C2 infrastructure still operational, Glassworm represents another step in the evolution of supply chain attacks that breaks our traditional code review security model.

This article examines the Glassworm attack, tracing its lineage to the original Trojan Source vulnerability (CVE-2021-42574), and demonstrates how developers and security teams can detect and prevent these invisible character attacks using anti-trojan-source, an open source command-line detection tool that now includes category-based Unicode analysis and enhanced reporting capabilities.

Key takeaways:

Invisible Unicode characters can hide executable malicious code that's literally invisible to code reviewers
Traditional code review, some diff views, and syntax highlighting completely fail to detect these attacks
The attack is self-propagating through stolen credentials, turning each victim into a new infection vector
Detection requires specialized tools that analyze Unicode characters by category, not just explicit lists

The evolution of invisible character attacks

The aptly named academic paper Trojan Source is where this class of attack got its name. To be historically accurate, though, the technique is not novel at its core: Unicode characters and glyphs have been used for decades to hide content from, and confuse, both systems and users.

In November 2021, researchers Nicholas Boucher and Ross Anderson from the University of Cambridge published their seminal paper Trojan Source: Invisible Vulnerabilities, revealing a critical vulnerability (CVE-2021-42574) that affects virtually every modern programming language.

The core problem lies in Unicode bidirectional (bidi) text control characters, which can visually reorder source code in a way that deceives human reviewers while preserving the logical order that compilers and interpreters follow.

Consider this C code:

bool isAdmin = false;
/* begin admins only */ if (isAdmin) {
    printf("You are an admin.\n");
/* end admins only */ }

At first glance, this is code protected by an access check. However, hidden Unicode bidi control characters can make text that is actually inside the comments, here the `if (isAdmin) {` check and its closing brace, render as if it were code. The compiler treats the check as part of a comment and runs the `printf` unconditionally. The compiler sees one thing; the human reviewer sees another.

Attack techniques:

Early Returns: Make a `return` statement appear to be within a comment
Commenting-Out: Make comments visually appear as executable code
Stretched Strings: Make string literals visually appear as code
Homoglyphs: Use visually identical characters from different scripts (CVE-2021-42694)
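
The first three techniques all depend on bidi control characters being present in the file, so even a very small scanner can surface them. The following is a minimal sketch rather than how any particular tool works internally: `findBidiControls` is a hypothetical helper, and the sample line is an illustrative reconstruction of a commenting-out attack, built with `String.fromCodePoint` so no raw control characters have to be pasted into the example.

```javascript
// Bidi control code points: U+202A..U+202E (embeddings and overrides)
// and U+2066..U+2069 (isolates).
const BIDI_CONTROLS = /[\u202A-\u202E\u2066-\u2069]/g;

// Hypothetical helper: returns one entry per bidi control character found.
function findBidiControls(sourceText) {
  const findings = [];
  sourceText.split(/\r?\n/).forEach((text, index) => {
    for (const match of text.matchAll(BIDI_CONTROLS)) {
      const hex = match[0].codePointAt(0).toString(16).toUpperCase();
      findings.push({
        line: index + 1,          // 1-based line number
        column: match.index + 1,  // 1-based column (UTF-16 units)
        codePoint: `U+${hex}`,
      });
    }
  });
  return findings;
}

// Illustrative sample: the second line hides two bidi controls.
const RLO = String.fromCodePoint(0x202e); // RIGHT-TO-LEFT OVERRIDE
const LRI = String.fromCodePoint(0x2066); // LEFT-TO-RIGHT ISOLATE
const bidiSample = [
  'bool isAdmin = false;',
  `/*${RLO} } ${LRI}if (isAdmin)`,
].join('\n');

console.log(findBidiControls(bidiSample));
```

For this sample, the scan reports U+202E at line 2, column 3 and U+2066 at line 2, column 7. Homoglyph attacks use ordinary visible letters from other scripts, so they need a different check and are not covered by this sketch.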

The supply chain amplification effect

The Trojan Source researchers identified the critical multiplier effect in modern software development:

"If an adversary successfully commits targeted vulnerabilities into open source code by deceiving human reviewers, downstream software will likely inherit the vulnerability."

This is exactly what we're seeing play out in real-world attacks today.

Glassworm: The next evolution (2025)

Four years after Trojan Source was disclosed, attackers have weaponized these techniques in the wild. In October 2025, Glassworm emerged as the first self-propagating worm using invisible Unicode characters to compromise VS Code extensions on the OpenVSX marketplace.

What makes Glassworm different:

Invisible code injection: Uses Unicode variation selectors that produce zero visual output. Not obfuscated, not minified, but literally invisible to the human eye
Self-propagating: Steals NPM, GitHub, and OpenVSX credentials to automatically compromise additional packages and extensions, creating exponential growth
Unkillable infrastructure:
1. Solana blockchain as primary C2 (immutable, can't be taken down)
2. Google Calendar as backup C2 (legitimate service, bypasses security controls)
3. Direct IP connections with dynamic encryption
Complete RAT capabilities: Turns infected developer workstations into criminal infrastructure with SOCKS proxies, hidden VNC servers, and P2P command channels

Understanding the Glassworm attack

Stage 1: The invisible payload

When security researchers examined the compromised CodeJoy VS Code extension (version 1.8.3), they found what appeared to be empty lines between line 2 and line 7:

import * as vscode from 'vscode';

// [MASSIVE GAP HERE - APPEARS EMPTY]

export function activate(context: vscode.ExtensionContext) {

But that gap wasn't empty. It contained executable JavaScript code encoded in unprintable Unicode characters — specifically, Unicode variation selectors that don't produce any visual rendering.

To a developer doing code review: blank lines or whitespace.

To static analysis tools: nothing to analyze.

To the JavaScript interpreter: executable malicious code.
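
One way to see the gap between what is rendered and what is stored is to count code points on lines that look empty. The following is a minimal sketch under stated assumptions: `findInvisibleRuns` is a hypothetical helper, the sample text is made up for illustration, and the two ranges are the standard and supplementary Unicode variation selector blocks. It does not reproduce how Glassworm encodes its payload.

```javascript
// Variation selector blocks: U+FE00..U+FE0F (16 selectors) and
// U+E0100..U+E01EF (240 supplementary selectors).
const VARIATION_SELECTORS = /[\uFE00-\uFE0F\u{E0100}-\u{E01EF}]/gu;

// Hypothetical helper: reports lines that contain variation selectors
// and no other non-whitespace content.
function findInvisibleRuns(sourceText) {
  return sourceText
    .split(/\r?\n/)
    .map((text, index) => {
      const selectors = (text.match(VARIATION_SELECTORS) || []).length;
      const visible = text.replace(VARIATION_SELECTORS, '').trim().length;
      return { line: index + 1, selectors, visible };
    })
    .filter((entry) => entry.selectors > 0 && entry.visible === 0);
}

// Illustrative sample: line 2 typically renders as blank,
// yet it stores 300 code points.
const hiddenLine = String.fromCodePoint(0xe0100).repeat(300);
const extensionSample = [
  "import * as vscode from 'vscode';",
  hiddenLine,
  'export function activate() {}',
].join('\n');

console.log(findInvisibleRuns(extensionSample));
```

Here the only reported entry is line 2, with 300 selectors and no visible characters. A line that looks empty but carries hundreds of code points is a strong signal that something is hiding in it.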

Stage 2: Blockchain-Based C2

The invisible code connects to the Solana blockchain and reads transaction memos from a hardcoded wallet address. Inside the memo? A JSON object with a base64-encoded link to download the next stage:

{"link":"aHR0cDovLzIxNy42OS4zLjIxOC9xUUQlMkZKb2kzV0NXU2s4Z2dHSGlTdg=="}

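Decoded: http://217.69.3.218/qQD%2FJoi3WCWSk8ggGHiSv%3D%3D (the base64 string as printed above appears to be truncated; on its own it decodes to this URL without the trailing %3D%3D)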

Note: blockchain transactions are immutable and can't be deleted or modified. Crypto wallets are pseudonymous, there is no hosting provider or registrar involved, and attackers can post new transactions with new payload URLs at minimal cost.

Stage 3: Credential harvesting

The downloaded payload (encrypted with AES-256-CBC, keys passed in HTTP response headers) targets the following sensitive credentials:

NPM authentication tokens — to publish malicious packages
GitHub tokens — to compromise repositories
OpenVSX credentials — to inject more extensions
Git credentials — to push malicious code
49 different cryptocurrency wallet extensions — MetaMask, Phantom, Coinbase, etc.

But there's more: a backup C2 using Google Calendar. The malware fetches a calendar event with a base64-encoded URL in the title:

aHR0cDovLzIxNy42OS4zLjIxOC9nZXRfem9tYmlfcGF5bG9hZC9xUUQlMkZKb2kzV0NXU2s4Z2dHSGlUdg==

Notice the path: /get_zombi_payload/ — yes, "zombi" as in zombie botnet.
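
For defenders, decoding such strings offline is a quick way to extract indicators of compromise without ever contacting the server. The following is a minimal sketch: `decodeIndicator` is a hypothetical helper, and it only parses the string quoted above using Node.js built-ins (`Buffer` and `URL`); it never issues a network request.

```javascript
// Calendar event title as quoted above.
const encodedTitle =
  'aHR0cDovLzIxNy42OS4zLjIxOC9nZXRfem9tYmlfcGF5bG9hZC9xUUQlMkZKb2kzV0NXU2s4Z2dHSGlUdg==';

// Hypothetical helper: decodes a base64 indicator and splits it into
// parts that can be fed to blocklists or log searches.
function decodeIndicator(base64Text) {
  const decoded = Buffer.from(base64Text, 'base64').toString('utf8');
  const url = new URL(decoded); // parsing only, no request is made
  return {
    decoded,
    host: url.hostname,
    firstPathSegment: url.pathname.split('/')[1],
  };
}

const indicator = decodeIndicator(encodedTitle);
console.log(indicator.host);             // 217.69.3.218
console.log(indicator.firstPathSegment); // get_zombi_payload
```

The host and path segment can then be matched against proxy or DNS logs to check whether any workstation has contacted the server.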

The next and final stage is referred to as ZOMBI, the full RAT (Remote Access Trojan). This final payload transforms infected developer workstations into nodes in a criminal infrastructure network. It uses a SOCKS proxy to route traffic inside corporate networks, along with other networking mechanisms such as WebRTC, BitTorrent, and hidden VNC.

The attacker can use your browser with your logged-in sessions, access your email and Slack, read your source code, steal additional credentials, and pivot to other systems on your network — all while you see nothing happening on your screen.

Detecting invisible characters with anti-trojan-source

The anti-trojan-source project is an open source security tool specifically designed to detect Unicode-based attacks, including Glassworm-style invisible character injection. Originally created to detect Trojan Source attacks, it has evolved to provide comprehensive protection through both explicit character detection and category-based Unicode analysis.

What anti-trojan-source detects

277 Explicit confusable characters:

Bidirectional Unicode controls (U+202A-U+202E, U+2066-U+2069)
Zero-width characters (U+200B, U+200C, U+200D)
Variation selectors (U+FE00-U+FE0F) — the base 16 selectors
Extended Variation Selectors (U+E0100-U+E01EF) — 240 additional characters used in Glassworm
No-break space (U+00A0)
Word joiner, soft hyphen, and other invisible characters

Category-based detection (Future-proof):

All Unicode Format characters (Cf category) — catches invisible formatting characters by category
All Unicode Control characters (Cc category) — except commonly-used whitespace (TAB, LF, CR)

This category-based approach means anti-trojan-source can detect new invisible characters added to Unicode in the future, without requiring explicit updates to the character list.

Detection capabilities

1. Simple boolean detection

import { readFileSync } from 'node:fs'
import { hasConfusables } from 'anti-trojan-source'

// Read the file as UTF-8 text so invisible code points are preserved
const code = readFileSync('suspicious-file.js', 'utf-8')
const isDangerous = hasConfusables({ sourceText: code })

if (isDangerous) {
  console.error('⚠️ Invisible characters detected!')
}

2. Detailed findings

import { hasConfusables } from 'anti-trojan-source'

// `code` is the source text loaded in the previous example
const findings = hasConfusables({
  sourceText: code,
  detailed: true
})

findings.forEach(finding => {
  console.log(`Line ${finding.line}:${finding.column}`)
  console.log(`  Character: ${finding.codePoint} ${finding.name}`)
  console.log(`  Category: ${finding.category}`)
  console.log(`  Context: ${finding.snippet}`)
})

Example output:

Line 3:45
  Character: U+E0100 VARIATION SELECTOR-17
  Category: Variation Selector
  Context: const value = getUserInput()

Line 12:8
  Character: U+200B ZERO WIDTH SPACE  
  Category: Cf (Format)
  Context: if (isAdmin) {

3. CLI usage example with multiple output modes

# Simple mode (exit code 1 if found)
npx anti-trojan-source --files='src/**/*.js'

# Verbose mode with detailed information
npx anti-trojan-source --files='src/**/*.js' --verbose

# JSON mode for programmatic processing
npx anti-trojan-source --files='src/**/*.js' --json

Anti-trojan-source implementation guide

Install the command-line tool to your project:

npm install -D anti-trojan-source

Or use directly with npx (no installation required):

npx anti-trojan-source --files='**/*.js'

Then, scan a single file:

npx anti-trojan-source src/index.js

Or scan multiple files with shell-notation globbing:

npx anti-trojan-source --files='src/**/*.{js,ts,jsx,tsx}'

The anti-trojan-source CLI supports a verbose mode for detailed output, which is useful when you need to understand exactly what was detected and where:

npx anti-trojan-source --files='src/**/*.js' --verbose

Example output:

[x] Detected cases of trojan source in the following files:
| 
 - src/utils.js
   Line 12:34 - U+200B ZERO WIDTH SPACE [Cf (Format)]
   Snippet: const value = getUserInput()
   Line 45:10 - U+202E RIGHT-TO-LEFT OVERRIDE [Cf (Format)]
   Snippet: if (isAdmin) { // Check permissions
   Line 78:22 - U+E0100 VARIATION SELECTOR-17 [Variation Selector]
   Snippet: const token = process.env.API_KEY

A JSON output mode is also supported, allowing for programmatic analysis. Useful for automation, CI/CD integration, and custom reporting:

npx anti-trojan-source --files='src/**/*.js' --json

Example output:

[
  {
    "file": "src/utils.js",
    "findings": [
      {
        "line": 12,
        "column": 34,
        "codePoint": "U+200B",
        "name": "ZERO WIDTH SPACE",
        "category": "Cf (Format)",
        "snippet": "const value = getUserInput()"
      },
      {
        "line": 45,
        "column": 10,
        "codePoint": "U+202E",
        "name": "RIGHT-TO-LEFT OVERRIDE",
        "category": "Cf (Format)",
        "snippet": "if (isAdmin) { // Check permissions"
      }
    ]
  }
]
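
A report in this shape is straightforward to post-process. The following is a minimal sketch that assumes the JSON structure matches the example output above (an array of `{ file, findings }` entries); `summarizeReport` is a hypothetical helper, and the inline `report` value simply repeats the sample data instead of reading real tool output.

```javascript
// Sample data copied from the example output above (assumed shape).
const report = [
  {
    file: 'src/utils.js',
    findings: [
      { line: 12, column: 34, codePoint: 'U+200B', name: 'ZERO WIDTH SPACE' },
      { line: 45, column: 10, codePoint: 'U+202E', name: 'RIGHT-TO-LEFT OVERRIDE' },
    ],
  },
];

// Hypothetical helper: counts affected files, total findings,
// and findings per code point.
function summarizeReport(entries) {
  const byCodePoint = {};
  let total = 0;
  for (const entry of entries) {
    for (const finding of entry.findings) {
      total += 1;
      byCodePoint[finding.codePoint] = (byCodePoint[finding.codePoint] || 0) + 1;
    }
  }
  return { files: entries.length, total, byCodePoint };
}

console.log(summarizeReport(report));
```

For the sample data, the summary is one file and two findings, one each for U+200B and U+202E. In a pipeline, a summary like this could feed a pull request comment or a dashboard.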

CI/CD integration

The following is an example of a GitHub Actions workflow that you can run as part of your CI process:

Create the file .github/workflows/security-scan.yml:

name: Unicode Security Scan

on:
  pull_request:
  push:
    branches: [main, master, develop]

jobs:
  scan-invisible-chars:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: '20'

      - name: Scan for invisible Unicode attacks
        run: npx anti-trojan-source --files='**/*.{js,ts,jsx,tsx,py,java,go,rs}' --json

      - name: Fail on detection
        if: failure()
        run: |
          echo "::error::Invisible Unicode characters detected in source code"
          echo "::error::This could indicate a Trojan Source or Glassworm-style attack"
          exit 1

Advanced category-based detection

Understanding Unicode categories

anti-trojan-source doesn't just look for specific characters but also analyzes characters by their Unicode category, making it future-proof against new attacks.

Format category (Cf):

Characters that affect formatting but don't produce visual output
Includes bidirectional controls, invisible separators, and zero-width characters
Examples: U+200B (ZERO WIDTH SPACE), U+202E (RIGHT-TO-LEFT OVERRIDE)

Control category (Cc):

Non-printable control characters
Ranges: U+0000-U+001F and U+007F-U+009F
Exception: TAB (U+0009), LF (U+000A), and CR (U+000D) are on an allow-list as legitimate

Why category-based detection matters

A traditional approach would be similar to the following JavaScript code:

const dangerousChars = ['\u200B', '\u200C', '\u200D', /* ... explicit list */];
// Problem: New Unicode versions add new characters
// Problem: Attackers can use unlisted characters

A category-based approach, by contrast, covers entire classes of characters:

Detecting ALL Format (Cf) characters, even ones not explicitly listed
Detecting ALL Control (Cc) characters except whitelisted whitespace
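
In JavaScript, a category-based check can be sketched with Unicode property escapes. The following is a minimal illustration of the idea, not the anti-trojan-source implementation: `findByCategory` and the sample string are hypothetical, and the allow-list mirrors the TAB, LF, and CR exception described earlier.

```javascript
// \p{Cf} and \p{Cc} are Unicode property escapes; they require the u flag.
const FORMAT_OR_CONTROL = /[\p{Cf}\p{Cc}]/gu;

// Allow-listed whitespace: TAB, LF, CR.
const ALLOWED = new Set([0x09, 0x0a, 0x0d]);

// Hypothetical helper: flags every Cf or Cc character outside the allow-list.
function findByCategory(sourceText) {
  const findings = [];
  for (const match of sourceText.matchAll(FORMAT_OR_CONTROL)) {
    const value = match[0].codePointAt(0);
    if (ALLOWED.has(value)) continue;
    findings.push({
      offset: match.index,
      codePoint: 'U+' + value.toString(16).toUpperCase().padStart(4, '0'),
      category: /\p{Cf}/u.test(match[0]) ? 'Cf' : 'Cc',
    });
  }
  return findings;
}

// Illustrative sample containing a zero width space (Cf), a word joiner (Cf),
// and an escape character (Cc), plus ordinary tabs and newlines.
const categorySample =
  'if (isAdmin)' + String.fromCodePoint(0x200b) + ' {\n\treturn' +
  String.fromCodePoint(0x2060) + ';\n}' + String.fromCodePoint(0x1b);

console.log(findByCategory(categorySample));
```

No character is listed by hand here, yet all three hidden characters are flagged while the tab and newlines pass. Two caveats apply to this sketch: coverage depends on the Unicode version built into the JavaScript engine, and variation selectors belong to the Mn (nonspacing mark) category, so they still need explicit ranges like the ones listed earlier.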

Going forward

Do not rely solely on visual code review, even though some tools, such as VS Code and GitHub's online diff viewer, have added some support for flagging suspicious Unicode in text. Do not trust "it looks fine" and don't assume your tooling automatically shows invisible characters; instead, proactively scan for them.

We recommend you scan all code before merging pull requests. Integrate with Snyk (it’s free), which automatically runs static code analysis and continuously monitors your project from GitHub-sourced repositories and other integrations.

Utilize the anti-trojan-source command-line utility to specifically detect potentially dangerous characters in your source code.

Ready to future-proof your development pipeline against the next evolution of supply chain threats? Download the white paper: Navigating the modern software supply chain: Securing open source, AI-generated code, & SBOM compliance.

Resources

Tools:

The anti-trojan-source CLI: https://github.com/lirantal/anti-trojan-source
ESLint Plugin: https://github.com/lirantal/eslint-plugin-anti-trojan-source

Research:

Trojan Source Paper: https://trojansource.codes/trojan-source.pdf
Glassworm Analysis: https://www.koi.ai/blog/glassworm-first-self-propagating-worm-using-invisible-code-hits-openvsx-marketplace
CVE-2021-42574: Trojan Source vulnerability disclosure
CVE-2021-42694: Homoglyph attacks

Standards:

Unicode Standard: https://unicode.org/standard/standard.html
Unicode Categories: https://www.unicode.org/reports/tr44/#General_Category_Values
