Buffer overflow attacks in C++: A hands-on guide
Snyk Security Research Team
July 28, 2022
A buffer overflow is a type of runtime error that allows a program to write past the end of a buffer or array (hence the name overflow) and corrupt adjacent memory. Like most bugs, a buffer overflow doesn’t manifest on every program execution. Instead, the vulnerability is triggered only under certain circumstances, such as unexpected user input.
A buffer overflow attack is the exploitation of a buffer overflow vulnerability, typically by a malicious actor who wants to gain access or information. In this post, we'll explain how a buffer overflow occurs and show you how to protect your C++ code from these attacks.
Buffer overflow attack example
To understand how a buffer overflow occurs, let’s look at the following code, which performs a simple password check, and is susceptible to a buffer overflow attack:
1#include <cstdio>
2#include <cstring>
3#include <iostream>
4
5const char *PASSWORD_FILE = "rictro";
6
7int main()
8{
9 char input[8];
10 char password[8];
11
12 std::sscanf(PASSWORD_FILE, "%s", password);
13
14 std::cout << "Enter password: ";
15 std::cin >> input;
16
17 // Debug prints:
18 // std::cout << "Address of input: " << &input << "\n";
19 // std::cout << "Address of password: " << &password << "\n";
20 // std::cout << "Input: " << input << "\n";
21 // std::cout << "Password: " << password << "\n";
22
23 if (std::strncmp(password, input, 8) == 0)
24 std::cout << "Access granted\n";
25 else
26 std::cout << "Access denied\n";
27
28 return 0;
29}
The code snippet prompts the user to enter a password (line 14). It then compares this input against the stored password (line 23) that it previously loaded (line 12). If they match, the user is granted access.
In practice, we would read the password from a file via std::fscanf, but to keep this example simple, we’ll read it from a string constant instead. Furthermore, we would ideally store a salted and hashed version of our password instead of the original one, but for simplicity, we’ll use a cleartext password.
A mechanism like this could be used to unlock the shareware — or trial — version of an application, or grant the user access to information and features inside the application by entering an administrator’s password.
Let’s run the program and see what happens:
1Enter password: rictro
2Access granted
As expected, entering the correct password grants us access. Conversely, entering an incorrect password denies access:
1Enter password: hello
2Access denied
So far, everything works properly. However, entering the correct password isn’t the only way to gain access to this particular application. Let’s see what happens if we enter “sunshinesunshine” instead:
1Enter password: sunshinesunshine
2Access granted
That’s unexpected. We entered something very different from the correct password “rictro” — and were still granted access.
The explanation is that we successfully executed a buffer overflow attack against the program in question. To understand how this happened, let’s uncomment the debug prints in lines 18–21 and rerun the application.
First, let’s run it with the proper input:
1Enter password: rictro
2Address of input: 0x7ffc5581e4a8
3Address of password: 0x7ffc5581e4b0
4Input: rictro
5Password: rictro
6Access granted
There’s nothing notable about the output.
Let’s try an incorrect password:
1Enter password: hello
2Address of input: 0x7ffc5581e4a8
3Address of password: 0x7ffc5581e4b0
4Input: hello
5Password: rictro
6Access denied
Once again, the application behaves as expected.
However, notice what happens when we enter our special password, “sunshinesunshine.”
1Enter password: sunshinesunshine
2Address of input: 0x7ffc5581e4a8
3Address of password: 0x7ffc5581e4b0
4Input: sunshinesunshine
5Password: sunshine
6Access granted
The password no longer has the value “rictro” — it now contains “sunshine.”
We use two 8-byte arrays: password stores the application password and input stores the user input. The compiler we use in this example (GCC 10.3) places password eight bytes after input (0x7ffc5581e4b0 - 0x7ffc5581e4a8 = 8), so the arrays are adjacent in memory. Note that a different compiler may produce different results.
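Passwords shorter than eight characters, such as “hello,” fit inside input and leave password untouched (? marks a byte that was never written):
input    @ 0x7ffc5581e4a8:  h  e  l  l  o  \0 ?  ?
password @ 0x7ffc5581e4b0:  r  i  c  t  r  o  \0 ?
If, however, we enter “sunshinesunshine,” the first eight characters fill input and the next eight spill over into password:
input    @ 0x7ffc5581e4a8:  s  u  n  s  h  i  n  e
password @ 0x7ffc5581e4b0:  s  u  n  s  h  i  n  e
           0x7ffc5581e4b8:  \0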
The null terminator \0 is written past the end of password, overwriting whatever happens to reside on the stack at that location.
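This happens because std::cin >> input (line 15) doesn’t perform any bounds checks when reading into a plain char array, unless a field width has been set (for example, with std::setw). It keeps reading until it encounters whitespace, such as the newline produced when the user presses Enter, without ensuring that the receiving buffer is large enough to hold the user’s input. C++20 replaced the underlying char* overload with one that takes a reference to the array and extracts at most N - 1 characters into a char[N], but earlier language modes, such as GCC 10’s default of C++14, behave as shown here.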
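The overflow has overwritten the first eight bytes of password with “sunshine,” the same eight characters that begin input. Because we compare only the first eight characters of password and input using std::strncmp (line 23), to avoid reading past the end of either array, we get a match where there shouldn’t be one.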
How to prevent buffer overflows in C++
Compared to other high-level programming languages, C++ is especially susceptible to buffer overflows because large parts of its ecosystem, including parts of the C++ standard library, still use raw pointers. We might use managed buffers like std::vector or std::string inside our own code, but we lose our bounds-checking abilities as soon as we interface with C-style APIs that force us to pass vector::data or string::c_str.
Implement C++ coding best practices
The best way to prevent buffer overflows is to use APIs that aren’t vulnerable. In C++, this means using managed buffers and strings rather than raw arrays and pointers.
We can use std::string to fix our example application.
Let’s look at the corrected version. Note the changes to lines 4, 10, and 24:
1#include <cstdio>
2#include <cstring>
3#include <iostream>
4#include <string>
5
6const char *PASSWORD_FILE = "rictro";
7
8int main()
9{
10 std::string input;
11 char password[8];
12
13 std::sscanf(PASSWORD_FILE, "%s", password);
14
15 std::cout << "Enter password: ";
16 std::cin >> input;
17
18 // Debug prints:
19 std::cout << "Address of input: " << &input << "\n";
20 std::cout << "Address of password: " << &password << "\n";
21 std::cout << "Input: " << input << "\n";
22 std::cout << "Password: " << password << "\n";
23
24 if (std::strncmp(password, input.c_str(), 8) == 0)
25 std::cout << "Access granted\n";
26 else
27 std::cout << "Access denied\n";
28
29 return 0;
30}
The standard library overloads the extraction operator (>>) for std::string, allowing it to read safely from streams. Rather than writing into a fixed-size buffer, it reads characters until it reaches whitespace and grows the string’s storage as needed. This makes it impossible for us to overflow the buffer, no matter how long our input is. At worst, we might run out of memory.
1Enter password: hellooo…ooo
2Address of input: 0x7ffc5581e4a8
3Address of password: 0x7ffc5581e4c8
4Input: hellooo…ooo
5Password: rictro
6Access denied
If you’re required to use C-style APIs, try to use their “safe” counterparts if they’re available. These are bounds-checked versions that accept an additional size parameter and refuse to read or write beyond it.
Sanitize addresses to prevent buffer overflow
In addition to good coding practices, there are automated tools that can help detect buffer overflows. AddressSanitizer (ASan) is among the most popular. It’s supported by all major compilers, including Visual Studio (v16.9 and later), GCC (v4.8 and later), Clang (v3.1 and later), and Xcode (v7.0 and later). Address sanitization can be turned on using the /fsanitize=address option in Visual Studio and the -fsanitize=address option in GCC/Clang.
ASan performs two functions. First, it surrounds stack objects and heap allocations with a few bytes of “poisoned memory”: the compiler adds this padding around stack objects, and the ASan runtime replaces malloc with a modified version that pads heap allocations. Second, the compiler injects code into your application to detect whether it tries to access any of the poisoned memory.
We previously saw how our input and password arrays are laid out in memory when compiled with GCC 10.3. If we enable address sanitization, the layout may change to something resembling the following:
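✗ ✗ ✗ ✗ | input (8 bytes) | ✗ ✗ ✗ ✗ | password (8 bytes) | ✗ ✗ ✗ ✗
The ✗ symbols represent the poisoned memory that was placed around our arrays (the sketch is schematic, not to scale).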
The code injected by the compiler contains logic to detect if we try to access poisoned memory. Consider a simple code snippet before transformation:
1array[i] = 5;
After ASan processes it, the code now looks like this:
1if (is_poisoned(&array[i]))
2{
3 print_error();
4 std::abort();
5}
6array[i] = 5;
To determine whether the application tries to access any poisoned memory (that is, to implement the is_poisoned function), ASan uses “shadow memory.” This is a separate region of memory that stores metadata about the actual application memory.
In practice, the algorithm balances more factors than the scope of this article permits, but the following simple, naive implementation demonstrates the idea.
Consider a simple bitset in which every byte the application accesses has a corresponding bit in shadow memory that indicates whether that byte is poisoned.
We might see a simple declaration like this one:
1char password[6];
To implement shadow memory, the declaration may be transformed into something like this:
1char temp[8];
2char* password = &temp[1];
3shadow_memory[&temp...&temp+7] = 0b10000001;
The user declares a 6-byte char array on the stack. The compiler instead creates an 8-byte array and points password at its second element, padding it with 1 byte on each side. It then writes the bit pattern 10000001 into the bitset shadow_memory, using the address of temp as an index, to indicate that the 6 bytes of password are clean while the two surrounding bytes are poisoned.
The compiler can now easily determine whether we’re trying to access poisoned memory. Consider the following snippet:
1*pointer = 5;
After the compiler injects code, the snippet now looks like this:
1if (shadow_memory[pointer] == 1)
2{
3 print_error();
4 std::abort();
5}
6*pointer = 5;
Now that you have a basic understanding of address sanitization, let’s see what happens if we feed the malicious input “sunshinesunshine” into our original, vulnerable application, this time with address sanitization enabled:
1Enter password: sunshinesunshine
2Address of input: 0x7ffc5581e4a8
3Address of password: 0x7ffc5581e4c8
4=============================
5==15857==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffc5581e4b0
Notice two things. First, the gap between our input and password arrays increased from 8 bytes to 32 bytes (0x7ffc5581e4c8 - 0x7ffc5581e4a8 = 32) to make room for the poisoned memory. Second, ASan reports an error at address 0x7ffc5581e4b0, exactly 8 bytes after the start of input, which corresponds to the first poisoned byte it encounters.
Protecting your C++ projects
In this post, we’ve covered the basics of buffer overflow attacks in C++ and how to best protect your projects against them. To recap, a buffer overflow is a type of vulnerability that allows a program to write past the end of a buffer, resulting in memory corruption, which can then be exploited to gain access to restricted applications or information.
Among high-level languages, C++ is especially susceptible to buffer overflows because many APIs still use raw pointers and don’t perform bounds checks. This can be mitigated by using managed buffers and strings instead of C-style APIs or, if a C-style API is required, by using its safe version, which accepts an additional size parameter. Address sanitization tools such as ASan can also help catch buffer overflows during development and testing by detecting out-of-bounds memory accesses at run time.
For more information on C++ security, check out our unintimidating intro to C/C++ vulnerabilities and learn about directory traversal vulnerabilities in C/C++.