Sigreturn Labs Blog

Building the smallest ELF program

contact@sigreturn.com (Adam Taguirov) — Sun, 16 Jun 2024 12:00:00 +0000

In this post we will have fun trying to create the smallest possible 64-bit Linux program (ELF binary) that simply outputs “Hello world!” when it is executed.

The idea here is to understand the compilation process, linking, how the loader works, how the ELF file format is structured, and so on.

State of the art

So let’s simply create a program in C that outputs our string. In this default case we will not optimize anything nor try to reduce our binary size.

#include <stdio.h>

void main(void)
{
    printf("Hello world!");
}

Let’s compile it with GCC and run it:

$ gcc smallest_elf.c -o smallest_elf.bin
$ ./smallest_elf.bin
Hello world!

Initial size: 16704 bytes.

The default compiled binary is quite big for only 65 bytes of written code. Why is that? Let’s analyse our binary and check what we can remove to reduce its size.

Too many sections

.interp
.note.gnu.propert
.note.gnu.build-i
.note.ABI-tag
.gnu.hash
.dynsym
.dynstr
.gnu.version
.gnu.version_r
.rela.dyn
.rela.plt
.init
.plt
.plt.got
.plt.sec
.text
.fini
.rodata
.eh_frame_hdr
.eh_frame
.init_array
.fini_array
.dynamic
.got
.data
.bss
.comment
.symtab
.strtab
.shstrtab

First of all, our binary contains 30 sections, and we don’t need all of them. We do not need relocations, symbols, PLT/GOT entries, and a lot of other stuff. The compiler produced the same default layout it would produce for a much longer program.

Use readelf to see the ELF’s sections: readelf -S smallest_elf.bin

Too many symbols

0000000000003dc8 d _DYNAMIC
0000000000003fb8 d _GLOBAL_OFFSET_TABLE_
0000000000002000 R _IO_stdin_used
                 w _ITM_deregisterTMCloneTable
                 w _ITM_registerTMCloneTable
000000000000215c r __FRAME_END__
0000000000002014 r __GNU_EH_FRAME_HDR
0000000000004010 D __TMC_END__
0000000000004010 B __bss_start
                 w __cxa_finalize@@GLIBC_2.2.5
0000000000004000 D __data_start
0000000000001100 t __do_global_dtors_aux
0000000000003dc0 d __do_global_dtors_aux_fini_array_entry
0000000000004008 D __dso_handle
0000000000003db8 d __frame_dummy_init_array_entry
                 w __gmon_start__
0000000000003dc0 d __init_array_end
0000000000003db8 d __init_array_start
00000000000011e0 T __libc_csu_fini
0000000000001170 T __libc_csu_init
                 U __libc_start_main@@GLIBC_2.2.5
0000000000004010 D _edata
0000000000004018 B _end
00000000000011e8 T _fini
0000000000001000 t _init
0000000000001060 T _start
0000000000004010 b completed.8061
0000000000004000 W data_start
0000000000001090 t deregister_tm_clones
0000000000001140 t frame_dummy
0000000000001149 T main
                 U printf@@GLIBC_2.2.5
00000000000010c0 t register_tm_clones

Our program has symbols, that’s additional information we don’t need to display our string.

Use nm to see the ELF’s symbols: nm smallest_elf.bin

Too much code

First of all, the only executable section we need is .text, that’s where our main code is. But we notice there are instructions outside this section:

0000000000001000 <.init>:
    1000:   f3 0f 1e fa             endbr64
    1004:   48 83 ec 08             sub    rsp,0x8
    1008:   48 8b 05 d9 2f 00 00    mov    rax,QWORD PTR [rip+0x2fd9]
    100f:   48 85 c0                test   rax,rax
    1012:   74 02                   je     1016 <__cxa_finalize@plt-0x2a>
    1014:   ff d0                   call   rax
    1016:   48 83 c4 08             add    rsp,0x8
    101a:   c3                      ret

Also, there are 388 bytes of instructions in .text section, that’s a lot considering we just want to output “Hello world!”.

Use objdump to see the ELF’s executable section’s instructions: objdump -d smallest_elf.bin

Too much empty space

We also notice something interesting in our binary, there is a lot of empty space, filled with zeroes.

00000600: 0000 0000 0000 0000 0000 0000 0000 0000
00000610: 0000 0000 0000 0000 0000 0000 0000 0000
00000620: 0000 0000 0000 0000 0000 0000 0000 0000
00000630: 0000 0000 0000 0000 0000 0000 0000 0000
00000640: 0000 0000 0000 0000 0000 0000 0000 0000
00000650: 0000 0000 0000 0000 0000 0000 0000 0000
00000660: 0000 0000 0000 0000 0000 0000 0000 0000
00000670: 0000 0000 0000 0000 0000 0000 0000 0000
[...]

For example the space above has 2544 bytes of zeroes in total. There are several empty spaces like this.
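
Such padding can also be located programmatically instead of scrolling through a hex dump. The following is a minimal Python sketch, not part of the original workflow: the helper name zero_runs and the 16-byte threshold are illustrative assumptions, and the sample buffer is synthetic.

```python
import re


def zero_runs(data: bytes, min_len: int = 16):
    """Return (offset, length) for every run of at least min_len zero bytes."""
    pattern = re.compile(rb"\x00{%d,}" % min_len)
    return [(m.start(), len(m.group())) for m in pattern.finditer(data)]


# Synthetic stand-in for a binary: 4 bytes, a 2544-byte hole, 4 bytes, 8 zeroes.
sample = b"\x7fELF" + bytes(2544) + b"code" + bytes(8)

for offset, length in zero_runs(sample):
    print(f"0x{offset:08x}: {length} zero bytes")
```

To scan the real file, pass the result of open("smallest_elf.bin", "rb").read() instead of the synthetic sample.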

Use xxd to see a file’s hexadecimal data: xxd smallest_elf.bin

Quick optimizations

Let’s now try to reduce our executable’s size. We will apply several methods to give you an idea of what can be done to shrink an already compiled binary.

Strip symbols

First of all, let’s remove all the symbols and relocation information from the executable.

$ nm smallest_elf.bin
nm: smallest_elf.bin: no symbols

Use strip to strip an executable from all its symbols and relocation information: strip -s smallest_elf.bin

After the operation, the size of the binary goes from 16704 to 14472 bytes.

New size: 14472 bytes.

Remove unnecessary sections

We can also remove some sections that are unnecessary to the main task of our program, for example .data, or .gnu.version.

Indeed, we do not need those sections. For example, our string “Hello world!” is already stored in the .rodata section:

00002000: 0100 0200 4865 6c6c 6f20 776f 726c 6421  ....Hello world!

Use objcopy to remove a specific section from an ELF executable: objcopy --remove-section .data smallest_elf.bin

Major modifications

Let’s now try to reduce our executable’s size even more. We will apply several methods to give you an idea of what can be done to produce the smallest possible binary while still keeping its initial function: displaying a string.

Keep in mind that we’re doing this for fun, and for the technical challenge. In real life, you should not release programs that you have modified that way.

Get rid of programming language

We all know that programming languages are converted to assembly language by the compiler during the compilation process, and that the code can even be optimized automatically. The output may contain more instructions than our task needs.

Let’s re-write our code in assembly language!

section .data
    msg:    db "Hello world", 33, 10, 0
    format: db "%s", 10, 0

section .text
    global main

main:
    extern printf
    push rbp
    mov rbp, rsp
    mov rdi, msg
    call printf
    pop rbp
    ret

We assemble the code with nasm then link the object with gcc then run it.

$ nasm -f elf64 smallest_elf.asm && gcc smallest_elf.o -o smallest_elf.bin -no-pie
$ ./smallest_elf.bin
Hello world!

New size: 14368 bytes.

We only reduced our file size by 104 bytes by completely rewriting it in assembly. Why?

Well by giving the assembled code object to gcc we only told it what the .text content should look like, but all the other sections and additional data are still here. In order to get rid of it, we will have to link our binary ourselves, getting rid of gcc routines.

Getting straight to the point

We have rewritten the whole program in assembly and linked it with GCC, but we’re kind of stuck here. How do we reduce the size even more? Maybe by trying another compiler? By reducing the code even more?

Let’s get straight to the point: we need the program to display “Hello world!”, that’s it. We don’t want external dependencies like the printf() function.

Let’s rewrite the whole assembly code and remove all external references and symbols!

Removing external references

We will make the following changes to our assembly code:

Removing any call to external functions like printf(). Instead, we’ll use direct system calls like write() and exit().
Removing references to a “main” function, we don’t need that, we don’t need “functions” in our program.
Removing prologues, epilogues, and stack frames: yes, those useless bytes at the beginning and end of our code, why would we need them here?
The whole code will be strictly about printing our buffer and exiting the program.

global _start

section .data
        msg:    db "Hello world", 33, 10, 0

section .text

_start:
        mov rdi, 1      ; standard output
        mov rsi, msg    ; buffer to print
        mov rdx, 14     ; size of the buffer

        mov rax, 1      ; set write syscall

        syscall         ; call write

        mov rdi, 0      ; value to return
        mov rax, 0x3C   ; set exit syscall

        syscall         ; call exit

You can notice that I’m using the exit system call to properly stop the program after printing the buffer. Without it, execution would continue into whatever bytes follow the last instruction and the program would crash, although the buffer would still be printed. It is up to you to decide whether that crash matters in this exercise.

In my case, I chose to consider the program should always properly exit.

Get rid of compilers

We don’t have any C code anymore, why would we even need a compiler? Let’s get rid of gcc and directly link the code ourselves.

$ nasm -f elf64 smallest_elf.asm
$ ld -m elf_x86_64 smallest_elf.o -o smallest_elf.bin

Let’s run it and check:

$ ./smallest_elf.bin
Hello world!

With the rewritten assembly code and linking without using any compiler, we reduced the size to 8488 bytes.

New size: 8488 bytes.

Get rid of the data section

We initially put our “Hello world!” string in the .data section, but at this point we’re not following any convention and we’ll just remove the .data section to put our string directly inside the .text code section. Yeah it’s a bit weird but don’t worry, it will work.

global _start

section .text

_start:
        mov rdi, 1      ; standard output
        mov rsi, msg    ; buffer to print
        mov rdx, 14     ; size of the buffer

        mov rax, 1      ; set write syscall

        syscall         ; call write

        mov rdi, 0      ; value to return
        mov rax, 0x3C   ; set exit syscall

        syscall         ; call exit
msg:
        db      "Hello world", 33, 10, 0

With this small manipulation, we roughly halve the size of the binary!

New size: 4360 bytes.

Analysing the situation

We did pretty much everything we could to reduce the binary size:

Writing directly assembly code
No external function, no stack frames, only code section
No compiler, directly linking
Stripping the symbols

At this point, there isn’t much more we can do in a conventional way to reduce the binary size. By the way, why is it still that big?

We can notice through readelf command that our binary still has a lot of stuff inside of it. We have the .shstrtab section header, and a huge amount of empty space, because some tables and sections have been encoded as “empty spaces” filled with null bytes in the binary.

Nearly 92% of our binary is filled with useless empty spaces.

Check the binary composition with readelf -a smallest_elf.bin and the actual data in hexadecimal with xxd smallest_elf.bin. Notice all the zero bytes.

Going further

Some steps in the linking process produce this kind of ELF binary, padded with a lot of empty space that simply increases its size.

Now we will have to build our binary ourselves, manually, without relying on the assembler or the linker.

Identifying the needed information

There is a lot of useless information in our binary, so let’s start by identifying strictly what we need:

The ELF header, otherwise the file would not be considered an ELF by the system and could not be loaded
Our actual code

This portion at the beginning is our header (the 64-byte ELF header followed by the first 56-byte program header):

00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000  .ELF............
00000010: 0200 3e00 0100 0000 0010 4000 0000 0000  ..>.......@.....
00000020: 4000 0000 0000 0000 4810 0000 0000 0000  @.......H.......
00000030: 0000 0000 4000 3800 0200 4000 0300 0200  ....@.8...@.....
00000040: 0100 0000 0400 0000 0000 0000 0000 0000  ................
00000050: 0000 4000 0000 0000 0000 4000 0000 0000  ..@.......@.....
00000060: b000 0000 0000 0000 b000 0000 0000 0000  ................
00000070: 0010 0000 0000 0000                      ........
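
The 120 bytes above can be decoded field by field, which makes the later fixes easier to follow. This is a minimal Python sketch using the standard ELF64 layouts (a 64-byte file header and a 56-byte program header); the variable and field-tuple names are illustrative choices, not something taken from the original tooling.

```python
import struct

# The 120-byte block shown above, copied from the hex dump.
HEADER = bytes.fromhex(
    "7f45 4c46 0201 0100 0000 0000 0000 0000 "
    "0200 3e00 0100 0000 0010 4000 0000 0000 "
    "4000 0000 0000 0000 4810 0000 0000 0000 "
    "0000 0000 4000 3800 0200 4000 0300 0200 "
    "0100 0000 0400 0000 0000 0000 0000 0000 "
    "0000 4000 0000 0000 0000 4000 0000 0000 "
    "b000 0000 0000 0000 b000 0000 0000 0000 "
    "0010 0000 0000 0000"
)

# ELF64 file header (64 bytes) and program header (56 bytes), little endian.
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")
EHDR_FIELDS = (
    "e_ident", "e_type", "e_machine", "e_version", "e_entry", "e_phoff",
    "e_shoff", "e_flags", "e_ehsize", "e_phentsize", "e_phnum",
    "e_shentsize", "e_shnum", "e_shstrndx",
)
PHDR = struct.Struct("<IIQQQQQQ")
PHDR_FIELDS = (
    "p_type", "p_flags", "p_offset", "p_vaddr", "p_paddr",
    "p_filesz", "p_memsz", "p_align",
)

ehdr = dict(zip(EHDR_FIELDS, EHDR.unpack_from(HEADER, 0)))
phdr = dict(zip(PHDR_FIELDS, PHDR.unpack_from(HEADER, ehdr["e_phoff"])))

for name in ("e_entry", "e_phoff", "e_shoff", "e_phnum", "e_shnum"):
    print(f"{name:12} = {ehdr[name]:#x}")
for name in ("p_flags", "p_offset", "p_vaddr", "p_filesz"):
    print(f"{name:12} = {phdr[name]:#x}")
```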

And this portion is our code:

00001000: bf01 0000 0048 be27 1040 0000 0000 00ba  .....H.'.@......
00001010: 0e00 0000 b801 0000 000f 05bf 0000 0000  ................
00001020: b83c 0000 000f 0548 656c 6c6f 2077 6f72  .<.....Hello wor
00001030: 6c64 210a 00                             ld!..

And that’s it: we don’t really care about the rest.

Let’s manually construct our new binary with only these two blocks of data. Use any method you like to do that, I used simple Linux commands.

$ head -c 120 smallest_elf.bin > new_smallest_elf.bin.header # extract header
$ tail -c 264 smallest_elf.bin > tmp.bin # extract end of file starting from our code
$ head -c 53 tmp.bin > new_smallest_elf.bin.code # extract our code from it
$ cat new_smallest_elf.bin.header new_smallest_elf.bin.code > new_smallest_elf.bin # assemble both blocks into one final ELF executable
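
The same extraction can be expressed as two slices. This is a hedged Python equivalent of the commands above, assuming the 4360-byte layout described in the post (code at file offset 0x1000); the function name rebuild and the constants are illustrative.

```python
HEADER_LEN = 120      # ELF header (64 bytes) + first program header (56 bytes)
CODE_OFFSET = 0x1000  # file offset of the code in the linked binary
CODE_LEN = 53         # 39 bytes of instructions + 14 bytes of string


def rebuild(data: bytes) -> bytes:
    """Keep only the header block and the code block of the linked binary."""
    header = data[:HEADER_LEN]
    code = data[CODE_OFFSET:CODE_OFFSET + CODE_LEN]
    if len(header) != HEADER_LEN or len(code) != CODE_LEN:
        raise ValueError("input is shorter than the expected layout")
    return header + code
```

Calling rebuild(open("smallest_elf.bin", "rb").read()) and writing the result to new_smallest_elf.bin should give the same 173 bytes as the head/tail/cat pipeline.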

So this is what we get:

00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000  .ELF............
00000010: 0200 3e00 0100 0000 0010 4000 0000 0000  ..>.......@.....
00000020: 4000 0000 0000 0000 4810 0000 0000 0000  @.......H.......
00000030: 0000 0000 4000 3800 0200 4000 0300 0200  ....@.8...@.....
00000040: 0100 0000 0400 0000 0000 0000 0000 0000  ................
00000050: 0000 4000 0000 0000 0000 4000 0000 0000  ..@.......@.....
00000060: b000 0000 0000 0000 b000 0000 0000 0000  ................
00000070: 0010 0000 0000 0000 bf01 0000 0048 be27  .............H.'
00000080: 1040 0000 0000 00ba 0e00 0000 b801 0000  .@..............
00000090: 000f 05bf 0000 0000 b83c 0000 000f 0548  .........<.....H
000000a0: 656c 6c6f 2077 6f72 6c64 210a 00         ello world!..

Obviously, a lot of information from the headers is inaccurate since we modified the whole structure of the file and the program will not execute:

$ ./new_smallest_elf.bin
-bash: ./new_smallest_elf.bin: cannot execute binary file: Exec format error

Let’s check what’s happening with readelf:

$ readelf -a new_smallest_elf.bin
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x401000
  Start of program headers:          64 (bytes into file)
  Start of section headers:          4168 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         2
  Size of section headers:           64 (bytes)
  Number of section headers:         3
  Section header string table index: 2
readelf: Error: Reading 192 bytes extends past end of file for section headers
readelf: Error: Section headers are not available!
readelf: Error: Reading 112 bytes extends past end of file for program headers

There is no dynamic section in this file.
readelf: Error: Reading 112 bytes extends past end of file for program headers

Several issues identified here:

Entry point address incorrect: our new code starts at offset 0x78, not 0x1000.
Start of section headers incorrect: we do not have any section header, this should be zero.
Number of program headers incorrect: we only have 1 program header and not 2.
Size of section headers incorrect: we do not have any section header, this should be zero.
Number of section headers incorrect: we do not have any section header, this should be zero.
Section header string table index: we do not have any section header, this should be zero.

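We also need to adjust a few things in the program header:

The file offset and virtual address of the segment need to be changed from 0x0 and 0x400000 to 0x78 and 0x400078 because this is where our program starts. Not aligned? We don’t care: the offset and the address only need to be congruent modulo the page size, which is the case here.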
Permissions of the segment in the program header is read-only (0x004) and needs to be readable, writable and executable for simplicity (0x007).
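
The fixes listed above can also be scripted rather than typed into a hex editor. Below is a minimal Python sketch under a few assumptions: the field offsets are the standard ELF64 ones, the p_offset value 0x78 is taken from the readelf output and dump that follow rather than from the list itself, and the names PATCHES and apply_patches are illustrative.

```python
import struct

# The 120-byte header block as extracted from the linked binary.
ORIGINAL = bytes.fromhex(
    "7f45 4c46 0201 0100 0000 0000 0000 0000 "
    "0200 3e00 0100 0000 0010 4000 0000 0000 "
    "4000 0000 0000 0000 4810 0000 0000 0000 "
    "0000 0000 4000 3800 0200 4000 0300 0200 "
    "0100 0000 0400 0000 0000 0000 0000 0000 "
    "0000 4000 0000 0000 0000 4000 0000 0000 "
    "b000 0000 0000 0000 b000 0000 0000 0000 "
    "0010 0000 0000 0000"
)

# (file offset, struct format, new value)
PATCHES = [
    (0x18, "<Q", 0x400078),  # e_entry: code now starts at file offset 0x78
    (0x28, "<Q", 0),         # e_shoff: no section headers
    (0x38, "<H", 1),         # e_phnum: a single program header
    (0x3A, "<H", 0),         # e_shentsize
    (0x3C, "<H", 0),         # e_shnum
    (0x3E, "<H", 0),         # e_shstrndx
    (0x44, "<I", 0x7),       # p_flags: read + write + execute
    (0x48, "<Q", 0x78),      # p_offset
    (0x50, "<Q", 0x400078),  # p_vaddr
]


def apply_patches(data: bytes, patches) -> bytes:
    """Return a copy of data with every (offset, format, value) patch applied."""
    buf = bytearray(data)
    for offset, fmt, value in patches:
        struct.pack_into(fmt, buf, offset, value)
    return bytes(buf)


patched = apply_patches(ORIGINAL, PATCHES)
print(patched.hex(" ", 2))
```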

We manually apply all those modifications directly in a hexadecimal editor and run readelf again:

$ readelf -a new_smallest_elf.bin
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x400078
  Start of program headers:          64 (bytes into file)
  Start of section headers:          0 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         1
  Size of section headers:           0 (bytes)
  Number of section headers:         0
  Section header string table index: 0

There are no sections in this file.

There are no section groups in this file.

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000078 0x0000000000400078 0x0000000000400000
                 0x00000000000000b0 0x00000000000000b0  RWE    0x1000

There is no dynamic section in this file.

There are no relocations in this file.
No processor specific unwind information to decode

Dynamic symbol information is not available for displaying symbols.

No version information found in this file.

This time, no error. But we still need to adjust one small detail inside our actual code. We assembled the code before making all those modifications, and the write call, write(1, buffer, 14), still points at the old buffer address.

Indeed, the “Hello world!” buffer is no longer located at offset 0x1027, the new offset is 0x9f.
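
The new address follows from simple arithmetic on the segment mapping. Here is a small Python sketch of that computation, using the 53 code bytes shown earlier; the constant names are illustrative, and the immediate's position (byte 7 of the code) is derived from the instruction encodings in the dump.

```python
import struct

SEGMENT_OFFSET = 0x78       # p_offset: where the code sits in the new file
SEGMENT_VADDR = 0x400078    # p_vaddr: where that offset is mapped in memory

# The 39 bytes of instructions followed by the 14-byte string.
CODE = bytes.fromhex(
    "bf01000000 48be2710400000000000 ba0e000000 b801000000 0f05 "
    "bf00000000 b83c000000 0f05 "
    "48656c6c6f20776f726c64210a00"
)

msg_in_code = CODE.index(b"Hello world!")        # distance from the first instruction
msg_file_offset = SEGMENT_OFFSET + msg_in_code   # position in the new file
msg_vaddr = SEGMENT_VADDR + msg_in_code          # address the code must load into rsi

# The 64-bit immediate of "mov rsi, imm64" (opcode 48 be) starts at byte 7,
# right after the 5-byte "mov edi, 1" and the 2 opcode bytes.
patched = bytearray(CODE)
struct.pack_into("<Q", patched, 7, msg_vaddr)

print(f"string at file offset {msg_file_offset:#x}, loaded at {msg_vaddr:#x}")
```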

Here is the final modified binary (modified bytes versus the previous dump):

00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000  .ELF............
00000010: 0200 3e00 0100 0000 7800 4000 0000 0000  ..>.....x.@.....
00000020: 4000 0000 0000 0000 0000 0000 0000 0000  @...............
00000030: 0000 0000 4000 3800 0100 0000 0000 0000  ....@.8.........
00000040: 0100 0000 0700 0000 7800 0000 0000 0000  ........x.......
00000050: 7800 4000 0000 0000 0000 4000 0000 0000  x.@.......@.....
00000060: b000 0000 0000 0000 b000 0000 0000 0000  ................
00000070: 0010 0000 0000 0000 bf01 0000 0048 be9f  .............H..
00000080: 0040 0000 0000 00ba 0e00 0000 b801 0000  .@..............
00000090: 000f 05bf 0000 0000 b83c 0000 000f 0548  .........<.....H
000000a0: 656c 6c6f 2077 6f72 6c64 210a            ello world!.

Let’s test it now:

$ ./new_smallest_elf.bin
Hello world!

New size: 172 bytes.

We have hit a new record by reducing our initial program size from 16704 to only 172 bytes.

We could call it a day, but hey, can we actually do better?

Going even further

Let’s try to shrink our executable even more. To do that, let’s slightly change the initial exercise: we no longer need to display the “Hello world!” string, we just want to build any ELF executable that is as small as possible.

In order to be considered a valid executable:

It must execute at least one assembly instruction
It must not crash

Let’s take our functional header and remove all the custom code at offset 0x78. We will append new code there.

00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000  .ELF............
00000010: 0200 3e00 0100 0000 7800 4000 0000 0000  ..>.....x.@.....
00000020: 4000 0000 0000 0000 0000 0000 0000 0000  @...............
00000030: 0000 0000 4000 3800 0100 0000 0000 0000  ....@.8.........
00000040: 0100 0000 0700 0000 7800 0000 0000 0000  ........x.......
00000050: 7800 4000 0000 0000 0000 4000 0000 0000  x.@.......@.....
00000060: b000 0000 0000 0000 b000 0000 0000 0000  ................
00000070: 0010 0000 0000 0000                      ........

Smallest possible code

Considering the previous conditions, our new code must include a routine to properly exit the program. We could try something like this:

mov rax, 0x3C   ; set exit syscall
syscall         ; call exit

Yes, we did omit the rdi register containing the value to be returned by the program. We don’t really care, the return value is not a condition. We’ll let the program return whatever will be in the register.

Once converted to opcodes we get b8 3c 00 00 00 0f 05, so 7 bytes. Instead of using a mov instruction, let’s use push and pop for the same result.

push 0x3C       ; set exit syscall
pop rax
syscall         ; call exit

This gets us the opcodes 6a 3c 58 0f 05 (5 bytes) which is slightly better, we’ll stick with that one. Let’s append it to our header and run it!

00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000  .ELF............
00000010: 0200 3e00 0100 0000 7800 4000 0000 0000  ..>.....x.@.....
00000020: 4000 0000 0000 0000 0000 0000 0000 0000  @...............
00000030: 0000 0000 4000 3800 0100 0000 0000 0000  ....@.8.........
00000040: 0100 0000 0700 0000 7800 0000 0000 0000  ........x.......
00000050: 7800 4000 0000 0000 0000 4000 0000 0000  x.@.......@.....
00000060: b000 0000 0000 0000 b000 0000 0000 0000  ................
00000070: 0010 0000 0000 0000 6a3c 580f 05         ........j<X..


We notice that the program runs fine and even returns the default zero value.

$ ./smallest_elf_v2.bin
$ echo $?
0

New size: 125 bytes.
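
As a sanity check on the byte counts, the two encodings can be compared and the file assembled in a few lines. This is a minimal Python sketch, not the author's method: the helper names build and write_executable are assumptions, and the header bytes are the patched 120-byte block from the dumps above.

```python
import os

# Patched 120-byte header block (ELF header + one program header).
HEADER = bytes.fromhex(
    "7f45 4c46 0201 0100 0000 0000 0000 0000 "
    "0200 3e00 0100 0000 7800 4000 0000 0000 "
    "4000 0000 0000 0000 0000 0000 0000 0000 "
    "0000 0000 4000 3800 0100 0000 0000 0000 "
    "0100 0000 0700 0000 7800 0000 0000 0000 "
    "7800 4000 0000 0000 0000 4000 0000 0000 "
    "b000 0000 0000 0000 b000 0000 0000 0000 "
    "0010 0000 0000 0000"
)

MOV_EXIT = bytes.fromhex("b83c000000 0f05")       # mov eax, 0x3c ; syscall
PUSH_POP_EXIT = bytes.fromhex("6a3c 58 0f05")     # push 0x3c ; pop rax ; syscall


def build(code: bytes) -> bytes:
    """Append the exit stub right after the header, at file offset 0x78."""
    return HEADER + code


def write_executable(path: str, data: bytes) -> None:
    """Write the bytes to disk and mark the file executable."""
    with open(path, "wb") as f:
        f.write(data)
    os.chmod(path, 0o755)


for name, code in (("mov", MOV_EXIT), ("push/pop", PUSH_POP_EXIT)):
    print(f"{name}: {len(code)} bytes of code, {len(build(code))} bytes in total")
```

On a Linux x86-64 machine, write_executable("smallest_elf_v2.bin", build(PUSH_POP_EXIT)) should produce the 125-byte file discussed above.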

Going beyond the documentation

Actually we can still save a few bytes by taking advantage of the fact that some portions of the header are not verified upon execution, for example the 7-byte “padding” after the magic bytes or the last elements of the ELF header.

First, let’s move our actual code from the end of the program directly into the padding of the ELF header, and update the offsets accordingly. It will no longer be located at 0x78, but at 0x08.

Then, let’s overlap the ELF header and the program header at the very end of the ELF header, by starting the program header at offset 0x38 instead of 0x40. This works because the original overwritten data is 0100 0000, and our program header starts with 0100 0000 as well.

Which gives us the following binary:

00000000: 7f45 4c46 0201 0100 6a3c 580f 0500 0000  .ELF....j<X.....
00000010: 0200 3e00 0100 0000 0800 4000 0000 0000  ..>.......@.....
00000020: 3800 0000 0000 0000 0000 0000 0000 0000  8...............
00000030: 0000 0000 4000 3800 0100 0000 0700 0000  ....@.8.........
00000040: 0800 0000 0000 0000 0800 4000 0000 0000  ..........@.....
00000050: 0000 4000 0000 0000 b000 0000 0000 0000  ..@.............
00000060: b000 0000 0000 0000 0010 0000 0000 0000  ................

New size: 112 bytes.
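
The overlap can be reproduced by packing both headers separately and gluing them together at offset 0x38. The following Python sketch is an illustration under stated assumptions: the entry point 0x400008 is inferred from the p_vaddr of the program header and from the code's new position at 0x08, and the struct layouts are the standard ELF64 ones.

```python
import struct

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # ELF64 file header, 64 bytes
PHDR = struct.Struct("<IIQQQQQQ")          # ELF64 program header, 56 bytes

EXIT_CODE = bytes.fromhex("6a3c580f05")    # push 0x3c ; pop rax ; syscall

# Magic, class, data, version and OS/ABI, then the code in bytes 0x08..0x0c.
e_ident = bytes.fromhex("7f454c46 02010100") + EXIT_CODE + bytes(3)

ehdr = EHDR.pack(
    e_ident,
    2,         # e_type: EXEC
    0x3E,      # e_machine: x86-64
    1,         # e_version
    0x400008,  # e_entry (assumed, see the lead-in)
    0x38,      # e_phoff: program header now starts inside the ELF header
    0,         # e_shoff
    0,         # e_flags
    0x40,      # e_ehsize
    0x38,      # e_phentsize
    1,         # e_phnum
    0, 0, 0,   # e_shentsize, e_shnum, e_shstrndx
)
phdr = PHDR.pack(1, 7, 0x08, 0x400008, 0x400000, 0xB0, 0xB0, 0x1000)

# The overlap is only safe because e_phnum/e_shentsize (01 00 00 00)
# and p_type (01 00 00 00) have the same encoding.
assert ehdr[0x38:0x3C] == phdr[:4]

# p_flags (07 00 00 00) lands on e_shnum/e_shstrndx, fields the post
# reports as not being verified at execution time.
binary = ehdr[:0x38] + phdr
print(len(binary), "bytes")
print(binary.hex(" ", 2))
```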

Tricks and more tricks

The previous idea of overlapping the two headers can actually be applied to a larger scale.

The range from 0x18 to 0x40 can actually contain both ELF header and program header overlapped. The values that can be modified without impacting the program’s functionality are in bold.



Offset  Original ELF header      Original program header  New overlapped header
0x18    08 00 40 00 00 00 00 00  01 00 00 00 07 00 00 00  01 00 00 00 01 00 00 00
0x20    38 00 00 00 00 00 00 00  08 00 00 00 00 00 00 00  18 00 00 00 00 00 00 00
0x28    00 00 00 00 00 00 00 00  08 00 40 00 00 00 00 00  18 00 00 00 01 00 00 00
0x30    00 00 00 00 40 00 38 00  00 00 40 00 00 00 00 00  00 00 01 00 00 00 38 00
0x38    01 00 00 00 07 00 00 00  B0 00 00 00 00 00 00 00  01 00 00 00 00 00 00 00



By modifying the image address of our program and relocating our code right after the magic number, we get this executable of 80 bytes:

00000000: 7f45 4c46 6a3c 580f 0500 0000 0000 0000  .ELFj<X.........
00000010: 0200 3e00 0100 0000 0100 0000 0100 0000  ..>.............
00000020: 1800 0000 0000 0000 1800 0000 0100 0000  ................
00000030: 0000 0100 0000 3800 0100 0000 0000 0000  ......8.........
00000040: 0100 0000 0000 0000 0000 0000 0000 0000  ................

What we notice first is that most tools are lost with this binary. The Linux file command can only tell that this is an ELF, and readelf doesn’t like it either.

$ file smallest_elf.bin
smallest_elf.bin: ELF (AROS Research Operating System), unknown class 106

$ readelf -a smallest_elf.bin
ELF Header:
  Magic:   7f 45 4c 46 6a 3c 58 0f 05 00 00 00 00 00 00 00
  Class:                             <unknown: 6a>
  Data:                              <unknown: 3c>
  Version:                           88 <unknown>
  OS/ABI:                            AROS
  ABI Version:                       5
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x1
  Start of program headers:          1 (bytes into file)
  Start of section headers:          24 (bytes into file)
  Flags:                             0x0
  Size of this header:               24 (bytes)
  Size of program headers:           0 (bytes)
  Number of program headers:         1
  Size of section headers:           0 (bytes)
  Number of section headers:         0
  Section header string table index: 1 
readelf: Warning: possibly corrupt ELF file header - it has a non-zero section header offset, but no section headers

There are no sections to group in this file.

There is no dynamic section in this file.

Same thing for the GDB debugger: it doesn’t recognize this file and refuses to debug it (not in executable format: file format not recognized).

But all things considered, this program actually runs fine and respects all our conditions:

# Normal run
$ ./smallest_elf.bin
$ echo $?
0

# Checking with strace
$ strace ./smallest_elf.bin
execve("./smallest_elf.bin", ["./smallest_elf.bin"], 0x7fffd6fb2730 /* 25 vars */) = 0
exit(0)                                 = ?
+++ exited with 0 +++

New size: 80 bytes.

Just for the art, let’s clean up the executable by setting to zero all bytes that are not needed.

00000000: 7f45 4c46 6a3c 580f 0500 0000 0000 0000  .ELFj<X.........
00000010: 0200 3e00 0000 0000 0100 0000 0100 0000  ..>.............
00000020: 1800 0000 0000 0000 1800 0000 0100 0000  ................
00000030: 0000 0000 0000 3800 0100 0000 0000 0000  ......8.........
00000040: 0100 0000 0000 0000 0000 0000 0000 0000  ................

Is it the end?

We have probably reached the limits of the 64-bit ELF format: we produced the smallest 64-bit ELF possible that does not crash upon execution and correctly exits with a 0 status code.

Final size: 80 bytes.

Final binary:

7f454c46 6a3c580f 05000000 00000000 02003e00 00000000 01000000
01000000 18000000 00000000 18000000 01000000 00000000 00003800
01000000 00000000 01000000 00000000 00000000 00000000
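
To see what the loader is likely to read from these 80 bytes, the final binary can be decoded with the regular ELF64 layouts. This is a hedged Python sketch rather than a statement about the kernel's exact checks; the variable names are illustrative, and the entry-offset computation assumes the usual p_vaddr/p_offset mapping.

```python
import struct

FINAL = bytes.fromhex(
    "7f454c46 6a3c580f 05000000 00000000 02003e00 00000000 01000000 "
    "01000000 18000000 00000000 18000000 01000000 00000000 00003800 "
    "01000000 00000000 01000000 00000000 00000000 00000000"
)

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # ELF64 file header, 64 bytes
PHDR = struct.Struct("<IIQQQQQQ")          # ELF64 program header, 56 bytes

(e_ident, e_type, e_machine, e_version, e_entry, e_phoff, e_shoff, e_flags,
 e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx) = \
    EHDR.unpack_from(FINAL, 0)

# The program header starts at e_phoff, in the middle of the ELF header.
(p_type, p_flags, p_offset, p_vaddr, p_paddr,
 p_filesz, p_memsz, p_align) = PHDR.unpack_from(FINAL, e_phoff)

# File offset that the entry point maps to, given the segment mapping.
entry_file_offset = e_entry - p_vaddr + p_offset

print(f"e_entry  = {e_entry:#x}")
print(f"e_phoff  = {e_phoff:#x}, e_phnum = {e_phnum}")
print(f"p_offset = {p_offset:#x}, p_vaddr = {p_vaddr:#x}, p_flags = {p_flags}")
print(f"entry point maps to file offset {entry_file_offset}")
```

Under this reading, execution starts at file offset 1 and runs through the bytes 45 4c 46 (“ELF”), which fall in the REX prefix range in 64-bit mode, before reaching the exit code at offset 4.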




Javascript engine exploitation methodology
contact@sigreturn.com (Adam Taguirov) — Thu, 25 May 2023 12:00:00 +0000
JavaScript engines are now one of the most attacked surfaces of modern operating systems. They run untrusted code from arbitrary websites the moment a tab opens, sit on top of multi-million-line JIT compilers (V8, JavaScriptCore, SpiderMonkey), and have access to a sandbox that, once broken out of, often leads straight to remote code execution on the host. The bug classes that dominate browser CVE lists today (typer mistakes in JIT optimisation, type confusion on object shapes, edge cases in property accessors and bounds elimination) all live inside this layer.
The talk below walks through the general methodology of approaching such an engine for offensive research: how to read the relevant parts of a multi-million-line C++ codebase, how to recognise the primitive shapes that lead to addrof / fakeobj, and how those primitives compose into a renderer-RCE chain.
It was given in French at the Quarks in the Shell 2023 conference, organised by Quarkslab.



Vulnerability research and ActiveX controller exploitation
contact@sigreturn.com (Adam Taguirov) — Sat, 28 May 2022 12:00:00 +0000

CVE-2011-4187
Stack buffer overflow in IppGetDriverSettings2 (nipplib.dll, Novell iPrint Client < 5.78). Reachable from a web page via the iPrint ActiveX controller (CLSID 36723F97-7AA0-11D4-8919-FF2D71D0D32C) on Windows XP. No public exploit at the time of research.

Target
The starting point was a CVE number and a one-line summary on cvedetails:

Buffer overflow in the GetDriverSettings function in nipplib.dll in Novell iPrint Client before 5.78 on Windows allows remote attackers to execute arbitrary code via a long realm field, a different vulnerability than CVE-2011-3173.

No public exploit, almost no third-party write-up. The client only installs on Windows XP, so the whole engagement runs on a Windows XP SP3 VM with a Windows 10 SDK box for tooling.
The iPrint client ships an ActiveX controller. The CLSID can be retrieved by searching the registry for Novell iPrint:

The controller is implemented in ienipp.ocx (in C:\Windows\system32\), with the heavy lifting delegated to nipplib.dll in the same directory. Browsing ienipp.ocx with the OLE/COM Object Viewer from the Windows 10 SDK lists the public methods, including GetDriverSettings (the named CVE target) and a GetDriverSettings2 variant.

The controller can be instantiated and invoked from an HTML page in Internet Explorer:






A quick sanity check using the controller’s ShowMessageBox method confirmed the CLSID and the call convention before going further.
Reaching the vulnerable function
Loading ienipp.ocx in IDA (it auto-pulls nipplib.dll) and following xrefs to IppGetDriverSettings2 lands on a single call site at ienipp.ocx:0x1000AE54. The block leading to that call enforces several conditions:

The first gate is a length check on each of the four method parameters (printerUri, realm, userName, password). Anything above 0x200 bytes per parameter aborts before the call. The second gate is a function called sub_1000FBD0 (renamed important_check) whose return value selects the next jump:

important_check is a thin wrapper around IppMgmtGetServerVersion2, exported by nipplib.dll. The wrapper returns 0 (which the callsite treats as success and proceeds to the vulnerable call) when IppMgmtGetServerVersion2 itself returns 0.
IppMgmtGetServerVersion2 is a one-line forwarder to sub_5C04B514, where the actual logic lives:

A first-pass reading suggests this function performs the IPP server handshake on port 631 and only succeeds when a real server replies correctly. That would mean emulating an IPP server before any vulnerability work is possible. Reading the CFG without that assumption reveals a more useful structure.
The first conditional jump branches on IppCreateServerRef’s return value:

If IppCreateServerRef returns NULL, control flow lands directly on a mov eax, 0; ret block. The function returns 0, which is the success code for IppMgmtGetServerVersion2. An allocation/setup failure is being treated as a successful version probe. The IPP handshake never runs, no port 631, no negotiation. The vulnerable call site is reached as long as IppCreateServerRef fails, which is the opposite of what the rest of the function is trying to achieve.
Forcing IppCreateServerRef to fail
IppCreateServerRef calls a helper sub_50022960 and propagates its return value: non-zero from the helper means failure for IppCreateServerRef, which is what is needed.
sub_50022960 performs two length checks on the printerUri argument. The first is on the total URL length (capped at 0x200), but that ceiling is already enforced upstream by the per-parameter 0x200 cap, so it cannot be tripped here without violating the upstream gate. The second check, located further into the function, validates the length of the substring preceding ://:

If that prefix exceeds 0x100 bytes, the function fails. The constraint is therefore:

prefix before :// must be longer than 0x100 bytes (to fail sub_50022960),
total URL length must stay under 0x200 bytes (to pass the ienipp.ocx per-parameter gate).

A URL of the form <260-byte garbage>:// satisfies both. With this, IppCreateServerRef returns NULL, IppMgmtGetServerVersion2 returns 0, important_check returns 0, and IppGetDriverSettings2 is invoked with attacker-controlled arguments.
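The two constraints are easy to check offline before touching the browser. The sketch below assumes the limits exactly as stated in the list above (prefix strictly longer than 0x100, total strictly under 0x200); whether the real comparisons are inclusive at the boundary is not established here, and the function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PREFIX_LIMIT 0x100 /* inner check on the part before "://" */
#define TOTAL_LIMIT  0x200 /* upstream per-parameter cap */

/* Returns 1 when the URI passes the outer cap and trips the inner
   prefix check, 0 otherwise. */
int uri_forces_failure(const char *uri)
{
    assert(uri != NULL);
    size_t total = strlen(uri);
    const char *sep = strstr(uri, "://");
    if (sep == NULL)
        return 0; /* no separator: this case is not covered by the text */
    size_t prefix = (size_t)(sep - uri);
    return prefix > PREFIX_LIMIT && total < TOTAL_LIMIT;
}

/* Writes n filler bytes, then "://", then suffix, into out.
   Returns 0 on success, -1 if out is too small. */
int build_uri(char *out, size_t out_size, size_t n, const char *suffix)
{
    assert(out != NULL && suffix != NULL);
    size_t need = n + 3 + strlen(suffix) + 1;
    if (need > out_size)
        return -1;
    memset(out, 'A', n);
    memcpy(out + n, "://", 3);
    strcpy(out + n + 3, suffix); /* bounded by the size check above */
    return 0;
}
```

With n = 260 and an empty suffix the URI is 263 bytes long and uri_forces_failure returns 1. A 16-byte prefix returns 0 (the inner check passes), and a 0x200-byte prefix returns 0 as well because the total no longer fits under the upstream cap.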
The buffer overflow
IppGetDriverSettings2 itself contains one more gate before any vulnerable code: an strstr looking for the literal iPrint-driver-profile-hiddenPA inside the URL.

Including that substring in the URL passes the check. There is presumably a legitimate reason for it inside the driver profile flow; for the purpose of reaching the bug it is enough to embed it in the suffix.
Past that gate, the realm parameter is fed unchecked into a strcpy whose destination is a fixed-size stack buffer:

A realm of, say, 0x180 bytes overflows the buffer well into the saved return address territory.
Exploitation
Crashing on the saved EIP
The first crash, with realm filled with As up to the upstream cap, lands inside a strlen:

The overflow happened, but it also corrupted a pointer (EBX) that a later function in the same frame uses before the vulnerable function returns. The crash is in that downstream consumer, not on the saved EIP.
The remedy is a shorter realm. The buffer is overrun precisely up to the saved EIP, no further:

EIP is now under control. There are no DEP/ASLR concerns to discuss on Windows XP SP3 in this configuration.
Placing a shellcode
The realm field is too constrained to host both the EIP overwrite and a shellcode. The other method parameters, however, are pushed on the stack ahead of realm and are not subject to the same downstream processing. userName is the natural carrier.
A pop-calc shellcode for Windows XP SP3 EN (shell-storm 739):
"\x31\xC9"             // xor ecx,ecx
"\x51"                 // push ecx
"\x68\x63\x61\x6C\x63" // push 0x636c6163  ('calc')
"\x54"                 // push esp
"\xB8\xC7\x93\xC2\x77" // mov  eax, 0x77c293c7
"\xFF\xD0"             // call eax

xxd -p -r is enough to splice the bytes into the HTML payload’s userName argument.
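For reference, the plain hex string that xxd -p -r consumes can be produced from the byte listing with a small helper. This is a convenience sketch, not part of the original workflow; to_plain_hex and the array name are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* The 16 shellcode bytes exactly as listed above. */
static const unsigned char calc_shellcode[] =
    "\x31\xC9" "\x51" "\x68\x63\x61\x6C\x63" "\x54"
    "\xB8\xC7\x93\xC2\x77" "\xFF\xD0";

/* The string literal adds a trailing NUL that is not part of the payload. */
#define CALC_SHELLCODE_LEN (sizeof calc_shellcode - 1)

/* Writes 2*len lowercase hex digits plus a NUL into out.
   Returns the number of digits written, or 0 if out is too small. */
size_t to_plain_hex(const unsigned char *bytes, size_t len,
                    char *out, size_t out_size)
{
    assert(bytes != NULL && out != NULL);
    if (out_size < 2 * len + 1)
        return 0;
    for (size_t i = 0; i < len; i++)
        snprintf(out + 2 * i, 3, "%02x", bytes[i]);
    return 2 * len;
}
```

Feeding calc_shellcode through to_plain_hex gives the 32-digit string 31c9516863616c6354b8c793c277ffd0, which xxd -p -r turns back into the raw bytes.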
Reading the shellcode address
Running a non-malicious payload (long realm prefix to satisfy the bypass, but no overflow on realm) and breakpointing on IppGetDriverSettings2 exposes its arguments on the stack. The third argument (userName) holds the buffer address:

0x02843728 for this run. The Windows XP SP3 process layout is stable enough across launches that this address holds for the next call as long as the process is not restarted.
Final payload
Replacing the 0x41414141 filler at the saved-EIP offset with 0x02843728 (little-endian) redirects execution into the shellcode after the ret:
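The layout of the realm value can be sketched as filler followed by the address stored least-significant byte first. The offset to the saved EIP is not stated in this write-up, so eip_offset below is a hypothetical parameter to be measured in the debugger, and build_realm is a name made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stores a 32-bit value least-significant byte first, as x86 reads it. */
void put_le32(unsigned char *dst, uint32_t value)
{
    dst[0] = (unsigned char)(value & 0xff);
    dst[1] = (unsigned char)((value >> 8) & 0xff);
    dst[2] = (unsigned char)((value >> 16) & 0xff);
    dst[3] = (unsigned char)((value >> 24) & 0xff);
}

/* Fills out with eip_offset bytes of 'A' followed by the 4-byte target.
   eip_offset is hypothetical: the distance from the start of realm to
   the saved EIP, which has to be measured for the real binary.
   Returns the total length, or 0 if out is too small. */
size_t build_realm(unsigned char *out, size_t out_size,
                   size_t eip_offset, uint32_t target)
{
    assert(out != NULL);
    if (out_size < eip_offset + 4)
        return 0;
    memset(out, 0x41, eip_offset);
    put_le32(out + eip_offset, target);
    return eip_offset + 4;
}
```

For 0x02843728 the four trailing bytes are 28 37 84 02. None of them is zero, which matters because strcpy stops copying at the first NUL byte.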






Loading the page in Internet Explorer:

Arbitrary code execution from a single HTML page, no IPP server.
Failed paths
Two earlier attempts did not reach the result and are worth recording.
Emulating the IPP server
Before noticing the IppCreateServerRef-fails-as-success path, the obvious approach was to make IppMgmtGetServerVersion2 succeed legitimately by serving the requests it issues to port 631. Capturing the request with nc -lvp 631 showed:
POST /ipp/IppSrvr HTTP/1.1
Accept: application/ipp
User-Agent: Novell iPrint Client - v05.74.00
Content-type: application/ipp
...

@G..attributes-charset.utf-8.H..attributes-natural-language.en-us.D.operation-name.get-server-version.server-version.1.1

Reversing the response-validation function (nipplib.5C0450B3) revealed the constraints: a 2-byte version-number that must be 0x100 or 0x101, a valid IPP HTTP header (taken verbatim from a CUPS server’s reply), and a server-version attribute located via IppFindAttributeInSet. Encoding the attribute group correctly required reading RFC 8010. The reply parsed up to a point, but every iteration crashed inside a strlen on a NULL argument, suggesting another mandatory attribute or data field that was not being supplied. The path was abandoned when the logic-bug shortcut surfaced.
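To give an idea of what the RFC 8010 encoding involves, here is a minimal sketch of an IPP message head: version-number, status-code, request-id, one operation-attributes group with the two standard leading attributes, and the end-of-attributes tag. It deliberately leaves out the server-version attribute, because its value tag and encoding are not documented in this write-up, so this is not a reply known to satisfy the iPrint client. The function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    IPP_TAG_OPERATION = 0x01, /* operation-attributes-tag */
    IPP_TAG_END       = 0x03, /* end-of-attributes-tag */
    IPP_TAG_CHARSET   = 0x47, /* charset value tag */
    IPP_TAG_LANGUAGE  = 0x48  /* naturalLanguage value tag */
};

/* IPP integers are big-endian on the wire. */
static size_t put_be16(unsigned char *p, uint16_t v)
{
    p[0] = (unsigned char)(v >> 8);
    p[1] = (unsigned char)(v & 0xff);
    return 2;
}

static size_t put_be32(unsigned char *p, uint32_t v)
{
    put_be16(p, (uint16_t)(v >> 16));
    put_be16(p + 2, (uint16_t)(v & 0xffff));
    return 4;
}

/* Appends value-tag, name-length, name, value-length, value. */
size_t ipp_put_attr(unsigned char *p, uint8_t tag,
                    const char *name, const char *value)
{
    assert(p != NULL && name != NULL && value != NULL);
    size_t n = 0, nlen = strlen(name), vlen = strlen(value);
    p[n++] = tag;
    n += put_be16(p + n, (uint16_t)nlen);
    memcpy(p + n, name, nlen);
    n += nlen;
    n += put_be16(p + n, (uint16_t)vlen);
    memcpy(p + n, value, vlen);
    n += vlen;
    return n;
}

/* Writes the message head into out (at least 128 bytes) and returns
   its length. */
size_t ipp_minimal_response(unsigned char *out, uint16_t version,
                            uint16_t status, uint32_t request_id)
{
    size_t n = 0;
    n += put_be16(out + n, version);
    n += put_be16(out + n, status);
    n += put_be32(out + n, request_id);
    out[n++] = IPP_TAG_OPERATION;
    n += ipp_put_attr(out + n, IPP_TAG_CHARSET,
                      "attributes-charset", "utf-8");
    n += ipp_put_attr(out + n, IPP_TAG_LANGUAGE,
                      "attributes-natural-language", "en-us");
    out[n++] = IPP_TAG_END;
    return n;
}
```

The value tags 0x47 and 0x48 are the G and H bytes visible in the captured request above, and 0x44 (D) is the keyword tag that precedes operation-name there.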
Overflowing the ciphertext, not the cleartext
While searching for the right realm length, a shorter input did not overflow the saved EIP directly but did corrupt it through a second buffer. A function downstream of the strcpy runs the realm value through an internal block cipher (8-byte blocks, key in .data, output written via sprintf("%02hhX", b) into a separate stack buffer that is twice the input length). When the input is short enough to bypass the first overflow and long enough to overrun the ciphertext buffer, EIP is controlled, but only via the hex-string output of sprintf.
The cipher was small enough to port to C and run offline, with the seed key recovered from memory:
unsigned int shift_on_key(unsigned int tmp_bloc) {
    unsigned int idx;
    unsigned int s1, s2, s3, s4;

    idx = ((tmp_bloc >> 24) & 0xff) * 4 + 0x048;
    s1  = *((unsigned int *)the_key + idx / sizeof(unsigned int));
    idx = ((tmp_bloc >> 16) & 0xff) * 4 + 0x448;
    s2  = *((unsigned int *)the_key + idx / sizeof(unsigned int));
    idx = ((tmp_bloc >>  8) & 0xff) * 4 + 0x848;
    s3  = *((unsigned int *)the_key + idx / sizeof(unsigned int));
    idx =  (tmp_bloc        & 0xff) * 4 + 0xc48;
    s4  = *((unsigned int *)the_key + idx / sizeof(unsigned int));
    return (((s2 + s1) ^ s3) + s4);
}

/* get_new_key derives (key_part1, key_part2) from the previous key
   through 18 rounds of shift_on_key + xor with the static key blocs. */

int main(int argc, char **argv) {
    char *entry = argv[1];
    int i = 0;
    get_new_key(key, the_key);
    while (entry[i]) {
        for (int b = 0; b < 8; b++) {
            unsigned int kpart = (b < 4) ? key_part1 : key_part2;
            unsigned int sh    = (3 - (b & 3)) * 8;
            if (entry[i]) newbuf[i] = entry[i] ^ ((kpart >> sh) & 0xff);
            i++;
        }
        /* feed back the swapped output bloc as the next "old key" and re-derive */
        ...
        get_new_key(key, the_key);
    }
    /* hex-encode newbuf into realbuf with sprintf("%02hhX", ...) */
}

A search produced an input whose ciphertext ends in \xAA\xAA; those two bytes hex-encode to "AAAA", giving EIP = 0x41414141:
./a.out $(python -c 'print "B"*132 + "\x43\x90"')
... 3CCAF8EFDA95CFDA49177C2EAAAA

EIP control via the ciphertext path is real, but the path is dead. sprintf("%02hhX", b) emits two ASCII hex digits per byte, so each EIP byte is constrained to 0x30..0x39 or 0x41..0x46. No address in the loaded modules or on the stack falls in that alphabet, and the shellcode has no leverage to encode arbitrary bytes through the cipher. The longer-payload approach above sidesteps this entirely.
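The alphabet restriction can be checked mechanically for any candidate address. The sketch below reproduces the two-digits-per-byte encoding and tests whether all four bytes of an address fall within 0x30..0x39 or 0x41..0x46; both function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hex-encodes len bytes as two uppercase digits each, as the text
   describes for the ciphertext buffer. Returns the number of digits
   written, or 0 if out is too small. */
size_t hex_encode_upper(const unsigned char *in, size_t len,
                        char *out, size_t out_size)
{
    assert(in != NULL && out != NULL);
    if (out_size < 2 * len + 1)
        return 0;
    for (size_t i = 0; i < len; i++)
        snprintf(out + 2 * i, 3, "%02hhX", in[i]);
    return 2 * len;
}

static int is_upper_hex_digit(unsigned char c)
{
    return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F');
}

/* Returns 1 when every byte of addr is an ASCII uppercase hex digit,
   i.e. when the address could be written through the hex output. */
int address_in_hex_alphabet(uint32_t addr)
{
    for (int i = 0; i < 4; i++) {
        unsigned char b = (unsigned char)(addr >> (8 * i));
        if (!is_upper_hex_digit(b))
            return 0;
    }
    return 1;
}
```

address_in_hex_alphabet accepts 0x41414141 but rejects both 0x02843728 (the userName buffer) and 0x77c293c7 (the address called by the shellcode), which is the structural reason this path was dropped.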
Takeaways

Error and allocation paths are often the most valuable to read carefully. The IppCreateServerRef-returns-NULL branch was a complete bypass of the protocol-handshake gate, and it was visible in the CFG without any dynamic analysis.
Length caps distributed across binaries can be played against each other. The upstream 0x200 cap on the URL is what made the inner 0x100 prefix check trippable.
A failed exploitation path is still worth reproducing far enough to understand why it fails. Recovering the realm-cipher in C produced a clean structural reason to drop the path, rather than a vague “didn’t work”.
On systems without DEP/ASLR, the gap between EIP control and arbitrary code execution is mostly bookkeeping. The harder problem in this CVE was reaching the vulnerable function at all.

Original ELF header	Original program header	New overlapped header
08	01	01
00	00	00
40	00	00
00	00	00
00	07	01
00	00	00
00	00	00
00	00	00
38	08	18
00	00	00
00	00	00
00	00	00
00	00	00
00	00	00
00	00	00
00	00	00
00	08	18
00	00	00
00	40	00
00	00	00
00	00	01
00	00	00
00	00	00
00	00	00
00	00	00
00	40	01
00	00	00
40	00	00
00	00	00
38	00	38
00	00	00
01	B0	01
00	00	00
00	00	00
00	00	00
07	00	00
00	00	00
00	00	00
00	00	00
