Firmware Fuzzing 101
(with additional contribution from Richard Bae)
Introduction
Embedded applications are some of the most widespread software in the world. Whether they run on routers, IoT devices, or SCADA systems, they vary widely in architecture, use case, and purpose. Very few of these devices were built with security in mind, and even fewer have ever been fuzzed. This makes them prime targets for fuzzing campaigns.
Prerequisites
This is a blog post for advanced users with binary analysis experience. For this post, you will need:
- Mayhem and the Mayhem CLI
- Docker
- Netgear N300 MIPS firmware image
- Binary Ninja (or another disassembler) and a strong knowledge of reverse engineering. We will be looking at MIPS assembly code using Binary Ninja's High Level IL (HLIL).
What's Special about Firmware?
Fuzzing firmware presents a set of challenges that rarely appear together in other targets. Source code is usually not available, so harnessing must be performed at the binary level, which takes more expertise to do efficiently. The main challenges are:
- Dependency on specific hardware features present on the physical device
- Non-x86 processor architecture
- Non-glibc C standard library
- Lack of available source code or documentation
In this post, we will cover how to deal with each one of these challenges in the firmware fuzzing context.
Example: Netgear N300 a.k.a. DGN2200v4
For this blog post we will be looking at the Netgear N300 (henceforth referred to as DGN2200v4) router firmware image. This is a good target to look at because while it is a Linux firmware binary, it presents all of the challenges listed above. Specifically, this firmware:
- Relies on specific hardware features for synchronization used by its programs
- Is a MIPS Linux firmware
- Uses uClibc instead of glibc as its C standard library
- Has no source code and very few debug symbols available in binaries of interest
This presents quite a challenge for fuzzing but we will cover how to extract, harness, package, and fuzz this firmware.
Environment Setup
For this post, we have set up a docker image containing all the tools and files necessary. Run the following to start the docker container:
Alternatively, you can install the following tools manually:
- QEMU user static (e.g., apt-get install -y qemu-user-static)
- Binwalk (e.g., python -m pip install git+https://github.com/ReFirmLabs/binwalk)
- Jefferson (e.g., python -m pip install git+https://github.com/sviehb/jefferson)
- MIPS cross compiler - https://uclibc.org/downloads/binaries/0.9.30.1/cross-compiler-mips.tar.bz2
- Mayhem CLI
- GDB multiarch (e.g., apt-get install -y gdb-multiarch)
In addition to the Docker image or the manual setup above, make sure you have access to Binary Ninja or a similar disassembler.
Extracting Firmware
Extracting firmware can sometimes be difficult due to custom firmware layouts and encryption. Luckily, many firmware images, including this one, are just compressed file systems. This means that off-the-shelf tools such as binwalk can easily extract them. Run the following to extract:
Picking a target
Now that we have extracted the firmware, we need to identify a binary for harnessing. Two good options for router targets are:
- User facing web servers
- Custom internal binaries such as database managers, etc.
We can also find interesting binaries by obtaining a similar firmware image (such as a comparable model from another manufacturer) and using a script to compare which binaries are unique to each system. While this can generate some noise (such as two routers using SQLite vs. PostgreSQL), it massively narrows down the number of binaries to look through and is a good first step.
For this post, we will be looking at DGN2200v4's httpd web server. Web servers on embedded systems (especially routers) are particularly interesting because they tend to control many functions beyond serving web pages, including device bring-up, authentication, and process management. For the same reason, they can also be tricky to get running.
First look at httpd
First, let's figure out what kind of binary httpd is:
Based on the output from QEMU's -strace option, the failure appears to be related to the IPC and ioctl calls used for shared memory operations (shm = shared memory), a feature that is poorly supported in QEMU.
The only way we can find out for sure is in a disassembler. Let's open root/usr/sbin/httpd and its dependency root/lib/libnvram.so in Binary Ninja and have a look.
Using the High Level IL (HLIL) view of main at 0x4130d4, we can see that the program does indeed bail out after a failed call to sub_408f78, which in turn calls semget.
Subroutine at 0x408f78
Semget
semget likely makes the failed IPC call (since these semaphores are used for interprocess communication), so we should either avoid it or fix it. One may be tempted to immediately resort to binary patching, but since this binary is dynamically linked, we can use LD_PRELOAD instead to influence the behavior of the program without modifying the binary at all.
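To illustrate how LD_PRELOAD can neutralize calls like semget without patching, here is a minimal sketch of a stub library. It assumes that pretending every semaphore operation succeeds is acceptable for a single-process fuzzing run. The file name sem_stub.c and the fixed semaphore ID are illustrative and not taken from the firmware.

```c
/* sem_stub.c (hypothetical name): LD_PRELOAD stubs for System V semaphores.
 * Sketch only: every call reports success and performs no real IPC. */
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* ASSUMPTION: any positive ID works, because the stubs below never use it. */
#define FAKE_SEM_ID 1

/* Pretend a semaphore set was found or created. */
int semget(key_t key, int nsems, int semflg)
{
    (void)key; (void)nsems; (void)semflg;
    return FAKE_SEM_ID;
}

/* Pretend every lock/unlock operation completed immediately. */
int semop(int semid, struct sembuf *sops, size_t nsops)
{
    (void)semid; (void)sops; (void)nsops;
    return 0;
}

/* Pretend every control command succeeded; optional arguments are ignored. */
int semctl(int semid, int semnum, int cmd, ...)
{
    (void)semid; (void)semnum; (void)cmd;
    return 0;
}
```

Because the preloaded object is searched before the C library, the target would resolve these three symbols to the stubs. Whether the rest of main then behaves sensibly is a separate question that has to be checked in the disassembler.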
Another thing we might notice from our disassembler is that this main function does a lot more than just serve http requests. It looks like it brings a lot of the router up as well!
Router bring-up code
This means that we might want to enter the program at a different place besides main. Thankfully, if we can find the HTTP parse code, LD_PRELOAD can also be used to directly harness an internal function.
Harnessing functions with LD_PRELOAD
First, let's focus on directly targeting the function that parses HTTP requests. Through reverse engineering, we can find that this function is located at 0x408f90. With some more reverse engineering, we can find that this function's signature looks something like this:
As you can see, this harness is just like normal in almost every way. There are a couple of differences:
- Instead of using main, start at the libc main (the function that calls the real main).
- Because of this we need to exit at the end instead of returning (libc main calls exit with what the real main returns).
- Instead of being able to call the function directly, we need to make a function pointer to it and call that.
A couple of notes about this harness:
- We are fuzzing both the request and the connecting address. This will check if there are any special cases or mishandled addresses.
- We print the fuzzed request to stderr and pass stderr to parse_http_req as the output FD. This lets us view results on the command line when testing and in Mayhem.
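A harness with the properties described above could be sketched roughly as follows. The address 0x408f90 comes from the text; everything else is an assumption made for illustration: the parse_http_req_t signature (argument order and types), the 4-byte address prefix in the input file, the buffer size, and the split_input helper. The __uClibc_main parameter list follows uClibc's startup function and should be checked against the uClibc version the firmware ships.

```c
/* hook.c: sketch of an LD_PRELOAD harness for httpd, not the original code.
 * Build with the MIPS cross compiler's gcc, along the lines of
 *   mips-gcc -shared -fPIC hook.c -o hook.so   (compiler name is an assumption) */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* From the article: address of the HTTP parse routine inside httpd. */
#define PARSE_HTTP_REQ_ADDR 0x408f90
#define MAX_INPUT 4096

/* ASSUMPTION: illustrative signature; recover the real one by reversing. */
typedef int (*parse_http_req_t)(char *request, FILE *out, uint32_t client_addr);

/* Hypothetical helper: the first 4 input bytes become the client address,
 * the rest becomes a NUL-terminated request. Returns 0, or -1 if too short. */
static int split_input(const unsigned char *buf, size_t len,
                       uint32_t *addr, char *req, size_t req_cap)
{
    size_t n;

    if (len < sizeof(*addr) || req_cap == 0)
        return -1;
    memcpy(addr, buf, sizeof(*addr));
    n = len - sizeof(*addr);
    if (n >= req_cap)
        n = req_cap - 1;            /* truncate oversized requests */
    memcpy(req, buf + sizeof(*addr), n);
    req[n] = '\0';
    return 0;
}

/* Replaces the libc routine that would normally call the real main. */
void __uClibc_main(int (*main_fn)(int, char **, char **), int argc, char **argv,
                   void (*app_init)(void), void (*app_fini)(void),
                   void (*rtld_fini)(void), void *stack_end)
{
    static unsigned char buf[MAX_INPUT];
    static char req[MAX_INPUT];
    parse_http_req_t parse = (parse_http_req_t)(uintptr_t)PARSE_HTTP_REQ_ADDR;
    uint32_t addr;
    size_t len;
    FILE *f;

    (void)main_fn; (void)app_init; (void)app_fini;
    (void)rtld_fini; (void)stack_end;

    if (argc != 2) {
        fprintf(stderr, "Usage: %s <fuzz-file>\n", argv[0]);
        exit(1);
    }
    f = fopen(argv[1], "rb");
    if (f == NULL) {
        perror("fopen");
        exit(1);
    }
    len = fread(buf, 1, sizeof(buf), f);
    fclose(f);
    if (split_input(buf, len, &addr, req, sizeof(req)) != 0)
        exit(1);

    fprintf(stderr, "Request: %s\n", req);  /* make the test case visible */
    parse(req, stderr, addr);               /* stderr doubles as output FD */
    exit(0);                                /* never return into libc startup */
}
```

Keeping the input splitting in a separate helper makes it possible to check that part on the host, while the call through the function pointer only makes sense inside the MIPS process where httpd is mapped at its expected address.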
Since LD_PRELOAD works by loading a user-supplied shared object ahead of the program's other shared libraries, we need to compile hook.c to a shared object as well:
Now that we have our LD_PRELOAD harness compiled, let's run it and see what happens! But first we need to make hook.so available in the chroot environment by moving it into the root folder. Additionally, since our harness takes in a file now, let's create a test file. Now we can run httpd as before except we add the LD_PRELOAD environment variable and the test.txt argument.
$ mv hook.so root
$ echo AAAABBBBCCCC > root/test.txt
$ chroot root /qemu-mips-static -E LD_PRELOAD=/hook.so -E LD_LIBRARY_PATH=/lib/public/ /usr/sbin/httpd
Usage: usr/sbin/httpd <fuzz-file>
$ chroot root /qemu-mips-static -E LD_PRELOAD=/hook.so -E LD_LIBRARY_PATH=/lib/public/ /usr/sbin/httpd test.txt
Request: BBBBCCCC
qemu: uncaught target signal 11 (Segmentation fault) - core dumped
Segmentation fault
Since the crash happened somewhere in acosNvramConfig_match, overriding this function should fix the crash. Let's give it a shot.
Overriding functions with LD_PRELOAD
With LD_PRELOAD, we can override any dynamically linked function. In this case, we found that our program crashes in acosNvramConfig_match. If we override this function, either by skipping it or by reimplementing it ourselves, we should be able to avoid the crash altogether. Let's add an override for the match function to our harness hook.so:
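One way such an override might look is sketched below. The signature and return convention of acosNvramConfig_match (two strings in, nonzero on a match) are assumptions that should be confirmed in libnvram.so, and the table of NVRAM values is entirely hypothetical.

```c
/* Addition to hook.c: sketch of a replacement for acosNvramConfig_match. */
#include <string.h>

struct nvram_entry {
    const char *name;
    const char *value;
};

/* ASSUMPTION: placeholder settings; fill in what the target code expects. */
static const struct nvram_entry fake_nvram[] = {
    { "example_key", "1" },
};

/* ASSUMPTION: returns nonzero when the stored value of `name` equals `value`.
 * Unknown names and NULL arguments are treated as "no match". */
int acosNvramConfig_match(const char *name, const char *value)
{
    size_t i;

    if (name == NULL || value == NULL)
        return 0;
    for (i = 0; i < sizeof(fake_nvram) / sizeof(fake_nvram[0]); i++) {
        if (strcmp(fake_nvram[i].name, name) == 0)
            return strcmp(fake_nvram[i].value, value) == 0;
    }
    return 0;
}
```

Because hook.so is loaded before libnvram.so, calls from httpd resolve to this definition and the original implementation is never reached. A table-driven stub also makes it easy to steer the parser down different configuration-dependent paths later.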
Conclusion
In this blog, we covered a general approach to analyzing firmware images, handling their quirks, and making them fuzzable. While we did not post results here, this methodology is widely applicable to targets that were traditionally deemed ‘hard’. Given the number of new embedded devices being produced, this opens up a wide range of targets to analyze and look for bugs in places where few people have looked so far. Also, if you want more on embedded security, check out this project. And for a nice list of LD_PRELOAD tricks, this project. We wish you well on your bug hunting adventures.
Happy Fuzzing.