CVE-2022-35922: Network Applications with Some Mayhem

Evan Richter

September 22, 2022

Most out-of-the-box fuzzing examples and tutorials walk you through fuzzing a program that accepts input over STDIN or from a file. But what if your application doesn't take input that way? Is it still fuzzable? In this post, I will explain the steps I took to fuzz an inherently network-centric target with Mayhem.

The Target

I harnessed the websocket crate, introducing fuzz testing to the almost 8-year-old codebase. This crate predates Rust 1.0 by 6 months! So what exactly are websockets?

The WebSocket Protocol enables two-way communication between a client ... to a remote host ... The protocol consists of an opening handshake followed by basic message framing, layered over TCP. -- RFC 6455

I was particularly interested in testing the error handling and robustness of clients and servers using this crate throughout that handshake and message unpacking.

The Harness

Using the excellent cargo-fuzz tool for templating and running local fuzz campaigns, I created a rudimentary harness for a server accepting a connection:

#![no_main]

use libfuzzer_sys::fuzz_target;
use std::io::{Read, Write};
use websocket::server::upgrade::sync::IntoWs;

fuzz_target!(|data: &[u8]| {
    let _ = fuzz(data);
});

/// Wrapper over a byte-slice that implements std::io::Read and Write
struct FuzzInput<'a>(&'a [u8]);

/// Read consumes input from the bytes held
impl Read for FuzzInput<'_> {
    fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
        self.0.read(buf)
    }
}

/// Write simply drops any data given to it
impl Write for FuzzInput<'_> {
    fn write(&mut self, buf: &[u8]) -> std::io::Result<usize> {
        Ok(buf.len())
    }

    fn flush(&mut self) -> std::io::Result<()> {
        Ok(())
    }
}

fn fuzz(data: &[u8]) -> Option<()> {
    // Any error during the upgrade or handshake ends the run early via `?`
    let request = FuzzInput(data).into_ws().ok()?;
    let mut client = request.use_protocol("rust-websocket").accept().ok()?;
    let _ = client.recv_message();
    None
}

Full source code for the harnesses can be found here.

The basic idea here is that we wrap the fuzzer provided input into a FuzzInput which mocks a TCP connection. While Mayhem supports fuzzing over actual network IO, better performance and analysis modes are possible when fuzz input is kept in memory.
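To make the mocking idea concrete outside of the websocket crate, here is a minimal std-only sketch. The names `MockStream` and `greet_and_read_line` are hypothetical and exist only for this illustration; the point is that any protocol code written against `Read + Write` cannot tell a byte slice from a socket.

```rust
use std::io::{self, BufRead, BufReader, Read, Write};

/// In-memory stand-in for a TCP stream (hypothetical, for illustration):
/// reads come from a fixed byte slice, writes are counted and discarded.
struct MockStream<'a> {
    input: &'a [u8],
    bytes_written: usize,
}

impl Read for MockStream<'_> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // `&[u8]` already implements `Read` and advances itself as it is consumed.
        self.input.read(buf)
    }
}

impl Write for MockStream<'_> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.bytes_written += buf.len();
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Hypothetical protocol code: sends a greeting, then reads one line of the reply.
/// It only requires `Read + Write`, so it works on a socket or on `MockStream`.
fn greet_and_read_line<S: Read + Write>(stream: &mut S) -> io::Result<String> {
    stream.write_all(b"hello\r\n")?;
    let mut line = String::new();
    BufReader::new(stream).read_line(&mut line)?;
    Ok(line)
}
```

Because nothing touches the network stack, each fuzz iteration costs little more than a memory read.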

The .use_protocol method informs our websocket listener to accept a "rust-websocket" value in the Sec-WebSocket-Protocol header. The actual value specified is not very important, except for when building up a corpus which we will get to later.

After .accept()ing the handshake, the harness will then attempt to read a message, which includes handling multiple data frames.

The actual outcome of these operations is not to be relied upon (since we are about to send it malformed inputs of course!), so any functions that report an error will simply return early. We are hoping that this error handling is sufficient to keep our simulated websocket server running even when faced with terribly formed, or even well-crafted inputs. If the harness panics, crashes, or gets killed by the OS, we will investigate the root cause and determine if our harness code is to blame, or if there is a flaw in the crate.

A harness allowing us to fuzz client code was also written:

use websocket::ClientBuilder;

fuzz_target!(|data: &[u8]| {
    let client = ClientBuilder::new("ws://127.0.0.1:2998")
        .unwrap()
        .add_protocol("rust-websocket")
        .connect_on(FuzzInput(data));

    if let Ok(mut client) = client {
        let _ = client.recv_message();
    }
});

Performance of the First Harness

As we can see from the coverage graph, progress flatlines around 60 testcases that have unique coverage:

Figure: coverage graph of the first run. Coverage increases steadily until about 3 minutes, then stops around 60 testcases.

Since coverage seemed low, I checked what the generated testcases roughly looked like:

cat corpus/server/* | less

...and as expected, none of the inputs contained anything that looked like a client's handshake, so exploration isn't going deep enough to test message receiving.
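For reference, a client's opening handshake has roughly the shape below. This is a sketch based on the handshake format in RFC 6455, not a recorded session; the function name `handshake_seed`, the request path, and the all-zero key are assumptions made for illustration, and whether the crate accepts this exact byte sequence is not verified here.

```rust
/// Builds a minimal client opening handshake in the shape described by RFC 6455.
/// Hypothetical helper: `host` and `protocol` are filled into the matching headers.
fn handshake_seed(host: &str, protocol: &str) -> Vec<u8> {
    // Base64 of sixteen zero bytes: 22 'A' characters plus "==" padding.
    // An illustrative nonce, not one taken from a real session.
    let key = format!("{}==", "A".repeat(22));
    format!(
        "GET / HTTP/1.1\r\n\
         Host: {host}\r\n\
         Upgrade: websocket\r\n\
         Connection: Upgrade\r\n\
         Sec-WebSocket-Key: {key}\r\n\
         Sec-WebSocket-Protocol: {protocol}\r\n\
         Sec-WebSocket-Version: 13\r\n\
         \r\n",
        host = host,
        key = key,
        protocol = protocol
    )
    .into_bytes()
}
```

A mutational fuzzer starting from nothing is unlikely to stumble on this many exact header names and values, which is consistent with the flat coverage graph.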

Curating a Corpus

Since cargo-fuzz and Mayhem are mutational fuzzers, they build on example inputs and on any inputs that found new coverage in the target. Fuzzing with an empty corpus did not produce much of anything useful, so next I wanted to provide some example websocket streams. I built the example client and server, and confirmed they exchanged messages with each other. Then I created a simple TCP proxy that records each half of a TCP connection to separate files. I modified the example client to connect to my proxy instead of directly to the server, and began recording sessions:

 use websocket::client::ClientBuilder;
 use websocket::{Message, OwnedMessage};

-const CONNECTION: &'static str = "ws://127.0.0.1:2794";
+const CONNECTION: &'static str = "ws://127.0.0.1:2998";

 fn main() {
     println!("Connecting to {}", CONNECTION);

     let client = ClientBuilder::new(CONNECTION)
         .unwrap()
+        .key([0; 16])
         .add_protocol("rust-websocket")
         .connect_insecure()
         .unwrap();

I also set a static key in the example client and the client fuzz harness. This ensures that the recorded handshake response from the server will be correct when we later present it to the client during fuzzing.

Data sent from the client is saved to the client folder, and data from the server to the server folder. This means that files in the client folder should be presented as example inputs to the server fuzz harness, and vice versa. Though in reality, it wouldn't hurt to just add every session as testcases to both harnesses, because of the corpus minimization step.
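A recording proxy of this kind can be sketched with the standard library alone. This is not the proxy used for the post; `copy_and_record`, `run_proxy`, and the numbered file names are assumptions for illustration. It handles one connection at a time and writes each direction to its own file.

```rust
use std::fs::File;
use std::io::{self, Read, Write};
use std::net::{Shutdown, TcpListener, TcpStream};
use std::thread;

/// Copies bytes from `from` to `to` until EOF, writing a copy of everything to `log`.
fn copy_and_record<R: Read, W: Write, L: Write>(
    mut from: R,
    mut to: W,
    mut log: L,
) -> io::Result<u64> {
    let mut buf = [0u8; 4096];
    let mut total = 0u64;
    loop {
        let n = from.read(&mut buf)?;
        if n == 0 {
            return Ok(total);
        }
        to.write_all(&buf[..n])?;
        log.write_all(&buf[..n])?;
        total += n as u64;
    }
}

/// Hypothetical proxy loop: accepts connections on `listen`, forwards them to
/// `upstream`, and records each half to client/<n> and server/<n>.
fn run_proxy(listen: &str, upstream: &str) -> io::Result<()> {
    std::fs::create_dir_all("client")?;
    std::fs::create_dir_all("server")?;
    let listener = TcpListener::bind(listen)?;
    for (n, conn) in listener.incoming().enumerate() {
        let client = conn?;
        let server = TcpStream::connect(upstream)?;
        let (client_rx, server_tx) = (client.try_clone()?, server.try_clone()?);
        let client_log = File::create(format!("client/{}", n))?;
        let server_log = File::create(format!("server/{}", n))?;
        // The client -> server half runs on its own thread.
        let uplink = thread::spawn(move || {
            let result = copy_and_record(client_rx, &server_tx, client_log);
            let _ = server_tx.shutdown(Shutdown::Write);
            result
        });
        // The server -> client half runs on this thread.
        copy_and_record(&server, &client, server_log)?;
        let _ = client.shutdown(Shutdown::Write);
        let _ = uplink.join();
    }
    Ok(())
}
```

With the ports from the diff above, this would be started as `run_proxy("127.0.0.1:2998", "127.0.0.1:2794")` before launching the modified example client.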

With those example interactions added to the testcases, we have much better coverage for both harnesses! Here's the coverage graph for the server harness:

Figure: better coverage for the server harness. Coverage starts at 60 and rises steadily for the whole fuzz run, ending at 229 testcases. Two defects are detected in the middle of the run.

Triaging Crashes

Those two red dots on the coverage graph show detected defects, so I then opened up the crash interface. Mayhem's triage analysis was spot on, and identified the issue as CWE-789: Uncontrolled Memory Allocation, and provided a backtrace.

The root cause in the backtrace was this call to Vec::with_capacity. Pre-allocation is a good strategy to increase performance, but the requested capacity should be bounded whenever an external source can influence or directly provide it. header.len is deserialized from the network and can report a length up to u64::MAX, so it definitely should be checked before allocating.
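To show how few bytes an attacker needs in order to claim a huge payload, here is a sketch of decoding the length field from a frame header as laid out in RFC 6455, section 5.2. The function `declared_payload_len` is hypothetical and is not the crate's parser; it does no validation beyond checking that enough bytes are present.

```rust
/// Returns the payload length declared by a WebSocket frame header,
/// or `None` if `frame` is too short to hold the full length field.
/// Hypothetical helper for illustration; not the websocket crate's code.
fn declared_payload_len(frame: &[u8]) -> Option<u64> {
    // Byte 0 holds FIN/RSV/opcode; byte 1 holds the MASK bit and a 7-bit length.
    let short_len = *frame.get(1)? & 0x7F;
    match short_len {
        // 126: the real length is the next 2 bytes, big-endian.
        126 => {
            let mut bytes = [0u8; 2];
            bytes.copy_from_slice(frame.get(2..4)?);
            Some(u64::from(u16::from_be_bytes(bytes)))
        }
        // 127: the real length is the next 8 bytes, big-endian.
        127 => {
            let mut bytes = [0u8; 8];
            bytes.copy_from_slice(frame.get(2..10)?);
            Some(u64::from_be_bytes(bytes))
        }
        // 0..=125: the 7-bit value is the length itself.
        n => Some(u64::from(n)),
    }
}
```

Ten bytes of header are enough to declare a payload of u64::MAX bytes, so passing that value straight to Vec::with_capacity lets a remote peer choose the allocation size.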

The other defect was the same root cause, but was triggered by a different allocation size. This target was compiled with libfuzzer and ASAN, both of which have allocation size checks, only at different thresholds. A nice thing about working with Mayhem is that it is generally good at grouping crashes, but in this case it had to show one crash that was detected by libfuzzer, and another crash that was detected by ASAN.

The Fix

The fix is present in this commit which checks a threshold before pre-allocating. This is the best mitigation for CWE-789, and the effectiveness will depend on how closely the allocation threshold matches the application-specific usage.
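The general shape of such a guard can be sketched as follows. This is not the code from the commit; `read_payload` and the 64 KiB value of `MAX_PREALLOC` are assumptions chosen for illustration, and a real application would pick a cap that fits its own message sizes.

```rust
use std::io::{self, Read};

/// Illustrative cap on up-front allocation (assumed value, not from the crate).
const MAX_PREALLOC: u64 = 64 * 1024;

/// Reads exactly `declared_len` payload bytes from `reader`.
/// Hypothetical helper: the buffer is pre-allocated only up to `MAX_PREALLOC`,
/// and grows beyond that only as data actually arrives.
fn read_payload<R: Read>(reader: R, declared_len: u64) -> io::Result<Vec<u8>> {
    // Never trust the declared length for the initial allocation.
    let initial = declared_len.min(MAX_PREALLOC) as usize;
    let mut payload = Vec::with_capacity(initial);
    // `take` stops reading after `declared_len` bytes.
    let got = reader.take(declared_len).read_to_end(&mut payload)?;
    if got as u64 != declared_len {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "payload shorter than declared length",
        ));
    }
    Ok(payload)
}
```

A peer that claims u64::MAX bytes but sends only a few now costs the server at most the cap, and the short read is reported as an ordinary error.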

As a side note, Rust's support for fallible allocation would not be a suitable fix here. A crafty attacker can still cause allocation failures: by observing whether the connection is dropped or not, they can work out dynamically how much the remote system is able to allocate. Then (perhaps using multiple connections) the server process's allocator can be pushed to the brink of running out of memory, until even small allocations risk aborting the process.

Thanks

I sincerely thank the websocket maintenance team for responding and releasing a fix so quickly. They suggest tungstenite for new websocket applications in Rust, but they clearly value the security of users who haven't switched crates yet.

Links to Advisories

GitHub Security Advisory
RUSTSEC
cargo-audit can report if you are vulnerable to this issue

Try Mayhem yourself on websocket!

Create a free Mayhem account, fork the GitHub repo, build the code, and run Mayhem as part of your CI/CD builds.
