Know the exact crate versions used to build your Rust executable

Audit binaries for known bugs or security vulnerabilities in production, at scale, with zero bookkeeping

rust-audit


This works by embedding data about the dependency tree in JSON format into a dedicated linker section of the compiled executable.
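The mechanism can be sketched in a few lines of plain Rust. This is an illustrative sketch only: the section name `.example-deps` and the literal JSON are stand-ins, since the real crate zlib-compresses the JSON and picks a platform-specific section name.

```rust
// Illustrative sketch of embedding data into a dedicated linker section.
// The section name below is hypothetical and uses ELF conventions;
// Mach-O and PE section naming differs.
#[link_section = ".example-deps"]
#[used] // ask the compiler to keep the symbol even if it looks unused
static DEPENDENCY_LIST: [u8; 27] = *b"{\"packages\":[\"hello v1.0\"]}";

fn main() {
    // Touching the data at runtime prevents the linker from
    // discarding it on toolchains where `#[used]` is not enough.
    println!("{}", DEPENDENCY_LIST[0] as char);
}
```

A tool can then locate that section in the compiled binary and recover the bytes without running the executable.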

The implementation has matured to the point where it's time to get some real-world experience with it, but the data format is not yet stable. Linux, Windows and macOS are currently supported.

The end goal is to get Cargo itself to encode this information in binaries instead of relying on an external crate. This project paves the way for a proper implementation in Cargo; see the RFC: https://github.com/rust-lang/rfcs/pull/2801

Demo

Clone this repository:

git clone https://github.com/Shnatsel/rust-audit.git
cd rust-audit

Compile the tooling and a sample binary with dependency tree embedded:

cargo build --release

Recover the dependency tree we've just embedded:

target/release/rust-audit-info target/release/hello-auditable

You can also audit the recovered dependency tree for known vulnerabilities using cargo audit:

(cd auditable-serde && cargo build --release --features "toml" --example json-to-toml)
cargo install cargo-audit
target/release/rust-audit-info target/release/hello-auditable > dependency-tree.json
target/release/examples/json-to-toml dependency-tree.json | cargo audit -f -

The auditable-extract crate allows your own tools to easily consume this info.

How to make your crate auditable

Add the following to your Cargo.toml (the `build` key goes in the `[package]` section):

build = "build.rs"

[dependencies]
auditable = "0.1"

[build-dependencies]
auditable-build = "0.1"

Create a build.rs file next to Cargo.toml with the following contents:

fn main() {
    auditable_build::collect_dependency_list();
}

Add the following to the beginning of your main.rs (or any other file):

static COMPRESSED_DEPENDENCY_LIST: &[u8] = auditable::inject_dependency_list!();

Put the following in some reachable location in the code, e.g. in fn main():

    // Actually use the data to work around a bug in rustc:
    // https://github.com/rust-lang/rust/issues/47384
    // On nightly you can use `test::black_box` instead of `println!`
    println!("{}", COMPRESSED_DEPENDENCY_LIST[0]);

Now you can cargo build and the dependency data will be embedded in the final binary automatically. You can verify that the data is actually embedded using the extraction steps from the demo.

See the auditable "Hello, world!" project for an example of how it all fits together.

FAQ

Doesn't this bloat my binary?

Not really. A "Hello World" on x86 Linux compiles into a ~1 MB file in the best case (recent Rust without jemalloc, LTO enabled). Even with a couple of dependencies, its dependency tree is under 1 kB, less than 1/1000 of the total size. We also compress it with zlib to drive the size down further. Since the size of the dependency tree info grows linearly with the number of dependencies, it should remain negligible.

What about embedded platforms?

On embedded platforms where you cannot spare a byte, nothing should be added to the executable. Instead, record the hash of every executable in a database and associate the hash with its Cargo.lock, compiler and LLVM version, build date, etc. This would make for an excellent Cargo wrapper or plugin. Since that can be done in a 5-line shell script, writing that tool is left as an exercise for the reader.

Does this impact reproducible builds?

The data format is designed not to disrupt reproducible builds. It contains no timestamps, and the generated JSON is sorted to make sure it is identical between compilations. If anything, this helps with reproducible builds, since you know all the versions for a given binary now.
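The determinism argument can be illustrated with plain Rust: emitting entries from a sorted map yields byte-identical output across compilations, unlike iteration over a `HashMap`. This is a hand-rolled sketch of the idea, not the crate's actual serializer:

```rust
use std::collections::BTreeMap;

// Serialize a dependency map to JSON by hand. Because BTreeMap
// iterates in sorted key order, the output bytes are the same no
// matter what order the entries were inserted in.
fn to_json(deps: &BTreeMap<&str, &str>) -> String {
    let fields: Vec<String> = deps
        .iter()
        .map(|(name, version)| format!("\"{}\":\"{}\"", name, version))
        .collect();
    format!("{{{}}}", fields.join(","))
}

fn main() {
    let mut deps = BTreeMap::new();
    // Insertion order is scrambled on purpose; output is still sorted.
    deps.insert("serde", "1.0.100");
    deps.insert("adler", "1.0.2");
    println!("{}", to_json(&deps));
}
```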

Is there any tooling to consume this data?

It is interoperable with existing tooling that consumes Cargo.lock via the JSON-to-TOML convertor. You can also write your own tooling fairly easily - auditable-extract and auditable-serde crates handle all the data extraction and parsing for you. See the docs to get started.
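The JSON-to-TOML idea is simple at its core: cargo-audit consumes Cargo.lock, so the recovered (name, version) pairs just need to be re-emitted as `[[package]]` tables. The real convertor is the json-to-toml example in auditable-serde; this hand-rolled sketch only shows the shape of the output:

```rust
// Emit Cargo.lock-style TOML from a list of (name, version) pairs.
// This is a minimal illustration; the real Cargo.lock format has
// more fields (source, checksum, dependencies).
fn to_cargo_lock(packages: &[(&str, &str)]) -> String {
    packages
        .iter()
        .map(|(name, version)| {
            format!("[[package]]\nname = \"{}\"\nversion = \"{}\"\n", name, version)
        })
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    // Hypothetical dependency list for demonstration
    let lock = to_cargo_lock(&[("adler", "1.0.2"), ("hello-auditable", "0.1.0")]);
    print!("{}", lock);
}
```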

What is the data format, exactly?

It is not yet stabilized, so we do not have extensive docs or a JSON schema. However, these Rust data structures map to JSON one-to-one and are extensively commented. The JSON is Zlib-compressed and placed in a linker section with a name that varies by platform.
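Since the format is unstable, the following is only a hypothetical illustration of what a decompressed payload might look like; the actual field names and structure are defined by the Rust data structures linked above:

```json
{
  "packages": [
    { "name": "adler", "version": "1.0.2", "source": "registry" },
    { "name": "hello-auditable", "version": "0.1.0", "source": "local" }
  ]
}
```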

Can I read this data using a tool written in a different language?

Yes. The data format is designed for interoperability with alternative implementations. You can also use pre-existing platform-specific tools or libraries for data extraction. E.g. on Linux:

objcopy -O binary --only-section=.rust-deps-v0 target/release/hello-auditable /dev/stdout | pigz -zd -

However, don't run legacy tools on untrusted files. Use the auditable-extract crate or the rust-audit-info command-line tool if possible - they are written in 100% safe Rust, so they will not have such vulnerabilities.

Does this disclose any sensitive information?

TL;DR: The list of enabled features is the only newly disclosed information.

All URLs and file paths are redacted, but the crate names, feature names and versions are recorded as-is. At present, panic messages already disclose all of this information and more, except feature names. Also, chances are you're legally obligated to disclose the use of specific open-source crates anyway, since MIT and many other licenses require it.

What about recording the compiler version?

It's already there. Run `strings your_executable | grep 'rustc version'` to see it. Don't try this on files you didn't compile yourself - strings is overdue for a rewrite in safe Rust.

In theory we could duplicate it in the JSON for ease of access, but this can be added later in a backwards-compatible fashion.
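A memory-safe stand-in for `strings | grep` is only a few lines of stdlib Rust. This sketch scans a byte buffer for the `rustc version` marker; the 80-byte window printed around a match is an arbitrary choice for inspection:

```rust
// Find the first occurrence of `needle` in `haystack`, if any.
fn find_subslice(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    haystack.windows(needle.len()).position(|w| w == needle)
}

fn main() {
    let path = std::env::args().nth(1).expect("usage: <binary-file>");
    let binary = std::fs::read(path).expect("failed to read file");
    match find_subslice(&binary, b"rustc version") {
        Some(pos) => {
            // Print a short window after the marker, lossily decoded.
            let end = (pos + 80).min(binary.len());
            println!("{}", String::from_utf8_lossy(&binary[pos..end]));
        }
        None => println!("no rustc version string found"),
    }
}
```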

What about keeping track of versions of statically linked C libraries?

Good question. I don't think they are exposed in any reasonable way right now. This would be a great addition, but it is not required for the initial launch. We can add it later in a backwards-compatible way.

What is blocking uplifting this into Cargo?

  1. Getting some real-world experience with this before committing to a stable data format
  2. https://github.com/rust-lang/rust/issues/47384

Help on these points would be greatly appreciated.

Issues

Collection of the latest Issues

Shnatsel

Apparently there are already a number of formats designed to encode package info: https://gitbom.dev/glossary/sbom/

We need to check whether any of them are suitable for our use case. Notably, we redact some fields, such as git repo URLs, and also include information about enabled features, so it might not be 100% compatible.

Also, the degree of adoption of these formats needs to be understood; perhaps we should provide conversion utilities, even if we don't end up using the format internally.

Shnatsel

Right now `cargo auditable` just has `.unwrap()` all over the place. That's fine for a prototype, but we'll need proper error handling to show nice error messages.

We should use the anyhow crate for error handling, because that's what Cargo already uses, and it will make upstreaming simpler.

Nemo157

I was recently thinking about what it would take to integrate something like this into a cargo install process. The biggest issue I see is that it requires modifying the binary sources to have the data added. I think a potentially more useful approach is a way to inject this data into an arbitrary binary build; maybe via something like a cargo wrapper cargo auditable build.

This would also avoid issues #9, #11 and #13 (but probably introduce others 😀).

Is there a reason to prefer the current approach where each binary needs to be configured to include the data?

Shnatsel (bug)

After using the json-to-toml example and feeding the data to cargo-audit, it reports success and no vulnerabilities, even when vulnerabilities are actually present.

For example, RUSTSEC-2021-0003 is not reported when the bundled hello-world sample depends on a vulnerable smallvec version.

Shnatsel (good first issue)

There are multiple examples in the docs that are marked `rust,ignore` because they require other crates that are normally not in the dependency tree. We should investigate whether it is possible to add extra dependencies in doctest mode only.

Shnatsel (fixed by upstreaming)

The auditable crate currently adds ~30 seconds to compilation time due to its dependencies on syn and serde.

Since serde_json is what Cargo itself uses, we have to stick to it for this to be a reasonably faithful implementation of the Cargo RFC. This issue is going to disappear once this functionality is upstreamed into Cargo.

Information - Updated May 13, 2022

Stars: 136
Forks: 6
Issues: 10
