dimitarvp/json-log-histogram-rust

This is a Rust command line tool that calculates a histogram of the separate types...

This is a Rust command line tool that calculates a histogram of the separate types of JSON records in an input JSON log file (one...

Introduction

This is a Rust command line tool that calculates a histogram of the separate types of JSON records in an input JSON log file (one JSON object per line).

A sample input file would be:

{"type":"B","foo":"bar","items":["one","two"]}
{"type": "A","foo": 4.0  }
{"type": "B","bar": "abcd"}

The output histogram would report a count of 2 for type B and 1 for type A. It would also report total of 73 bytes for type B and 26 for type A.

How to compile and use

Git clone:

git clone https://github.com/dimitarvp/json-log-histogram-rust.git
cd json-log-histogram-rust

Compile:

RUSTFLAGS="-C target-cpu=native" cargo build --release

To test, you can use one of the files in the inputs/ folder. They have been added to the repository through Git LFS so you have to fetch and checkout them first, and then decompress:

# Install LFS in the current repo and download the files.
git lfs install
git lfs fetch
git lfs checkout

# Decompress.
gunzip -v inputs/*.gz

# Run the tool with one of: 1MB.json, or 10MB.json, or 100MB.json, or 1000MB.json
./target/release/jlh -f inputs/1000MB.json

The tool prints an aligned text table and a total runtime at the bottom.

Benchmarks

CPU File size Time in seconds
Xeon W-2150B @ 3.00GHz 1MB 0.11091947
Xeon W-2150B @ 3.00GHz 10MB 0.62043929
Xeon W-2150B @ 3.00GHz 100MB 0.643637170
Xeon W-2150B @ 3.00GHz 1000MB 5.175781744
i7-4870HQ @ 2.50GHz 1MB 0.07234297
i7-4870HQ @ 2.50GHz 10MB 0.68889124
i7-4870HQ @ 2.50GHz 100MB 0.670027735
i7-4870HQ @ 2.50GHz 1000MB 6.659739416
i3-3217U @ 1.80GHz 1MB 0.14369994
i3-3217U @ 1.80GHz 10MB 0.49248859
i3-3217U @ 1.80GHz 100MB 0.535957719
i3-3217U @ 1.80GHz 1000MB 3.773678079

Implementation details and notes

  • Using Rust 1.43.1.
  • Using the rayon crate for transparent parallelization of the histogram calculation.
  • Using the clap crate to parse the command line options (only one, which is the input JSON log file).
  • Using the prettytable-rs crate to produce a pretty command line table with the results.
  • Using serde_json to read each JSON record to a struct.
  • Skipped the ability to pipe files to the tool so it can read from stdin. The motivation was that rayon does not provide its .par_bridge function to polymorphic Box<dyn BufRead> objects (which is the common denominator of std::io::stdin().lock() and std::fs::File.open(path)). I could have probably made it work but after 2 hours of attempts I realized that it might take a long time so I cut it short.
  • Used the .lines() function on the BufReader even though that allocates a new String per line. I am aware of the better BufReader.read_line idiom with a single String buffer (which is cleared after every line is consumed) and my initial non-parallel version even used it -- see this commit. But I couldn't find a quick way to translate this idiom to simply having something with the .lines() function (rayon expects an Iterator). I could have implemented Iterator for a wrapping struct or enum but, same as above, I was not sure if it will not take me very long. IMO even with that caveat the tool is very fast (see performance results table below).
  • The commit history got slightly botched because I had to use bfg to remove the 1MB / 10MB / 100MB / 1000MB JSON files that I added earlier (which I replaced with gzipped variants later).

Information - Updated Nov 24, 2021

Stars: 0
Forks: 0
Issues: 0

Tensorflow rust bindings with loads of easy to use documentation

bindings for documentation can be found in github or on our docs page

Tensorflow rust bindings with loads of easy to use documentation

Wasm template for Rust hosting without npm-deploy on github pages using Travis script

It automatically hosts your wasm projects on gh-pages using a travis script on the latest commit

Wasm template for Rust hosting without npm-deploy on github pages using Travis script

Roctogen: a rust client library for the GitHub v3 API

This client API is generated from the Isahc HTTP client

Roctogen: a rust client library for the GitHub v3 API

Rust crate for logging to LogDNA

Bug reports and pull requests are welcome on GitHub at

Rust crate for logging to LogDNA

A rust github template for ease of use

Install the rust toolchain in order to have cargo installed by following

A rust github template for ease of use

Rust-generated WebAssembly GitHub action template

A template to bootstrap the creation of a Rust-generated WebAssembly GitHub action

Rust-generated WebAssembly GitHub action template

Template for Rust lib/bin module with built-in GitHub Action to build and test

You will want to change the lib name and bin name in Cargo

Template for Rust lib/bin module with built-in GitHub Action to build and test

cargo_auto_github_lib

Library for cargo-auto automation tasks written in rust language with functions for github

cargo_auto_github_lib

A Rust library to connect to AxonServer

See the GitHub project Command / Query Responsibility Segregation (CQRS)

A Rust library to connect to AxonServer

Rust client implementation for Brasil API

Rust client implementation for BrasilAPI GitHub Project

Rust client implementation for Brasil API
Facebook Instagram Twitter GitHub Dribbble
Privacy