quickwit-oss/search-benchmark-game

Welcome to Search Benchmark, the Game!

This repository is a standardized benchmark for comparing the speed of various aspects of search engine technologies.

The results are available here.

This benchmark serves two purposes:

  • to make it easy for users to compare different libraries
  • to help library developers identify optimization opportunities by comparing their implementation to other implementations.

Currently, the benchmark only includes Lucene and tantivy. It is reasonably simple to add another engine.

You are free to communicate about the results of this benchmark in a reasonable manner. Twisting this benchmark in marketing material, for instance claiming that your search engine is 31x faster than Lucene because your product was 31x faster on one of the tests, is not tolerated. If this happens, the benchmark will publicly host a wall of shame. Bullshit claims about performance are a plague in the database world.

The benchmark

Different search engine implementations are benched over different real-life tests. The corpus used is the English Wikipedia. Stemming is disabled. Queries have been derived from the AOL query dataset (but do not contain any personal information).

Out of a random sample of queries, we kept those that had at least two terms and yielded at least one hit when searched as a phrase query.
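
As a rough illustration (not the actual preprocessing tool), the filtering step could look like the sketch below. The queries.txt file name and the phrase_hit_count helper are hypothetical; the real check requires searching the corpus index.

```rust
use std::fs;

// Hypothetical helper: how many documents match the query when it is run
// as a phrase query against the corpus index. Stubbed out here; the real
// benchmark used its own indexing tooling for this check.
fn phrase_hit_count(_query: &str) -> usize {
    1 // placeholder
}

fn main() -> std::io::Result<()> {
    // Hypothetical input file, one query per line.
    let raw = fs::read_to_string("queries.txt")?;
    let kept: Vec<&str> = raw
        .lines()
        .filter(|q| q.split_whitespace().count() >= 2) // at least two terms
        .filter(|q| phrase_hit_count(q) >= 1)          // at least one phrase hit
        .collect();
    println!("kept {} queries", kept.len());
    Ok(())
}
```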

We then run each of these queries as:

  • an intersection
  • a union
  • a phrase query

with the following collection options:

  • COUNT: only count the matching documents; no scoring needed
  • TOP 10: identify the 10 documents with the best BM25 scores
  • TOP 10 + COUNT: identify the 10 documents with the best BM25 scores, and count the matching documents
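
For example, a two-term query such as big apple (a made-up example) is run in all three forms, shown here in Lucene-style query syntax:

```
big apple       union: documents containing "big" or "apple"
+big +apple     intersection: documents containing both terms
"big apple"     phrase: both terms, adjacent and in order
```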

We also artificially reintroduced a couple of term queries with different term frequencies.

Before measurement, all tests are run once in order to make sure that

  • all of the data is loaded and in the page cache
  • Java's JIT has already kicked in.

Tests are run in a single thread. Out of 5 runs, we retain only the best time, so garbage collection likely does not matter.
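
A minimal sketch of this measurement protocol (not the actual driver code): one warm-up execution, then the best wall-clock time out of five measured runs.

```rust
use std::time::{Duration, Instant};

/// Run `query` once as a warm-up, then return the best time out of `runs`.
fn best_of<F: FnMut()>(mut query: F, runs: usize) -> Duration {
    query(); // warm-up: page cache populated, JIT compiled
    (0..runs)
        .map(|_| {
            let start = Instant::now();
            query();
            start.elapsed()
        })
        .min()
        .expect("runs must be > 0")
}

fn main() {
    // Stand-in workload; the real benchmark executes a search query here.
    let elapsed = best_of(|| {
        std::hint::black_box((0..1_000_000u64).sum::<u64>());
    }, 5);
    println!("best of 5: {:?}", elapsed);
}
```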

Engine-specific details

Lucene

  • Query cache is disabled.
  • GC should not influence the results as we pick the best out of 5 runs.
  • The JVM used was OpenJDK 10.0.1 (2018-04-17).

Tantivy

  • Tantivy returns slightly more results because its tokenizer handles apostrophes differently.
  • Tantivy and Lucene both use BM25 and should return almost identical scores.

Reproducing

These instructions will get you a copy of the project up and running on your local machine.

Prerequisites

The Lucene benchmark requires Java and Gradle; Gradle can be installed from the Gradle website. The tantivy benchmark and the benchmark driver code require Cargo, which can be installed using rustup.

Installing

Clone this repo.

git clone git@github.com:tantivy-search/search-benchmark-game.git

Running

Check out the Makefile for all available commands. You can adjust the ENGINES parameter to bench a different set of engines.
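
For example, to bench only a subset of engines (the names below are illustrative; use the engine directory names actually present in this repository):

make bench ENGINES="lucene tantivy"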

Run make corpus to download and unzip the corpus used in the benchmark.

make corpus

Run make index to create the indices for the engines.

make index

Run make bench to build the different projects and run the benches. This command may take more than 30 minutes.

make bench

The results are output to a results.json file.

You can then check your results out by running:

make serve

And open the following in your browser: http://localhost:8000/

Adding another search engine

See CONTRIBUTE.md.

Issues

Collection of the latest Issues

fulmicoton

Capture relevant hardware information (CPU type, cache sizes, ...) and display it in the benchmark.
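
A minimal, Linux-only sketch of how this could be collected, assuming /proc/cpuinfo is available (other platforms would need their own probes):

```rust
use std::collections::BTreeSet;
use std::fs;

fn main() -> std::io::Result<()> {
    // /proc/cpuinfo repeats "model name" and "cache size" once per core;
    // keep only the distinct values.
    let cpuinfo = fs::read_to_string("/proc/cpuinfo")?;
    let mut seen = BTreeSet::new();
    for line in cpuinfo.lines() {
        if (line.starts_with("model name") || line.starts_with("cache size"))
            && seen.insert(line)
        {
            println!("{line}");
        }
    }
    Ok(())
}
```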

amallia

We need to use a bigger collection. Some options are Gov2, ClueWeb, and CC-NEWS.

amallia

Please add some correctness tests. One simple idea to do so would be to compute Kendall's Tau between Lucene and the other engines.

@JMMackenzie @elshize any other ideas?
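
As a sketch of that idea (assuming each engine returns a ranked list for the same documents, and ignoring ties and non-overlapping result sets), Kendall's Tau could be computed like this:

```rust
/// Kendall's Tau between two rankings of the same documents.
/// `a[i]` and `b[i]` are the ranks engine A and engine B assign to
/// document i. Returns a value in [-1, 1]; 1 means identical orderings.
fn kendall_tau(a: &[usize], b: &[usize]) -> f64 {
    assert_eq!(a.len(), b.len());
    let n = a.len();
    let mut concordant = 0i64;
    let mut discordant = 0i64;
    for i in 0..n {
        for j in (i + 1)..n {
            let x = (a[i] as i64 - a[j] as i64).signum();
            let y = (b[i] as i64 - b[j] as i64).signum();
            if x * y > 0 {
                concordant += 1;
            } else if x * y < 0 {
                discordant += 1;
            }
        }
    }
    (concordant - discordant) as f64 / ((n * (n - 1) / 2) as f64)
}

fn main() {
    // Hypothetical ranks for 5 documents from two engines.
    let lucene = [0, 1, 2, 3, 4];
    let other = [0, 2, 1, 3, 4];
    println!("tau = {:.3}", kendall_tau(&lucene, &other)); // 0.800
}
```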

amallia (enhancement)

It would be nice to display each engine as a link, so that clicking it takes you to the engine's homepage.

petr-tik

Continuous benchmarking

Add a CI-like job to run the benchmark automatically.

It will help developers, potential users, and tantivy-curious people track performance numbers continuously. Automation also means less stress and hassle for the maintainers/developers of tantivy.

Granularity

We can choose to either run a benchmark on every commit or on every release.

On every commit

Integrate the benchmarking suite into CI on the main tantivy repo. Using TravisCI's after_success build stage, run the benchmark and append the results to results.json in the search-benchmark repo.

Pros:

  • Commit-specific perf numbers: easier to triage perf regressions, and builds a more detailed picture of the hot path for the future.
  • Automated: no need to fiddle and re-run benchmarking locally.

Costs/cons:

  • Too much noise: some commits are WIP or harm perf for the sake of a refactor. Is it really necessary to keep that data?
  • Makes every CI job run longer.
  • Benchmarking should be done on a dedicated machine to guarantee similar conditions, but CI jobs run inside uncontrolled layers of abstraction (Docker inside a VM, inside a VM).
  • To control the environment and keep it automated, we would need to dedicate a VPS instance. This is an expense, a potential security vulnerability, and needs administration.

On every release

Same as above, but use git tags to tell whether a commit corresponds to a new release.

Pros:

  • Fewer runs: cheaper on hardware, and doesn't slow builds down.
  • Releases are usually semantically important points in history, where we are interested in perf.

Cons/costs:

  • Still needs dedicated hardware to run consistently.
  • Needs push access to the tantivy-benchmark repo.

Presentation

Showing data from every commit might be unnecessarily overwhelming. The current benchmark front-end is clean (imho) and makes it easy to compare results across queries and versions.

On the front-end, we can show 0.6, 0.7, 0.8, 0.9 and latest commit or release.

Power-users or admins can be given the choice to massively extend the table to every commit.

Implementation

A VPS that watches the main tantivy repo, runs the benchmark, and commits new results at a chosen frequency.

Thoughts?

usamec

The schemas are outdated (the Dropbox JSON contains fields like body, title and url).

lambdaupb (good first issue)

Index size is a very significant metric.

The BuildIndex code of each engine should perform any applicable compaction at the end; the index size is then measured.
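
A minimal sketch of the measurement itself, assuming the index lives in a single directory whose files can simply be summed (the path below is hypothetical):

```rust
use std::{fs, io, path::Path};

/// Total on-disk size of an index directory, in bytes, including
/// subdirectories. Run after the engine has compacted/merged its segments.
fn index_size(dir: &Path) -> io::Result<u64> {
    let mut total = 0;
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let meta = entry.metadata()?;
        total += if meta.is_dir() {
            index_size(&entry.path())?
        } else {
            meta.len()
        };
    }
    Ok(total)
}

fn main() -> io::Result<()> {
    println!("{} bytes", index_size(Path::new("indices/tantivy"))?);
    Ok(())
}
```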
