twixes/metrobaza

Emdrive

Database management system for fast similarity search within metric spaces, written in Rust.

Data types

| Name | Description | Size on disk | Value bounds |
| --- | --- | --- | --- |
| UINT8 | unsigned 8-bit integer | 1 byte | ≥ 0 and < 2⁸ |
| UINT16 | unsigned 16-bit integer | 2 bytes | ≥ 0 and < 2¹⁶ |
| UINT32 | unsigned 32-bit integer | 4 bytes | ≥ 0 and < 2³² |
| UINT64 | unsigned 64-bit integer | 8 bytes | ≥ 0 and < 2⁶⁴ |
| UINT128 | unsigned 128-bit integer | 16 bytes | ≥ 0 and < 2¹²⁸ |
| BOOL | boolean value | 1 byte | either TRUE (non-zero) or FALSE (zero) |
| TIMESTAMP | number of milliseconds since Unix epoch, saved in a signed 64-bit integer | 8 bytes | ≥ 2⁶³ ms before Unix epoch and < 2⁶³ ms after Unix epoch (around 292 million years in either direction) |
| UUID | UUID-like value | 16 bytes | any sequence of 128 bits |
| STRING(n) | UTF-8 string | 2+n bytes | n characters, where n ≤ 2048 |
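
To make the sizes in the table above concrete, here is a minimal Rust sketch that models the listed types and their on-disk sizes, and checks the "around 292 million years" figure for TIMESTAMP. The `DataType` enum and both function names are hypothetical and written for illustration only; they are not taken from the Emdrive source.

```rust
/// Data types from the table above.
/// Hypothetical enum for illustration, not Emdrive's internal representation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataType {
    UInt8,
    UInt16,
    UInt32,
    UInt64,
    UInt128,
    Bool,
    Timestamp,
    Uuid,
    /// STRING(n); the table states n is at most 2048.
    String(u16),
}

impl DataType {
    /// Size on disk in bytes, exactly as listed in the table.
    pub fn size_on_disk(self) -> usize {
        match self {
            DataType::UInt8 | DataType::Bool => 1,
            DataType::UInt16 => 2,
            DataType::UInt32 => 4,
            DataType::UInt64 | DataType::Timestamp => 8,
            DataType::UInt128 | DataType::Uuid => 16,
            DataType::String(n) => 2 + n as usize,
        }
    }
}

/// Approximate reach of a TIMESTAMP in years, in one direction from the epoch:
/// 2^63 milliseconds divided by the milliseconds in a 365.25-day year.
pub fn timestamp_range_years() -> f64 {
    const MS_PER_YEAR: f64 = 1000.0 * 60.0 * 60.0 * 24.0 * 365.25;
    2f64.powi(63) / MS_PER_YEAR
}
```

For example, `DataType::String(20).size_on_disk()` is 22 bytes, and `timestamp_range_years()` comes out at roughly 2.92 × 10⁸ years.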

Emdrive types are non-nullable by default. They can be made nullable by wrapping them in NULLABLE(). For instance, a nullable string of maximum length 20 is NULLABLE(STRING(20)).

Indexes

| Name | Category | Description | Data types | Supported operators |
| --- | --- | --- | --- | --- |
| btree | general | B+ tree | all | = (equality) |
| emtree | metric | EM-tree | depending on chosen metric | @ (distance) |

Metrics

| Name | Description | Column types |
| --- | --- | --- |
| hamming | Hamming distance | UINT* |

Story

Let's imagine you're running an image search engine. As a fan of geese you called it Gaggle.
Being a search engine operator, you run a bot which crawls pages on the internet. Every time the bot sees an image, it computes a perceptual hash of it and saves it, along with some other metadata, to an Emdrive instance.

We'll be using database gaggle. A relevant table schema here may be:

Note that column hash is marked with METRIC KEY USING hamming!
While a primary key is B+ tree-based and allows for quick general lookups of rows, it's useless for distance queries. An EM-tree-based metric key does the job very well though. In this case, as we're comparing perceptual hashes in integer form, Hamming distance is the most relevant metric.
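
For readers unfamiliar with the metric, Hamming distance between two integers is the number of bit positions in which they differ. A minimal Rust sketch follows; the function name and the choice of `u64` are assumptions made for illustration, and this is not Emdrive's implementation.

```rust
/// Hamming distance between two integers: XOR leaves a 1 in every bit
/// position where the inputs differ, and `count_ones` counts those bits.
pub fn hamming(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}
```

For instance, `hamming(0b1010, 0b0110)` is 2. The function is zero only for equal inputs, is symmetric, and satisfies the triangle inequality, which is what makes it usable as a metric for a metric-space index.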

Oh, your bot has just seen a new image! Let's register it:

Now, look, a user just uploaded their image to see similar occurrences of it from the internet. The search engine calculated that image's hash to be 0b00001011 (binary representation of decimal 11).
Let's check that against Emdrive. We'll be using the @ distance operator, which always returns a number and is exclusively supported for METRIC KEY columns.
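
Conceptually, such a distance lookup gives the same answer as the brute-force scan sketched below in Rust. This is only a mental model: the `Image` struct, the stored hash values, the second URL, and the `max_distance` threshold are all hypothetical, and 8-bit hashes are assumed because the example hash has eight digits. An EM-tree exists precisely to avoid scanning every row like this.

```rust
/// A stored image record. Field names and types are hypothetical.
pub struct Image {
    pub url: &'static str,
    pub hash: u8,
}

/// Hypothetical table contents, invented for this sketch.
pub fn sample_images() -> Vec<Image> {
    vec![
        Image { url: "https://twixes.com/a.png", hash: 0b10011010 },
        Image { url: "https://example.com/b.png", hash: 0b11110100 },
    ]
}

/// Returns (url, distance) for every image whose hash differs from `query`
/// in at most `max_distance` bits, nearest first.
pub fn similar(images: &[Image], query: u8, max_distance: u32) -> Vec<(&'static str, u32)> {
    let mut hits: Vec<(&'static str, u32)> = images
        .iter()
        // Hamming distance: count the differing bits.
        .map(|img| (img.url, (img.hash ^ query).count_ones()))
        .filter(|&(_, distance)| distance <= max_distance)
        .collect();
    hits.sort_by_key(|&(_, distance)| distance);
    hits
}
```

With these made-up rows, `similar(&sample_images(), 0b00001011, 4)` keeps only the first image, at distance 3; the second differs in all 8 bits and is filtered out.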

It's a match! The image we saved previously has a similar hash, and we can now show it in search results.

| url | distance |
| --- | --- |
| "https://twixes.com/a.png" | 3 |

Data storage

Every table has a data file containing all its, well, data. Such data files are made up of pages.

Launch configuration

The following launch configuration settings are available for Emdrive instances. They are applied on instance launch from environment variables in the format EMDRIVE_${SETTING_NAME_UPPERCASE} (e.g. setting data_directory is set with variable EMDRIVE_DATA_DIRECTORY). If a setting's environment variable is not set, its default value is used.

| Name | Type | Default value | Description |
| --- | --- | --- | --- |
| data_directory | STRING | "/var/lib/emdrive/data" | Location of all data, including system tables |
| http_listen_host | STRING | "127.0.0.1" | Host on which the HTTP server will listen |
| http_listen_port | UINT16 | 8824 | Port on which the HTTP server will listen |
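
The naming rule and defaults above can be sketched in Rust as follows. The `Config` struct and its methods are hypothetical and only illustrate the EMDRIVE_ prefix convention and the fallback to defaults; Emdrive's actual loading code may differ, including in how it treats invalid values.

```rust
/// Launch settings from the table above. Hypothetical struct for illustration.
#[derive(Debug, PartialEq)]
pub struct Config {
    pub data_directory: String,
    pub http_listen_host: String,
    pub http_listen_port: u16,
}

impl Config {
    /// Builds a config from any key-to-value lookup, so the logic can be
    /// exercised without touching the real process environment.
    pub fn from_lookup<F>(lookup: F) -> Result<Config, String>
    where
        F: Fn(&str) -> Option<String>,
    {
        // Setting `name` is read from EMDRIVE_<NAME IN UPPERCASE>,
        // falling back to `default` when the variable is not set.
        let get = |name: &str, default: &str| {
            lookup(&format!("EMDRIVE_{}", name.to_uppercase()))
                .unwrap_or_else(|| default.to_string())
        };
        let port_raw = get("http_listen_port", "8824");
        // Rejecting an unparsable port is an assumption of this sketch.
        let http_listen_port = port_raw
            .parse::<u16>()
            .map_err(|e| format!("invalid EMDRIVE_HTTP_LISTEN_PORT {:?}: {}", port_raw, e))?;
        Ok(Config {
            data_directory: get("data_directory", "/var/lib/emdrive/data"),
            http_listen_host: get("http_listen_host", "127.0.0.1"),
            http_listen_port,
        })
    }

    /// Reads the settings from the process environment.
    pub fn from_env() -> Result<Config, String> {
        Config::from_lookup(|key| std::env::var(key).ok())
    }
}
```

With no EMDRIVE_ variables set, `Config::from_env()` yields the three defaults from the table; setting only EMDRIVE_HTTP_LISTEN_PORT overrides the port and leaves the other two untouched.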

Search

SQL

HTTP interface

Benchmarks

Postgres MySQL ClickHouse ⚡️ Emdrive

Autogenerated IDs

Emdrive has no serial or auto-increment data type. For entity IDs, ULID is the recommended solution in Emdrive. It's UUID-like, meaning it fits into the UUID data type, and can be generated with function ULID().
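
To show why a ULID fits the 16-byte UUID type, here is a minimal Rust sketch of the ULID binary layout: a 48-bit millisecond timestamp in the most significant bits followed by 80 bits of randomness. The function names are hypothetical, the random bits are passed in by the caller to keep the example dependency-free, and this is not the implementation behind Emdrive's ULID() function.

```rust
/// Packs a ULID into 128 bits: 48-bit Unix timestamp in milliseconds (high
/// bits) followed by 80 random bits (low bits). Excess input bits are masked off.
pub fn ulid_from_parts(timestamp_ms: u64, randomness: u128) -> u128 {
    const TIMESTAMP_MASK: u128 = (1 << 48) - 1;
    const RANDOM_MASK: u128 = (1 << 80) - 1;
    (((timestamp_ms as u128) & TIMESTAMP_MASK) << 80) | (randomness & RANDOM_MASK)
}

/// Big-endian byte form: exactly 16 bytes, the size of a UUID value.
pub fn ulid_to_uuid_bytes(ulid: u128) -> [u8; 16] {
    ulid.to_be_bytes()
}

/// Recovers the millisecond timestamp from the high 48 bits.
pub fn ulid_timestamp_ms(ulid: u128) -> u64 {
    (ulid >> 80) as u64
}
```

Because the timestamp occupies the high bits, IDs created in later milliseconds compare greater than earlier ones, which gives roughly the ordering an auto-increment column would provide.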

Information - Updated Dec 13, 2021

