Hora Search Everywhere is a crate for using approximate nearest neighbor search algorithms from many languages

English | Français | 日本語 | 한국어 | Русский | 中文

Hora

[Homepage] [Document] [Examples]

Hora Search Everywhere!

Hora is an approximate nearest neighbor search algorithm (wiki) library. We implement all code in Rust🦀 for reliability, high-level abstraction and speed comparable to C++.

Hora, 「ほら」 in Japanese, sounds like [hōlə], and means "Wow", "You see!" or "Look at that!". The name is inspired by a famous Japanese song, 「小さな恋のうた」.

Demos

👩 Face-Match [online demo], have a try!

🍷 Dream wine comments search [online demo], have a try!

Features

  • Performant ⚡️

    • SIMD-Accelerated (packed_simd)
    • Stable algorithm implementation
    • Multi-threaded design
  • Supports Multiple Languages ☄️

    • Python
    • JavaScript
    • Java
    • Go (WIP)
    • Ruby (WIP)
    • Swift (WIP)
    • R (WIP)
    • Julia (WIP)
    • Can also be used as a service
  • Supports Multiple Indexes 🚀

    • Hierarchical Navigable Small World Graph Index (HNSWIndex) (details)
    • Satellite System Graph (SSGIndex) (details)
    • Product Quantization Inverted File (PQIVFIndex) (details)
    • Random Projection Tree (RPTIndex) (LSH, WIP)
    • BruteForce (BruteForceIndex) (naive implementation with SIMD)
  • Portable 💼

    • Supports WebAssembly
    • Supports Windows, Linux and OS X
    • Supports iOS and Android (WIP)
    • Supports no_std (WIP, partial)
    • No heavy dependencies, such as BLAS
  • Reliability 🔒

    • Rust compiler secures all code
    • Memory is managed by Rust for all language libraries, such as the Python one
    • Broad testing coverage
  • Supports Multiple Distances 🧮

    • Dot Product Distance
    • Euclidean Distance
    • Manhattan Distance
    • Cosine Similarity
  • Productive

    • Well documented
    • Elegant, simple and easy to learn API
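
The four metrics listed under "Supports Multiple Distances" can be sketched in plain Rust as follows. This is only an illustration of the definitions, not Hora's SIMD-accelerated implementation; the function names are hypothetical, and returning `None` for a zero-norm vector is a design choice made for this sketch.

```rust
/// Dot product of two equal-length vectors (larger means more similar).
fn dot(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Euclidean (L2) distance.
fn euclidean(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
        .sqrt()
}

/// Manhattan (L1) distance.
fn manhattan(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum()
}

/// Cosine similarity. Returns `None` when either vector has zero norm,
/// because the ratio is undefined in that case (assumption of this sketch).
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    let norm_a = dot(a, a).sqrt();
    let norm_b = dot(b, b).sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot(a, b) / (norm_a * norm_b))
}

fn main() {
    let a: [f32; 3] = [1.0, 2.0, 3.0];
    let b: [f32; 3] = [4.0, 5.0, 6.0];
    println!("dot       = {}", dot(&a, &b));
    println!("euclidean = {}", euclidean(&a, &b));
    println!("manhattan = {}", manhattan(&a, &b));
    println!("cosine    = {:?}", cosine_similarity(&a, &b));
}
```

Note that the dot product and cosine similarity grow with similarity, while the Euclidean and Manhattan distances shrink, so an index that expects "smaller = closer" has to negate the former exactly once.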

Installation

Rust

in Cargo.toml

Python

JavaScript (WebAssembly)

Building from source

Benchmarks

Measured on an AWS t2.medium instance (CPU: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz). More information

Examples

Rust example [more info]

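Thanks to @vaaaaanquish for this complete pure Rust 🦀 image search example. For more information about this example, see the article 「Pure Rust な近似最近傍探索ライブラリ hora を用いた画像検索を実装する」 ("Implementing image search with hora, a pure Rust approximate nearest neighbor search library").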

Python example [more info]

JavaScript example [more info]

Java example [more info]

Roadmap

  • Full test coverage
  • Implement EFANNA algorithm to achieve faster KNN graph building
  • Swift support and iOS/macOS deployment example
  • Support R
  • Support mmap

Related Projects and Comparison

  • Faiss, Annoy, ScaNN:

    • Hora's implementation is strongly inspired by these libraries.
    • Faiss focuses more on the GPU scenario, and Hora is lighter than Faiss (no heavy dependencies).
    • Hora expects to support more languages, and everything related to performance will be implemented in Rust🦀.
    • Annoy only supports the LSH (Random Projection) algorithm.
    • ScaNN and Faiss are less user-friendly (e.g. lack of documentation).
    • Hora is ALL IN RUST 🦀.
  • Milvus, Vald, Jina AI:

    • Milvus and Vald also support multiple languages, but serve as a service instead of a library.
    • Milvus is built upon libraries such as Faiss, while Hora is a library that implements all of its algorithms itself.

Contribute

We appreciate your participation!

Any contributions are welcome, including documentation and tests. You can create a Pull Request or Issue on GitHub, and we will review it as soon as possible.

We use GitHub issues for tracking suggestions and bugs.

Clone the repo

Build

Test

Try the changes

License

The entire repository is licensed under the Apache License.

Issues

Collection of the latest Issues

rob-p

0

Thanks for the very nice library! I'm interested in using hora for nearest neighbor finding in single-cell genomics. The data of interest consist of very high dimensional points (D = 30,000), but for most points, most dimensions have the value 0. Therefore, I'd like to avoid densifying the elements before indexing them (it's not really feasible). Is there some way to provide a custom implementation of the relevant distance metrics for the indexed type, so that I don't have to insert a dense representation of the points into the index?
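
To make the request concrete, here is a minimal sketch of distance functions that work directly on sparse vectors, so that a 30,000-dimensional point with a handful of nonzero entries is never densified. It assumes each vector is stored as `(dimension index, value)` pairs sorted by index; the representation and function names are hypothetical, and the sketch says nothing about whether Hora's indexes can accept such a type.

```rust
use std::cmp::Ordering;

/// Dot product of two sparse vectors given as (index, value) pairs
/// sorted by index. Only dimensions present in both vectors contribute.
fn sparse_dot(a: &[(usize, f32)], b: &[(usize, f32)]) -> f32 {
    let (mut i, mut j, mut acc) = (0, 0, 0.0f32);
    while i < a.len() && j < b.len() {
        match a[i].0.cmp(&b[j].0) {
            Ordering::Less => i += 1,
            Ordering::Greater => j += 1,
            Ordering::Equal => {
                acc += a[i].1 * b[j].1;
                i += 1;
                j += 1;
            }
        }
    }
    acc
}

/// Euclidean distance between two sparse vectors. A dimension missing
/// from one vector is treated as 0 there.
fn sparse_euclidean(a: &[(usize, f32)], b: &[(usize, f32)]) -> f32 {
    let (mut i, mut j, mut acc) = (0, 0, 0.0f32);
    while i < a.len() || j < b.len() {
        let diff = if j == b.len() || (i < a.len() && a[i].0 < b[j].0) {
            // Dimension only present in `a`.
            let v = a[i].1;
            i += 1;
            v
        } else if i == a.len() || b[j].0 < a[i].0 {
            // Dimension only present in `b`.
            let v = b[j].1;
            j += 1;
            v
        } else {
            // Dimension present in both.
            let v = a[i].1 - b[j].1;
            i += 1;
            j += 1;
            v
        };
        acc += diff * diff;
    }
    acc.sqrt()
}

fn main() {
    let a = vec![(1usize, 2.0f32), (5, 3.0)];
    let b = vec![(1usize, 4.0f32), (7, 1.0)];
    println!("sparse dot       = {}", sparse_dot(&a, &b));
    println!("sparse euclidean = {}", sparse_euclidean(&a, &b));
}
```

Both functions run in time proportional to the number of stored entries rather than to the full dimension D.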

generall

1

This line assumes that the dot function returns a plain dot product and negates it so that it behaves like a distance (smaller = closer): https://github.com/hora-search/hora/blob/a6759f8ae950a10d833735b9bca82d80435e60aa/src/core/metrics.rs#L56

But in reality, the dot function is already negated in the SIMDOptmized implementation: https://github.com/hora-search/hora/blob/ce10e42bacbf29163fcb9a973e1a3f9dc528effb/src/core/simd_metrics.rs#L32

This leads to double negation and incorrect use of the dot product.
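
The effect of the double negation can be reproduced in isolation. The sketch below does not use Hora's code; all function names are hypothetical stand-ins for the pattern described in this issue: a helper that already returns the negated dot product, and a caller that negates it again.

```rust
/// Plain dot product: larger means more similar.
fn raw_dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Stand-in for a helper that already returns the negated dot product,
/// so that smaller means closer.
fn negated_dot(a: &[f32], b: &[f32]) -> f32 {
    -raw_dot(a, b)
}

/// Buggy caller: assumes the helper returns a plain dot product and
/// negates it a second time, which restores "larger = closer".
fn distance_double_negated(a: &[f32], b: &[f32]) -> f32 {
    -negated_dot(a, b)
}

/// Intended behavior: the sign is flipped exactly once.
fn distance_negated_once(a: &[f32], b: &[f32]) -> f32 {
    negated_dot(a, b)
}

fn main() {
    let query = [1.0f32, 0.0];
    let near = [1.0f32, 0.0]; // same direction as the query
    let far = [0.0f32, 1.0]; // orthogonal to the query

    // With one negation, the similar vector gets the smaller distance.
    assert!(distance_negated_once(&query, &near) < distance_negated_once(&query, &far));

    // With two negations, the ranking is inverted.
    assert!(distance_double_negated(&query, &near) > distance_double_negated(&query, &far));

    println!("double negation inverts the nearest-neighbor ranking");
}
```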

vchugreev

0

Hello! Thanks for your wonderful library. Please tell me whether an extensible index is supported. Can I add elements to an already built index?

StanBright

2

Hey guys,

This isn't an issue. You can close or delete it whenever you wish. I just wanted to share a fun fact.

I discovered "hora" at the top of a "most trending projects" list and it attracted my attention, as "hora" ("хора" in Cyrillic) means "people". I thought, is it possible that someone would name his trending lib after a Bulgarian word? Maybe there was a hidden meaning... I opened and scrolled the repo, and what did I see? A list of faces/people. I was right. Someone named the library after a Bulgarian word... only to realise a few moments later that it's named after a Japanese word.

What a funny collision.

Have a great week!

jumpinvestor

question
1

For the Node version, is it possible to run comparisons on arrays of objects?

For example, for using it with map data, the query would be a tuple of a map point coordinates: [latitude, longitude].

The material to compare against would be an array of objects similar to:

[ { objectId: 123, someOtherObjectData: string, coordinates: [latitude, longitude] }, { next object }, { next object} ... ]

If so, would it be possible to provide a few pointers or a basic example?

Thanks much!

vaaaaanquish

bug
7

Hi.

With the latest hora, I get a panic when using hora::core::metrics::Metric::CosineSimilarity.

I think this error is caused by partial_cmp, which returns None when the given values cannot be ordered.
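
One way such a panic can arise is sketched below, independently of Hora's code. The sketch assumes an unchecked cosine similarity that divides by the product of the norms; for a zero vector this is 0/0 = NaN, `partial_cmp` then returns `None`, and unwrapping that result panics. Whether this is the actual cause in Hora is not confirmed here. The helper names are hypothetical, and `f32::total_cmp` requires Rust 1.62 or later.

```rust
/// Cosine similarity without a zero-norm check (illustrative only).
fn cosine_unchecked(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b) // 0.0 / 0.0 = NaN when a vector is all zeros
}

/// Drops NaN scores, then sorts ascending with a total order,
/// so no `unwrap` on `partial_cmp` is needed.
fn sorted_finite(mut scores: Vec<f32>) -> Vec<f32> {
    scores.retain(|s| !s.is_nan());
    scores.sort_by(|a, b| a.total_cmp(b));
    scores
}

fn main() {
    let sim = cosine_unchecked(&[0.0, 0.0], &[1.0, 2.0]);
    assert!(sim.is_nan());

    // NaN cannot be ordered against any value, so this is `None`;
    // calling `.unwrap()` on it would panic.
    assert_eq!(sim.partial_cmp(&1.0), None);

    let ranked = sorted_finite(vec![0.5, sim, 0.1]);
    println!("{:?}", ranked);
}
```

Filtering NaN before sorting is only one option; rejecting zero-norm vectors at insertion time would avoid producing NaN in the first place.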

apiszcz

enhancement
6

I'm not sure whether this is a problem on my side (possibly). I am testing that all the index types work (SSG had index times of over 24 hours, so I aborted that). The case in question is IVFPQIndex.

I am doing this with all index types, no issues except this case.

FYI: print(i,d) 15746524 [248, 225, 188, 223, 199, 174, 144, 146]

Error:

apiszcz

enhancement
1

Other ANN libraries return the index array of the closest vectors and optionally the distance metric in a second array. Is that possible or in future plans? Thanks.
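
The requested return shape, one array of indices plus a parallel array of distances, can be illustrated with an exhaustive search in plain Rust. This is a sketch of the interface only, not Hora's API: the function name `knn` and the choice of squared Euclidean distance are assumptions made for the example.

```rust
/// Squared Euclidean distance between two equal-length vectors.
fn squared_euclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Exhaustive k-nearest-neighbor search. Returns the row indices of the
/// k closest vectors and, in a second array, their distances, both
/// ordered from closest to farthest.
fn knn(data: &[Vec<f32>], query: &[f32], k: usize) -> (Vec<usize>, Vec<f32>) {
    let mut scored: Vec<(usize, f32)> = data
        .iter()
        .enumerate()
        .map(|(i, v)| (i, squared_euclidean(v, query)))
        .collect();
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.truncate(k);
    scored.into_iter().unzip()
}

fn main() {
    let data = vec![vec![0.0, 0.0], vec![1.0, 1.0], vec![5.0, 5.0]];
    let (indices, distances) = knn(&data, &[1.0, 1.0], 2);
    println!("indices = {:?}, distances = {:?}", indices, distances);
}
```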

apiszcz

enhancement
2

The current index add method adds a single vector. Is it possible to add a 2D numpy array of vectors and use the array row position as the index?
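
As an illustration of what such a batch method could look like, the sketch below wraps a single-vector `add` in an `add_batch` that takes a row-major flat buffer (the memory layout of a C-contiguous 2D numpy array) and uses each row's position as its id. `ToyIndex` and both method names are hypothetical and are not part of Hora or horapy.

```rust
/// A toy container standing in for an index; it only stores vectors and ids.
struct ToyIndex {
    dim: usize,
    vectors: Vec<Vec<f32>>,
    ids: Vec<usize>,
}

impl ToyIndex {
    fn new(dim: usize) -> Self {
        assert!(dim > 0, "dimension must be positive");
        ToyIndex { dim, vectors: Vec::new(), ids: Vec::new() }
    }

    /// Adds a single vector under the given id.
    fn add(&mut self, vector: &[f32], id: usize) -> Result<(), String> {
        if vector.len() != self.dim {
            return Err(format!("expected {} values, got {}", self.dim, vector.len()));
        }
        self.vectors.push(vector.to_vec());
        self.ids.push(id);
        Ok(())
    }

    /// Adds every row of a row-major flat buffer, using the row position
    /// within the buffer as the id. Returns the number of rows added.
    fn add_batch(&mut self, rows: &[f32]) -> Result<usize, String> {
        if rows.len() % self.dim != 0 {
            return Err(format!(
                "buffer length {} is not a multiple of dimension {}",
                rows.len(),
                self.dim
            ));
        }
        let mut added = 0;
        for (row, vector) in rows.chunks_exact(self.dim).enumerate() {
            self.add(vector, row)?;
            added += 1;
        }
        Ok(added)
    }
}

fn main() {
    let mut index = ToyIndex::new(2);
    let added = index.add_batch(&[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]).unwrap();
    println!("added {} rows with ids {:?}", added, index.ids);
}
```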

apiszcz

enhancement
1

__version__ is not available.

https://www.python.org/dev/peps/pep-0396/

    >>> import horapy
    >>> dir(horapy)
    ['BruteForceIndex', 'HNSWIndex', 'HoraANNIndex', 'HoraBruteForceIndexStr', 'HoraBruteForceIndexUsize', 'HoraHNSWIndexStr', 'HoraHNSWIndexUsize', 'HoraIVFPQIndexStr', 'HoraIVFPQIndexUsize', 'HoraPQIndexStr', 'HoraPQIndexUsize', 'HoraSSGIndexStr', 'HoraSSGIndexUsize', 'IVFPQIndex', 'PQIndex', 'SSGIndex', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', 'horapy', 'numpy']
    >>> horapy.__version__
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: module 'horapy' has no attribute '__version__'

vadixidav

enhancement
2

Hey, I just saw this on Reddit and I am very excited to try this out on my computer vision datasets.

I just created an ANN search data structure somewhat recently called HGG (https://github.com/rust-cv/hgg), and I would love to be able to add it to your collection of nearest neighbor searches (and especially over at https://github.com/hora-search/ann-benchmarks). It is based on HNSW, and I am currently using it for computer vision purposes. Let me know if you need some help integrating it.

Information - Updated May 13, 2022

Stars: 2.2K
Forks: 46
Issues: 18
