🎁 cargo-fetcher

Alternative to cargo fetch for use in CI or other "clean" environments that you want to quickly bootstrap with the necessary crates to compile and test your project(s).

Why?

  • You run many CI jobs inside of a cloud provider such as GCP and you want to quickly fetch cargo registries and crates so that you can spend your compute resources on actually compiling and testing the code, rather than downloading dependencies.

Why not?

  • Other than the fs storage backend, the only supported backends are the 3 major cloud storage backends, as it is generally beneficial to store crate and registry information in the same cloud as you are running your CI jobs to take advantage of locality and I/O throughput.
  • cargo-fetcher should not be used in a typical user environment as it completely disregards various safety mechanisms that are built into cargo, such as file-based locking.
  • cargo-fetcher assumes it is running in an environment with high network throughput and low latency.

Supported Storage Backends

gcs

The gcs feature enables the use of Google Cloud Storage as a backend.

  • Must provide a url to the -u | --url parameter with the gsutil syntax gs://<bucket_name>(/<prefix>)?
  • Must provide GCP service account credentials either with --credentials or via the GOOGLE_APPLICATION_CREDENTIALS environment variable
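
For illustration, a mirror run against the gcs backend might look like the following sketch. The bucket name is hypothetical, and shared options are shown before the subcommand as described under Usage:

```sh
# Service account credentials supplied via the environment
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
# Hypothetical bucket and prefix
cargo fetcher --url gs://my-ci-crates/prefix mirror
```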

s3

The s3 feature enables the use of Amazon S3 as a backend.

  • Must provide a url to the -u | --url parameter; it must be of the form http(s)?://<bucket>.s3(-<region>).<host>(/<prefix>)?
  • Must provide AWS IAM user credentials via the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
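
A sketch of the equivalent s3 invocation, with a hypothetical bucket, region, and prefix:

```sh
export AWS_ACCESS_KEY_ID=<your-access-key-id>
export AWS_SECRET_ACCESS_KEY=<your-secret-access-key>
# URL follows the http(s)?://<bucket>.s3(-<region>).<host>(/<prefix>)? form
cargo fetcher --url https://my-ci-crates.s3-us-west-2.amazonaws.com/prefix mirror
```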

fs

The fs feature enables use of a folder on a local disk to store crates to and fetch crates from.

  • Must provide a url to the -u | --url parameter with the file: scheme
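
A sketch using a local directory as the backend; the path is hypothetical:

```sh
# file: scheme pointing at a directory on local disk
cargo fetcher --url file:///mnt/cargo-mirror mirror
```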

blob

The blob feature enables the use of Azure Blob storage as a backend.

  • Must provide a url to the -u | --url parameter; it must be of the form blob://<container_name>(/<prefix>)?
  • Must provide an Azure Storage Account via the environment variables STORAGE_ACCOUNT and STORAGE_MASTER_KEY
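
A sketch of a blob-backed run, with hypothetical account and container names:

```sh
export STORAGE_ACCOUNT=<account-name>
export STORAGE_MASTER_KEY=<master-key>
# blob://<container_name>(/<prefix>)? syntax
cargo fetcher --url blob://my-container/prefix mirror
```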

Examples

This is an example from our CI for an internal project.

Dependencies

  • 424 crates.io crates: cached - 38MB, unpacked - 214MB
  • 13 crates sourced from 10 git repositories: db - 27MB, checked out - 38MB

Scenario

The following CI jobs are run in parallel, each in a Kubernetes Job running on GKE. The container base is roughly the same as the official rust:1.39.0-slim image.

  • Build modules for WASM
  • Build modules for native
  • Build host client for native

~ wait for all jobs to finish ~

  • Run the tests for both the WASM and native modules from the host client

Before

All 3 build jobs take around 1m2s each to do cargo fetch --target x86_64-unknown-linux-gnu

After

All 3 build jobs take 3-4s each to do cargo fetcher --include-index mirror followed by 5-7s to do cargo fetch --target x86_64-unknown-linux-gnu.
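
The two commands from the timing above might appear in a CI script roughly as follows; the storage url is hypothetical:

```sh
# Pull crates and registry index from the storage backend, then let
# cargo fetch verify everything is already on disk
cargo fetcher --url gs://my-ci-crates --include-index mirror
cargo fetch --target x86_64-unknown-linux-gnu
```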

Usage

cargo-fetcher has only 2 subcommands. Both share a set of options; the important inputs for each backend are described in Supported Storage Backends.

In addition to the backend specifics, the only required option is the path to the Cargo.lock lockfile that you are operating on. cargo-fetcher requires a lockfile: without one, the normal cargo work of generating a lockfile requires having a full registry index locally, which partially defeats the point of this tool.

mirror

The mirror subcommand does the work of downloading crates and registry indexes from their original locations and reuploading them to your storage backend.

It does, however, have one additional option to determine how often it should take snapshots of the registry index(es).

Custom registries

One wrinkle with mirroring is the presence of custom registries. To detect them, cargo-fetcher uses the same logic that cargo uses to locate .cargo/config<.toml> config files. However, cargo's config files only contain the metadata needed to fetch and publish to the registry; the url template for where to download crates from actually lives in a config.json file in the root of the registry itself.

Rather than wait for a registry index to be downloaded each time before fetching any crates sourced from that registry, cargo-fetcher instead allows you to specify the download location yourself via an environment variable; that way it can fully parallelize the fetching of registry indices and crates.

Example

The environment variable is of the form CARGO_FETCHER_<name>_DL where name is the same name (uppercased) of the registry in the configuration file.

The format of the URL should be the same as the one in your registry's config.json file. If this environment variable is not specified for your registry, the default of /{crate}/{version}/download is appended to the registry's url.
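
As a concrete sketch, here is how the variable name is derived for a hypothetical registry named myregistry, along with the default download url that applies when the variable is unset:

```shell
# Hypothetical registry name as it appears in .cargo/config.toml
name="myregistry"

# The variable name is CARGO_FETCHER_<NAME>_DL, with the name uppercased
var="CARGO_FETCHER_$(echo "$name" | tr '[:lower:]' '[:upper:]')_DL"
echo "$var"   # CARGO_FETCHER_MYREGISTRY_DL

# When the variable is unset, the default template is the registry url
# with /{crate}/{version}/download appended
registry_url="https://crates.example.com"
echo "${registry_url}/{crate}/{version}/download"
```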

sync

The sync subcommand is the actual replacement for cargo fetch, except instead of downloading crates and registries from their normal locations, it downloads them from your storage backend and splats them to disk in the same way that cargo does, so that cargo won't have to do any actual work before it can start building code.

Contributing

We welcome community contributions to this project.

Please read our Contributor Guide for more information on how to get started.

License

Licensed under either of

  • Apache License, Version 2.0, (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
  • MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

Versions


0.13.0 - May 25, 2022

Added

  • PR#172 added the --timeout | CARGO_FETCHER_TIMEOUT option, allowing control over how long each individual HTTP request is allowed to take. Defaults to 30 seconds, which is the same default timeout as reqwest.

Changed

  • PR#172 split git packages (bare clones and checkouts) and registry packages and downloads them in parallel. In my local tests this reduced overall wall time as typically git packages are an order of magnitude or more larger than a registry package, so splitting them allows the git packages to take up threads and I/O slots earlier, and registry packages can then fill in the remaining capacity. In addition, the git bare clone and checkout for each crate are now downloaded in parallel, as previously the checkout download would wait until the bare clone was downloaded before doing the disk splat, but this was wasteful.
  • PR#172 updated dependencies.

0.12.1 - Feb 28, 2022

Added

  • PR#171 added EC2 credential sourcing from IMDS for the s3 backend, allowing for easier configuration when running in AWS. Thanks @jelmansouri!

0.12.0 - Feb 03, 2022

Changed

  • PR#168 updated all dependencies.
  • PR#168 removed all usage of async/await in favor of blocking HTTP requests and rayon parallelization. This seems to have resulted in noticeable speed ups depending on the size of your workload.
  • PR#168 replaced usage of structopt with clap.
  • PR#168 removed all usage of the unmaintained chrono with time.
  • PR#168 temporarily vendored bloblock for Azure blob storage to reduce duplicate dependencies.

0.11.0 - Jul 22, 2021

Changed

  • PR#161 replaced the bloated auto-generated crates for rusoto with much leaner rusty-s3 crate. Thanks @m0ssc0de!
  • PR#166 replaced the bloated auto-generated crates for the azure SDK with the much leaner bloblock crate. Thanks @m0ssc0de!

0.10.0 - Dec 14, 2020

Added

  • PR#131 and PR#151 added support for registries other than crates.io, resolving #118. Thanks @m0ssc0de!
  • PR#152 added support for creating .cache entries when mirroring/syncing registry indices, resolving #16 and #117.
  • PR#154 added support for mirroring and syncing git submodules, which was the final missing piece for having "perfect" copying of cargo's behavior when fetching crates and registries, resolving #141.

0.9.0 - Jul 28, 2020

Added

  • PR#109 added support for Azure Blob storage, under the blob feature flag. Thanks @m0ssc0de!

0.8.0 - Jun 05, 2020

0.7.0 - Feb 25, 2020

Added

  • Cargo's v2 Cargo.lock format is now supported

Changed

  • Async (almost) all the things!
  • Replaced log/env_logger with tracing

0.6.1 - Nov 14, 2019

Fixed

  • Fetch registry index instead of pull

0.6.0 - Nov 14, 2019

Added

  • Added support for S3 storage behind the s3 feature
  • Integration tests using s3 via minio are now run in CI
  • Git dependencies are now checked out to the git/checkouts folder
  • Git dependencies now also recursively download submodules

Changed

  • Updated dependencies
  • Place all GCS specific code/dependencies behind a gcs feature
  • The url for the storage location is now supplied via -u | --url

Fixed

  • Replaced failure with anyhow
  • Fixed issue where all crates were synced every time due to pruning and removing duplicates only to then completely ignore them and use the original crate list :facepalm:
  • Fixed issue where crates.io packages were being unpacked with an extra parent directory

0.5.1 - Jul 27, 2019

0.5.0 - Jul 26, 2019

0.4.1 - Jul 25, 2019

0.4.0 - Jul 25, 2019

0.3.0 - Jul 25, 2019

0.2.0 - Jul 24, 2019

0.1.1 - Jul 23, 2019
