Stats-rs provides easy use of statistical data in a Rust scientific computing environment

A work-in-progress port of the C# Math.NET library.

statrs

Version: 0.15.0

Should work for both nightly and stable Rust.

NOTE: While I will try to maintain backwards compatibility as much as possible, this is still a 0.x.x project, so the API is not considered stable and is subject to breaking changes up until v1.0.0.

Description

Statrs provides a host of statistical utilities for Rust scientific computing. Included are a number of common distributions that can be sampled (e.g. Normal, Exponential, Student's T, Gamma, Uniform, etc.) plus common statistical functions like the gamma function, beta function, and error function.

This library is a work-in-progress port of the statistical capabilities in the C# Math.NET library. All unit tests in the library are borrowed from Math.NET when possible and filled in when not.

Planned for future releases are further distribution implementations and ports of more statistical utilities.

Please check out the documentation here.

Usage

Add the most recent release to your Cargo.toml:

[dependencies]
statrs = "0.15"

Examples

Statrs comes with a number of commonly used distributions, including Normal, Gamma, Student's T, Exponential, Weibull, etc. The common use case is to set up a distribution and sample from it, which relies on the rand crate for random number generation:

use statrs::distribution::Exp;
use rand::distributions::Distribution;

// Exponential distribution with rate 0.5, sampled from the OS random source
let mut r = rand::rngs::OsRng;
let n = Exp::new(0.5).unwrap();
println!("{}", n.sample(&mut r));
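Sampling only needs a source of uniform random numbers. The std-only sketch below shows one standard technique, inverse-transform sampling. It is not necessarily the algorithm statrs uses internally. If U is uniform on [0, 1), then -ln(1 - U) / λ follows Exp(λ). The SplitMix64 generator is a hypothetical stand-in for a real RNG such as the ones rand provides, and all names here are illustrative.

```rust
/// Minimal SplitMix64 generator, a stand-in uniform source for this sketch only.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform f64 in [0, 1) built from the top 53 bits.
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Inverse CDF of Exp(rate): F^{-1}(u) = -ln(1 - u) / rate.
fn exp_inverse_cdf(u: f64, rate: f64) -> f64 {
    -(1.0 - u).ln() / rate
}

/// Draws one Exp(rate) sample by inverse-transform sampling.
fn sample_exp(rng: &mut SplitMix64, rate: f64) -> f64 {
    exp_inverse_cdf(rng.next_f64(), rate)
}
```

Because 1 - U lies in (0, 1], the logarithm is always finite. Averaging many draws should approach the mean 1/λ.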

Statrs also comes with a number of useful utility traits for more detailed introspection of distributions:

use statrs::distribution::{Exp, Continuous, ContinuousCDF};
use statrs::statistics::Distribution;

let n = Exp::new(1.0).unwrap();
assert_eq!(n.mean(), Some(1.0));
assert_eq!(n.variance(), Some(1.0));
assert_eq!(n.entropy(), Some(1.0));
assert_eq!(n.skewness(), Some(2.0));
assert_eq!(n.cdf(1.0), 0.6321205588285576784045);
assert_eq!(n.pdf(1.0), 0.3678794411714423215955);
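The asserted values follow from the closed forms for an exponential distribution with rate λ: mean 1/λ, variance 1/λ², entropy 1 - ln λ, skewness 2, CDF 1 - e^(-λx) and PDF λe^(-λx). The std-only sketch below evaluates these for λ = 1 as a way to sanity-check the numbers above. The helper type is hypothetical and independent of statrs.

```rust
/// Closed-form properties of Exp(rate), for rate > 0.
struct ExpProps {
    rate: f64,
}

impl ExpProps {
    fn mean(&self) -> f64 {
        1.0 / self.rate
    }

    fn variance(&self) -> f64 {
        1.0 / (self.rate * self.rate)
    }

    /// Differential entropy in nats.
    fn entropy(&self) -> f64 {
        1.0 - self.rate.ln()
    }

    /// Skewness of any exponential distribution, independent of the rate.
    fn skewness(&self) -> f64 {
        2.0
    }

    fn cdf(&self, x: f64) -> f64 {
        if x < 0.0 { 0.0 } else { 1.0 - (-self.rate * x).exp() }
    }

    fn pdf(&self, x: f64) -> f64 {
        if x < 0.0 { 0.0 } else { self.rate * (-self.rate * x).exp() }
    }
}
```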

as well as utility functions including erf, gamma, ln_gamma, beta, etc. Statistics that are undefined for a particular parameterization return None, for example:

use statrs::statistics::Distribution;
use statrs::distribution::FisherSnedecor;

let n = FisherSnedecor::new(1.0, 1.0).unwrap();
assert!(n.variance().is_none());
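FisherSnedecor(1, 1) has no finite variance. For an F distribution with d1 and d2 degrees of freedom, the variance 2·d2²·(d1 + d2 - 2) / (d1·(d2 - 2)²·(d2 - 4)) exists only when d2 > 4, so None is the natural encoding. Below is a std-only sketch of that rule. The function name is hypothetical and not part of statrs.

```rust
/// Variance of an F(d1, d2) distribution, or None where it is undefined.
/// The variance is finite only for d2 > 4. NaN parameters also yield None,
/// because the negated comparison below is true for NaN.
/// Infinite parameters are not handled in this sketch.
fn f_variance(d1: f64, d2: f64) -> Option<f64> {
    if !(d1 > 0.0 && d2 > 4.0) {
        return None;
    }
    Some(2.0 * d2 * d2 * (d1 + d2 - 2.0) / (d1 * (d2 - 2.0).powi(2) * (d2 - 4.0)))
}
```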

Contributing

Want to contribute? Check out some of the issues marked help wanted.

How to contribute

Clone the repo:

git clone https://github.com/boxtown/statrs

Create a feature branch:

git checkout -b <feature_branch> master

After committing your code:

git push -u origin <feature_branch>

Then submit a PR, preferably referencing the relevant issue.

Style

This repo makes use of rustfmt with the configuration specified in rustfmt.toml. See https://github.com/rust-lang-nursery/rustfmt for instructions on installation and usage, and run the formatter before committing. With older rustfmt releases, run rustfmt --write-mode overwrite *.rs in the src directory. Current rustfmt releases no longer have --write-mode, so run cargo fmt from the repository root instead.

Commit messages

Please be explicit and purposeful with commit messages.

Bad

Modify test code

Good

test: Update statrs::distribution::Normal test_cdf
Issues

Collection of the latest Issues

h4x3rotab

It would be nice if we could add no_std support, at least for the parts that don't need an RNG. It would also be a good idea to define an RNG trait and allow the user to pass an implementation to the library.
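One way to read this suggestion: the library depends only on a small trait for uniform random bits, and users supply the generator. The trait, function and type names below are hypothetical. The existing rand_core crate's RngCore trait is no_std-compatible and could fill the same role. In a no_std build, float methods such as f64::ln are not available from core, and crates such as libm are commonly used instead. For that reason this sketch only uses arithmetic that needs no std math functions.

```rust
/// Hypothetical minimal randomness trait the library could accept from users.
pub trait UniformBits {
    fn next_u64(&mut self) -> u64;
}

/// Uniform f64 in [low, high) (up to rounding), using only core arithmetic.
pub fn sample_uniform<R: UniformBits>(rng: &mut R, low: f64, high: f64) -> f64 {
    let unit = (rng.next_u64() >> 11) as f64 / (1u64 << 53) as f64; // in [0, 1)
    low + (high - low) * unit
}

/// Bernoulli(p) draw, again without std-only functions.
pub fn sample_bernoulli<R: UniformBits>(rng: &mut R, p: f64) -> bool {
    sample_uniform(rng, 0.0, 1.0) < p
}

/// Example user-supplied generator: xorshift64 (the seed must be nonzero).
pub struct XorShift64(pub u64);

impl UniformBits for XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}
```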

jonathanstrong

gdb output:

I think my input slice to quantile had NaN values in it. From experimenting, I was able to trigger similar crashes (on the v0.12 codebase) by putting a bunch of NaN values into the slice.

Relevant convo about the use of unsafe: https://github.com/statrs-dev/statrs/pull/109

On a local codebase, I swapped in the safer replacement code from the pull request linked above and got this panic output:

which corresponds to

I am using v0.12 (in a bunch of places, actually) because I have code that uses the old API. Would you be open to a pull request that swaps in the safer version of select_inplace, followed by another v0.12 release?

(here is the test that triggered the above btw)
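A defensive pattern for the NaN problem described above is to reject NaN input before any selection runs, so the comparison used by the selection step is always a total order. This is a sketch, not the test referenced above or statrs's code. It uses the standard library's select_nth_unstable_by in place of statrs's select_inplace, and a simple nearest-rank quantile definition. Both choices are assumptions, and statrs's actual quantile estimator may differ.

```rust
/// Nearest-rank quantile of `data` for tau in [0, 1].
/// Returns None for empty input, tau outside [0, 1] (or NaN), or any NaN in the data,
/// so an incomparable value never reaches the selection step.
fn quantile_checked(data: &mut [f64], tau: f64) -> Option<f64> {
    if data.is_empty() || !(0.0..=1.0).contains(&tau) || data.iter().any(|x| x.is_nan()) {
        return None;
    }
    let n = data.len();
    // Nearest rank: the ceil(tau * n)-th smallest value, clamped to the first element for tau = 0.
    let rank = (tau * n as f64).ceil() as usize;
    let idx = rank.saturating_sub(1).min(n - 1);
    // With NaN excluded, partial_cmp never returns None, so unwrap cannot panic here.
    let (_, value, _) = data.select_nth_unstable_by(idx, |a, b| a.partial_cmp(b).unwrap());
    Some(*value)
}
```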

Bi-Modal

Hello everyone,

I believe I've spotted a bug with the skewness for a beta distribution.

In distribution::beta the skewness function returns incorrect values when shape_a or shape_b are infinite. Currently this function gives -2.0 for infinite shape_a and 2.0 for infinite shape_b but these are not the correct limits for the skewness of a beta distribution.

The skewness of a beta distribution with shape parameters α, β is: [ 2(β - α)/(α + β + 2) ] * sqrt( (α + β + 1) / (αβ) )

The limit of this as α tends to infinity is -2/sqrt(β), and the limit as β tends to infinity is 2/sqrt(α). I've attached a short proof of these: beta_skewness_limits.pdf

If I have access, I'll make a PR to fix this.

references: https://en.wikipedia.org/wiki/Beta_distribution#Skewness https://mathworld.wolfram.com/BetaDistribution.html
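Take the limits of the skewness formula directly, with the sign carried by β - α. As α → ∞, the first factor tends to -2 and the square root to 1/sqrt(β), which gives -2/sqrt(β). As β → ∞, the limit is 2/sqrt(α). If both shapes are infinite, the first factor stays within [-2, 2] while the square root tends to 0, so the limit is 0. Below is a std-only sketch that uses these limits for infinite shapes. The function name is hypothetical, and shapes are assumed positive.

```rust
/// Skewness of Beta(a, b) for shape parameters a, b > 0 (possibly infinite).
/// Finite case: 2(b - a) / (a + b + 2) * sqrt((a + b + 1) / (a b)).
fn beta_skewness(a: f64, b: f64) -> f64 {
    match (a.is_infinite(), b.is_infinite()) {
        // First factor bounded by 2 in magnitude, square root tends to 0.
        (true, true) => 0.0,
        // a -> inf: first factor -> -2, square root -> 1 / sqrt(b).
        (true, false) => -2.0 / b.sqrt(),
        // b -> inf: first factor -> 2, square root -> 1 / sqrt(a).
        (false, true) => 2.0 / a.sqrt(),
        (false, false) => 2.0 * (b - a) / (a + b + 2.0) * ((a + b + 1.0) / (a * b)).sqrt(),
    }
}
```

Evaluating the finite branch at very large shapes approaches the special-cased values, which is a quick consistency check.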

troublescooter

statrs would better match the Rust ecosystem if it were dual-licensed. This issue tracks that change.

This probably requires contacting all previous contributors, which is probably also sufficient.

boxtown

help wanted

See https://github.com/boxtown/statrs/commit/6b2c726aa924b4187565a381690f7d345c16ca4c#diff-8f4094da35079696cbcdd4a269333ad0R101 for a little more context.

We're currently allowing the negative binomial distribution to accept positive real r values, as this seems to be a common use case. However, this causes the CDF to behave continuously rather than like those of other discrete distributions, which breaks our internal check_discrete_distribution test.

I'm not familiar enough with this particular distribution to provide much guidance, but there are a few options:

  1. Simply treat the distribution as continuous, allowing real-valued inputs for the "probability mass functions"
  2. Create a special-case test for negative binomial (not sure how often this case will arise in practice)
  3. Create a special-case distribution type

Option 1 seems a vastly superior choice, but it is possibly semantically wrong (although that objection may border on pedantry).
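As a point of reference for these options, a real r > 0 is compatible with a pmf defined only on integer k. This uses the generalized binomial coefficient Γ(k + r) / (k! Γ(r)). A CDF built by summing that pmf up to floor(x) is then still a step function. Whether statrs's CDF, which may be computed differently, can take this form is the open question above. The sketch assumes the "k failures before the r-th success, success probability p" parameterization. The function names are hypothetical, and this is not statrs's implementation.

```rust
/// P(X = k) for k failures before the r-th success, success probability p in (0, 1].
/// Real r > 0 is allowed through the generalized binomial coefficient
/// Γ(k + r) / (k! Γ(r)) = ∏_{i=0}^{k-1} (r + i) / (i + 1).
fn nb_pmf(r: f64, p: f64, k: u64) -> f64 {
    let mut coeff = 1.0;
    for i in 0..k {
        let i = i as f64;
        coeff *= (r + i) / (i + 1.0);
    }
    coeff * p.powf(r) * (1.0 - p).powf(k as f64)
}

/// Step-shaped CDF: sums the pmf over integers k <= floor(x).
fn nb_cdf(r: f64, p: f64, x: f64) -> f64 {
    if x < 0.0 {
        return 0.0;
    }
    (0..=x.floor() as u64).map(|k| nb_pmf(r, p, k)).sum()
}
```

With r = 1 this reduces to a geometric distribution on {0, 1, 2, ...}. Summing term by term is fine for illustration but slow for large x.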

FelixBenning

There seems to be a list of traits in the trait.rs file, which are grouped, at least to some degree, in the Statistics trait. This causes some strange things. For example, the Statistics trait is implemented for IntoIterator and thus provides a mean, but Mean is not implemented for IntoIterator. If both were implemented, calling the methods might actually cause problems, since the compiler would not know which one to call.

I think having individual traits for all these functions is probably the right idea, since that allows types to implement some but not all of them, and the traits are not interdependent. If you really wanted to group those traits, you could always create a supertrait that inherits from all the other traits. But I currently cannot think of a reason why this would be necessary.
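Here is a minimal sketch of that design. Each statistic gets its own trait, and an optional supertrait is blanket-implemented for anything that provides all of its parts. The trait and type names are illustrative, not statrs's current API.

```rust
/// One trait per statistic, so types can implement only what they support.
trait Mean {
    fn mean(&self) -> Option<f64>;
}

trait Variance {
    fn variance(&self) -> Option<f64>;
}

/// Optional grouping: any type implementing both gets `Summary` automatically.
trait Summary: Mean + Variance {}
impl<T: Mean + Variance> Summary for T {}

/// Example data container.
struct Sample(Vec<f64>);

impl Mean for Sample {
    fn mean(&self) -> Option<f64> {
        if self.0.is_empty() {
            return None;
        }
        Some(self.0.iter().sum::<f64>() / self.0.len() as f64)
    }
}

impl Variance for Sample {
    /// Unbiased sample variance (divides by n - 1); undefined for n < 2.
    fn variance(&self) -> Option<f64> {
        let n = self.0.len();
        if n < 2 {
            return None;
        }
        let m = self.mean()?;
        Some(self.0.iter().map(|x| (x - m).powi(2)).sum::<f64>() / (n - 1) as f64)
    }
}

/// Generic code can request the whole group with a single bound.
fn describe<S: Summary>(s: &S) -> (Option<f64>, Option<f64>) {
    (s.mean(), s.variance())
}
```

A type that implements only Mean still works anywhere a Mean bound is required. It simply does not satisfy Summary.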

saona-raimundo

Hi there!!

Thank you very much for your great work!!

I am using Rust for numerical computations and was wondering about the best place to take special functions from.

I came across the module statrs::function and saw also the puruspe crate, which is used in the peroxide crate.

So, my question is: would it not be better to join efforts and have the "best" source for special functions in one place?

I sent a similar message to the peroxide author. Hope this turns out great! :D

ghost

In some places, like the Gamma distribution, some effort is made to deal with degenerate parameters such as infinity. I think this is too much effort for a minute use case, as I don't think users will pass these values just to see what degenerate distributions look like.

Having to deal with the special cases reduces clarity and complicates the API considerably, as it forces us to contend with mathematical details that are difficult to convey in a numerical library. As the limit α -> inf is approached, the pointwise limit of the pdf and the cdf is zero, so these aren't pdfs and cdfs anymore. As the limit β -> inf is approached, the continuous distribution degenerates to a discrete one: P(X = 0) = 1.

I propose to remove all special cases and document why they aren't handled. This would deal with #57 and #98.
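Under this proposal, constructors would simply reject degenerate inputs and document why. Below is a sketch for a Gamma distribution with shape α and rate β, matching the α and β of the comment above. The type and error names are hypothetical.

```rust
/// Reasons a parameter set is rejected, documented instead of special-cased.
#[derive(Debug, PartialEq)]
enum ParamError {
    /// Shape must be finite and > 0 (as shape -> inf the pdf tends to 0 pointwise).
    InvalidShape,
    /// Rate must be finite and > 0 (as rate -> inf the mass collapses to P(X = 0) = 1).
    InvalidRate,
}

#[derive(Debug)]
struct Gamma {
    shape: f64,
    rate: f64,
}

impl Gamma {
    fn new(shape: f64, rate: f64) -> Result<Self, ParamError> {
        // `is_finite` is false for both NaN and infinities, so one check covers all of them.
        if !(shape.is_finite() && shape > 0.0) {
            return Err(ParamError::InvalidShape);
        }
        if !(rate.is_finite() && rate > 0.0) {
            return Err(ParamError::InvalidRate);
        }
        Ok(Gamma { shape, rate })
    }

    /// Mean shape / rate; no infinite cases are possible after validation.
    fn mean(&self) -> f64 {
        self.shape / self.rate
    }
}
```

Every other method can then assume finite, positive parameters. This removes the per-method special cases the proposal is concerned with.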

fcannini

Hi there!

It would be really helpful to have something like this, so people interested in contributing (like me ;) ) can have an idea where to focus efforts.

Cheers.

dhardy

@boxtown should we make it a policy to document the PDF of all distributions, where possible?

I think the GSL documentation is a good model for what documentation of distributions should be.

Specifically, as noted in https://github.com/rust-random/rand/pull/621, your implementation of the Geometric distribution does not specify which variant is implemented, aside from the adjective "shifted" in the constructor. I recommend something more like this.
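The two common Geometric variants differ only by a shift of one in the support. That is why the word "shifted" on its own is ambiguous. Below is a sketch of both pmfs, each documented with its support. The function names are hypothetical, and which variant statrs implements should be checked against its own documentation.

```rust
/// Variant A: number of trials up to and including the first success.
/// Support {1, 2, 3, ...}; P(X = k) = (1 - p)^(k - 1) * p; mean 1 / p.
fn geometric_pmf_trials(p: f64, k: u64) -> f64 {
    if k == 0 {
        return 0.0;
    }
    (1.0 - p).powf((k - 1) as f64) * p
}

/// Variant B: number of failures before the first success.
/// Support {0, 1, 2, ...}; P(Y = k) = (1 - p)^k * p; mean (1 - p) / p.
/// Y = X - 1 for X from variant A.
fn geometric_pmf_failures(p: f64, k: u64) -> f64 {
    (1.0 - p).powf(k as f64) * p
}
```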

akonneker

Do you have any plans to implement the distributions in the title? I'm working on implementing some things from Simon Prince's Computer Vision, and those are the only two of the common conjugate-pair distributions he uses that aren't already implemented in statrs, as far as I can tell.

rohitjoshi

Some of the distributions are slower than in Randomkit. Randomkit is an FFI wrapper around numpy's randomstat implementation.

For example, see the benchmark below. Any thoughts?

statrs:

randomkit:


Information - Updated Jun 22, 2022

Stars: 350
Forks: 50
Issues: 32
