Statistics#

A module dedicated to generating samples and computing summary statistics from samples.

Batched Averages#

Vectorized computation of various averages using a consistent interface.

numpy_batched_arithmetic_mean #

numpy_batched_arithmetic_mean(
    array: Float[ndarray, "... axis ..."],
    axis: int = -1,
    *,
    keepdims: bool = True,
) -> Float[ndarray, "... 1 ..."]

Computes the arithmetic mean over an axis.

numpy_batched_convex_combination #

numpy_batched_convex_combination(
    array: Float[ndarray, "... axis ..."],
    axis: int = -1,
    *,
    convex_weights: Float[ndarray, " num_samples final_dim"],
    keepdims: bool = True,
) -> Float[ndarray, "... 1 ..."]

Computes a weighted mean over an axis.
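A convex combination is a weighted mean whose weights are non-negative and sum to one. The sketch below is a minimal illustration, not the package's implementation: the helper name `convex_combination_sketch` is hypothetical, and it assumes the weights have the same shape as the array, which may differ from how the documented function broadcasts `convex_weights`.

```python
import numpy as np


def convex_combination_sketch(array, convex_weights, axis=-1, keepdims=True):
    """Hypothetical helper: weighted mean with convex weights along `axis`."""
    weights = np.asarray(convex_weights, dtype=float)
    # Convexity check: weights are non-negative and sum to one along the reduced axis.
    if np.any(weights < 0) or not np.allclose(weights.sum(axis=axis), 1.0):
        raise ValueError("weights must be non-negative and sum to 1 along `axis`")
    # Because the weights already sum to one, no further normalization is needed.
    return np.sum(array * weights, axis=axis, keepdims=keepdims)


samples = np.array([[0.2, 0.8], [0.5, 0.7]])
weights = np.array([[0.25, 0.75], [0.5, 0.5]])
combined = convex_combination_sketch(samples, weights)
```

With `keepdims=True` the reduced axis is kept with length 1, matching the `"... 1 ..."` shape in the signature above.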

numpy_batched_harmonic_mean #

numpy_batched_harmonic_mean(
    array: Float[ndarray, "... axis ..."],
    axis: int = -1,
    *,
    keepdims: bool = True,
) -> Float[ndarray, "... 1 ..."]

Computes the harmonic mean over an axis.

numpy_batched_geometric_mean #

numpy_batched_geometric_mean(
    array: Float[ndarray, "... axis ..."],
    axis: int = -1,
    *,
    keepdims: bool = True,
) -> Float[ndarray, "... 1 ..."]

Computes the geometric mean over an axis.
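The three unweighted means in this section share one interface, so they can be sketched together. The following is an illustrative sketch using plain NumPy, assuming strictly positive inputs (required for the harmonic and geometric means); the helper name `batched_means_sketch` is hypothetical and not part of the package.

```python
import numpy as np


def batched_means_sketch(array, axis=-1, keepdims=True):
    """Hypothetical helper: (arithmetic, harmonic, geometric) means along `axis`."""
    arithmetic = np.mean(array, axis=axis, keepdims=keepdims)
    # Harmonic mean: reciprocal of the arithmetic mean of the reciprocals.
    harmonic = 1.0 / np.mean(1.0 / array, axis=axis, keepdims=keepdims)
    # Geometric mean: exponentiated mean of the logs, which avoids overflow
    # that a raw product over many elements could cause.
    geometric = np.exp(np.mean(np.log(array), axis=axis, keepdims=keepdims))
    return arithmetic, harmonic, geometric


values = np.array([[1.0, 2.0, 4.0], [2.0, 2.0, 2.0]])
arithmetic, harmonic, geometric = batched_means_sketch(values)
```

For positive inputs the results satisfy harmonic ≤ geometric ≤ arithmetic, with equality when all elements along the axis are equal, as in the second row of `values`.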

Dirichlet Distribution#

Optimized sampling of batched samples from independent Dirichlet distributions.

These functions tend to be a performance bottleneck and should be optimized as much as possible.

dirichlet_prior #

dirichlet_prior(
    strategy: str | float | int | Float[ArrayLike, " ..."],
    shape: tuple[int, ...],
) -> Float[ndarray, " ..."]

Creates a prior array for a Dirichlet distribution.

Returns:

Float[ndarray, ' ...'] –

the prior vector
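As a rough illustration of the numeric cases of `strategy`, a scalar or array-like value can be broadcast to the requested shape. This sketch is an assumption about the general idea only: the helper name `prior_from_value_sketch` is hypothetical, and the string-valued strategies accepted by `dirichlet_prior` are not reproduced here because their names are not listed on this page.

```python
import numpy as np


def prior_from_value_sketch(value, shape):
    """Hypothetical helper: broadcast a scalar or array-like prior to `shape`."""
    # np.broadcast_to raises a ValueError when `value` is incompatible with `shape`.
    prior = np.broadcast_to(np.asarray(value, dtype=float), shape).copy()
    # Dirichlet concentration parameters must be strictly positive.
    if np.any(prior <= 0):
        raise ValueError("Dirichlet parameters must be strictly positive")
    return prior


uniform_prior = prior_from_value_sketch(1.0, (2, 3))
per_class_prior = prior_from_value_sketch([0.5, 1.0, 2.0], (2, 3))
```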

dirichlet_sample #

dirichlet_sample(
    rng: RNG, alphas: Float[ndarray, " ..."], num_samples: int
) -> Float[ndarray, " num_samples ..."]

Generates Dirichlet-distributed samples from an array of Gamma distributions.

A Dirichlet-distributed vector can be constructed by dividing a set of independent Gamma-distributed variables by their sum.

The NumPy implementation of the Dirichlet distribution is not vectorized over arrays of parameter vectors, while the implementation of the Gamma distribution is.

Adapted from StackOverflow.

This function is the performance bottleneck for this package, so its implementation needs to stay as fast as possible.

Parameters:

rng (RNG) –

the random number generator
alphas (Float[ndarray, '...']) –

the Dirichlet parameters
num_samples (int) –

the number of samples to retrieve

Returns:

Float[ndarray, ' num_samples ...'] –

samples from the specified Dirichlet distribution
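The Gamma construction described above can be sketched in a few lines. This is a minimal illustration rather than the package's code: the name `dirichlet_sample_sketch` is hypothetical, `rng` is assumed to be a `numpy.random.Generator`, and the last axis of `alphas` is assumed to index the categories of each independent Dirichlet distribution.

```python
import numpy as np


def dirichlet_sample_sketch(rng, alphas, num_samples):
    """Hypothetical helper: batched Dirichlet samples via normalized Gamma draws."""
    # One independent Gamma(alpha, 1) draw per concentration parameter and per sample.
    # The shape parameter broadcasts against `size`, so the whole batch is drawn at once.
    gammas = rng.standard_gamma(alphas, size=(num_samples, *alphas.shape))
    # Dividing by the sum over the category axis (assumed to be the last one)
    # yields vectors that lie on the probability simplex.
    return gammas / gammas.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
alphas = np.ones((4, 3))  # four independent Dirichlet distributions over three categories
samples = dirichlet_sample_sketch(rng, alphas, num_samples=1000)
```

The result has shape `(num_samples, *alphas.shape)`, and every vector along the last axis sums to one.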

HDI Estimation#

Tries to find the Highest Density Interval (HDI) of a posterior distribution from its samples.

hdi_estimator #

hdi_estimator(
    samples: Float[ndarray, " num_samples"], prob: float
) -> tuple[float | Float[ndarray, ""], float | Float[ndarray, ""]]

Computes the highest density interval (HDI) of an array of samples for a given probability.

Adapted from arviz.

The interval is guaranteed to contain the median if prob > 0.5 and, if the distribution is unimodal, also the mode.

Parameters:

samples (Float[ndarray, ' num_samples']) –

the array of samples
prob (float) –

the probability mass the interval should contain

Returns:

tuple[float | Float[ndarray, ''], float | Float[ndarray, '']] –

the lower and upper bounds of the HDI
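A common sample-based approach, and the one used in arviz, is to sort the samples and slide a window containing the requested fraction of them, keeping the narrowest window. The sketch below illustrates that idea under the assumption of a one-dimensional sample array; the name `hdi_sketch` is hypothetical, and the documented function may differ in its details.

```python
import numpy as np


def hdi_sketch(samples, prob):
    """Hypothetical helper: narrowest interval containing `prob` of the samples."""
    ordered = np.sort(samples)
    n = ordered.shape[0]
    # Number of index steps a candidate interval spans.
    span = int(np.floor(prob * n))
    num_candidates = n - span
    if span < 1 or num_candidates < 1:
        raise ValueError("too few samples for the requested probability")
    # Width of every candidate interval [ordered[i], ordered[i + span]].
    widths = ordered[span:] - ordered[:num_candidates]
    start = int(np.argmin(widths))
    return float(ordered[start]), float(ordered[start + span])


rng = np.random.default_rng(0)
lower, upper = hdi_sketch(rng.normal(size=20_000), prob=0.95)
```

For a standard normal distribution the 95% HDI coincides with the central interval, so `lower` and `upper` should land near -1.96 and 1.96, up to sampling noise.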

Mode Estimation#

Tries to find the mode of a distribution from its samples.

histogram_mode_estimator #

histogram_mode_estimator(
    samples: Float[ndarray, " num_samples"],
    bounds: tuple[float, float] | None = None,
) -> float

Tries to estimate the mode of a distribution from its samples.
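One simple way to estimate a mode from samples is to bin them and report the midpoint of the fullest bin. The sketch below shows that idea only: the name `histogram_mode_sketch` and the `num_bins` parameter are hypothetical, and the documented function may choose its bins differently.

```python
import numpy as np


def histogram_mode_sketch(samples, bounds=None, num_bins=50):
    """Hypothetical helper: midpoint of the most populated histogram bin."""
    # `range=None` lets NumPy use the minimum and maximum of the samples.
    counts, edges = np.histogram(samples, bins=num_bins, range=bounds)
    fullest = int(np.argmax(counts))
    return float(0.5 * (edges[fullest] + edges[fullest + 1]))


rng = np.random.default_rng(0)
# A Beta(8, 4) distribution has its mode at (8 - 1) / (8 + 4 - 2) = 0.7.
samples = rng.beta(8.0, 4.0, size=20_000)
mode_estimate = histogram_mode_sketch(samples, bounds=(0.0, 1.0))
```

The resolution of the estimate is limited by the bin width, so more bins give a finer but noisier estimate.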

Summary Statistics#

Computes various summary statistics about a distribution from its samples.

PosteriorSummary `dataclass` #

A container for summary statistics of some probability distribution.

metric_uncertainty `property` #

metric_uncertainty: float

The metric uncertainty (MU), defined as the size of the HDI.

Returns:

float ( float ) –

the MU

headers `property` #

headers: list[str]

The column headers.

as_dict #

as_dict() -> dict[str, float | tuple[float, float]]

Returns the dict representation of the statistics.

Useful for converting to a table.

Returns:

dict[str, float | tuple[float, float]] –

the dict representation of the statistics

summarize_posterior #

summarize_posterior(
    posterior_samples: Float[ndarray, " num_samples"], ci_probability: float
) -> PosteriorSummary

Summarizes a distribution, assumed to be a posterior, based on samples from it.

Parameters:

posterior_samples (Float[ndarray, ' num_samples']) –

samples from the posterior.
ci_probability (float) –

the probability under the HDI.

Returns:

PosteriorSummary ( PosteriorSummary ) –

the summary statistics.

Truncated Sampling#

Draws bounded samples from unbounded Scipy distributions.

This is necessary when making parametric assumptions about the distribution of metrics that have minimum and maximum values.

truncated_sample #

truncated_sample(
    sampling_distribution: rv_continuous,
    bounds: tuple[float, float],
    rng: RNG,
    num_samples: int,
) -> Float[ndarray, " num_samples"]

Generates a bounded sample from an unbounded continuous Scipy distribution.

Uses inverse transform sampling to draw samples from the unbounded distribution.

The uniformly sampled quantiles are restricted to the range spanned by the bounds, so their transform through the inverse CDF is implicitly bounded as well.

Parameters:

sampling_distribution (rv_continuous) –

the unbounded continuous Scipy distribution
bounds (tuple[float, float]) –

the lower and upper bounds on the samples
rng (RNG) –

the random number generator
num_samples (int) –

the number of samples to draw

Returns:

Float[ndarray, ' num_samples'] –

the samples from the bounded distribution
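The bounded inverse transform sampling described above can be sketched as follows. This is an illustration under stated assumptions, not the package's code: the name `truncated_sample_sketch` is hypothetical, `rng` is assumed to be a `numpy.random.Generator`, and the distribution is assumed to expose `cdf` and `ppf`, as SciPy's continuous distributions do.

```python
import numpy as np
from scipy import stats


def truncated_sample_sketch(distribution, bounds, rng, num_samples):
    """Hypothetical helper: bounded samples via inverse transform sampling."""
    lower, upper = bounds
    # Map the bounds into quantile space through the CDF.
    q_low, q_high = distribution.cdf(lower), distribution.cdf(upper)
    # Draw quantiles uniformly, but only between the two bounding quantiles.
    quantiles = rng.uniform(q_low, q_high, size=num_samples)
    # The inverse CDF (ppf) maps them back into [lower, upper].
    return distribution.ppf(quantiles)


rng = np.random.default_rng(0)
# A normal distribution centered at 0.8, truncated to the unit interval.
samples = truncated_sample_sketch(stats.norm(loc=0.8, scale=0.2), (0.0, 1.0), rng, 1000)
```

Unlike rejection sampling, this approach never discards draws, so its cost does not grow when the bounds cover only a small part of the distribution's mass.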
