Statistics#

A module dedicated to generating samples and computing summary statistics from samples.

Batched Averages#

Vectorized computation of various averages using a consistent interface.

numpy_batched_arithmetic_mean #

numpy_batched_arithmetic_mean(
    array: Float[ndarray, "... axis ..."],
    axis: int = -1,
    *,
    keepdims: bool = True,
) -> Float[ndarray, "... 1 ..."]

Computes the arithmetic mean over an axis.

numpy_batched_convex_combination #

numpy_batched_convex_combination(
    array: Float[ndarray, "... axis ..."],
    axis: int = -1,
    *,
    convex_weights: Float[ndarray, " num_samples final_dim"],
    keepdims: bool = True,
) -> Float[ndarray, "... 1 ..."]

Computes a weighted mean over an axis.
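
As a rough illustration of what a convex combination along an axis looks like, here is a minimal sketch. The name `convex_combination_sketch` is hypothetical, and the sketch assumes the weights are non-negative, sum to 1 along `axis`, and broadcast against `array`; the package's own function may handle shapes differently.

```python
import numpy as np


def convex_combination_sketch(array, convex_weights, axis=-1, keepdims=True):
    """Weighted mean along `axis` (hypothetical helper, not the package's code).

    Assumes the weights are non-negative and sum to 1 along `axis`.
    """
    # Broadcast the weights so they line up element-wise with the values
    weights = np.broadcast_to(convex_weights, array.shape)
    # With normalized weights, the weighted sum is the weighted mean
    return np.sum(array * weights, axis=axis, keepdims=keepdims)


values = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
weights = np.array([[0.5, 0.25, 0.25], [0.0, 0.0, 1.0]])
result = convex_combination_sketch(values, weights)
```

With `keepdims=True`, the reduced axis is kept with length 1, so `result` has shape `(2, 1)` here.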

numpy_batched_harmonic_mean #

numpy_batched_harmonic_mean(
    array: Float[ndarray, "... axis ..."],
    axis: int = -1,
    *,
    keepdims: bool = True,
) -> Float[ndarray, "... 1 ..."]

Computes the harmonic mean over an axis.
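
For reference, the harmonic mean of n values is n divided by the sum of their reciprocals. The sketch below shows one way to vectorize that over an axis; `harmonic_mean_sketch` is a hypothetical name, and the sketch assumes strictly positive inputs.

```python
import numpy as np


def harmonic_mean_sketch(array, axis=-1, keepdims=True):
    """Harmonic mean along `axis` (hypothetical helper, assumes positive values)."""
    num_values = array.shape[axis]
    # n / sum(1 / x), reduced over the chosen axis only
    return num_values / np.sum(1.0 / array, axis=axis, keepdims=keepdims)


values = np.array([[1.0, 2.0, 4.0], [2.0, 2.0, 2.0]])
result = harmonic_mean_sketch(values)
```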

numpy_batched_geometric_mean #

numpy_batched_geometric_mean(
    array: Float[ndarray, "... axis ..."],
    axis: int = -1,
    *,
    keepdims: bool = True,
) -> Float[ndarray, "... 1 ..."]

Computes the geometric mean over an axis.
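
The geometric mean can be computed as the exponential of the mean of the logarithms, which avoids overflow from multiplying many values. A minimal sketch, assuming strictly positive inputs; `geometric_mean_sketch` is a hypothetical name and not necessarily how the package implements it.

```python
import numpy as np


def geometric_mean_sketch(array, axis=-1, keepdims=True):
    """Geometric mean along `axis` (hypothetical helper, assumes positive values)."""
    # exp(mean(log(x))) equals the n-th root of the product of the values
    return np.exp(np.mean(np.log(array), axis=axis, keepdims=keepdims))


values = np.array([[1.0, 2.0, 4.0], [3.0, 3.0, 3.0]])
result = geometric_mean_sketch(values)
```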

Dirichlet Distribution#

Optimized sampling of batched samples from independent Dirichlet distributions.

These functions tend to be a performance bottleneck and should be optimized as much as possible.

dirichlet_prior #

dirichlet_prior(
    strategy: str | float | int | Float[ArrayLike, " ..."],
    shape: tuple[int, ...],
) -> Float[ndarray, " ..."]

Creates a prior array for a Dirichlet distribution.

Returns:

  • Float[ndarray, ' ...']

    the prior vector

dirichlet_sample #

dirichlet_sample(
    rng: RNG, alphas: Float[ndarray, " ..."], num_samples: int
) -> Float[ndarray, " num_samples ..."]

Generates Dirichlet-distributed samples from an array of Gamma distributions.

A Dirichlet-distributed vector can be constructed by drawing independent Gamma variates and dividing each by their sum.

For some reason, the NumPy implementation of the Dirichlet distribution is not vectorized, while the implementation of the Gamma distribution is.

Adapted from StackOverflow.

This function is the performance bottleneck for this package, so it needs to stay performant.

Parameters:

  • rng (RNG) –

    the random number generator

  • alphas (Float[ndarray, '...']) –

    the Dirichlet parameters

  • num_samples (int) –

    the number of samples to retrieve

Returns:

  • Float[ndarray, ' num_samples ...']

    samples from the specified Dirichlet distribution
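
The Gamma construction described above can be sketched as follows. This is an illustration under stated assumptions rather than the package's code: `dirichlet_sample_sketch` is a hypothetical name, `rng` is assumed to be a `numpy.random.Generator`, and the last axis of `alphas` is assumed to index the categories of each independent Dirichlet distribution.

```python
import numpy as np


def dirichlet_sample_sketch(rng, alphas, num_samples):
    """Batched Dirichlet samples via normalized Gamma variates (hypothetical helper)."""
    # One Gamma(alpha, 1) draw per entry of `alphas`, for every sample, in one call
    gammas = rng.standard_gamma(alphas, size=(num_samples, *alphas.shape))
    # Dividing by the sum over the category axis puts each vector on the simplex
    return gammas / gammas.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
alphas = np.array([[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]])
samples = dirichlet_sample_sketch(rng, alphas, num_samples=1000)
```

Here `samples` has shape `(1000, 2, 3)`, and every vector along the last axis sums to 1.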

HDI Estimation#

Tries to find the Highest Density Interval (HDI) of a posterior distribution from its samples.

hdi_estimator #

hdi_estimator(
    samples: Float[ndarray, " num_samples"], prob: float
) -> tuple[float | Float[ndarray, ""], float | Float[ndarray, ""]]

Computes the highest density interval (HDI) of an array of samples for a given probability.

Adapted from arviz.

The interval is guaranteed to contain the median if prob > 0.5 and, if the distribution is unimodal, the mode as well.

Parameters:

  • samples (Float[ndarray, ' num_samples']) –

    the array of samples

  • prob (float) –

    the probability mass the interval should contain

Returns:

  • tuple[float | Float[ndarray, ''], float | Float[ndarray, '']]

    the lower and upper bounds of the HDI
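
A common way to estimate an HDI from samples is to sort them and pick the narrowest window that spans the requested fraction of samples. The sketch below illustrates that idea; `hdi_sketch` is a hypothetical name, it assumes 0 < prob < 1 and a unimodal distribution, and whether the package follows exactly these steps is an assumption.

```python
import numpy as np


def hdi_sketch(samples, prob):
    """Narrowest interval covering roughly `prob` of the samples (hypothetical helper)."""
    sorted_samples = np.sort(samples)
    num_samples = sorted_samples.shape[0]
    # Index offset between the lower and upper end of each candidate interval
    interval_size = int(np.floor(prob * num_samples))
    num_candidates = num_samples - interval_size
    # Width of every candidate interval [x_i, x_(i + interval_size)]
    widths = sorted_samples[interval_size:] - sorted_samples[:num_candidates]
    narrowest = int(np.argmin(widths))
    return sorted_samples[narrowest], sorted_samples[narrowest + interval_size]


rng = np.random.default_rng(0)
samples = rng.normal(size=10_000)
lower, upper = hdi_sketch(samples, prob=0.95)
```

For a standard normal distribution, the 95% HDI should land near the central interval, roughly from -1.96 to 1.96, up to sampling noise.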

Mode Estimation#

Tries to find the mode of a distribution from its samples.
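
One simple approach is to bin the samples and take the center of the fullest bin. The sketch below is an illustration only: `histogram_mode_sketch` is a hypothetical name, and the `bins="auto"` rule and the use of the bin midpoint are assumptions, not necessarily what the package does.

```python
import numpy as np


def histogram_mode_sketch(samples, bounds=None):
    """Midpoint of the most populated histogram bin (hypothetical helper)."""
    # `range=bounds` restricts the histogram to the given interval, if any
    counts, edges = np.histogram(samples, bins="auto", range=bounds)
    fullest = int(np.argmax(counts))
    # Report the center of the fullest bin as the mode estimate
    return float(0.5 * (edges[fullest] + edges[fullest + 1]))


rng = np.random.default_rng(0)
samples = rng.normal(loc=2.0, scale=0.5, size=20_000)
mode_estimate = histogram_mode_sketch(samples)
```

The estimate is only as precise as the bin width, so it is noisy for small sample sizes.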

histogram_mode_estimator #

histogram_mode_estimator(
    samples: Float[ndarray, " num_samples"],
    bounds: tuple[float, float] | None = None,
) -> float

Tries to estimate the mode of a distribution from its samples.

Summary Statistics#

Computes various summary statistics about a distribution from its samples.

PosteriorSummary dataclass #

A container for summary statistics of some probability distribution.

metric_uncertainty property #

metric_uncertainty: float

The metric uncertainty (MU), defined as the size of the HDI.

Returns:

  • float

    the MU

headers property #

headers: list[str]

The column headers.

as_dict #

as_dict() -> dict[str, float | tuple[float, float]]

Returns the dict representation of the statistics.

Useful for converting to a table.

Returns:

  • dict[str, float | tuple[float, float]]

    dict[str, float | tuple[float, float]]

summarize_posterior #

summarize_posterior(
    posterior_samples: Float[ndarray, " num_samples"], ci_probability: float
) -> PosteriorSummary

Summarizes a distribution, assumed to be a posterior, based on samples from it.

Parameters:

  • posterior_samples (Float[ndarray, ' num_samples']) –

    samples from the posterior.

  • ci_probability (float) –

    the probability under the HDI.

Returns:

  • PosteriorSummary

    the summary statistics of the posterior

Truncated Sampling#

Draws bounded samples from unbounded Scipy distributions.

This is necessary when making parametric assumptions about the distribution of metrics that have minimum and maximum values.

truncated_sample #

truncated_sample(
    sampling_distribution: rv_continuous,
    bounds: tuple[float, float],
    rng: RNG,
    num_samples: int,
) -> Float[ndarray, " num_samples"]

Generates a bounded sample from an unbounded continuous Scipy distribution.

Uses inverse transform sampling to draw samples from the unbounded distribution.

The uniformly sampled quantiles are restricted to the range spanned by the bounds, so the transformed samples are implicitly bounded as well.

Parameters:

  • sampling_distribution (rv_continuous) –

    the unbounded continuous Scipy distribution

  • bounds (tuple[float, float]) –

    the lower and upper bounds of the samples

  • rng (RNG) –

    the random number generator

  • num_samples (int) –

    the number of samples to draw

Returns:

  • Float[ndarray, ' num_samples']

    the samples from the bounded distribution
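
The bounded inverse transform sampling described above can be sketched as follows. `truncated_sample_sketch` is a hypothetical name, `rng` is assumed to be a `numpy.random.Generator`, and the example passes a frozen `scipy.stats.norm` distribution, which exposes the `cdf` and `ppf` methods the sketch relies on.

```python
import numpy as np
from scipy import stats


def truncated_sample_sketch(sampling_distribution, bounds, rng, num_samples):
    """Bounded samples via inverse transform sampling (hypothetical helper)."""
    lower, upper = bounds
    # Quantiles (CDF values) that correspond to the lower and upper bounds
    quantile_lower = sampling_distribution.cdf(lower)
    quantile_upper = sampling_distribution.cdf(upper)
    # Draw quantiles uniformly, but only between the two bounding quantiles
    quantiles = rng.uniform(quantile_lower, quantile_upper, size=num_samples)
    # Map the quantiles back through the inverse CDF (percent point function)
    return sampling_distribution.ppf(quantiles)


rng = np.random.default_rng(0)
distribution = stats.norm(loc=0.5, scale=0.2)
samples = truncated_sample_sketch(distribution, (0.0, 1.0), rng, num_samples=1000)
```

Because the inverse CDF is monotonic, every sample falls between the bounds up to floating-point rounding, and no rejection step is needed.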