Statistics#
A module dedicated to generating samples and computing summary statistics from samples.
Batched Averages#
Vectorized computation of various averages using a consistent interface.
numpy_batched_arithmetic_mean
#
numpy_batched_arithmetic_mean(
array: Float[ndarray, "... axis ..."],
axis: int = -1,
*,
keepdims: bool = True,
) -> Float[ndarray, "... 1 ..."]
Computes the arithmetic mean over an axis.
numpy_batched_convex_combination
#
numpy_batched_convex_combination(
array: Float[ndarray, "... axis ..."],
axis: int = -1,
*,
convex_weights: Float[ndarray, " num_samples final_dim"],
keepdims: bool = True,
) -> Float[ndarray, "... 1 ..."]
Computes a weighted mean over an axis.
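As an illustration of a weighted mean over an axis, here is a minimal NumPy sketch. The function name `convex_combination_sketch` is hypothetical, and the sketch assumes the weights broadcast against `array` and are non-negative; it renormalizes them along `axis` so that they sum to one. The shape convention of `convex_weights` in the actual function may differ.

```python
import numpy as np


def convex_combination_sketch(array, weights, axis=-1, keepdims=True):
    """Weighted mean over `axis` (illustrative sketch, not the library code)."""
    weights = np.asarray(weights, dtype=float)
    # Renormalize so the weights along `axis` sum to 1 (a convex combination).
    weights = weights / np.sum(weights, axis=axis, keepdims=True)
    return np.sum(array * weights, axis=axis, keepdims=keepdims)


values = np.array([[1.0, 2.0, 4.0], [1.0, 1.0, 1.0]])
combined = convex_combination_sketch(values, weights=[0.5, 0.25, 0.25])
```

With `keepdims=True` the reduced axis is retained with length 1, which matches the `"... 1 ..."` return annotation above.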
numpy_batched_harmonic_mean
#
numpy_batched_harmonic_mean(
array: Float[ndarray, "... axis ..."],
axis: int = -1,
*,
keepdims: bool = True,
) -> Float[ndarray, "... 1 ..."]
Computes the harmonic mean over an axis.
numpy_batched_geometric_mean
#
numpy_batched_geometric_mean(
array: Float[ndarray, "... axis ..."],
axis: int = -1,
*,
keepdims: bool = True,
) -> Float[ndarray, "... 1 ..."]
Computes the geometric mean over an axis.
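The unweighted means in this section share one pattern: transform, average over the axis, transform back. The sketch below shows that pattern in plain NumPy. The function names are hypothetical stand-ins, and the harmonic and geometric versions assume strictly positive inputs.

```python
import numpy as np


def arithmetic_mean_sketch(array, axis=-1, keepdims=True):
    return np.mean(array, axis=axis, keepdims=keepdims)


def harmonic_mean_sketch(array, axis=-1, keepdims=True):
    # Reciprocal of the mean of reciprocals; assumes array > 0.
    return 1.0 / np.mean(1.0 / array, axis=axis, keepdims=keepdims)


def geometric_mean_sketch(array, axis=-1, keepdims=True):
    # Exponential of the mean of logarithms; assumes array > 0.
    return np.exp(np.mean(np.log(array), axis=axis, keepdims=keepdims))


values = np.array([[1.0, 2.0, 4.0], [1.0, 1.0, 1.0]])
am = arithmetic_mean_sketch(values)
hm = harmonic_mean_sketch(values)
gm = geometric_mean_sketch(values)
```

For positive inputs the three results are ordered as harmonic ≤ geometric ≤ arithmetic, with equality only when all entries along the axis are equal.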
Dirichlet Distribution#
Optimized sampling of batched samples from independent Dirichlet distributions.
These functions tend to be a performance bottleneck and should be optimized as much as possible.
dirichlet_prior
#
dirichlet_prior(
strategy: str | float | int | Float[ArrayLike, " ..."],
shape: tuple[int, ...],
) -> Float[ndarray, " ..."]
Creates a prior array for a Dirichlet distribution.
Returns:
-
Float[ndarray, ' ...']
–the prior vector
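To illustrate how a prior specification might be expanded into an array of the requested shape, here is a rough sketch. It handles only the numeric and array-like cases of `strategy`; the string strategies accepted by `dirichlet_prior` are not documented here, so the sketch deliberately rejects them. The name `prior_sketch` is hypothetical.

```python
import numpy as np


def prior_sketch(strategy, shape):
    """Expand a scalar or array-like prior to `shape` (illustrative only)."""
    if isinstance(strategy, str):
        # Named strategies exist in the real function but are not covered here.
        raise NotImplementedError("string strategies are outside this sketch")
    if isinstance(strategy, (int, float)):
        # A scalar gives every Dirichlet parameter the same value.
        return np.full(shape, float(strategy))
    # An array-like is broadcast to the requested shape and copied.
    return np.broadcast_to(np.asarray(strategy, dtype=float), shape).copy()


flat_prior = prior_sketch(1.0, (2, 3))
custom_prior = prior_sketch([0.5, 1.0, 2.0], (2, 3))
```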
dirichlet_sample
#
dirichlet_sample(
rng: RNG, alphas: Float[ndarray, " ..."], num_samples: int
) -> Float[ndarray, " num_samples ..."]
Generate Dirichlet distributed samples from an array of Gamma distributions.
A Dirichlet-distributed sample can be constructed by drawing a set of independent Gamma-distributed variables and dividing them by their sum.
The NumPy implementation of the Dirichlet distribution is not vectorized, whereas its implementation of the Gamma distribution is.
Adapted from StackOverflow.
This function is the performance bottleneck for this package, so any change to it should be checked for its effect on performance.
Parameters:
-
rng
(RNG
) –the random number generator
-
alphas
(Float[ndarray, '...']
) –the Dirichlet parameters
-
num_samples
(int
) –the number of samples to retrieve
Returns:
-
Float[ndarray, ' num_samples ...']
–samples from the specified Dirichlet distribution
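The Gamma-normalization construction described above can be sketched as follows. This is an illustrative version, not the package's code: the name `dirichlet_sample_sketch` is hypothetical, `rng` is assumed to be a `numpy.random.Generator`, and the last axis of `alphas` is assumed to index the categories of each independent Dirichlet distribution.

```python
import numpy as np


def dirichlet_sample_sketch(rng, alphas, num_samples):
    """Batched Dirichlet samples via normalized Gamma draws (sketch)."""
    alphas = np.asarray(alphas, dtype=float)
    # One Gamma(alpha, 1) draw per parameter and per sample, in a single call.
    gammas = rng.standard_gamma(alphas, size=(num_samples, *alphas.shape))
    # Dividing by the sum over the category axis yields Dirichlet samples.
    return gammas / gammas.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
alphas = np.array([[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]])
samples = dirichlet_sample_sketch(rng, alphas, num_samples=5)
```

Every slice along the last axis is non-negative and sums to one, and the leading axis has length `num_samples`.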
HDI Estimation#
Tries to find the Highest Density Interval (HDI) of a posterior distribution from its samples.
hdi_estimator
#
hdi_estimator(
samples: Float[ndarray, " num_samples"], prob: float
) -> tuple[float | Float[ndarray, ""], float | Float[ndarray, ""]]
Computes the highest density interval (HDI) of an array of samples for a given probability.
Adapted from arviz.
Guaranteed to contain the median if prob > 0.5, and, if the distribution is unimodal, also contains the mode.
Parameters:
-
samples
(Float[ndarray, ' num_samples']
) –the array of samples
-
prob
(float
) –the probability
Returns:
-
tuple[float | Float[ndarray, ''], float | Float[ndarray, '']]
–the lower and upper bounds of the HDI
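One common way to estimate an HDI from samples is to sort them and pick the narrowest window that spans the requested fraction of the samples. The sketch below follows that approach; it is an assumption about the general technique rather than a copy of `hdi_estimator`, the name `hdi_sketch` is hypothetical, and it requires `0 < prob < 1`.

```python
import numpy as np


def hdi_sketch(samples, prob):
    """Narrowest interval covering a `prob` fraction of samples (sketch)."""
    ordered = np.sort(np.asarray(samples, dtype=float))
    n = ordered.shape[0]
    # Number of index steps between the lower and upper bound of a window.
    span = int(np.floor(prob * n))
    # Width of every candidate window of that span.
    widths = ordered[span:] - ordered[: n - span]
    start = int(np.argmin(widths))
    return ordered[start], ordered[start + span]


lower, upper = hdi_sketch(np.array([0.0, 1.0, 2.0, 3.0, 10.0]), prob=0.6)
```

In this small example the candidate windows are `[0, 3]` and `[1, 10]`, so the narrower one, `[0, 3]`, is returned.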
Mode Estimation#
Tries to find the mode of a distribution from its samples.
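A simple histogram-based mode estimate bins the samples and reports the midpoint of the fullest bin. The sketch below shows that idea under stated assumptions: the name `histogram_mode_sketch` and the `bins` parameter are hypothetical, and the binning rule used by the actual estimator is not documented here.

```python
import numpy as np


def histogram_mode_sketch(samples, bounds=None, bins="auto"):
    """Midpoint of the most populated histogram bin (illustrative sketch)."""
    counts, edges = np.histogram(samples, bins=bins, range=bounds)
    peak = int(np.argmax(counts))
    # The mode estimate is the center of the bin with the highest count.
    return float(0.5 * (edges[peak] + edges[peak + 1]))


samples = np.array([0.2, 1.2, 1.4, 1.6, 2.8])
mode = histogram_mode_sketch(samples, bounds=(0.0, 3.0), bins=3)
```

The resolution of such an estimate is limited by the bin width, so the bin count trades off noise against precision.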
histogram_mode_estimator
#
histogram_mode_estimator(
samples: Float[ndarray, " num_samples"],
bounds: tuple[float, float] | None = None,
) -> float
Tries to estimate the mode of a distribution from its samples.
Summary Statistics#
Computes various summary statistics about a distribution from its samples.
PosteriorSummary
dataclass
#
A container for summary statistics of some probability distribution.
metric_uncertainty
property
#
The metric uncertainty (MU), defined as the size of the HDI.
Returns:
-
float
(float
) –the MU
as_dict
#
Returns the dict representation of the statistics.
Useful for converting to a table.
Returns:
-
dict[str, float | tuple[float, float]]
–dict[str, float | tuple[float, float]]
summarize_posterior
#
summarize_posterior(
posterior_samples: Float[ndarray, " num_samples"], ci_probability: float
) -> PosteriorSummary
Summarizes a distribution, assumed to be a posterior, based on samples from it.
Parameters:
-
posterior_samples
(Float[ndarray, ' num_samples']
) –samples from the posterior.
-
ci_probability
(float
) –the probability under the HDI.
Returns:
-
PosteriorSummary
(PosteriorSummary
) –the summary statistics.
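To make the relationship between the summary container, its HDI, and the metric uncertainty concrete, here is a small sketch. The class `SummarySketch`, its fields, and `summarize_sketch` are hypothetical; the real `PosteriorSummary` may store different or additional statistics. The HDI is computed as the narrowest window over the sorted samples, which is an assumption about the technique.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class SummarySketch:
    median: float
    hdi: tuple[float, float]

    @property
    def metric_uncertainty(self) -> float:
        # Size of the HDI: upper bound minus lower bound.
        return self.hdi[1] - self.hdi[0]

    def as_dict(self) -> dict:
        return {
            "median": self.median,
            "hdi": self.hdi,
            "metric_uncertainty": self.metric_uncertainty,
        }


def summarize_sketch(posterior_samples, ci_probability):
    ordered = np.sort(np.asarray(posterior_samples, dtype=float))
    n = ordered.shape[0]
    span = int(np.floor(ci_probability * n))
    # Narrowest window spanning the requested fraction of samples.
    start = int(np.argmin(ordered[span:] - ordered[: n - span]))
    hdi = (float(ordered[start]), float(ordered[start + span]))
    return SummarySketch(median=float(np.median(ordered)), hdi=hdi)


summary = summarize_sketch(np.array([0.0, 1.0, 2.0, 3.0, 10.0]), 0.6)
```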
Truncated Sampling#
Draws bounded samples from unbounded SciPy distributions.
This is necessary when making parametric assumptions about the distribution of metrics that have minimum and maximum values.
truncated_sample
#
truncated_sample(
sampling_distribution: rv_continuous,
bounds: tuple[float, float],
rng: RNG,
num_samples: int,
) -> Float[ndarray, " num_samples"]
Generates a bounded sample from an unbounded continuous SciPy distribution.
Uses inverse transform sampling: quantiles are drawn uniformly and then passed through the inverse CDF of the unbounded distribution.
The quantiles are restricted to the range spanned by the bounds, so the transformed samples are implicitly bounded as well.
Parameters:
-
sampling_distribution
(rv_continuous
) –the unbounded continuous SciPy distribution
-
bounds
(tuple[float, float]
) –the bounds
-
rng
(RNG
) –the random number generator
-
num_samples
(int
) –the number of samples to draw
Returns:
-
Float[ndarray, ' num_samples']
–the samples from the bounded distribution
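The bounded inverse transform sampling described above can be sketched as follows. This is an illustration rather than the package's implementation: the name `truncated_sample_sketch` is hypothetical, `rng` is assumed to be a `numpy.random.Generator`, and the example passes a frozen SciPy distribution (`scipy.stats.norm(...)`), which exposes the same `cdf` and `ppf` methods.

```python
import numpy as np
from scipy import stats


def truncated_sample_sketch(sampling_distribution, bounds, rng, num_samples):
    """Bounded draws via inverse transform sampling (illustrative sketch)."""
    lower, upper = bounds
    # Map the bounds to the quantile range they span under the CDF.
    q_low, q_high = sampling_distribution.cdf([lower, upper])
    # Draw quantiles uniformly within that restricted range.
    quantiles = rng.uniform(q_low, q_high, size=num_samples)
    # The inverse CDF maps them back into [lower, upper].
    return sampling_distribution.ppf(quantiles)


rng = np.random.default_rng(0)
draws = truncated_sample_sketch(
    stats.norm(loc=0.5, scale=0.5), bounds=(0.0, 1.0), rng=rng, num_samples=1000
)
```

Unlike rejection sampling, this approach uses exactly `num_samples` uniform draws regardless of how little probability mass lies between the bounds.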