Experiment Aggregators#

SingletonAggregator #

Bases: ExperimentAggregator

An aggregation to apply to an ExperimentGroup that needs no aggregation.

For example, the ExperimentGroup only contains one Experiment.

Essentially just the identity function:

\[f(x)=x\]

BetaAggregator #

Bases: ExperimentAggregator

Samples from the beta-conflated distribution.

Specifically, the aggregate distribution \(\text{Beta}(\tilde{\alpha}, \tilde{\beta})\) is estimated as:

\[\begin{aligned} \tilde{\alpha}&=\left[\sum_{i=1}^{M}\alpha_{i}\right]-\left(M-1\right) \\ \tilde{\beta}&=\left[\sum_{i=1}^{M}\beta_{i}\right]-\left(M-1\right) \end{aligned}\]

where \(M\) is the total number of experiments.
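The parameter update above can be checked numerically. The following is a minimal sketch, not the library's implementation: the helper name `conflate_beta` and the three experiments' parameters are hypothetical, and the check simply compares the normalised product of the individual densities against the closed-form result.

```python
import numpy as np
from scipy import stats


def conflate_beta(alphas, betas):
    """Return (alpha, beta) of the conflation of M beta distributions.

    Hypothetical helper that applies the sum-minus-(M-1) rule directly.
    """
    alphas = np.asarray(alphas, dtype=float)
    betas = np.asarray(betas, dtype=float)
    m = alphas.size
    return alphas.sum() - (m - 1), betas.sum() - (m - 1)


# Made-up parameters for three experiments
alphas = [12.0, 8.0, 15.0]
betas = [4.0, 3.0, 6.0]
alpha_agg, beta_agg = conflate_beta(alphas, betas)

# Conflation is the normalised product of the individual densities
x = np.linspace(0.001, 0.999, 999)
dx = x[1] - x[0]
product = np.prod([stats.beta.pdf(x, a, b) for a, b in zip(alphas, betas)], axis=0)
product /= product.sum() * dx  # Riemann-sum normalisation

max_gap = np.max(np.abs(product - stats.beta.pdf(x, alpha_agg, beta_agg)))
print(alpha_agg, beta_agg, max_gap)
```

The gap between the two curves should be at the level of numerical integration error, which is why the aggregate can be represented with just two parameters.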

Uses the scipy.stats.beta class to fit the beta distributions.

Assumptions:
  • the individual experiment distributions are beta distributed
  • the metrics are bounded, although the range need not be (0, 1)
Read more:
  1. Hill, T. P. (2008). Conflations Of Probability Distributions: An Optimal Method For Consolidating Data From Different Experiments.
  2. Hill, T. P., & Miller, J. (2011). How to combine independent data sets for the same quantity.
  3. 'Beta distribution' on Wikipedia

Parameters:

  • estimation_method (str, default: 'mle' ) –

    method for estimating the parameters of the individual experiment distributions. Options are 'mle' for maximum-likelihood estimation, or 'mome' for the method-of-moments estimator. MLE tends to be more statistically efficient, but it has no closed form for the beta distribution and must be computed numerically.
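To illustrate the difference between the two options, here is a hedged sketch of both estimators on simulated data. The function names `fit_beta_mome` and `fit_beta_mle` are hypothetical, and the sketch assumes the metric lives on (0, 1), so location and scale are held fixed; how the aggregator itself handles other ranges is not shown here.

```python
import numpy as np
from scipy import stats


def fit_beta_mome(samples):
    """Method-of-moments estimate: closed form from sample mean and variance."""
    mean = samples.mean()
    var = samples.var(ddof=1)
    common = mean * (1.0 - mean) / var - 1.0
    return mean * common, (1.0 - mean) * common


def fit_beta_mle(samples):
    """Maximum-likelihood estimate via SciPy's numerical solver.

    Assumption: support fixed to (0, 1) through floc and fscale.
    """
    a, b, _loc, _scale = stats.beta.fit(samples, floc=0.0, fscale=1.0)
    return a, b


rng = np.random.default_rng(0)
samples = rng.beta(8.0, 3.0, size=2000)  # simulated metric values

a_mome, b_mome = fit_beta_mome(samples)
a_mle, b_mle = fit_beta_mle(samples)
print("MoME:", a_mome, b_mome)
print("MLE: ", a_mle, b_mle)
```

With a sample this large both estimates should land near the generating parameters; the trade-off only becomes visible with small or awkward samples, where the numerical MLE may be slower or fail to converge.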

GammaAggregator #

Bases: ExperimentAggregator

Samples from the Gamma-conflated distribution.

Specifically, the aggregate distribution \(\text{Gamma}(\tilde{\alpha}, \tilde{\beta})\) (\(\alpha\) is the shape, \(\beta\) the rate parameter) is estimated as:

\[\begin{aligned} \tilde{\alpha}&=\left[\sum_{i=1}^{M}\alpha_{i}\right]-\left(M-1\right) \\ \tilde{\beta}&=\dfrac{1}{\sum_{i=1}^{M}\beta_{i}^{-1}} \end{aligned}\]

where \(M\) is the total number of experiments.
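A small numerical check of this update, as a sketch under stated assumptions rather than the library's code. The helper `conflate_gamma` and the example parameters are hypothetical. One assumption to flag: the sketch passes the second parameter to SciPy as `scale`, because the reciprocal-sum update is the one that reproduces the product of densities in that parameterisation; if the parameter is read as a rate, the conflated rate would instead be the plain sum of the individual rates.

```python
import numpy as np
from scipy import stats


def conflate_gamma(shapes, scales):
    """Return (shape, scale) of the conflation of M gamma distributions.

    Hypothetical helper; the second parameter is treated as a scale here.
    """
    shapes = np.asarray(shapes, dtype=float)
    scales = np.asarray(scales, dtype=float)
    m = shapes.size
    shape_agg = shapes.sum() - (m - 1)
    scale_agg = 1.0 / np.sum(1.0 / scales)
    return shape_agg, scale_agg


# Made-up parameters for three experiments
shapes = [3.0, 5.0, 4.0]
scales = [2.0, 1.0, 1.5]
shape_agg, scale_agg = conflate_gamma(shapes, scales)

# Compare against the normalised product of the individual densities
x = np.linspace(0.01, 30.0, 3000)
dx = x[1] - x[0]
product = np.prod(
    [stats.gamma.pdf(x, a, scale=s) for a, s in zip(shapes, scales)], axis=0
)
product /= product.sum() * dx  # Riemann-sum normalisation

closed_form = stats.gamma.pdf(x, shape_agg, scale=scale_agg)
print(shape_agg, scale_agg, np.max(np.abs(product - closed_form)))
```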

An optional shifted: bool argument exists to dynamically estimate the support for the distribution. Can help fit to individual experiments, but likely minimally impacts the aggregate distribution.

Assumptions:
  • the individual experiment distributions are gamma distributed
Read more:
  1. Hill, T. P. (2008). Conflations Of Probability Distributions: An Optimal Method For Consolidating Data From Different Experiments.
  2. Hill, T. P., & Miller, J. (2011). How to combine independent data sets for the same quantity.
  3. 'Gamma distribution' on Wikipedia

FEGaussianAggregator #

Bases: ExperimentAggregator

Samples from the Gaussian-conflated distribution.

This is equivalent to the fixed-effects meta-analytical estimator.

Uses the inverse variance weighted mean and standard errors. Specifically, the aggregate distribution \(\mathcal{N}(\tilde{\mu}, \tilde{\sigma})\) is estimated as:

\[\begin{aligned} w_{i}&=\dfrac{\sigma_{i}^{-2}}{\sum_{j=1}^{M}\sigma_{j}^{-2}} \\ \tilde{\mu}&=\sum_{i=1}^{M}w_{i}\mu_{i} \\ \tilde{\sigma}^2&=\dfrac{1}{\sum_{i=1}^{M}\sigma_{i}^{-2}} \end{aligned}\]

where \(M\) is the total number of experiments.
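As a worked illustration of inverse-variance weighting, the following sketch computes the weights, pooled mean and pooled standard deviation for three hypothetical experiments. The function name `fixed_effects` and the input values are assumptions made for the example only.

```python
import numpy as np


def fixed_effects(means, sigmas):
    """Inverse-variance weighted mean and standard deviation.

    Hypothetical helper; returns (mean, std, weights).
    """
    means = np.asarray(means, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    precisions = sigmas ** -2.0
    weights = precisions / precisions.sum()
    mean = np.sum(weights * means)
    std = np.sqrt(1.0 / precisions.sum())
    return mean, std, weights


# Made-up per-experiment means and standard deviations
means = [0.80, 0.84, 0.78]
sigmas = [0.02, 0.04, 0.01]
mean_agg, std_agg, weights = fixed_effects(means, sigmas)
print(mean_agg, std_agg, weights)
```

The most precise experiment (the third) dominates the pooled mean, and the pooled standard deviation is smaller than that of any single experiment.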

Assumptions:
  • the individual experiment distributions are normally (Gaussian) distributed
  • there is no inter-experiment heterogeneity present
Read more:
  1. Hill, T. P. (2008). Conflations Of Probability Distributions: An Optimal Method For Consolidating Data From Different Experiments.
  2. Hill, T. P., & Miller, J. (2011). How to combine independent data sets for the same quantity.
  3. Higgins, J., & Thomas, J. (Eds.). (2023). Cochrane handbook for systematic reviews of interventions.
  4. Borenstein et al. (2021). Introduction to meta-analysis.
  5. 'Meta-analysis' on Wikipedia

REGaussianAggregator #

Bases: ExperimentAggregator

Samples from the Random Effects Meta-Analytical Estimator.

First uses the standard inverse variance weighted mean and standard errors as model parameters, before debiasing the weights to incorporate inter-experiment heterogeneity. As a result, studies with larger standard errors will be upweighted relative to the fixed-effects model.

Specifically, starting with a Fixed-Effects model \(\mathcal{N}(\tilde{\mu}_{\text{FE}}, \tilde{\sigma}_{\text{FE}})\),

\[\begin{aligned} w_{i}&=\dfrac{\left(\sigma_{i}^2+\tau^2\right)^{-1}}{\sum_{j=1}^{M}\left(\sigma_{j}^2+\tau^2\right)^{-1}} \\ \tilde{\mu}&=\sum_{i=1}^{M}w_{i}\mu_{i} \\ \tilde{\sigma}^2&=\dfrac{1}{\sum_{i=1}^{M}\left(\sigma_{i}^2+\tau^2\right)^{-1}} \end{aligned}\]

where \(\tau\) is the estimated inter-experiment heterogeneity, and \(M\) is the total number of experiments.

Uses the Paule-Mandel iterative heterogeneity estimator, which does not make a parametric assumption. The more common (but biased) DerSimonian-Laird estimator can also be used by setting paule_mandel_heterogeneity: bool = False.
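For readers unfamiliar with the two heterogeneity estimators, here is a hedged sketch of their textbook forms. It is not taken from the library: the function names and example data are hypothetical, and the Paule-Mandel root is found with `scipy.optimize.brentq` purely for convenience. DerSimonian-Laird is a closed-form moment estimator, whereas Paule-Mandel searches for the \(\tau^2\) at which the weighted sum of squared deviations equals its expected value \(M-1\).

```python
import numpy as np
from scipy import optimize


def generalised_q(tau2, means, variances):
    """Weighted sum of squared deviations for a given tau^2."""
    w = 1.0 / (variances + tau2)
    mu = np.sum(w * means) / np.sum(w)
    return np.sum(w * (means - mu) ** 2)


def dersimonian_laird_tau2(means, variances):
    """Closed-form moment estimator, truncated at zero."""
    w = 1.0 / variances
    q = generalised_q(0.0, means, variances)
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    return max(0.0, (q - (means.size - 1)) / c)


def paule_mandel_tau2(means, variances):
    """Solve generalised_q(tau2) = M - 1 for tau2 >= 0."""
    df = means.size - 1

    def gap(tau2):
        return generalised_q(tau2, means, variances) - df

    if gap(0.0) <= 0.0:
        return 0.0  # no excess dispersion to explain
    upper = max(np.var(means, ddof=1), 1e-12)
    while gap(upper) > 0.0:  # generalised Q decreases in tau2, so this terminates
        upper *= 2.0
    return optimize.brentq(gap, 0.0, upper)


# Made-up, visibly heterogeneous experiments
means = np.array([0.70, 0.80, 0.90, 0.75])
variances = np.array([0.02, 0.03, 0.02, 0.04]) ** 2

tau2_dl = dersimonian_laird_tau2(means, variances)
tau2_pm = paule_mandel_tau2(means, variances)

fe_weights = (1.0 / variances) / np.sum(1.0 / variances)
re_weights = (1.0 / (variances + tau2_pm)) / np.sum(1.0 / (variances + tau2_pm))
print(tau2_dl, tau2_pm)
print(fe_weights, re_weights)
```

Adding \(\tau^2\) to every variance flattens the weights, which is why less precise experiments gain influence relative to the fixed-effects model.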

If hksj_sampling_distribution: bool = True, the aggregated distribution is a more conservative \(t\)-distribution, with degrees of freedom equal to \(M-1\). The difference is largest when only a few experiments are available, in which case it can substantially increase the aggregated distribution's variance.
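To give a sense of how much wider the \(t\)-distribution is for small \(M\), the sketch below compares two-sided critical values of a standard normal with those of a \(t\)-distribution with \(M-1\) degrees of freedom. It isolates the effect of the reference distribution only; any rescaling of the variance that the Hartung-Knapp-Sidik-Jonkman correction applies is not shown, and the helper name `critical_values` is hypothetical.

```python
from scipy import stats


def critical_values(num_experiments, level=0.95):
    """Two-sided critical values: (normal, t with M - 1 degrees of freedom)."""
    upper = 0.5 + level / 2.0
    z = stats.norm.ppf(upper)
    t = stats.t.ppf(upper, df=num_experiments - 1)
    return z, t


for m in (3, 5, 30):
    z, t = critical_values(m)
    print(f"M={m}: normal={z:.3f}, t={t:.3f}, ratio={t / z:.2f}")
```

With three experiments the interval is more than twice as wide as the normal one, while with thirty the two are nearly the same.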

Assumptions:
  • the individual experiment distributions are normally (Gaussian) distributed
  • there is inter-experiment heterogeneity present
Read more:
  1. Higgins, J., & Thomas, J. (Eds.). (2023). Cochrane handbook for systematic reviews of interventions.
  2. Borenstein et al. (2021). Introduction to meta-analysis.
  3. 'Meta-analysis' on Wikipedia
  4. IntHout, J., Ioannidis, J. P., & Borm, G. F. (2014). The Hartung-Knapp-Sidik-Jonkman method for random effects meta-analysis is straightforward and considerably outperforms the standard DerSimonian-Laird method.
  5. Langan et al. (2019). A comparison of heterogeneity variance estimators in simulated random‐effects meta‐analyses.

Parameters:

  • paule_mandel_heterogeneity (bool, default: True ) –

    whether to use the Paule-Mandel method for estimating inter-experiment heterogeneity, or fall back to the DerSimonian-Laird estimator. Defaults to True.

  • hksj_sampling_distribution (bool, default: False ) –

    whether to use the Hartung-Knapp-Sidik-Jonkman corrected \(t\)-distribution as the aggregate sampling distribution. Defaults to False.

HistogramAggregator #

Bases: ExperimentAggregator

Samples from a histogram approximation of the conflated distribution.

First bins all individual experiment groups, and then computes the product of the probability masses across individual experiments.

Unlike other methods, this does not make a parametric assumption. However, the resulting distribution can 'look' unnatural, and requires overlapping supports within the sample. If any experiment assigns 0 probability mass to any bin, the conflated bin will also contain 0 probability mass.

As such, inter-experiment heterogeneity can be a significant problem.

Uses numpy.histogram_bin_edges to estimate the number of bin edges needed per experiment, and takes the smallest across all experiments for the aggregate distribution.
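The binning-and-product procedure can be sketched as follows. This is an illustration under assumptions, not the library's code: the function name `histogram_conflation` is hypothetical, and placing the shared bin edges uniformly over the pooled sample range is a choice made for this example.

```python
import numpy as np


def histogram_conflation(samples_per_experiment, bins="auto"):
    """Return (edges, masses) of a histogram-based conflation.

    Hypothetical helper: bins every experiment on shared edges, multiplies
    the per-bin probability masses, and renormalises.
    """
    samples = [np.asarray(s, dtype=float) for s in samples_per_experiment]

    # Bin count suggested for each experiment; keep the smallest
    num_bins = min(len(np.histogram_bin_edges(s, bins=bins)) - 1 for s in samples)

    # Assumption: shared, equally spaced edges over the pooled range
    lo = min(s.min() for s in samples)
    hi = max(s.max() for s in samples)
    edges = np.linspace(lo, hi, num_bins + 1)

    masses = np.stack([np.histogram(s, bins=edges)[0] / s.size for s in samples])
    product = masses.prod(axis=0)  # a single empty bin zeroes the product
    total = product.sum()
    if total == 0.0:
        raise ValueError("no bin has mass in every experiment")
    return edges, product / total


rng = np.random.default_rng(0)
exp_a = rng.normal(0.80, 0.03, size=1000)  # simulated metric samples
exp_b = rng.normal(0.82, 0.03, size=1000)
edges, masses = histogram_conflation([exp_a, exp_b])
print(len(edges) - 1, masses.sum())
```

When the experiments' samples do not share any bin, the product is zero everywhere and the sketch raises an error, which mirrors the overlapping-support requirement.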

Assumptions:
  • the individual experiment distributions' supports overlap
Read more:
  1. Hill, T. P. (2008). Conflations Of Probability Distributions: An Optimal Method For Consolidating Data From Different Experiments.
  2. Hill, T. P., & Miller, J. (2011). How to combine independent data sets for the same quantity.