Study#

`Study` #

Bases: Config

This class represents a study, a collection of related experiments and experiment groups.

It handles all lower-level operations for you.

You can use it to organize your experiments, compute and cache metrics, request analyses or figures, etc.

Experiment groups should be directly comparable across groups.

For example, a series of different models evaluated on the same dataset.

Parameters:

seed (int, default: None ) –

the random seed used to initialise the RNG. Defaults to the current time, in fractional seconds.
num_samples (int, default: None ) –

the number of synthetic confusion matrices to sample. A higher value is better, but more computationally expensive. Defaults to 10000, the minimum recommended value.
ci_probability (float, default: None ) –

the size of the credibility intervals to compute. Defaults to 0.95, which is an arbitrary value, and should be carefully considered.
experiments (dict[str, dict[str, dict[str, Any]]], default: {} ) –

a nested dict that contains (1) the experiment group name, (2) the experiment name, and (3) any IO/prior hyperparameters. Defaults to an empty dict.
metrics (dict[str, dict[str, Any]], default: {} ) –

a nested dict that contains (1) the metrics as metric syntax strings, and (2) any metric aggregation parameters. Defaults to an empty dict.
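
As a rough illustration of the two nested shapes described above, the sketch below builds them by hand. The group names, experiment names and hyperparameter keys (`model_a`, `fold_0`, `confusion_matrix`, `aggregation`, and so on) are placeholders chosen for this example and are not confirmed by the library.

```python
from typing import Any

# experiment group -> experiment -> IO/prior hyperparameters.
# All names and keys below are placeholders for illustration.
experiments: dict[str, dict[str, dict[str, Any]]] = {
    "model_a": {
        "fold_0": {"confusion_matrix": [[8, 2], [1, 9]], "confusion_prior": 0},
        "fold_1": {"confusion_matrix": [[7, 3], [2, 8]], "confusion_prior": 0},
    },
    "model_b": {
        "fold_0": {"confusion_matrix": [[9, 1], [3, 7]], "confusion_prior": 0},
    },
}

# metric syntax string -> aggregation parameters (the "aggregation" key is an assumption)
metrics: dict[str, dict[str, Any]] = {
    "acc": {"aggregation": "fe_gaussian"},
}

# The counts a study would report for this configuration
num_experiment_groups = len(experiments)
num_experiments = sum(len(group) for group in experiments.values())
print(num_experiment_groups, num_experiments)  # 2 3
```

Here `num_experiment_groups` counts the outer keys and `num_experiments` counts the experiments across all groups, mirroring the properties of the same names listed under Attributes.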

Attributes#

`num_classes` `property` #

Returns the number of classes used in experiments in this study.

`num_experiment_groups` `property` #

Returns the number of ExperimentGroups in this Study.

`num_experiments` `property` #

Returns the total number of Experiments in this Study.

Configuration#

`to_dict` #

Returns the configuration of this Study as a Pythonic dict.

Returns:

dict[str, Any] –

dict[str, typing.Any]: the configuration dict, necessary to recreate this Study

`from_dict` `classmethod` #

Creates a Study from a dictionary.

Keys and values should match the pattern of the output from Study.to_dict.

Parameters:

config_dict (dict[str, Any]) –

the dictionary representation of the study configuration.
kwargs –

any additional keyword arguments typically passed to Study's .__init__ method

Returns:

Self –

typing.Self: an instance of a study

`add_experiment` #

Adds an experiment to this study.

Parameters:

experiment_name (str) –

the name of the experiment and experiment group. Should be written as 'experiment_group/experiment'. If the experiment group name is omitted, the experiment gets added to a new experiment group of the same name.
confusion_matrix (Int[ArrayLike, 'num_classes num_classes']) –

the confusion matrix for this experiment
prevalence_prior (str | float | Float[ArrayLike, ' num_classes'], default: None ) –

the prior over the prevalence counts for this experiment. Defaults to 0, Haldane's prior.
confusion_prior (str | float | Float[ArrayLike, ' num_classes num_classes'], default: None ) –

the prior over the confusion counts for this experiment. Defaults to 0, Haldane's prior.
**io_kwargs –

any additional keyword arguments that are needed for confusion matrix I/O

Examples:

Add an experiment named 'test_a' to experiment group 'test'

>>> self.add_experiment(
...     experiment_name="test/test_a",
...     confusion_matrix=[[1, 0], [0, 1]],
... )

Add an experiment named 'test_a' to experiment group 'test', with some specific prior.

>>> self.add_experiment(
...     experiment_name="test/test_a",
...     confusion_matrix=[[1, 0], [0, 1]],
...     prevalence_prior=[1, 1],
...     confusion_prior="half",
... )
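
The naming rule for `experiment_name` can be sketched as a small standalone helper. `split_experiment_name` is a hypothetical function written for this illustration; it is not part of the library, which may parse names differently.

```python
def split_experiment_name(experiment_name: str) -> tuple[str, str]:
    """Split 'experiment_group/experiment' into (group, experiment).

    Illustrative helper only, not library API.
    """
    group, separator, experiment = experiment_name.partition("/")
    if not separator:
        # No group given: the experiment goes into a new group of the same name
        return experiment_name, experiment_name
    return group, experiment


print(split_experiment_name("test/test_a"))  # ('test', 'test_a')
print(split_experiment_name("test_a"))       # ('test_a', 'test_a')
```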

`__getitem__` #

Gets an ExperimentGroup or Experiment by name.

Parameters:

key (str) –

the name of the ExperimentGroup or the Experiment. Experiment names must be in the '{EXPERIMENT_GROUP}/{EXPERIMENT}' format

Returns:

Experiment | ExperimentGroup –

Experiment | ExperimentGroup: the Experiment or ExperimentGroup with the given name

`add_metric` #

Adds a metric to the study.

If there is more than one Experiment in an ExperimentGroup, an aggregation method is required.

Parameters:

metric (str | MetricLike) –

the metric to be added
aggregation (str, default: None ) –

the name of the aggregation method. Defaults to None.
aggregation_kwargs –

keyword arguments passed to the get_experiment_aggregator function

Estimating Uncertainty#

`get_metric_samples` #

Loads or computes samples for a metric, belonging to an experiment.

Parameters:

metric (str | MetricLike) –

the name of the metric
experiment_name (str) –

the name of the experiment. You can also pass 'experiment_group/aggregated' to retrieve the aggregated metric values.
sampling_method (str) –

the sampling method used to generate the metric values. Must be a member of the SamplingMethod enum

Returns:

ExperimentResult | ExperimentAggregationResult –

typing.Union[ExperimentResult, ExperimentAggregationResult]

Examples:

Get the accuracy scores for experiment 'test/test_a' for synthetic confusion matrices sampled from the posterior predictive distribution.

>>> experiment_result = self.get_metric_samples(
...     metric="accuracy",
...     sampling_method="posterior",
...     experiment_name="test/test_a",
... )
ExperimentResult(experiment=ExperimentGroup(test_a), metric=Metric(accuracy))

Similarly, get the accuracy scores, but now aggregated across an entire ExperimentGroup

>>> experiment_result = self.get_metric_samples(
...     metric="accuracy",
...     sampling_method="posterior",
...     experiment_name="test/aggregated",
... )
ExperimentAggregationResult(
    experiment_group=ExperimentGroup(test),
    metric=Metric(accuracy),
    aggregator=ExperimentAggregator(fe_gaussian)
    )
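
To give a feel for what "sampling metric values from synthetic confusion matrices" can mean, here is a minimal standard-library sketch. It assumes Dirichlet posteriors over the class prevalence and over each row of the confusion matrix, which is one common way to model this and not necessarily the library's exact procedure. The function names are hypothetical, and a strictly positive prior is used because `random.gammavariate` rejects the zero pseudo-counts of Haldane's prior.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one sample from a Dirichlet distribution via normalised Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for alpha in alphas]
    total = sum(draws)
    return [draw / total for draw in draws]


def sample_accuracy(confusion_matrix, prior=1.0, num_samples=1000, seed=0):
    """Sample accuracy values from synthetic, normalised confusion matrices.

    Assumption: independent Dirichlet posteriors over the prevalence and over
    each confusion row, with `prior` added to every observed count.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        # Posterior over how often each true class occurs
        prevalence = sample_dirichlet(
            [sum(row) + prior for row in confusion_matrix], rng
        )
        accuracy = 0.0
        for i, row in enumerate(confusion_matrix):
            # Posterior over the predictions made for true class i
            confusion_row = sample_dirichlet([count + prior for count in row], rng)
            accuracy += prevalence[i] * confusion_row[i]  # P(class i) * P(correct | class i)
        samples.append(accuracy)
    return samples


accuracy_samples = sample_accuracy([[8, 2], [1, 9]], num_samples=1000, seed=0)
print(len(accuracy_samples), min(accuracy_samples), max(accuracy_samples))
```

Fixing `seed` makes the draws reproducible, and a larger `num_samples` gives a smoother picture of the metric distribution at a higher computational cost.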

`report_metric_summaries` #

Generates a table with summary statistics for all experiments.

Parameters:

metric (str) –

the name of the metric
class_label (Optional[int], default: None ) –

the class label. Leave 0 or None if using a multiclass metric. Defaults to None.

Other Parameters:

table_fmt (str) –

the format of the table. If 'records', the raw list of values is returned. If 'pandas' or 'pd', a Pandas DataFrame is returned. In all other cases, it is passed to tabulate. Defaults to tabulate's "html".
precision (int) –

the required precision of the presented numbers. Defaults to 4.

Returns:

str ( list | DataFrame | str ) –

the table as a string

Examples:

Return a table with summary statistics of the metric distribution

>>> print(
...     study.report_metric_summaries(
...         metric="acc", class_label=0, table_fmt="github"
...     )
... )

| Group   | Experiment   |   Observed |   Median |   Mode |        95.0% HDI |     MU |   Skew |    Kurt |
|---------|--------------|------------|----------|--------|------------------|--------|--------|---------|
| GROUP   | EXPERIMENT   |     0.5000 |   0.4999 | 0.4921 | [0.1863, 0.8227] | 0.6365 | 0.0011 | -0.5304 |
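
The HDI column can be read as the narrowest interval that contains the requested share of the sampled metric values. The sketch below shows one simple way to estimate such an interval from samples. It assumes a unimodal distribution, and `highest_density_interval` is a hypothetical helper, not the library's implementation.

```python
import math


def highest_density_interval(samples, ci_probability=0.95):
    """Return the narrowest interval containing `ci_probability` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    # Number of samples that must fall inside the interval
    window = max(1, math.ceil(ci_probability * n))
    # Slide a window of that size over the sorted samples and keep the narrowest one
    best_start = min(
        range(n - window + 1),
        key=lambda i: ordered[i + window - 1] - ordered[i],
    )
    return ordered[best_start], ordered[best_start + window - 1]


samples = [0.1, 0.4, 0.45, 0.5, 0.5, 0.55, 0.6, 0.9]
lower, upper = highest_density_interval(samples, ci_probability=0.75)
print(lower, upper)  # 0.4 0.6
```

Unlike an equal-tailed interval, this one shifts towards wherever the samples are densest, which is why the two outlying values 0.1 and 0.9 are both excluded here.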

`report_random_metric_summaries` #

Provides a table with metric results from a simulated random classifier.

Parameters:

metric (str) –

the name of the metric
class_label (Optional[int], default: None ) –

the class label. Leave 0 or None if using a multiclass metric. Defaults to None.

Other Parameters:

table_fmt (str) –

the format of the table, passed to tabulate. Defaults to "html".
precision (int) –

the required precision of the presented numbers. Defaults to 4.

Returns:

str ( list | DataFrame | str ) –

the table as a string

Examples:

Return a table with summary statistics of the metric distribution

>>> print(
...     study.report_random_metric_summaries(
...         metric="acc", class_label=0, table_fmt="github"
...     )
... )

| Group   | Experiment   |   Median |   Mode |        95.0% HDI |     MU |    Skew |    Kurt |
|---------|--------------|----------|--------|------------------|--------|---------|---------|
| GROUP   | EXPERIMENT   |   0.4994 | 0.5454 | [0.1778, 0.8126] | 0.6348 | -0.0130 | -0.5623 |

`plot_metric_summaries` #

Plots the distribution of sampled metric values for a metric and class combination.

Parameters:

metric (str) –

the name of the metric
class_label (int | None, default: None ) –

the class label. Defaults to None.

Other Parameters:

method (str) –

the method for displaying a histogram, provided by Seaborn. Can be either a histogram or KDE. Defaults to "kde".
bandwidth (float) –

the bandwidth parameter for the KDE. Corresponds to Seaborn's bw_adjust parameter. Defaults to 1.0.
bins (int | list[int] | str) –

the number of bins to use in the histogram. Corresponds to numpy's bins parameter. Defaults to "auto".
normalize (bool) –

if normalized, each distribution will be scaled to [0, 1]. Otherwise, uses a shared y-axis. Defaults to False.
figsize (tuple[float, float]) –

the figure size, in inches. Corresponds to matplotlib's figsize parameter. Defaults to None, in which case a decent default value will be approximated.
fontsize (float) –

fontsize for the experiment name labels. Defaults to 9.
axis_fontsize (float) –

fontsize for the x-axis ticklabels. Defaults to None, in which case the fontsize will be used.
edge_colour (str) –

the colour of the histogram or KDE edge. Corresponds to matplotlib's color parameter. Defaults to "black".
area_colour (str) –

the colour of the histogram or KDE filled area. Corresponds to matplotlib's color parameter. Defaults to "gray".
area_alpha (float) –

the opacity of the histogram or KDE filled area. Corresponds to matplotlib's alpha parameter. Defaults to 0.5.
plot_median_line (bool) –

whether to plot the median line. Defaults to True.
median_line_colour (str) –

the colour of the median line. Corresponds to matplotlib's color parameter. Defaults to "black".
median_line_format (str) –

the format of the median line. Corresponds to matplotlib's linestyle parameter. Defaults to "--".
plot_hdi_lines (bool) –

whether to plot the HDI lines. Defaults to True.
hdi_lines_colour (str) –

the colour of the HDI lines. Corresponds to matplotlib's color parameter. Defaults to "black".
hdi_line_format (str) –

the format of the HDI lines. Corresponds to matplotlib's linestyle parameter. Defaults to "-".
plot_obs_point (bool) –

whether to plot the observed value as a marker. Defaults to True.
obs_point_marker (str) –

the marker type of the observed value. Corresponds to matplotlib's marker parameter. Defaults to "D".
obs_point_colour (str) –

the colour of the observed marker. Corresponds to matplotlib's color parameter. Defaults to "black".
obs_point_size (float) –

the size of the observed marker. Defaults to None.
plot_extrema_lines (bool) –

whether to plot small lines at the distribution extreme values. Defaults to True.
extrema_line_colour (str) –

the colour of the extrema lines. Corresponds to matplotlib's color parameter. Defaults to "black".
extrema_line_format (str) –

the format of the extrema lines. Corresponds to matplotlib's linestyle parameter. Defaults to "-".
extrema_line_height (float) –

the maximum height of the extrema lines. Defaults to 12.
extrema_line_width (float) –

the width of the extrema line. Defaults to 1.
plot_base_line (bool) –

whether to plot a line at the base of the distribution. Defaults to True.
base_line_colour (str) –

the colour of the base line. Corresponds to matplotlib's color parameter. Defaults to "black".
base_line_format (str) –

the format of the base line. Corresponds to matplotlib's linestyle parameter. Defaults to "-".
base_line_width (int) –

the width of the base line. Defaults to 1.
plot_experiment_name (bool) –

whether to plot the experiment names as labels. Defaults to True.

Returns:

Figure –

matplotlib.figure.Figure: the completed figure of the distribution plot

Examples:

Plot a distribution of metric values

study.plot_metric_summaries(
    metric="acc",
    class_label=0,
)

A plot of a metric's distribution

Comparing Experiments#

`report_pairwise_comparison` #

Reports on the comparison between two Experiments or ExperimentGroups.

Parameters:

metric (str) –

the name of the metric
class_label (int | None, default: None ) –

the class label. Defaults to None.

Other Parameters:

experiment_a (str) –

the name of an experiment in the '{EXPERIMENT_GROUP}/{EXPERIMENT}' format. To compare an ExperimentGroup, use 'aggregated' as the experiment name
experiment_b (str) –

the name of an experiment in the '{EXPERIMENT_GROUP}/{EXPERIMENT}' format. To compare an ExperimentGroup, use 'aggregated' as the experiment name
min_sig_diff (float | None) –

the minimal difference which is considered significant. Defaults to 0.1 * std.
precision (int) –

the precision of floats used when printing. Defaults to 4.

Returns:

str ( str ) –

a description of the significance of the difference between experiment_a and experiment_b

Examples:

Report on the difference in accuracy between experiments 'EXPERIMENT_A' and 'EXPERIMENT_B', with a minimum significance difference of 0.03.

>>> study.report_pairwise_comparison(
...     metric="acc",
...     class_label=0,
...     experiment_a="GROUP/EXPERIMENT_A",
...     experiment_b="GROUP/EXPERIMENT_B",
...     min_sig_diff=0.03,
... )

Experiment GROUP/EXPERIMENT_A's acc being lesser than GROUP/EXPERIMENT_B could be considered 'dubious'* (Median Δ=-0.0002, 95.00% HDI=[-0.0971, 0.0926], p_direction=50.13%).

There is a 53.11% probability that this difference is bidirectionally significant (ROPE=[-0.0300, 0.0300], p_ROPE=46.89%).

Bidirectional significance could be considered 'undecided'*.

There is a 26.27% probability that this difference is significantly negative (p_pos=26.84%, p_neg=26.27%).

Relative to two random models (p_ROPE,random=36.56%) significance is 1.2825 times less likely.

* These interpretations are based off of loose guidelines, and should change according to the application.
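
The quantities in the report above can be approximated from two sets of metric samples. The sketch below is a simplified reading of those quantities, assuming paired samples and a symmetric ROPE of `[-min_sig_diff, min_sig_diff]`. `compare_samples` is a hypothetical helper and the library's exact definitions may differ.

```python
import statistics


def compare_samples(samples_a, samples_b, min_sig_diff):
    """Summarise the distribution of paired differences between two sample sets."""
    diffs = [a - b for a, b in zip(samples_a, samples_b)]
    n = len(diffs)
    # Probability mass on either side of zero
    p_above_zero = sum(d > 0 for d in diffs) / n
    p_below_zero = sum(d < 0 for d in diffs) / n
    # Probability mass inside the region of practical equivalence (ROPE)
    p_rope = sum(abs(d) <= min_sig_diff for d in diffs) / n
    return {
        "median_diff": statistics.median(diffs),
        "p_direction": max(p_above_zero, p_below_zero),
        "p_rope": p_rope,
        "p_sig": 1.0 - p_rope,  # bidirectional significance
        "p_pos": sum(d > min_sig_diff for d in diffs) / n,
        "p_neg": sum(d < -min_sig_diff for d in diffs) / n,
    }


result = compare_samples(
    samples_a=[0.50, 0.60, 0.70, 0.80],
    samples_b=[0.60, 0.58, 0.60, 0.90],
    min_sig_diff=0.05,
)
print(result)
```

Under these definitions `p_rope + p_pos + p_neg` equals 1, which agrees with the example output above, where 46.89% + 26.84% + 26.27% = 100%.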

`report_pairwise_comparison_to_random` #

Reports on the comparison between an Experiment or ExperimentGroup and a simulated random classifier.

Parameters:

metric (str) –

the name of the metric
class_label (int | None, default: None ) –

the class label. Defaults to None.

Other Parameters:

experiment (str) –

the name of an experiment in the '{EXPERIMENT_GROUP}/{EXPERIMENT}' format. To compare an ExperimentGroup, use 'aggregated' as the experiment name
min_sig_diff (float | None) –

the minimal difference which is considered significant. Defaults to 0.1 * std.
table_fmt (str) –

the format of the table. If 'records', the raw list of values is returned. If 'pandas' or 'pd', a Pandas DataFrame is returned. In all other cases, it is passed to tabulate. Defaults to tabulate's "html".
precision (int) –

the precision of floats used when printing. Defaults to 4.

Returns:

str ( list | DataFrame | str ) –

a table describing the significance of the difference between each experiment and a simulated random classifier

Examples:

Report on the difference in accuracy to that of a random classifier

>>> print(
...     study.report_pairwise_comparison_to_random(
...         metric="acc",
...         class_label=0,
...         table_fmt="github",
...     )
... )

| Group   | Experiment   |   Median Δ |   p_direction |              ROPE |   p_ROPE |   p_sig |
|---------|--------------|------------|---------------|-------------------|----------|---------|
| GROUP   | EXPERIMENT_A |     0.3235 |        1.0000 | [-0.0056, 0.0056] |   0.0000 |  1.0000 |
| GROUP   | EXPERIMENT_B |     0.3231 |        1.0000 | [-0.0056, 0.0056] |   0.0000 |  1.0000 |
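
The `table_fmt` behaviour described for the reporting methods can be pictured as a three-way dispatch. The sketch below is an assumption about how such a dispatch might look, not the library's code. `format_table` is a hypothetical name, and `pandas` and `tabulate` are imported lazily so that the `'records'` branch works without either package installed.

```python
def format_table(records, table_fmt="html", precision=4):
    """Illustrative dispatch on `table_fmt` (not the library's own implementation)."""
    if table_fmt == "records":
        # Raw list of row dicts, returned untouched
        return records
    if table_fmt in ("pandas", "pd"):
        import pandas as pd  # only needed for this branch

        return pd.DataFrame.from_records(records)
    # Any other value is handed to tabulate as its `tablefmt`
    import tabulate

    return tabulate.tabulate(
        records,
        headers="keys",
        tablefmt=table_fmt,
        floatfmt=f".{precision}f",
    )


rows = [
    {"Group": "GROUP", "Experiment": "EXPERIMENT_A", "Median Δ": 0.3235},
    {"Group": "GROUP", "Experiment": "EXPERIMENT_B", "Median Δ": 0.3231},
]
print(format_table(rows, table_fmt="records"))
```

With `table_fmt="github"` the last branch would produce a Markdown table like the one above, provided `tabulate` is installed.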

`plot_pairwise_comparison` #

Plots the distribution of the difference between two experiments.

Parameters:

metric (str) –

the name of the metric
class_label (int | None, default: None ) –

the class label. Defaults to None.

Other Parameters:

experiment_a (str) –

the name of an experiment in the '{EXPERIMENT_GROUP}/{EXPERIMENT}' format. To compare an ExperimentGroup, use 'aggregated' as the experiment name
experiment_b (str) –

the name of an experiment in the '{EXPERIMENT_GROUP}/{EXPERIMENT}' format. To compare an ExperimentGroup, use 'aggregated' as the experiment name
min_sig_diff (float | None) –

the minimal difference which is considered significant. Defaults to 0.1 * std.
method (str) –

the method for displaying a histogram, provided by Seaborn. Can be either a histogram or KDE. Defaults to "kde".
bandwidth (float) –

the bandwidth parameter for the KDE. Corresponds to Seaborn's bw_adjust parameter. Defaults to 1.0.
bins (int | list[int] | str) –

the number of bins to use in the histogram. Corresponds to numpy's bins parameter. Defaults to "auto".
figsize (tuple[float, float]) –

the figure size, in inches. Corresponds to matplotlib's figsize parameter. Defaults to None, in which case a decent default value will be approximated.
fontsize (float) –

fontsize for the experiment name labels. Defaults to 9.
axis_fontsize (float) –

fontsize for the x-axis ticklabels. Defaults to None, in which case the fontsize will be used.
precision (int) –

the required precision of the presented numbers. Defaults to 4.
edge_colour (str) –

the colour of the histogram or KDE edge. Corresponds to matplotlib's color parameter. Defaults to "black".
plot_min_sig_diff_lines (bool) –

whether to plot the borders of the ROPE, the lines of minimal significance. Defaults to True.
min_sig_diff_lines_colour (str) –

the colour of the lines of minimal significance. Corresponds to matplotlib's color parameter. Defaults to "black".
min_sig_diff_lines_format (str) –

the format of the lines of minimal significance. Corresponds to matplotlib's linestyle parameter. Defaults to "-".
rope_area_colour (str) –

the colour of the ROPE area. Corresponds to matplotlib's color parameter. Defaults to "gray".
rope_area_alpha (float) –

the opacity of the ROPE area. Corresponds to matplotlib's alpha parameter. Defaults to 0.5.
neg_sig_diff_area_colour (str) –

the colour of the negatively significant area. Corresponds to matplotlib's color parameter. Defaults to "red".
neg_sig_diff_area_alpha (float) –

the opacity of the negatively significant area. Corresponds to matplotlib's alpha parameter. Defaults to 0.5.
pos_sig_diff_area_colour (str) –

the colour of the positively significant area. Corresponds to matplotlib's color parameter. Defaults to "green".
pos_sig_diff_area_alpha (float) –

the opacity of the positively significant area. Corresponds to matplotlib's alpha parameter. Defaults to 0.5.
plot_obs_point (bool) –

whether to plot the observed value as a marker. Defaults to True.
obs_point_marker (str) –

the marker type of the observed value. Corresponds to matplotlib's marker parameter. Defaults to "D".
obs_point_colour (str) –

the colour of the observed marker. Corresponds to matplotlib's color parameter. Defaults to "black".
obs_point_size (float) –

the size of the observed marker. Defaults to None.
plot_median_line (bool) –

whether to plot the median line. Defaults to True.
median_line_colour (str) –

the colour of the median line. Corresponds to matplotlib's color parameter. Defaults to "black".
median_line_format (str) –

the format of the median line. Corresponds to matplotlib's linestyle parameter. Defaults to "--".
plot_extrema_lines (bool) –

whether to plot small lines at the distribution extreme values. Defaults to True.
extrema_line_colour (str) –

the colour of the extrema lines. Corresponds to matplotlib's color parameter. Defaults to "black".
extrema_line_format (str) –

the format of the extrema lines. Corresponds to matplotlib's linestyle parameter. Defaults to "-".
extrema_line_width (float) –

the width of the extrema lines. Defaults to 1.
extrema_line_height (float) –

the maximum height of the extrema lines. Defaults to 12.
plot_base_line (bool) –

whether to plot a line at the base of the distribution. Defaults to True.
base_line_colour (str) –

the colour of the base line. Corresponds to matplotlib's color parameter. Defaults to "black".
base_line_format (str) –

the format of the base line. Corresponds to matplotlib's linestyle parameter. Defaults to "-".
base_line_width (float) –

the width of the base line. Defaults to 1.
plot_proportions (bool) –

whether to plot the proportions of the data under the three areas as text. Defaults to True.

Returns:

Figure –

matplotlib.figure.Figure: the Matplotlib Figure representation of the plot

Examples:

Plot the distribution of the difference of two metrics

study.plot_pairwise_comparison(
    metric="acc",
    class_label=0,
    experiment_a="GROUP/EXPERIMENT_A",
    experiment_b="GROUP/EXPERIMENT_B",
    min_sig_diff=0.03,
)

A plot of the distribution of the difference of two metrics

`report_listwise_comparison` #

Reports the probability for an experiment achieving a rank when compared to all other experiments on the same metric.

Any probability values smaller than the precision are discarded.

Parameters:

metric (str) –

the name of the metric
class_label (int | None, default: None ) –

the class label. Leave 0 or None if using a multiclass metric. Defaults to None.

Other Parameters:

table_fmt (str) –

the format of the table. If 'records', the raw list of values is returned. If 'pandas' or 'pd', a Pandas DataFrame is returned. In all other cases, it is passed to tabulate. Defaults to tabulate's "html".
precision (int) –

the required precision of the presented numbers. Defaults to 4.

Returns:

str ( list | DataFrame | str ) –

the table as a string

Examples:

Prints the probability of all experiments achieving a particular rank when compared against all others.

>>> print(
...     study.report_listwise_comparison(
...         metric="acc",
...         class_label=0,
...         table_fmt="github",
...     ),
... )

| Group   | Experiment   |   Rank 1 |   Rank 2 |
|---------|--------------|----------|----------|
| GROUP   | EXPERIMENT_B |   0.5013 |   0.4987 |
| GROUP   | EXPERIMENT_A |   0.4987 |   0.5013 |
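
Rank probabilities like these can be estimated by ranking the experiments within each draw and counting how often each experiment lands on each rank. The sketch below assumes that higher metric values are better and that samples are aligned by draw index. `rank_probabilities` is a hypothetical helper, not the library's implementation.

```python
from collections import Counter


def rank_probabilities(samples_by_experiment):
    """Estimate P(experiment achieves rank r) from aligned metric samples."""
    names = list(samples_by_experiment)
    num_draws = len(next(iter(samples_by_experiment.values())))
    counts = {name: Counter() for name in names}
    for draw in range(num_draws):
        # Rank 1 goes to the experiment with the highest value in this draw
        ordered = sorted(
            names,
            key=lambda name: samples_by_experiment[name][draw],
            reverse=True,
        )
        for rank, name in enumerate(ordered, start=1):
            counts[name][rank] += 1
    return {
        name: {rank: counts[name][rank] / num_draws for rank in range(1, len(names) + 1)}
        for name in names
    }


probabilities = rank_probabilities(
    {
        "GROUP/EXPERIMENT_A": [0.70, 0.80, 0.75, 0.60],
        "GROUP/EXPERIMENT_B": [0.72, 0.78, 0.80, 0.65],
    }
)
print(probabilities)
```

Each row and each column of the resulting table sums to 1, as in the example output above.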

Aggregating Experiments#

`report_aggregated_metric_summaries` #

Reports on the aggregation of Experiments in all ExperimentGroups.

Parameters:

metric (str) –

the name of the metric
class_label (int | None, default: None ) –

the class label. Defaults to None.

Other Parameters:

table_fmt (str) –

the format of the table. If 'records', the raw list of values is returned. If 'pandas' or 'pd', a Pandas DataFrame is returned. In all other cases, it is passed to tabulate. Defaults to tabulate's "html".
precision (int) –

the precision of floats used when printing. Defaults to 4.

Returns:

str ( list | DataFrame | str ) –

the table with experiment aggregation statistics as a string

Examples:

Report on the aggregated accuracy scores for each ExperimentGroup in this Study

>>> print(
...     study.report_aggregated_metric_summaries(
...         metric="acc",
...         class_label=0,
...         table_fmt="github",
...     ),
... )

| Group   |   Median |   Mode |              HDI |     MU |   Kurtosis |    Skew |   Var. Within |   Var. Between |     I2 |
|---------|----------|--------|------------------|--------|------------|---------|---------------|----------------|--------|
| GROUP   |   0.7884 | 0.7835 | [0.7413, 0.8411] | 0.0997 |    -0.0121 | -0.0161 |        0.0013 |         0.0019 | 59.04% |
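
One way to read the last three columns is through the usual meta-analysis heterogeneity statistic, where I2 is the share of the total variance that lies between experiments. That definition is an assumption here, since the reference does not spell it out, and `i_squared` is a hypothetical helper.

```python
def i_squared(var_within: float, var_between: float) -> float:
    """Share of the total variance that lies between experiments (assumed definition)."""
    total = var_within + var_between
    return var_between / total if total > 0 else 0.0


# Rounded values taken from the example table above
i2 = i_squared(var_within=0.0013, var_between=0.0019)
print(f"{i2:.2%}")
```

Because the table shows variances rounded to four decimals, this only approximately reproduces the reported 59.04%.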

`plot_experiment_aggregation` #

Plots the distribution of sampled metric values for a specific experiment group, with the aggregated distribution, for a particular metric and class combination.

Parameters:

metric (str) –

the name of the metric
class_label (int | None, default: None ) –

the class label. Defaults to None.

Other Parameters:

experiment_group (str) –

the name of the experiment group
observed_values (dict[str, ExperimentResult]) –

the observed metric values
sampled_values (dict[str, ExperimentResult]) –

the sampled metric values
metric (Metric | AveragedMetric) –

the metric
method (str) –

the method for displaying a histogram, provided by Seaborn. Can be either a histogram or KDE. Defaults to "kde".
bandwidth (float) –

the bandwidth parameter for the KDE. Corresponds to Seaborn's bw_adjust parameter. Defaults to 1.0.
bins (int | list[int] | str) –

the number of bins to use in the histogram. Corresponds to numpy's bins parameter. Defaults to "auto".
normalize (bool) –

if normalized, each distribution will be scaled to [0, 1]. Otherwise, uses a shared y-axis. Defaults to False.
figsize (tuple[float, float]) –

the figure size, in inches. Corresponds to matplotlib's figsize parameter. Defaults to None, in which case a decent default value will be approximated.
fontsize (float) –

fontsize for the experiment name labels. Defaults to 9.
axis_fontsize (float) –

fontsize for the x-axis ticklabels. Defaults to None, in which case the fontsize will be used.
edge_colour (str) –

the colour of the histogram or KDE edge. Corresponds to matplotlib's color parameter. Defaults to "black".
area_colour (str) –

the colour of the histogram or KDE filled area. Corresponds to matplotlib's color parameter. Defaults to "gray".
area_alpha (float) –

the opacity of the histogram or KDE filled area. Corresponds to matplotlib's alpha parameter. Defaults to 0.5.
plot_median_line (bool) –

whether to plot the median line. Defaults to True.
median_line_colour (str) –

the colour of the median line. Corresponds to matplotlib's color parameter. Defaults to "black".
median_line_format (str) –

the format of the median line. Corresponds to matplotlib's linestyle parameter. Defaults to "--".
plot_hdi_lines (bool) –

whether to plot the HDI lines. Defaults to True.
hdi_lines_colour (str) –

the colour of the HDI lines. Corresponds to matplotlib's color parameter. Defaults to "black".
hdi_line_format (str) –

the format of the HDI lines. Corresponds to matplotlib's linestyle parameter. Defaults to "-".
plot_obs_point (bool) –

whether to plot the observed value as a marker. Defaults to True.
obs_point_marker (str) –

the marker type of the observed value. Corresponds to matplotlib's marker parameter. Defaults to "D".
obs_point_colour (str) –

the colour of the observed marker. Corresponds to matplotlib's color parameter. Defaults to "black".
obs_point_size (float) –

the size of the observed marker. Defaults to None.
plot_extrema_lines (bool) –

whether to plot small lines at the distribution extreme values. Defaults to True.
extrema_line_colour (str) –

the colour of the extrema lines. Defaults to "black".
extrema_line_format (str) –

the format of the extrema lines. Corresponds to matplotlib's linestyle parameter. Defaults to "-".
extrema_line_width (float) –

the width of the extrema lines. Defaults to 1.
extrema_line_height (float) –

the maximum height of the extrema lines. Defaults to 12.
plot_base_line (bool) –

whether to plot a line at the base of the distribution. Defaults to True.
base_line_colour (str) –

the colour of the base line. Corresponds to matplotlib's color parameter. Defaults to "black".
base_line_format (str) –

the format of the base line. Corresponds to matplotlib's linestyle parameter. Defaults to "-".
base_line_width (float) –

the width of the base line. Defaults to 1.
plot_experiment_name (bool) –

whether to plot the experiment names as labels. Defaults to True.

Returns:

Figure –

matplotlib.figure.Figure: the completed figure of the distribution plot

Examples:

Plot the distributions and the aggregated distribution for ExperimentGroup 'GROUP'.

study.plot_experiment_aggregation(
    metric="acc",
    class_label=0,
    experiment_group="GROUP",
)

A plot of the distributions in an ExperimentGroup, along with the aggregated distribution.

`plot_forest_plot` #

Plots the distributions for a metric for each Experiment and aggregated ExperimentGroup.

Uses a forest plot format.

The median and HDIs of individual Experiment distributions are plotted as squares, and the aggregate distribution is plotted as a diamond below them. Also provides summary statistics about each distribution, and the aggregation.

Parameters:

metric (str) –

the name of the metric
class_label (int | None, default: None ) –

the class label. Defaults to None.

Other Parameters:

figsize (tuple[float, float]) –

the figure size, in inches. Corresponds to matplotlib's figsize parameter. Defaults to None, in which case a decent default value will be approximated.
fontsize (float) –

fontsize for the experiment name labels. Defaults to 9.
axis_fontsize (float) –

fontsize for the x-axis ticklabels. Defaults to None, in which case the fontsize will be used.
fontname (str) –

the name of the font used. Corresponds to matplotlib's font family parameter. Defaults to "monospace".
median_marker (str) –

the marker type of the median value marker of the individual Experiment distributions. Corresponds to matplotlib's marker parameter. Defaults to "s".
median_marker_edge_colour (str) –

the colour of the individual Experiment median markers' edges. Corresponds to matplotlib's color parameter. Defaults to "black".
median_marker_face_colour (str) –

the colour of the individual Experiment median markers. Corresponds to matplotlib's color parameter. Defaults to "white".
median_marker_size (float) –

the size of the individual Experiment median markers. Defaults to None.
median_marker_line_width (float) –

the line width of the individual Experiment median markers. Defaults to 1.5.
agg_offset (int) –

the number of empty rows between the last Experiment and the aggregated row. Defaults to 1.
agg_median_marker (str) –

the marker type of the median value marker of the aggregated distribution. Corresponds to matplotlib's marker parameter. Defaults to "D".
agg_median_marker_edge_colour (str) –

the colour of the aggregated median markers' edges. Corresponds to matplotlib's color parameter. Defaults to "black".
agg_median_marker_face_colour (str) –

the colour of the aggregated median marker. Corresponds to matplotlib's color parameter. Defaults to "white".
agg_median_marker_size (float) –

the size of the aggregated median marker. Defaults to 10.
agg_median_marker_line_width (float) –

the width of the aggregated median marker. Defaults to 1.5.
hdi_lines_colour (str) –

the colour of the HDI lines. Corresponds to matplotlib's color parameter. Defaults to "black".
hdi_lines_format (str) –

the format of the HDI lines. Corresponds to matplotlib's linestyle parameter. Defaults to "-".
hdi_lines_width (int) –

the width of the HDI lines. Defaults to 1.
plot_agg_median_line (bool) –

whether to plot a line through the aggregated median, across all Experiments in the ExperimentGroup. Defaults to True.
agg_median_line_colour (str) –

the colour of the aggregated median line. Corresponds to matplotlib's color parameter. Defaults to "black".
agg_median_line_format (str) –

the format of the aggregated median line. Corresponds to matplotlib's linestyle parameter. Defaults to "--".
agg_median_line_width (float) –

the width of the aggregated median line. Defaults to 1.0.
plot_experiment_name (bool) –

whether to plot the name of the individual Experiments. Defaults to True.
experiment_name_padding (int) –

the padding between the experiment names and the forest plot. Defaults to 0.
plot_experiment_info (bool) –

whether to plot statistics of the individual and aggregated distributions. Defaults to True.
precision (int) –

the required precision of the presented numbers. Defaults to 4.

Returns:

Figure –

matplotlib.figure.Figure: the Matplotlib Figure representation of the forest plot

Examples:

Plot the distributions and the aggregated distribution for all ExperimentGroups as a forest plot.

study.plot_forest_plot(
    metric="acc",
    class_label=0,
)

A forest plot of the distributions in an ExperimentGroup, along with the aggregated distribution.

