# Metrics and Averaging
A metric, within the scope of this project, summarizes a model's performance on some test set. It does so by comparing the model's class predictions against a paired set of condition labels (i.e., the ground-truth classes). The value a metric function returns should tell you something about the model's classification performance: whether it is good, bad, or something in between.
Metrics can be either:
- multiclass, in which case they produce a single value that combines all classes in one go
- binary, in which case they compute a value for each class individually
Usually, the former is a better indication of the model's overall performance, whereas the latter provides more fine-grained (usually supporting) detail. To convert a binary metric into a multiclass metric, it can be composed with an averaging method. The averaging method takes in the \(k\)-dimensional array of metric values (where \(k\) is the number of classes) and yields a scalar value that combines all the per-class values, as in the sketch below.
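For instance, a minimal sketch (plain NumPy, independent of this library's internals) of how macro averaging collapses the per-class values of a binary metric into one multiclass value:

```python
import numpy as np

# Hypothetical per-class recall values for a k=3 class problem,
# i.e. what a binary metric yields: one value per class.
per_class_recall = np.array([0.90, 0.75, 0.60])

# Macro averaging combines the k per-class values into a single
# scalar by taking their unweighted arithmetic mean.
macro_recall = per_class_recall.mean()
print(macro_recall)  # 0.75
```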
## Interface
Usually, you will not be interacting with the metrics themselves. Instead, this library provides high-level methods for defining metrics and collections of metrics. The easiest way to construct a metric is to pass a metric syntax string.
A valid metric syntax string consists of (in order):
- [Required] A registered metric alias (see below)
- [Required] Any keyword arguments that need to be passed to the metric function
- [Optional] An `@` symbol
- [Optional] A registered averaging method alias (see below)
- [Optional] Any keyword arguments that need to be passed to the averaging function
No spaces should be used. Instead, keyword arguments start with a `+` prepended to the key, followed by a `=` and the value. All together: `alias+key=value@averaging_alias+key=value`.
Only numeric (float, int) or string arguments are accepted. The strings "None", "True" and "False" are converted to their Pythonic counterparts. The order of the keyword arguments does not matter, as long as they appear in the correct block.
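To make the grammar concrete, here is an illustrative sketch of how such a string could be decomposed; `parse_metric_string` is a hypothetical helper written for this example, not this library's actual parser:

```python
def _convert(value: str):
    # Documented conversions: "None"/"True"/"False" become their
    # Pythonic counterparts; numeric strings become int or float.
    literals = {"None": None, "True": True, "False": False}
    if value in literals:
        return literals[value]
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value  # plain string argument


def parse_metric_string(syntax: str):
    # Split into the metric block and the optional averaging block,
    # then split each block into its alias and '+key=value' arguments.
    blocks = []
    for block in syntax.split("@"):
        alias, *kwargs = block.split("+")
        blocks.append(
            (alias, {k: _convert(v) for k, v in (kw.split("=", 1) for kw in kwargs)})
        )
    return blocks


print(parse_metric_string("ba+adjusted=True@binary+positive_class=2"))
# [('ba', {'adjusted': True}), ('binary', {'positive_class': 2})]
```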
## Examples
- `f1`: the F1 score
- `mcc`: the MCC score
- `ppv`: the Positive Predictive Value
- `precision`: also the Positive Predictive Value, as it's a registered alias (see below)
- `fbeta+beta=3.0`: the F3 score
- `f1@macro`: the macro-averaged F1 score
- `ba+adjusted=True@binary+positive_class=2`: the chance-corrected balanced accuracy score, but only for class 2 (counting from 0)
- `p4@geometric`: the geometric mean of the per-class P4 scores
- `mcc@harmonic`: the MCC score; since it's already a multiclass metric, the averaging is ignored
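Where both the metric and the averaging method have sklearn counterparts (see the tables below), the result should line up with the corresponding sklearn call. An illustrative check for `f1@macro`, shown here only on the sklearn side:

```python
from sklearn.metrics import f1_score

y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 0, 0]

# `f1@macro`: the binary F1 score is computed per class and the
# per-class values are combined by an unweighted mean.
print(f1_score(y_true, y_pred, average="macro"))
```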
## Metrics
The following table lists all implemented metrics, by alias:
Alias | Metric | Multiclass | sklearn
---|---|---|---
'acc' | Accuracy | True | accuracy_score
'accuracy' | Accuracy | True | accuracy_score
'ba' | BalancedAccuracy | True | balanced_accuracy_score
'balanced_accuracy' | BalancedAccuracy | True | balanced_accuracy_score
'bookmaker_informedness' | Informedness | False |
'cohen_kappa' | CohensKappa | True | cohen_kappa_score
'critical_success_index' | JaccardIndex | False | jaccard_score
'delta_p' | Markedness | False |
'diag_mass' | DiagMass | False |
'diagnostic_odds_ratio' | DiagnosticOddsRatio | False |
'dor' | DiagnosticOddsRatio | False |
'f1' | F1 | False | f1_score
'fall-out' | FalsePositiveRate | False |
'fall_out' | FalsePositiveRate | False |
'false_discovery_rate' | FalseDiscoveryRate | False |
'false_negative_rate' | FalseNegativeRate | False |
'false_omission_rate' | FalseOmissionRate | False |
'false_positive_rate' | FalsePositiveRate | False |
'fbeta' | FBeta | False | fbeta_score
'fdr' | FalseDiscoveryRate | False |
'fnr' | FalseNegativeRate | False |
'for' | FalseOmissionRate | False |
'fpr' | FalsePositiveRate | False |
'hit_rate' | TruePositiveRate | False |
'informedness' | Informedness | False |
'jaccard' | JaccardIndex | False | jaccard_score
'jaccard_index' | JaccardIndex | False | jaccard_score
'kappa' | CohensKappa | True | cohen_kappa_score
'ldor' | LogDiagnosticOddsRatio | False |
'lnlr' | LogNegativeLikelihoodRatio | False | class_likelihood_ratios
'log_diagnostic_odds_ratio' | LogDiagnosticOddsRatio | False |
'log_dor' | LogDiagnosticOddsRatio | False |
'log_negative_likelihood_ratio' | LogNegativeLikelihoodRatio | False | class_likelihood_ratios
'log_nlr' | LogNegativeLikelihoodRatio | False | class_likelihood_ratios
'log_plr' | LogPositiveLikelihoodRatio | False | class_likelihood_ratios
'log_positive_likelihood_ratio' | LogPositiveLikelihoodRatio | False | class_likelihood_ratios
'lplr' | LogPositiveLikelihoodRatio | False | class_likelihood_ratios
'markedness' | Markedness | False |
'matthews_corrcoef' | MatthewsCorrelationCoefficient | True | matthews_corrcoef
'matthews_correlation_coefficient' | MatthewsCorrelationCoefficient | True | matthews_corrcoef
'mcc' | MatthewsCorrelationCoefficient | True | matthews_corrcoef
'miss_rate' | FalseNegativeRate | False |
'model_bias' | ModelBias | False |
'negative_likelihood_ratio' | NegativeLikelihoodRatio | False | class_likelihood_ratios
'negative_predictive_value' | NegativePredictiveValue | False |
'nlr' | NegativeLikelihoodRatio | False | class_likelihood_ratios
'npv' | NegativePredictiveValue | False |
'p4' | P4 | False |
'phi' | MatthewsCorrelationCoefficient | True | matthews_corrcoef
'phi_coefficient' | MatthewsCorrelationCoefficient | True | matthews_corrcoef
'plr' | PositiveLikelihoodRatio | False | class_likelihood_ratios
'positive_likelihood_ratio' | PositiveLikelihoodRatio | False | class_likelihood_ratios
'positive_predictive_value' | PositivePredictiveValue | False |
'ppv' | PositivePredictiveValue | False |
'precision' | PositivePredictiveValue | False |
'prev_thresh' | PrevalenceThreshold | False |
'prevalence' | Prevalence | False |
'prevalence_threshold' | PrevalenceThreshold | False |
'pt' | PrevalenceThreshold | False |
'recall' | TruePositiveRate | False |
'selectivity' | TrueNegativeRate | False |
'sensitivity' | TruePositiveRate | False |
'specificity' | TrueNegativeRate | False |
'threat_score' | JaccardIndex | False | jaccard_score
'tnr' | TrueNegativeRate | False |
'tpr' | TruePositiveRate | False |
'true_negative_rate' | TrueNegativeRate | False |
'true_positive_rate' | TruePositiveRate | False |
'youden_j' | Informedness | False |
'youdenj' | Informedness | False |
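Several of the metrics without an sklearn counterpart have simple confusion-matrix definitions. For example, Informedness (Youden's J) is \(TPR + TNR - 1\); a minimal binary sketch, independent of this library's code:

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1]

# Binary informedness for positive class 1: TPR + TNR - 1.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
tpr = tp / (tp + fn)  # sensitivity / recall / hit rate
tnr = tn / (tn + fp)  # specificity / selectivity
print(tpr + tnr - 1)  # Youden's J statistic
```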
## Averaging
The following table lists all implemented metric averaging methods, by alias:
Alias | Averaging Method | sklearn
---|---|---
'binary' | SelectPositiveClass | binary
'geom' | GeometricMean |
'geometric' | GeometricMean |
'harm' | HarmonicMean |
'harmonic' | HarmonicMean |
'macro' | MacroAverage | macro
'macro_average' | MacroAverage | macro
'mean' | MacroAverage | macro
'micro' | WeightedAverage | weighted
'micro_average' | WeightedAverage | weighted
'select' | SelectPositiveClass | binary
'select_positive' | SelectPositiveClass | binary
'weighted' | WeightedAverage | weighted
'weighted_average' | WeightedAverage | weighted
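The averaging methods without an sklearn counterpart are standard means over the \(k\) per-class values; a brief sketch using NumPy and SciPy (the support weights in the last lines are an assumption, matching sklearn's `average="weighted"` behaviour):

```python
import numpy as np
from scipy.stats import gmean, hmean

per_class_scores = np.array([0.9, 0.8, 0.5])  # e.g. per-class P4 scores

print(per_class_scores.mean())  # 'macro' / 'mean': unweighted arithmetic mean
print(gmean(per_class_scores))  # 'geometric' / 'geom'
print(hmean(per_class_scores))  # 'harmonic' / 'harm'

# 'weighted' presumably weights each class by its support (number of
# true instances per class), as sklearn's average="weighted" does:
support = np.array([50, 30, 20])
print(np.average(per_class_scores, weights=support))
```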