Metrics and Averaging#

Metrics, within the scope of this project, summarize a model's performance on some test set. They do so by comparing the model's class predictions against a paired set of condition labels (i.e. the ground-truth classes). The value a metric function returns should tell you something about the model's classification performance: whether it is good, bad, or somewhere in between.

Metrics can be either:

  1. multiclass, in which case they produce a single value that combines all classes in one go
  2. binary, in which case they compute a value for each class individually

Usually, the former is a better indication of the overall performance of the model, whereas the latter provides finer-grained (usually supporting) detail. To convert a binary metric into a multiclass metric, it can be composed with an averaging method. The averaging method takes the \(k\)-dimensional array of metric values (where \(k\) is the number of classes) and yields a scalar value that combines all the per-class values.
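
The relationship between a binary metric and its averaged multiclass form can be sketched with sklearn, which backs several of the metrics listed below (the toy labels here are illustrative only):

```python
# Macro-averaging a binary (per-class) metric into a multiclass one.
import numpy as np
from sklearn.metrics import f1_score

y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 1, 1]

# Binary form: one F1 value per class (a k-dimensional array, k = 3 here).
per_class = f1_score(y_true, y_pred, average=None)

# Averaging method: collapse the k per-class values into a single scalar.
macro = per_class.mean()
assert np.isclose(macro, f1_score(y_true, y_pred, average="macro"))
```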

Interface#

Usually, you will not be interacting with the metrics themselves. Instead, this library provides users with high-level methods for defining metrics and collections of metrics. The easiest method for constructing metrics is by passing a metric syntax string.

A valid metric syntax string consists of (in order):

  1. [Required] A registered metric alias (see below)
  2. [Required] Any keyword arguments that need to be passed to the metric function
  3. [Optional] An @ symbol
  4. [Optional] A registered averaging method alias (see below)
  5. [Optional] Any keyword arguments that need to be passed to the averaging function

No spaces should be used. Instead, keyword arguments start with a + prepended to the key, followed by a = and the value. All together:

<metric-alias>+<arg-key>=<arg-val>@<avg-method-alias>+<arg-key>=<arg-value>

Only numeric (float, int) or string arguments are accepted. The strings "None", "True" and "False" are converted to their Pythonic counterpart. The order of the keyword arguments does not matter, as long as they appear in the correct block.
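
As a rough illustration of these rules, a syntax string could be split into its parts as follows. This is a hypothetical sketch; `parse_metric_string` is not part of this library's API:

```python
# Hypothetical parser for the metric syntax string described above.
def parse_metric_string(spec: str):
    # The optional "@" separates the metric block from the averaging block.
    metric_part, _, avg_part = spec.partition("@")

    def split_block(block: str):
        # Each "+" introduces one key=value keyword argument.
        alias, *raw_kwargs = block.split("+")
        kwargs = {}
        for raw in raw_kwargs:
            key, _, value = raw.partition("=")
            if value in {"None", "True", "False"}:
                # Pythonic counterparts of the special strings.
                kwargs[key] = {"None": None, "True": True, "False": False}[value]
            else:
                # Try int, then float; otherwise keep the raw string.
                for cast in (int, float):
                    try:
                        kwargs[key] = cast(value)
                        break
                    except ValueError:
                        kwargs[key] = value
        return alias, kwargs

    metric_alias, metric_kwargs = split_block(metric_part)
    avg_alias, avg_kwargs = split_block(avg_part) if avg_part else (None, {})
    return metric_alias, metric_kwargs, avg_alias, avg_kwargs

parse_metric_string("fbeta+beta=3.0@binary+positive_class=2")
# → ('fbeta', {'beta': 3.0}, 'binary', {'positive_class': 2})
```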

Examples#

  1. f1: the F1 score
  2. mcc: the MCC score
  3. ppv: the Positive Predictive Value
  4. precision: also the Positive Predictive Value, as it's a registered alias (see below)
  5. fbeta+beta=3.0: the F3 score
  6. f1@macro: the macro averaged F1 score
  7. ba+adjusted=True@binary+positive_class=2: the chance-correct balanced accuracy score, but only for class 2 (starting at 0)
  8. p4@geometric: the geometric mean of the P4 scores
  9. mcc@harmonic: the MCC score; since it is already a multiclass metric, the averaging is ignored
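
For intuition only, a few of the strings above correspond roughly to the following sklearn calls (the library's own dispatch may differ; the labels are illustrative):

```python
from sklearn.metrics import f1_score, fbeta_score

y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 1, 1]

# "f1@macro" ~ the macro-averaged F1 score
f1_macro = f1_score(y_true, y_pred, average="macro")

# "fbeta+beta=3.0@weighted" ~ the support-weighted F3 score
f3_weighted = fbeta_score(y_true, y_pred, beta=3.0, average="weighted")

# "f1@binary+positive_class=2" ~ the F1 score of class 2 only
f1_class2 = f1_score(y_true, y_pred, labels=[2], average=None)[0]
```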

Metrics#

The following lists all implemented metrics, by alias:

| Alias | Metric | Multiclass | sklearn |
| --- | --- | --- | --- |
| 'acc' | Accuracy | True | accuracy_score |
| 'accuracy' | Accuracy | True | accuracy_score |
| 'ba' | BalancedAccuracy | True | balanced_accuracy_score |
| 'balanced_accuracy' | BalancedAccuracy | True | balanced_accuracy_score |
| 'bookmaker_informedness' | Informedness | False | |
| 'cohen_kappa' | CohensKappa | True | cohen_kappa_score |
| 'critical_success_index' | JaccardIndex | False | jaccard_score |
| 'delta_p' | Markedness | False | |
| 'diag_mass' | DiagMass | False | |
| 'diagnostic_odds_ratio' | DiagnosticOddsRatio | False | |
| 'dor' | DiagnosticOddsRatio | False | |
| 'f1' | F1 | False | f1_score |
| 'fall-out' | FalsePositiveRate | False | |
| 'fall_out' | FalsePositiveRate | False | |
| 'false_discovery_rate' | FalseDiscoveryRate | False | |
| 'false_negative_rate' | FalseNegativeRate | False | |
| 'false_omission_rate' | FalseOmissionRate | False | |
| 'false_positive_rate' | FalsePositiveRate | False | |
| 'fbeta' | FBeta | False | fbeta_score |
| 'fdr' | FalseDiscoveryRate | False | |
| 'fnr' | FalseNegativeRate | False | |
| 'for' | FalseOmissionRate | False | |
| 'fpr' | FalsePositiveRate | False | |
| 'hit_rate' | TruePositiveRate | False | |
| 'informedness' | Informedness | False | |
| 'jaccard' | JaccardIndex | False | jaccard_score |
| 'jaccard_index' | JaccardIndex | False | jaccard_score |
| 'kappa' | CohensKappa | True | cohen_kappa_score |
| 'ldor' | LogDiagnosticOddsRatio | False | |
| 'lnlr' | LogNegativeLikelihoodRatio | False | class_likelihood_ratios |
| 'log_diagnostic_odds_ratio' | LogDiagnosticOddsRatio | False | |
| 'log_dor' | LogDiagnosticOddsRatio | False | |
| 'log_negative_likelihood_ratio' | LogNegativeLikelihoodRatio | False | class_likelihood_ratios |
| 'log_nlr' | LogNegativeLikelihoodRatio | False | class_likelihood_ratios |
| 'log_plr' | LogPositiveLikelihoodRatio | False | class_likelihood_ratios |
| 'log_positive_likelihood_ratio' | LogPositiveLikelihoodRatio | False | class_likelihood_ratios |
| 'lplr' | LogPositiveLikelihoodRatio | False | class_likelihood_ratios |
| 'markedness' | Markedness | False | |
| 'matthews_corrcoef' | MatthewsCorrelationCoefficient | True | matthews_corrcoef |
| 'matthews_correlation_coefficient' | MatthewsCorrelationCoefficient | True | matthews_corrcoef |
| 'mcc' | MatthewsCorrelationCoefficient | True | matthews_corrcoef |
| 'miss_rate' | FalseNegativeRate | False | |
| 'model_bias' | ModelBias | False | |
| 'negative_likelihood_ratio' | NegativeLikelihoodRatio | False | class_likelihood_ratios |
| 'negative_predictive_value' | NegativePredictiveValue | False | |
| 'nlr' | NegativeLikelihoodRatio | False | class_likelihood_ratios |
| 'npv' | NegativePredictiveValue | False | |
| 'p4' | P4 | False | |
| 'phi' | MatthewsCorrelationCoefficient | True | matthews_corrcoef |
| 'phi_coefficient' | MatthewsCorrelationCoefficient | True | matthews_corrcoef |
| 'plr' | PositiveLikelihoodRatio | False | class_likelihood_ratios |
| 'positive_likelihood_ratio' | PositiveLikelihoodRatio | False | class_likelihood_ratios |
| 'positive_predictive_value' | PositivePredictiveValue | False | |
| 'ppv' | PositivePredictiveValue | False | |
| 'precision' | PositivePredictiveValue | False | |
| 'prev_thresh' | PrevalenceThreshold | False | |
| 'prevalence' | Prevalence | False | |
| 'prevalence_threshold' | PrevalenceThreshold | False | |
| 'pt' | PrevalenceThreshold | False | |
| 'recall' | TruePositiveRate | False | |
| 'selectivity' | TrueNegativeRate | False | |
| 'sensitivity' | TruePositiveRate | False | |
| 'specificity' | TrueNegativeRate | False | |
| 'threat_score' | JaccardIndex | False | jaccard_score |
| 'tnr' | TrueNegativeRate | False | |
| 'tpr' | TruePositiveRate | False | |
| 'true_negative_rate' | TrueNegativeRate | False | |
| 'true_positive_rate' | TruePositiveRate | False | |
| 'youden_j' | Informedness | False | |
| 'youdenj' | Informedness | False | |

Averaging#

The following lists all implemented metric averaging methods, by alias:

| Alias | Averaging method | sklearn |
| --- | --- | --- |
| 'binary' | SelectPositiveClass | binary |
| 'geom' | GeometricMean | |
| 'geometric' | GeometricMean | |
| 'harm' | HarmonicMean | |
| 'harmonic' | HarmonicMean | |
| 'macro' | MacroAverage | macro |
| 'macro_average' | MacroAverage | macro |
| 'mean' | MacroAverage | macro |
| 'micro' | WeightedAverage | weighted |
| 'micro_average' | WeightedAverage | weighted |
| 'select' | SelectPositiveClass | binary |
| 'select_positive' | SelectPositiveClass | binary |
| 'weighted' | WeightedAverage | weighted |
| 'weighted_average' | WeightedAverage | weighted |
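
A minimal sketch of what these averaging methods compute, assuming each receives the \(k\)-dimensional array of per-class metric values (the values and class supports here are made up for illustration):

```python
import numpy as np

per_class = np.array([0.9, 0.6, 0.75])  # e.g. per-class F1 for k = 3 classes
support = np.array([50, 30, 20])        # class frequencies in the test set

macro = per_class.mean()                              # 'macro': unweighted mean
weighted = np.average(per_class, weights=support)     # 'weighted': support-weighted mean
geometric = per_class.prod() ** (1 / per_class.size)  # 'geom': geometric mean
harmonic = per_class.size / (1 / per_class).sum()     # 'harm': harmonic mean
```

Note that for any set of per-class values the harmonic mean is at most the geometric mean, which in turn is at most the macro (arithmetic) average, so the stricter means penalize a single weak class more heavily.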