# Metrics and Averaging
A metric, within the scope of this project, summarizes a model's performance on some test set. It does so by comparing the model's class predictions against a paired set of condition labels (i.e., the ground-truth classes). The value a metric function returns should tell you something about the model's classification performance: whether it is good, bad, or something in between.
Metrics can be either:
- multiclass, in which case they produce a single value that combines all classes in one go
- binary, in which case they compute a value for each class individually
Usually, the former is a better indication of the model's overall performance, whereas the latter provides more fine-grained (usually supporting) detail. To convert a binary metric into a multiclass metric, it can be composed with an averaging method. The averaging method takes in the \(k\)-dimensional array of metric values (where \(k\) is the number of classes) and yields a scalar value that combines all the per-class values, as in the sketch below.
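For instance, a minimal sketch (plain NumPy, independent of this library's internals) of how macro averaging collapses the per-class values of a binary metric into one multiclass value:

```python
import numpy as np

# Hypothetical per-class recall values for a k=3 class problem,
# i.e. what a binary metric yields: one value per class.
per_class_recall = np.array([0.90, 0.75, 0.60])

# Macro averaging combines the k per-class values into a single
# scalar by taking their unweighted arithmetic mean.
macro_recall = per_class_recall.mean()
print(macro_recall)  # 0.75
```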
## Interface
Usually, you will not be interacting with the metrics themselves. Instead, this library provides high-level methods for defining metrics and collections of metrics. The easiest way to construct a metric is to pass a metric syntax string.
A valid metric syntax string consists of (in order):
- [Required] A registered metric alias (see below)
- [Required] Any keyword arguments that need to be passed to the metric function
- [Optional] An `@` symbol
- [Optional] A registered averaging method alias (see below)
- [Optional] Any keyword arguments that need to be passed to the averaging function
No spaces should be used. Instead, keyword arguments start with a `+` prepended to the key, followed by a `=` and the value. All together: `alias+key=value@averaging_alias+key=value`.
Only numeric (float, int) or string arguments are accepted. The strings "None", "True" and "False" are converted to their Pythonic counterparts. The order of the keyword arguments does not matter, as long as they appear in the correct block.
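To make the grammar concrete, here is an illustrative sketch of how such a string could be decomposed; `parse_metric_string` is a hypothetical helper written for this example, not this library's actual parser:

```python
def _convert(value: str):
    # Documented conversions: "None"/"True"/"False" become their
    # Pythonic counterparts; numeric strings become int or float.
    literals = {"None": None, "True": True, "False": False}
    if value in literals:
        return literals[value]
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value  # plain string argument


def parse_metric_string(syntax: str):
    # Split into the metric block and the optional averaging block,
    # then split each block into its alias and '+key=value' arguments.
    blocks = []
    for block in syntax.split("@"):
        alias, *kwargs = block.split("+")
        blocks.append(
            (alias, {k: _convert(v) for k, v in (kw.split("=", 1) for kw in kwargs)})
        )
    return blocks


print(parse_metric_string("ba+adjusted=True@binary+positive_class=2"))
# [('ba', {'adjusted': True}), ('binary', {'positive_class': 2})]
```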
## Examples
- `f1`: the F1 score
- `mcc`: the MCC score
- `ppv`: the Positive Predictive Value
- `precision`: also the Positive Predictive Value, as it's a registered alias (see below)
- `fbeta+beta=3.0`: the F3 score
- `f1@macro`: the macro-averaged F1 score
- `ba+adjusted=True@binary+positive_class=2`: the chance-corrected balanced accuracy score, but only for class 2 (counting from 0)
- `p4@geometric`: the geometric mean of the per-class P4 scores
- `mcc@harmonic`: the MCC score; since it's already a multiclass metric, the averaging is ignored
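Where both the metric and the averaging method have sklearn counterparts (see the tables below), the result should line up with the corresponding sklearn call. An illustrative check for `f1@macro`, shown here only on the sklearn side:

```python
from sklearn.metrics import f1_score

y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 0, 0]

# `f1@macro`: the binary F1 score is computed per class and the
# per-class values are combined by an unweighted mean.
print(f1_score(y_true, y_pred, average="macro"))
```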
## Metrics
The following table lists all implemented metrics, by alias:
Alias | Metric | Multiclass | sklearn
---|---|---|---
'acc' | Accuracy | True | accuracy_score
'accuracy' | Accuracy | True | accuracy_score
'ba' | BalancedAccuracy | True | balanced_accuracy_score
'balanced_accuracy' | BalancedAccuracy | True | balanced_accuracy_score
'bookmaker_informedness' | Informedness | False |
'cohen_kappa' | CohensKappa | True | cohen_kappa_score
'critical_success_index' | JaccardIndex | False | jaccard_score
'delta_p' | Markedness | False |
'diag_mass' | DiagMass | False |
'diagnostic_odds_ratio' | DiagnosticOddsRatio | False |
'dor' | DiagnosticOddsRatio | False |
'f1' | F1 | False | f1_score
'fall-out' | FalsePositiveRate | False |
'fall_out' | FalsePositiveRate | False |
'false_discovery_rate' | FalseDiscoveryRate | False |
'false_negative_rate' | FalseNegativeRate | False |
'false_omission_rate' | FalseOmissionRate | False |
'false_positive_rate' | FalsePositiveRate | False |
'fbeta' | FBeta | False | fbeta_score
'fdr' | FalseDiscoveryRate | False |
'fnr' | FalseNegativeRate | False |
'for' | FalseOmissionRate | False |
'fpr' | FalsePositiveRate | False |
'hit_rate' | TruePositiveRate | False |
'informedness' | Informedness | False |
'jaccard' | JaccardIndex | False | jaccard_score
'jaccard_index' | JaccardIndex | False | jaccard_score
'kappa' | CohensKappa | True | cohen_kappa_score
'ldor' | LogDiagnosticOddsRatio | False |
'lnlr' | LogNegativeLikelihoodRatio | False | class_likelihood_ratios
'log_diagnostic_odds_ratio' | LogDiagnosticOddsRatio | False |
'log_dor' | LogDiagnosticOddsRatio | False |
'log_negative_likelihood_ratio' | LogNegativeLikelihoodRatio | False | class_likelihood_ratios
'log_nlr' | LogNegativeLikelihoodRatio | False | class_likelihood_ratios
'log_plr' | LogPositiveLikelihoodRatio | False | class_likelihood_ratios
'log_positive_likelihood_ratio' | LogPositiveLikelihoodRatio | False | class_likelihood_ratios
'lplr' | LogPositiveLikelihoodRatio | False | class_likelihood_ratios
'markedness' | Markedness | False |
'matthews_corrcoef' | MatthewsCorrelationCoefficient | True | matthews_corrcoef
'matthews_correlation_coefficient' | MatthewsCorrelationCoefficient | True | matthews_corrcoef
'mcc' | MatthewsCorrelationCoefficient | True | matthews_corrcoef
'miss_rate' | FalseNegativeRate | False |
'model_bias' | ModelBias | False |
'negative_likelihood_ratio' | NegativeLikelihoodRatio | False | class_likelihood_ratios
'negative_predictive_value' | NegativePredictiveValue | False |
'nlr' | NegativeLikelihoodRatio | False | class_likelihood_ratios
'npv' | NegativePredictiveValue | False |
'p4' | P4 | False |
'phi' | MatthewsCorrelationCoefficient | True | matthews_corrcoef
'phi_coefficient' | MatthewsCorrelationCoefficient | True | matthews_corrcoef
'plr' | PositiveLikelihoodRatio | False | class_likelihood_ratios
'positive_likelihood_ratio' | PositiveLikelihoodRatio | False | class_likelihood_ratios
'positive_predictive_value' | PositivePredictiveValue | False |
'ppv' | PositivePredictiveValue | False |
'precision' | PositivePredictiveValue | False |
'prev_thresh' | PrevalenceThreshold | False |
'prevalence' | Prevalence | False |
'prevalence_threshold' | PrevalenceThreshold | False |
'pt' | PrevalenceThreshold | False |
'recall' | TruePositiveRate | False |
'selectivity' | TrueNegativeRate | False |
'sensitivity' | TruePositiveRate | False |
'specificity' | TrueNegativeRate | False |
'threat_score' | JaccardIndex | False | jaccard_score
'tnr' | TrueNegativeRate | False |
'tpr' | TruePositiveRate | False |
'true_negative_rate' | TrueNegativeRate | False |
'true_positive_rate' | TruePositiveRate | False |
'youden_j' | Informedness | False |
'youdenj' | Informedness | False |
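Several of the metrics without an sklearn counterpart have simple confusion-matrix definitions. For example, Informedness (Youden's J) is \(TPR + TNR - 1\); a minimal binary sketch, independent of this library's code:

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1]

# Binary informedness for positive class 1: TPR + TNR - 1.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
tpr = tp / (tp + fn)  # sensitivity / recall / hit rate
tnr = tn / (tn + fp)  # specificity / selectivity
print(tpr + tnr - 1)  # Youden's J statistic
```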
## Averaging
The following table lists all implemented metric averaging methods, by alias:
Alias | Averaging Method | sklearn
---|---|---
'binary' | SelectPositiveClass | binary
'geom' | GeometricMean |
'geometric' | GeometricMean |
'harm' | HarmonicMean |
'harmonic' | HarmonicMean |
'macro' | MacroAverage | macro
'macro_average' | MacroAverage | macro
'mean' | MacroAverage | macro
'micro' | WeightedAverage | weighted
'micro_average' | WeightedAverage | weighted
'select' | SelectPositiveClass | binary
'select_positive' | SelectPositiveClass | binary
'weighted' | WeightedAverage | weighted
'weighted_average' | WeightedAverage | weighted
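The averaging methods without an sklearn counterpart are standard means over the \(k\) per-class values; a brief sketch using NumPy and SciPy (the support weights in the last lines are an assumption, matching sklearn's `average="weighted"` behaviour):

```python
import numpy as np
from scipy.stats import gmean, hmean

per_class_scores = np.array([0.9, 0.8, 0.5])  # e.g. per-class P4 scores

print(per_class_scores.mean())  # 'macro' / 'mean': unweighted arithmetic mean
print(gmean(per_class_scores))  # 'geometric' / 'geom'
print(hmean(per_class_scores))  # 'harmonic' / 'harm'

# 'weighted' presumably weights each class by its support (number of
# true instances per class), as sklearn's average="weighted" does:
support = np.array([50, 30, 20])
print(np.average(per_class_scores, weights=support))
```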