Metrics

Abstract Base Class#

Metric #

The abstract base class for metrics.

Properties should be implemented as class attributes in derived metrics.

The compute_metric method needs to be implemented by every derived metric.

Attributes#

full_name abstractmethod instance-attribute property #

A human-readable name for this metric.

is_multiclass abstractmethod instance-attribute property #

Whether this metric computes a value for each class individually or a single value for all classes at once.

bounds abstractmethod instance-attribute property #

A tuple of the minimum and maximum possible value for this metric to take.

Can be infinite.

dependencies abstractmethod instance-attribute property #

All metrics upon which this metric depends.

Used to generate a computation schedule, such that no metric is calculated before its dependencies. The dependencies must match the compute_metric signature. This is checked during class definition.

sklearn_equivalent abstractmethod instance-attribute property #

The sklearn equivalent function, if applicable

aliases abstractmethod instance-attribute property #

A list of all valid aliases for this metric. Can be used when creating metric syntax strings.

Functions#

compute_metric abstractmethod #

Computes the metric values from its dependencies.
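
For orientation, here is a minimal sketch of what a derived metric could look like. It assumes the class attributes documented above and that compute_metric receives its dependencies by name; the actual import path and method signature may differ.

    # Hypothetical subclass sketch. 'Metric' is the ABC documented above; its
    # import path is package-specific and omitted here. The compute_metric
    # signature (dependencies passed by name) is an assumption based on the docs.
    class ErrorRate(Metric):
        aliases = ['error_rate', 'err']    # valid names for metric syntax strings
        bounds = (0.0, 1.0)                # minimum and maximum attainable value
        dependencies = ('diag_mass',)      # must match the compute_metric signature
        is_multiclass = True               # one value covering all classes
        sklearn_equivalent = None          # no direct sklearn counterpart

        def compute_metric(self, diag_mass):
            # diag_mass holds the per-class TP / N; its sum is the accuracy,
            # so one minus that sum is an error rate (illustrative only).
            return 1.0 - diag_mass.sum()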

AveragedMetric #

The abstract base class for the composition of any instance of Metric with any instance of Averaging.

Parameters:

  • metric (Metric) –

    a binary metric

  • averaging (Averaging) –

    an averaging method

Attributes#

aliases property #

A list of all valid aliases for this metric.

Constructed from the product of all aliases of the Metric and the Averaging method.

Can be used when creating metric syntax strings.

is_multiclass property #

Whether this metric computes a value for each class individually or a single value for all classes at once.

An AveragedMetric is always multiclass.

bounds property #

A tuple of the minimum and maximum possible value for this metric to take. Can be infinite.

dependencies property #

All metrics upon which this AveragedMetric depends.

Constructed from the union of all Metric and AveragingMethod dependencies.

Used to generate a computation schedule, such that no metric is calculated before its dependencies.

The dependencies must match the compute_metric signature.

This is checked during class definition.

sklearn_equivalent property #

The sklearn equivalent function, if applicable

Metric Instances#

DiagMass #

Bases: Metric

Computes the mass on the diagonal of the normalized confusion matrix.

It is defined as the rate of true positives to all entries:

\[\mathtt{diag}(\mathbf{CM})=TP / N\]

where \(TP\) are the true positives, and \(N\) is the total number of predictions.

This metric is primarily used as an intermediate value for other metrics, and says relatively little on its own.

Not to be confused with the True Positive Rate.
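
As an illustration of the definition (using NumPy and scikit-learn directly, not this package's API), the diagonal mass can be reproduced from a normalized confusion matrix:

    import numpy as np
    from sklearn.metrics import confusion_matrix

    y_true = [0, 1, 1, 2, 2, 2]
    y_pred = [0, 1, 2, 2, 2, 1]

    cm = confusion_matrix(y_true, y_pred)   # rows: condition, columns: prediction
    norm_cm = cm / cm.sum()                 # normalized confusion matrix
    diag_mass = np.diag(norm_cm)            # per-class TP / N
    print(diag_mass, diag_mass.sum())       # summing over classes gives the accuracy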

aliases = ['diag_mass'] #

bounds = (0.0, 1.0) #

dependencies = ('norm_confusion_matrix',) #

is_multiclass = False #

sklearn_equivalent = None #

Prevalence #

Bases: Metric

Computes the marginal distribution of condition occurrence. Also known as the prevalence.

It can be defined as the rate of condition positives to all predictions:

\[\mathtt{Prev}=P / N\]

where \(P\) is the count of condition positives, and \(N\) is the total number of predictions.

This metric is primarily used as an intermediate value for other metrics, and says relatively little on its own.

aliases = ['prevalence'] #

bounds = (0.0, 1.0) #

dependencies = ('p_condition',) #

is_multiclass = False #

sklearn_equivalent = None #

ModelBias #

Bases: Metric

Computes the marginal distribution of prediction occurrence. Also known as the model bias.

It can be defined as the rate of predicted positives to all predictions:

\[\mathtt{Bias}=PP / N\]

where \(PP\) is the count of predicted positives, and \(N\) is the total number of predictions.

This metric is primarily used as an intermediate value for other metrics, and says relatively little on its own.

aliases = ['model_bias'] #

bounds = (0.0, 1.0) #

dependencies = ('p_pred',) #

is_multiclass = False #

sklearn_equivalent = None #

TruePositiveRate #

Bases: Metric

Computes the True Positive Rate, also known as recall or sensitivity.

It is defined as the ratio of correctly predicted positives to all condition positives:

\[\mathtt{TPR}=TP / P\]

where \(TP\) is the count of true positives, and \(P\) the count of condition positives.

Essentially, out of all condition positives, how many were correctly predicted. Can be seen as a metric measuring retrieval.
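
For reference, the per-class value matches scikit-learn's recall_score; a small sketch using sklearn directly rather than this package's API:

    from sklearn.metrics import recall_score

    y_true = [0, 1, 1, 2, 2, 2]
    y_pred = [0, 1, 2, 2, 2, 1]

    # average=None returns one TPR per class; average='macro' would mirror 'recall@macro'.
    print(recall_score(y_true, y_pred, average=None))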

Examples:

  • tpr
  • recall@macro
Read more:
  1. scikit-learn
  2. Wikipedia

aliases = ['true_positive_rate', 'sensitivity', 'recall', 'hit_rate', 'tpr'] #

bounds = (0.0, 1.0) #

dependencies = ('p_pred_given_condition',) #

is_multiclass = False #

sklearn_equivalent = None #

FalseNegativeRate #

Bases: Metric

Computes the False Negative Rate, also known as the miss-rate.

It is defined as the ratio of false negatives to condition positives:

\[\mathtt{FNR}=FN / (TP + FN)\]

where \(TP\) are the true positives, and \(FN\) are the false negatives.

Examples:

  • fnr
  • false_negative_rate@macro
Read more:
  1. Wikipedia

aliases = ['false_negative_rate', 'miss_rate', 'fnr'] #

bounds = (0.0, 1.0) #

dependencies = ('true_positive_rate',) #

is_multiclass = False #

sklearn_equivalent = None #

PositivePredictiveValue #

Bases: Metric

Computes the Positive Predictive Value, also known as precision.

It is defined as the ratio of true positives to predicted positives:

\[\mathtt{PPV}=TP / (TP + FP)\]

where \(TP\) is the count of true positives, and \(FP\) the count of falsely predicted positives.

It is the complement of the False Discovery Rate, \(PPV=1-FDR\).

Examples:

  • ppv
  • precision@macro
Read more:
  1. scikit-learn
  2. Wikipedia

aliases = ['positive_predictive_value', 'precision', 'ppv'] #

bounds = (0.0, 1.0) #

dependencies = ('p_condition_given_pred',) #

is_multiclass = False #

sklearn_equivalent = None #

FalseDiscoveryRate #

Bases: Metric

Computes the False Discovery Rate.

It is defined as the ratio of falsely predicted positives to predicted positives:

\[\mathtt{FDR}=FP / (TP + FP)\]

where \(TP\) is the count of true positives, and \(FP\) the count of falsely predicted positives.

It is the complement of the Positive Predictive Value, \(FDR=1-PPV\).

Examples:

  • fdr
  • false_discovery_rate@macro
Read more:
  1. Wikipedia

aliases = ['false_discovery_rate', 'fdr'] #

bounds = (0.0, 1.0) #

dependencies = ('positive_predictive_value',) #

is_multiclass = False #

sklearn_equivalent = None #

FalsePositiveRate #

Bases: Metric

Computes the False Positive Rate, the probability of false alarm.

Also known as the fall-out.

It is defined as the ratio of falsely predicted positives to condition negatives:

\[\mathtt{FPR}=FP / (TN + FP)\]

where \(TN\) is the count of true negatives, and \(FP\) the count of falsely predicted positives.

It is the complement of the True Negative Rate, \(FPR=1-TNR\).

Examples:

  • fpr
  • fall-out@macro
Read more:
  1. Wikipedia

aliases = ['false_positive_rate', 'fall-out', 'fall_out', 'fpr'] #

bounds = (0.0, 1.0) #

dependencies = ('diag_mass', 'p_pred', 'p_condition') #

is_multiclass = False #

sklearn_equivalent = None #

TrueNegativeRate #

Bases: Metric

Computes the True Negative Rate, also known as specificity or selectivity.

It is defined as the ratio of true predicted negatives to condition negatives:

\[\mathtt{TNR}=TN / (TN + FP)\]

where \(TN\) is the count of true negatives, and \(FP\) the count of falsely predicted positives.

It is the complement of the False Positive Rate, \(TNR=1-FPR\).

Examples:

  • tnr
  • selectivity@macro
Read more:
  1. Wikipedia

aliases = ['true_negative_rate', 'specificity', 'selectivity', 'tnr'] #

bounds = (0.0, 1.0) #

dependencies = ('false_positive_rate',) #

is_multiclass = False #

sklearn_equivalent = None #

FalseOmissionRate #

Bases: Metric

Computes the False Omission Rate.

It is defined as the ratio of falsely predicted negatives to all predicted negatives:

\[\mathtt{FOR}=FN / (TN + FN)\]

where \(TN\) is the count of true negatives, and \(FN\) the count of falsely predicted negatives.

It is the complement of the Negative Predictive Value, \(FOR=1-NPV\).

Examples:

  • for
  • false_omission_rate@macro
Read more:
  1. Wikipedia

aliases = ['false_omission_rate', 'for'] #

bounds = (0.0, 1.0) #

dependencies = ('p_condition', 'p_pred', 'diag_mass') #

is_multiclass = False #

sklearn_equivalent = None #

NegativePredictiveValue #

Bases: Metric

Computes the Negative Predictive Value.

It is defined as the ratio of true negatives to all predicted negatives:

\[\mathtt{NPV}=TN / (TN + FN)\]

where \(TN\) are the true negatives, and \(FN\) are the falsely predicted negatives.

It is the complement of the False Omission Rate, \(NPV=1-FOR\).

Examples:

  • npv
  • negative_predictive_value@macro
Read more:
  1. Wikipedia

aliases = ['negative_predictive_value', 'npv'] #

bounds = (0.0, 1.0) #

dependencies = ('false_omission_rate',) #

is_multiclass = False #

sklearn_equivalent = None #

Accuracy #

Bases: Metric

Computes the multiclass accuracy score.

It is defined as the rate of correct classifications to all classifications:

\[\mathtt{Acc}=(TP + TN) / N\]

where \(TP\) are the true positives, \(TN\) the true negatives and \(N\) the total number of predictions.

Possible values lie in the range [0.0, 1.0], with larger values denoting better performance. The value of a random classifier is dependent on the label distribution, which makes accuracy especially susceptible to class imbalance. It is also not directly comparable across datasets.
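
A minimal example of the definition, using the sklearn equivalent directly rather than this package's API:

    from sklearn.metrics import accuracy_score

    y_true = [0, 1, 1, 2, 2, 2]
    y_pred = [0, 1, 2, 2, 2, 1]

    print(accuracy_score(y_true, y_pred))  # 4 correct out of 6 predictions -> 0.666...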

Examples:

  • acc
  • accuracy@macro
Read more:
  1. scikit-learn
  2. Wikipedia

aliases = ['acc', 'accuracy'] #

bounds = (0.0, 1.0) #

dependencies = ('diag_mass',) #

is_multiclass = True #

sklearn_equivalent = 'accuracy_score' #

BalancedAccuracy #

Bases: Metric

Computes the balanced accuracy score.

It is defined as the arithmetic average of the per-class true-positive rate:

\[\mathtt{BA}=\frac{1}{|C|}\sum_{c} TPR_{c}\]

where \(TPR_{c}\) is the true positive rate (recall) of class \(c\), and \(|C|\) the number of classes.

Possible values lie in the range [0.0, 1.0], with larger values denoting better performance. Unlike accuracy, balanced accuracy can be 'chance corrected', such that random performance yields a score of 0.0. This can be achieved by setting adjusted=True.
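
A small sketch of both variants, using the sklearn equivalent directly rather than this package's API:

    from sklearn.metrics import balanced_accuracy_score

    y_true = [0, 1, 1, 2, 2, 2]
    y_pred = [0, 1, 2, 2, 2, 1]

    print(balanced_accuracy_score(y_true, y_pred))                 # mean per-class TPR
    print(balanced_accuracy_score(y_true, y_pred, adjusted=True))  # chance-corrected variant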

Examples:

  • ba
  • balanced_accuracy@macro
  • ba+adjusted=True
Read more:
  1. scikit-learn

Parameters:

  • adjusted (bool, default: False ) –

    whether the chance-corrected variant is computed. Defaults to False.

aliases = ['ba', 'balanced_accuracy'] #

bounds = (0.0, 1.0) #

dependencies = ('tpr', 'p_condition') #

is_multiclass = True #

sklearn_equivalent = 'balanced_accuracy_score' #

MatthewsCorrelationCoefficient #

Bases: Metric

Computes the multiclass Matthews Correlation Coefficient (MCC), also known as the phi coefficient.

Goes by a variety of names, depending on the application scenario.

A metric that holistically combines many different classification metrics.

A perfect classifier scores 1.0, a random classifier 0.0. Values smaller than 0 indicate worse than random performance.

Its absolute value is proportional to the square root of the Chi-square test statistic.

Quoting Wikipedia:

Some scientists claim the Matthews correlation coefficient to be the most informative single score to establish the quality of a binary classifier prediction in a confusion matrix context.
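
A minimal example using the sklearn equivalent directly rather than this package's API:

    from sklearn.metrics import matthews_corrcoef

    y_true = [0, 1, 1, 2, 2, 2]
    y_pred = [0, 1, 2, 2, 2, 1]

    # 1.0 is perfect, 0.0 is random, values below 0 are worse than random.
    print(matthews_corrcoef(y_true, y_pred))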

Examples:

  • mcc
  • phi
Read more:
  1. scikit-learn
  2. Wikipedia

aliases = ['mcc', 'matthews_corrcoef', 'matthews_correlation_coefficient', 'phi', 'phi_coefficient'] #

bounds = (-1.0, 1.0) #

dependencies = ('diag_mass', 'p_condition', 'p_pred') #

is_multiclass = True #

sklearn_equivalent = 'matthews_corrcoef' #

CohensKappa #

Bases: Metric

Computes the multiclass Cohen's Kappa coefficient.

Commonly used to quantify inter-annotator agreement, Cohen's kappa can also be used to quantify the quality of a predictor.

It is defined as

\[\kappa=\frac{p_o-p_e}{1-p_e}\]

where \(p_o\) is the observed agreement and \(p_e\) the expected agreement due to chance. Perfect agreement yields a score of 1, with a score of 0 corresponding to random performance. Several guidelines exist to interpret the magnitude of the score.
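
A minimal example using the sklearn equivalent directly rather than this package's API:

    from sklearn.metrics import cohen_kappa_score

    # The two arguments can be two annotators, or a model's predictions and the reference labels.
    rater_a = [0, 1, 1, 2, 2, 2]
    rater_b = [0, 1, 2, 2, 2, 1]

    print(cohen_kappa_score(rater_a, rater_b))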

Examples:

  • kappa
  • cohen_kappa
Read more:
  1. sklearn
  2. Wikipedia

aliases = ['kappa', 'cohen_kappa'] #

bounds = (-1.0, 1.0) #

dependencies = ('diag_mass', 'p_condition', 'p_pred') #

is_multiclass = True #

sklearn_equivalent = 'cohen_kappa_score' #

F1 #

Bases: Metric

Computes the univariate \(F_{1}\)-score.

It is defined as:

\[\mathtt{F}_{1}=2\dfrac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}\]

or simply put, the harmonic mean between precision (PPV) and recall (TPR).

It is an exceedingly common metric used to evaluate machine learning performance. It is closely related to the Precision-Recall curve, an analysis performed over varying decision thresholds.

The 1 in the name comes from the \(\beta\) parameter that weights precision and recall, here set to 1. See the FBeta metric.

The \(F_{1}\)-score is susceptible to class imbalance. Values fall in the range [0, 1]. A random classifier that predicts a class with probability \(p\) achieves a performance of

\[2\dfrac{\text{prevalence}\cdot p}{\text{prevalence}+p}.\]

Since this value is maximized for \(p=1\), Flach & Kull recommend comparing performance not to a random classifier, but the 'always-on' classifier (perfect recall but poor precision). See the F1Gain metric.
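
The harmonic-mean definition can be checked against the sklearn equivalent; a small sketch using sklearn directly rather than this package's API:

    from sklearn.metrics import f1_score, precision_score, recall_score

    y_true = [0, 0, 1, 1, 1, 0]
    y_pred = [0, 1, 1, 1, 0, 0]

    p = precision_score(y_true, y_pred)   # PPV of the positive class
    r = recall_score(y_true, y_pred)      # TPR of the positive class
    print(2 * p * r / (p + r))            # harmonic mean of precision and recall
    print(f1_score(y_true, y_pred))       # matches the value above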

Examples:

  • f1
  • f1@macro
Read more:
  1. sklearn
  2. Wikipedia

aliases = ['f1'] #

bounds = (0.0, 1.0) #

dependencies = ('ppv', 'tpr') #

is_multiclass = False #

sklearn_equivalent = 'f1_score' #

FBeta #

Bases: Metric

Computes the univariate \(F_{\beta}\)-score.

It is defined as:

\[\mathtt{F}_{\beta}=(1+\beta^2)\dfrac{\text{precision} \cdot \text{recall}}{\beta^2\cdot\text{precision} + \text{recall}}\]

or simply put, the weighted harmonic mean between precision (PPV) and recall (TPR).

The value of \(\beta\) determines to which degree a user deems recall more important than precision. Larger values (\(\beta > 1\)) weight recall more, whereas lower values weight precision more. A value of 1 corresponds to equal weighting, see the F1 metric.

The \(F_{\beta}\)-score is susceptible to class imbalance. Values fall in the range [0, 1]. A random classifier that predicts a class with probability \(p\) achieves a performance of

\[(1+\beta^2)\dfrac{\text{prevalence}\cdot p}{\beta^2\cdot\text{prevalence}+p}.\]

Since this value is maximized for \(p=1\), Flach & Kull recommend comparing performance not to a random classifier, but the 'always-on' classifier (perfect recall but poor precision). See the FBetaGain metric.
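
A small sketch of the effect of \(\beta\), using the sklearn equivalent directly rather than this package's API:

    from sklearn.metrics import fbeta_score

    y_true = [0, 0, 1, 1, 1, 0]
    y_pred = [0, 1, 1, 1, 0, 0]

    print(fbeta_score(y_true, y_pred, beta=2.0))  # recall weighted more heavily
    print(fbeta_score(y_true, y_pred, beta=0.5))  # precision weighted more heavily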

Examples:

  • fbeta+beta=2
  • fbeta+beta=0.5@macro
Read more:
  1. sklearn
  2. Wikipedia

aliases = ['fbeta'] #

bounds = (0.0, 1.0) #

dependencies = ('ppv', 'tpr') #

is_multiclass = False #

sklearn_equivalent = 'fbeta_score' #

Informedness #

Bases: Metric

Computes the Informedness metric, also known as Youden's J.

It is defined as:

\[\mathtt{J}=\text{sensitivity}+\text{specificity}-1\]

where sensitivity is the True Positive Rate (TPR), and specificity is the True Negative Rate (TNR).

Values fall in the range [-1, 1], with higher values corresponding to better performance and 0 corresponding to random performance.

In the binary case, this metric is equivalent to the adjusted balanced accuracy, ba+adj=True.

It is commonly used in conjunction with a Receiver Operating Characteristic (ROC) curve analysis.
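
A minimal sketch of the definition, computed from a binary confusion matrix with scikit-learn rather than this package's API:

    from sklearn.metrics import confusion_matrix

    y_true = [0, 0, 1, 1, 1, 0]
    y_pred = [0, 1, 1, 1, 0, 0]

    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    tpr = tp / (tp + fn)   # sensitivity
    tnr = tn / (tn + fp)   # specificity
    print(tpr + tnr - 1)   # Youden's J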

Examples:

  • informedness
  • youdenj@macro
Read more:
  1. Wikipedia

aliases = ['informedness', 'youdenj', 'youden_j', 'bookmaker_informedness', 'bm'] #

bounds = (-1.0, 1.0) #

dependencies = ('tpr', 'tnr') #

is_multiclass = False #

sklearn_equivalent = None #

Markedness #

Bases: Metric

Computes the markedness metric, also known as \(\Delta p\).

It is defined as:

\[\Delta p=\text{precision}+NPV-1\]

where precision is the Positive Predictive Value (PPV) and \(NPV\) the Negative Predictive Value.

Values fall in the range [-1, 1], with higher values corresponding to better performance and 0 corresponding to random performance.

It is commonly used in conjunction with a Receiver Operating Characteristic (ROC) curve analysis.

Examples:

  • markedness
  • delta_p@macro
Read more:
  1. Wikipedia

aliases = ['markedness', 'delta_p'] #

bounds = (-1.0, 1.0) #

dependencies = ('ppv', 'npv') #

is_multiclass = False #

sklearn_equivalent = None #

P4 #

Bases: Metric

Computes the P4 metric.

It is defined as:

\[\mathtt{P4}=4\left(\dfrac{1}{\text{precision}}+\dfrac{1}{\text{recall}}+\dfrac{1}{\text{specificity}}+\dfrac{1}{NPV}\right)^{-1}\]

where precision corresponds to the Positive Predictive Value (PPV), recall to the True Positive Rate (TPR), and specificity to the True Negative Rate (TNR). Put otherwise, it is the harmonic mean of the 4 listed metrics.

Introduced in 2022 by Sitarz, it is meant to extend the properties of the F1, Markedness and Informedness metrics. It is one of few defined metrics that incorporates the Negative Predictive Value.

Possible values lie in the range [0, 1], with a score of 0 implying one of the intermediate metrics is 0, and a 1 requiring perfect classification.

Relative to MCC, the author notes different behaviour at extreme values, but otherwise the metrics are meant to provide a similar amount of information with a single value.
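
A minimal sketch of the definition, computed from a binary confusion matrix with scikit-learn rather than this package's API:

    from sklearn.metrics import confusion_matrix

    y_true = [0, 0, 1, 1, 1, 0]
    y_pred = [0, 1, 1, 1, 0, 0]

    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    ppv, tpr = tp / (tp + fp), tp / (tp + fn)
    tnr, npv = tn / (tn + fp), tn / (tn + fn)
    print(4 / (1 / ppv + 1 / tpr + 1 / tnr + 1 / npv))  # harmonic mean of the four rates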

Examples:

  • p4
  • p4@macro
Read more:
  1. Wikipedia

aliases = ['p4'] #

bounds = (0.0, 1.0) #

dependencies = ('ppv', 'tpr', 'tnr', 'npv') #

is_multiclass = False #

sklearn_equivalent = None #

JaccardIndex #

Bases: Metric

Computes the Jaccard Index, also known as the threat score.

It is defined as:

\[\mathtt{Jaccard}=\dfrac{TP}{TP+FP+FN}\]

where \(TP\) is the count of true positives, \(FP\) the count of false positives and \(FN\) the count of false negatives.

Alternatively, it may be defined as the overlap between the predicted positives and the condition positives, divided by their union.

Due to the alternative definition, it is commonly used when labels are not readily present, for example in evaluating clustering performance.
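
A minimal example using the sklearn equivalent directly rather than this package's API:

    from sklearn.metrics import jaccard_score

    y_true = [0, 0, 1, 1, 1, 0]
    y_pred = [0, 1, 1, 1, 0, 0]

    # TP / (TP + FP + FN) for the positive class;
    # average='macro' would mirror a '@macro' syntax string.
    print(jaccard_score(y_true, y_pred))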

Examples:

  • jaccard
  • critical_success_index@macro
Read more:
  1. Wikipedia

aliases = ['jaccard', 'jaccard_index', 'threat_score', 'critical_success_index'] #

bounds = (0.0, 1.0) #

dependencies = ('diag_mass', 'p_pred', 'p_condition') #

is_multiclass = False #

sklearn_equivalent = 'jaccard_score' #

PositiveLikelihoodRatio #

Bases: Metric

Computes the positive likelihood ratio.

It is defined as

\[\mathtt{LR}^{+}=\dfrac{\text{sensitivity}}{1-\text{specificity}}\]

where sensitivity is the True Positive Rate (TPR), and specificity is the True Negative Rate (TNR).

Simply put, it is the ratio of the probabilities of the model predicting a positive when the condition is positive and negative, respectively.

Possible values lie in the range [0.0, \(\infty\)], with 0.0 corresponding to no true positives, and infinity corresponding to no false positives. Larger values indicate better performance, with a score of 1 corresponding to random performance.
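
A minimal example using the sklearn equivalent directly rather than this package's API; it returns both likelihood ratios:

    from sklearn.metrics import class_likelihood_ratios

    y_true = [0, 0, 1, 1, 1, 0]
    y_pred = [0, 1, 1, 1, 0, 0]

    lr_plus, lr_minus = class_likelihood_ratios(y_true, y_pred)
    print(lr_plus)   # TPR / FPR
    print(lr_minus)  # FNR / TNR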

Examples:

  • plr
  • positive_likelihood_ratio@macro
Read more:
  1. Wikipedia

aliases = ['plr', 'positive_likelihood_ratio'] #

bounds = (0.0, float('inf')) #

dependencies = ('tpr', 'fpr') #

is_multiclass = False #

sklearn_equivalent = 'class_likelihood_ratios' #

LogPositiveLikelihoodRatio #

Bases: Metric

Computes the logarithm of the positive likelihood ratio.

It is defined as

\[\mathtt{LogLR}^{+}=\log\dfrac{\text{sensitivity}}{1-\text{specificity}}\]

where sensitivity is the True Positive Rate (TPR), and specificity is the True Negative Rate (TNR).

Simply put, it is the logarithm of the ratio of the probabilities of the model predicting a positive when the condition is positive and negative, respectively.

Possible values lie in the range (\(-\infty\), \(\infty\)), with \(-\infty\) corresponding to no true positives, and infinity corresponding to no false positives. Larger values indicate better performance, with a score of 0 corresponding to random performance.

Examples:

  • log_plr
  • lplr
  • log_positive_likelihood_ratio@macro
Read more:
  1. Wikipedia

aliases = ['log_plr', 'lplr', 'log_positive_likelihood_ratio'] #

bounds = (-float('inf'), float('inf')) #

dependencies = ('tpr', 'fpr') #

is_multiclass = False #

sklearn_equivalent = 'class_likelihood_ratios' #

NegativeLikelihoodRatio #

Bases: Metric

Computes the negative likelihood ratio.

It is defined as

\[\mathtt{LR}^{-}=\dfrac{1-\text{sensitivity}}{\text{specificity}}\]

where sensitivity is the True Positive Rate (TPR), and specificity is the True Negative Rate(TNR).

Simply put, it is the ratio of the probabilities of the model predicting a negative when the condition is positive and negative, respectively.

Possible values lie in the range [0.0, \(\infty\)], with 0.0 corresponding to no false negatives, and infinity corresponding to no true negatives. Smaller values indicate better performance, with a score of 1 corresponding to random performance.

Examples:

  • nlr
  • negative_likelihood_ratio@macro
Read more:
  1. Wikipedia

aliases = ['negative_likelihood_ratio', 'nlr'] #

bounds = (0.0, float('inf')) #

dependencies = ('fnr', 'tnr') #

is_multiclass = False #

sklearn_equivalent = 'class_likelihood_ratios' #

LogNegativeLikelihoodRatio #

Bases: Metric

Computes the logarithm of the negative likelihood ratio.

It is defined as

\[\mathtt{LogLR}^{-}=\log \dfrac{1-\text{sensitivity}}{\text{specificity}}\]

where sensitivity is the True Positive Rate (TPR), and specificity is the True Negative Rate (TNR).

Simply put, it is the logarithm of the ratio of the probabilities of the model predicting a negative when the condition is positive and negative, respectively.

Possible values lie in the range (\(-\infty\), \(\infty\)), with \(-\infty\) corresponding to no false negatives, and infinity corresponding to no true negatives. Smaller values indicate better performance, with a score of 0 corresponding to random performance.

Examples:

  • log_nlr
  • lnlr
  • log_negative_likelihood_ratio@macro
Read more:
  1. Wikipedia

aliases = ['lnlr', 'log_negative_likelihood_ratio', 'log_nlr'] #

bounds = (-float('inf'), float('inf')) #

dependencies = ('fnr', 'tnr') #

is_multiclass = False #

sklearn_equivalent = 'class_likelihood_ratios' #

DiagnosticOddsRatio #

Bases: Metric

Computes the diagnostic odds ratio.

It is defined as:

\[\mathtt{DOR}=\dfrac{\mathtt{LR}^{+}}{\mathtt{LR}^{-}}\]

where \(\mathtt{LR}^{+}\) and \(\mathtt{LR}^{-}\) are the positive and negative likelihood ratios, respectively.

Possible values lie in the range [0.0, \(\infty\)]. Larger values indicate better performance, with a score of 1 corresponding to random performance.

To make experiment aggregation easier, you can log transform this metric by specifying log_transform=true. This makes the sampling distribution essentially Gaussian.
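
A minimal sketch of the definition, built on the sklearn equivalent of the likelihood ratios rather than this package's API:

    from sklearn.metrics import class_likelihood_ratios

    y_true = [0, 0, 1, 1, 1, 0]
    y_pred = [0, 1, 1, 1, 0, 0]

    lr_plus, lr_minus = class_likelihood_ratios(y_true, y_pred)
    print(lr_plus / lr_minus)  # diagnostic odds ratio; equivalently (TP * TN) / (FP * FN)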

Examples:

  • dor
  • diagnostic_odds_ratio@macro
Read more:
  1. Wikipedia

aliases = ['dor', 'diagnostic_odds_ratio'] #

bounds = (0.0, float('inf')) #

dependencies = ('nlr', 'plr') #

is_multiclass = False #

sklearn_equivalent = None #

LogDiagnosticOddsRatio #

Bases: Metric

Computes the logarithm of the diagnostic odds ratio.

It is defined as:

\[\mathtt{LogDOR}=\mathtt{LogLR}^{+}-\mathtt{LogLR}^{-}\]

where \(\mathtt{LogLR}^{+}\) and \(\mathtt{LogLR}^{-}\) are the log positive and negative likelihood ratios, respectively.

Possible values lie in the range (-\(\infty\), \(\infty\)). Larger values indicate better performance, with a score of 0 corresponding to random performance.

Examples:

  • log_dor
  • ldor
  • log_diagnostic_odds_ratio@macro
Read more:
  1. Wikipedia

aliases = ['log_dor', 'ldor', 'log_diagnostic_odds_ratio'] #

bounds = (-float('inf'), float('inf')) #

dependencies = ('log_plr', 'log_nlr') #

is_multiclass = False #

sklearn_equivalent = None #

PrevalenceThreshold #

Bases: Metric

Computes the prevalence threshold.

It is defined as:

\[\phi_e=\frac{\sqrt{\mathtt{TPR}\cdot(1-\mathtt{TNR})}+\mathtt{TNR}-1}{\mathtt{TPR}+\mathtt{TNR}-1}\]

where \(\mathtt{TPR}\) and \(\mathtt{TNR}\) are the true positive and negative rates, respectively.

Possible values lie in the range (0, 1). Larger values indicate worse performance, with a score of 0 corresponding to perfect classification, and a score of 1 to perfect misclassification.

It represents the inflection point of the ROC curve (sensitivity versus specificity), beyond which a classifier's positive predictive value drops sharply. See Balayla (2020) for more information.
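
A minimal sketch of the definition, with the sensitivity and specificity chosen as example values:

    from math import sqrt

    tpr, tnr = 0.9, 0.8  # example sensitivity and specificity
    pt = (sqrt(tpr * (1 - tnr)) + tnr - 1) / (tpr + tnr - 1)
    print(pt)            # prevalence threshold, here roughly 0.32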

Examples:

  • pt
  • prevalence_threshold
Read more:
  1. Balayla, J. (2020). Prevalence threshold (\(\phi_e\)) and the geometry of screening curves. PLoS ONE, 15(10), e0240215.

aliases = ['prev_thresh', 'pt', 'prevalence_threshold'] #

bounds = (0, 1) #

dependencies = ('tpr', 'tnr') #

is_multiclass = False #

sklearn_equivalent = None #