Classification metrics

A classification metric uses confusion matrix terminology (true positive, false positive, true negative, false negative) to classify each predicted item. In general, it therefore needs a way to discern relevant items from non-relevant items for each user.

FMeasure(beta=1, relevant_threshold=None, sys_average='macro', precision=np.float64)

Bases: ClassificationMetric

The FMeasure metric combines Precision and Recall into a single value. It is calculated as follows for a single user:

\[ FMeasure_u = (1 + \beta^2) \cdot \frac{P_u \cdot R_u}{(\beta^2 \cdot P_u) + R_u} \]

Where:

  • \(P_u\) is the Precision calculated for the user u
  • \(R_u\) is the Recall calculated for the user u
  • \(\beta\) is a real factor which weights Recall and Precision differently depending on its value:

    • \(\beta = 1\): Equally weight Precision and Recall
    • \(\beta > 1\): Weight Recall more
    • \(\beta < 1\): Weight Precision more

A well-known FMeasure is the F1 metric, where \(\beta = 1\), which is simply the harmonic mean of Precision and Recall:

\[ F1_u = \frac{2 \cdot P_u \cdot R_u}{P_u + R_u} \]

The FMeasure metric is calculated as follows for the entire system, depending on whether the 'macro' or the 'micro' average has been chosen:

\[ FMeasure_{sys} - micro = (1 + \beta^2) \cdot \frac{P_{sys} \cdot R_{sys}}{(\beta^2 \cdot P_{sys}) + R_{sys}} \]
\[ FMeasure_{sys} - macro = \frac{\sum_{u \in U} FMeasure_u}{|U|} \]

Where \(P_{sys}\) and \(R_{sys}\) are the micro-averaged Precision and Recall of the system, i.e. computed on the confusion matrix counts summed over all users.
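
The snippet below is a minimal plain-Python sketch of the difference between the two averages, assuming the per-user confusion matrix counts (tp, fp, fn) have already been computed; it only illustrates the formulas above and is not the library implementation.

import numpy as np

# Toy per-user confusion matrix counts: (tp_u, fp_u, fn_u) for each user
user_counts = [(3, 1, 2), (1, 3, 1), (4, 0, 1)]
beta = 1.0

def f_measure(p, r, beta):
    # F_beta = (1 + beta^2) * (P * R) / (beta^2 * P + R)
    if p == 0 and r == 0:
        return 0.0
    return (1 + beta ** 2) * (p * r) / (beta ** 2 * p + r)

# Macro average: compute FMeasure per user, then take the mean
per_user_f = []
for tp, fp, fn in user_counts:
    p_u = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    r_u = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    per_user_f.append(f_measure(p_u, r_u, beta))
f_macro = np.mean(per_user_f)

# Micro average: sum the counts over all users, then apply the formula once
tp_sys = sum(tp for tp, _, _ in user_counts)
fp_sys = sum(fp for _, fp, _ in user_counts)
fn_sys = sum(fn for _, _, fn in user_counts)
p_sys = tp_sys / (tp_sys + fp_sys)
r_sys = tp_sys / (tp_sys + fn_sys)
f_micro = f_measure(p_sys, r_sys, beta)

print(f"macro: {f_macro:.3f}, micro: {f_micro:.3f}")
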
PARAMETER DESCRIPTION
beta

real factor which weights Recall and Precision differently depending on its value. Default is 1

TYPE: float DEFAULT: 1

relevant_threshold

parameter needed to discern relevant items from non-relevant items for every user. If not specified, the mean rating score of each user will be used

TYPE: float DEFAULT: None

sys_average

specify how the system average must be computed. Default is 'macro'

TYPE: str DEFAULT: 'macro'

Source code in clayrs/evaluation/metrics/classification_metrics.py
def __init__(self, beta: float = 1, relevant_threshold: float = None, sys_average: str = 'macro',
             precision: [Callable] = np.float64):
    super().__init__(relevant_threshold, sys_average, precision)
    self.__beta = beta
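
A possible usage sketch with the parameters documented above follows; the import path mirrors the source file path reported here and is an assumption, since the library may expose the class from a different public module.

import numpy as np
from clayrs.evaluation.metrics.classification_metrics import FMeasure

f1 = FMeasure()                                     # beta=1 -> F1, 'macro' system average
f2_micro = FMeasure(beta=2,                         # weight Recall more than Precision
                    relevant_threshold=3.5,         # ratings >= 3.5 are considered relevant
                    sys_average='micro')
f_half = FMeasure(beta=0.5, precision=np.float32)   # weight Precision more than Recall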

FMeasureAtK(k, beta=1, relevant_threshold=None, sys_average='macro')

Bases: FMeasure

The FMeasure@K metric combines Precision@K and Recall@K into a single value. It is calculated as follows for a single user:

\[ FMeasure@K_u = (1 + \beta^2) \cdot \frac{P@K_u \cdot R@K_u}{(\beta^2 \cdot P@K_u) + R@K_u} \]

Where:

  • \(P@K_u\) is the Precision at K calculated for the user u
  • \(R@K_u\) is the Recall at K calculated for the user u
  • \(\beta\) is a real factor which weights Recall and Precision differently depending on its value:

    • \(\beta = 1\): Equally weight Precision and Recall
    • \(\beta > 1\): Weight Recall more
    • \(\beta < 1\): Weight Precision more

A well-known FMeasure@K is the F1@K metric, where \(\beta = 1\), which is simply the harmonic mean of Precision@K and Recall@K:

\[ F1@K_u = \frac{2 \cdot P@K_u \cdot R@K_u}{P@K_u + R@K_u} \]

The FMeasure@K metric is calculated as follows for the entire system, depending on whether the 'macro' or the 'micro' average has been chosen:

\[ FMeasure@K_{sys} - micro = (1 + \beta^2) \cdot \frac{P@K_{sys} \cdot R@K_{sys}}{(\beta^2 \cdot P@K_{sys}) + R@K_{sys}} \]
\[ FMeasure@K_{sys} - macro = \frac{\sum_{u \in U} FMeasure@K_u}{|U|} \]

Where \(P@K_{sys}\) and \(R@K_{sys}\) are the micro-averaged Precision@K and Recall@K of the system.
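
Below is a plain-Python illustration of the per-user FMeasure@K formula (not the library implementation), assuming a toy ranked recommendation list and the user's ground-truth ratings:

ground_truth = {'i1': 4.5, 'i2': 2.0, 'i3': 5.0, 'i4': 3.0, 'i5': 4.0}
recommended = ['i3', 'i2', 'i1', 'i5', 'i4']    # ranked recommendation list
relevant_threshold = 3.5
k, beta = 3, 1.0

relevant = {item for item, rating in ground_truth.items() if rating >= relevant_threshold}
top_k = recommended[:k]

tp_at_k = sum(1 for item in top_k if item in relevant)
p_at_k = tp_at_k / k                  # Precision@K (fp@K = k - tp@K)
r_at_k = tp_at_k / len(relevant)      # Recall@K

f_at_k = (1 + beta ** 2) * (p_at_k * r_at_k) / (beta ** 2 * p_at_k + r_at_k)
print(p_at_k, r_at_k, f_at_k)         # 0.667, 0.667, 0.667 (rounded)
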
PARAMETER DESCRIPTION
k

cutoff parameter. Will be used for the computation of Precision@K and Recall@K

TYPE: int

beta

real factor which weights Recall and Precision differently depending on its value. Default is 1

TYPE: float DEFAULT: 1

relevant_threshold

parameter needed to discern relevant items from non-relevant items for every user. If not specified, the mean rating score of each user will be used

TYPE: float DEFAULT: None

sys_average

specify how the system average must be computed. Default is 'macro'

TYPE: str DEFAULT: 'macro'

Source code in clayrs/evaluation/metrics/classification_metrics.py
def __init__(self, k: int, beta: int = 1, relevant_threshold: float = None, sys_average: str = 'macro'):
    super().__init__(beta, relevant_threshold, sys_average)
    if k < 1:
        raise ValueError('k={} not valid! k must be >= 1!'.format(k))
    self.__k = k
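
A short usage sketch is given below (again assuming the import path of the source file above); as the constructor shows, a k lower than 1 raises a ValueError.

from clayrs.evaluation.metrics.classification_metrics import FMeasureAtK

f1_at_5 = FMeasureAtK(k=5)                    # F1 computed on the top-5 items
f2_at_10 = FMeasureAtK(k=10, beta=2,          # weight Recall@10 more
                       relevant_threshold=3,
                       sys_average='micro')

# FMeasureAtK(k=0)   # would raise ValueError: k must be >= 1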

Precision(relevant_threshold=None, sys_average='macro', precision=np.float64)

Bases: ClassificationMetric

The Precision metric is calculated as follows for a single user:

\[ Precision_u = \frac{tp_u}{tp_u + fp_u} \]

Where:

  • \(tp_u\) is the number of items which are in the recommendation list of the user and have a rating >= relevant_threshold in its 'ground truth'
  • \(fp_u\) is the number of items which are in the recommendation list of the user and have a rating < relevant_threshold in its 'ground truth'

And it is calculated as follows for the entire system, depending on whether the 'macro' or the 'micro' average has been chosen:

\[ Precision_{sys} - micro = \frac{\sum_{u \in U} tp_u}{\sum_{u \in U} tp_u + \sum_{u \in U} fp_u} \]
\[ Precision_{sys} - macro = \frac{\sum_{u \in U} Precision_u}{|U|} \]
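
The sketch below illustrates the per-user computation in plain Python (not the library implementation), including the documented default behaviour of relevant_threshold: when it is None, the mean rating of the user's ground truth is used as the threshold.

import numpy as np

ground_truth = {'i1': 5.0, 'i2': 2.0, 'i3': 4.0, 'i4': 1.0}   # user's ground truth ratings
recommended = ['i1', 'i2', 'i4']                              # user's recommendation list
relevant_threshold = None

# Default behaviour: use the user's mean rating as threshold
threshold = relevant_threshold if relevant_threshold is not None \
    else np.mean(list(ground_truth.values()))                 # 3.0 for this user

tp_u = sum(1 for item in recommended if ground_truth[item] >= threshold)   # i1 -> 1
fp_u = sum(1 for item in recommended if ground_truth[item] < threshold)    # i2, i4 -> 2
precision_u = tp_u / (tp_u + fp_u)                                         # 1/3
print(threshold, precision_u)
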
PARAMETER DESCRIPTION
relevant_threshold

parameter needed to discern relevant items from non-relevant items for every user. If not specified, the mean rating score of each user will be used

TYPE: float DEFAULT: None

sys_average

specify how the system average must be computed. Default is 'macro'

TYPE: str DEFAULT: 'macro'

Source code in clayrs/evaluation/metrics/classification_metrics.py
def __init__(self, relevant_threshold: float = None, sys_average: str = 'macro',
             precision: [Callable] = np.float64):
    super(Precision, self).__init__(relevant_threshold, sys_average, precision)

PrecisionAtK(k, relevant_threshold=None, sys_average='macro', precision=np.float64)

Bases: Precision

The Precision@K metric is calculated as follows for a single user:

\[ Precision@K_u = \frac{tp@K_u}{tp@K_u + fp@K_u} \]

Where:

  • \(tp@K_u\) is the number of items which are in the recommendation list of the user, cut off at the first K items, and have a rating >= relevant_threshold in its 'ground truth'
  • \(fp@K_u\) is the number of items which are in the recommendation list of the user, cut off at the first K items, and have a rating < relevant_threshold in its 'ground truth'

And it is calculated as follows for the entire system, depending on whether the 'macro' or the 'micro' average has been chosen:

\[ Precision@K_{sys} - micro = \frac{\sum_{u \in U} tp@K_u}{\sum_{u \in U} tp@K_u + \sum_{u \in U} fp@K_u} \]
\[ Precision@K_{sys} - macro = \frac{\sum_{u \in U} Precision@K_u}{|U|} \]
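
The following plain-Python sketch (not the library implementation) shows how tp@K and fp@K are counted for each user and how the two system averages differ:

k = 2
# For each user: (ranked recommendation list, set of relevant item ids)
users = {
    'u1': (['a', 'b', 'c'], {'a', 'c'}),
    'u2': (['d', 'e', 'f'], {'f'}),
}

tp_sum = fp_sum = 0
per_user = []
for ranked, relevant in users.values():
    top_k = ranked[:k]
    tp = sum(1 for item in top_k if item in relevant)
    fp = len(top_k) - tp
    per_user.append(tp / (tp + fp))
    tp_sum += tp
    fp_sum += fp

precision_at_k_macro = sum(per_user) / len(per_user)   # mean of the per-user values
precision_at_k_micro = tp_sum / (tp_sum + fp_sum)      # from the summed counts
print(precision_at_k_macro, precision_at_k_micro)      # 0.25 0.25
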
PARAMETER DESCRIPTION
k

cutoff parameter. Only the first k items of the recommendation list will be considered

TYPE: int

relevant_threshold

parameter needed to discern relevant items from non-relevant items for every user. If not specified, the mean rating score of each user will be used

TYPE: float DEFAULT: None

sys_average

specify how the system average must be computed. Default is 'macro'

TYPE: str DEFAULT: 'macro'

Source code in clayrs/evaluation/metrics/classification_metrics.py
def __init__(self, k: int, relevant_threshold: float = None, sys_average: str = 'macro',
             precision: [Callable] = np.float64):
    super().__init__(relevant_threshold, sys_average, precision)
    if k < 1:
        raise ValueError('k={} not valid! k must be >= 1!'.format(k))
    self.__k = k

RPrecision(relevant_threshold=None, sys_average='macro', precision=np.float64)

Bases: Precision

The R-Precision metric is calculated as follows for a single user:

\[ R\text{-}Precision_u = \frac{tp@R_u}{tp@R_u + fp@R_u} \]

Where:

  • \(R\) is the number of relevant items for the user u
  • \(tp@R_u\) is the number of items which are in the recommendation list of the user, cut off at the first R items, and have a rating >= relevant_threshold in its 'ground truth'
  • \(fp@R_u\) is the number of items which are in the recommendation list of the user, cut off at the first R items, and have a rating < relevant_threshold in its 'ground truth'

And it is calculated as follows for the entire system, depending on whether the 'macro' or the 'micro' average has been chosen:

\[ R\text{-}Precision_{sys} - micro = \frac{\sum_{u \in U} tp@R_u}{\sum_{u \in U} tp@R_u + \sum_{u \in U} fp@R_u} \]
\[ R\text{-}Precision_{sys} - macro = \frac{\sum_{u \in U} R\text{-}Precision_u}{|U|} \]
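
A plain-Python sketch of R-Precision for a single user (not the library implementation), where the cutoff R is the number of relevant items that user has in the ground truth:

ground_truth = {'i1': 5.0, 'i2': 1.0, 'i3': 4.0, 'i4': 2.0, 'i5': 4.5}
recommended = ['i3', 'i2', 'i1', 'i4', 'i5']
relevant_threshold = 3.5

relevant = {item for item, rating in ground_truth.items() if rating >= relevant_threshold}
R = len(relevant)                         # 3 relevant items: i1, i3, i5

top_R = recommended[:R]                   # ['i3', 'i2', 'i1']
tp_at_R = sum(1 for item in top_R if item in relevant)
r_precision_u = tp_at_R / R               # 2/3
print(R, r_precision_u)
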
PARAMETER DESCRIPTION
relevant_threshold

parameter needed to discern relevant items from non-relevant items for every user. If not specified, the mean rating score of each user will be used

TYPE: float DEFAULT: None

sys_average

specify how the system average must be computed. Default is 'macro'

TYPE: str DEFAULT: 'macro'

Source code in clayrs/evaluation/metrics/classification_metrics.py
def __init__(self, relevant_threshold: float = None, sys_average: str = 'macro',
             precision: [Callable] = np.float64):
    super().__init__(relevant_threshold, sys_average, precision)

Recall(relevant_threshold=None, sys_average='macro', precision=np.float64)

Bases: ClassificationMetric

The Recall metric is calculated as follows for a single user:

\[ Recall_u = \frac{tp_u}{tp_u + fn_u} \]

Where:

  • \(tp_u\) is the number of items which are in the recommendation list of the user and have a rating >= relevant_threshold in its 'ground truth'
  • \(fn_u\) is the number of items which are NOT in the recommendation list of the user and have a rating >= relevant_threshold in its 'ground truth'

And it is calculated as follows for the entire system, depending on whether the 'macro' or the 'micro' average has been chosen:

\[ Recall_{sys} - micro = \frac{\sum_{u \in U} tp_u}{\sum_{u \in U} tp_u + \sum_{u \in U} fn_u} \]
\[ Recall_{sys} - macro = \frac{\sum_{u \in U} Recall_u}{|U|} \]
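
A minimal plain-Python sketch of per-user Recall (not the library implementation): the false negatives are the relevant items that never appear in the recommendation list.

ground_truth = {'i1': 5.0, 'i2': 2.0, 'i3': 4.0, 'i4': 4.5}
recommended = ['i1', 'i2']
relevant_threshold = 3.5

relevant = {item for item, rating in ground_truth.items() if rating >= relevant_threshold}

tp_u = sum(1 for item in recommended if item in relevant)       # i1      -> 1
fn_u = sum(1 for item in relevant if item not in recommended)   # i3, i4  -> 2
recall_u = tp_u / (tp_u + fn_u)                                 # 1/3
print(recall_u)
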
PARAMETER DESCRIPTION
relevant_threshold

parameter needed to discern relevant items from non-relevant items for every user. If not specified, the mean rating score of each user will be used

TYPE: float DEFAULT: None

sys_average

specify how the system average must be computed. Default is 'macro'

TYPE: str DEFAULT: 'macro'

Source code in clayrs/evaluation/metrics/classification_metrics.py
def __init__(self, relevant_threshold: float = None, sys_average: str = 'macro',
             precision: [Callable] = np.float64):
    super().__init__(relevant_threshold, sys_average, precision)

RecallAtK(k, relevant_threshold=None, sys_average='macro', precision=np.float64)

Bases: Recall

The Recall@K metric is calculated as follows for a single user:

\[ Recall@K_u = \frac{tp@K_u}{tp@K_u + fn@K_u} \]

Where:

  • \(tp@K_u\) is the number of items which are in the recommendation list of the user, cut off at the first K items, and have a rating >= relevant_threshold in its 'ground truth'
  • \(fn@K_u\) is the number of relevant items (rating >= relevant_threshold in its 'ground truth') which are NOT among the first K items of the recommendation list of the user

And it is calculated as follows for the entire system, depending on whether the 'macro' or the 'micro' average has been chosen:

\[ Recall@K_{sys} - micro = \frac{\sum_{u \in U} tp@K_u}{\sum_{u \in U} tp@K_u + \sum_{u \in U} fn@K_u} \]
\[ Recall@K_{sys} - macro = \frac{\sum_{u \in U} Recall@K_u}{|U|} \]
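
Below, a plain-Python sketch of Recall@K for a single user (not the library implementation): only the first K recommendations can count as true positives, while every relevant item outside the top K counts as a false negative.

ground_truth = {'i1': 5.0, 'i2': 2.0, 'i3': 4.0, 'i4': 4.5, 'i5': 1.0}
recommended = ['i5', 'i1', 'i2', 'i3', 'i4']
relevant_threshold = 3.5
k = 3

relevant = {item for item, rating in ground_truth.items() if rating >= relevant_threshold}
top_k = recommended[:k]

tp_at_k = sum(1 for item in top_k if item in relevant)       # i1      -> 1
fn_at_k = sum(1 for item in relevant if item not in top_k)   # i3, i4  -> 2
recall_at_k_u = tp_at_k / (tp_at_k + fn_at_k)                # 1/3
print(recall_at_k_u)
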
PARAMETER DESCRIPTION
k

cutoff parameter. Only the first k items of the recommendation list will be considered

TYPE: int

relevant_threshold

parameter needed to discern relevant items from non-relevant items for every user. If not specified, the mean rating score of each user will be used

TYPE: float DEFAULT: None

sys_average

specify how the system average must be computed. Default is 'macro'

TYPE: str DEFAULT: 'macro'

Source code in clayrs/evaluation/metrics/classification_metrics.py
def __init__(self, k: int, relevant_threshold: float = None, sys_average: str = 'macro',
             precision: [Callable] = np.float64):
    super().__init__(relevant_threshold, sys_average, precision)
    if k < 1:
        raise ValueError('k={} not valid! k must be >= 1!'.format(k))
    self.__k = k