Classification metrics
A classification metric uses confusion matrix terminology (true positive, false positive, true negative, false negative) to classify each predicted item. In general, it needs a way to discern relevant items from non-relevant items for every user.
FMeasure(beta=1, relevant_threshold=None, sys_average='macro', precision=np.float64)
Bases: ClassificationMetric
The FMeasure metric combines Precision and Recall into a single metric. It is calculated as such for the single user:

\[FMeasure_u = (1 + \beta^2) \cdot \frac{P_u \cdot R_u}{(\beta^2 \cdot P_u) + R_u}\]
Where:
- \(P_u\) is the Precision calculated for the user u
- \(R_u\) is the Recall calculated for the user u
- \(\beta\) is a real factor which weights Recall or Precision differently based on its value:
- \(\beta = 1\): Equally weight Precision and Recall
- \(\beta > 1\): Weight Recall more
- \(\beta < 1\): Weight Precision more
A well-known FMeasure is the F1 metric, where \(\beta = 1\), which is simply the harmonic mean of Precision and Recall:

\[F1_u = 2 \cdot \frac{P_u \cdot R_u}{P_u + R_u}\]
For the entire system, the FMeasure is then aggregated according to the sys_average parameter: the 'macro' average is the mean of the per-user scores, while the 'micro' average computes the metric on the true positives, false positives and false negatives summed over all users.
PARAMETER | DESCRIPTION |
---|---|
beta | Real factor which weights Recall or Precision differently based on its value. Default is 1 |
relevant_threshold | Threshold used to discern relevant items from non-relevant items for every user. If not specified, the mean rating score of every user will be used |
sys_average | Specifies how the system average must be computed ('macro' or 'micro'). Default is 'macro' |
Source code in clayrs/evaluation/metrics/classification_metrics.py
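To make the role of \(\beta\) concrete, here is a minimal sketch (plain Python with made-up per-user precision and recall values, not the ClayRS implementation) of the per-user FMeasure:

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score for a single user, given its precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    return (1 + beta ** 2) * (precision * recall) / (beta ** 2 * precision + recall)

p_u, r_u = 0.8, 0.4  # hypothetical per-user Precision and Recall

print(f_measure(p_u, r_u, beta=1))    # 0.533..., the harmonic mean (F1)
print(f_measure(p_u, r_u, beta=2))    # 0.444..., Recall weighs more
print(f_measure(p_u, r_u, beta=0.5))  # 0.666..., Precision weighs more
```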
FMeasureAtK(k, beta=1, relevant_threshold=None, sys_average='macro')
Bases: FMeasure
The FMeasure@K metric combines Precision@K and Recall@K into a single metric. It is calculated as such for the single user:

\[FMeasure@K_u = (1 + \beta^2) \cdot \frac{P@K_u \cdot R@K_u}{(\beta^2 \cdot P@K_u) + R@K_u}\]
Where:
- \(P@K_u\) is the Precision at K calculated for the user u
- \(R@K_u\) is the Recall at K calculated for the user u
- \(\beta\) is a real factor which weights Recall or Precision differently based on its value:
- \(\beta = 1\): Equally weight Precision and Recall
- \(\beta > 1\): Weight Recall more
- \(\beta < 1\): Weight Precision more
A well-known FMeasure@K is the F1@K metric, where \(\beta = 1\), which is simply the harmonic mean of Precision@K and Recall@K:

\[F1@K_u = 2 \cdot \frac{P@K_u \cdot R@K_u}{P@K_u + R@K_u}\]
For the entire system, the FMeasure@K is then aggregated with either the 'macro' or the 'micro' average, depending on the sys_average parameter.
PARAMETER | DESCRIPTION |
---|---|
k | Cutoff parameter, used for the computation of Precision@K and Recall@K |
beta | Real factor which weights Recall or Precision differently based on its value. Default is 1 |
relevant_threshold | Threshold used to discern relevant items from non-relevant items for every user. If not specified, the mean rating score of every user will be used |
sys_average | Specifies how the system average must be computed ('macro' or 'micro'). Default is 'macro' |
Source code in clayrs/evaluation/metrics/classification_metrics.py
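A minimal sketch of the cutoff logic (plain Python, hypothetical data, not the ClayRS source): the ranked list is truncated at the first K items, Precision@K and Recall@K are computed on it and then combined into F1@K:

```python
def f1_at_k(ranked_items, relevant_items, k):
    """F1@K for a single user: harmonic mean of Precision@K and Recall@K."""
    top_k = ranked_items[:k]                                # only the first K recommendations
    tp = sum(1 for item in top_k if item in relevant_items)
    p_at_k = tp / k                                         # assumes every top-K item has a ground truth judgement
    r_at_k = tp / len(relevant_items)
    if p_at_k + r_at_k == 0:
        return 0.0
    return 2 * p_at_k * r_at_k / (p_at_k + r_at_k)

ranked = ['i1', 'i2', 'i3', 'i4', 'i5']  # hypothetical recommendation list
relevant = {'i2', 'i4', 'i9'}            # hypothetical relevant items of the user
print(f1_at_k(ranked, relevant, k=3))    # P@3 = 1/3, R@3 = 1/3 -> F1@3 = 0.333...
```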
Precision(relevant_threshold=None, sys_average='macro', precision=np.float64)
Bases: ClassificationMetric
The Precision metric is calculated as such for the single user:

\[P_u = \frac{tp_u}{tp_u + fp_u}\]
Where:
- \(tp_u\) is the number of items which are in the recommendation list of the user and have a rating >= relevant_threshold in the user's 'ground truth'
- \(fp_u\) is the number of items which are in the recommendation list of the user and have a rating < relevant_threshold in the user's 'ground truth'
For the entire system, Precision is then aggregated with either the 'macro' or the 'micro' average, depending on the sys_average parameter.
PARAMETER | DESCRIPTION |
---|---|
relevant_threshold | Threshold used to discern relevant items from non-relevant items for every user. If not specified, the mean rating score of every user will be used |
sys_average | Specifies how the system average must be computed ('macro' or 'micro'). Default is 'macro' |
Source code in clayrs/evaluation/metrics/classification_metrics.py
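As an illustrative sketch of the per-user computation (plain Python, hypothetical ratings, not the ClayRS source), true and false positives are counted in the recommendation list against the user's ground truth, falling back to the user's mean rating when no relevant_threshold is given:

```python
import numpy as np

def user_precision(recommended, ground_truth, relevant_threshold=None):
    """Precision for a single user: tp / (tp + fp)."""
    if relevant_threshold is None:
        relevant_threshold = np.mean(list(ground_truth.values()))  # user's mean rating as fallback
    tp = sum(1 for i in recommended if i in ground_truth and ground_truth[i] >= relevant_threshold)
    fp = sum(1 for i in recommended if i in ground_truth and ground_truth[i] < relevant_threshold)
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0

truth_u = {'i1': 5, 'i2': 2, 'i3': 4, 'i4': 1}  # hypothetical ground truth ratings
rec_u = ['i1', 'i2', 'i3']                      # hypothetical recommendation list
print(user_precision(rec_u, truth_u, relevant_threshold=3))  # tp=2 (i1, i3), fp=1 (i2) -> 0.666...
```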
PrecisionAtK(k, relevant_threshold=None, sys_average='macro', precision=np.float64)
Bases: Precision
The Precision@K metric is calculated as such for the single user:

\[P@K_u = \frac{tp@K_u}{tp@K_u + fp@K_u}\]
Where:
- \(tp@K_u\) is the number of items which are in the recommendation list of the user, cut off at the first K items, and have a rating >= relevant_threshold in the user's 'ground truth'
- \(fp@K_u\) is the number of items which are in the recommendation list of the user, cut off at the first K items, and have a rating < relevant_threshold in the user's 'ground truth'
For the entire system, Precision@K is then aggregated with either the 'macro' or the 'micro' average, depending on the sys_average parameter.
PARAMETER | DESCRIPTION |
---|---|
k | Cutoff parameter. Only the first k items of the recommendation list will be considered |
relevant_threshold | Threshold used to discern relevant items from non-relevant items for every user. If not specified, the mean rating score of every user will be used |
sys_average | Specifies how the system average must be computed ('macro' or 'micro'). Default is 'macro' |
Source code in clayrs/evaluation/metrics/classification_metrics.py
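Precision@K only differs in that the list is cut off at the first K items before counting; a short sketch with hypothetical data (not the ClayRS source):

```python
def user_precision_at_k(recommended, ground_truth, k, relevant_threshold):
    """Precision@K for a single user: tp@K / (tp@K + fp@K)."""
    top_k = recommended[:k]  # only the first k recommendations are considered
    tp = sum(1 for i in top_k if i in ground_truth and ground_truth[i] >= relevant_threshold)
    fp = sum(1 for i in top_k if i in ground_truth and ground_truth[i] < relevant_threshold)
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0

truth_u = {'i1': 5, 'i2': 2, 'i3': 4, 'i4': 1, 'i5': 5}
print(user_precision_at_k(['i1', 'i2', 'i3', 'i4', 'i5'], truth_u, k=2, relevant_threshold=3))  # 0.5
```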
RPrecision(relevant_threshold=None, sys_average='macro', precision=np.float64)
Bases: Precision
The R-Precision metric is calculated as such for the single user:

\[R\text{-}Prec_u = \frac{tp@R_u}{tp@R_u + fp@R_u}\]
Where:
- \(R\) is the number of relevant items for the user u
- \(tp@R_u\) is the number of items which are in the recommendation list of the user, cut off at the first R items, and have a rating >= relevant_threshold in the user's 'ground truth'
- \(fp@R_u\) is the number of items which are in the recommendation list of the user, cut off at the first R items, and have a rating < relevant_threshold in the user's 'ground truth'
For the entire system, R-Precision is then aggregated with either the 'macro' or the 'micro' average, depending on the sys_average parameter.
PARAMETER | DESCRIPTION |
---|---|
relevant_threshold | Threshold used to discern relevant items from non-relevant items for every user. If not specified, the mean rating score of every user will be used |
sys_average | Specifies how the system average must be computed ('macro' or 'micro'). Default is 'macro' |
|
Source code in clayrs/evaluation/metrics/classification_metrics.py
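R-Precision fixes the cutoff at R, the number of relevant items in the user's ground truth; a sketch under the same assumptions as the previous examples (hypothetical data, not the ClayRS source):

```python
def user_r_precision(recommended, ground_truth, relevant_threshold):
    """R-Precision for a single user: precision of the first R recommendations."""
    r = sum(1 for rating in ground_truth.values() if rating >= relevant_threshold)
    top_r = recommended[:r]  # cutoff at R, the number of relevant items
    tp = sum(1 for i in top_r if i in ground_truth and ground_truth[i] >= relevant_threshold)
    fp = sum(1 for i in top_r if i in ground_truth and ground_truth[i] < relevant_threshold)
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0

truth_u = {'i1': 5, 'i2': 2, 'i3': 4, 'i4': 1}  # two relevant items with threshold 3 -> R = 2
print(user_r_precision(['i3', 'i2', 'i1'], truth_u, relevant_threshold=3))  # top-2 = [i3, i2] -> tp=1, fp=1 -> 0.5
```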
Recall(relevant_threshold=None, sys_average='macro', precision=np.float64)
Bases: ClassificationMetric
The Recall metric is calculated as such for the single user:

\[R_u = \frac{tp_u}{tp_u + fn_u}\]
Where:
- \(tp_u\) is the number of items which are in the recommendation list of the user and have a rating >= relevant_threshold in the user's 'ground truth'
- \(fn_u\) is the number of items which are NOT in the recommendation list of the user and have a rating >= relevant_threshold in the user's 'ground truth'
For the entire system, Recall is then aggregated with either the 'macro' or the 'micro' average, depending on the sys_average parameter.
PARAMETER | DESCRIPTION |
---|---|
relevant_threshold | Threshold used to discern relevant items from non-relevant items for every user. If not specified, the mean rating score of every user will be used |
sys_average | Specifies how the system average must be computed ('macro' or 'micro'). Default is 'macro' |
|
Source code in clayrs/evaluation/metrics/classification_metrics.py
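Recall also needs the false negatives, i.e. the relevant items that the recommendation list missed; an illustrative sketch with hypothetical data (not the ClayRS source):

```python
def user_recall(recommended, ground_truth, relevant_threshold):
    """Recall for a single user: tp / (tp + fn)."""
    relevant = {i for i, rating in ground_truth.items() if rating >= relevant_threshold}
    tp = len(relevant & set(recommended))  # relevant items that were recommended
    fn = len(relevant - set(recommended))  # relevant items that were missed
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0

truth_u = {'i1': 5, 'i2': 2, 'i3': 4, 'i4': 5}  # relevant items: i1, i3, i4
print(user_recall(['i1', 'i2'], truth_u, relevant_threshold=3))  # tp=1, fn=2 -> 0.333...
```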
RecallAtK(k, relevant_threshold=None, sys_average='macro', precision=np.float64)
Bases: Recall
The Recall@K metric is calculated as such for the single user:

\[R@K_u = \frac{tp@K_u}{tp@K_u + fn@K_u}\]
Where:
- \(tp@K_u\) is the number of items which are in the recommendation list of the user, cut off at the first K items, and have a rating >= relevant_threshold in the user's 'ground truth'
- \(fn@K_u\) is the number of items which are NOT in the recommendation list of the user, cut off at the first K items, and have a rating >= relevant_threshold in the user's 'ground truth'
For the entire system, Recall@K is then aggregated with either the 'macro' or the 'micro' average, depending on the sys_average parameter.
PARAMETER | DESCRIPTION |
---|---|
k | Cutoff parameter. Only the first k items of the recommendation list will be considered |
relevant_threshold | Threshold used to discern relevant items from non-relevant items for every user. If not specified, the mean rating score of every user will be used |
sys_average | Specifies how the system average must be computed ('macro' or 'micro'). Default is 'macro' |
Source code in clayrs/evaluation/metrics/classification_metrics.py
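Finally, the difference between the two system averages used by all the metrics above can be sketched with hypothetical per-user counts (plain Python, following the standard definitions rather than the exact ClayRS code path): the 'macro' average takes the mean of the per-user scores, while the 'micro' average first sums the confusion matrix counts over all users.

```python
# Hypothetical per-user (tp, fn) counts for Recall@K
per_user_counts = {'u1': (3, 1), 'u2': (1, 4), 'u3': (2, 0)}

# 'macro': mean of the per-user Recall@K scores
macro = sum(tp / (tp + fn) for tp, fn in per_user_counts.values()) / len(per_user_counts)

# 'micro': Recall@K computed on the counts summed over all users
total_tp = sum(tp for tp, _ in per_user_counts.values())
total_fn = sum(fn for _, fn in per_user_counts.values())
micro = total_tp / (total_tp + total_fn)

print(macro)  # (0.75 + 0.2 + 1.0) / 3 = 0.65
print(micro)  # 6 / 11 = 0.545...
```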