Classification metrics
A classification metric uses confusion matrix terminology (true positive, false positive, true negative, false negative) to classify each predicted item. In general, it needs a way to discern relevant items from non-relevant items for every user.
FMeasure(beta=1, relevant_threshold=None, sys_average='macro', precision=np.float64)
Bases: ClassificationMetric
The FMeasure metric combines Precision and Recall into a single metric. It is calculated as such for the single user:

\[FMeasure_u = (1 + \beta^2) \cdot \frac{P_u \cdot R_u}{(\beta^2 \cdot P_u) + R_u}\]
Where:
- \(P_u\) is the Precision calculated for the user u
- \(R_u\) is the Recall calculated for the user u
- \(\beta\) is a real factor which weights Recall or Precision differently based on its value:
- \(\beta = 1\): Equally weight Precision and Recall
- \(\beta > 1\): Weight Recall more
- \(\beta < 1\): Weight Precision more
A well-known FMeasure is the F1 metric, where \(\beta = 1\), which is simply the harmonic mean of Precision and Recall:

\[F1_u = 2 \cdot \frac{P_u \cdot R_u}{P_u + R_u}\]
For the entire system, the FMeasure is then aggregated according to the sys_average parameter: the 'macro' average is the mean of the per-user scores, while the 'micro' average computes the metric on the true positives, false positives and false negatives summed over all users.
PARAMETER | DESCRIPTION |
---|---|
beta | Real factor which weights Recall or Precision differently based on its value. Default is 1 |
relevant_threshold | Threshold used to discern relevant items from non-relevant items for every user. If not specified, the mean rating score of every user will be used |
sys_average | Specifies how the system average must be computed ('macro' or 'micro'). Default is 'macro' |
Source code in clayrs/evaluation/metrics/classification_metrics.py
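To make the role of \(\beta\) concrete, here is a minimal sketch (plain Python with made-up per-user precision and recall values, not the ClayRS implementation) of the per-user FMeasure:

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score for a single user, given its precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    return (1 + beta ** 2) * (precision * recall) / (beta ** 2 * precision + recall)

p_u, r_u = 0.8, 0.4  # hypothetical per-user Precision and Recall

print(f_measure(p_u, r_u, beta=1))    # 0.533..., the harmonic mean (F1)
print(f_measure(p_u, r_u, beta=2))    # 0.444..., Recall weighs more
print(f_measure(p_u, r_u, beta=0.5))  # 0.666..., Precision weighs more
```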
FMeasureAtK(k, beta=1, relevant_threshold=None, sys_average='macro')
Bases: FMeasure
The FMeasure@K metric combines Precision@K and Recall@K into a single metric. It is calculated as such for the single user:

\[FMeasure@K_u = (1 + \beta^2) \cdot \frac{P@K_u \cdot R@K_u}{(\beta^2 \cdot P@K_u) + R@K_u}\]
Where:
- \(P@K_u\) is the Precision at K calculated for the user u
- \(R@K_u\) is the Recall at K calculated for the user u
- \(\beta\) is a real factor which weights Recall or Precision differently based on its value:
- \(\beta = 1\): Equally weight Precision and Recall
- \(\beta > 1\): Weight Recall more
- \(\beta < 1\): Weight Precision more
A well-known FMeasure@K is the F1@K metric, where \(\beta = 1\), which is simply the harmonic mean of Precision@K and Recall@K:

\[F1@K_u = 2 \cdot \frac{P@K_u \cdot R@K_u}{P@K_u + R@K_u}\]
For the entire system, the FMeasure@K is then aggregated with either the 'macro' or the 'micro' average, depending on the sys_average parameter.
PARAMETER | DESCRIPTION |
---|---|
k | Cutoff parameter, used for the computation of Precision@K and Recall@K |
beta | Real factor which weights Recall or Precision differently based on its value. Default is 1 |
relevant_threshold | Threshold used to discern relevant items from non-relevant items for every user. If not specified, the mean rating score of every user will be used |
sys_average | Specifies how the system average must be computed ('macro' or 'micro'). Default is 'macro' |
Source code in clayrs/evaluation/metrics/classification_metrics.py
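A minimal sketch of the cutoff logic (plain Python, hypothetical data, not the ClayRS source): the ranked list is truncated at the first K items, Precision@K and Recall@K are computed on it and then combined into F1@K:

```python
def f1_at_k(ranked_items, relevant_items, k):
    """F1@K for a single user: harmonic mean of Precision@K and Recall@K."""
    top_k = ranked_items[:k]                                # only the first K recommendations
    tp = sum(1 for item in top_k if item in relevant_items)
    p_at_k = tp / k                                         # assumes every top-K item has a ground truth judgement
    r_at_k = tp / len(relevant_items)
    if p_at_k + r_at_k == 0:
        return 0.0
    return 2 * p_at_k * r_at_k / (p_at_k + r_at_k)

ranked = ['i1', 'i2', 'i3', 'i4', 'i5']  # hypothetical recommendation list
relevant = {'i2', 'i4', 'i9'}            # hypothetical relevant items of the user
print(f1_at_k(ranked, relevant, k=3))    # P@3 = 1/3, R@3 = 1/3 -> F1@3 = 0.333...
```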
Precision(relevant_threshold=None, sys_average='macro', precision=np.float64)
Bases: ClassificationMetric
The Precision metric is calculated as such for the single user:

\[P_u = \frac{tp_u}{tp_u + fp_u}\]
Where:
- \(tp_u\) is the number of items which are in the recommendation list of the user and have a rating >= relevant_threshold in the user's 'ground truth'
- \(fp_u\) is the number of items which are in the recommendation list of the user and have a rating < relevant_threshold in the user's 'ground truth'
For the entire system, Precision is then aggregated with either the 'macro' or the 'micro' average, depending on the sys_average parameter.
PARAMETER | DESCRIPTION |
---|---|
relevant_threshold | Threshold used to discern relevant items from non-relevant items for every user. If not specified, the mean rating score of every user will be used |
sys_average | Specifies how the system average must be computed ('macro' or 'micro'). Default is 'macro' |
Source code in clayrs/evaluation/metrics/classification_metrics.py
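As an illustrative sketch of the per-user computation (plain Python, hypothetical ratings, not the ClayRS source), true and false positives are counted in the recommendation list against the user's ground truth, falling back to the user's mean rating when no relevant_threshold is given:

```python
import numpy as np

def user_precision(recommended, ground_truth, relevant_threshold=None):
    """Precision for a single user: tp / (tp + fp)."""
    if relevant_threshold is None:
        relevant_threshold = np.mean(list(ground_truth.values()))  # user's mean rating as fallback
    tp = sum(1 for i in recommended if i in ground_truth and ground_truth[i] >= relevant_threshold)
    fp = sum(1 for i in recommended if i in ground_truth and ground_truth[i] < relevant_threshold)
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0

truth_u = {'i1': 5, 'i2': 2, 'i3': 4, 'i4': 1}  # hypothetical ground truth ratings
rec_u = ['i1', 'i2', 'i3']                      # hypothetical recommendation list
print(user_precision(rec_u, truth_u, relevant_threshold=3))  # tp=2 (i1, i3), fp=1 (i2) -> 0.666...
```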
PrecisionAtK(k, relevant_threshold=None, sys_average='macro', precision=np.float64)
Bases: Precision
The Precision@K metric is calculated as such for the single user:

\[P@K_u = \frac{tp@K_u}{tp@K_u + fp@K_u}\]
Where:
- \(tp@K_u\) is the number of items which are in the recommendation list of the user, cut off at the first K items, and have a rating >= relevant_threshold in the user's 'ground truth'
- \(fp@K_u\) is the number of items which are in the recommendation list of the user, cut off at the first K items, and have a rating < relevant_threshold in the user's 'ground truth'
For the entire system, Precision@K is then aggregated with either the 'macro' or the 'micro' average, depending on the sys_average parameter.
PARAMETER | DESCRIPTION |
---|---|
k | Cutoff parameter. Only the first k items of the recommendation list will be considered |
relevant_threshold | Threshold used to discern relevant items from non-relevant items for every user. If not specified, the mean rating score of every user will be used |
sys_average | Specifies how the system average must be computed ('macro' or 'micro'). Default is 'macro' |
Source code in clayrs/evaluation/metrics/classification_metrics.py
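Precision@K only differs in that the list is cut off at the first K items before counting; a short sketch with hypothetical data (not the ClayRS source):

```python
def user_precision_at_k(recommended, ground_truth, k, relevant_threshold):
    """Precision@K for a single user: tp@K / (tp@K + fp@K)."""
    top_k = recommended[:k]  # only the first k recommendations are considered
    tp = sum(1 for i in top_k if i in ground_truth and ground_truth[i] >= relevant_threshold)
    fp = sum(1 for i in top_k if i in ground_truth and ground_truth[i] < relevant_threshold)
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0

truth_u = {'i1': 5, 'i2': 2, 'i3': 4, 'i4': 1, 'i5': 5}
print(user_precision_at_k(['i1', 'i2', 'i3', 'i4', 'i5'], truth_u, k=2, relevant_threshold=3))  # 0.5
```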
RPrecision(relevant_threshold=None, sys_average='macro', precision=np.float64)
Bases: Precision
The R-Precision metric is calculated as such for the single user:

\[R\text{-}Prec_u = \frac{tp@R_u}{tp@R_u + fp@R_u}\]
Where:
- \(R\) is the number of relevant items for the user u
- \(tp@R_u\) is the number of items which are in the recommendation list of the user, cut off at the first R items, and have a rating >= relevant_threshold in the user's 'ground truth'
- \(fp@R_u\) is the number of items which are in the recommendation list of the user, cut off at the first R items, and have a rating < relevant_threshold in the user's 'ground truth'
For the entire system, R-Precision is then aggregated with either the 'macro' or the 'micro' average, depending on the sys_average parameter.
PARAMETER | DESCRIPTION |
---|---|
relevant_threshold | Threshold used to discern relevant items from non-relevant items for every user. If not specified, the mean rating score of every user will be used |
sys_average | Specifies how the system average must be computed ('macro' or 'micro'). Default is 'macro' |
|
Source code in clayrs/evaluation/metrics/classification_metrics.py
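R-Precision fixes the cutoff at R, the number of relevant items in the user's ground truth; a sketch under the same assumptions as the previous examples (hypothetical data, not the ClayRS source):

```python
def user_r_precision(recommended, ground_truth, relevant_threshold):
    """R-Precision for a single user: precision of the first R recommendations."""
    r = sum(1 for rating in ground_truth.values() if rating >= relevant_threshold)
    top_r = recommended[:r]  # cutoff at R, the number of relevant items
    tp = sum(1 for i in top_r if i in ground_truth and ground_truth[i] >= relevant_threshold)
    fp = sum(1 for i in top_r if i in ground_truth and ground_truth[i] < relevant_threshold)
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0

truth_u = {'i1': 5, 'i2': 2, 'i3': 4, 'i4': 1}  # two relevant items with threshold 3 -> R = 2
print(user_r_precision(['i3', 'i2', 'i1'], truth_u, relevant_threshold=3))  # top-2 = [i3, i2] -> tp=1, fp=1 -> 0.5
```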
Recall(relevant_threshold=None, sys_average='macro', precision=np.float64)
Bases: ClassificationMetric
The Recall metric is calculated as such for the single user:

\[R_u = \frac{tp_u}{tp_u + fn_u}\]
Where:
- \(tp_u\) is the number of items which are in the recommendation list of the user and have a rating >= relevant_threshold in the user's 'ground truth'
- \(fn_u\) is the number of items which are NOT in the recommendation list of the user and have a rating >= relevant_threshold in the user's 'ground truth'
For the entire system, Recall is then aggregated with either the 'macro' or the 'micro' average, depending on the sys_average parameter.
PARAMETER | DESCRIPTION |
---|---|
relevant_threshold | Threshold used to discern relevant items from non-relevant items for every user. If not specified, the mean rating score of every user will be used |
sys_average | Specifies how the system average must be computed ('macro' or 'micro'). Default is 'macro' |
|
Source code in clayrs/evaluation/metrics/classification_metrics.py
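Recall also needs the false negatives, i.e. the relevant items that the recommendation list missed; an illustrative sketch with hypothetical data (not the ClayRS source):

```python
def user_recall(recommended, ground_truth, relevant_threshold):
    """Recall for a single user: tp / (tp + fn)."""
    relevant = {i for i, rating in ground_truth.items() if rating >= relevant_threshold}
    tp = len(relevant & set(recommended))  # relevant items that were recommended
    fn = len(relevant - set(recommended))  # relevant items that were missed
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0

truth_u = {'i1': 5, 'i2': 2, 'i3': 4, 'i4': 5}  # relevant items: i1, i3, i4
print(user_recall(['i1', 'i2'], truth_u, relevant_threshold=3))  # tp=1, fn=2 -> 0.333...
```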
RecallAtK(k, relevant_threshold=None, sys_average='macro', precision=np.float64)
Bases: Recall
The Recall@K metric is calculated as such for the single user:

\[R@K_u = \frac{tp@K_u}{tp@K_u + fn@K_u}\]
Where:
- \(tp@K_u\) is the number of items which are in the recommendation list of the user, cut off at the first K items, and have a rating >= relevant_threshold in the user's 'ground truth'
- \(fn@K_u\) is the number of items which are NOT in the recommendation list of the user, cut off at the first K items, and have a rating >= relevant_threshold in the user's 'ground truth'
For the entire system, Recall@K is then aggregated with either the 'macro' or the 'micro' average, depending on the sys_average parameter.
PARAMETER | DESCRIPTION |
---|---|
k | Cutoff parameter. Only the first k items of the recommendation list will be considered |
relevant_threshold | Threshold used to discern relevant items from non-relevant items for every user. If not specified, the mean rating score of every user will be used |
sys_average | Specifies how the system average must be computed ('macro' or 'micro'). Default is 'macro' |
Source code in clayrs/evaluation/metrics/classification_metrics.py
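Finally, the difference between the two system averages used by all the metrics above can be sketched with hypothetical per-user counts (plain Python, following the standard definitions rather than the exact ClayRS code path): the 'macro' average takes the mean of the per-user scores, while the 'micro' average first sums the confusion matrix counts over all users.

```python
# Hypothetical per-user (tp, fn) counts for Recall@K
per_user_counts = {'u1': (3, 1), 'u2': (1, 4), 'u3': (2, 0)}

# 'macro': mean of the per-user Recall@K scores
macro = sum(tp / (tp + fn) for tp, fn in per_user_counts.values()) / len(per_user_counts)

# 'micro': Recall@K computed on the counts summed over all users
total_tp = sum(tp for tp, _ in per_user_counts.values())
total_fn = sum(fn for _, fn in per_user_counts.values())
micro = total_tp / (total_tp + total_fn)

print(macro)  # (0.75 + 0.2 + 1.0) / 3 = 0.65
print(micro)  # 6 / 11 = 0.545...
```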