Ranking metrics
Ranking metrics evaluate the quality of the recommendation lists
Correlation(method='pearson', top_n=None)
Bases: RankingMetric
The Correlation metric calculates the correlation between the ranking of a user and its ideal ranking. The correlation methods currently implemented are:
pearson
kendall
spearman
Every correlation method is implemented by the pandas library, so refer to its documentation for more details.
The correlation metric is calculated as follows for a single user:

$$ Corr_u = corr(ranking_u,\ ideal\_ranking_u) $$
Where:
- \(ranking_u\) is the ranking of the user
- \(ideal\_ranking_u\) is the ideal ranking for the user
The ideal ranking is computed based on the ratings inside the ground truth of the user.
The Correlation metric calculated for the entire system is simply the average of every \(Corr_u\):

$$ Corr_{sys} = \frac{\sum_{u \in U} Corr_u}{|U|} $$
Where:
- \(Corr_u\) is the correlation of the user \(u\)
- \(U\) is the set of all users
The system average excludes NaN values.
It's also possible to specify a cutoff thanks to the top_n parameter: if specified, only the first \(n\) results of the recommendation list will be used in order to calculate the correlation.
| PARAMETER | DESCRIPTION |
|---|---|
| method | The correlation method to use. It must be 'pearson', 'kendall' or 'spearman', otherwise a ValueError exception is raised. Default: 'pearson' |
| top_n | Cutoff parameter: if specified, only the first n items of the recommendation list will be used in order to calculate the correlation. Default: None |
| RAISES | DESCRIPTION |
|---|---|
| ValueError | If an invalid method parameter is passed |
Source code in clayrs/evaluation/metrics/ranking_metrics.py
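The library source is not reproduced here. As an illustration only (not the ClayRS implementation), below is a minimal sketch of the per-user computation with pandas, assuming rankings are given as ordered lists of item ids; correlation_single_user is a hypothetical helper name:

```python
import pandas as pd

def correlation_single_user(ranking, ideal_ranking, method="pearson", top_n=None):
    # Optionally keep only the first top_n recommended items (the 'top_n' cutoff).
    if top_n is not None:
        ranking = ranking[:top_n]
    # Position of each item in the ideal ranking (items missing from it become NaN).
    ideal_pos = {item: pos for pos, item in enumerate(ideal_ranking)}
    predicted_positions = pd.Series(range(len(ranking)), dtype=float)
    ideal_positions = pd.Series([ideal_pos.get(item) for item in ranking], dtype=float)
    # pandas implements the 'pearson', 'kendall' and 'spearman' methods.
    return predicted_positions.corr(ideal_positions, method=method)

# Example: the predicted order partially agrees with the ideal order
print(correlation_single_user(["i1", "i2", "i3"], ["i2", "i1", "i3"], method="kendall"))
```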
MAP(relevant_threshold=None)
Bases: RankingMetric
The \(MAP\) metric (Mean Average Precision) is a ranking metric computed by first calculating the \(AP\) (Average Precision) for each user and then taking its mean.
The \(AP\) is calculated as follows for a single user:

$$ AP_u = \frac{1}{m_u} \sum_{i=1}^{N_u} P(i) \cdot rel(i) $$
Where:
- \(m_u\) is the number of relevant items for the user \(u\)
- \(N_u\) is the number of recommended items for the user \(u\)
- \(P(i)\) is the precision computed at cutoff \(i\)
- \(rel(i)\) is an indicator variable that says whether the i-th item is relevant (\(rel(i)=1\)) or not (\(rel(i)=0\))
After computing the \(AP\) for each user, we can compute the \(MAP\) for the whole system:

$$ MAP = \frac{\sum_{u \in U} AP_u}{|U|} $$

where \(U\) is the set of all users.
This metric will return the \(AP\) computed for each user in the dataframe containing user results, and the \(MAP\) computed for the whole system in the dataframe containing system results.
| PARAMETER | DESCRIPTION |
|---|---|
| relevant_threshold | Parameter needed to discern relevant items from non-relevant items for every user. If not specified, the mean rating score of every user will be used. Default: None |
Source code in clayrs/evaluation/metrics/ranking_metrics.py
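A minimal, self-contained sketch of the formulas above (not the ClayRS source); average_precision and mean_average_precision are hypothetical helper names, and relevance is assumed to be given as a set of relevant item ids per user:

```python
import numpy as np

def average_precision(recommended, relevant):
    # recommended: ranked list of item ids; relevant: set of relevant item ids (m_u = len(relevant)).
    relevant = set(relevant)
    if not relevant:
        return np.nan
    hits, precision_sum = 0, 0.0
    for i, item in enumerate(recommended, start=1):
        if item in relevant:                 # rel(i) = 1
            hits += 1
            precision_sum += hits / i        # P(i), precision at cutoff i
    return precision_sum / len(relevant)     # divide by m_u

def mean_average_precision(recommendations_per_user, relevant_per_user):
    ap_scores = [average_precision(rec, rel)
                 for rec, rel in zip(recommendations_per_user, relevant_per_user)]
    return float(np.nanmean(ap_scores))      # MAP: mean of the per-user AP values

# Example with two users
recs = [["a", "b", "c"], ["x", "y", "z"]]
rels = [{"a", "c"}, {"y"}]
print(mean_average_precision(recs, rels))    # ((1 + 2/3)/2 + 1/2) / 2 ≈ 0.667
```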
MAPAtK(k, relevant_threshold=None)
Bases: MAP
The \(MAP@K\) metric (Mean Average Precision at K) is a ranking metric computed by first calculating the \(AP@K\) (Average Precision at K) for each user and then taking its mean.
The \(AP@K\) is calculated as follows for a single user:

$$ AP@K_u = \frac{1}{m_u} \sum_{i=1}^{K} P(i) \cdot rel(i) $$
Where:
- \(m_u\) is the number of relevant items for the user \(u\)
- \(K\) is the cutoff value
- \(P(i)\) is the precision computed at cutoff \(i\)
- \(rel(i)\) is an indicator variable that says whether the i-th item is relevant (\(rel(i)=1\)) or not (\(rel(i)=0\))
After computing the \(AP@K\) for each user, we can compute the \(MAP@K\) for the whole system:

$$ MAP@K = \frac{\sum_{u \in U} AP@K_u}{|U|} $$

where \(U\) is the set of all users.
This metric will return the \(AP@K\) computed for each user in the dataframe containing user results, and the \(MAP@K\) computed for the whole system in the dataframe containing system results.
| PARAMETER | DESCRIPTION |
|---|---|
| k | The cutoff parameter. It must be >= 1, otherwise a ValueError exception is raised |
| relevant_threshold | Parameter needed to discern relevant items from non-relevant items for every user. If not specified, the mean rating score of every user will be used. Default: None |
Source code in clayrs/evaluation/metrics/ranking_metrics.py
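A short sketch of the cutoff variant, reusing the hypothetical average_precision helper from the MAP example above (again, not the ClayRS source):

```python
def average_precision_at_k(recommended, relevant, k):
    # AP@K: Average Precision computed only on the first K recommendations,
    # still normalized by the total number of relevant items m_u.
    return average_precision(recommended[:k], relevant)

# "d" is relevant but falls outside the cutoff, so only "a" contributes
print(average_precision_at_k(["a", "b", "c", "d"], {"a", "d"}, k=2))  # 1/2 = 0.5
```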
MRR(relevant_threshold=None)
Bases: RankingMetric
The MRR (Mean Reciprocal Rank) metric is a system wide metric, so only its result will be returned, not one for every user. MRR is calculated as:

$$ MRR = \frac{1}{|Q|} \cdot \sum_{i=1}^{|Q|} \frac{1}{rank(i)} $$
Where:
- \(Q\) is the set of recommendation lists
- \(rank(i)\) is the position of the first relevant item in the i-th recommendation list
The MRR metric needs to discern relevant items from non-relevant ones: in order to do that, one can pass a custom relevant_threshold parameter that will be applied to every user, so that if the rating of an item is >= relevant_threshold, then the item is relevant, otherwise it is not. If no relevant_threshold parameter is passed, then, for every user, its mean rating score will be used.
| PARAMETER | DESCRIPTION |
|---|---|
| relevant_threshold | Parameter needed to discern relevant items from non-relevant items for every user. If not specified, the mean rating score of every user will be used. Default: None |
Source code in clayrs/evaluation/metrics/ranking_metrics.py
calc_reciprocal_rank(user_predictions_items, user_truth_relevant_items)
Method which calculates the RR (Reciprocal Rank) for a single user
| PARAMETER | DESCRIPTION |
|---|---|
| user_predictions_items | List of ranked item ids for the user computed by the Recommender |
| user_truth_relevant_items | List of relevant item ids for the user in its truth set |
Source code in clayrs/evaluation/metrics/ranking_metrics.py
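An illustrative sketch of the Reciprocal Rank of a single user and of the system-wide MRR (not the ClayRS source); reciprocal_rank and mean_reciprocal_rank are hypothetical helper names:

```python
def reciprocal_rank(user_predictions_items, user_truth_relevant_items):
    # 1 / position (1-based) of the first relevant item in the ranked list; 0 if none is found.
    relevant = set(user_truth_relevant_items)
    for rank, item in enumerate(user_predictions_items, start=1):
        if item in relevant:
            return 1 / rank
    return 0.0

def mean_reciprocal_rank(all_predictions, all_truth_relevant):
    # Average the RR over the |Q| recommendation lists.
    rr_scores = [reciprocal_rank(pred, truth)
                 for pred, truth in zip(all_predictions, all_truth_relevant)]
    return sum(rr_scores) / len(rr_scores)

print(mean_reciprocal_rank([["a", "b"], ["x", "y"]], [{"b"}, {"x"}]))  # (1/2 + 1) / 2 = 0.75
```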
MRRAtK(k, relevant_threshold=None)
Bases: MRR
The MRR@K (Mean Reciprocal Rank at K) metric is a system wide metric, so only its result will be returned, not one for every user. MRR@K is calculated as:

$$ MRR@K = \frac{1}{|Q|} \cdot \sum_{i=1}^{K} \frac{1}{rank(i)} $$
Where:
- \(K\) is the cutoff parameter
- \(Q\) is the set of recommendation lists
- \(rank(i)\) is the position of the first relevant item in the i-th recommendation list
| PARAMETER | DESCRIPTION |
|---|---|
| k | The cutoff parameter. It must be >= 1, otherwise a ValueError exception is raised |
| relevant_threshold | Parameter needed to discern relevant items from non-relevant items for every user. If not specified, the mean rating score of every user will be used. Default: None |
| RAISES | DESCRIPTION |
|---|---|
| ValueError | If an invalid cutoff parameter is passed (0 or negative) |
Source code in clayrs/evaluation/metrics/ranking_metrics.py
calc_reciprocal_rank(user_predictions_items, user_truth_relevant_items)
Method which calculates the RR (Reciprocal Rank) for a single user
| PARAMETER | DESCRIPTION |
|---|---|
| user_predictions_items | List of ranked item ids for the user computed by the Recommender |
| user_truth_relevant_items | List of relevant item ids for the user in its truth set |
Source code in clayrs/evaluation/metrics/ranking_metrics.py
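A short sketch of the cutoff variant, reusing the hypothetical reciprocal_rank helper from the MRR example above (not the ClayRS source):

```python
def reciprocal_rank_at_k(user_predictions_items, user_truth_relevant_items, k):
    # RR computed only on the first K recommendations;
    # it is 0 if no relevant item appears within the cutoff.
    return reciprocal_rank(user_predictions_items[:k], user_truth_relevant_items)

print(reciprocal_rank_at_k(["a", "b", "c"], {"c"}, k=2))  # 0.0: the first relevant item is beyond the cutoff
```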
NDCG(gains='linear', discount_log=np.log2)
Bases: RankingMetric
The NDCG (Normalized Discounted Cumulative Gain) metric is calculated for the single user by first computing the DCG score using the following formula:

$$ DCG_u = \sum_{r \in scores_u} \frac{f(r)}{\log_x(i + 1)} $$
Where:
- \(scores_{u}\) are the ground truth scores for predicted items, ordered according to the order of said items in the ranking for the user \(u\)
- \(f\) is a gain function (linear or exponential, in particular)
- \(x\) is the base of the logarithm
- \(i\) is the index of the truth score \(r\) in the list of scores \(scores_{u}\)
If \(f\) is "linear", then the truth score \(r\) is returned as is. Otherwise, in the "exponential" case, the following formula is applied to \(r\):

$$ f(r) = 2^{r} - 1 $$
The NDCG for a single user is then calculated using the following formula:

$$ NDCG_u = \frac{DCG_u}{IDCG_u} $$
Where:
- \(IDCG_{u}\) is the DCG of the ideal ranking for the truth scores
The basic idea is therefore to compare the actual ranking with the ideal one.
Finally, the NDCG of the entire system is calculated as:

$$ NDCG_{sys} = \frac{\sum_{u \in U} NDCG_u}{|U|} $$
Where:
- \(NDCG_u\) is the NDCG calculated for user \(u\)
- \(U\) is the set of all users
The system average excludes NaN values.
| PARAMETER | DESCRIPTION |
|---|---|
| gains | Type of gain function to use when calculating the DCG score. The possible options are "linear" or "exponential". Default: "linear" |
| discount_log | Logarithm function to use when calculating the DCG score. By default, the numpy base-2 logarithm (np.log2) is used |
Source code in clayrs/evaluation/metrics/ranking_metrics.py
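A minimal numpy sketch of the DCG/NDCG computation for a single user (not the ClayRS source); dcg and ndcg are hypothetical helper names, and the scores are the ground truth ratings in the predicted order:

```python
import numpy as np

def dcg(scores, gains="linear", discount_log=np.log2):
    scores = np.asarray(scores, dtype=float)
    # f(r): identity for "linear" gains, 2^r - 1 for "exponential" gains.
    gained = scores if gains == "linear" else 2 ** scores - 1
    # Discount each gain by log(i + 1), with i the 1-based position in the ranking.
    positions = np.arange(1, len(scores) + 1)
    return float(np.sum(gained / discount_log(positions + 1)))

def ndcg(truth_scores_in_predicted_order, gains="linear", discount_log=np.log2):
    actual_dcg = dcg(truth_scores_in_predicted_order, gains, discount_log)
    # IDCG: DCG of the ideal ordering (truth scores sorted in decreasing order).
    ideal_dcg = dcg(sorted(truth_scores_in_predicted_order, reverse=True), gains, discount_log)
    return actual_dcg / ideal_dcg if ideal_dcg > 0 else np.nan

# Ground truth ratings of the recommended items, in the order produced by the recommender
print(ndcg([2.0, 3.0, 1.0]))  # ≈ 0.92
```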
NDCGAtK(k, gains='linear', discount_log=np.log2)
Bases: NDCG
The NDCG@K (Normalized Discounted Cumulative Gain at K) metric is calculated for the single user by using the framework implementation of the NDCG but considering \(scores_{u}\) cut at the first \(k\) predictions
| PARAMETER | DESCRIPTION |
|---|---|
| k | The cutoff parameter |
| gains | Type of gain function to use when calculating the DCG score. The possible options are "linear" or "exponential". Default: "linear" |
| discount_log | Logarithm function to use when calculating the DCG score. By default, the numpy base-2 logarithm (np.log2) is used |
Source code in clayrs/evaluation/metrics/ranking_metrics.py
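A short sketch of the cutoff variant, reusing the hypothetical ndcg helper from the NDCG example above and cutting the scores at the first k predictions, as described (not the ClayRS source):

```python
def ndcg_at_k(truth_scores_in_predicted_order, k, gains="linear", discount_log=np.log2):
    # Only the first k predictions contribute to both DCG and IDCG.
    return ndcg(truth_scores_in_predicted_order[:k], gains, discount_log)

print(ndcg_at_k([2.0, 3.0, 1.0], k=2))  # ≈ 0.91
```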