Fairness metrics
Fairness metrics evaluate how unbiased the recommendation lists are (e.g. whether they are biased towards popular items).
CatalogCoverage(catalog, top_n=None, k=None)
Bases: PredictionCoverage
The Catalog Coverage metric measures, as a percentage, how many distinct items are recommended in relation to all available items. It's a system-wide metric, so only its overall result will be returned, not a result for every user. It differs from the Prediction Coverage since it allows for different parameters to come into play: if no parameter is passed, it reduces to a simple Prediction Coverage. The metric is calculated as such:

$$
Catalog\ Coverage = \left(\frac{|\bigcup_{j=1}^{N} reclist(u_j)|}{|I|}\right) \cdot 100
$$
Where:
- \(N\) is the total number of users
- \(reclist(u_j)\) is the set of items contained in the recommendation list of user \(j\)
- \(I\) is the set of all available items
The set \(I\) must be specified through the 'catalog' parameter
The recommendation list of every user (\(reclist(u_j)\)) can be cut to its first n items with the top_n parameter, so that catalog coverage is measured considering only the highest ranked items.
With the 'k' parameter one can specify the number of users that will be used to calculate catalog coverage: k users will be randomly sampled and only their recommendation lists will be used. The formula above becomes:

$$
Catalog\ Coverage = \left(\frac{|\bigcup_{j=1}^{k} reclist(u_j)|}{|I|}\right) \cdot 100
$$
Where:
- \(k\) is the parameter specified
Note that 'k' must be smaller than \(N\); otherwise the recommendation lists of all users will simply be used.
Check the 'Beyond Accuracy: Evaluating Recommender Systems by Coverage and Serendipity' paper and page 13 of the 'Comparison of group recommendation algorithms' paper for more details.
PARAMETER | DESCRIPTION |
---|---|
`catalog` | set of item ids of the catalog on which the catalog coverage must be computed |
`top_n` | cutoff parameter: if specified, the Catalog Coverage will be calculated considering only the first 'n' items of every recommendation list of all users. Default is None. TYPE: `int` |
`k` | number of users randomly sampled. If specified, k users will be randomly sampled across all users and only their recommendation lists will be used to compute the CatalogCoverage. TYPE: `int` |
Source code in clayrs/evaluation/metrics/fairness_metrics.py
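The computation above can be sketched outside of the clayrs API as a small standalone function; the function name, recommendation lists, and catalog below are made up for illustration:

```python
import random

def catalog_coverage(rec_lists, catalog, top_n=None, k=None):
    """Percentage of distinct catalog items appearing in the recommendation lists."""
    users = list(rec_lists)
    if k is not None and k < len(users):
        users = random.sample(users, k)  # use only k randomly sampled users
    recommended = set()
    for user in users:
        recs = rec_lists[user]
        if top_n is not None:
            recs = recs[:top_n]  # cutoff: keep only the first n items of each list
        recommended.update(recs)
    return 100 * len(recommended & set(catalog)) / len(catalog)

catalog = {'i1', 'i2', 'i3', 'i4', 'i5'}
recs = {'u1': ['i1', 'i2'], 'u2': ['i2', 'i3']}
print(catalog_coverage(recs, catalog))           # 3 distinct items out of 5 -> 60.0
print(catalog_coverage(recs, catalog, top_n=1))  # only 'i1' and 'i2' -> 40.0
```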
DeltaGap(user_groups, user_profiles, original_ratings, top_n=None, pop_percentage=0.2)
Bases: GroupFairnessMetric
The Delta GAP (Group Average Popularity) metric lets you compare the average popularity "requested" by one or more groups of users with the average popularity "obtained" through the recommendations given by the recsys. It is a group-level metric: a result is returned for each group rather than for each user.

It is calculated as such:

$$
\Delta GAP = \frac{GAP_{recs} - GAP_{profile}}{GAP_{profile}}
$$
Users are split into groups based on the user_groups parameter, which contains the names of the groups as keys and the percentage of users each group must contain as values. For example:
```python
user_groups = {'popular_users': 0.3, 'medium_popular_users': 0.2, 'low_popular_users': 0.5}
```
Every user will be inserted into a group based on how many popular items they have rated (in relation to the percentages specified as values in the dictionary):
- users with many popular items will be inserted into the first group
- users with niche items rated will be inserted into one of the last groups.
In general, users are grouped by \(Popularity\_ratio\) in descending order. \(Popularity\_ratio\) for a single user \(u\) is defined as:

$$
Popularity\_ratio_u = \frac{\text{number of popular items rated by } u}{\text{number of items rated by } u}
$$
The most popular items are the first pop_percentage% of all items, ordered by popularity in descending order. The popularity of an item is defined as the number of times it is rated in the original_ratings parameter, divided by the total number of users in original_ratings.
It can happen that no recommendations are available for a particular user of a group: in that case the user is skipped and won't be considered in the \(\Delta GAP\) computation of its group. If no user of a group has recommendations available, a warning will be printed and the whole group won't be considered.
If the 'top_n' parameter is specified, then the \(\Delta GAP\) will be calculated considering only the first n items of every recommendation list of all users
PARAMETER | DESCRIPTION |
---|---|
`user_groups` | Dict containing group names as keys and the percentage of users each group must contain as values, used to split users into groups. Users with more popular items rated are grouped into the first group, users with slightly less popular items rated are grouped into the second one, etc. |
`user_profiles` | one or more `Ratings` objects containing the profile of each user |
`original_ratings` | the original `Ratings` object, used to compute the popularity of each item |
`top_n` | cutoff parameter: if specified, the \(\Delta GAP\) will be calculated considering only the first 'n' items of every recommendation list of all users. Default is None. TYPE: `int` |
`pop_percentage` | how many (in percentage) most popular items must be considered. Default is 0.2. TYPE: `float` |
Source code in clayrs/evaluation/metrics/fairness_metrics.py
calculate_delta_gap(recs_gap, profile_gap)
staticmethod
Compute the ratio between the recommendation gap and the user profiles gap
PARAMETER | DESCRIPTION |
---|---|
`recs_gap` | recommendation gap. TYPE: `float` |
`profile_gap` | user profiles gap. TYPE: `float` |

RETURNS | DESCRIPTION |
---|---|
`score` | delta gap measure. TYPE: `float` |
Source code in clayrs/evaluation/metrics/fairness_metrics.py
calculate_gap(group, avg_pop_by_users)
staticmethod
Compute the GAP (Group Average Popularity) formula:

$$
GAP(G) = \frac{\sum_{u \in G} \frac{\sum_{i \in i_u} pop_i}{|i_u|}}{|G|}
$$
Where:
- \(G\) is the set of users
- \(i_u\) is the set of items rated/recommended by/to user \(u\)
- \(pop_i\) is the popularity of item \(i\)
PARAMETER | DESCRIPTION |
---|---|
`group` | the set of users (user_id) |
`avg_pop_by_users` | average popularity by user |

RETURNS | DESCRIPTION |
---|---|
`score` | gap score. TYPE: `float` |
Source code in clayrs/evaluation/metrics/fairness_metrics.py
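A minimal standalone sketch of this formula, mirroring the method's signature; the popularity values below are made up:

```python
def calculate_gap(group, avg_pop_by_users):
    # GAP(G): mean of the average popularity of the items of each user in the group
    return sum(avg_pop_by_users[user] for user in group) / len(group)

avg_pop_by_users = {'u1': 0.25, 'u2': 0.75, 'u3': 0.5}
print(calculate_gap({'u1', 'u2'}, avg_pop_by_users))  # (0.25 + 0.75) / 2 = 0.5
```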
GiniIndex(top_n=None)
Bases: FairnessMetric
The Gini Index metric measures inequality in recommendation lists. It's a system-wide metric, so only its overall result will be returned, not a result for every user. The metric is calculated as such:

$$
Gini = \frac{\sum_{i=1}^{n} (2i - n - 1)\, x_i}{n \sum_{i=1}^{n} x_i}
$$
Where:
- \(n\) is the total number of distinct items that are being recommended
- \(x_i\) is the number of times that the item \(i\) has been recommended, with items indexed in non-decreasing order of \(x_i\)
A perfectly equal recommender system would recommend every item the same number of times, in which case the Gini index equals 0. The more unequal the recsys, the closer the Gini index is to 1.
If the 'top_n' parameter is specified, then the Gini index will measure inequality considering only the first n items of every recommendation list of all users
PARAMETER | DESCRIPTION |
---|---|
`top_n` | cutoff parameter: if specified, the Gini index will be calculated considering only the first 'n' items of every recommendation list of all users. Default is None. TYPE: `int` |
Source code in clayrs/evaluation/metrics/fairness_metrics.py
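The formula can be sketched as a standalone function; `gini_index` and the recommendation lists below are illustrative, not part of the clayrs API:

```python
def gini_index(rec_lists, top_n=None):
    # Count how many times each distinct item is recommended across all users
    counts = {}
    for recs in rec_lists.values():
        if top_n is not None:
            recs = recs[:top_n]
        for item in recs:
            counts[item] = counts.get(item, 0) + 1
    x = sorted(counts.values())  # x_i in non-decreasing order, as the formula requires
    n = len(x)
    # Gini = sum_{i=1..n} (2i - n - 1) * x_i / (n * sum(x))
    return sum((2 * (i + 1) - n - 1) * x_i for i, x_i in enumerate(x)) / (n * sum(x))

equal = {'u1': ['a', 'b'], 'u2': ['a', 'b']}
skewed = {'u1': ['a', 'b'], 'u2': ['a', 'c'], 'u3': ['a', 'd']}
print(gini_index(equal))   # every item recommended equally often -> 0.0
print(gini_index(skewed))  # 'a' dominates the lists -> 0.25
```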
GroupFairnessMetric(user_groups)
Bases: FairnessMetric
Abstract class for fairness metrics based on user groups
It has some concrete methods useful for group divisions, since every subclass needs to split users into groups.
PARAMETER | DESCRIPTION |
---|---|
`user_groups` | Dict containing group names as keys and the percentage of users each group must contain as values, used to split users into groups. Users with more popular items rated are grouped into the first group, users with slightly less popular items rated are grouped into the second one, etc. |
Source code in clayrs/evaluation/metrics/fairness_metrics.py
get_avg_pop_by_users(data, pop_by_items, group=None)
staticmethod
Get the average popularity for each user in the data parameter.

Average popularity of a single user \(u\) is defined as:

$$
avg\_pop_u = \frac{\sum_{i \in i_u} pop_i}{|i_u|}
$$
PARAMETER | DESCRIPTION |
---|---|
`data` | the `Ratings` object for which the average popularity of each user will be computed |
`pop_by_items` | popularity of each item, as ('label', 'popularity') pairs |
`group` | (optional) the set of users (user_id) |

RETURNS | DESCRIPTION |
---|---|
`Dict[str, float]` | Python dictionary containing each user id as key and the average popularity of that user as value |
Source code in clayrs/evaluation/metrics/fairness_metrics.py
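A standalone sketch of the computation, where `items_by_user` stands in for the per-user item lists derived from the `data` parameter; all names and values are made up:

```python
def get_avg_pop_by_users(items_by_user, pop_by_items, group=None):
    """Map each user id to the mean popularity of the items they rated."""
    if group is None:
        group = list(items_by_user)  # default: every user in the data
    return {user: sum(pop_by_items[i] for i in items_by_user[user]) / len(items_by_user[user])
            for user in group}

pop_by_items = {'i1': 0.75, 'i2': 0.25, 'i3': 0.5}
items_by_user = {'u1': ['i1', 'i2'], 'u2': ['i3']}
print(get_avg_pop_by_users(items_by_user, pop_by_items))  # {'u1': 0.5, 'u2': 0.5}
```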
split_user_in_groups(score_frame, groups, pop_items)
staticmethod
Users are split into groups based on the groups parameter, which contains the names of the groups as keys and the percentage of users each group must contain as values. For example:
```python
groups = {'popular_users': 0.3, 'medium_popular_users': 0.2, 'low_popular_users': 0.5}
```
Every user will be inserted into a group based on how many popular items they have rated (in relation to the percentages specified as values in the dictionary):
- users with many popular items will be inserted into the first group
- users with niche items rated will be inserted into one of the last groups.
In general, users are grouped by \(Popularity\_ratio\) in descending order. \(Popularity\_ratio\) for a single user \(u\) is defined as:

$$
Popularity\_ratio_u = \frac{\text{number of popular items rated by } u}{\text{number of items rated by } u}
$$
The most popular items are the first pop_percentage% of all items, ordered by popularity in descending order. The popularity of an item is defined as the number of times it is rated in the original_ratings parameter, divided by the total number of users in original_ratings.
PARAMETER | DESCRIPTION |
---|---|
`score_frame` | the `Ratings` object. TYPE: `Ratings` |
`groups` | each key contains the name of the group and each value contains the percentage of the specified group. If the groups don't cover the entire user collection, the rest of the users are put into a 'default_diverse' group |
`pop_items` | set of most popular item_id labels |

RETURNS | DESCRIPTION |
---|---|
`Dict[str, Set[str]]` | Python dictionary containing each group name as key and the set of user_id belonging to that group as value |
Source code in clayrs/evaluation/metrics/fairness_metrics.py
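The grouping logic can be sketched as follows; `items_by_user` stands in for the ratings in `score_frame`, and the rounding of group sizes is an assumption of this sketch, not necessarily what clayrs does:

```python
def split_user_in_groups(items_by_user, groups, pop_items):
    # Popularity_ratio: fraction of a user's rated items that are popular
    ratios = {user: sum(item in pop_items for item in items) / len(items)
              for user, items in items_by_user.items()}
    # Sort users by Popularity_ratio in descending order
    ordered = sorted(ratios, key=ratios.get, reverse=True)
    result, start = {}, 0
    for name, percentage in groups.items():
        size = round(percentage * len(ordered))
        result[name] = set(ordered[start:start + size])
        start += size
    if start < len(ordered):  # users not covered by the given percentages
        result['default_diverse'] = set(ordered[start:])
    return result

pop_items = {'i1', 'i2'}
items_by_user = {'u1': ['i1', 'i2'], 'u2': ['i1', 'i3'], 'u3': ['i3', 'i4']}
groups = {'popular_users': 0.34, 'niche_users': 0.66}
print(split_user_in_groups(items_by_user, groups, pop_items))
```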
PredictionCoverage(catalog)
Bases: FairnessMetric
The Prediction Coverage metric measures, as a percentage, how many distinct items are recommended in relation to all available items. It's a system-wide metric, so only its overall result will be returned, not a result for every user. The metric is calculated as such:

$$
Prediction\ Coverage = \left(\frac{|I_p|}{|I|}\right) \cdot 100
$$
Where:
- \(I\) is the set of all available items
- \(I_p\) is the set of recommended items
The set \(I\) must be specified through the 'catalog' parameter
Check the 'Beyond Accuracy: Evaluating Recommender Systems by Coverage and Serendipity' paper for more details.
PARAMETER | DESCRIPTION |
---|---|
`catalog` | set of item ids of the catalog on which the prediction coverage must be computed |
Source code in clayrs/evaluation/metrics/fairness_metrics.py
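The formula can be sketched as a standalone function; the function name and data below are illustrative, not part of the clayrs API:

```python
def prediction_coverage(rec_lists, catalog):
    # I_p: the set of distinct items appearing in at least one recommendation list
    recommended = set().union(*rec_lists.values())
    return 100 * len(recommended & set(catalog)) / len(catalog)

catalog = {'i1', 'i2', 'i3', 'i4'}
recs = {'u1': ['i1'], 'u2': ['i1', 'i2']}
print(prediction_coverage(recs, catalog))  # 2 of 4 catalog items -> 50.0
```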