Plot metrics

Plot metrics save a plot to the chosen output directory.

LongTailDistr(out_dir='.', file_name='long_tail_distr', on='truth', format='png', overwrite=False)

Bases: PlotMetric

This metric generates the Long Tail Distribution plot and saves it in the output directory with the specified file name. The plot can be generated either for the truth set or for the predictions set (based on the on parameter):

  • on = 'truth': in this case the long tail distribution is useful to see which are the most popular items (the most rated ones)

  • on = 'pred': in this case the long tail distribution is useful to see which are the most recommended items
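The counting behind a long tail distribution can be illustrated with a minimal sketch. It assumes the interactions are available as a flat list of item ids (one per rating for 'truth', one per recommendation for 'pred'); the helper name `long_tail_counts` and the sample data are hypothetical and not part of the library.

```python
from collections import Counter


def long_tail_counts(item_ids):
    """Return (item_id, count) pairs sorted from most to least frequent."""
    counts = Counter(item_ids)
    # most_common() sorts by descending count; ties keep first-seen order
    return counts.most_common()


# Hypothetical interactions: one item id per rating (or per recommendation)
interactions = ['i1', 'i2', 'i1', 'i3', 'i1', 'i2']
distribution = long_tail_counts(interactions)
print(distribution)  # [('i1', 3), ('i2', 2), ('i3', 1)]
```

Plotting the counts in this order gives the typical long tail shape: a few items with many interactions followed by many items with few.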

The plot file will be saved as out_dir/file_name.format

Since multiple splits can be evaluated at once, the overwrite parameter comes into play: if it is set to False, files with the same name will be saved as file_name (1).format, file_name (2).format, etc., so that a plot is generated for every split without overwriting any previously generated file
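The naming scheme used when overwrite is False can be sketched as follows. This is only an illustration under stated assumptions, not the library's code: the helper `resolve_plot_path` is hypothetical, and an empty file stands in for the saved figure.

```python
import os
import tempfile


def resolve_plot_path(out_dir, file_name, fmt, overwrite=False):
    """Return the path to save to, adding ' (1)', ' (2)', ... when overwrite is False."""
    path = os.path.join(out_dir, f"{file_name}.{fmt}")
    if overwrite or not os.path.exists(path):
        return path
    counter = 1
    while True:
        candidate = os.path.join(out_dir, f"{file_name} ({counter}).{fmt}")
        if not os.path.exists(candidate):
            return candidate
        counter += 1


with tempfile.TemporaryDirectory() as tmp_dir:
    saved_names = []
    for _ in range(3):  # pretend three splits are evaluated
        path = resolve_plot_path(tmp_dir, 'long_tail_distr', 'png')
        open(path, 'w').close()  # stand-in for saving the figure
        saved_names.append(os.path.basename(path))

print(saved_names)
# ['long_tail_distr.png', 'long_tail_distr (1).png', 'long_tail_distr (2).png']
```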

PARAMETER DESCRIPTION
out_dir

Directory where the plot will be saved. Default is '.', meaning that the plot will be saved in the same directory where the Python script is being executed

TYPE: str DEFAULT: '.'

file_name

Name of the plot file. Default is 'long_tail_distr'

TYPE: str DEFAULT: 'long_tail_distr'

on

Set on which the Long Tail Distribution plot will be generated. Values accepted are 'truth' or 'pred'

TYPE: str DEFAULT: 'truth'

format

Format of the plot file. Could be 'jpg', 'svg', 'png'. Default is 'png'

TYPE: str DEFAULT: 'png'

overwrite

Parameter which specifies whether the saved plot must overwrite any file that has the same name ('file_name.format'). Default is False

TYPE: bool DEFAULT: False

RAISES DESCRIPTION
ValueError

Exception raised when an invalid value for the 'on' parameter is specified

Source code in clayrs/evaluation/metrics/plot_metrics.py
def __init__(self, out_dir: str = '.', file_name: str = 'long_tail_distr', on: str = 'truth', format: str = 'png',
             overwrite: bool = False):
    valid = {'truth', 'pred'}
    self.__on = on.lower()

    if self.__on not in valid:
        raise ValueError("on={} is not supported! Long Tail can be calculated only on:\n"
                         "{}".format(on, valid))
    super().__init__(out_dir, file_name, format, overwrite)

PopRatioProfileVsRecs(user_groups, user_profiles, original_ratings, out_dir='.', file_name='pop_ratio_profile_vs_recs', pop_percentage=0.2, store_frame=False, format='png', overwrite=False)

Bases: GroupFairnessMetric, PlotMetric

This metric generates a plot where users are split into groups and, for every group, a boxplot comparing profile popularity ratio and recommendations popularity ratio is drawn

Users are split into groups based on the user_groups parameter, which contains the names of the groups as keys and, as values, the percentage of users that each group must contain. For example:

user_groups = {'popular_users': 0.3, 'medium_popular_users': 0.2, 'low_popular_users': 0.5}

Every user will be inserted in a group based on how many popular items the user has rated (in relation to the percentage of users we specified as value in the dictionary):

  • users with many popular items will be inserted into the first group
  • users with niche items rated will be inserted into one of the last groups.

In general users are grouped by \(Popularity\_ratio\) in a descending order. \(Popularity\_ratio\) for a single user \(u\) is defined as:

\[ Popularity\_ratio_u = n\_most\_popular\_items\_rated_u / n\_items\_rated_u \]

The most popular items are the first pop_percentage (e.g. 0.2, i.e. 20%) of all items, ordered by descending popularity.

The popularity of an item is defined as the number of times it is rated in the original_ratings parameter divided by the total number of users in the original_ratings.
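The definitions above can be made concrete with a small sketch. It assumes interactions are plain (user_id, item_id) tuples standing in for original_ratings; the helper names, the sample data and the rounding rule used to pick the number of most popular items are all assumptions, not the library's implementation.

```python
from collections import Counter

# Hypothetical (user_id, item_id) interactions standing in for original_ratings
ratings = [('u1', 'i1'), ('u1', 'i2'), ('u2', 'i1'), ('u2', 'i3'),
           ('u3', 'i1'), ('u3', 'i2'), ('u3', 'i4'), ('u4', 'i1')]


def item_popularity(ratings):
    """Popularity of an item = times it is rated / total number of users."""
    n_users = len({user for user, _ in ratings})
    times_rated = Counter(item for _, item in ratings)
    return {item: count / n_users for item, count in times_rated.items()}


def most_popular_items(pop_by_item, pop_percentage):
    """First pop_percentage fraction of the items, ordered by descending popularity."""
    ranked = sorted(pop_by_item, key=pop_by_item.get, reverse=True)
    n_most_popular = round(len(ranked) * pop_percentage)  # rounding rule is an assumption
    return set(ranked[:n_most_popular])


def popularity_ratio(rated_items, popular_items):
    """n_most_popular_items_rated_u / n_items_rated_u (user must have rated something)."""
    rated = set(rated_items)
    return len(rated & popular_items) / len(rated)


pop_by_item = item_popularity(ratings)
popular = most_popular_items(pop_by_item, pop_percentage=0.5)
print(pop_by_item['i1'], pop_by_item['i2'])     # 1.0 0.5
print(popularity_ratio(['i1', 'i3'], popular))  # 0.5
```

Here 'i1' and 'i2' are the two most popular of the four items, so a user who rated 'i1' and 'i3' has one popular item out of two rated.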

It can happen that no recommendations are available for a particular user of a group: in that case the user is skipped and is not considered in the \(Popularity\_ratio\) computation of their group. If no user of a group has recommendations available, a warning is printed and the whole group is not considered.
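The grouping and skipping rules can be sketched as below, assuming a popularity ratio has already been computed for each user. The helpers `split_users_in_groups` and `drop_users_without_recs`, the sample data and the rounding of group sizes are hypothetical; the library may split users differently.

```python
def split_users_in_groups(ratio_by_user, user_groups):
    """Assign users, sorted by descending popularity ratio, to consecutive groups."""
    ranked_users = sorted(ratio_by_user, key=ratio_by_user.get, reverse=True)
    n_users = len(ranked_users)
    groups = {}
    start = 0
    for group_name, percentage in user_groups.items():
        # group size is rounded; the exact rounding rule is an assumption
        end = min(n_users, start + round(n_users * percentage))
        groups[group_name] = ranked_users[start:end]
        start = end
    return groups


def drop_users_without_recs(groups, recs_by_user):
    """Skip users with no recommendations; drop (with a warning) groups left empty."""
    kept = {}
    for name, users in groups.items():
        users_with_recs = [u for u in users if recs_by_user.get(u)]
        if users_with_recs:
            kept[name] = users_with_recs
        else:
            print(f"Warning: no recommendations available for group '{name}', skipping it")
    return kept


ratio_by_user = {'u1': 0.9, 'u2': 0.1, 'u3': 0.5, 'u4': 0.7, 'u5': 0.3,
                 'u6': 0.8, 'u7': 0.2, 'u8': 0.6, 'u9': 0.4, 'u10': 0.0}
user_groups = {'popular_users': 0.3, 'medium_popular_users': 0.2, 'low_popular_users': 0.5}
groups = split_users_in_groups(ratio_by_user, user_groups)
print(groups['popular_users'])  # ['u1', 'u6', 'u4']

# Hypothetical recommendations: 'u4' and the whole last group have none
recs_by_user = {'u1': ['i1'], 'u6': ['i2'], 'u8': ['i1'], 'u3': ['i3']}
kept = drop_users_without_recs(groups, recs_by_user)
print(kept)  # {'popular_users': ['u1', 'u6'], 'medium_popular_users': ['u8', 'u3']}
```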

The plot file will be saved as out_dir/file_name.format

Since multiple splits can be evaluated at once, the overwrite parameter comes into play: if it is set to False, files with the same name will be saved as file_name (1).format, file_name (2).format, etc., so that a plot is generated for every split without overwriting any previously generated file

Thanks to the 'store_frame' parameter it's also possible to store a csv containing the calculations done in order to build every boxplot. It will be saved in the same directory and with the same file name as the plot itself (but with the .csv extension):

The csv will be saved as out_dir/file_name.csv

Please note: once computed, the PopRatioProfileVsRecs class needs to be re-instantiated in case you want to compute it again!

PARAMETER DESCRIPTION
user_groups

Dict containing group names as keys and percentage of users as value, used to split users in groups. Users with more popular items rated are grouped into the first group, users with slightly less popular items rated are grouped into the second one, etc.

TYPE: Dict[str, float]

user_profiles

One or more Ratings objects containing the interactions of each user's profile (e.g. the train set). There should be one for each split to evaluate!

TYPE: Union[list, Ratings]

original_ratings

Ratings object containing original interactions of the dataset that will be used to compute the popularity of each item (i.e. the number of times it is rated divided by the total number of users)

TYPE: Ratings

out_dir

Directory where the plot will be saved. Default is '.', meaning that the plot will be saved in the same directory where the Python script is being executed

TYPE: str DEFAULT: '.'

file_name

Name of the plot file. Default is 'pop_ratio_profile_vs_recs'

TYPE: str DEFAULT: 'pop_ratio_profile_vs_recs'

pop_percentage

Fraction of items (0 < pop_percentage <= 1) that must be considered 'most popular items'. Default is 0.2

TYPE: float DEFAULT: 0.2

store_frame

True if you want to store calculations done in order to build every boxplot in a csv file, False otherwise. Default is set to False

TYPE: bool DEFAULT: False

format

Format of the plot file. Could be 'jpg', 'svg', 'png'. Default is 'png'

TYPE: str DEFAULT: 'png'

overwrite

Parameter which specifies whether the saved plot must overwrite any file that has the same name ('file_name.format'). Default is False

TYPE: bool DEFAULT: False

Source code in clayrs/evaluation/metrics/plot_metrics.py
def __init__(self, user_groups: Dict[str, float], user_profiles: Union[list, Ratings], original_ratings: Ratings,
             out_dir: str = '.', file_name: str = 'pop_ratio_profile_vs_recs', pop_percentage: float = 0.2,
             store_frame: bool = False, format: str = 'png', overwrite: bool = False):

    PlotMetric.__init__(self, out_dir, file_name, format, overwrite)
    GroupFairnessMetric.__init__(self, user_groups)

    if not 0 < pop_percentage <= 1:
        raise ValueError('Incorrect percentage! Valid percentage range: 0 < percentage <= 1')

    self._pop_by_item = get_item_popularity(original_ratings)

    if not isinstance(user_profiles, list):
        user_profiles = [user_profiles]

    self._user_profiles = user_profiles
    self.__pop_percentage = pop_percentage
    self.__user_groups = user_groups
    self.__store_frame = store_frame

PopRecsCorrelation(original_ratings, out_dir='.', file_name='pop_recs_correlation', mode='both', format='png', overwrite=False)

Bases: PlotMetric

This metric generates a plot with the popularity of each item on the X-axis and its recommendation frequency on the Y-axis, so that the correlation between the popularity of items (popular or niche) and how many times they are recommended can be easily seen

The popularity of an item is defined as the number of times it is rated in the original_ratings parameter divided by the total number of users in the original_ratings.
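How the two axes relate can be sketched with plain Python, assuming item popularity is already available as a dict and recommendations as a flat list of item ids. The helper `correlation_points` and the sample data are hypothetical, not the library's API.

```python
from collections import Counter


def correlation_points(pop_by_item, recommended_items):
    """Return parallel lists: item popularity (x) and recommendation frequency (y)."""
    rec_frequency = Counter(recommended_items)
    popularity = []
    recommendations = []
    for item, pop in pop_by_item.items():
        popularity.append(pop)
        # Counter returns 0 for items that were never recommended
        recommendations.append(rec_frequency[item])
    return popularity, recommendations


pop_by_item = {'i1': 1.0, 'i2': 0.5, 'i3': 0.25}
recommended_items = ['i1', 'i1', 'i2', 'i1']
popularity, recommendations = correlation_points(pop_by_item, recommended_items)
print(popularity)       # [1.0, 0.5, 0.25]
print(recommendations)  # [3, 1, 0]
```

Each (popularity, recommendations) pair corresponds to one point of the scatter plot; 'i3' is never recommended, so its frequency is zero.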

The plot file will be saved as out_dir/file_name.format

Since multiple splits can be evaluated at once, the overwrite parameter comes into play: if it is set to False, files with the same name will be saved as file_name (1).format, file_name (2).format, etc., so that a plot is generated for every split without overwriting any previously generated file

In some cases items are never recommended, so zero recommendations may appear in the graph. This behaviour can be changed with the 'mode' parameter:

  • mode='both': two graphs will be created, the first one including any items with zero recommendations, the second one excluding them. This additional graph will be stored as out_dir/file_name_no_zeros.format (the string '_no_zeros' is automatically appended to the chosen file_name)

  • mode='w_zeros': only the graph including any items with zero recommendations will be created

  • mode='no_zeros': only the graph excluding items with zero recommendations will be created. The graph will be saved as out_dir/file_name_no_zeros.format (the string '_no_zeros' is automatically appended to the chosen file_name)
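The effect of the three modes on the plotted data can be sketched as follows. The helper `select_plots` is hypothetical: it only returns the data each graph would contain, keyed by the suffix appended to file_name, and does not draw or save anything.

```python
def select_plots(popularity, recommendations, mode='both'):
    """Return {file name suffix: (x values, y values)} for the graphs a mode creates."""
    valid = {'both', 'w_zeros', 'no_zeros'}
    mode = mode.lower()
    if mode not in valid:
        raise ValueError(f"Mode {mode} is not supported! Modes available: {sorted(valid)}")

    plots = {}
    if mode in ('both', 'w_zeros'):
        # keep every item, including those never recommended
        plots[''] = (list(popularity), list(recommendations))
    if mode in ('both', 'no_zeros'):
        # drop the points whose recommendation frequency is zero
        kept = [(x, y) for x, y in zip(popularity, recommendations) if y != 0]
        plots['_no_zeros'] = ([x for x, _ in kept], [y for _, y in kept])
    return plots


plots = select_plots([1.0, 0.5, 0.25], [3, 1, 0], mode='both')
print(plots[''])           # ([1.0, 0.5, 0.25], [3, 1, 0])
print(plots['_no_zeros'])  # ([1.0, 0.5], [3, 1])
```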

PARAMETER DESCRIPTION
original_ratings

Ratings object containing original interactions of the dataset that will be used to compute the popularity of each item (i.e. the number of times it is rated divided by the total number of users)

TYPE: Ratings

out_dir

Directory where the plot will be saved. Default is '.', meaning that the plot will be saved in the same directory where the Python script is being executed

TYPE: str DEFAULT: '.'

file_name

Name of the plot file. Default is 'pop_recs_correlation'

TYPE: str DEFAULT: 'pop_recs_correlation'

mode

Parameter which dictates which graph must be created. The default is 'both', so both the graph including any items with zero recommendations and the graph excluding them will be created. Check the class documentation for more details

TYPE: str DEFAULT: 'both'

format

Format of the plot file. Could be 'jpg', 'svg', 'png'. Default is 'png'

TYPE: str DEFAULT: 'png'

overwrite

Parameter which specifies whether the saved plot must overwrite any file that has the same name ('file_name.format'). Default is False

TYPE: bool DEFAULT: False

Source code in clayrs/evaluation/metrics/plot_metrics.py
def __init__(self, original_ratings: Ratings,
             out_dir: str = '.',
             file_name: str = 'pop_recs_correlation',
             mode: str = 'both',
             format: str = 'png', overwrite: bool = False):

    valid = {'both', 'no_zeros', 'w_zeros'}
    self.__mode = mode.lower()

    if self.__mode not in valid:
        raise ValueError("Mode {} is not supported! Modes available:\n"
                         "{}".format(mode, valid))

    self._pop_by_item = get_item_popularity(original_ratings)

    super().__init__(out_dir, file_name, format, overwrite)

build_no_zeros_plot(popularity, recommendations)

Method which builds and saves the plot excluding items with zero recommendations. It saves the plot as out_dir/file_name_no_zeros.format, according to the values passed in the constructor. Note that the '_no_zeros' string is automatically appended to the chosen file_name

PARAMETER DESCRIPTION
popularity

x-axis values representing popularity of every item

TYPE: list

recommendations

y-axis values representing number of times every item has been recommended

TYPE: list

Source code in clayrs/evaluation/metrics/plot_metrics.py
def build_no_zeros_plot(self, popularity: list, recommendations: list):
    """
    Method which builds and saves the plot **excluding** eventual *zero recommendations*
    It saves the plot as *out_dir/filename_no_zeros.format*, according to their value passed in the constructor.
    Note that the '_no_zeros' string is automatically added to the file_name chosen

    Args:
        popularity (list): x-axis values representing popularity of every item
        recommendations (list): y-axis values representing number of times every item has been recommended
    """
    title = 'Popularity Ratio - Recommendations Correlation (No zeros)'
    fig = self.build_plot(popularity, recommendations, title)

    file_name = self.file_name + '_no_zeros'

    self.save_figure(fig, file_name)

build_plot(x, y, title)

Method which builds a matplotlib plot given x-axis values, y-axis values and the title of the plot. X-axis label and Y-axis label are hard-coded as 'Popularity Ratio' and 'Recommendation frequency' respectively.

PARAMETER DESCRIPTION
x

List containing x-axis values

TYPE: list

y

List containing y-axis values

TYPE: list

title

Title of the plot

TYPE: str

RETURNS DESCRIPTION
matplotlib.figure.Figure

The matplotlib figure

Source code in clayrs/evaluation/metrics/plot_metrics.py
def build_plot(self, x: list, y: list, title: str) -> matplotlib.figure.Figure:
    """
    Method which builds a matplotlib plot given x-axis values, y-axis values and the title of the plot.
    X-axis label and Y-axis label are hard-coded as 'Popularity Ratio' and 'Recommendation frequency' respectively.

    Args:
        x (list): List containing x-axis values
        y (list): List containing y-axis values
        title (str): title of the plot

    Returns:
        The matplotlib figure
    """
    fig = plt.figure()
    ax = fig.add_subplot()

    ax.set(xlabel='Popularity Ratio', ylabel='Recommendation frequency',
           title=title)

    ax.scatter(x, y, marker='o', s=20, c='orange', edgecolors='black',
               linewidths=0.05)

    # automatic ticks but only integer ones
    ax.yaxis.set_major_locator(plticker.MaxNLocator(integer=True))

    return fig

build_w_zeros_plot(popularity, recommendations)

Method which builds and saves the plot including any items with zero recommendations. It saves the plot as out_dir/file_name.format, according to the values passed in the constructor

PARAMETER DESCRIPTION
popularity

x-axis values representing popularity of every item

TYPE: list

recommendations

y-axis values representing number of times every item has been recommended

TYPE: list

Source code in clayrs/evaluation/metrics/plot_metrics.py
def build_w_zeros_plot(self, popularity: list, recommendations: list):
    """
    Method which builds and saves the plot containing eventual *zero recommendations*
    It saves the plot as *out_dir/filename.format*, according to their value passed in the constructor

    Args:
        popularity (list): x-axis values representing popularity of every item
        recommendations (list): y-axis values representing number of times every item has been recommended
    """
    title = 'Popularity Ratio - Recommendations Correlation'
    fig = self.build_plot(popularity, recommendations, title)

    file_name = self.file_name

    self.save_figure(fig, file_name)