Plot metrics

Plot metrics generate a plot and save it in the chosen output directory.

`LongTailDistr(out_dir='.', file_name='long_tail_distr', on='truth', format='png', overwrite=False)`

Bases: PlotMetric

This metric generates the Long Tail Distribution plot and saves it in the output directory with the specified file name. The plot can be generated either for the truth set or for the predictions set, depending on the `on` parameter:

on = 'truth': the long tail distribution shows which items are the most popular (i.e. the most rated ones)
on = 'pred': the long tail distribution shows which items are the most recommended
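As a rough illustration of the data behind this plot, the sketch below counts how many times each item occurs and sorts items by descending count: the first items form the "head" of the distribution and the remaining ones the long tail. The function name `long_tail_counts` and the `(user_id, item_id)` input format are hypothetical and not part of the ClayRS API, which works on `Ratings` objects.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def long_tail_counts(interactions: Iterable[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Hypothetical helper: count occurrences of each item, most frequent first.

    For on='truth' the pairs would be the rated items,
    for on='pred' the recommended items.
    """
    counts = Counter(item for _user, item in interactions)
    # most_common() without arguments returns every item, sorted by descending count
    return counts.most_common()


truth = [("u1", "i1"), ("u2", "i1"), ("u3", "i1"),
         ("u1", "i2"), ("u2", "i2"),
         ("u3", "i3")]
print(long_tail_counts(truth))  # [('i1', 3), ('i2', 2), ('i3', 1)]
```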

The plot file will be saved as `out_dir/file_name.format`.

Since multiple splits can be evaluated at once, the `overwrite` parameter comes into play: if it is set to False, files with the same name will be saved as `file_name (1).format`, `file_name (2).format`, etc., so that a plot is generated for every split without overwriting any previously generated file.
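The naming scheme used when `overwrite=False` can be sketched as follows. `resolve_plot_path` is a hypothetical helper written only to illustrate the behaviour described above; it is not the ClayRS implementation.

```python
import os
import tempfile


def resolve_plot_path(out_dir: str, file_name: str, fmt: str, overwrite: bool) -> str:
    """Hypothetical helper: path where the next plot would be written."""
    path = os.path.join(out_dir, f"{file_name}.{fmt}")
    if overwrite:
        return path  # an existing file with this name would be replaced
    i = 1
    # Append ' (1)', ' (2)', ... until an unused file name is found
    while os.path.exists(path):
        path = os.path.join(out_dir, f"{file_name} ({i}).{fmt}")
        i += 1
    return path


with tempfile.TemporaryDirectory() as out_dir:
    for _ in range(3):  # e.g. one plot per evaluated split
        path = resolve_plot_path(out_dir, "long_tail_distr", "png", overwrite=False)
        open(path, "w").close()  # stand-in for saving the figure
        print(os.path.basename(path))
# long_tail_distr.png
# long_tail_distr (1).png
# long_tail_distr (2).png
```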

PARAMETER	DESCRIPTION
`out_dir`	Directory where the plot will be saved. Default is '.', meaning that the plot will be saved in the current working directory (i.e. the directory from which the Python script is run) TYPE: `str` DEFAULT: `'.'`
`file_name`	Name of the plot file. Default is 'long_tail_distr' TYPE: `str` DEFAULT: `'long_tail_distr'`
`on`	Set on which the Long Tail Distribution plot will be generated. Values accepted are 'truth' or 'pred' TYPE: `str` DEFAULT: `'truth'`
`format`	Format of the plot file. Could be 'jpg', 'svg', 'png'. Default is 'png' TYPE: `str` DEFAULT: `'png'`
`overwrite`	Parameter which specifies whether the saved plot must overwrite any file that has the same name ('file_name.format'). Default is False TYPE: `bool` DEFAULT: `False`

RAISES	DESCRIPTION
`ValueError`	Exception raised when an invalid value for the 'on' parameter is specified

Source code in clayrs/evaluation/metrics/plot_metrics.py

def __init__(self, out_dir: str = '.', file_name: str = 'long_tail_distr', on: str = 'truth', format: str = 'png',
             overwrite: bool = False):
    valid = {'truth', 'pred'}
    self.__on = on.lower()

    if self.__on not in valid:
        raise ValueError("on={} is not supported! Long Tail can be calculated only on:\n"
                         "{}".format(on, valid))
    super().__init__(out_dir, file_name, format, overwrite)

`PopRatioProfileVsRecs(user_groups, user_profiles, original_ratings, out_dir='.', file_name='pop_ratio_profile_vs_recs', pop_percentage=0.2, store_frame=False, format='png', overwrite=False)`

Bases: GroupFairnessMetric, PlotMetric

This metric generates a plot in which users are split into groups and, for every group, a boxplot comparing the profile popularity ratio with the recommendations popularity ratio is drawn.

Users are split into groups based on the `user_groups` parameter, which contains the names of the groups as keys and the percentage of users each group must contain as values. For example:

user_groups = {'popular_users': 0.3, 'medium_popular_users': 0.2, 'low_popular_users': 0.5}

Every user will be inserted into a group based on how many popular items they have rated (in relation to the percentage of users specified as value in the dictionary):

users who rated many popular items will be inserted into the first group
users who rated mostly niche items will be inserted into one of the last groups
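A minimal sketch of the grouping step, assuming each user's \(Popularity\_ratio\) has already been computed. `split_into_groups` is hypothetical, and the rounding strategy (round each group size, let the last group absorb the remainder) is an assumption; ClayRS may handle rounding differently.

```python
from typing import Dict, List


def split_into_groups(ratio_by_user: Dict[str, float],
                      user_groups: Dict[str, float]) -> Dict[str, List[str]]:
    """Hypothetical helper: assign users to groups by descending popularity ratio."""
    # Users with the highest popularity ratio come first
    ordered = sorted(ratio_by_user, key=ratio_by_user.get, reverse=True)
    n_users = len(ordered)
    names = list(user_groups)

    groups: Dict[str, List[str]] = {}
    start = 0
    for idx, name in enumerate(names):
        if idx == len(names) - 1:
            # Assumption: the last group takes all remaining users,
            # so no user is lost because of rounding
            end = n_users
        else:
            end = start + round(user_groups[name] * n_users)
        groups[name] = ordered[start:end]
        start = end
    return groups


ratios = {f"u{i}": i / 10 for i in range(1, 11)}  # u10 has the highest ratio
user_groups = {'popular_users': 0.3, 'medium_popular_users': 0.2, 'low_popular_users': 0.5}
print(split_into_groups(ratios, user_groups))
# {'popular_users': ['u10', 'u9', 'u8'], 'medium_popular_users': ['u7', 'u6'],
#  'low_popular_users': ['u5', 'u4', 'u3', 'u2', 'u1']}
```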

In general, users are grouped by \(Popularity\_ratio\) in descending order. The \(Popularity\_ratio\) of a single user \(u\) is defined as:

\[ Popularity\_ratio_u = n\_most\_popular\_items\_rated_u / n\_items\_rated_u \]
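For instance, consider a hypothetical user \(u\) who rated 10 items, 4 of which are among the most popular ones:

\[ n\_most\_popular\_items\_rated_u = 4, \quad n\_items\_rated_u = 10 \quad \Rightarrow \quad Popularity\_ratio_u = 4 / 10 = 0.4 \]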

The most popular items are the top `pop_percentage` fraction of all items, ordered by descending popularity (e.g. with `pop_percentage=0.2`, the 20% most popular items).

The popularity of an item is defined as the number of times it is rated in `original_ratings` divided by the total number of users in `original_ratings`.
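The three definitions above can be combined in a short sketch. The helpers `item_popularity`, `most_popular_items` and `popularity_ratio` are hypothetical, interactions are plain `(user_id, item_id)` pairs instead of a `Ratings` object, and the way the number of top items is rounded is an assumption.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def item_popularity(ratings: List[Tuple[str, str]]) -> Dict[str, float]:
    """Times each item is rated divided by the number of distinct users."""
    n_users = len({user for user, _ in ratings})
    counts = Counter(item for _, item in ratings)
    return {item: count / n_users for item, count in counts.items()}


def most_popular_items(pop_by_item: Dict[str, float], pop_percentage: float) -> Set[str]:
    """Top `pop_percentage` fraction of items by descending popularity."""
    if not 0 < pop_percentage <= 1:
        raise ValueError('Incorrect percentage! Valid percentage range: 0 < percentage <= 1')
    ranked = sorted(pop_by_item, key=pop_by_item.get, reverse=True)
    # Assumption: round to the nearest integer, keeping at least one item
    n_top = max(1, round(len(ranked) * pop_percentage))
    return set(ranked[:n_top])


def popularity_ratio(rated_items: Set[str], top_items: Set[str]) -> float:
    """n_most_popular_items_rated_u / n_items_rated_u"""
    return len(rated_items & top_items) / len(rated_items)


ratings = [("u1", "i1"), ("u1", "i2"), ("u1", "i3"),
           ("u2", "i1"), ("u2", "i2"),
           ("u3", "i1"), ("u3", "i4"),
           ("u4", "i1"), ("u4", "i5")]
pop = item_popularity(ratings)       # i1 -> 1.0, i2 -> 0.5, i3/i4/i5 -> 0.25
top = most_popular_items(pop, 0.2)   # {'i1'}: 20% of 5 items
print(popularity_ratio({"i1", "i2", "i3"}, top))  # u1: 1 of 3 rated items is popular
```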

It can happen that no recommendations are available for a particular user of a group: in that case the user is skipped and won't be considered in the \(Popularity\_ratio\) computation of their group. If no user of a group has recommendations available, a warning will be printed and the whole group won't be considered.

The plot file will be saved as `out_dir/file_name.format`.

Since multiple splits can be evaluated at once, the `overwrite` parameter comes into play: if it is set to False, files with the same name will be saved as `file_name (1).format`, `file_name (2).format`, etc., so that a plot is generated for every split without overwriting any previously generated file.

Thanks to the `store_frame` parameter, it's also possible to store a csv file containing the calculations done to build every boxplot. It will be saved in the same directory and with the same file name as the plot itself (but with the .csv extension):

The csv will be saved as `out_dir/file_name.csv`.

Please note: once computed, the PopRatioProfileVsRecs class needs to be re-instantiated if you want to compute it again!

PARAMETER	DESCRIPTION
`user_groups`	Dict containing group names as keys and percentages of users as values, used to split users into groups. Users who rated more popular items are grouped into the first group, users who rated slightly fewer popular items are grouped into the second one, etc. TYPE: `Dict[str, float]`
`user_profiles`	One or more `Ratings` objects containing the interactions of each user's profile (e.g. the train set). There should be one for each split to evaluate! TYPE: `Union[list, Ratings]`
`original_ratings`	`Ratings` object containing original interactions of the dataset that will be used to compute the popularity of each item (i.e. the number of times it is rated divided by the total number of users) TYPE: `Ratings`
`out_dir`	Directory where the plot will be saved. Default is '.', meaning that the plot will be saved in the current working directory (i.e. the directory from which the Python script is run) TYPE: `str` DEFAULT: `'.'`
`file_name`	Name of the plot file. Default is 'pop_ratio_profile_vs_recs' TYPE: `str` DEFAULT: `'pop_ratio_profile_vs_recs'`
`pop_percentage`	Fraction of all items to consider as 'most popular' (must satisfy 0 < pop_percentage <= 1, otherwise a ValueError is raised). Default is 0.2 TYPE: `float` DEFAULT: `0.2`
`store_frame`	True if you want to store the calculations done to build every boxplot in a csv file, False otherwise. Default is False TYPE: `bool` DEFAULT: `False`
`format`	Format of the plot file. Could be 'jpg', 'svg', 'png'. Default is 'png' TYPE: `str` DEFAULT: `'png'`
`overwrite`	Parameter which specifies whether the saved plot must overwrite any file that has the same name ('file_name.format'). Default is False TYPE: `bool` DEFAULT: `False`

Source code in clayrs/evaluation/metrics/plot_metrics.py

def __init__(self, user_groups: Dict[str, float],  user_profiles: Union[list, Ratings], original_ratings: Ratings,
             out_dir: str = '.', file_name: str = 'pop_ratio_profile_vs_recs', pop_percentage: float = 0.2,
             store_frame: bool = False, format: str = 'png', overwrite: bool = False):

    PlotMetric.__init__(self, out_dir, file_name, format, overwrite)
    GroupFairnessMetric.__init__(self, user_groups)

    if not 0 < pop_percentage <= 1:
        raise ValueError('Incorrect percentage! Valid percentage range: 0 < percentage <= 1')

    self._pop_by_item = get_item_popularity(original_ratings)

    if not isinstance(user_profiles, list):
        user_profiles = [user_profiles]

    self._user_profiles = user_profiles
    self.__pop_percentage = pop_percentage
    self.__user_groups = user_groups
    self.__store_frame = store_frame

`PopRecsCorrelation(original_ratings, out_dir='.', file_name='pop_recs_correlation', mode='both', format='png', overwrite=False)`

Bases: PlotMetric

This metric generates a plot with the popularity of each item on the X-axis and its recommendation frequency on the Y-axis, making it easy to see the correlation between an item's popularity (popular vs. niche) and how many times it is recommended.

The popularity of an item is defined as the number of times it is rated in `original_ratings` divided by the total number of users in `original_ratings`.
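A sketch of how the (x, y) points of this plot could be derived, assuming item popularity has already been computed as described above and recommendations are given as a flat list of recommended item ids. `correlation_points` is a hypothetical name; items that are never recommended get a frequency of 0, which is what produces the zero recommendations discussed below.

```python
from collections import Counter
from typing import Dict, List, Tuple


def correlation_points(pop_by_item: Dict[str, float],
                       recommended_items: List[str]) -> Tuple[List[float], List[int]]:
    """Hypothetical helper: x = item popularity, y = times the item was recommended."""
    rec_counts = Counter(recommended_items)
    items = list(pop_by_item)
    popularity = [pop_by_item[item] for item in items]
    # Counter returns 0 for missing keys, so never-recommended items get y = 0
    frequency = [rec_counts[item] for item in items]
    return popularity, frequency


pop_by_item = {"i1": 1.0, "i2": 0.5, "i3": 0.25}
recs = ["i1", "i1", "i2"]  # e.g. top-1 lists of three users
print(correlation_points(pop_by_item, recs))  # ([1.0, 0.5, 0.25], [2, 1, 0])
```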

The plot file will be saved as `out_dir/file_name.format`.

Since multiple splits can be evaluated at once, the `overwrite` parameter comes into play: if it is set to False, files with the same name will be saved as `file_name (1).format`, `file_name (2).format`, etc., so that a plot is generated for every split without overwriting any previously generated file.

Some items may never be recommended, so the graph can contain points with zero recommendations. This behaviour can be changed via the `mode` parameter:

mode='both': two graphs will be created, the first one including any zero recommendations, the second one excluding them. The additional graph will be stored as `out_dir/file_name_no_zeros.format` (the string '_no_zeros' is automatically appended to the chosen file_name)
mode='w_zeros': only the graph including any zero recommendations will be created
mode='no_zeros': only the graph excluding zero recommendations will be created. The graph will be saved as `out_dir/file_name_no_zeros.format` (the string '_no_zeros' is automatically appended to the chosen file_name)
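The effect of `mode` can be sketched as follows. Both helpers are hypothetical and only mirror the behaviour described in the list above; the file names ignore the ` (1)`, ` (2)` suffixes that may be added when `overwrite=False`.

```python
from typing import List, Tuple


def drop_zeros(popularity: List[float], frequency: List[int]) -> Tuple[List[float], List[int]]:
    """Keep only the points whose recommendation frequency is not zero."""
    kept = [(p, f) for p, f in zip(popularity, frequency) if f != 0]
    return [p for p, _ in kept], [f for _, f in kept]


def output_files(file_name: str, fmt: str, mode: str) -> List[str]:
    """Hypothetical helper: file names produced for each mode."""
    mode = mode.lower()
    if mode not in {'both', 'no_zeros', 'w_zeros'}:
        raise ValueError(f"Mode {mode} is not supported!")
    files = []
    if mode in ('both', 'w_zeros'):
        files.append(f"{file_name}.{fmt}")
    if mode in ('both', 'no_zeros'):
        files.append(f"{file_name}_no_zeros.{fmt}")
    return files


print(drop_zeros([1.0, 0.5, 0.25], [2, 1, 0]))  # ([1.0, 0.5], [2, 1])
print(output_files('pop_recs_correlation', 'png', 'both'))
# ['pop_recs_correlation.png', 'pop_recs_correlation_no_zeros.png']
```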

PARAMETER	DESCRIPTION
`original_ratings`	`Ratings` object containing original interactions of the dataset that will be used to compute the popularity of each item (i.e. the number of times it is rated divided by the total number of users) TYPE: `Ratings`
`out_dir`	Directory where the plot will be saved. Default is '.', meaning that the plot will be saved in the current working directory (i.e. the directory from which the Python script is run) TYPE: `str` DEFAULT: `'.'`
`file_name`	Name of the plot file. Default is 'pop_recs_correlation' TYPE: `str` DEFAULT: `'pop_recs_correlation'`
`mode`	Parameter which dictates which graphs must be created. Default is 'both', so both the graph including any zero recommendations and the graph excluding them will be created. Check the class documentation for more TYPE: `str` DEFAULT: `'both'`
`format`	Format of the plot file. Could be 'jpg', 'svg', 'png'. Default is 'png' TYPE: `str` DEFAULT: `'png'`
`overwrite`	Parameter which specifies whether the saved plot must overwrite any file that has the same name ('file_name.format'). Default is False TYPE: `bool` DEFAULT: `False`

Source code in clayrs/evaluation/metrics/plot_metrics.py

def __init__(self, original_ratings: Ratings,
             out_dir: str = '.',
             file_name: str = 'pop_recs_correlation',
             mode: str = 'both',
             format: str = 'png', overwrite: bool = False):

    valid = {'both', 'no_zeros', 'w_zeros'}
    self.__mode = mode.lower()

    if self.__mode not in valid:
        raise ValueError("Mode {} is not supported! Modes available:\n"
                         "{}".format(mode, valid))

    self._pop_by_item = get_item_popularity(original_ratings)

    super().__init__(out_dir, file_name, format, overwrite)

`build_no_zeros_plot(popularity, recommendations)`

Method which builds and saves the plot excluding any zero recommendations. It saves the plot as out_dir/file_name_no_zeros.format, according to the values passed to the constructor. Note that the '_no_zeros' string is automatically appended to the chosen file_name.

PARAMETER	DESCRIPTION
`popularity`	x-axis values representing the popularity of every item TYPE: `list`
`recommendations`	y-axis values representing the number of times every item has been recommended TYPE: `list`

Source code in clayrs/evaluation/metrics/plot_metrics.py

def build_no_zeros_plot(self, popularity: list, recommendations: list):
    """
    Method which builds and saves the plot **excluding** eventual *zero recommendations*
    It saves the plot as *out_dir/filename_no_zeros.format*, according to their value passed in the constructor.
    Note that the '_no_zeros' string is automatically added to the file_name chosen

    Args:
        popularity (list): x-axis values representing popularity of every item
        recommendations (list): y-axis values representing number of times every item has been recommended
    """
    title = 'Popularity Ratio - Recommendations Correlation (No zeros)'
    fig = self.build_plot(popularity, recommendations, title)

    file_name = self.file_name + '_no_zeros'

    self.save_figure(fig, file_name)

`build_plot(x, y, title)`

Method which builds a matplotlib plot given the x-axis values, the y-axis values and the title of the plot. The X-axis and Y-axis labels are hard-coded as 'Popularity Ratio' and 'Recommendation frequency' respectively.

PARAMETER	DESCRIPTION
`x`	List containing x-axis values TYPE: `list`
`y`	List containing y-axis values TYPE: `list`
`title`	Title of the plot TYPE: `str`

RETURNS	DESCRIPTION
`matplotlib.figure.Figure`	The matplotlib figure

Source code in clayrs/evaluation/metrics/plot_metrics.py

def build_plot(self, x: list, y: list, title: str) -> matplotlib.figure.Figure:
    """
    Method which builds a matplotlib plot given x-axis values, y-axis values and the title of the plot.
    X-axis label and Y-axis label are hard-coded as 'Popularity Ratio' and 'Recommendation frequency' respectively.

    Args:
        x (list): List containing x-axis values
        y (list): List containing y-axis values
        title (str): title of the plot

    Returns:
        The matplotlib figure
    """
    fig = plt.figure()
    ax = fig.add_subplot()

    ax.set(xlabel='Popularity Ratio', ylabel='Recommendation frequency',
           title=title)

    ax.scatter(x, y, marker='o', s=20, c='orange', edgecolors='black',
               linewidths=0.05)

    # automatic ticks but only integer ones
    ax.yaxis.set_major_locator(plticker.MaxNLocator(integer=True))

    return fig

`build_w_zeros_plot(popularity, recommendations)`

Method which builds and saves the plot including any zero recommendations. It saves the plot as out_dir/file_name.format, according to the values passed to the constructor.

PARAMETER	DESCRIPTION
`popularity`	x-axis values representing the popularity of every item TYPE: `list`
`recommendations`	y-axis values representing the number of times every item has been recommended TYPE: `list`

Source code in clayrs/evaluation/metrics/plot_metrics.py

def build_w_zeros_plot(self, popularity: list, recommendations: list):
    """
    Method which builds and saves the plot containing eventual *zero recommendations*
    It saves the plot as *out_dir/filename.format*, according to their value passed in the constructor

    Args:
        popularity (list): x-axis values representing popularity of every item
        recommendations (list): y-axis values representing number of times every item has been recommended
    """
    title = 'Popularity Ratio - Recommendations Correlation'
    fig = self.build_plot(popularity, recommendations, title)

    file_name = self.file_name

    self.save_figure(fig, file_name)