Abstract methodology class

Methodology(only_greater_eq=None)

Bases: ABC

Class that, given a train set and a test set, determines which items must be used to generate a recommendation list.

The methodologies implemented here follow the paper 'Precision-Oriented Evaluation of Recommender Systems: An Algorithmic Comparison'.

Source code in clayrs/recsys/methodology.py

def __init__(self, only_greater_eq: float = None):

    self._threshold = only_greater_eq

    # items arr is an array with all items id mapped to their integer
    self._items_arr: Optional[np.ndarray] = None
    # query vector is the vector with same length of _items_arr used as boolean query vector
    # position in which a True appears will be taken from _items_arr, position set to False will not
    self._query_vector: Optional[np.ndarray] = None
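
The comments in `__init__` describe a boolean query vector that selects entries from the items array. A minimal sketch of that idea with plain NumPy boolean indexing follows; the variable names and values are illustrative only and are not taken from ClayRS.

```python
import numpy as np

# Illustrative stand-in for _items_arr: every item id mapped to an integer.
items_arr = np.array([0, 1, 2, 3, 4])

# Illustrative stand-in for _query_vector: same length as items_arr,
# True where the item should be kept, False where it should be skipped.
query_vector = np.zeros(len(items_arr), dtype=bool)
query_vector[[1, 3]] = True

# Boolean indexing returns only the positions marked True.
selected = items_arr[query_vector]
print(selected)
```

Here `selected` holds the items at positions 1 and 3. Keeping a fixed items array and flipping entries of the mask avoids rebuilding the array for every user.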

filter_all(train_set, test_set, result_as_dict=False, ids_as_str=True)

Concrete method that computes, for every user in the test set, which items must be used to generate a recommendation list.

It takes a train set and a test set as input and returns either a single DataFrame or a Python dictionary containing, for every user, all items that must be recommended according to the chosen methodology.

PARAMETER DESCRIPTION
train_set

Ratings object which contains the train set of every user

TYPE: Ratings

test_set

Ratings object which contains the test set of every user

TYPE: Ratings

result_as_dict

If True, the method returns a dictionary with users as keys and numpy arrays of items as values instead of a DataFrame. If ids_as_str is set to True, users and items are represented by their string id; otherwise they are represented by their mapped integer

TYPE: bool DEFAULT: False

ids_as_str

If True, users and items in the result are represented by their string id. Otherwise, they are represented by their mapped integer

TYPE: bool DEFAULT: True

RETURNS DESCRIPTION
Union[pd.DataFrame, Union[Dict[str, np.ndarray], Dict[int, np.ndarray]]]

A DataFrame or a Python dictionary containing all items that must be recommended to each user according to the chosen methodology.

Source code in clayrs/recsys/methodology.py

def filter_all(self, train_set: Ratings, test_set: Ratings,
               result_as_dict: bool = False,
               ids_as_str: bool = True) -> Union[pd.DataFrame,
                                                 Union[Dict[str, np.ndarray], Dict[int, np.ndarray]]]:
    """
    Concrete method which calculates for all users of the *test set* which items must be used in order to
    generate a recommendation list

    It takes in input a *train set* and a *test set* and returns a single DataFrame or a python
    dictionary containing, for every user, all items which must be recommended based on the methodology chosen.

    Args:
        train_set: `Ratings` object which contains the train set of every user
        test_set: `Ratings` object which contains the test set of every user
        result_as_dict: If True, the method returns a dictionary with users as keys and numpy arrays of
            items as values instead of a DataFrame. If `ids_as_str` is set to True, users and items are
            represented by their string id, otherwise by their mapped integer
        ids_as_str: If True, users and items in the result are represented by their string id. Otherwise,
            they are represented by their mapped integer

    Returns:
        A DataFrame or a python dictionary which contains all items which must be recommended to
            every user based on the methodology chosen.
    """
    user_list = test_set.unique_user_idx_column
    user_int2str = train_set.user_map.convert_int2str
    item_seq_int2str = train_set.item_map.convert_seq_int2str

    with get_progbar(user_list) as pbar:
        pbar.set_description(f"Filtering items based on {str(self)}")

        if ids_as_str:
            filtered = {user_int2str(user_idx): item_seq_int2str(self.filter_single(user_idx, train_set, test_set).astype(int))
                        for user_idx in pbar}
        else:
            filtered = {user_idx: self.filter_single(user_idx, train_set, test_set)
                        for user_idx in pbar}

    if not result_as_dict:

        will_be_frame = {"user_id": [], "item_id": []}
        for user_id, filter_list in filtered.items():

            will_be_frame["user_id"].append(np.full(filter_list.shape, user_id))
            will_be_frame["item_id"].append(filter_list)

        will_be_frame["user_id"] = np.hstack(will_be_frame["user_id"])
        will_be_frame["item_id"] = np.hstack(will_be_frame["item_id"])

        filtered = pd.DataFrame.from_dict(will_be_frame)

    return filtered
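
The last part of `filter_all` turns the per-user dictionary into a long-format table with one row per (user, item) pair. The sketch below reproduces that flattening step in isolation; the `filtered` dictionary is made-up sample data standing in for what the method would build, not real ClayRS output.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the per-user result: user id -> array of item ids.
filtered = {
    "u1": np.array(["i1", "i2", "i3"]),
    "u2": np.array(["i2"]),
}

# Repeat each user id once per item, so both columns end up the same length.
user_col = np.hstack([np.full(items.shape, user) for user, items in filtered.items()])
item_col = np.hstack(list(filtered.values()))

frame = pd.DataFrame({"user_id": user_col, "item_id": item_col})
print(frame)
```

The resulting frame has four rows: three for `u1` and one for `u2`. Building both columns with `np.hstack` keeps the work in NumPy instead of appending rows to a DataFrame one at a time.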

filter_single(user_idx, train_set, test_set) abstractmethod

Abstract method in which subclasses must specify how to compute the items that belong in the recommendation list of a single user.

Source code in clayrs/recsys/methodology.py

@abstractmethod
def filter_single(self, user_idx: int, train_set: Ratings, test_set: Ratings) -> np.ndarray:
    """
    Abstract method in which subclasses must specify how to compute the items that belong in the
    recommendation list of a single user
    """
    raise NotImplementedError
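
To illustrate how a concrete `filter_all` can rely on an abstract `filter_single`, here is a self-contained toy version of the pattern. Everything in it is hypothetical: `ToyMethodology`, `ToyTestItems` and the dictionary-based `ToyRatings` are simplified stand-ins, not the ClayRS classes, and the rule "keep the user's test items whose score reaches the threshold" is only an assumed example of what a subclass might do.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional, Tuple

import numpy as np

# Hypothetical stand-in for a ratings container (NOT the ClayRS Ratings class):
# user index -> (item indices, scores).
ToyRatings = Dict[int, Tuple[np.ndarray, np.ndarray]]


class ToyMethodology(ABC):
    def __init__(self, only_greater_eq: Optional[float] = None):
        self._threshold = only_greater_eq

    @abstractmethod
    def filter_single(self, user_idx: int, train_set: ToyRatings,
                      test_set: ToyRatings) -> np.ndarray:
        raise NotImplementedError

    def filter_all(self, train_set: ToyRatings, test_set: ToyRatings) -> Dict[int, np.ndarray]:
        # The concrete loop delegates the per-user decision to the subclass.
        return {user_idx: self.filter_single(user_idx, train_set, test_set)
                for user_idx in test_set}


class ToyTestItems(ToyMethodology):
    def filter_single(self, user_idx, train_set, test_set):
        items, scores = test_set[user_idx]
        if self._threshold is None:
            return items
        # Keep only the items whose score is >= the threshold.
        return items[scores >= self._threshold]


toy_test = {
    0: (np.array([10, 11, 12]), np.array([5.0, 2.0, 4.0])),
    1: (np.array([11]), np.array([1.0])),
}
toy_result = ToyTestItems(only_greater_eq=4).filter_all({}, toy_test)
print(toy_result)
```

With a threshold of 4, user 0 keeps items 10 and 12 and user 1 keeps none. Instantiating `ToyMethodology` directly raises `TypeError`, because `filter_single` is still abstract.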

setup(train_set, test_set) abstractmethod

Method to call before calling filter_all() or filter_single(). It is used to set up numpy arrays which will filter items according to the methodology chosen.

This method has a side effect: it returns the set-up Methodology object, but it also modifies the Methodology object on which it was called.

PARAMETER DESCRIPTION
train_set

Ratings object which contains the train set of every user

TYPE: Ratings

test_set

Ratings object which contains the test set of every user

TYPE: Ratings

RETURNS DESCRIPTION
Methodology

The set-up Methodology object

Source code in clayrs/recsys/methodology.py

@abstractmethod
def setup(self, train_set: Ratings, test_set: Ratings) -> Methodology:
    """
    Method to call before calling `filter_all()` or `filter_single()`.
    It is used to set up numpy arrays which will filter items according to the methodology chosen.

    This method has a side effect: it returns the set-up `Methodology` object, but it also modifies the
    `Methodology` object on which it was called

    Args:
        train_set: `Ratings` object which contains the train set of every user
        test_set: `Ratings` object which contains the test set of every user

    Returns:
        The set-up Methodology object
    """
    raise NotImplementedError
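
The side effect described for `setup()` can be pictured with a small toy class. This is a sketch under assumptions: `ToySetupMethodology` is hypothetical, and the choice to store the union of train and test items is only an illustration, since the source above does not show what concrete subclasses store. It also assumes the returned object is the caller itself, which is one reading of the docstring.

```python
import numpy as np


class ToySetupMethodology:
    """Hypothetical stand-in used only to illustrate the setup() contract."""

    def __init__(self):
        self._items_arr = None
        self._query_vector = None

    def setup(self, train_items: np.ndarray, test_items: np.ndarray) -> "ToySetupMethodology":
        # Side effect: the instance itself is modified...
        self._items_arr = np.union1d(train_items, test_items)
        self._query_vector = np.zeros(len(self._items_arr), dtype=bool)
        # ...and the same instance is returned, so calls can be chained.
        return self


methodology = ToySetupMethodology()
returned = methodology.setup(np.array([3, 1]), np.array([2, 3]))

print(returned is methodology)
print(methodology._items_arr)
```

Because the call mutates the object in place, using the return value or the original variable afterwards is equivalent in this sketch.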