Abstract Partitioning class

`Partitioning(skip_user_error=True)`

Bases: ABC

Abstract class for partitioning techniques. Each subclass must implement the split_single() method, which specifies how the data of a single user is split

PARAMETER DESCRIPTION

skip_user_error

If set to True, users whose data can't be split are skipped, and a single warning reporting the number of skipped users is logged at the end of the split process. Otherwise, a ValueError exception is raised

TYPE: bool DEFAULT: True

Source code in clayrs/recsys/partitioning.py

def __init__(self, skip_user_error: bool = True):
    self.__skip_user_error = skip_user_error

`split_all(ratings_to_split, user_list=None)`

Concrete method that splits, for every user in the user column of ratings_to_split, the original ratings into train set and test set. If a user_list parameter is set, the method will do the splitting only for the users specified inside the list (Users can be specified as strings or with their mapped integer).

The method returns two lists:

* The first contains the train set of each split (more than one if the partitioning technique returns several splits, e.g. KFold)
* The second contains the test set of each split (more than one if the partitioning technique returns several splits, e.g. KFold)

The two lists have the same length, and the train set at position \(i\) corresponds to the test set at position \(i\)
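As a small illustration of this index pairing, the sketch below uses stand-in data only: plain lists of (user, item, score) tuples take the place of the Ratings objects that split_all() returns, and the values are invented for the example.

```python
# Stand-in data: each set is a list of (user, item, score) rows, one set per split.
train_list = [
    [("u1", "i1", 5.0), ("u2", "i3", 4.0)],   # train set of split 0
    [("u1", "i2", 3.0), ("u2", "i4", 2.0)],   # train set of split 1
]
test_list = [
    [("u1", "i2", 3.0), ("u2", "i4", 2.0)],   # test set of split 0
    [("u1", "i1", 5.0), ("u2", "i3", 4.0)],   # test set of split 1
]

# Position i of one list pairs with position i of the other.
pairs = list(zip(train_list, test_list))

# In this stand-in data, no row appears in both sets of the same split.
overlaps = [set(train) & set(test) for train, test in pairs]
```

Iterating with zip() keeps each train set together with its own test set, which is usually what an evaluation loop over the splits needs.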

PARAMETER DESCRIPTION

ratings_to_split

Ratings object which contains the interactions of the users that must be split into train set and test set

TYPE: Ratings

user_list

The set of users for which splitting will be done. If set, splitting is performed only for the users it contains. Otherwise, splitting is performed for all users in the ratings_to_split parameter. Users can be specified with their string id or with their mapped integer

TYPE: Union[Set[int], Set[str]] DEFAULT: None

RAISES	DESCRIPTION
`ValueError`	if `skip_user_error=False` in the constructor and for at least one user splitting can't be performed
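The skip-or-raise behaviour can be sketched outside the library. The code below is a simplified stand-in, not the ClayRS implementation: `split_all_sketch` and `leave_last_out` are hypothetical names, a plain dict of row lists replaces the Ratings object, and the number of skipped users is returned instead of being logged.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Row = Tuple[int, int, float]  # stand-in for one row of a uir matrix


def split_all_sketch(ratings_by_user: Dict[int, List[Row]],
                     split_single: Callable,
                     skip_user_error: bool = True):
    """Simplified stand-in for the per-user aggregation, not the ClayRS code."""
    per_split = defaultdict(lambda: {"train": [], "test": []})
    skipped = 0
    for user_idx, rows in ratings_by_user.items():
        try:
            user_train_list, user_test_list = split_single(rows)
        except ValueError:
            if skip_user_error:
                skipped += 1   # count the user and move on
                continue
            raise              # skip_user_error=False: propagate the error
        # accumulate this user's rows under the split they belong to
        for split_number, (train, test) in enumerate(zip(user_train_list, user_test_list)):
            per_split[split_number]["train"].extend(train)
            per_split[split_number]["test"].extend(test)
    train_list = [per_split[s]["train"] for s in per_split]
    test_list = [per_split[s]["test"] for s in per_split]
    return train_list, test_list, skipped


def leave_last_out(rows: List[Row]):
    """Hypothetical per-user rule: the last interaction becomes the test set."""
    if len(rows) < 2:
        raise ValueError("at least two interactions are needed")
    return [rows[:-1]], [rows[-1:]]


ratings = {
    0: [(0, 10, 5.0), (0, 11, 3.0), (0, 12, 4.0)],
    1: [(1, 10, 2.0)],                      # a single interaction: cannot be split
    2: [(2, 11, 4.0), (2, 13, 1.0)],
}
train_list, test_list, skipped = split_all_sketch(ratings, leave_last_out, skip_user_error=True)
```

With `skip_user_error=True` user 1 is left out of both sets and counted; calling the same function with `skip_user_error=False` lets the ValueError raised for that user propagate to the caller.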

Source code in clayrs/recsys/partitioning.py

def split_all(self, ratings_to_split: Ratings,
              user_list: Union[Set[int], Set[str]] = None) -> Tuple[List[Ratings], List[Ratings]]:
    """
    Concrete method that splits, for every user in the user column of `ratings_to_split`, the original ratings
    into *train set* and *test set*.
    If a `user_list` parameter is set, the method will do the splitting only for the users
    specified inside the list (Users can be specified as *strings* or with their mapped *integer*).

    The method returns two lists:

    * The first contains all train set for each split (if the partitioning technique returns more than one split
    e.g. KFold)
    * The second contains all test set for each split (if the partitioning technique returns more than one split
    e.g. KFold)

    The two lists have the same length, and the *train set* at position $i$ corresponds to the
    *test set* at position $i$

    Args:
        ratings_to_split: `Ratings` object which contains the interactions of the users that must be split
            into *train set* and *test set*
        user_list: The Set of users for which splitting will be done. If set, splitting will be performed only
            for users inside the list. Otherwise, splitting will be performed for all users in `ratings_to_split`
            parameter. User can be specified with their string id or with their mapped integer

    Raises:
        ValueError: if `skip_user_error=False` in the constructor and for at least one user splitting
            can't be performed
    """

    # convert user list to list of int if necessary (strings are passed)
    if user_list is not None:
        all_users = np.array(list(user_list))
        if np.issubdtype(all_users.dtype, str):
            all_users = ratings_to_split.user_map.convert_seq_str2int(all_users)

        all_users = set(all_users)
    else:
        all_users = set(ratings_to_split.unique_user_idx_column)

    # {
    #   0: {'train': [u1_uir, u2_uir],
    #       'test': [u1_uir, u2_uir]},
    #
    #   1: {'train': [u1_uir, u2_uir],
    #       'test': [u1_uir, u2_uir]}
    # }
    train_test_dict = defaultdict(lambda: defaultdict(list))
    error_count = 0

    with get_progbar(all_users) as pbar:

        pbar.set_description("Performing {}".format(str(self)))
        for user_idx in pbar:
            user_ratings = ratings_to_split.get_user_interactions(user_idx)
            try:
                user_train_list, user_test_list = self.split_single(user_ratings)
                for split_number, (single_train, single_test) in enumerate(zip(user_train_list, user_test_list)):

                    train_test_dict[split_number]['train'].append(single_train)
                    train_test_dict[split_number]['test'].append(single_test)

            except ValueError as e:
                if self.skip_user_error:
                    error_count += 1
                    continue
                else:
                    raise e from None

    if error_count > 0:
        logger.warning(f"{error_count} users will be skipped because partitioning couldn't be performed\n"
                       f"Change this behavior by setting `skip_user_error` to False")

    train_list = [Ratings.from_uir(np.vstack(train_test_dict[split]['train']),
                                   ratings_to_split.user_map, ratings_to_split.item_map)
                  for split in train_test_dict]

    test_list = [Ratings.from_uir(np.vstack(train_test_dict[split]['test']),
                                  ratings_to_split.user_map, ratings_to_split.item_map)
                 for split in train_test_dict]

    return train_list, test_list

`split_single(uir_user)` `abstractmethod`

Abstract method in which each partitioning technique must specify how to split data for a single user

PARAMETER DESCRIPTION

uir_user

uir matrix containing interactions of a single user

TYPE: np.ndarray

RETURNS	DESCRIPTION
`List[np.ndarray]`	The first list contains a uir matrix for each split constituting the train set of the user
`List[np.ndarray]`	The second list contains a uir matrix for each split constituting the test set of the user
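To make the contract concrete, here is a minimal sketch of a subclass. Everything in it is an assumption made for illustration: `PartitioningSketch` is a stand-in that mirrors only the split_single() signature, `FirstRowsHoldOut` is a hypothetical technique, and lists of (user_idx, item_idx, score) tuples replace the np.ndarray uir matrix so the example needs nothing beyond the standard library.

```python
import abc
from typing import List, Sequence, Tuple

Row = Tuple[int, int, float]  # (user_idx, item_idx, score): stand-in for one uir row


class PartitioningSketch(abc.ABC):
    """Stand-in mirroring only the split_single() contract described above."""

    @abc.abstractmethod
    def split_single(self, uir_user: Sequence[Row]) -> Tuple[List[List[Row]], List[List[Row]]]:
        raise NotImplementedError


class FirstRowsHoldOut(PartitioningSketch):
    """Hypothetical technique: the first `train_size` fraction of rows is the train set."""

    def __init__(self, train_size: float = 0.8):
        self.train_size = train_size

    def split_single(self, uir_user):
        cut = int(len(uir_user) * self.train_size)
        train, test = list(uir_user[:cut]), list(uir_user[cut:])
        if not train or not test:
            # ValueError is the exception that split_all() catches for unsplittable users
            raise ValueError("user cannot be split into non-empty train and test sets")
        # a single split, so each returned list holds exactly one element
        return [train], [test]


rows = [(0, 10, 5.0), (0, 11, 3.0), (0, 12, 4.0), (0, 13, 1.0), (0, 14, 2.0)]
train_splits, test_splits = FirstRowsHoldOut(train_size=0.8).split_single(rows)
```

A technique that produces several splits, such as KFold, would return lists with one element per split, with the train and test entries at the same position belonging together.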

Source code in clayrs/recsys/partitioning.py

@abc.abstractmethod
def split_single(self, uir_user: np.ndarray) -> Tuple[List[np.ndarray], List[np.ndarray]]:
    """
    Abstract method in which each partitioning technique must specify how to split data for a single user

    Args:
        uir_user: uir matrix containing interactions of a single user

    Returns:
        The first list contains a uir matrix for each split constituting the *train set* of the user

        The second list contains a uir matrix for each split constituting the *test set* of the user
    """
    raise NotImplementedError
