Bootstrap partitioning technique

`BootstrapPartitioning(random_state=None, skip_user_error=True)`

Bases: Partitioning

Class that performs Bootstrap Partitioning.

Bootstrap partitioning performs \(n\) extractions with replacement for each user from the original interaction frame, where \(n\) is the number of interactions of that user:

- The sampled data will be part of the train set
- All the data that is part of the original dataset but was not sampled will be part of the test set

Info

Bootstrap partitioning can change the original data distribution, since the same interaction can be sampled more than once during the extraction phase.
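The per-user procedure described above can be sketched in a few lines. This is a minimal illustration using only the standard library, not the ClayRS implementation: the function name `bootstrap_split`, the `seed` argument, and the tuple layout of the interactions are assumptions made for the example.

```python
import random


def bootstrap_split(interactions, seed=None):
    """Hypothetical helper: bootstrap split of a single user's interactions."""
    rng = random.Random(seed)
    n = len(interactions)

    # n draws with replacement: the same interaction may be drawn several times
    train = [interactions[rng.randrange(n)] for _ in range(n)]

    # every interaction that was never drawn goes to the test set
    sampled = set(train)
    test = [inter for inter in interactions if inter not in sampled]

    return train, test


# (user, item, rating) tuples for one user, invented for illustration
user_interactions = [("u1", "i1", 5.0), ("u1", "i2", 3.0),
                     ("u1", "i3", 4.0), ("u1", "i4", 1.0)]

train, test = bootstrap_split(user_interactions, seed=42)
print(len(train), "train rows,", len(test), "test rows")
```

The train set always has as many rows as the user has interactions, but it may contain duplicates, which is why the distribution of the sampled data can differ from the original one.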

PARAMETER DESCRIPTION

random_state

Seed for the random sampling with replacement applied to each user's interactions. Pass an int for reproducible output across multiple function calls.

TYPE: int DEFAULT: None

skip_user_error

If set to True, users whose data can't be split are skipped, and a single warning reporting the number of skipped users is logged at the end of the split process. Otherwise, a ValueError exception is raised.

TYPE: bool DEFAULT: True
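The skip-or-raise behaviour controlled by `skip_user_error` can be pictured with the sketch below. It is an assumption-laden illustration rather than the library's code: `split_user` is a deliberately simple placeholder (it does not perform bootstrap sampling), and `split_all_users` and its return values are hypothetical names.

```python
import logging

logger = logging.getLogger(__name__)


def split_user(interactions):
    # Placeholder split (NOT the bootstrap logic): hold out the last interaction
    # and fail when nothing would be left for the train set
    if len(interactions) < 2:
        raise ValueError("user cannot be split")
    return interactions[:-1], interactions[-1:]


def split_all_users(interactions_by_user, skip_user_error=True):
    """Hypothetical driver: split every user, skipping or propagating failures."""
    train, test, skipped = {}, {}, 0

    for user, interactions in interactions_by_user.items():
        try:
            train[user], test[user] = split_user(interactions)
        except ValueError:
            if not skip_user_error:
                raise  # strict mode: the first failing user aborts the split
            skipped += 1

    # a single warning at the end, reporting how many users were skipped
    if skipped:
        logger.warning("%d users will be skipped because of split errors", skipped)

    return train, test, skipped


data = {"u1": [("i1", 5.0)], "u2": [("i1", 4.0), ("i2", 2.0), ("i3", 3.0)]}
train, test, skipped = split_all_users(data, skip_user_error=True)
```

With `skip_user_error=False`, the same call would stop at `u1` and propagate the `ValueError` to the caller.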

Source code in clayrs/recsys/partitioning.py

def __init__(self, random_state: int = None, skip_user_error: bool = True):
    super().__init__(skip_user_error)

    self.__random_state = random_state

`split_single(uir_user)`

Method which splits the ratings of a single user into train set and test set by performing \(n\) extractions with replacement from the user interactions, where \(n\) is the number of the user's interactions. The interactions which are not sampled will be part of the test set. If every interaction is sampled, the test set is empty and a ValueError is raised.

PARAMETER DESCRIPTION

uir_user

uir matrix containing interactions of a single user

TYPE: np.ndarray

RETURNS	DESCRIPTION
`List[np.ndarray]`	The first list contains a uir matrix for each split constituting the train set of the user
`List[np.ndarray]`	The second list contains a uir matrix for each split constituting the test set of the user

Source code in clayrs/recsys/partitioning.py

def split_single(self, uir_user: np.ndarray) -> Tuple[List[np.ndarray], List[np.ndarray]]:
    """
    Method which splits *train set* and *test set* the ratings of a single user by performing $n$ extraction with
    replacement of the user interactions, where $n$ is the number of its interactions.
    The interactions which are not sampled will be part of the *test set*

    Args:
        uir_user: uir matrix containing interactions of a single user

    Returns:
        The first list contains a uir matrix for each split constituting the *train set* of the user

        The second list contains a uir matrix for each split constituting the *test set* of the user
    """

    interactions_train = resample(uir_user,
                                  replace=True,
                                  n_samples=len(uir_user[:, 0]),
                                  random_state=self.__random_state)

    interactions_test = np.array([interaction
                                  for interaction in uir_user
                                  if not any(np.array_equal(interaction, interaction_train, equal_nan=True)
                                             for interaction_train in interactions_train)])

    user_train_list = [interactions_train]
    user_test_list = [interactions_test]

    if len(interactions_test) == 0:
        raise ValueError("The test set for the user is empty! Try increasing the number of its interactions!")

    return user_train_list, user_test_list
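The `ValueError` above is raised whenever every interaction of the user is drawn at least once, leaving nothing for the test set. As a rough guide, assuming the user's \(n\) interactions are all distinct rows and every draw is uniform, this happens with probability \(n!/n^n\). The helper name below is hypothetical and not part of ClayRS.

```python
from math import factorial


def prob_empty_test_set(n):
    # n draws with replacement cover all n distinct interactions
    # in n! of the n**n equally likely outcomes
    return factorial(n) / n ** n


for n in (1, 2, 3, 5, 10):
    print(n, prob_empty_test_set(n))
```

Under these assumptions a user with a single interaction can never be split, and a user with two interactions fails half of the time, which is the situation `skip_user_error` is meant to handle.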