Bootstrap partitioning technique

BootstrapPartitioning(random_state=None, skip_user_error=True)

Bases: Partitioning

Class that performs Bootstrap Partitioning.

Bootstrap partitioning performs \(n\) extractions with replacement from each user's interactions in the original interaction frame, where \(n\) is the number of interactions of that user:

  • The sampled data will be part of the train set
  • All the data which is part of the original dataset but was not sampled will be part of the test set
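The two rules above can be sketched for a single user with the standard library alone. This is a minimal illustration, not the ClayRS implementation: the function name `bootstrap_split` and the sample `(user, item, score)` tuples are hypothetical.

```python
import random


def bootstrap_split(interactions, seed=None):
    """Split one user's interactions into (train, test) by bootstrap sampling."""
    rng = random.Random(seed)
    n = len(interactions)

    # n draws with replacement: the same index may be drawn more than once
    sampled_idx = [rng.randrange(n) for _ in range(n)]
    train = [interactions[i] for i in sampled_idx]

    # every interaction that was never drawn goes to the test set
    drawn = set(sampled_idx)
    test = [interactions[i] for i in range(n) if i not in drawn]
    return train, test


# hypothetical interactions of one user: (user_id, item_id, score)
user_interactions = [
    ("u1", "i1", 4.0),
    ("u1", "i2", 3.0),
    ("u1", "i3", 5.0),
    ("u1", "i4", 1.0),
    ("u1", "i5", 2.0),
]
train, test = bootstrap_split(user_interactions, seed=0)
```

The train set always has as many rows as the user has interactions, possibly with repeats, while the size of the test set varies from one draw to the next.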

Info

Bootstrap partitioning can change the original data distribution, because the same interaction can be sampled more than once during the extraction phase.
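The size of this effect follows from a short, standard calculation, added here as an aside and assuming the \(n\) interactions of a user are distinct. Each draw misses a given interaction with probability \(1 - 1/n\), and the \(n\) draws are independent, so:

\[
P(\text{interaction never sampled}) = \left(1 - \frac{1}{n}\right)^{n} \xrightarrow{\; n \to \infty \;} e^{-1} \approx 0.368
\]

For users with many interactions, roughly 37% of their interactions are therefore expected to land in the test set, while the train set holds about 63% distinct interactions, with repeats filling the remaining rows. For \(n = 1\) the probability is \(0\): the only interaction is always sampled, so the test set is always empty.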

PARAMETER DESCRIPTION
random_state

Seed for the random sampling with replacement. Pass an int for reproducible output across multiple function calls.

TYPE: int DEFAULT: None

skip_user_error

If set to True, users whose data can't be split are skipped, and a single warning reporting the number of skipped users is logged at the end of the split process. Otherwise, a ValueError exception is raised.

TYPE: bool DEFAULT: True
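The behaviour described for `skip_user_error` can be pictured with the sketch below. It is an assumption-laden illustration rather than the ClayRS code: `bootstrap_split`, `split_all_users` and the sample data are hypothetical names, and only the skip-or-raise logic is meant to mirror the description above.

```python
import logging
import random

logger = logging.getLogger(__name__)


def bootstrap_split(interactions, rng):
    """Bootstrap split of one user; raises ValueError if the test set is empty."""
    n = len(interactions)
    drawn = [rng.randrange(n) for _ in range(n)]
    drawn_set = set(drawn)
    train = [interactions[i] for i in drawn]
    test = [x for i, x in enumerate(interactions) if i not in drawn_set]
    if not test:
        raise ValueError("The test set for the user is empty!")
    return train, test


def split_all_users(interactions_by_user, random_state=None, skip_user_error=True):
    """Hypothetical driver: split every user, skipping or re-raising on failure."""
    rng = random.Random(random_state)
    train_by_user, test_by_user, skipped = {}, {}, []
    for user, interactions in interactions_by_user.items():
        try:
            train, test = bootstrap_split(interactions, rng)
        except ValueError:
            if not skip_user_error:
                raise  # strict mode: the first failing user aborts the split
            skipped.append(user)
            continue
        train_by_user[user] = train
        test_by_user[user] = test
    if skipped:
        # one warning at the end, reporting how many users were skipped
        logger.warning("%d user(s) skipped because they could not be split", len(skipped))
    return train_by_user, test_by_user, skipped


data = {
    "u1": [("i%d" % k, 3.0) for k in range(30)],
    # a single interaction is always sampled, so this user's test set is always empty
    "u2": [("i0", 5.0)],
}
train_by_user, test_by_user, skipped = split_all_users(data, random_state=0)
```

With `skip_user_error=False`, the same call raises `ValueError` as soon as it reaches a user that cannot be split.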

Source code in clayrs/recsys/partitioning.py
def __init__(self, random_state: int = None, skip_user_error: bool = True):
    super().__init__(skip_user_error)

    self.__random_state = random_state

split_single(uir_user)

Splits the ratings of a single user into train set and test set by performing \(n\) extractions with replacement from the user's interactions, where \(n\) is the number of those interactions. The interactions that are never sampled form the test set.

PARAMETER DESCRIPTION
uir_user

uir matrix containing interactions of a single user

TYPE: np.ndarray

RETURNS DESCRIPTION
List[np.ndarray]

The first list contains a uir matrix for each split constituting the train set of the user

List[np.ndarray]

The second list contains a uir matrix for each split constituting the test set of the user

Source code in clayrs/recsys/partitioning.py
def split_single(self, uir_user: np.ndarray) -> Tuple[List[np.ndarray], List[np.ndarray]]:
    """
    Splits the ratings of a single user into *train set* and *test set* by performing $n$ extractions with
    replacement from the user's interactions, where $n$ is the number of those interactions.
    The interactions which are never sampled form the *test set*

    Args:
        uir_user: uir matrix containing interactions of a single user

    Returns:
        The first list contains a uir matrix for each split constituting the *train set* of the user

        The second list contains a uir matrix for each split constituting the *test set* of the user
    """

    interactions_train = resample(uir_user,
                                  replace=True,
                                  n_samples=len(uir_user[:, 0]),
                                  random_state=self.__random_state)

    interactions_test = np.array([interaction
                                  for interaction in uir_user
                                  if not any(np.array_equal(interaction, interaction_train, equal_nan=True)
                                             for interaction_train in interactions_train)])

    user_train_list = [interactions_train]
    user_test_list = [interactions_test]

    if len(interactions_test) == 0:
        raise ValueError("The test set for the user is empty! Try increasing the number of its interactions!")

    return user_train_list, user_test_list
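The method above builds the test set by comparing every row of `uir_user` against every sampled row, which costs a quadratic number of row comparisons per user. As a hedged alternative sketch (not part of ClayRS, and using NumPy's `Generator` instead of scikit-learn's `resample`), the same split can be expressed by sampling row indices and masking the ones never drawn. The function name `bootstrap_split_by_index` and the synthetic matrix are assumptions made for illustration.

```python
import numpy as np


def bootstrap_split_by_index(uir_user, random_state=None):
    """Index-based bootstrap split of a single user's uir matrix."""
    rng = np.random.default_rng(random_state)
    n = len(uir_user)

    # n row indices drawn with replacement
    sampled_idx = rng.integers(0, n, size=n)

    # boolean mask of the rows that were never drawn
    never_drawn = np.ones(n, dtype=bool)
    never_drawn[sampled_idx] = False

    interactions_train = uir_user[sampled_idx]
    interactions_test = uir_user[never_drawn]

    if len(interactions_test) == 0:
        raise ValueError("The test set for the user is empty!")

    return [interactions_train], [interactions_test]


# synthetic uir matrix: user index 0, item indices 0..29, scores from 1 to 5
uir_user = np.column_stack([np.zeros(30), np.arange(30), np.linspace(1.0, 5.0, 30)])
user_train_list, user_test_list = bootstrap_split_by_index(uir_user, random_state=42)
```

The two approaches differ when a user has duplicate rows: the row comparison in `split_single` keeps every copy of a sampled row out of the test set, whereas the index-based version sends an unsampled duplicate to the test set.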