HoldOut partitioning technique

`HoldOutPartitioning(train_set_size=None, test_set_size=None, shuffle=True, random_state=None, skip_user_error=True)`

Bases: Partitioning

Class that performs Hold-Out partitioning

PARAMETER	DESCRIPTION
`train_set_size`	If float, should be between 0.0 and 1.0 and represents the proportion of each user's ratings to *hold* in the train set. If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size. TYPE: `Union[float, int, None]` DEFAULT: `None`
`test_set_size`	If float, should be between 0.0 and 1.0 and represents the proportion of each user's ratings to include in the test set. If int, represents the absolute number of test samples. If None, the value is set to the complement of the train size. If `train_set_size` is also None, it will be set to 0.25. TYPE: `Union[float, int, None]` DEFAULT: `None`
`random_state`	Controls the shuffling applied to the data before applying the split. Pass an int for reproducible output across multiple function calls. TYPE: `int` DEFAULT: `None`
`shuffle`	Whether to shuffle the data before splitting. TYPE: `bool` DEFAULT: `True`
`skip_user_error`	If set to True, users whose data can't be split are skipped, and a single warning reporting the number of skipped users is logged at the end of the split process. Otherwise, a `ValueError` exception is raised. TYPE: `bool` DEFAULT: `True`
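
The way the two size parameters combine can be easier to follow with concrete numbers. The sketch below reproduces the arithmetic that scikit-learn's `train_test_split` (to which `split_single` delegates) is documented to apply: a float test size is rounded up, a float train size is rounded down, and a missing size is the complement of the other. The helper name `resolve_sizes` is hypothetical and is not part of ClayRS or scikit-learn; the error check is simplified.

```python
import math


def resolve_sizes(n_ratings, train_set_size=None, test_set_size=None):
    """Hypothetical helper: number of (train, test) ratings for one user."""
    # When both sizes are missing, the test share defaults to 0.25
    if train_set_size is None and test_set_size is None:
        test_set_size = 0.25

    n_train = n_test = None

    # A float is a proportion, an int is an absolute count
    if isinstance(test_set_size, float):
        n_test = math.ceil(test_set_size * n_ratings)
    elif test_set_size is not None:
        n_test = int(test_set_size)

    if isinstance(train_set_size, float):
        n_train = math.floor(train_set_size * n_ratings)
    elif train_set_size is not None:
        n_train = int(train_set_size)

    # A missing size is the complement of the one that was given
    if n_train is None:
        n_train = n_ratings - n_test
    elif n_test is None:
        n_test = n_ratings - n_train

    # Simplified check: such a user cannot be split
    if n_train <= 0 or n_train + n_test > n_ratings:
        raise ValueError("user cannot be split with the requested sizes")

    return n_train, n_test


print(resolve_sizes(8))                      # (6, 2)
print(resolve_sizes(10, train_set_size=0.8)) # (8, 2)
print(resolve_sizes(10, test_set_size=3))    # (7, 3)
```

A user with a single rating cannot be split with the default sizes (`resolve_sizes(1)` raises in this sketch), which is the kind of case `skip_user_error` is meant to handle.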

Source code in clayrs/recsys/partitioning.py

def __init__(self, train_set_size: Union[float, int, None] = None, test_set_size: Union[float, int, None] = None,
             shuffle: bool = True, random_state: int = None,
             skip_user_error: bool = True):

    if train_set_size is not None and train_set_size < 0:
        raise ValueError("train_set_size must be a positive number")

    if test_set_size is not None and test_set_size < 0:
        raise ValueError("test_set_size must be a positive number")

    if isinstance(train_set_size, float) and train_set_size > 1.0:
        raise ValueError("train_set_size must be between 0.0 and 1.0")

    if isinstance(test_set_size, float) and test_set_size > 1.0:
        raise ValueError("test_set_size must be between 0.0 and 1.0")

    if isinstance(train_set_size, float) and isinstance(test_set_size, float) and \
            (train_set_size + test_set_size) > 1.0:
        raise ValueError("train_set_size and test_set_size percentages must not sum to a value greater than 1.0")

    self.__train_set_size = train_set_size
    self.__test_set_size = test_set_size
    self.__random_state = random_state
    self.__shuffle = shuffle

    super().__init__(skip_user_error)
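
To see which argument combinations these checks accept, the same conditions can be copied into a standalone function. This is only an illustrative sketch: `check_sizes` and `is_rejected` are hypothetical names, and the checks are applied in a slightly different order than in the constructor, so the message raised may differ when several conditions fail at once.

```python
from typing import Union

Size = Union[float, int, None]


def check_sizes(train_set_size: Size = None, test_set_size: Size = None) -> None:
    """Hypothetical standalone copy of the constructor's argument checks."""
    for name, value in (("train_set_size", train_set_size),
                        ("test_set_size", test_set_size)):
        if value is not None and value < 0:
            raise ValueError(f"{name} must be a positive number")
        if isinstance(value, float) and value > 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0")

    # The sum is only checked when both sizes are given as proportions
    if isinstance(train_set_size, float) and isinstance(test_set_size, float) \
            and train_set_size + test_set_size > 1.0:
        raise ValueError("train_set_size and test_set_size percentages "
                         "must not sum to a value greater than 1.0")


def is_rejected(**kwargs) -> bool:
    """Return True if check_sizes raises ValueError for the given arguments."""
    try:
        check_sizes(**kwargs)
    except ValueError:
        return True
    return False


print(is_rejected(train_set_size=0.75, test_set_size=0.25))  # False
print(is_rejected(train_set_size=5))                         # False
print(is_rejected(train_set_size=0.75, test_set_size=0.5))   # True
print(is_rejected(test_set_size=1.5))                        # True
```

Note that a mixed pair such as an int train size with a float test size passes these checks; whether it can actually be satisfied depends on how many ratings each user has, so it is only detected at split time.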

`split_single(uir_user)`

Method which splits the ratings of a single user into train set and test set, holding in the train set the share of the user's interactions specified by the parameters set in the constructor.

PARAMETER	DESCRIPTION
`uir_user`	uir matrix containing interactions of a single user. TYPE: `np.ndarray`

RETURNS	DESCRIPTION
`List[np.ndarray]`	The first list contains a uir matrix for each split constituting the train set of the user
`List[np.ndarray]`	The second list contains a uir matrix for each split constituting the test set of the user

Source code in clayrs/recsys/partitioning.py

def split_single(self, uir_user: np.ndarray) -> Tuple[List[np.ndarray], List[np.ndarray]]:
    """
    Method which splits the ratings of a single user into *train set* and *test set*, holding in the train set the
    share of the user's interactions specified by the parameters set in the constructor

    Args:
        uir_user: uir matrix containing interactions of a single user

    Returns:
        The first list contains a uir matrix for each split constituting the *train set* of the user

        The second list contains a uir matrix for each split constituting the *test set* of the user
    """
    uir_train, uir_test = train_test_split(uir_user,
                                           train_size=self.__train_set_size,
                                           test_size=self.__test_set_size,
                                           shuffle=self.__shuffle,
                                           random_state=self.__random_state)

    user_train_list = [uir_train]
    user_test_list = [uir_test]

    return user_train_list, user_test_list
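
As a rough illustration of what this method does for one user, the snippet below applies scikit-learn's `train_test_split` directly to a toy uir matrix and wraps the results in one-element lists, as `split_single` does. The toy data and the column layout (user index, item index, rating) are assumptions made for the example; the call does not go through ClayRS itself.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy uir matrix for one user (assumed layout: user index, item index, rating)
ratings = [4.0, 3.5, 5.0, 2.0, 4.5, 1.0, 3.0, 4.0]
uir_user = np.array([[0, item, rating] for item, rating in enumerate(ratings)])

# Same call as in split_single, with test_set_size=0.25 and random_state=42
uir_train, uir_test = train_test_split(uir_user,
                                       train_size=None,
                                       test_size=0.25,
                                       shuffle=True,
                                       random_state=42)

# Hold-out produces a single split, so each list has exactly one element
user_train_list = [uir_train]
user_test_list = [uir_test]

print(uir_train.shape, uir_test.shape)  # (6, 3) (2, 3)
```

Because `random_state` is fixed, repeating the call yields the same rows in each set; with `random_state=None` the selection changes from run to run.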