HoldOut partitioning technique

HoldOutPartitioning(train_set_size=None, test_set_size=None, shuffle=True, random_state=None, skip_user_error=True)

Bases: Partitioning

Class that performs Hold-Out partitioning

PARAMETER DESCRIPTION
train_set_size

If float, should be between 0.0 and 1.0 and represents the proportion of each user's ratings to hold in the train set. If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size.

TYPE: Union[float, int, None] DEFAULT: None

test_set_size

If float, should be between 0.0 and 1.0 and represents the proportion of each user's ratings to include in the test split. If int, represents the absolute number of test samples. If None, the value is set to the complement of the train size. If train_set_size is also None, it will be set to 0.25.

TYPE: Union[float, int, None] DEFAULT: None

random_state

Controls the shuffling applied to the data before applying the split. Pass an int for reproducible output across multiple function calls.

TYPE: int DEFAULT: None

shuffle

Whether to shuffle the data before splitting.

TYPE: bool DEFAULT: True

skip_user_error

If set to True, users whose data can't be split are skipped, and a single warning reporting the number of skipped users is logged at the end of the split process. Otherwise, a ValueError exception is raised.

TYPE: bool DEFAULT: True
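The interplay between train_set_size and test_set_size can be easier to follow with a small worked example. The sketch below is not part of ClayRS: `resolve_split_counts` is a hypothetical helper, and its rounding (ceiling for a float test size, floor for a float train size) reflects an assumption about how scikit-learn's `train_test_split` resolves sizes, which is what the class delegates to.

```python
import math
from typing import Optional, Tuple, Union

Size = Union[float, int, None]


def resolve_split_counts(n_ratings: int,
                         train_set_size: Size = None,
                         test_set_size: Size = None) -> Tuple[int, int]:
    """Hypothetical helper: turn the two size parameters into absolute counts
    for a user with `n_ratings` ratings."""
    # When both sizes are None, the test size falls back to 0.25
    if train_set_size is None and test_set_size is None:
        test_set_size = 0.25

    n_train: Optional[int] = None
    n_test: Optional[int] = None

    # Floats are proportions, ints are absolute counts
    if isinstance(test_set_size, float):
        n_test = math.ceil(test_set_size * n_ratings)  # assumed rounding
    elif test_set_size is not None:
        n_test = int(test_set_size)

    if isinstance(train_set_size, float):
        n_train = math.floor(train_set_size * n_ratings)  # assumed rounding
    elif train_set_size is not None:
        n_train = int(train_set_size)

    # A missing size is the complement of the other one
    if n_train is None:
        n_train = n_ratings - n_test
    if n_test is None:
        n_test = n_ratings - n_train

    if n_train < 1 or n_test < 1 or n_train + n_test > n_ratings:
        raise ValueError("this user's ratings can't be split with the given sizes")

    return n_train, n_test


print(resolve_split_counts(10))                      # (7, 3): default test size of 0.25
print(resolve_split_counts(8, train_set_size=0.75))  # (6, 2): test is the complement
print(resolve_split_counts(10, test_set_size=2))     # (8, 2): absolute test count
```

Under these assumptions, a user with a single rating cannot be split at all, which is the kind of case that skip_user_error is meant to handle.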

Source code in clayrs/recsys/partitioning.py
def __init__(self, train_set_size: Union[float, int, None] = None, test_set_size: Union[float, int, None] = None,
             shuffle: bool = True, random_state: int = None,
             skip_user_error: bool = True):

    if train_set_size is not None and train_set_size < 0:
        raise ValueError("train_set_size must be a positive number")

    if test_set_size is not None and test_set_size < 0:
        raise ValueError("test_set_size must be a positive number")

    if isinstance(train_set_size, float) and train_set_size > 1.0:
        raise ValueError("train_set_size must be between 0.0 and 1.0")

    if isinstance(test_set_size, float) and test_set_size > 1.0:
        raise ValueError("test_set_size must be between 0.0 and 1.0")

    if isinstance(train_set_size, float) and isinstance(test_set_size, float) and \
            (train_set_size + test_set_size) > 1.0:
        raise ValueError("train_set_size and test_set_size percentages must not sum to a value greater than 1.0")

    self.__train_set_size = train_set_size
    self.__test_set_size = test_set_size
    self.__random_state = random_state
    self.__shuffle = shuffle

    super().__init__(skip_user_error)
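To see which argument combinations the constructor checks accept or reject, the same conditions can be restated as a standalone function. This is only an illustrative sketch: `check_sizes` and `is_accepted` are hypothetical names, and the function copies the conditions shown above without touching the rest of the class.

```python
from typing import Union

Size = Union[float, int, None]


def check_sizes(train_set_size: Size = None, test_set_size: Size = None) -> None:
    """Hypothetical standalone restatement of the constructor's checks."""
    sizes = (("train_set_size", train_set_size), ("test_set_size", test_set_size))

    # Negative values are rejected for both floats and ints
    for name, value in sizes:
        if value is not None and value < 0:
            raise ValueError(f"{name} must be a positive number")

    # The upper bound of 1.0 applies only to floats (proportions)
    for name, value in sizes:
        if isinstance(value, float) and value > 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0")

    # The sum is checked only when both sizes are proportions
    if isinstance(train_set_size, float) and isinstance(test_set_size, float) \
            and (train_set_size + test_set_size) > 1.0:
        raise ValueError("train_set_size and test_set_size percentages must not "
                         "sum to a value greater than 1.0")


def is_accepted(train_set_size: Size = None, test_set_size: Size = None) -> bool:
    """Return True when the checks pass, False when they raise ValueError."""
    try:
        check_sizes(train_set_size, test_set_size)
    except ValueError:
        return False
    return True


print(is_accepted(0.75, 0.25))          # True: proportions sum to exactly 1.0
print(is_accepted(0.75, 0.5))           # False: proportions sum to more than 1.0
print(is_accepted(test_set_size=5))     # True: an int is an absolute count
print(is_accepted(train_set_size=1.5))  # False: a float above 1.0
print(is_accepted(0.9, 50))             # True: a float/int mix is not sum-checked
```

Two details follow from the conditions as written: a size of 0 passes the `< 0` check even though the error message says "positive", and a combination of one float and one int is only validated later, when the actual split is attempted.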

split_single(uir_user)

Method which splits the ratings of a single user into a train set and a test set, holding out part of the user's interactions according to the parameters set in the constructor

PARAMETER DESCRIPTION
uir_user

uir matrix containing interactions of a single user

TYPE: np.ndarray

RETURNS DESCRIPTION
List[np.ndarray]

The first list contains a uir matrix for each split constituting the train set of the user. Hold-Out produces a single split, so this list has exactly one element

List[np.ndarray]

The second list contains a uir matrix for each split constituting the test set of the user. Hold-Out produces a single split, so this list has exactly one element
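Conceptually, the per-user hold-out can be sketched without any third-party dependency. The example below is not the ClayRS implementation: `hold_out_single` is a hypothetical function that works on plain (user, item, rating) tuples instead of a uir matrix and takes an absolute test count, but it returns the same shape of result, two one-element lists.

```python
import random
from typing import List, Optional, Sequence, Tuple

# One (user, item, rating) row, standing in for a row of the uir matrix
Interaction = Tuple[str, str, float]


def hold_out_single(rows: Sequence[Interaction],
                    n_test: int,
                    shuffle: bool = True,
                    random_state: Optional[int] = None
                    ) -> Tuple[List[List[Interaction]], List[List[Interaction]]]:
    """Hypothetical sketch: hold out `n_test` interactions of one user."""
    if not 0 < n_test < len(rows):
        raise ValueError("this user's interactions can't be split")

    ordered = list(rows)
    if shuffle:
        # A dedicated generator keeps the split reproducible for a given seed
        random.Random(random_state).shuffle(ordered)

    n_train = len(ordered) - n_test
    train, test = ordered[:n_train], ordered[n_train:]

    # Hold-Out yields one split, so each returned list has one element
    return [train], [test]


user_rows = [("u1", "i1", 4.0), ("u1", "i2", 2.0), ("u1", "i3", 5.0), ("u1", "i4", 3.0)]

train_list, test_list = hold_out_single(user_rows, n_test=1, shuffle=False)
print(train_list[0])  # first three rows, in their original order
print(test_list[0])   # [('u1', 'i4', 3.0)]
```

With `shuffle=False` the held-out rows are simply the last ones, so the order of the input matters; with shuffling enabled, passing the same `random_state` gives the same split on every call.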

Source code in clayrs/recsys/partitioning.py
def split_single(self, uir_user: np.ndarray) -> Tuple[List[np.ndarray], List[np.ndarray]]:
    """
    Method which splits the ratings of a single user into a *train set* and a *test set*, holding out part of the
    user's interactions according to the parameters set in the constructor

    Args:
        uir_user: uir matrix containing interactions of a single user

    Returns:
        The first list contains a uir matrix for each split constituting the *train set* of the user

        The second list contains a uir matrix for each split constituting the *test set* of the user
    """
    uir_train, uir_test = train_test_split(uir_user,
                                           train_size=self.__train_set_size,
                                           test_size=self.__test_set_size,
                                           shuffle=self.__shuffle,
                                           random_state=self.__random_state)

    user_train_list = [uir_train]
    user_test_list = [uir_test]

    return user_train_list, user_test_list
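Since `split_single` handles one user at a time, something has to call it for every user and decide what to do when a split fails. How the `Partitioning` base class does this is not shown here, so the following is only a sketch of the behaviour described for skip_user_error: `split_all_users` and `last_rating_out` are hypothetical names, and ratings are plain tuples grouped by user.

```python
import logging
from typing import Callable, Dict, List, Tuple

logger = logging.getLogger(__name__)

Rows = List[tuple]
SplitFn = Callable[[Rows], Tuple[List[Rows], List[Rows]]]


def last_rating_out(rows: Rows) -> Tuple[List[Rows], List[Rows]]:
    """Toy per-user splitter: hold out the last rating of the user."""
    if len(rows) < 2:
        raise ValueError("at least two ratings are needed to split")
    return [rows[:-1]], [rows[-1:]]


def split_all_users(ratings_by_user: Dict[str, Rows],
                    split_single: SplitFn,
                    skip_user_error: bool = True):
    """Hypothetical driver: split every user, optionally skipping failures."""
    train_by_user: Dict[str, List[Rows]] = {}
    test_by_user: Dict[str, List[Rows]] = {}
    skipped = 0

    for user, rows in ratings_by_user.items():
        try:
            train_list, test_list = split_single(rows)
        except ValueError:
            if not skip_user_error:
                raise  # propagate the error to the caller
            skipped += 1
            continue
        train_by_user[user] = train_list
        test_by_user[user] = test_list

    # One warning at the end instead of one per skipped user
    if skipped:
        logger.warning("%d users skipped because their data could not be split", skipped)

    return train_by_user, test_by_user, skipped


ratings = {
    "u1": [("u1", "i1", 4.0), ("u1", "i2", 2.0), ("u1", "i3", 5.0)],
    "u2": [("u2", "i1", 3.0)],  # a single rating: can't be split
}

train, test, skipped = split_all_users(ratings, last_rating_out)
print(sorted(train))  # ['u1']
print(skipped)        # 1
```

Calling the same function with `skip_user_error=False` lets the ValueError raised for `u2` propagate instead of being counted.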