
KFold partitioning technique

KFoldPartitioning(n_splits=2, shuffle=True, random_state=None, skip_user_error=True)

Bases: Partitioning

Class that performs K-Fold partitioning

PARAMETER DESCRIPTION
n_splits

Number of splits. Must be at least 2

TYPE: int DEFAULT: 2

shuffle

Whether to shuffle the data before splitting into batches. Note that the samples within each split will not be shuffled.

TYPE: bool DEFAULT: True

random_state

When shuffle is True, random_state affects the ordering of the indices, which controls the randomness of each fold. Otherwise, this parameter has no effect. Pass an int for reproducible output across multiple function calls.

TYPE: int DEFAULT: None

skip_user_error

If set to True, users whose data can't be split are skipped, and a single warning reporting the number of skipped users is logged at the end of the split process. Otherwise, a ValueError exception is raised.

TYPE: bool DEFAULT: True

Source code in clayrs/recsys/partitioning.py
def __init__(self, n_splits: int = 2, shuffle: bool = True, random_state: int = None,
             skip_user_error: bool = True):
    self.__kf = KFold(n_splits=n_splits, shuffle=shuffle, random_state=random_state)

    super(KFoldPartitioning, self).__init__(skip_user_error)
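
To make the constructor parameters concrete, here is a minimal pure-Python sketch of how k-fold indices can be generated. It is a stand-in for illustration only, not the scikit-learn `KFold` implementation that the class wraps; the function name `kfold_indices` is hypothetical. It assumes the usual k-fold convention that the first `n_samples % n_splits` folds receive one extra sample, and that a split is impossible when there are fewer samples than splits.

```python
import random
from typing import List, Optional, Tuple


def kfold_indices(n_samples: int, n_splits: int = 2, shuffle: bool = True,
                  random_state: Optional[int] = None) -> List[Tuple[List[int], List[int]]]:
    """Hypothetical stand-in: return one (train_indices, test_indices) pair per fold."""
    if n_splits < 2:
        raise ValueError("n_splits must be at least 2")
    if n_splits > n_samples:
        # The situation in which the data can't be split
        raise ValueError("cannot have more splits than samples")

    indices = list(range(n_samples))
    if shuffle:
        # Seeding the generator makes the shuffled order reproducible
        random.Random(random_state).shuffle(indices)

    # The first n_samples % n_splits folds get one extra sample
    base, extra = divmod(n_samples, n_splits)
    folds = []
    start = 0
    for fold in range(n_splits):
        stop = start + base + (1 if fold < extra else 0)
        # Sorting keeps the samples inside each split in their original order
        test = sorted(indices[start:stop])
        train = sorted(indices[:start] + indices[stop:])
        folds.append((train, test))
        start = stop
    return folds
```

With `shuffle=False` the folds are contiguous blocks of indices; with `shuffle=True` and an integer `random_state`, repeated calls return the same folds.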

split_single(uir_user)

Splits the ratings of a single user into \(k\) folds, producing a train set and a test set for each fold

PARAMETER DESCRIPTION
uir_user

uir matrix containing interactions of a single user

TYPE: np.ndarray

RETURNS DESCRIPTION
List[np.ndarray]

The first list contains a uir matrix for each split constituting the train set of the user

List[np.ndarray]

The second list contains a uir matrix for each split constituting the test set of the user

Source code in clayrs/recsys/partitioning.py
def split_single(self, uir_user: np.ndarray) -> Tuple[List[np.ndarray], List[np.ndarray]]:
    """
    Splits the ratings of a single user into $k$ folds, producing a *train set* and a *test set* for each fold

    Args:
        uir_user: uir matrix containing interactions of a single user

    Returns:
        The first list contains a uir matrix for each split constituting the *train set* of the user

        The second list contains a uir matrix for each split constituting the *test set* of the user
    """
    split_result = self.__kf.split(uir_user)

    user_train_list = []
    user_test_list = []

    # split_result yields, for each fold, the indices of the ratings that make up the train set and the test set
    for train_set_indexes, test_set_indexes in split_result:
        user_interactions_train = uir_user[train_set_indexes]
        user_interactions_test = uir_user[test_set_indexes]

        user_train_list.append(user_interactions_train)
        user_test_list.append(user_interactions_test)

    return user_train_list, user_test_list
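
The per-user split and the `skip_user_error` behaviour can be sketched together in plain Python. This is an illustration under stated assumptions, not the ClayRS code: lists of `(user, item, rating)` tuples stand in for the `np.ndarray` uir matrix, folds are contiguous and unshuffled, and the names `split_single_sketch` and `split_all_sketch` are hypothetical.

```python
import logging
from typing import Dict, List, Tuple

# One (user, item, rating) row, standing in for a row of the uir matrix
Interaction = Tuple[str, str, float]
UserSplit = Tuple[List[List[Interaction]], List[List[Interaction]]]


def split_single_sketch(uir_user: List[Interaction], n_splits: int = 2) -> UserSplit:
    """Return (train_list, test_list) with one entry per fold for a single user."""
    if n_splits > len(uir_user):
        raise ValueError("user has fewer interactions than n_splits")

    base, extra = divmod(len(uir_user), n_splits)
    train_list, test_list = [], []
    start = 0
    for fold in range(n_splits):
        stop = start + base + (1 if fold < extra else 0)
        # Rows inside the fold form the test set, all remaining rows the train set
        test_list.append(uir_user[start:stop])
        train_list.append(uir_user[:start] + uir_user[stop:])
        start = stop
    return train_list, test_list


def split_all_sketch(by_user: Dict[str, List[Interaction]], n_splits: int = 2,
                     skip_user_error: bool = True) -> Dict[str, UserSplit]:
    """Split every user, skipping (or failing on) users whose data can't be split."""
    results = {}
    skipped = 0
    for user, rows in by_user.items():
        try:
            results[user] = split_single_sketch(rows, n_splits)
        except ValueError:
            if not skip_user_error:
                raise
            skipped += 1
    if skipped:
        # A single warning at the end reports how many users were skipped
        logging.warning("%d user(s) skipped because their data could not be split", skipped)
    return results
```

Both returned lists have length `n_splits`, and the entries at the same position in the two lists together cover all of that user's interactions.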