VBPR(item_field, gamma_dim, theta_dim, batch_size, epochs, threshold=0, learning_rate=0.005, lambda_w=0.01, lambda_b_pos=0.01, lambda_b_neg=0.001, lambda_e=0, train_loss=fun.logsigmoid, optimizer_class=torch.optim.Adam, device=None, embedding_combiner=Centroid(), normalize=True, seed=None, additional_opt_parameters=None, additional_dl_parameters=None)

Bases: ContentBasedAlgorithm

Class that implements recommendation through the VBPR algorithm. It's a ranking algorithm, so it can't do score prediction.

The VBPR algorithm expects features extracted from images and works on implicit feedback. In theory, however, any embedding representation can be used, and explicit feedback is also accepted: it is converted into implicit feedback by means of the threshold parameter:

  • All scores \(>= threshold\) are considered positive scores

For more details on the VBPR algorithm, see the original paper, "VBPR: Visual Bayesian Personalized Ranking from Implicit Feedback" (He and McAuley, AAAI 2016)
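
The thresholding rule can be sketched in plain Python. This is only an illustration under stated assumptions: `to_implicit` and the `(user, item, score)` tuple layout are hypothetical and not part of ClayRS, and the per-user mean used when `threshold` is `None` follows the description of the threshold parameter.

```python
from collections import defaultdict
from statistics import mean
from typing import List, Optional, Tuple

Rating = Tuple[str, str, float]  # hypothetical (user, item, score) layout


def to_implicit(ratings: List[Rating], threshold: Optional[float] = 0) -> List[Tuple[str, str]]:
    """Keep only the (user, item) pairs whose score counts as positive."""
    if threshold is None:
        # Per-user threshold: the mean score given by that user
        by_user = defaultdict(list)
        for user, _, score in ratings:
            by_user[user].append(score)
        user_threshold = {user: mean(scores) for user, scores in by_user.items()}
    else:
        # Same fixed threshold for every user
        user_threshold = defaultdict(lambda: threshold)

    return [(user, item) for user, item, score in ratings
            if score >= user_threshold[user]]


ratings = [("u1", "i1", 5.0), ("u1", "i2", 1.0), ("u2", "i1", 2.0), ("u2", "i3", 4.0)]
positives_fixed = to_implicit(ratings, threshold=3)
positives_mean = to_implicit(ratings, threshold=None)
```

In this toy data both users have a mean rating of 3.0, so the fixed threshold of 3 and the per-user mean select the same positive pairs.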

PARAMETER DESCRIPTION
item_field

dict where the key is the name of the field that contains the content to use, value is the representation(s) id(s) that will be used for the said item. The value of a field can be a string or a list, use a list if you want to use multiple representations for a particular field.

TYPE: dict

gamma_dim

dimension of latent factors for non-visual parameters

TYPE: int

theta_dim

dimension of latent factors for visual parameters

TYPE: int

batch_size

dimension of each batch of the torch dataloader for the images features

TYPE: int

epochs

number of training epochs

TYPE: int

threshold

float value used to distinguish positive from negative items. If None, the threshold varies for each user and is set to the average rating given by that user

TYPE: Optional[float] DEFAULT: 0

learning_rate

learning rate for the torch optimizer

TYPE: float DEFAULT: 0.005

lambda_w

weight assigned to the regularization of the loss on \(\gamma_u\), \(\gamma_i\), \(\theta_u\)

TYPE: float DEFAULT: 0.01

lambda_b_pos

weight assigned to the regularization of the loss on \(\beta_i\) for the positive items

TYPE: float DEFAULT: 0.01

lambda_b_neg

weight assigned to the regularization of the loss on \(\beta_i\) for the negative items

TYPE: float DEFAULT: 0.001

lambda_e

weight assigned to the regularization of the loss on \(\beta'\), \(E\)

TYPE: float DEFAULT: 0

train_loss

loss function for the training phase. Default is logsigmoid

TYPE: Callable[[torch.Tensor], torch.Tensor] DEFAULT: fun.logsigmoid

optimizer_class

optimizer torch class for the training phase. It will be instantiated using additional_opt_parameters if specified

TYPE: Type[torch.optim.Optimizer] DEFAULT: torch.optim.Adam

device

device on which the training will be run. If None and a GPU is available, then the GPU is automatically selected as device to use. Otherwise, the cpu is used

TYPE: str DEFAULT: None

embedding_combiner

CombiningTechnique used when an embedding representation comes in matrix form instead of as a single vector (e.g. WordEmbedding representations have one vector for each word). By default, the Centroid of the rows of the matrix is computed

TYPE: CombiningTechnique DEFAULT: Centroid()

normalize

Whether to normalize input features or not. If True, the \(min\) of the input feature matrix is subtracted from it and the result is divided by its \(max + 1e-10\)

TYPE: bool DEFAULT: True

seed

random state used for weight initialization and for sampling the negative examples

TYPE: int DEFAULT: None

additional_opt_parameters

kwargs for the optimizer. If you specify learning rate in this parameter, it will be overwritten by the local learning_rate parameter

TYPE: Dict[str, Any] DEFAULT: None

additional_dl_parameters

kwargs for the dataloader. If you specify batch size in this parameter, it will be overwritten by the local batch_size parameter

TYPE: Dict[str, Any] DEFAULT: None
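
The embedding_combiner and normalize parameters can be illustrated with a small, dependency-free sketch. The helper names `centroid` and `min_max_normalize` are hypothetical, not ClayRS functions. The sketch also assumes that min and max are taken over the whole matrix and that the max is computed after the min has been subtracted; the parameter description does not settle either point.

```python
from typing import List

Matrix = List[List[float]]


def centroid(rows: Matrix) -> List[float]:
    """Column-wise mean of a matrix: one vector summarising all its rows."""
    n_rows = len(rows)
    return [sum(column) / n_rows for column in zip(*rows)]


def min_max_normalize(features: Matrix, eps: float = 1e-10) -> Matrix:
    """Subtract the global min, then divide by the global max of the shifted matrix + eps."""
    global_min = min(value for row in features for value in row)
    shifted = [[value - global_min for value in row] for row in features]
    shifted_max = max(value for row in shifted for value in row)
    return [[value / (shifted_max + eps) for value in row] for row in shifted]


# One item represented as a matrix (e.g. one embedding per word) collapses to a single vector
word_embeddings = [[1.0, 2.0], [3.0, 6.0]]
item_vector = centroid(word_embeddings)

# Feature matrix with one row per item
features = [[2.0, 4.0], [6.0, 10.0]]
normalized = min_max_normalize(features)
```

The small `eps` term only guards against division by zero when every feature value is identical.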

Source code in clayrs/recsys/visual_based_algorithm/vbpr/vbpr_algorithm.py
def __init__(self, item_field: dict,
             gamma_dim: int, theta_dim: int, batch_size: int, epochs: int,
             threshold: Optional[float] = 0,
             learning_rate: float = 0.005,
             lambda_w: float = 0.01, lambda_b_pos: float = 0.01, lambda_b_neg: float = 0.001, lambda_e: float = 0,
             train_loss: Callable[[torch.Tensor], torch.Tensor] = fun.logsigmoid,
             optimizer_class: Type[torch.optim.Optimizer] = torch.optim.Adam,
             device: str = None,
             embedding_combiner: CombiningTechnique = Centroid(),
             normalize: bool = True,
             seed: int = None,
             additional_opt_parameters: Dict[str, Any] = None,
             additional_dl_parameters: Dict[str, Any] = None):

    super().__init__(item_field, threshold)

    if additional_opt_parameters is None:
        additional_opt_parameters = {}

    if additional_dl_parameters is None:
        additional_dl_parameters = {}

    additional_opt_parameters["lr"] = learning_rate
    additional_dl_parameters["batch_size"] = batch_size

    self.device = device if device is not None else "cuda:0" if torch.cuda.is_available() else "cpu"

    self.gamma_dim = gamma_dim
    self.theta_dim = theta_dim

    self.epochs = epochs
    self.train_loss = train_loss
    self.train_optimizer = optimizer_class
    self.train_optimizer_parameters = additional_opt_parameters
    self.normalize = normalize
    self.lambda_w = lambda_w
    self.lambda_b_pos = lambda_b_pos
    self.lambda_b_neg = lambda_b_neg
    self.lambda_e = lambda_e

    self._embedding_combiner = embedding_combiner

    self.seed = seed
    self.dl_parameters = additional_dl_parameters
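
The way the constructor merges learning_rate into additional_opt_parameters (and batch_size into additional_dl_parameters) can be reproduced in isolation. The function name `merge_optimizer_kwargs` is hypothetical and only mirrors the lines of the constructor shown above.

```python
from typing import Any, Dict, Optional


def merge_optimizer_kwargs(learning_rate: float,
                           additional_opt_parameters: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Mirror of the constructor logic: the explicit learning_rate always wins over a supplied 'lr'."""
    if additional_opt_parameters is None:
        additional_opt_parameters = {}
    additional_opt_parameters["lr"] = learning_rate  # overwrites any 'lr' passed in the dict
    return additional_opt_parameters


user_kwargs = {"lr": 0.1, "weight_decay": 1e-4}
merged = merge_optimizer_kwargs(0.005, user_kwargs)
```

Note that, as in the constructor, the dictionary passed by the caller is modified in place, so pass a copy if the same dictionary is reused elsewhere.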

fit(train_set, items_directory, num_cpus=-1)

Method which will fit the VBPR algorithm via neural training with torch

PARAMETER DESCRIPTION
train_set

Ratings object which contains the train set of each user

TYPE: Ratings

items_directory

Path where complexly represented items are serialized by the Content Analyzer

TYPE: str

num_cpus

number of processors that must be reserved for the method. If set to 0, all cpus available will be used. Be careful though: multiprocessing in python has a substantial memory overhead!

TYPE: int DEFAULT: -1

RETURNS DESCRIPTION
VBPRNetwork

A fit VBPRNetwork object (torch module which implements the VBPR neural network)
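
Reading the training loop in the source of fit(), the quantity minimised on each batch \(B\) of (user, positive item, negative item) triples appears to be the one below. This assumes the default train_loss (logsigmoid) and that the network output \(\hat{x}_{uij}\) is the BPR-style score difference between the positive item \(i\) and the negative item \(j\); the network itself is not shown on this page.

\[
\mathcal{L}_B = -\sum_{(u,i,j) \in B} \ln \sigma(\hat{x}_{uij})
+ \frac{\lambda_w}{2}\Big(\lVert\gamma_u\rVert^2 + \lVert\gamma_i\rVert^2 + \lVert\gamma_j\rVert^2 + \lVert\theta_u\rVert^2\Big)
+ \frac{\lambda_b^{+}}{2}\lVert\beta_i\rVert^2
+ \frac{\lambda_b^{-}}{2}\lVert\beta_j\rVert^2
+ \frac{\lambda_e}{2}\Big(\lVert E\rVert^2 + \lVert\beta'\rVert^2\Big)
\]

Here \(\lambda_w\), \(\lambda_b^{+}\), \(\lambda_b^{-}\) and \(\lambda_e\) stand for lambda_w, lambda_b_pos, lambda_b_neg and lambda_e. The norms on \(\gamma\), \(\theta\) and \(\beta\) are summed over the rows selected by the batch, while \(E\) and \(\beta'\) are regularised in full on every batch.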

Source code in clayrs/recsys/visual_based_algorithm/vbpr/vbpr_algorithm.py
def fit(self, train_set: Ratings, items_directory: str, num_cpus: int = -1) -> VBPRNetwork:
    """
    Method which will fit the VBPR algorithm via neural training with torch

    Args:
        train_set: `Ratings` object which contains the train set of each user
        items_directory: Path where complexly represented items are serialized by the Content Analyzer
        num_cpus: number of processors that must be reserved for the method. If set to `0`, all cpus available will
            be used. Be careful though: multiprocessing in python has a substantial memory overhead!

    Returns:
        A fit VBPRNetwork object (torch module which implements the VBPR neural network)
    """

    def _l2_loss(*tensors):
        l2_loss = 0
        for tensor in tensors:
            l2_loss += tensor.pow(2).sum()
        return l2_loss / 2

    train_set = self._build_only_positive_ratings(train_set)

    items_features = self._load_items_features(train_set, items_directory)

    self._seed_all()

    items_features = torch.tensor(items_features, device=self.device, dtype=torch.float)

    model = VBPRNetwork(n_users=len(train_set.user_map),
                        n_items=len(train_set.item_map),
                        features_dim=items_features.shape[1],
                        gamma_dim=self.gamma_dim,
                        theta_dim=self.theta_dim,
                        device=self.device)

    optimizer = self.train_optimizer([
        model.beta_items,
        model.gamma_users,
        model.gamma_items,
        model.theta_users,
        model.E,
        model.beta_prime
    ], **self.train_optimizer_parameters)

    train_dataset = TriplesDataset(train_set, self.seed)

    train_dl = torch.utils.data.DataLoader(train_dataset, **self.dl_parameters)

    model.train()

    logger.info("Starting VBPR training!")
    for epoch in range(self.epochs):

        train_loss = 0
        n_user_processed = 0

        with get_progbar(train_dl) as pbar:

            pbar.set_description(f"Starting {epoch + 1}/{self.epochs} epoch...")

            for i, batch in enumerate(pbar):

                user_idx = batch[0].long()
                pos_idx = batch[1].long()
                neg_idx = batch[2].long()

                n_user_processed += len(user_idx)

                positive_features = items_features[pos_idx]
                negative_features = items_features[neg_idx]

                model_input = (
                    user_idx.to(self.device),
                    pos_idx.to(self.device),
                    neg_idx.to(self.device),
                    positive_features.to(self.device),
                    negative_features.to(self.device)
                )

                Xuij, (gamma_u, theta_u), (beta_i_pos, beta_i_neg), (gamma_i_pos, gamma_i_neg) = model(model_input)
                loss = - self.train_loss(Xuij).sum()

                reg = (
                        _l2_loss(gamma_u, gamma_i_pos, gamma_i_neg, theta_u) * self.lambda_w
                        + _l2_loss(beta_i_pos) * self.lambda_b_pos
                        + _l2_loss(beta_i_neg) * self.lambda_b_neg
                        + _l2_loss(model.E, model.beta_prime) * self.lambda_e
                )

                loss = loss + reg
                train_loss += loss.item()

                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

                if (i + 1) % 100 == 0 or (i + 1) == len(train_dl):
                    pbar.set_description(f'[Epoch {epoch + 1}/{self.epochs}, '
                                         f'Batch {i + 1}/{len(train_dl)}, '
                                         f'Loss: {train_loss / n_user_processed:.3f}]')

    logger.info("Training complete!")

    logger.info("Computing visual bias and theta items for faster ranking...")
    with torch.no_grad():
        model.theta_items = items_features.mm(model.E.data).cpu()
        model.visual_bias = items_features.mm(model.beta_prime.data).squeeze().cpu()
        model.cpu()

    logger.info("Done!")

    return model
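
The last step of fit() precomputes theta_items and visual_bias so that ranking does not need to project the raw features again. The sketch below shows, with made-up toy values, how such precomputed terms could enter a score. It assumes the usual VBPR score with user-only constants dropped; how `return_scores` actually combines them is not shown on this page, and `dot`, `matmul` and `score` are hypothetical helpers.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain (n x k) @ (k x m) matrix product."""
    return [[dot(row, column) for column in zip(*b)] for row in a]


# Toy sizes: 2 items, 3 raw features, theta_dim = 2, gamma_dim = 2 (all values are made up)
items_features = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]  # one row per item
E = [[0.5, 0.0], [0.0, 0.5], [1.0, 1.0]]             # features_dim x theta_dim
beta_prime = [[0.1], [0.2], [0.3]]                   # features_dim x 1

# Computed once after training, like the last lines of fit()
theta_items = matmul(items_features, E)
visual_bias = [row[0] for row in matmul(items_features, beta_prime)]


def score(item: int, beta_items: Vector, gamma_u: Vector,
          gamma_items: Matrix, theta_u: Vector) -> float:
    """Assumed VBPR preference score for one user and one item."""
    return (beta_items[item]
            + dot(gamma_u, gamma_items[item])
            + dot(theta_u, theta_items[item])
            + visual_bias[item])


beta_items = [0.0, 1.0]
gamma_items = [[1.0, 0.0], [0.0, 1.0]]
gamma_u, theta_u = [1.0, 2.0], [1.0, 1.0]
scores = [score(i, beta_items, gamma_u, gamma_items, theta_u) for i in range(2)]
```

Because theta_items and visual_bias depend only on the items, computing them once keeps the per-user cost down to a few dot products.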

fit_predict(train_set, test_set, items_directory, user_idx_list, methodology, num_cpus, save_fit)

VBPR is not a score prediction algorithm, calling this method will raise the NotPredictionAlg exception!

RAISES DESCRIPTION
NotPredictionAlg

exception raised since the VBPR algorithm is not a score prediction algorithm

Source code in clayrs/recsys/visual_based_algorithm/vbpr/vbpr_algorithm.py
def fit_predict(self, train_set: Ratings, test_set: Ratings, items_directory: str, user_idx_list: Set[int],
                methodology: Methodology,
                num_cpus: int, save_fit: bool) -> Tuple[Optional[VBPRNetwork], List[np.ndarray]]:
    """
    VBPR is not a score prediction algorithm, calling this method will raise the `NotPredictionAlg` exception!

    Raises:
        NotPredictionAlg: exception raised since the VBPR algorithm is not a score prediction algorithm
    """

    raise NotPredictionAlg("VBPR is not a Score Prediction Algorithm!")

fit_rank(train_set, test_set, items_directory, user_idx_list, n_recs, methodology, num_cpus, save_fit)

Method used to both fit the algorithm and calculate the ranking for all users in the user_idx_list parameter. The algorithm will first be fit considering all users in user_idx_list, which should contain user ids mapped to their integers!

With the save_fit parameter you can specify if you need the function to return the algorithm fit (in case you want to perform multiple calls to the predict() or rank() function). If set to True, the first value returned by this function will be the fit algorithm and the second will be the list of uir matrices with predictions for each user. Otherwise, if save_fit is False, the first value returned by this function will be None

PARAMETER DESCRIPTION
train_set

Ratings object which contains the train set of each user

TYPE: Ratings

test_set

Ratings object which represents the ground truth of the split considered

TYPE: Ratings

items_directory

Path where complexly represented items are serialized by the Content Analyzer

TYPE: str

user_idx_list

Set of user idx (int representation) for which a recommendation list must be generated. Users should be represented with their mapped integer!

TYPE: Set[int]

n_recs

Number of top items that will be present in the ranking of each user. If None, all candidate items will be returned for the user. The signature of this method declares no default, so a value (or None) must be passed explicitly

TYPE: Optional[int]

methodology

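Methodology object which governs the candidate item selection, for example TestRatingsMethodology. The signature of this method declares no default, and the source of rank() calls methodology.setup() directly, so a Methodology instance must be passed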

TYPE: Methodology

save_fit

Boolean value which lets you choose whether the fit algorithm should be saved and returned by this function. If True, the first value returned by this function is the fit algorithm. Otherwise, the first value will be None. The second value is always the list of predicted uir matrices

TYPE: bool

num_cpus

number of processors that must be reserved for the method. If set to 0, all cpus available will be used. Be careful though: multiprocessing in python has a substantial memory overhead!

TYPE: int

RETURNS DESCRIPTION
Optional[VBPRNetwork]

The first value is the fit VBPR algorithm (could be None if save_fit == False)

List[np.ndarray]

The second value is a list of predicted uir matrices all sorted in a decreasing order w.r.t. the ranking scores

Source code in clayrs/recsys/visual_based_algorithm/vbpr/vbpr_algorithm.py
def fit_rank(self, train_set: Ratings, test_set: Ratings, items_directory: str, user_idx_list: Set[int],
             n_recs: Optional[int], methodology: Methodology,
             num_cpus: int, save_fit: bool) -> Tuple[Optional[VBPRNetwork], List[np.ndarray]]:
    """
    Method used to both fit and calculate ranking for all users in `user_idx_list` parameter.
    The algorithm will first be fit considering all users in the `user_idx_list` which should contain user id
    mapped to their integer!

    With the `save_fit` parameter you can specify if you need the function to return the algorithm fit (in case
    you want to perform multiple calls to the `predict()` or `rank()` function). If set to True, the first value
    returned by this function will be the fit algorithm and the second will be the list of uir matrices with
    predictions for each user.
    Otherwise, if `save_fit` is False, the first value returned by this function will be `None`

    Args:
        train_set: `Ratings` object which contains the train set of each user
        test_set: Ratings object which represents the ground truth of the split considered
        items_directory: Path where complexly represented items are serialized by the Content Analyzer
        user_idx_list: Set of user idx (int representation) for which a recommendation list must be generated.
            Users should be represented with their mapped integer!
        n_recs: Number of the top items that will be present in the ranking of each user.
            If `None` all candidate items will be returned for the user. Default is 10 (top-10 for each user
            will be computed)
        methodology: `Methodology` object which governs the candidate item selection. Default is
            `TestRatingsMethodology`. If None, AllItemsMethodology() will be used
        save_fit: Boolean value which let you choose if the fit algorithm should be saved and returned by this
            function. If True, the first value returned by this function is the fit algorithm. Otherwise, the first
            value will be None. The second value is always the list of predicted uir matrices
        num_cpus: number of processors that must be reserved for the method. If set to `0`, all cpus available will
            be used. Be careful though: multiprocessing in python has a substantial memory overhead!

    Returns:
        The first value is the fit VBPR algorithm (could be None if `save_fit == False`)

        The second value is a list of predicted uir matrices all sorted in a decreasing order w.r.t.
            the ranking scores
    """
    vbpr_fit = self.fit(train_set, items_directory, num_cpus)
    rank = self.rank(vbpr_fit, train_set, test_set, items_directory, user_idx_list, n_recs, methodology, num_cpus)

    vbpr_fit = vbpr_fit if save_fit else None

    return vbpr_fit, rank

predict(fit_alg, train_set, test_set, items_directory, user_idx_list, methodology, num_cpus)

VBPR is not a score prediction algorithm, calling this method will raise the NotPredictionAlg exception!

RAISES DESCRIPTION
NotPredictionAlg

exception raised since the VBPR algorithm is not a score prediction algorithm

Source code in clayrs/recsys/visual_based_algorithm/vbpr/vbpr_algorithm.py
def predict(self, fit_alg: VBPRNetwork, train_set: Ratings, test_set: Ratings, items_directory: str,
            user_idx_list: Set[int], methodology: Methodology,
            num_cpus: int) -> List[np.ndarray]:
    """
    VBPR is not a score prediction algorithm, calling this method will raise the `NotPredictionAlg` exception!

    Raises:
        NotPredictionAlg: exception raised since the VBPR algorithm is not a score prediction algorithm
    """

    raise NotPredictionAlg("VBPR is not a Score Prediction Algorithm!")

rank(fit_alg, train_set, test_set, items_directory, user_idx_list, n_recs, methodology, num_cpus)

Method used to calculate the ranking for all users in the user_idx_list parameter. You must call the fit() method before you can compute the ranking. The user_idx_list parameter should contain users mapped to their integers!

The representation of the fit VBPR algorithm is a VBPRNetwork object (torch module which implements the VBPR neural network)

If n_recs is specified, the rank will contain the top-n items for each user. Otherwise, the rank will contain all unrated items of each user.

Via the methodology parameter you can perform different candidate item selection. For example, with TestRatingsMethodology() only the items in each user's test set will be ranked
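
Candidate item selection can be pictured with plain sets. The two functions below are hypothetical stand-ins, not the ClayRS Methodology classes: the first mirrors the test-set-only behaviour described here, the second assumes that "all unrated items" means every catalog item missing from the user's train set.

```python
from typing import Dict, Set


def candidates_from_test_set(user: int, test_items: Dict[int, Set[int]]) -> Set[int]:
    """Candidates are only the items found in the user's test set."""
    return set(test_items.get(user, set()))


def candidates_from_all_items(user: int, catalog: Set[int],
                              train_items: Dict[int, Set[int]]) -> Set[int]:
    """Assumed behaviour: every catalog item the user has not rated in the train set."""
    return catalog - train_items.get(user, set())


catalog = {1, 2, 3, 4}
train_items = {0: {1, 2}}
test_items = {0: {3}}

only_test = candidates_from_test_set(0, test_items)
all_unrated = candidates_from_all_items(0, catalog, train_items)
```

The choice matters for evaluation: ranking only test items yields shorter lists than ranking every unrated item in the catalog.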

PARAMETER DESCRIPTION
fit_alg

a fit VBPRNetwork object (torch module which implements the VBPR neural network)

TYPE: VBPRNetwork

train_set

Ratings object which contains the train set of each user

TYPE: Ratings

test_set

Ratings object which represents the ground truth of the split considered

TYPE: Ratings

items_directory

Path where complexly represented items are serialized by the Content Analyzer

TYPE: str

user_idx_list

Set of user idx (int representation) for which a recommendation list must be generated. Users should be represented with their mapped integer!

TYPE: Set[int]

n_recs

Number of top items that will be present in the ranking of each user. If None, all candidate items will be returned for the user. The signature of this method declares no default, so a value (or None) must be passed explicitly

TYPE: Optional[int]

methodology

Methodology object which governs the candidate item selection, for example TestRatingsMethodology. The signature of this method declares no default, and the source calls methodology.setup() directly, so a Methodology instance must be passed

TYPE: Methodology

num_cpus

number of processors that must be reserved for the method. If set to 0, all cpus available will be used. Be careful though: multiprocessing in python has a substantial memory overhead!

TYPE: int

RETURNS DESCRIPTION
List[np.ndarray]

List of uir matrices, one for each user, where each uir contains predicted interactions between the user and unseen items, sorted in descending order w.r.t. the third column, which holds the ranking score

Source code in clayrs/recsys/visual_based_algorithm/vbpr/vbpr_algorithm.py
def rank(self, fit_alg: VBPRNetwork, train_set: Ratings, test_set: Ratings, items_directory: str,
         user_idx_list: Set[int], n_recs: Optional[int], methodology: Methodology,
         num_cpus: int) -> List[np.ndarray]:
    """
    Method used to calculate ranking for all users in `user_idx_list` parameter.
    You must first call the `fit()` method ***before*** you can compute the ranking.
    The `user_idx_list` parameter should contain users mapped to their integers!

    The representation of the fit VBPR algorithm is a `VBPRNetwork` object (torch module which implements the
    VBPR neural network)

    If the `n_recs` is specified, then the rank will contain the top-n items for the users.
    Otherwise, the rank will contain all unrated items of the particular users.

    Via the `methodology` parameter you can perform different candidate item selection. By default, the
    `TestRatingsMethodology()` is used: so, for each user, items in its test set only will be ranked

    Args:
        fit_alg: a fit `VBPRNetwork` object (torch module which implements the VBPR neural network)
        train_set: `Ratings` object which contains the train set of each user
        test_set: Ratings object which represents the ground truth of the split considered
        items_directory: Path where complexly represented items are serialized by the Content Analyzer
        user_idx_list: Set of user idx (int representation) for which a recommendation list must be generated.
            Users should be represented with their mapped integer!
        n_recs: Number of the top items that will be present in the ranking of each user.
            If `None` all candidate items will be returned for the user. Default is 10 (top-10 for each user
            will be computed)
        methodology: `Methodology` object which governs the candidate item selection. Default is
            `TestRatingsMethodology`. If None, AllItemsMethodology() will be used
        num_cpus: number of processors that must be reserved for the method. If set to `0`, all cpus available will
            be used. Be careful though: multiprocessing in python has a substantial memory overhead!

    Returns:
        List of uir matrices for each user, where each uir contains predicted interactions between users and unseen
            items sorted in a descending way w.r.t. the third dimension which is the ranked score
    """

    def compute_single_rank(user_idx):
        filter_list = methodology.filter_single(user_idx, train_set, test_set)
        user_rank = fit_alg.return_scores(user_idx, filter_list)
        user_uir = np.array((
            np.full(len(user_rank), user_idx),
            filter_list,
            user_rank
        )).T
        # items are not sorted, so we sort them: negating the score column
        # before argsort yields descending order
        sorted_user_uir = user_uir[(-user_uir[:, 2]).argsort()]
        sorted_user_uir = sorted_user_uir[:n_recs]

        return user_idx, sorted_user_uir

    fit_alg.eval()

    methodology.setup(train_set, test_set)

    uir_rank_list = []
    with get_iterator_parallel(num_cpus,
                               compute_single_rank, user_idx_list,
                               progress_bar=True, total=len(user_idx_list)) as pbar:

        for user_idx, user_rank in pbar:
            pbar.set_description(f"Computing rank for user {user_idx}")
            uir_rank_list.append(user_rank)

    return uir_rank_list
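
The sort-and-truncate step inside compute_single_rank can be sketched without NumPy. `top_n` is a hypothetical helper that uses a list sort instead of the negated argsort of the source, so the order of tied scores may differ.

```python
from typing import List, Optional, Tuple

UirRow = Tuple[int, int, float]  # (user_idx, item_idx, score)


def top_n(user_idx: int, item_idxs: List[int], scores: List[float],
          n_recs: Optional[int] = None) -> List[UirRow]:
    """Build the user's uir rows, sort by score descending, keep the first n_recs."""
    rows = [(user_idx, item, score) for item, score in zip(item_idxs, scores)]
    rows.sort(key=lambda row: row[2], reverse=True)
    # Slicing with None keeps everything, which is why n_recs=None returns all candidates
    return rows[:n_recs]


top_two = top_n(0, [10, 11, 12], [0.2, 0.9, 0.5], n_recs=2)
everything = top_n(0, [10, 11, 12], [0.2, 0.9, 0.5], n_recs=None)
```

The same slicing idiom appears in the source (`sorted_user_uir[:n_recs]`), so passing None for n_recs needs no special case.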