VBPR

`VBPR(item_field, gamma_dim, theta_dim, batch_size, epochs, threshold=0, learning_rate=0.005, lambda_w=0.01, lambda_b_pos=0.01, lambda_b_neg=0.001, lambda_e=0, train_loss=fun.logsigmoid, optimizer_class=torch.optim.Adam, device=None, embedding_combiner=Centroid(), normalize=True, seed=None, additional_opt_parameters=None, additional_dl_parameters=None)`

Bases: ContentBasedAlgorithm

Class that implements recommendation through the VBPR algorithm. It's a ranking algorithm, so it can't do score prediction.

The VBPR algorithm expects features extracted from images and works on implicit feedback. In principle, any embedding representation can be used, and explicit feedback is also supported: it is converted into implicit feedback via the `threshold` parameter:

All scores \(>= threshold\) are considered positive scores
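
The conversion from explicit to implicit feedback can be sketched as follows. This is a minimal illustration and not the ClayRS implementation: the `positive_interactions` helper and the `(user, item, score)` tuple layout are hypothetical, and applying the same `>=` rule to the per-user average (the `threshold=None` case) is an assumption.

```python
from statistics import mean


def positive_interactions(ratings, threshold=0):
    """Keep only the (user, item) pairs whose score counts as positive.

    `ratings` is a list of (user, item, score) tuples (hypothetical layout,
    not the ClayRS `Ratings` class). If `threshold` is None, each user's own
    mean score is used as that user's threshold (assumed to use `>=` too).
    """
    user_threshold = None
    if threshold is None:
        scores_by_user = {}
        for user, _, score in ratings:
            scores_by_user.setdefault(user, []).append(score)
        user_threshold = {u: mean(s) for u, s in scores_by_user.items()}

    positives = []
    for user, item, score in ratings:
        limit = threshold if user_threshold is None else user_threshold[user]
        if score >= limit:
            positives.append((user, item))
    return positives


ratings = [("u1", "i1", 5), ("u1", "i2", 1), ("u2", "i1", 1), ("u2", "i3", 2)]
fixed = positive_interactions(ratings, threshold=3)        # one global threshold
per_user = positive_interactions(ratings, threshold=None)  # per-user average
```

With a fixed threshold of 3 only the first rating survives, while the per-user average also keeps the second user's relatively high rating.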

For more details on the VBPR algorithm, see the original paper, "VBPR: Visual Bayesian Personalized Ranking from Implicit Feedback" (He and McAuley)

PARAMETER	DESCRIPTION
`item_field`	dict where the key is the name of the field that contains the content to use, value is the representation(s) id(s) that will be used for the said item. The value of a field can be a string or a list, use a list if you want to use multiple representations for a particular field. TYPE: `dict`
`gamma_dim`	dimension of latent factors for non-visual parameters TYPE: `int`
`theta_dim`	dimension of latent factors for visual parameters TYPE: `int`
`batch_size`	size of each batch of the torch dataloader for the image features TYPE: `int`
`epochs`	number of training epochs TYPE: `int`
`threshold`	float value used to distinguish positive from negative items. If None, it varies for each user and is set to the average rating given by that user TYPE: `Optional[float]` DEFAULT: `0`
`learning_rate`	learning rate for the torch optimizer TYPE: `float` DEFAULT: `0.005`
`lambda_w`	weight assigned to the regularization of the loss on \(\gamma_u\), \(\gamma_i\), \(\theta_u\) TYPE: `float` DEFAULT: `0.01`
`lambda_b_pos`	weight assigned to the regularization of the loss on \(\beta_i\) for the positive items TYPE: `float` DEFAULT: `0.01`
`lambda_b_neg`	weight assigned to the regularization of the loss on \(\beta_i\) for the negative items TYPE: `float` DEFAULT: `0.001`
`lambda_e`	weight assigned to the regularization of the loss on \(\beta'\), \(E\) TYPE: `float` DEFAULT: `0`
`train_loss`	loss function for the training phase. Default is logsigmoid TYPE: `Callable[[torch.Tensor], torch.Tensor]` DEFAULT: `fun.logsigmoid`
`optimizer_class`	optimizer torch class for the training phase. It will be instantiated using `additional_opt_parameters` if specified TYPE: `Type[torch.optim.Optimizer]` DEFAULT: `torch.optim.Adam`
`device`	device on which the training will be run. If None and a GPU is available, then the GPU is automatically selected as device to use. Otherwise, the cpu is used TYPE: `str` DEFAULT: `None`
`embedding_combiner`	`CombiningTechnique` used when embeddings representation must be used, but they are in a matrix form instead of a single vector (e.g. WordEmbedding representations have one vector for each word). By default, the `Centroid` of the rows of the matrix is computed TYPE: `CombiningTechnique` DEFAULT: `Centroid()`
`normalize`	Whether to normalize input features or not. If True, the \(min\) of the input feature matrix is subtracted from it, and the result is divided by its \(max + 1e-10\) TYPE: `bool` DEFAULT: `True`
`seed`	random state which will be used for weight initialization and sampling of the negative example TYPE: `int` DEFAULT: `None`
`additional_opt_parameters`	kwargs for the optimizer. If you specify learning rate in this parameter, it will be overwritten by the local `learning_rate` parameter TYPE: `Dict[str, Any]` DEFAULT: `None`
`additional_dl_parameters`	kwargs for the dataloader. If you specify batch size in this parameter, it will be overwritten by the local `batch_size` parameter TYPE: `Dict[str, Any]` DEFAULT: `None`
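
One way to read the `normalize` description is sketched below in plain Python. The `min_max_normalize` helper is hypothetical, and two details are assumptions rather than facts taken from the source: the minimum and maximum are computed over the whole matrix (not per column), and the maximum is taken after the minimum has been subtracted.

```python
def min_max_normalize(features, eps=1e-10):
    """Shift a feature matrix by its global minimum, then scale by its global maximum.

    `features` is a list of equally sized rows of floats. `eps` avoids a
    division by zero when every entry is identical.
    """
    global_min = min(min(row) for row in features)
    shifted = [[value - global_min for value in row] for row in features]
    # assumption: the maximum is computed on the already shifted matrix
    global_max = max(max(row) for row in shifted)
    return [[value / (global_max + eps) for value in row] for row in shifted]


features = [[2.0, 4.0], [6.0, 10.0]]
normalized = min_max_normalize(features)
```

Under these assumptions every value ends up in the range from 0 to just below 1.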

Source code in clayrs/recsys/visual_based_algorithm/vbpr/vbpr_algorithm.py

def __init__(self, item_field: dict,
             gamma_dim: int, theta_dim: int, batch_size: int, epochs: int,
             threshold: Optional[float] = 0,
             learning_rate: float = 0.005,
             lambda_w: float = 0.01, lambda_b_pos: float = 0.01, lambda_b_neg: float = 0.001, lambda_e: float = 0,
             train_loss: Callable[[torch.Tensor], torch.Tensor] = fun.logsigmoid,
             optimizer_class: Type[torch.optim.Optimizer] = torch.optim.Adam,
             device: str = None,
             embedding_combiner: CombiningTechnique = Centroid(),
             normalize: bool = True,
             seed: int = None,
             additional_opt_parameters: Dict[str, Any] = None,
             additional_dl_parameters: Dict[str, Any] = None):

    super().__init__(item_field, threshold)

    if additional_opt_parameters is None:
        additional_opt_parameters = {}

    if additional_dl_parameters is None:
        additional_dl_parameters = {}

    additional_opt_parameters["lr"] = learning_rate
    additional_dl_parameters["batch_size"] = batch_size

    self.device = device if device is not None else "cuda:0" if torch.cuda.is_available() else "cpu"

    self.gamma_dim = gamma_dim
    self.theta_dim = theta_dim

    self.epochs = epochs
    self.train_loss = train_loss
    self.train_optimizer = optimizer_class
    self.train_optimizer_parameters = additional_opt_parameters
    self.normalize = normalize
    self.lambda_w = lambda_w
    self.lambda_b_pos = lambda_b_pos
    self.lambda_b_neg = lambda_b_neg
    self.lambda_e = lambda_e

    self._embedding_combiner = embedding_combiner

    self.seed = seed
    self.dl_parameters = additional_dl_parameters
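
The precedence between the explicit arguments and the extra keyword dictionaries can be illustrated in isolation. The `build_training_kwargs` function below is a hypothetical stand-in for the relevant lines of the constructor; unlike the listing above, it copies the dictionaries, so the caller's objects are left untouched.

```python
def build_training_kwargs(learning_rate, batch_size,
                          additional_opt_parameters=None,
                          additional_dl_parameters=None):
    """Merge explicit arguments into the optimizer and dataloader kwargs."""
    # copy the dictionaries so the caller's objects are not modified
    opt_kwargs = dict(additional_opt_parameters or {})
    dl_kwargs = dict(additional_dl_parameters or {})

    # the explicit arguments always win over the dictionary entries
    opt_kwargs["lr"] = learning_rate
    dl_kwargs["batch_size"] = batch_size
    return opt_kwargs, dl_kwargs


opt_kwargs, dl_kwargs = build_training_kwargs(
    learning_rate=0.005,
    batch_size=128,
    additional_opt_parameters={"lr": 0.1, "weight_decay": 1e-4},
    additional_dl_parameters={"batch_size": 32, "shuffle": True},
)
```

Here the `lr` and `batch_size` entries supplied in the dictionaries are replaced by `0.005` and `128`, while `weight_decay` and `shuffle` pass through unchanged.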

`fit(train_set, items_directory, num_cpus=-1)`

Method which will fit the VBPR algorithm via neural training with torch

PARAMETER	DESCRIPTION
`train_set`	`Ratings` object which contains the train set of each user TYPE: `Ratings`
`items_directory`	Path where complexly represented items are serialized by the Content Analyzer TYPE: `str`
`num_cpus`	number of processors that must be reserved for the method. If set to `0`, all cpus available will be used. Be careful though: multiprocessing in python has a substantial memory overhead! TYPE: `int` DEFAULT: `-1`

RETURNS	DESCRIPTION
`VBPRNetwork`	A fit VBPRNetwork object (torch module which implements the VBPR neural network)
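
Training consumes (user, positive item, negative item) triples, and the `seed` parameter controls how the negative item is sampled. The sketch below shows one plausible sampling scheme; it is an assumption for illustration only, since the internals of `TriplesDataset` are not documented here, and `sample_triples` is a hypothetical helper.

```python
import random


def sample_triples(positives_by_user, all_items, seed=None):
    """Pair each positive interaction with one randomly drawn negative item.

    `positives_by_user` maps a user index to the set of item indices the user
    interacted with positively. A negative item is any item outside that set.
    """
    rng = random.Random(seed)  # local generator, so results are reproducible
    triples = []
    for user, positives in positives_by_user.items():
        candidates = [item for item in all_items if item not in positives]
        if not candidates:
            continue  # the user liked everything: no negative can be drawn
        for pos_item in sorted(positives):
            triples.append((user, pos_item, rng.choice(candidates)))
    return triples


positives_by_user = {0: {1, 2}, 1: {3}}
all_items = [1, 2, 3, 4, 5]
triples = sample_triples(positives_by_user, all_items, seed=42)
```

Each triple keeps a positive item of the user and adds an item that the user has not rated positively. Reusing the same seed reproduces the same triples.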

Source code in clayrs/recsys/visual_based_algorithm/vbpr/vbpr_algorithm.py

def fit(self, train_set: Ratings, items_directory: str, num_cpus: int = -1) -> VBPRNetwork:
    """
    Method which will fit the VBPR algorithm via neural training with torch

    Args:
        train_set: `Ratings` object which contains the train set of each user
        items_directory: Path where complexly represented items are serialized by the Content Analyzer
        num_cpus: number of processors that must be reserved for the method. If set to `0`, all cpus available will
            be used. Be careful though: multiprocessing in python has a substantial memory overhead!

    Returns:
        A fit VBPRNetwork object (torch module which implements the VBPR neural network)
    """

    def _l2_loss(*tensors):
        l2_loss = 0
        for tensor in tensors:
            l2_loss += tensor.pow(2).sum()
        return l2_loss / 2

    train_set = self._build_only_positive_ratings(train_set)

    items_features = self._load_items_features(train_set, items_directory)

    self._seed_all()

    items_features = torch.tensor(items_features, device=self.device, dtype=torch.float)

    model = VBPRNetwork(n_users=len(train_set.user_map),
                        n_items=len(train_set.item_map),
                        features_dim=items_features.shape[1],
                        gamma_dim=self.gamma_dim,
                        theta_dim=self.theta_dim,
                        device=self.device)

    optimizer = self.train_optimizer([
        model.beta_items,
        model.gamma_users,
        model.gamma_items,
        model.theta_users,
        model.E,
        model.beta_prime
    ], **self.train_optimizer_parameters)

    train_dataset = TriplesDataset(train_set, self.seed)

    train_dl = torch.utils.data.DataLoader(train_dataset, **self.dl_parameters)

    model.train()

    logger.info("Starting VBPR training!")
    for epoch in range(self.epochs):

        train_loss = 0
        n_user_processed = 0

        with get_progbar(train_dl) as pbar:

            pbar.set_description(f"Starting {epoch + 1}/{self.epochs} epoch...")

            for i, batch in enumerate(pbar):

                user_idx = batch[0].long()
                pos_idx = batch[1].long()
                neg_idx = batch[2].long()

                n_user_processed += len(user_idx)

                positive_features = items_features[pos_idx]
                negative_features = items_features[neg_idx]

                model_input = (
                    user_idx.to(self.device),
                    pos_idx.to(self.device),
                    neg_idx.to(self.device),
                    positive_features.to(self.device),
                    negative_features.to(self.device)
                )

                Xuij, (gamma_u, theta_u), (beta_i_pos, beta_i_neg), (gamma_i_pos, gamma_i_neg) = model(model_input)
                loss = - self.train_loss(Xuij).sum()

                reg = (
                        _l2_loss(gamma_u, gamma_i_pos, gamma_i_neg, theta_u) * self.lambda_w
                        + _l2_loss(beta_i_pos) * self.lambda_b_pos
                        + _l2_loss(beta_i_neg) * self.lambda_b_neg
                        + _l2_loss(model.E, model.beta_prime) * self.lambda_e
                )

                loss = loss + reg
                train_loss += loss.item()

                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

                if (i + 1) % 100 == 0 or (i + 1) == len(train_dl):
                    pbar.set_description(f'[Epoch {epoch + 1}/{self.epochs}, '
                                         f'Batch {i + 1}/{len(train_dl)}, '
                                         f'Loss: {train_loss / n_user_processed:.3f}]')

    logger.info("Training complete!")

    logger.info("Computing visual bias and theta items for faster ranking...")
    with torch.no_grad():
        model.theta_items = items_features.mm(model.E.data).cpu()
        model.visual_bias = items_features.mm(model.beta_prime.data).squeeze().cpu()
        model.cpu()

    logger.info("Done!")

    return model
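
The objective minimized in the training loop can be reproduced on plain Python lists to make the role of each regularization weight explicit. This is a simplified sketch, not the torch implementation: tensors are replaced by flat lists of floats, `E` and `beta_prime` are passed as one flattened list, and `regularized_bpr_loss` is a hypothetical name.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def l2_loss(*vectors):
    """Half the sum of squared entries, mirroring `_l2_loss` in the listing."""
    return sum(value * value for vector in vectors for value in vector) / 2


def regularized_bpr_loss(x_uij, gamma_u, gamma_i_pos, gamma_i_neg, theta_u,
                         beta_i_pos, beta_i_neg, e_and_beta_prime,
                         lambda_w=0.01, lambda_b_pos=0.01,
                         lambda_b_neg=0.001, lambda_e=0):
    """Negative summed log-sigmoid of the score differences plus L2 penalties."""
    ranking_term = -sum(log_sigmoid(x) for x in x_uij)
    reg = (l2_loss(gamma_u, gamma_i_pos, gamma_i_neg, theta_u) * lambda_w
           + l2_loss(beta_i_pos) * lambda_b_pos
           + l2_loss(beta_i_neg) * lambda_b_neg
           + l2_loss(e_and_beta_prime) * lambda_e)
    return ranking_term + reg


# one triple with a score difference of 0 and a single non-zero latent factor
loss = regularized_bpr_loss(x_uij=[0.0], gamma_u=[2.0], gamma_i_pos=[0.0],
                            gamma_i_neg=[0.0], theta_u=[0.0], beta_i_pos=[0.0],
                            beta_i_neg=[0.0], e_and_beta_prime=[0.0])
```

In this example the ranking term equals `log(2)` and the only non-zero penalty is `(2.0 ** 2 / 2) * lambda_w = 0.02`.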

`fit_predict(train_set, test_set, items_directory, user_idx_list, methodology, num_cpus, save_fit)`

VBPR is not a score prediction algorithm, calling this method will raise the NotPredictionAlg exception!

RAISES	DESCRIPTION
`NotPredictionAlg`	exception raised since the VBPR algorithm is not a score prediction algorithm

Source code in clayrs/recsys/visual_based_algorithm/vbpr/vbpr_algorithm.py

def fit_predict(self, train_set: Ratings, test_set: Ratings, items_directory: str, user_idx_list: Set[int],
                methodology: Methodology,
                num_cpus: int, save_fit: bool) -> Tuple[Optional[VBPRNetwork], List[np.ndarray]]:
    """
    VBPR is not a score prediction algorithm, calling this method will raise the `NotPredictionAlg` exception!

    Raises:
        NotPredictionAlg: exception raised since the VBPR algorithm is not a score prediction algorithm
    """

    raise NotPredictionAlg("VBPR is not a Score Prediction Algorithm!")

`fit_rank(train_set, test_set, items_directory, user_idx_list, n_recs, methodology, num_cpus, save_fit)`

Method used to both fit and calculate ranking for all users in the user_idx_list parameter. The algorithm will first be fit considering all users in the user_idx_list, which should contain user ids mapped to their integers!

With the save_fit parameter you can specify if you need the function to return the algorithm fit (in case you want to perform multiple calls to the predict() or rank() function). If set to True, the first value returned by this function will be the fit algorithm and the second will be the list of uir matrices with predictions for each user. Otherwise, if save_fit is False, the first value returned by this function will be None

PARAMETER	DESCRIPTION
`train_set`	`Ratings` object which contains the train set of each user TYPE: `Ratings`
`test_set`	Ratings object which represents the ground truth of the split considered TYPE: `Ratings`
`items_directory`	Path where complexly represented items are serialized by the Content Analyzer TYPE: `str`
`user_idx_list`	Set of user idx (int representation) for which a recommendation list must be generated. Users should be represented with their mapped integer! TYPE: `Set[int]`
`n_recs`	Number of the top items that will be present in the ranking of each user. If `None` all candidate items will be returned for the user. Default is 10 (top-10 for each user will be computed) TYPE: `Optional[int]`
`methodology`	`Methodology` object which governs the candidate item selection. Default is `TestRatingsMethodology`. If None, AllItemsMethodology() will be used TYPE: `Methodology`
`save_fit`	Boolean value which let you choose if the fit algorithm should be saved and returned by this function. If True, the first value returned by this function is the fit algorithm. Otherwise, the first value will be None. The second value is always the list of predicted uir matrices TYPE: `bool`
`num_cpus`	number of processors that must be reserved for the method. If set to `0`, all cpus available will be used. Be careful though: multiprocessing in python has a substantial memory overhead! TYPE: `int`

RETURNS	DESCRIPTION
`Optional[VBPRNetwork]`	The first value is the fit VBPR algorithm (could be None if `save_fit == False`)
`List[np.ndarray]`	The second value is a list of predicted uir matrices all sorted in a decreasing order w.r.t. the ranking scores

Source code in clayrs/recsys/visual_based_algorithm/vbpr/vbpr_algorithm.py

def fit_rank(self, train_set: Ratings, test_set: Ratings, items_directory: str, user_idx_list: Set[int],
             n_recs: Optional[int], methodology: Methodology,
             num_cpus: int, save_fit: bool) -> Tuple[Optional[VBPRNetwork], List[np.ndarray]]:
    """
    Method used to both fit and calculate ranking for all users in `user_idx_list` parameter.
    The algorithm will first be fit considering all users in the `user_idx_list`, which should contain user ids
    mapped to their integers!

    With the `save_fit` parameter you can specify if you need the function to return the algorithm fit (in case
    you want to perform multiple calls to the `predict()` or `rank()` function). If set to True, the first value
    returned by this function will be the fit algorithm and the second will be the list of uir matrices with
    predictions for each user.
    Otherwise, if `save_fit` is False, the first value returned by this function will be `None`

    Args:
        train_set: `Ratings` object which contains the train set of each user
        test_set: Ratings object which represents the ground truth of the split considered
        items_directory: Path where complexly represented items are serialized by the Content Analyzer
        user_idx_list: Set of user idx (int representation) for which a recommendation list must be generated.
            Users should be represented with their mapped integer!
        n_recs: Number of the top items that will be present in the ranking of each user.
            If `None` all candidate items will be returned for the user. Default is 10 (top-10 for each user
            will be computed)
        methodology: `Methodology` object which governs the candidate item selection. Default is
            `TestRatingsMethodology`. If None, AllItemsMethodology() will be used
        save_fit: Boolean value which let you choose if the fit algorithm should be saved and returned by this
            function. If True, the first value returned by this function is the fit algorithm. Otherwise, the first
            value will be None. The second value is always the list of predicted uir matrices
        num_cpus: number of processors that must be reserved for the method. If set to `0`, all cpus available will
            be used. Be careful though: multiprocessing in python has a substantial memory overhead!

    Returns:
        The first value is the fit VBPR algorithm (could be None if `save_fit == False`)

        The second value is a list of predicted uir matrices all sorted in a decreasing order w.r.t.
            the ranking scores
    """
    vbpr_fit = self.fit(train_set, items_directory, num_cpus)
    rank = self.rank(vbpr_fit, train_set, test_set, items_directory, user_idx_list, n_recs, methodology, num_cpus)

    vbpr_fit = vbpr_fit if save_fit else None

    return vbpr_fit, rank

`predict(fit_alg, train_set, test_set, items_directory, user_idx_list, methodology, num_cpus)`

VBPR is not a score prediction algorithm, calling this method will raise the NotPredictionAlg exception!

RAISES	DESCRIPTION
`NotPredictionAlg`	exception raised since the VBPR algorithm is not a score prediction algorithm

Source code in clayrs/recsys/visual_based_algorithm/vbpr/vbpr_algorithm.py

def predict(self, fit_alg: VBPRNetwork, train_set: Ratings, test_set: Ratings, items_directory: str,
            user_idx_list: Set[int], methodology: Methodology,
            num_cpus: int) -> List[np.ndarray]:
    """
    VBPR is not a score prediction algorithm, calling this method will raise the `NotPredictionAlg` exception!

    Raises:
        NotPredictionAlg: exception raised since the VBPR algorithm is not a score prediction algorithm
    """

    raise NotPredictionAlg("VBPR is not a Score Prediction Algorithm!")

`rank(fit_alg, train_set, test_set, items_directory, user_idx_list, n_recs, methodology, num_cpus)`

Method used to calculate ranking for all users in the user_idx_list parameter. You must first call the fit() method before you can compute the ranking. The user_idx_list parameter should contain users mapped to their integers!

The representation of the fit VBPR algorithm is a VBPRNetwork object (torch module which implements the VBPR neural network)

If the n_recs is specified, then the rank will contain the top-n items for the users. Otherwise, the rank will contain all unrated items of the particular users.

Via the methodology parameter you can perform different candidate item selection. By default, the TestRatingsMethodology() is used, so for each user only the items in that user's test set will be ranked

PARAMETER	DESCRIPTION
`fit_alg`	a fit `VBPRNetwork` object (torch module which implements the VBPR neural network) TYPE: `VBPRNetwork`
`train_set`	`Ratings` object which contains the train set of each user TYPE: `Ratings`
`test_set`	Ratings object which represents the ground truth of the split considered TYPE: `Ratings`
`items_directory`	Path where complexly represented items are serialized by the Content Analyzer TYPE: `str`
`user_idx_list`	Set of user idx (int representation) for which a recommendation list must be generated. Users should be represented with their mapped integer! TYPE: `Set[int]`
`n_recs`	Number of the top items that will be present in the ranking of each user. If `None` all candidate items will be returned for the user. Default is 10 (top-10 for each user will be computed) TYPE: `Optional[int]`
`methodology`	`Methodology` object which governs the candidate item selection. Default is `TestRatingsMethodology`. If None, AllItemsMethodology() will be used TYPE: `Methodology`
`num_cpus`	number of processors that must be reserved for the method. If set to `0`, all cpus available will be used. Be careful though: multiprocessing in python has a substantial memory overhead! TYPE: `int`

RETURNS	DESCRIPTION
`List[np.ndarray]`	List of uir matrices for each user, where each uir contains predicted interactions between users and unseen items sorted in a descending way w.r.t. the third dimension which is the ranked score

Source code in clayrs/recsys/visual_based_algorithm/vbpr/vbpr_algorithm.py

def rank(self, fit_alg: VBPRNetwork, train_set: Ratings, test_set: Ratings, items_directory: str,
         user_idx_list: Set[int], n_recs: Optional[int], methodology: Methodology,
         num_cpus: int) -> List[np.ndarray]:
    """
    Method used to calculate ranking for all users in `user_idx_list` parameter.
    You must first call the `fit()` method ***before*** you can compute the ranking.
    The `user_idx_list` parameter should contain users mapped to their integers!

    The representation of the fit VBPR algorithm is a `VBPRNetwork` object (torch module which implements the
    VBPR neural network)

    If the `n_recs` is specified, then the rank will contain the top-n items for the users.
    Otherwise, the rank will contain all unrated items of the particular users.

    Via the `methodology` parameter you can perform different candidate item selection. By default, the
    `TestRatingsMethodology()` is used: so, for each user, items in its test set only will be ranked

    Args:
        fit_alg: a fit `VBPRNetwork` object (torch module which implements the VBPR neural network)
        train_set: `Ratings` object which contains the train set of each user
        test_set: Ratings object which represents the ground truth of the split considered
        items_directory: Path where complexly represented items are serialized by the Content Analyzer
        user_idx_list: Set of user idx (int representation) for which a recommendation list must be generated.
            Users should be represented with their mapped integer!
        n_recs: Number of the top items that will be present in the ranking of each user.
            If `None` all candidate items will be returned for the user. Default is 10 (top-10 for each user
            will be computed)
        methodology: `Methodology` object which governs the candidate item selection. Default is
            `TestRatingsMethodology`. If None, AllItemsMethodology() will be used
        num_cpus: number of processors that must be reserved for the method. If set to `0`, all cpus available will
            be used. Be careful though: multiprocessing in python has a substantial memory overhead!

    Returns:
        List of uir matrices for each user, where each uir contains predicted interactions between users and unseen
            items sorted in a descending way w.r.t. the third dimension which is the ranked score
    """

    def compute_single_rank(user_idx):
        filter_list = methodology.filter_single(user_idx, train_set, test_set)
        user_rank = fit_alg.return_scores(user_idx, filter_list)
        user_uir = np.array((
            np.full(len(user_rank), user_idx),
            filter_list,
            user_rank
        )).T
        # items are not sorted, so we sort them (to get descending order, we negate the values of the
        # user uir score column)
        sorted_user_uir = user_uir[(-user_uir[:, 2]).argsort()]
        sorted_user_uir = sorted_user_uir[:n_recs]

        return user_idx, sorted_user_uir

    fit_alg.eval()

    methodology.setup(train_set, test_set)

    uir_rank_list = []
    with get_iterator_parallel(num_cpus,
                               compute_single_rank, user_idx_list,
                               progress_bar=True, total=len(user_idx_list)) as pbar:

        for user_idx, user_rank in pbar:
            pbar.set_description(f"Computing rank for user {user_idx}")
            uir_rank_list.append(user_rank)

    return uir_rank_list
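
The sort-and-truncate step of `compute_single_rank` can be sketched without NumPy. The `top_n` helper below is hypothetical and works on plain tuples instead of uir matrices. It relies on the same slicing behaviour as the listing: slicing with `None` keeps every row, so `n_recs=None` returns all candidate items.

```python
def top_n(user_idx, item_ids, scores, n_recs=None):
    """Build (user, item, score) rows sorted by descending score.

    If `n_recs` is None the slice `rows[:None]` returns every row, so all
    candidate items are kept.
    """
    rows = [(user_idx, item, score) for item, score in zip(item_ids, scores)]
    rows.sort(key=lambda row: row[2], reverse=True)  # highest score first
    return rows[:n_recs]


item_ids = [10, 11, 12]
scores = [0.2, 0.9, 0.5]
top_two = top_n(0, item_ids, scores, n_recs=2)
everything = top_n(0, item_ids, scores, n_recs=None)
```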