Centroid Vector
CentroidVector(item_field, similarity, threshold=None, embedding_combiner=Centroid())
Bases: PerUserCBAlgorithm
Class that implements a centroid-like recommender. It first computes the centroid of the items that the user liked, then computes the similarity between that centroid and each item whose ranking score must be predicted. It is a ranking algorithm, so it can't do score prediction.
- It computes the centroid vector of the features of the items liked by the user
- It computes the similarity between the centroid vector and the items whose ranking score must be predicted
The items liked by a user are those with a rating greater than or equal to a specific threshold. If the threshold is not specified, the average score of all items rated by the user is used.
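The threshold rule above can be sketched in a few lines of NumPy. This is only an illustration of the rule as described, not ClayRS code: the `select_positive` helper and the sample ratings are hypothetical.

```python
import numpy as np

def select_positive(ratings, threshold=None):
    """Return a boolean mask marking the ratings treated as positive.

    Hypothetical helper, not part of ClayRS: `ratings` holds the scores
    given by a single user.
    """
    ratings = np.asarray(ratings, dtype=float)
    if threshold is None:
        # No explicit threshold: fall back to the user's mean rating
        threshold = ratings.mean()
    # A rating is positive when it reaches the threshold
    return ratings >= threshold

user_ratings = [5, 4, 3, 5]
mask_fixed = select_positive(user_ratings, threshold=3)  # explicit threshold
mask_mean = select_positive(user_ratings)                # mean rating is 4.25
```

With an explicit threshold of 3 every item in this sample counts as positive, while the mean-based threshold keeps only the two items rated 5.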
Examples:
- Interested in only a field representation, CosineSimilarity as similarity, threshold \(= 3\) (every item with rating \(\geq 3\) will be considered positive)
>>> from clayrs import recsys as rs
>>> alg = rs.CentroidVector({"Plot": 0}, rs.CosineSimilarity(), 3)
- Interested in multiple field representations of the items, CosineSimilarity as similarity, threshold \(= None\) (every item with rating \(\geq\) the mean rating of the user will be considered positive)
>>> alg = rs.CentroidVector(
...     item_field={"Plot": [0, "tfidf"],
...                 "Genre": [0, 1],
...                 "Director": "doc2vec"},
...     similarity=rs.CosineSimilarity(),
...     threshold=None)
Info
After instantiating the CentroidVector algorithm, pass it in the initialization of a CBRS and then use its methods to calculate the ranking for a single user or multiple users:
Examples:
>>> cbrs = rs.ContentBasedRS(algorithm=alg, ...)
>>> cbrs.fit_rank(...)
>>> # ...
PARAMETER | DESCRIPTION |
---|---|
item_field | dict where each key is the name of a field containing the content to use, and each value is the id(s) of the representation(s) to use for that field. The value can be a string or a list; use a list if you want to use multiple representations for a particular field. |
similarity | |
threshold | Threshold for the ratings. If the rating is greater than or equal to the threshold, it will be considered positive. If the threshold is not specified, the average score of all items rated by the user is used. |
embedding_combiner | |
Source code in clayrs/recsys/content_based_algorithm/centroid_vector/centroid_vector.py
fit_single_user()
The fit process for the CentroidVector consists of computing, for the active user, the centroid of the features of their positive items ONLY.
This method uses the extracted features of the positive items stored in a private attribute, so process_rated() must be called before this method.
The built centroid will also be stored in a private attribute.
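As a rough illustration of the fit step, the centroid of a set of dense feature vectors is their component-wise mean. The sketch below assumes every positive item is already represented as a vector of the same length; the array contents are made up, and how ClayRS handles sparse or multi-field representations is not shown here.

```python
import numpy as np

# Hypothetical feature vectors of three positive items (one row per item)
positive_features = np.array([
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
    [2.0, 1.0, 1.0],
])

# The centroid is the mean of the rows, computed column by column
centroid = positive_features.mean(axis=0)
```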
Source code in clayrs/recsys/content_based_algorithm/centroid_vector/centroid_vector.py
predict_single_user(user_idx, train_ratings, available_loaded_items, filter_list)
CentroidVector is not a score prediction algorithm: calling this method will raise the NotPredictionAlg exception!
RAISES | DESCRIPTION |
---|---|
NotPredictionAlg | Exception raised since the CentroidVector algorithm is not a score prediction algorithm |
Source code in clayrs/recsys/content_based_algorithm/centroid_vector/centroid_vector.py
process_rated(user_idx, train_ratings, available_loaded_items)
Function that extracts features ONLY from the positively rated items of a user. The extracted features will be used to fit the algorithm (build the centroid).
The extracted features will be stored in a private attribute of the class.
If there are no rated items available locally, or if there are only negative items, an exception is raised.
PARAMETER | DESCRIPTION |
---|---|
user_idx | Mapped integer of the active user (the user for which we must fit the algorithm) |
train_ratings | |
available_loaded_items | The LoadedContents interface which contains loaded contents |
RAISES | DESCRIPTION |
---|---|
EmptyUserRatings | Exception raised when the user does not appear in the train set |
NoRatedItems | Exception raised when there isn't any item available locally rated by the user |
OnlyNegativeItems | Exception raised when there are only negative items available locally for the user (items that the user disliked) |
Source code in clayrs/recsys/content_based_algorithm/centroid_vector/centroid_vector.py
rank_single_user(user_idx, train_ratings, available_loaded_items, recs_number, filter_list)
Rank the top-n recommended items for the active user, where the top-n items to rank are controlled by the recs_number and filter_list parameters:
- the former is self-explanatory; the latter is a list of items represented by their string ids. These must be strings and not their mapped integers, since items are serialized following their string representation!
If recs_number is None, all ranked items will be returned.
The filter_list parameter is usually the result of the filter_single() method of a Methodology object.
PARAMETER | DESCRIPTION |
---|---|
user_idx | Mapped integer of the active user |
train_ratings | |
available_loaded_items | The LoadedContents interface which contains loaded contents |
recs_number | Number of top ranked items to return; if None, all ranked items will be returned |
filter_list | List of the items to rank. Should contain string item ids |
RETURNS | DESCRIPTION |
---|---|
np.ndarray | uir matrix for a single user containing user and item idxs (integer representation), with the ranking score as third column, sorted in decreasing order |
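To make the shape of the returned matrix concrete, here is a minimal sketch that builds a user-item-score array from similarity scores that are assumed to be already computed. The `build_ranking` helper, its arguments, and the sample values are hypothetical and do not reflect the library's internals.

```python
import numpy as np

def build_ranking(user_idx, item_idxs, scores, recs_number=None):
    """Return rows of (user idx, item idx, score), sorted by decreasing score.

    Hypothetical helper, not part of ClayRS.
    """
    item_idxs = np.asarray(item_idxs, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # Indices that sort the scores from highest to lowest
    order = np.argsort(-scores, kind="stable")
    if recs_number is not None:
        # Keep only the top-n items; None means "return everything"
        order = order[:recs_number]
    users = np.full(len(order), user_idx, dtype=float)
    return np.column_stack((users, item_idxs[order], scores[order]))

uir = build_ranking(user_idx=0, item_idxs=[10, 11, 12],
                    scores=[0.2, 0.9, 0.5], recs_number=2)
```

Here `uir` has two rows, for items 11 and 12 in that order, because `recs_number=2` keeps only the two highest scores.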
Source code in clayrs/recsys/content_based_algorithm/centroid_vector/centroid_vector.py
Similarities implemented
The following are the similarities you can use in the similarity parameter of the CentroidVector class
CosineSimilarity()
Bases: Similarity
Computes cosine similarity
Source code in clayrs/recsys/content_based_algorithm/centroid_vector/similarities.py
perform(v1, v2)
Calculates the cosine similarity between v1 and v2
PARAMETER | DESCRIPTION |
---|---|
v1 | first numpy array |
v2 | second numpy array |
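For reference, cosine similarity is the dot product of the two vectors divided by the product of their Euclidean norms. The function below is a standalone sketch of that formula for dense, non-zero 1-D arrays; it is not the library's perform() implementation, and the name `cosine_similarity` is only illustrative.

```python
import numpy as np

def cosine_similarity(v1, v2):
    """Cosine of the angle between two dense, non-zero 1-D vectors."""
    v1 = np.asarray(v1, dtype=float)
    v2 = np.asarray(v2, dtype=float)
    # Dot product normalised by the product of the vector lengths
    return float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))

same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Vectors pointing in the same direction score close to 1 regardless of their length, while orthogonal vectors score 0.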
Source code in clayrs/recsys/content_based_algorithm/centroid_vector/similarities.py