Index Query
IndexQuery(item_field, classic_similarity=True, threshold=None)
Bases: PerUserCBAlgorithm
Class for the search engine recommender using an index. It first builds a query from the specified representation(s) of the positive items, then uses that query to perform an actual search inside the index: every item receives a "closeness" score with respect to the query, and this score is used to rank the items.
Be sure to use textual representation(s), so that both the query and the search are meaningful!
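The build-a-query-then-search idea above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the ClayRS implementation: the Whoosh index is replaced by an in-memory dict of tokenized items, the score is a simple TF-IDF (idf = ln(N/df) + 1, an assumption), and the helpers build_query, tfidf_score and search are hypothetical names.

```python
import math
from collections import Counter


def build_query(positive_docs: list[list[str]]) -> Counter:
    """Merge the terms of all positive items into a single bag-of-words query."""
    query: Counter = Counter()
    for tokens in positive_docs:
        query.update(tokens)
    return query


def tfidf_score(query: Counter, doc: list[str], idf: dict[str, float]) -> float:
    """Score a document as the sum over query terms of query_freq * tf * idf."""
    tf = Counter(doc)
    return sum(q_freq * tf[term] * idf.get(term, 0.0) for term, q_freq in query.items())


def search(index: dict[str, list[str]], positive_ids: list[str]) -> list[tuple[str, float]]:
    """Build a query from the positive items and rank every other indexed item against it."""
    n_docs = len(index)
    # Document frequency: in how many items each term appears
    df = Counter(term for tokens in index.values() for term in set(tokens))
    idf = {term: math.log(n_docs / count) + 1.0 for term, count in df.items()}
    query = build_query([index[item_id] for item_id in positive_ids])
    # Assumption: items the user already rated positively are not ranked again
    scores = [
        (item_id, tfidf_score(query, tokens, idf))
        for item_id, tokens in index.items()
        if item_id not in positive_ids
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


index = {
    "i1": ["space", "alien", "war"],
    "i2": ["space", "alien", "love"],
    "i3": ["romance", "love", "paris"],
    "i4": ["space", "robot", "robot"],
}
ranking = search(index, positive_ids=["i1"])
print([item_id for item_id, _ in ranking])  # ['i2', 'i4', 'i3']
```

Item i2 shares two query terms with the liked item i1, i4 shares one, and i3 shares none, so the ranking follows the amount of overlap weighted by term rarity.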
Examples:
- Interested in a single field representation, classic TF-IDF similarity, threshold \(= 3\) (every item with rating \(\geq 3\) will be considered positive)
>>> from clayrs import recsys as rs
>>> alg = rs.IndexQuery({"Plot": 0}, threshold=3)
- Interested in multiple field representations of the items, BM25 similarity, threshold = None (every item with rating \(\geq\) the user's mean rating will be considered positive)
>>> alg = rs.IndexQuery(
...     item_field={"Plot": [0, "original_text"],
...                 "Genre": [0, 1],
...                 "Director": "preprocessed_text"},
...     classic_similarity=False,
...     threshold=None)
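The examples pass item_field values in two shapes: a single representation id (an int such as 0, or a string such as "preprocessed_text") or a list of ids. A minimal sketch of normalizing both shapes into lists follows; normalize_item_field is a hypothetical helper for illustration, not part of ClayRS.

```python
from typing import Union

RepresentationId = Union[int, str]


def normalize_item_field(item_field: dict) -> dict[str, list[RepresentationId]]:
    """Return a copy of item_field where every value is a list of representation ids."""
    normalized: dict[str, list[RepresentationId]] = {}
    for field_name, representations in item_field.items():
        if isinstance(representations, (list, tuple)):
            # Several representations requested for this field
            normalized[field_name] = list(representations)
        else:
            # A single int or str id: wrap it in a list
            normalized[field_name] = [representations]
    return normalized


print(normalize_item_field({"Plot": 0}))
# {'Plot': [0]}
print(normalize_item_field({"Plot": [0, "original_text"], "Director": "preprocessed_text"}))
# {'Plot': [0, 'original_text'], 'Director': ['preprocessed_text']}
```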
Info
After instantiating the IndexQuery algorithm, pass it to the initialization of a CBRS and then use the CBRS methods to compute the ranking for a single user or for multiple users:
Examples:
>>> cbrs = rs.ContentBasedRS(algorithm=alg, ...)
>>> cbrs.fit_rank(...)
>>> # ...
PARAMETER | DESCRIPTION |
---|---|
item_field | dict where the key is the name of the field that contains the content to use and the value is the representation id(s) to use for that field; just BE SURE to use textual representation(s). The value of a field can be a single representation id (an int or a string, as in the examples above) or a list of ids: use a list if you want to use multiple representations for a particular field. |
classic_similarity | True to use the classic TF-IDF implementation of Whoosh, False to use BM25F |
threshold | Threshold for the ratings. If the rating is greater than or equal to the threshold, the item is considered positive. If the threshold is not specified, the mean of all the ratings given by the user is used. |
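The threshold rule can be sketched as follows. This is a minimal illustration assuming the >= comparison shown in the class examples; select_positive_items is a hypothetical helper, not the ClayRS code.

```python
from statistics import mean
from typing import Optional


def select_positive_items(user_ratings: dict[str, float], threshold: Optional[float] = None) -> list[str]:
    """Return the ids of the items considered positive for a single user."""
    if threshold is None:
        # No explicit threshold: fall back to the mean of the user's ratings
        threshold = mean(user_ratings.values())
    # Assumption: a rating equal to the threshold counts as positive (>=)
    return [item_id for item_id, score in user_ratings.items() if score >= threshold]


ratings = {"i1": 5.0, "i2": 2.0, "i3": 3.0, "i4": 4.0}
print(select_positive_items(ratings, threshold=3))  # ['i1', 'i3', 'i4']
print(select_positive_items(ratings))               # mean is 3.5 -> ['i1', 'i4']
```

With threshold=None the cut-off adapts to each user: a user who rates everything high gets a higher bar than one who rates sparingly.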
Source code in clayrs/recsys/content_based_algorithm/index_query/index_query.py
fit_single_user()
The fit process of IndexQuery consists of building a query from the features of the positive items ONLY (the items that the user liked). The terms of these positive items are boosted by the rating the user gave them.
This method uses the extracted features of the positive items stored in a private attribute, so process_rated() must be called before this method.
The built query will also be stored in a private attribute.
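A minimal sketch of a rating-boosted query follows. The clause format field:term^boost is loosely modelled on Whoosh-style query syntax; whether ClayRS builds exactly this string is an assumption, and build_boosted_query and the input layout (item id mapped to its field terms and rating) are hypothetical.

```python
def build_boosted_query(positive_items: dict[str, tuple[dict[str, list[str]], float]]) -> str:
    """Turn the positive items into one query string where each term is boosted by the item's rating."""
    clauses = []
    for fields, rating in positive_items.values():
        for field_name, terms in fields.items():
            for term in terms:
                # field:term^boost, with the user's rating as the boost factor
                clauses.append(f"{field_name}:{term}^{rating:g}")
    return " OR ".join(clauses)


positive_items = {
    "i1": ({"Plot": ["space", "alien"]}, 5.0),
    "i4": ({"Plot": ["robot"]}, 4.0),
}
print(build_boosted_query(positive_items))
# Plot:space^5 OR Plot:alien^5 OR Plot:robot^4
```

Boosting by rating means that terms from an item rated 5 weigh more in the search than terms from an item rated 4.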
Source code in clayrs/recsys/content_based_algorithm/index_query/index_query.py
predict_single_user(user_idx, train_ratings, available_loaded_items, filter_list)
IndexQuery is not a score prediction algorithm, so calling this method raises a NotPredictionAlg exception.
RAISES | DESCRIPTION |
---|---|
NotPredictionAlg | Exception raised since the IndexQuery algorithm is not a score prediction algorithm |
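Calling code that may receive either kind of algorithm can treat this exception as a signal to fall back to ranking. The sketch below defines a local stand-in NotPredictionAlg and a hypothetical RankOnlyAlgorithm; it illustrates the contract described above and is not ClayRS code.

```python
class NotPredictionAlg(Exception):
    """Local stand-in for the exception documented above."""


class RankOnlyAlgorithm:
    """Hypothetical algorithm that, like IndexQuery, can rank but not predict scores."""

    def predict_single_user(self, *args, **kwargs):
        raise NotPredictionAlg("This algorithm is not a score prediction algorithm")

    def rank_single_user(self, *args, **kwargs):
        return ["i2", "i4", "i3"]


def predict_or_rank(alg, *args, **kwargs):
    """Try score prediction first and fall back to ranking when it is not supported."""
    try:
        return "predict", alg.predict_single_user(*args, **kwargs)
    except NotPredictionAlg:
        return "rank", alg.rank_single_user(*args, **kwargs)


print(predict_or_rank(RankOnlyAlgorithm()))  # ('rank', ['i2', 'i4', 'i3'])
```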
Source code in clayrs/recsys/content_based_algorithm/index_query/index_query.py
process_rated(user_idx, train_ratings, available_loaded_items)
Function that extracts features ONLY from the positively rated items of a user. The extracted features will be used to fit the algorithm (build the query).
Features extracted will be stored in private attributes of the class.
If the user has no rated items available locally, or has only negative items, an exception is raised.
PARAMETER | DESCRIPTION |
---|---|
user_idx | Mapped integer of the active user (the user for which the algorithm must be fitted) |
train_ratings | Ratings of the train set |
available_loaded_items | The LoadedContents interface which contains the loaded contents |
RAISES | DESCRIPTION |
---|---|
EmptyUserRatings | Exception raised when the user does not appear in the train set |
OnlyNegativeItems | Exception raised when there are only negative items available locally for the user (items that the user disliked) |
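The extraction step and its two documented failure modes can be sketched as below. This is a minimal illustration, not the ClayRS code: the exceptions are local stand-ins, the threshold is assumed to be already resolved (explicit or user mean), item_features stands in for the locally loaded contents, and as a simplification the sketch also raises OnlyNegativeItems when no rated item is loaded locally.

```python
class EmptyUserRatings(Exception):
    """Local stand-in: the user does not appear in the train set."""


class OnlyNegativeItems(Exception):
    """Local stand-in: no locally available rated item reaches the threshold."""


def extract_positive_features(
    user_ratings: dict[str, float],
    item_features: dict[str, list[str]],
    threshold: float,
) -> dict[str, tuple[list[str], float]]:
    """Keep the features and rating of each positive item that is available locally."""
    if not user_ratings:
        raise EmptyUserRatings("The user does not appear in the train set")
    positives = {
        item_id: (item_features[item_id], score)
        for item_id, score in user_ratings.items()
        # Items whose content is not loaded locally are skipped
        if item_id in item_features and score >= threshold
    }
    if not positives:
        raise OnlyNegativeItems("No positive item available locally for this user")
    return positives


features = {"i1": ["space", "alien"], "i2": ["love"], "i4": ["robot"]}
print(extract_positive_features({"i1": 5.0, "i2": 2.0, "i4": 4.0}, features, threshold=3.0))
# {'i1': (['space', 'alien'], 5.0), 'i4': (['robot'], 4.0)}
```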
Source code in clayrs/recsys/content_based_algorithm/index_query/index_query.py
rank_single_user(user_idx, train_ratings, available_loaded_items, recs_number, filter_list)
Rank the top-n recommended items for the active user, where the top-n items to rank are controlled by the recs_number and filter_list parameters:
- recs_number is the number of items to return; if recs_number is None, all ranked items will be returned
- filter_list is a list of items represented by their string ids. They must be strings and not their mapped integers, since items are serialized following their string representation!
The filter_list parameter is usually the result of the filter_single() method of a Methodology object.
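A minimal sketch of how recs_number and filter_list interact, assuming the items have already been scored and keyed by their string ids. rank_top_n is a hypothetical helper, and skipping ids that have no score is an assumption of the sketch.

```python
from typing import Optional


def rank_top_n(
    scores: dict[str, float],
    filter_list: list[str],
    recs_number: Optional[int] = None,
) -> list[tuple[str, float]]:
    """Rank only the items in filter_list (string ids) and keep the top recs_number of them."""
    # Assumption: ids in filter_list with no score (e.g. not in the index) are skipped
    candidates = [(item_id, scores[item_id]) for item_id in filter_list if item_id in scores]
    ranked = sorted(candidates, key=lambda pair: pair[1], reverse=True)
    # recs_number=None means "return every ranked item"
    return ranked if recs_number is None else ranked[:recs_number]


scores = {"i2": 2.98, "i3": 0.0, "i4": 1.29, "i5": 0.5}
print(rank_top_n(scores, filter_list=["i2", "i3", "i4"], recs_number=2))
# [('i2', 2.98), ('i4', 1.29)]
print(rank_top_n(scores, filter_list=["i2", "i3", "i4"]))
# [('i2', 2.98), ('i4', 1.29), ('i3', 0.0)]
```

Item i5 has a score but is never returned, because it is not in filter_list.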
PARAMETER | DESCRIPTION |
---|---|
user_idx | Mapped integer of the active user |
train_ratings | Ratings of the train set |
available_loaded_items | The LoadedContents interface which contains the loaded contents |
recs_number | Number of top ranked items to return; if None, all ranked items will be returned |
filter_list | List of the items to rank. Should contain string item ids |
RETURNS | DESCRIPTION |
---|---|
np.ndarray | uir matrix for a single user containing user and item idxs (integer representation), with the ranked score as third column, sorted in decreasing order of score |
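The shape of the returned uir matrix can be illustrated with NumPy. This is a minimal sketch assuming the item ids have already been mapped to integer idxs; to_uir is a hypothetical helper. Because a NumPy array has a single dtype, the sketch stores the user and item idxs as floats alongside the scores.

```python
import numpy as np


def to_uir(user_idx: int, item_scores: list[tuple[int, float]]) -> np.ndarray:
    """Build an (n, 3) uir matrix [user_idx, item_idx, score] sorted by decreasing score."""
    if not item_scores:
        return np.empty((0, 3))
    uir = np.array([[user_idx, item_idx, score] for item_idx, score in item_scores], dtype=float)
    # argsort on the negated scores gives decreasing order; "stable" keeps ties in input order
    order = np.argsort(-uir[:, 2], kind="stable")
    return uir[order]


uir = to_uir(7, [(3, 0.5), (1, 2.0), (2, 1.0)])
print(uir)  # rows: [7, 1, 2.0], [7, 2, 1.0], [7, 3, 0.5]
```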
Source code in clayrs/recsys/content_based_algorithm/index_query/index_query.py