Linear Predictor
LinearPredictor(item_field, regressor, only_greater_eq=None, embedding_combiner=Centroid())
Bases: PerUserCBAlgorithm
Class that implements recommendation through a specified linear predictor. It's a score prediction algorithm, so it can predict what rating a user would give to an unseen item. As such, it's also a ranking algorithm (it simply ranks unseen items in descending order by the predicted rating).
Examples:
- Interested in only a field representation, LinearRegression regressor from sklearn
>>> from clayrs import recsys as rs
>>> alg = rs.LinearPredictor({"Plot": 0}, rs.SkLinearRegression())
- Interested in only a field representation, Ridge regressor from sklearn with custom parameters
>>> alg = rs.LinearPredictor({"Plot": 0}, rs.SkRidge(alpha=0.8))
- Interested in multiple field representations of the items, Ridge regressor from sklearn with custom parameters, only_greater_eq \(= 2\) (every item with rating \(< 2\) will be discarded and not considered in the ranking/score prediction task)
>>> alg = rs.LinearPredictor(
...     item_field={"Plot": [0, "tfidf"],
...                 "Genre": [0, 1],
...                 "Director": "doc2vec"},
...     regressor=rs.SkRidge(alpha=0.8),
...     only_greater_eq=2)
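The effect of `only_greater_eq` can be sketched in plain Python. This is an illustrative sketch with made-up ratings, not the clayrs implementation; clayrs applies the same idea to the active user's train ratings before fitting:

```python
# Made-up item_id -> rating pairs for a single user
ratings = {"i1": 1.0, "i2": 2.0, "i3": 4.5}
threshold = 2  # plays the role of only_greater_eq

# Keep only items rated >= threshold; lower-rated items are discarded
kept = {item: score for item, score in ratings.items() if score >= threshold}
print(sorted(kept))  # i1 is dropped, i2 and i3 survive
```

With `only_greater_eq=None`, no filtering is applied and every rated item contributes to the fit.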
Info
After instantiating the LinearPredictor algorithm, pass it in the initialization of a CBRS and then use its methods to predict ratings or compute rankings for a single user or multiple users:
Examples:
>>> cbrs = rs.ContentBasedRS(algorithm=alg, ...)
>>> cbrs.fit_predict(...)
>>> cbrs.fit_rank(...)
>>> # ...
PARAMETER | DESCRIPTION |
---|---|
item_field | Dict where the key is the name of the field that contains the content to use and the value is the representation id(s) that will be used for said field. The value can be a string or a list; use a list if you want to use multiple representations for a particular field. |
regressor | Regressor that will be used. Can be any one of the regressors listed in the "Regressors Implemented" section below. |
only_greater_eq | Threshold for the ratings. Only items with a rating greater than or equal to the threshold will be considered; items with a lower rating will be discarded. If None, no item will be filtered out. |
embedding_combiner | Combining technique to use when a chosen field representation consists of multiple embeddings (default: Centroid). |
Source code in clayrs/recsys/content_based_algorithm/regressor/linear_predictor.py
fit_single_user()
Fit the regressor specified in the constructor with the features and labels (rating scores) extracted by the process_rated() method.
It uses private attributes to fit the regressor, so process_rated() must be called before this method.
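As a rough sketch of the underlying idea (not the actual clayrs implementation), fitting a linear predictor means solving a least-squares problem on the feature vectors and rating labels that process_rated() stored. The feature values and ratings below are made up for illustration:

```python
import numpy as np

# One feature row per rated item, one rating label per row (made-up data)
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([3.0, 1.0, 4.0])

# Ordinary least squares: the same objective a linear regressor minimizes
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# The fitted model can now score any item from its feature vector
pred = X @ coef
```

In clayrs this step is delegated to the sklearn regressor passed in the constructor (e.g. SkLinearRegression, SkRidge), which handles intercepts and regularization.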
Source code in clayrs/recsys/content_based_algorithm/regressor/linear_predictor.py
predict_single_user(user_idx, train_ratings, available_loaded_items, filter_list)
Predicts how much a user will like unrated items.
The filter_list parameter is usually the result of the filter_single() method of a Methodology object, and is a list of items represented by their string ids. They must necessarily be strings and not their mapped integers, since items are serialized following their string representation!
PARAMETER | DESCRIPTION |
---|---|
user_idx | Mapped integer of the active user |
train_ratings | Ratings of the train set |
available_loaded_items | The LoadedContents interface which contains loaded contents |
filter_list | List of the items to rank. Should contain string item ids |
RETURNS | DESCRIPTION |
---|---|
np.ndarray | uir matrix for a single user containing user and item idxs (integer representation) with the predicted score as third dimension |
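The returned uir matrix has one row per predicted item and three columns: user idx, item idx, score. A minimal numpy illustration with made-up indices and scores:

```python
import numpy as np

# Made-up mapped integers and predicted scores for one active user
user_idx = 7
item_idxs = np.array([12.0, 3.0, 45.0])
scores = np.array([4.2, 2.9, 3.7])

# Stack into the (n_items, 3) uir layout: user col, item col, score col
uir = np.column_stack([np.full_like(scores, user_idx), item_idxs, scores])
```

Every row of a single-user uir matrix shares the same value in the first column, which makes it cheap to concatenate the matrices of many users afterwards.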
Source code in clayrs/recsys/content_based_algorithm/regressor/linear_predictor.py
process_rated(user_idx, train_ratings, available_loaded_items)
Function that extracts features from rated items and labels them. The extracted features will later be used to fit the regressor.
Features and labels (in this case the rating scores) will be stored in private attributes of the class.
If there are no rated items available locally, an exception is raised.
PARAMETER | DESCRIPTION |
---|---|
user_idx | Mapped integer of the active user (the user for which we must fit the algorithm) |
train_ratings | Ratings of the train set |
available_loaded_items | The LoadedContents interface which contains loaded contents |
RAISES | DESCRIPTION |
---|---|
EmptyUserRatings | Exception raised when the user does not appear in the train set |
NoRatedItems | Exception raised when there isn't any item available locally rated by the user |
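Conceptually, process_rated() pairs each locally available rated item with its features and rating label, skipping items that were never serialized. This is a simplified pure-Python sketch with made-up data, not the clayrs implementation:

```python
# Made-up train ratings and locally serialized item features
train_ratings = {"i1": 4.0, "i2": 1.0, "i9": 5.0}
loaded_items = {"i1": [0.1, 0.9], "i2": [0.8, 0.2]}  # "i9" is not available

X, y = [], []
for item_id, rating in train_ratings.items():
    features = loaded_items.get(item_id)
    if features is not None:      # skip rated items missing locally
        X.append(features)
        y.append(rating)

if not X:
    # stand-in for the NoRatedItems exception clayrs raises
    raise ValueError("no item rated by the user is available locally")
```

Only "i1" and "i2" contribute a (features, label) pair here; "i9" is silently skipped, and the exception fires only when nothing at all survives the filtering.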
Source code in clayrs/recsys/content_based_algorithm/regressor/linear_predictor.py
rank_single_user(user_idx, train_ratings, available_loaded_items, recs_number, filter_list)
Rank the top-n recommended items for the active user, where the items to rank are controlled by the recs_number and filter_list parameters:
- the former is self-explanatory; the latter is a list of items represented by their string ids. They must necessarily be strings and not their mapped integers, since items are serialized following their string representation!
If recs_number is None, all ranked items will be returned.
The filter_list parameter is usually the result of the filter_single() method of a Methodology object.
PARAMETER | DESCRIPTION |
---|---|
user_idx | Mapped integer of the active user |
train_ratings | Ratings of the train set |
available_loaded_items | The LoadedContents interface which contains loaded contents |
recs_number | Number of the top ranked items to return; if None, all ranked items will be returned |
filter_list | List of the items to rank. Should contain string item ids |
RETURNS | DESCRIPTION |
---|---|
np.ndarray | uir matrix for a single user containing user and item idxs (integer representation) with the ranked score as third dimension, sorted in decreasing order |
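The ranking step boils down to sorting the single-user uir matrix by its score column in decreasing order and keeping the top recs_number rows. A sketch with made-up data (not the clayrs implementation):

```python
import numpy as np

# Made-up single-user uir matrix: user idx, item idx, predicted score
uir = np.array([[7.0, 12.0, 4.2],
                [7.0,  3.0, 2.9],
                [7.0, 45.0, 3.7]])
recs_number = 2

order = np.argsort(uir[:, 2])[::-1]   # score indices, highest first
top_n = uir[order][:recs_number]      # with recs_number=None this slice keeps all rows
```

Because `arr[:None]` returns the whole array, passing None for recs_number naturally yields every ranked item, matching the documented behaviour.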
Source code in clayrs/recsys/content_based_algorithm/regressor/linear_predictor.py
Regressors Implemented
The following are the regressors you can use in the regressor parameter of the LinearPredictor class.
SkARDRegression(*, n_iter=300, tol=0.001, alpha_1=1e-06, alpha_2=1e-06, lambda_1=1e-06, lambda_2=1e-06, compute_score=False, threshold_lambda=10000.0, fit_intercept=True, normalize='deprecated', copy_X=True, verbose=False)
Bases: Regressor
Class that implements the ARD regressor from sklearn. The parameters one could pass are the same ones you would pass instantiating the regressor ARD directly from sklearn.
Sklearn documentation: here
Source code in clayrs/recsys/content_based_algorithm/regressor/regressors.py
SkBayesianRidge(*, n_iter=300, tol=0.001, alpha_1=1e-06, alpha_2=1e-06, lambda_1=1e-06, lambda_2=1e-06, alpha_init=None, lambda_init=None, compute_score=False, fit_intercept=True, normalize='deprecated', copy_X=True, verbose=False)
Bases: Regressor
Class that implements the BayesianRidge regressor from sklearn. The parameters one could pass are the same ones you would pass instantiating the regressor BayesianRidge directly from sklearn.
Sklearn documentation: here
Source code in clayrs/recsys/content_based_algorithm/regressor/regressors.py
SkHuberRegressor(*, epsilon=1.35, max_iter=100, alpha=0.0001, warm_start=False, fit_intercept=True, tol=1e-05)
Bases: Regressor
Class that implements the Huber regressor from sklearn. The parameters one could pass are the same ones you would pass instantiating the regressor Huber directly from sklearn.
Sklearn documentation: here
Source code in clayrs/recsys/content_based_algorithm/regressor/regressors.py
SkLinearRegression(*, fit_intercept=True, normalize='deprecated', copy_X=True, n_jobs=None, positive=False)
Bases: Regressor
Class that implements the LinearRegression regressor from sklearn. The parameters one could pass are the same ones you would pass instantiating the regressor LinearRegression directly from sklearn.
Sklearn documentation: here
Source code in clayrs/recsys/content_based_algorithm/regressor/regressors.py
SkPassiveAggressiveRegressor(*, C=1.0, fit_intercept=True, max_iter=1000, tol=0.001, early_stopping=False, validation_fraction=0.1, n_iter_no_change=5, shuffle=True, verbose=0, loss='epsilon_insensitive', epsilon=DEFAULT_EPSILON, random_state=None, warm_start=False, average=False)
Bases: Regressor
Class that implements the PassiveAggressive regressor from sklearn. The parameters one could pass are the same ones you would pass instantiating the regressor PassiveAggressive directly from sklearn.
Sklearn documentation: here
Source code in clayrs/recsys/content_based_algorithm/regressor/regressors.py
SkRidge(alpha=1.0, *, fit_intercept=True, normalize='deprecated', copy_X=True, max_iter=None, tol=0.001, solver='auto', positive=False, random_state=None)
Bases: Regressor
Class that implements the Ridge regressor from sklearn. The parameters one could pass are the same ones you would pass instantiating the regressor Ridge directly from sklearn.
Sklearn documentation: here
Source code in clayrs/recsys/content_based_algorithm/regressor/regressors.py
SkSGDRegressor(loss='squared_error', *, penalty='l2', alpha=0.0001, l1_ratio=0.15, fit_intercept=True, max_iter=1000, tol=0.001, shuffle=True, verbose=0, epsilon=DEFAULT_EPSILON, random_state=None, learning_rate='invscaling', eta0=0.01, power_t=0.25, early_stopping=False, validation_fraction=0.1, n_iter_no_change=5, warm_start=False, average=False)
Bases: Regressor
Class that implements the SGD regressor from sklearn. The parameters one could pass are the same ones you would pass instantiating the regressor SGD directly from sklearn.
Sklearn documentation: here
Source code in clayrs/recsys/content_based_algorithm/regressor/regressors.py