Experiment class
ContentBasedExperiment(original_ratings, partitioning_technique, algorithm_list, items_directory, users_directory=None, metric_list=None, report=False, output_folder='experiment_result', overwrite_if_exists=False)
Bases: Experiment
The Experiment class for content-based algorithms.
It provides an easy interface to perform a complete experiment comparing different content-based algorithms, from splitting the dataset to evaluating the computed predictions.
It is also capable of producing a yml report for both the recsys phase and the evaluation phase. Both the evaluation phase and the report are optional and are produced only if specified.
All the results (split, ranking, evaluation results, etc.) will be saved in the folder specified with the output_folder parameter. A different sub-folder will be created for each algorithm and named after it:
- If multiple instances of the same algorithm are present in the algorithm_list, sub-folders will be disambiguated by order of execution (algName_1, algName_2, algName_3, etc.)
Info
Please remember that, by default, if a folder with the same name as the output_folder parameter already exists, the experiment won't run and an exception will be raised. To overcome this, simply set the overwrite_if_exists parameter to True or change the output_folder.
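A minimal sketch of this behaviour; exp_args is a hypothetical placeholder for whichever other constructor arguments you are using:

```python
# Hypothetical sketch: "my_experiment" already exists from a previous run.
# With overwrite_if_exists=True the old folder is replaced instead of raising.
exp = ContentBasedExperiment(
    **exp_args,                      # placeholder for the remaining constructor arguments
    output_folder="my_experiment",
    overwrite_if_exists=True,
)
```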
Examples:
Suppose you want to compare:
- A CentroidVector algorithm
- The SVC classifier
- The KNN classifier
For the three different configurations, a HoldOut partitioning technique should be used and results should be evaluated on \(Precision\) and \(Recall\):
from clayrs.utils import ContentBasedExperiment
from clayrs import content_analyzer as ca
from clayrs import recsys as rs
from clayrs import evaluation as eva

original_rat = ca.Ratings(ca.CSVFile(ratings_path))

alg1 = rs.CentroidVector({'Plot': 'tfidf'},
                         similarity=rs.CosineSimilarity())  # (1)

alg2 = rs.ClassifierRecommender({'Plot': 'tfidf'},
                                classifier=rs.SkSVC())  # (2)

alg3 = rs.ClassifierRecommender({'Plot': 'tfidf'},
                                classifier=rs.SkKNN())  # (3)

a = ContentBasedExperiment(
    original_ratings=original_rat,
    partitioning_technique=rs.HoldOutPartitioning(),
    algorithm_list=[alg1, alg2, alg3],
    items_directory=movies_dir,
    metric_list=[eva.Precision(), eva.Recall()],
    output_folder="my_experiment"
)

a.rank()
1. Results will be saved in my_experiment/CentroidVector_1
2. Results will be saved in my_experiment/ClassifierRecommender_1
3. Results will be saved in my_experiment/ClassifierRecommender_2
PARAMETER | TYPE | DESCRIPTION
---|---|---
original_ratings | Ratings | Ratings object containing the original interactions between users and items
partitioning_technique | Partitioning | Partitioning object which specifies how the original ratings should be split
algorithm_list | list of ContentBasedAlgorithm | List of content-based algorithms for which the whole experiment will be conducted
items_directory | str | Path to the folder containing serialized complexly represented items
users_directory | str, DEFAULT None | Path to the folder containing serialized complexly represented users. Needed only if one or more algorithms in the algorithm_list require it
metric_list | list of Metric, DEFAULT None | List of metrics with which the predictions computed by the CBRS will be evaluated
report | bool, DEFAULT False | If True, a yml report will be produced for both the recsys phase and the evaluation phase (the latter only if an evaluation is performed)
output_folder | str, DEFAULT 'experiment_result' | Path of the folder where all the results of the experiment will be saved
overwrite_if_exists | bool, DEFAULT False | If True and a folder with the same name as output_folder already exists, it will be overwritten and the experiment will run
Source code in clayrs/recsys/experiment.py, lines 312-323
predict(user_id_list=None, methodology=TestRatingsMethodology(), num_cpus=1, skip_alg_error=True)
Method used to perform an experiment which involves score predictions.
The method will first split the original ratings passed in the constructor into a train set and a test set, then the Recommender System will be fit for each user in the train set.
If the algorithm can't be fit for some users, a warning message is printed and no score predictions will be computed for said users.
Info
BE CAREFUL: not all algorithms are able to perform score prediction. If a pure ranking algorithm is asked to perform score prediction, the NotPredictionAlg exception will be raised. If skip_alg_error is set to True, said exception will be caught, a warning will be printed, and the experiment will go on with the next algorithm.
Via the methodology parameter you can perform different candidate item selection. By default, TestRatingsMethodology() is used: for each user, only items in its test set will be considered for score prediction.
PARAMETER | TYPE | DESCRIPTION
---|---|---
user_id_list | list, DEFAULT None | List of users for which you want to compute score predictions. If None, score predictions will be computed for all users of the test set
methodology | Methodology, DEFAULT TestRatingsMethodology() | Methodology object which governs the candidate item selection
num_cpus | int, DEFAULT 1 | Number of processors reserved for the method. If set to 0, the number of cpus will be automatically detected
skip_alg_error | bool, DEFAULT True | If set to True, the NotPredictionAlg exception raised by a pure ranking algorithm will be caught, a warning will be printed, and the experiment will go on with the next algorithm; otherwise the exception will be raised
RAISES | DESCRIPTION
---|---
NotPredictionAlg | When a pure ranking algorithm is asked to perform score prediction and skip_alg_error is set to False
Source code in clayrs/recsys/experiment.py, lines 325-380
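A minimal usage sketch for predict, reusing the objects from the constructor example above; AllItemsMethodology is assumed to be available in the recsys module, and the chosen algorithms are assumed to support score prediction:

```python
# Hypothetical sketch: score predictions over all items instead of test-set items only.
a.predict(
    methodology=rs.AllItemsMethodology(),  # assumed candidate item selection strategy
    num_cpus=2,                            # reserve two processors for the computation
    skip_alg_error=True,                   # skip pure ranking algorithms with a warning
)
```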
rank(n_recs=10, user_id_list=None, methodology=TestRatingsMethodology(), num_cpus=1)
Method used to perform an experiment which involves rankings.
The method will first split the original ratings passed in the constructor into a train set and a test set, then the Recommender System will be fit for each user in the train set. If the algorithm can't be fit for some users, a warning message is printed and no ranking will be computed for said users.
Via the methodology parameter you can perform different candidate item selection. By default, TestRatingsMethodology() is used: for each user, only items in its test set will be considered eligible for ranking.
PARAMETER | TYPE | DESCRIPTION
---|---|---
n_recs | int, DEFAULT 10 | Number of the top items that will be present in the ranking of each user. If None, all candidate items will be returned for each user
user_id_list | list, DEFAULT None | List of users for which you want to compute the ranking. If None, the ranking will be computed for all users of the test set
methodology | Methodology, DEFAULT TestRatingsMethodology() | Methodology object which governs the candidate item selection
num_cpus | int, DEFAULT 1 | Number of processors reserved for the method. If set to 0, the number of cpus will be automatically detected
Source code in clayrs/recsys/experiment.py, lines 382-426
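And a corresponding sketch for rank, again reusing the example objects (the user ids are hypothetical):

```python
# Hypothetical sketch: top-5 rankings restricted to two users.
a.rank(
    n_recs=5,
    user_id_list=['u1', 'u2'],  # hypothetical user ids
)
```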
GraphBasedExperiment(original_ratings, partitioning_technique, algorithm_list, items_directory=None, item_exo_properties=None, users_directory=None, user_exo_properties=None, link_label=None, metric_list=None, report=False, output_folder='experiment_result', overwrite_if_exists=False)
Bases: Experiment
The Experiment class for graph-based algorithms.
It provides an easy interface to perform a complete experiment comparing different graph-based algorithms, from splitting the dataset to evaluating the computed predictions.
Every graph based algorithm expects a graph: that's why, right before computing rankings/score predictions, a graph is built from the current train and test split (one graph per split, if multiple are available):
- All the nodes from the original ratings will be present, but the interactions in the test set will be missing (they won't be represented as links between user and item nodes)
The class is also capable of producing a yml report for both the recsys phase and the evaluation phase. Both the evaluation phase and the report are optional and are produced only if specified.
All the results (split, ranking, evaluation results, etc.) will be saved in the folder specified with the output_folder parameter. A different sub-folder will be created for each algorithm and named after it:
- If multiple instances of the same algorithm are present in the algorithm_list, sub-folders will be disambiguated by order of execution (algName_1, algName_2, algName_3, etc.)
Info
Please remember that, by default, if a folder with the same name as the output_folder parameter already exists, the experiment won't run and an exception will be raised. To overcome this, simply set the overwrite_if_exists parameter to True or change the output_folder.
Examples:
Suppose you want to compare:
- The PageRank algorithm with alpha=0.8
- The PageRank algorithm with alpha=0.9
- The Personalized PageRank algorithm
For the three different configurations, a KFold partitioning technique with three splits should be used and results should be evaluated on \(Precision\), \(Recall\) and \(NDCG\):
from clayrs.utils import GraphBasedExperiment
from clayrs import content_analyzer as ca
from clayrs import recsys as rs
from clayrs import evaluation as eva

original_rat = ca.Ratings(ca.CSVFile(ratings_path))

alg1 = rs.NXPageRank(alpha=0.8)  # (1)
alg2 = rs.NXPageRank(alpha=0.9)  # (2)
alg3 = rs.NXPageRank(personalized=True)  # (3)

a = GraphBasedExperiment(
    original_ratings=original_rat,
    partitioning_technique=rs.KFoldPartitioning(n_splits=3),
    algorithm_list=[alg1, alg2, alg3],
    items_directory=movies_dir,
    metric_list=[eva.Precision(), eva.Recall(), eva.NDCG()],
    output_folder="my_experiment"
)

a.rank()
1. Results will be saved in my_experiment/NXPageRank_1
2. Results will be saved in my_experiment/NXPageRank_2
3. Results will be saved in my_experiment/NXPageRank_3
PARAMETER | TYPE | DESCRIPTION
---|---|---
original_ratings | Ratings | Ratings object containing the original interactions between users and items
partitioning_technique | Partitioning | Partitioning object which specifies how the original ratings should be split
algorithm_list | list of GraphBasedAlgorithm | List of graph-based algorithms for which the whole experiment will be conducted
items_directory | str, DEFAULT None | Path to the folder containing serialized complexly represented items with one or more exogenous properties to load
item_exo_properties | set or dict, DEFAULT None | Set or Dict which contains representations to load from items. Use a Set to load all properties of the chosen representations, or a Dict to also choose which properties to load from each representation
users_directory | str, DEFAULT None | Path to the folder containing serialized complexly represented users with one or more exogenous properties to load
user_exo_properties | set or dict, DEFAULT None | Set or Dict which contains representations to load from users. Use a Set to load all properties of the chosen representations, or a Dict to also choose which properties to load from each representation
link_label | str, DEFAULT None | If specified, each link between user and item nodes will be labeled with the given label
metric_list | list of Metric, DEFAULT None | List of metrics with which the predictions computed by the GBRS will be evaluated
report | bool, DEFAULT False | If True, a yml report will be produced for both the recsys phase and the evaluation phase (the latter only if an evaluation is performed)
output_folder | str, DEFAULT 'experiment_result' | Path of the folder where all the results of the experiment will be saved
overwrite_if_exists | bool, DEFAULT False | If True and a folder with the same name as output_folder already exists, it will be overwritten and the experiment will run
Source code in clayrs/recsys/experiment.py, lines 537-555
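To illustrate the exogenous-property parameters, here is a hedged constructor sketch; the representation name 'dbpedia' and the property 'starring' are illustrative assumptions, not values taken from this documentation:

```python
# Hypothetical sketch: enriching the graph with exogenous properties.
exp = GraphBasedExperiment(
    original_ratings=original_rat,
    partitioning_technique=rs.KFoldPartitioning(n_splits=3),
    algorithm_list=[rs.NXPageRank()],
    items_directory=movies_dir,
    item_exo_properties={'dbpedia'},                  # Set: load every property of the 'dbpedia' representation
    # item_exo_properties={'dbpedia': ['starring']},  # Dict: load only the chosen properties
    link_label='score',                               # label each user-item link with 'score'
)
```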
predict(user_id_list=None, methodology=TestRatingsMethodology(), num_cpus=1, skip_alg_error=True)
Method used to perform an experiment which involves score predictions.
The method will first split the original ratings passed in the constructor into a train set and a test set, then a graph will be built from them:
- All nodes of the original ratings will be present, but the links (interactions) that appear in the test set will be missing, so as to make the training phase fair
Info
BE CAREFUL: not all algorithms are able to perform score prediction. If a pure ranking algorithm is asked to perform score prediction, the NotPredictionAlg exception will be raised. If skip_alg_error is set to True, said exception will be caught, a warning will be printed, and the experiment will go on with the next algorithm.
Via the methodology parameter you can perform different candidate item selection. By default, TestRatingsMethodology() is used: for each user, only items in its test set will be considered for score prediction.
PARAMETER | TYPE | DESCRIPTION
---|---|---
user_id_list | list, DEFAULT None | List of users for which you want to compute score predictions. If None, score predictions will be computed for all users of the test set
methodology | Methodology, DEFAULT TestRatingsMethodology() | Methodology object which governs the candidate item selection
num_cpus | int, DEFAULT 1 | Number of processors reserved for the method. If set to 0, the number of cpus will be automatically detected
skip_alg_error | bool, DEFAULT True | If set to True, the NotPredictionAlg exception raised by a pure ranking algorithm will be caught, a warning will be printed, and the experiment will go on with the next algorithm; otherwise the exception will be raised
RAISES | DESCRIPTION
---|---
NotPredictionAlg | When a pure ranking algorithm is asked to perform score prediction and skip_alg_error is set to False
Source code in clayrs/recsys/experiment.py, lines 557-622
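Since NXPageRank is a pure ranking algorithm, predict is a natural place to see skip_alg_error in action. A sketch, assuming a is the GraphBasedExperiment from the example above:

```python
# Hypothetical sketch: PageRank cannot compute score predictions.
# With skip_alg_error=True (the default), the NotPredictionAlg exception is caught,
# a warning is printed, and the experiment moves on to the next algorithm.
a.predict(skip_alg_error=True)

# With skip_alg_error=False the exception would propagate instead:
# a.predict(skip_alg_error=False)
```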
rank(n_recs=10, user_id_list=None, methodology=TestRatingsMethodology(), num_cpus=1)
Method used to perform an experiment which involves rankings.
The method will first split the original ratings passed in the constructor into a train set and a test set, then a graph will be built from them:
- All nodes of the original ratings will be present, but the links (interactions) that appear in the test set will be missing, so as to make the training phase fair
Via the methodology parameter you can perform different candidate item selection. By default, TestRatingsMethodology() is used: for each user, only items in its test set will be eligible for ranking.
PARAMETER | TYPE | DESCRIPTION
---|---|---
n_recs | int, DEFAULT 10 | Number of the top items that will be present in the ranking of each user. If None, all candidate items will be returned for each user
user_id_list | list, DEFAULT None | List of users for which you want to compute the ranking. If None, the ranking will be computed for all users of the test set
methodology | Methodology, DEFAULT TestRatingsMethodology() | Methodology object which governs the candidate item selection
num_cpus | int, DEFAULT 1 | Number of processors reserved for the method. If set to 0, the number of cpus will be automatically detected
Source code in clayrs/recsys/experiment.py, lines 624-678
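Finally, a closing sketch of the ranking call under the same assumptions:

```python
# Hypothetical sketch: top-10 rankings, computed once per KFold split;
# results end up under my_experiment/NXPageRank_1, _2 and _3.
a.rank(n_recs=10, num_cpus=2)
```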