Graph Based RecSys
GraphBasedRS(algorithm, graph)
Bases: RecSys
Class for recommender systems which use a graph in order to make predictions.
Each GBRS differs from the others based on the algorithm used.
Examples:
In case you perform a splitting of the dataset which returns a single train and test set (e.g. HoldOut technique):
from clayrs import recsys as rs
from clayrs import content_analyzer as ca
original_rat = ca.Ratings(ca.CSVFile(ratings_path))
[train], [test] = rs.HoldOutPartitioning().split_all(original_rat)
alg = rs.NXPageRank() # any gb algorithm
graph = rs.NXBipartiteGraph(original_rat)
# remove the interactions of the test set from the graph
for user, item in zip(test.user_id_column, test.item_id_column):
    user_node = rs.UserNode(user)
    item_node = rs.ItemNode(item)
    graph.remove_link(user_node, item_node)
gbrs = rs.GraphBasedRS(alg, graph)
rank = gbrs.rank(test, n_recs=10)
In case you perform a splitting of the dataset which returns multiple train and test sets (e.g. KFold technique):
from clayrs import recsys as rs
from clayrs import content_analyzer as ca
original_rat = ca.Ratings(ca.CSVFile(ratings_path))
train_list, test_list = rs.KFoldPartitioning(n_splits=5).split_all(original_rat)
alg = rs.NXPageRank() # any gb algorithm
result_list = []
for train_set, test_set in zip(train_list, test_list):
    graph = rs.NXBipartiteGraph(original_rat)
    # remove the interactions of the test set from the graph
    for user, item in zip(test_set.user_id_column, test_set.item_id_column):
        user_node = rs.UserNode(user)
        item_node = rs.ItemNode(item)
        graph.remove_link(user_node, item_node)
    gbrs = rs.GraphBasedRS(alg, graph)
    rank_to_append = gbrs.rank(test_set)
    result_list.append(rank_to_append)
result_list will contain recommendation lists for each split
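The reason for removing the test interactions before ranking can be illustrated without the library. The following is a minimal sketch in plain Python, not ClayRS's implementation: it assumes an undirected user-item graph stored as adjacency sets and a simple personalized PageRank computed by power iteration. All function names (build_bipartite_graph, remove_link, personalized_pagerank, rank_unseen_items) and the toy data are hypothetical.

```python
from collections import defaultdict


def build_bipartite_graph(interactions):
    """Build an undirected user-item graph as an adjacency map of sets."""
    graph = defaultdict(set)
    for user, item in interactions:
        user_node, item_node = ("user", user), ("item", item)
        graph[user_node].add(item_node)
        graph[item_node].add(user_node)
    return graph


def remove_link(graph, user, item):
    """Drop the edge between a user and an item, keeping both nodes."""
    graph[("user", user)].discard(("item", item))
    graph[("item", item)].discard(("user", user))


def personalized_pagerank(graph, source, alpha=0.85, iterations=50):
    """Power iteration where every teleport jumps back to `source`."""
    scores = {node: 0.0 for node in graph}
    scores[source] = 1.0
    for _ in range(iterations):
        updated = {node: 0.0 for node in graph}
        for node, score in scores.items():
            neighbours = graph[node]
            if neighbours:
                # spread the damped score evenly over the neighbours
                share = alpha * score / len(neighbours)
                for neighbour in neighbours:
                    updated[neighbour] += share
            else:
                # isolated node: hand its damped score back to the source
                updated[source] += alpha * score
        # teleport mass (total score is always 1)
        updated[source] += 1 - alpha
        scores = updated
    return scores


def rank_unseen_items(graph, user):
    """Rank the items not linked to `user`, best score first."""
    source = ("user", user)
    scores = personalized_pagerank(graph, source)
    unseen = [
        (node[1], score)
        for node, score in scores.items()
        if node[0] == "item" and node not in graph[source]
    ]
    return sorted(unseen, key=lambda pair: pair[1], reverse=True)


# toy data (hypothetical): all known interactions and a one-row test set
ratings = [("u1", "i1"), ("u1", "i2"), ("u2", "i2"),
           ("u2", "i3"), ("u3", "i1"), ("u3", "i3")]
test = [("u1", "i2")]

graph = build_bipartite_graph(ratings)
for user, item in test:
    remove_link(graph, user, item)  # hide the ground truth from the graph

ranking = rank_unseen_items(graph, "u1")
print([item for item, _ in ranking])
```

If the test link were left in the graph, the held-out item would already be a direct neighbour of the user, so it would be treated as known rather than as a candidate to recommend.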
PARAMETER | DESCRIPTION |
---|---|
algorithm | The graph based algorithm that will be used in order to rank or make score predictions. TYPE: GraphBasedAlgorithm |
graph | A graph which models interactions of users and items. TYPE: FullDiGraph |
Source code in clayrs/recsys/recsys.py
algorithm: GraphBasedAlgorithm
property
The graph based algorithm chosen
graph: FullDiGraph
property
The graph containing interactions
users: Set[UserNode]
property
Set of UserNode objects for each user of the graph
predict(test_set, user_list=None, methodology=TestRatingsMethodology(), num_cpus=1)
Method used to calculate score predictions for all users in the test set or for all users in the user_list parameter.
The user_list parameter may contain users identified either by their string id or by their mapped integer.
BE CAREFUL: not all algorithms are able to perform score prediction.
Via the methodology parameter you can perform different candidate item selection. By default, TestRatingsMethodology() is used, so for each user only the items in their test set will be considered for score prediction.
If the algorithm cannot perform score prediction for some users, they are skipped and a warning message is printed showing the number of users for which the algorithm could not produce a score prediction.
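The behaviour described above (candidate items restricted to each user's test set, users skipped with a single warning) can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: predict_for_users, toy_score_fn and the dictionary-based test set are hypothetical stand-ins for the real Ratings and algorithm objects.

```python
import warnings


def predict_for_users(test_set, score_fn, user_list=None):
    """Score only the test-set items of each user; skip users with no result.

    test_set maps each user to the items in their ground truth.
    score_fn(user, candidates) returns {item: score}, or None when the
    algorithm cannot produce predictions for that user.
    """
    users = list(test_set) if user_list is None else list(user_list)
    predictions = {}
    skipped = 0
    for user in users:
        candidates = test_set.get(user, [])
        scores = score_fn(user, candidates)
        if not scores:
            skipped += 1
            continue
        predictions[user] = scores
    if skipped:
        warnings.warn(f"No score prediction available for {skipped} user(s)")
    return predictions


def toy_score_fn(user, candidates):
    """Hypothetical algorithm that only knows how to score user u1."""
    known = {"u1": {"i1": 4.0, "i2": 2.5}}
    user_scores = known.get(user)
    if user_scores is None:
        return None
    return {item: user_scores[item] for item in candidates if item in user_scores}


toy_test_set = {"u1": ["i1", "i2"], "u2": ["i3"]}

# u2 is skipped and reported in a single warning
predictions = predict_for_users(toy_test_set, toy_score_fn)
print(predictions)
```

Counting the skipped users and warning once keeps the output readable when many users cannot be scored.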
PARAMETER | DESCRIPTION |
---|---|
test_set | Ratings object which represents the ground truth of the split considered. TYPE: Ratings |
user_list | List of users for which you want to compute score predictions. If None, score predictions will be computed for all users of the test set. |
methodology | Methodology used to select the candidate items for each user (TestRatingsMethodology() by default). |
num_cpus | Number of processors that must be reserved for the method. If set to |
RETURNS | DESCRIPTION |
---|---|
Prediction | Prediction object containing score prediction lists for all users of the test set or for all users in user_list |
Source code in clayrs/recsys/recsys.py
rank(test_set, n_recs=10, user_list=None, methodology=TestRatingsMethodology(), num_cpus=1)
Method used to calculate the ranking for all users in the test set or for all users in the user_list parameter.
The user_list parameter may contain users identified either by their string id or by their mapped integer.
If n_recs is specified, the rank will contain the top-n items for each user. Otherwise, the rank will contain all unrated items of each user. By default, the top-10 ranking is computed for each user.
Via the methodology parameter you can perform different candidate item selection. By default, TestRatingsMethodology() is used, so for each user only the items in their test set will be ranked.
If the algorithm cannot produce a ranking for some users, they are skipped and a warning message is printed showing the number of users for which the algorithm could not produce a ranking.
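The effect of n_recs can be sketched as a simple cut on a scored candidate list. The helper below is a hypothetical illustration in plain Python, not the library's code: top_n and the score dictionary are assumptions, and a value of None is taken to mean "keep every candidate", following the description above.

```python
import heapq


def top_n(scored_items, n_recs=10):
    """Return (item, score) pairs sorted by descending score.

    scored_items maps each candidate item to its score.
    With n_recs=None the full ranking is returned.
    """
    pairs = scored_items.items()
    if n_recs is None:
        return sorted(pairs, key=lambda pair: pair[1], reverse=True)
    # nlargest avoids sorting the full list when only a few items are needed
    return heapq.nlargest(n_recs, pairs, key=lambda pair: pair[1])


scores = {"i1": 0.2, "i2": 0.9, "i3": 0.5}

print(top_n(scores, n_recs=2))     # the two best items
print(top_n(scores, n_recs=None))  # the complete ranking
```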
PARAMETER | DESCRIPTION |
---|---|
test_set | Ratings object which represents the ground truth of the split considered. TYPE: Ratings |
n_recs | Number of the top items that will be present in the ranking of each user. If |
user_list | List of users for which you want to compute the ranking. If None, the ranking will be computed for all users of the test set. |
methodology | Methodology used to select the candidate items for each user (TestRatingsMethodology() by default). |
num_cpus | Number of processors that must be reserved for the method. If set to |
RETURNS | DESCRIPTION |
---|---|
Rank | Rank object containing recommendation lists for all users of the test set or for all users in user_list |
Source code in clayrs/recsys/recsys.py