Page Rank
NXPageRank(alpha=0.85, personalized=False, max_iter=100, tol=1e-06, nstart=None, weight=True, relevance_threshold=None, rel_items_weight=0.8, rel_items_prop_weight=None, default_nodes_weight=0.2)
Bases: PageRank
Page Rank algorithm based on the networkx implementation. Please note that it can only be used for instantiated NXGraphs
The PageRank can be personalized, in this case the PageRank will be calculated with a personalization vector which depends by the specific user.
Personalized PageRank
In the PageRank, the random surfer has probability \(\alpha\) of following one of the out links of the node it's in, and probability \(1-\alpha\) of visiting another node which is not necessarily linked with an out link to the node it's in.
- In the classical Page Rank, all nodes have uniform probability of being picked by the random surfer when not following the out links
- In the personalized Page Rank, we assign a different probability to certain nodes depending on heuristics when not following the out links
In the recommendation task, the idea is to assign a higher probability to item nodes which are relevant to the user, so that the Page Rank algorithm assigns higher score to item nodes close to relevant items for the user
Several weighting schemas can be applied in order to customize the personalization vector of a user:
- 80/20: 80% prob. that the random surfer ends up in a relevant item node, 20% that it will end up in any other node (default)
- 60/20/20: 60% prob. that the random surfer ends up in a relevant item node, 20% that it will end up in a property node linked to a relevant item, 20% that it will end up in any other node
- 40/40/20: 40% prob. that the random surfer ends up in a relevant item node, 40% that it will end up in a property node linked to a relevant item, 20% that it will end up in any other node
- ...
It's important to note that the weight assigned are then normalized across each node category: this means that if 80% prob. is assigned to relevant items, then this probability is shared among all relevant items (i.e. 80% divided by the total number of relevant items for the user)
PARAMETER | DESCRIPTION |
---|---|
alpha |
Damping parameter for PageRank, default=0.85.
TYPE:
|
personalized |
Boolean value that specifies if the page rank must be calculated considering the user profile as personalization vector. Default is False
TYPE:
|
max_iter |
Maximum number of iterations in power method eigenvalue solver.
TYPE:
|
tol |
Error tolerance used to check convergence in power method solver.
TYPE:
|
nstart |
Starting value of PageRank iteration for each node.
TYPE:
|
weight |
Boolean value which tells the algorithm if weight of the edges must be considered or not. Default is True
TYPE:
|
relevance_threshold |
Threshold which separates relevant and non-relevant items for a user. Can be set globally, but if None the relevance threshold for each user is computed considering the mean rating given by the user in the train set
TYPE:
|
rel_items_weight |
Probability that the random surfer will end up in a relevant item node when not following the
out links of a node. This probability will be normalized and divided by the total number of relevant items
for the user. If None, the |
rel_items_prop_weight |
Probability that the random surfer will end up in a property node linked to a relevant
item when not following the out links of a node. This probability will be normalized and divided by the
total number of property nodes linked to relevant items. If None, the |
default_nodes_weight |
Probability that the random surfer will end up in a node which is not a relevant item
or a property linked to a relevant item when not following the out links of a node.
If |
Source code in clayrs/recsys/graph_based_algorithm/page_rank/nx_page_rank.py
80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 |
|
rank(graph, train_set, test_set, user_id_list, recs_number, methodology, num_cpus)
Rank the top-n recommended items for the user. If the recs_number
parameter is set to None,
All unrated items for the user will be ranked among all those selected by the methodology
parameter.
The train set contains basically the interactions modelled in the graph, and it is needed by the methodology object
PARAMETER | DESCRIPTION |
---|---|
graph |
A graph which models interactions of users and items
TYPE:
|
train_set |
a Ratings object containing interactions between users and items
TYPE:
|
recs_number |
number of the top ranked items to return, if None all ranked items will be returned |
test_set |
Ratings object which represents the ground truth of the split considered
TYPE:
|
user_id_list |
List of users for which you want to compute ranking. The list should contain user id as strings and NOT user ids mapped to their integers |
recs_number |
number of the top ranked items to return, if None all ranked items will be returned |
methodology |
TYPE:
|
num_cpus |
number of processors that must be reserved for the method. If set to
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
List[np.ndarray]
|
List of uir matrices for each user, where each uir contains predicted interactions between users and unseen items sorted in a descending way w.r.t. the third dimension which is the ranked score |
Source code in clayrs/recsys/graph_based_algorithm/page_rank/nx_page_rank.py
107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 |
|