Feature Selection
Via the feature_selecter
function you are able to perform feature selection on a given graph, by keeping properties
that are the most important according to a given feature selection algorithm. Check the documentation of the
method for more and for a usage example
feature_selector(graph, fs_algorithm_user=None, fs_algorithm_item=None, user_target_nodes=None, item_target_nodes=None, inplace=False)
Given a FullGraph, this method performs feature selection on it and returns the "reduced" graph.
You can choose to reduce only user properties (evaluate the fs_algorithm_user
parameter),
to reduce only item properties (evaluate the fs_algorithm_item
parameter) or both (evaluate
the fs_algorithm_user
parameter and the fs_algorithm_item
parameter).
You can also choose different feature selection algorithms* for users and items.
You can also define a custom list of user and item nodes:
- In this case only properties of those nodes will be considered during the feature selection process (instead of using properties of all users and items)
This function changes a copy of the original graph by default, but you can change this behaviour with the
inplace
parameter.
Examples:
# create a full graph
full_graph = rs.NXFullGraph(ratings,
user_contents_dir='users_codified/', # (1)
item_contents_dir='movies_codified/', # (2)
user_exo_properties={0}, # (3)
item_exo_properties={'dbpedia'}, # (4)
link_label='score')
# perform feature selection by keeping only top 5 property labels
# according to page rank algorithm
fs_graph = rs.feature_selector(full_graph,
fs_algorithm_item=rs.TopKPageRank(k=5))
PARAMETER | DESCRIPTION |
---|---|
graph |
Original graph on which feature selection will be performed
TYPE:
|
fs_algorithm_user |
FeatureSelectionAlgorithm that will be performed on user properties. Can be different from
TYPE:
|
fs_algorithm_item |
FeatureSelectionAlgorithm that will be performed on item properties. Can be different from
TYPE:
|
user_target_nodes |
List of user nodes to consider in the feature selection process: only properties of user nodes in this list will be "reduced"
TYPE:
|
item_target_nodes |
List of item nodes to consider in the feature selection process: only properties of item nodes in this list will be "reduced"
TYPE:
|
inplace |
Boolean parameter that let you choose if changes must be performed on the original graph
(
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
FullDiGraph
|
Copy of the original graph from which the less important Property nodes (the ones having edges with less important property labels) will be removed |
Source code in clayrs/recsys/graphs/feature_selection/feature_selection_fn.py
12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 |
|
Feature Selection algorithms
The following are the feature selection algorithms you can use in the fs_algorithms_user
and/or in the fs_algorithm_item
TopKPageRank(k=10, alpha=0.85, personalization=None, max_iter=100, tol=1e-06, nstart=None, weight=True, dangling=None)
Bases: TopKFeatureSelection
Computes the PageRank as FeatureSelection algorithm. Property labels of the original graph will be scored with their page rank score and only the top-k labels will be kept in the feature selected graph, while discarding the others
PARAMETER | DESCRIPTION |
---|---|
k |
Top-k property labels to keep in the feature selected graph
TYPE:
|
alpha |
Damping parameter for PageRank, default=0.85.
TYPE:
|
personalization |
The "personalization vector" consisting of a dictionary with a key some subset of graph nodes and personalization value each of those. At least one personalization value must be non-zero. If not specfiied, a nodes personalization value will be zero. By default, a uniform distribution is used.
TYPE:
|
max_iter |
Maximum number of iterations in power method eigenvalue solver.
TYPE:
|
tol |
Error tolerance used to check convergence in power method solver.
TYPE:
|
nstart |
Starting value of PageRank iteration for each node.
TYPE:
|
weight |
Edge data key to use as weight. If None weights are set to 1.
TYPE:
|
dangling |
The outedges to be assigned to any "dangling" nodes, i.e., nodes without any outedges. The dict key is the node the outedge points to and the dict value is the weight of that outedge. By default, dangling nodes are given outedges according to the personalization vector (uniform if not specified). This must be selected to result in an irreducible transition matrix (see notes under google_matrix). It may be common to have the dangling dict to be the same as the personalization dict.
TYPE:
|
Source code in clayrs/recsys/graphs/feature_selection/feature_selection_alg.py
204 205 206 207 208 209 210 211 212 213 214 |
|
TopKEigenVectorCentrality(k=10, max_iter=100, tol=1e-06, nstart=None, weight=False)
Bases: TopKFeatureSelection
Computes the Eigen Vector Centrality as FeatureSelection algorithm. Property labels of the original graph will be scored with their eigen vector centrality score and only the top-k labels will be kept in the feature selected graph, while discarding the others
PARAMETER | DESCRIPTION |
---|---|
k |
Top-k property labels to keep in the feature selected graph
TYPE:
|
max_iter |
Maximum number of iterations in power method.
TYPE:
|
tol |
Error tolerance used to check convergence in power method iteration.
TYPE:
|
nstart |
Starting value of eigenvector iteration for each node.
TYPE:
|
weight |
Boolean value which tells the algorithm if weight of the edges must be considered or not. Default is True
TYPE:
|
Source code in clayrs/recsys/graphs/feature_selection/feature_selection_alg.py
237 238 239 240 241 242 243 |
|
TopKDegreeCentrality(k=10)
Bases: TopKFeatureSelection
Computes the Degree Centrality as FeatureSelection algorithm. Property labels of the original graph will be scored with their degree centrality score and only the top-k labels will be kept in the feature selected graph, while discarding the others
Source code in clayrs/recsys/graphs/feature_selection/feature_selection_alg.py
257 258 |
|