Ratings class

The Ratings class is the main entry point for importing a dataset containing interactions between users and items

Ratings(source, user_id_column=0, item_id_column=1, score_column=2, timestamp_column=None, score_processor=None, item_map=None, user_map=None)

Class responsible for importing an interaction frame into the framework

If the source file contains users, items and ratings in this order, no additional parameters are needed, otherwise the mapping must be explicitly specified using:

  • 'user_id' column,
  • 'item_id' column,
  • 'score' column

The score column can also be processed, for example to use the sentiment of a textual review as the score, or to normalize all scores into the \([0, 1]\) range. Check the examples below for more
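To make the normalization case concrete, here is a minimal plain-Python sketch of a min-max rescaling into \([0, 1]\). The helper `min_max_normalize` and the assumed 1 to 5 rating scale are hypothetical and only illustrate the kind of transformation a score processor could apply; they are not part of ClayRS.

```python
def min_max_normalize(score: float, min_score: float, max_score: float) -> float:
    """Linearly rescale a score from [min_score, max_score] into [0, 1]."""
    if max_score <= min_score:
        raise ValueError("max_score must be greater than min_score")
    return (score - min_score) / (max_score - min_score)


# 'rating' values of a small interaction frame, assuming a 1-5 rating scale
raw_scores = [4, 3, 2]
normalized = [min_max_normalize(s, 1, 5) for s in raw_scores]

print(normalized)  # [0.75, 0.5, 0.25]
```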

Note that, during the import phase, the user and item ids will be converted to integers and a mapping between the newly created ids and the original string ids will be created. For replicability purposes, it is possible to pass your custom item and user map instead of leaving this task to the framework. Check the example below to see how
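As an illustration of the id conversion, the following plain-Python sketch derives a string-to-integer mapping from an id column, numbering ids in order of first appearance. It mirrors the `dict.fromkeys` de-duplication visible in the `from_dataframe` source on this page, but `build_id_map` is a hypothetical helper and the first-appearance ordering is an assumption for the `Ratings` constructor itself.

```python
from typing import Dict, List


def build_id_map(ids: List[str]) -> Dict[str, int]:
    """Assign each distinct string id an integer, in order of first appearance."""
    # dict.fromkeys keeps only the first occurrence of each id and preserves order
    unique_ids = list(dict.fromkeys(ids))
    return {string_id: idx for idx, string_id in enumerate(unique_ids)}


# user and item columns of a small interaction frame
user_column = ['u1', 'u2', 'u2']
item_column = ['i1', 'i1', 'i32']

user_map = build_id_map(user_column)
item_map = build_id_map(item_column)

print(user_map)  # {'u1': 0, 'u2': 1}
print(item_map)  # {'i1': 0, 'i32': 1}
```

Passing a map like these explicitly via `user_map` and `item_map` keeps the integer ids stable across runs, which is the replicability use case described above.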

Examples:

CSV raw source
user_id,item_id,rating,timestamp,review
u1,i1,4,00112,good movie
u2,i1,3,00113,an average movie
u2,i32,2,00114,a bad movie

As you can see, the user id column, item id column and score column are the first three columns and are already in sequential order, so no additional parameter needs to be passed to the Ratings class:

>>> import clayrs.content_analyzer as ca
>>> ratings_raw_source = ca.CSVFile('ratings.csv')
>>> # add timestamp_column='timestamp' to the following if
>>> # you want to load also the timestamp
>>> ratings = ca.Ratings(ratings_raw_source)

In case columns in the raw source are not in the above order you must specify an appropriate mapping via positional index (useful in case your raw source doesn't have a header) or via column ids:

>>> # (mapping by index) EQUIVALENT:
>>> ratings = ca.Ratings(
>>>     ca.CSVFile('ratings.csv'),
>>>     user_id_column=0,  # (1)
>>>     item_id_column=1,  # (2)
>>>     score_column=2  # (3)
>>> )
  1. First column of raw source is the column containing all user ids
  2. Second column of raw source is the column containing all item ids
  3. Third column of raw source is the column containing all the scores
>>> # (mapping by column name) EQUIVALENT:
>>> ratings = ca.Ratings(
>>>     ca.CSVFile('ratings.csv'),
>>>     user_id_column='user_id',  # (1)
>>>     item_id_column='item_id',  # (2)
>>>     score_column='rating'  # (3)
>>> )
  1. The column with id 'user_id' of raw source is the column containing all user ids
  2. The column with id 'item_id' of raw source is the column containing all item ids
  3. The column with id 'rating' of raw source is the column containing all the scores
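The equivalence between mapping by index and mapping by name can be sketched in plain Python. The helper `resolve_column` below is hypothetical (it is not a ClayRS function); it only shows how a positional index and a column name can resolve to the same field of a header, and how an unknown column fails early.

```python
from typing import List, Union


def resolve_column(header: List[str], column: Union[str, int]) -> str:
    """Return the column name addressed by either a name or a positional index."""
    if isinstance(column, str):
        if column not in header:
            raise KeyError(f"Column {column} not found in header!")
        return column
    # positional index: list indexing raises IndexError if out of range
    return header[column]


header = ['user_id', 'item_id', 'rating', 'timestamp', 'review']

print(resolve_column(header, 2))         # rating
print(resolve_column(header, 'rating'))  # rating
```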

In case you would like to use the sentiment of the review column of the above raw source as score column, simply specify the appropriate ScoreProcessor object

>>> ratings_raw_source = ca.CSVFile('ratings.csv')
>>> ratings = ca.Ratings(ratings_raw_source,
>>>                      score_column='review',
>>>                      score_processor=ca.TextBlobSentimentAnalysis())

In case you would like to specify the mappings for items or users, simply specify them in the corresponding parameters

>>> ratings_raw_source = ca.CSVFile('ratings.csv')
>>> custom_item_map = {'i1': 0, 'i2': 2, 'i3': 1}
>>> custom_user_map = {'u1': 0, 'u2': 2, 'u3': 1}
>>> ratings = ca.Ratings(ratings_raw_source,
>>>                      item_map=custom_item_map,
>>>                      user_map=custom_user_map)
PARAMETER DESCRIPTION
source

Source containing the raw interaction frame

TYPE: RawInformationSource

user_id_column

Name or positional index of the field of the raw source representing users column

TYPE: Union[str, int] DEFAULT: 0

item_id_column

Name or positional index of the field of the raw source representing items column

TYPE: Union[str, int] DEFAULT: 1

score_column

Name or positional index of the field of the raw source representing score column

TYPE: Union[str, int] DEFAULT: 2

timestamp_column

Name or positional index of the field of the raw source representing timestamp column

TYPE: Union[str, int] DEFAULT: None

score_processor

ScoreProcessor object which will process the score_column accordingly. Useful if you want to perform sentiment analysis on a textual column or you want to normalize all scores in \([0, 1]\) range

TYPE: ScoreProcessor DEFAULT: None

item_map

dictionary with string keys (the item ids) and integer values (the corresponding unique integer ids) used to create the item mapping. If not specified, it will be automatically created internally

TYPE: Dict[str, int] DEFAULT: None

user_map

dictionary with string keys (the user ids) and integer values (the corresponding unique integer ids) used to create the user mapping. If not specified, it will be automatically created internally

TYPE: Dict[str, int] DEFAULT: None

Source code in clayrs/content_analyzer/ratings_manager/ratings.py
def __init__(self, source: RawInformationSource,
             user_id_column: Union[str, int] = 0,
             item_id_column: Union[str, int] = 1,
             score_column: Union[str, int] = 2,
             timestamp_column: Union[str, int] = None,
             score_processor: ScoreProcessor = None,
             item_map: Dict[str, int] = None,
             user_map: Dict[str, int] = None):

    # utility dictionary that will contain each user index as key and a numpy array, containing the indexes of the
    # rows in the uir matrix which refer to an interaction for that user, as value. This is done to optimize
    # performance when requesting all interactions of a certain user
    self._user2rows: Dict

    self._uir: np.ndarray
    self.item_map: StrIntMap
    self.user_map: StrIntMap

    self._import_ratings(source, user_id_column, item_id_column,
                         score_column, timestamp_column, score_processor, item_map, user_map)

item_id_column: np.ndarray property cached

Getter for the 'item_id' column of the interaction frame. This will return the item column "as is", so it will contain duplicate items. Use the 'unique_item_id_column' property to get unique items.

RETURNS DESCRIPTION
np.ndarray

Items column with duplicates (string ids)

item_idx_column: np.ndarray property cached

Getter for the 'item_idx' column of the uir matrix. This will return the item column "as is", so it will contain duplicate items. Use the 'unique_item_idx_column' property to get unique items.

RETURNS DESCRIPTION
np.ndarray

Items column with duplicates (integer ids)

score_column: np.ndarray property cached

Getter for the score column. This will return the score column "as is".

RETURNS DESCRIPTION
np.ndarray

Score column

timestamp_column: np.ndarray property cached

Getter for the timestamp column. This will return the timestamp column "as is". If no timestamp is present then an empty list is returned

RETURNS DESCRIPTION
np.ndarray

Timestamp column or empty list if no timestamp is present

uir: np.ndarray property

Getter for the uir matrix created from the interaction frame. The imported ratings are converted into a numpy ndarray where each row represents an interaction. This uir matrix can be seen in a tabular representation as follows:

UIR matrix visualized: tabular format
+----------+----------+--------+-----------+
| user_idx | item_idx | score  | timestamp |
+----------+----------+--------+-----------+
| 0.       | 0.       | 4      | np.nan    |
| 0.       | 1.       | 3      | np.nan    |
| 1.       | 4.       | 1      | np.nan    |
+----------+----------+--------+-----------+

Where the 'user_idx' and 'item_idx' columns contain the integer ids from the mapping of the Ratings object itself (these integer ids match the string ids that are in the original interaction frame)
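The relationship between the original interaction frame and the uir rows can be sketched in plain Python. Nested lists stand in for the numpy ndarray here, and the maps are illustrative assumptions chosen to reproduce the table above (in particular, 'i5' is assumed to map to 4 because the map also contains 'i3' and 'i4').

```python
import math

# original interactions with string ids (no timestamp available)
interactions = [('u1', 'i1', 4), ('u1', 'i2', 3), ('u2', 'i5', 1)]

# assumed string -> integer maps, chosen to match the table above
user_map = {'u1': 0, 'u2': 1}
item_map = {'i1': 0, 'i2': 1, 'i3': 2, 'i4': 3, 'i5': 4}

# each row is [user_idx, item_idx, score, timestamp]; a missing timestamp is NaN
uir_rows = [
    [float(user_map[user_id]), float(item_map[item_id]), float(score), math.nan]
    for user_id, item_id, score in interactions
]

for row in uir_rows:
    print(row)
# [0.0, 0.0, 4.0, nan]
# [0.0, 1.0, 3.0, nan]
# [1.0, 4.0, 1.0, nan]
```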

unique_item_id_column: np.ndarray property cached

Getter for the 'item_id' column of the interaction frame. This will return the item column without duplicates.

RETURNS DESCRIPTION
np.ndarray

Items column without duplicates (string ids)

unique_item_idx_column: np.ndarray property cached

Getter for the 'item_idx' column of the uir matrix. This will return the item column without duplicates.

RETURNS DESCRIPTION
np.ndarray

Items column without duplicates (integer ids)

unique_user_id_column: np.ndarray property cached

Getter for the 'user_id' column of the interaction frame. This will return the user column without duplicates.

RETURNS DESCRIPTION
np.ndarray

Users column without duplicates (string ids)

unique_user_idx_column: np.ndarray property cached

Getter for the 'user_idx' column of the uir matrix. This will return the user column without duplicates.

RETURNS DESCRIPTION
np.ndarray

Users column without duplicates (integer ids)

user_id_column: np.ndarray property cached

Getter for the 'user_id' column of the interaction frame. This will return the user column "as is", so it will contain duplicate users. Use the 'unique_user_id_column' property to get unique users.

RETURNS DESCRIPTION
np.ndarray

Users column with duplicates (string ids)

user_idx_column: np.ndarray property cached

Getter for the 'user_idx' column of the uir matrix. This will return the user column "as is", so it will contain duplicate users. Use the 'unique_user_idx_column' property to get unique users.

RETURNS DESCRIPTION
np.ndarray

Users column with duplicates (integer ids)

__iter__()

Note: iteration is done on integer ids, if you want to iterate over string ids you need to iterate over the 'user_id_column' or 'item_id_column'
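If the string ids are needed while looping over integer-id rows, one option is to invert the maps first. The sketch below uses plain dictionaries and a list of rows as stand-ins for the `Ratings` maps and the rows yielded by iteration; all names in it are illustrative, not ClayRS API.

```python
# assumed string -> integer maps
user_map = {'u1': 0, 'u2': 1}
item_map = {'i1': 0, 'i2': 1, 'i5': 4}

# invert them into integer -> string lookups
idx_to_user = {idx: string_id for string_id, idx in user_map.items()}
idx_to_item = {idx: string_id for string_id, idx in item_map.items()}

# stand-in for the rows produced by iterating over a Ratings object
uir_rows = [[0.0, 0.0, 4.0], [0.0, 1.0, 3.0], [1.0, 4.0, 1.0]]

# ids are stored as floats in the rows, so cast to int before the lookup
decoded = [
    (idx_to_user[int(user_idx)], idx_to_item[int(item_idx)], score)
    for user_idx, item_idx, score in uir_rows
]

print(decoded)  # [('u1', 'i1', 4.0), ('u1', 'i2', 3.0), ('u2', 'i5', 1.0)]
```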

Source code in clayrs/content_analyzer/ratings_manager/ratings.py
def __iter__(self):
    """
    Note: iteration is done on integer ids, if you want to iterate over string ids you need to iterate over the
    'user_id_column' or 'item_id_column'
    """
    yield from iter(self._uir)

filter_ratings(user_list)

Method which will filter the rating frame by keeping only interactions of users appearing in the user_list. This method will return a new Ratings object without changing the original

Examples:

Starting Rating object
+---------+---------+-------+
| user_id | item_id | score |
+---------+---------+-------+
| u1      | i1      |     4 |
| u1      | i2      |     3 |
| u2      | i5      |     1 |
+---------+---------+-------+
Starting Rating object: corresponding uir matrix
+----------+----------+--------+-----------+
| user_idx | item_idx | score  | timestamp |
+----------+----------+--------+-----------+
| 0.       | 0.       | 4      | np.nan    |
| 0.       | 1.       | 3      | np.nan    |
| 1.       | 4.       | 1      | np.nan    |
+----------+----------+--------+-----------+
>>> rating_frame.filter_ratings([0])
Returned Rating object
+---------+---------+-------+
| user_id | item_id | score |
+---------+---------+-------+
| u1      | i1      |     4 |
| u1      | i2      |     3 |
+---------+---------+-------+
Returned Rating object: corresponding uir matrix
+----------+----------+--------+-----------+
| user_idx | item_idx | score  | timestamp |
+----------+----------+--------+-----------+
| 0.       | 0.       | 4      | np.nan    |
| 0.       | 1.       | 3      | np.nan    |
+----------+----------+--------+-----------+

If you don't know the integer ids for the users, you can obtain them using the user map as follows:

>>> user_idxs = rating_frame.user_map[['u1']]
>>> rating_frame.filter_ratings(user_list=user_idxs)
PARAMETER DESCRIPTION
user_list

List of user integer ids that will be present in the filtered Ratings object

TYPE: Sequence[int]

RETURNS DESCRIPTION
Ratings

The filtered Ratings object which contains only interactions of selected users
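The filtering semantics can be sketched without numpy: keep every row whose user index appears in the requested list, and leave the original rows untouched. The function `filter_rows` is a hypothetical plain-Python stand-in for the behavior described above, not the library implementation.

```python
from typing import List, Sequence


def filter_rows(uir_rows: List[List[float]], user_list: Sequence[int]) -> List[List[float]]:
    """Return a new list containing only the rows of the users in user_list."""
    wanted = set(user_list)
    # row[0] holds the user integer id (stored as a float in the uir layout)
    return [row for row in uir_rows if int(row[0]) in wanted]


uir_rows = [[0.0, 0.0, 4.0], [0.0, 1.0, 3.0], [1.0, 4.0, 1.0]]
user_map = {'u1': 0, 'u2': 1}

# translate string ids into integer ids before filtering
user_idxs = [user_map[user_id] for user_id in ['u1']]
filtered = filter_rows(uir_rows, user_idxs)

print(filtered)       # [[0.0, 0.0, 4.0], [0.0, 1.0, 3.0]]
print(len(uir_rows))  # 3
```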

Source code in clayrs/content_analyzer/ratings_manager/ratings.py
def filter_ratings(self, user_list: Sequence[int]) -> Ratings:
    """
    Method which will filter the rating frame by keeping only interactions of users appearing in the `user_list`.
    This method will return a new `Ratings` object without changing the original

    Examples:

        ```title="Starting Rating object"
        +---------+---------+-------+
        | user_id | item_id | score |
        +---------+---------+-------+
        | u1      | i1      |     4 |
        | u1      | i2      |     3 |
        | u2      | i5      |     1 |
        +---------+---------+-------+
        ```

        ```title="Starting Rating object: corresponding uir matrix"
        +----------+----------+--------+-----------+
        | user_idx | item_idx | score  | timestamp |
        +----------+----------+--------+-----------+
        | 0.       | 0.       | 4      | np.nan    |
        | 0.       | 1.       | 3      | np.nan    |
        | 1.       | 4.       | 1      | np.nan    |
        +----------+----------+--------+-----------+
        ```

        >>> rating_frame.filter_ratings([0])

        ```title="Returned Rating object"
        +---------+---------+-------+
        | user_id | item_id | score |
        +---------+---------+-------+
        | u1      | i1      |     4 |
        | u1      | i2      |     3 |
        +---------+---------+-------+
        ```

        ```title="Returned Rating object: corresponding uir matrix"
        +----------+----------+--------+-----------+
        | user_idx | item_idx | score  | timestamp |
        +----------+----------+--------+-----------+
        | 0.       | 0.       | 4      | np.nan    |
        | 0.       | 1.       | 3      | np.nan    |
        +----------+----------+--------+-----------+
        ```

        If you don't know the integer ids for the users, you can obtain them using the user map as follows:

        >>> user_idxs = rating_frame.user_map[['u1']]
        >>> rating_frame.filter_ratings(user_list=user_idxs)

    Args:
        user_list: List of user integer ids that will be present in the filtered `Ratings` object

    Returns:
        The filtered Ratings object which contains only interactions of selected users
    """
    valid_indexes = np.where(np.isin(self.user_idx_column, user_list))
    new_uir = self._uir[valid_indexes]

    return Ratings.from_uir(new_uir, self.user_map.map, self.item_map.map)

from_dataframe(interaction_frame, user_column=0, item_column=1, score_column=2, timestamp_column=None, user_map=None, item_map=None) classmethod

Class method which instantiates a Ratings object from an existing pandas DataFrame

If the pandas DataFrame contains users, items and ratings in this order, no additional parameters are needed, otherwise the mapping must be explicitly specified using:

  • 'user_id' column,
  • 'item_id' column,
  • 'score' column

Check the documentation of the Ratings class for examples on mapping columns explicitly: it works the same way here

Furthermore, it is also possible to specify the user and item mapping between original string ids and integer ones. However, differently from the Ratings class documentation, it is possible not only to specify them as dictionaries but also as numpy arrays or StrIntMap objects directly. The end result will be the same independently of the type, but it is suggested to check the StrIntMap class documentation to understand the differences between the three possible types
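As a rough intuition for the dictionary and array forms, the sketch below converts between a `{string_id: integer_id}` dictionary and a positional sequence in which the position of each string id is its integer id. This positional reading is an assumption inferred from how the source on this page builds maps from de-duplicated id arrays; plain lists stand in for numpy arrays, both helpers are hypothetical, and the `StrIntMap` documentation remains the authority on the actual differences.

```python
from typing import Dict, List


def dict_to_sequence(id_map: Dict[str, int]) -> List[str]:
    """Turn a string -> int dict into a list where position == integer id."""
    if sorted(id_map.values()) != list(range(len(id_map))):
        raise ValueError("integer ids must be exactly 0..n-1 for a positional layout")
    ordered = sorted(id_map.items(), key=lambda pair: pair[1])
    return [string_id for string_id, _ in ordered]


def sequence_to_dict(ids: List[str]) -> Dict[str, int]:
    """Turn a positional list of string ids back into a string -> int dict."""
    return {string_id: idx for idx, string_id in enumerate(ids)}


item_map = {'i1': 0, 'i2': 2, 'i3': 1}
as_sequence = dict_to_sequence(item_map)

print(as_sequence)                    # ['i1', 'i3', 'i2']
print(sequence_to_dict(as_sequence))  # {'i1': 0, 'i3': 1, 'i2': 2}
```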

Examples:

>>> ratings_df = pd.DataFrame({'user_id': ['u1', 'u1', 'u1'],
>>>                            'item_id': ['i1', 'i2', 'i3'],
>>>                            'score': [4, 3, 3]})
>>> Ratings.from_dataframe(ratings_df)

or

>>> user_map = {'u1': 0}
>>> item_map = {'i1': 0, 'i2': 2, 'i3': 1}
>>> ratings_df = pd.DataFrame({'user_id': ['u1', 'u1', 'u1'],
>>>                            'item_id': ['i1', 'i2', 'i3'],
>>>                            'score': [4, 3, 3]})
>>> Ratings.from_dataframe(ratings_df, user_map=user_map, item_map=item_map)
PARAMETER DESCRIPTION
interaction_frame

pandas DataFrame which represents the original interactions frame

TYPE: pd.DataFrame

user_column

Name or positional index of the field of the DataFrame representing users column

TYPE: Union[str, int] DEFAULT: 0

item_column

Name or positional index of the field of the DataFrame representing items column

TYPE: Union[str, int] DEFAULT: 1

score_column

Name or positional index of the field of the DataFrame representing score column

TYPE: Union[str, int] DEFAULT: 2

timestamp_column

Name or positional index of the field of the DataFrame representing timestamp column

TYPE: Union[str, int] DEFAULT: None

item_map

dictionary with string keys (the item ids) and integer values (the corresponding unique integer ids) used to create the item mapping. If not specified, it will be automatically created internally

TYPE: Union[Dict[str, int], np.ndarray, StrIntMap] DEFAULT: None

user_map

dictionary with string keys (the user ids) and integer values (the corresponding unique integer ids) used to create the user mapping. If not specified, it will be automatically created internally

TYPE: Union[Dict[str, int], np.ndarray, StrIntMap] DEFAULT: None

RETURNS DESCRIPTION
Ratings

Ratings object instantiated thanks to an existing Pandas DataFrame

Source code in clayrs/content_analyzer/ratings_manager/ratings.py
@classmethod
@handler_score_not_float
def from_dataframe(cls, interaction_frame: pd.DataFrame,
                   user_column: Union[str, int] = 0,
                   item_column: Union[str, int] = 1,
                   score_column: Union[str, int] = 2,
                   timestamp_column: Union[str, int] = None,
                   user_map: Union[Dict[str, int], np.ndarray, StrIntMap] = None,
                   item_map: Union[Dict[str, int], np.ndarray, StrIntMap] = None) -> Ratings:
    """
    Class method which allows to instantiate a `Ratings` object by using an existing pandas DataFrame

    **If** the pandas DataFrame contains users, items and ratings in this order,
    no additional parameters are needed, **otherwise** the mapping must be explicitly specified using:

    * **'user_id'** column,
    * **'item_id'** column,
    * **'score'** column

    Check documentation of the `Ratings` class for examples on mapping columns explicitly, the functioning is the
    same

    Furthermore, it is also possible to specify the user and item mapping between original string ids and
    integer ones. However, differently from the `Ratings` class documentation, it is possible not only to specify
    them as dictionaries but also as numpy arrays or `StrIntMap` objects directly. The end result will be the same
    independently of the type, but it is suggested to check the `StrIntMap` class documentation to understand
    the differences between the three possible types

    Examples:

        >>> ratings_df = pd.DataFrame({'user_id': ['u1', 'u1', 'u1'],
        >>>                            'item_id': ['i1', 'i2', 'i3'],
        >>>                            'score': [4, 3, 3]})
        >>> Ratings.from_dataframe(ratings_df)

        or

        >>> user_map = {'u1': 0}
        >>> item_map = {'i1': 0, 'i2': 2, 'i3': 1}
        >>> ratings_df = pd.DataFrame({'user_id': ['u1', 'u1', 'u1'],
        >>>                            'item_id': ['i1', 'i2', 'i3'],
        >>>                            'score': [4, 3, 3]})
        >>> Ratings.from_dataframe(ratings_df, user_map=user_map, item_map=item_map)

    Args:
        interaction_frame: pandas DataFrame which represents the original interactions frame
        user_column: Name or positional index of the field of the DataFrame representing *users* column
        item_column: Name or positional index of the field of the DataFrame representing *items* column
        score_column: Name or positional index of the field of the DataFrame representing *score* column
        timestamp_column: Name or positional index of the field of the DataFrame representing *timestamp* column
        item_map: dictionary with string keys (the item ids) and integer values (the corresponding unique integer
            ids) used to create the item mapping. If not specified, it will be automatically created internally
        user_map: dictionary with string keys (the user ids) and integer values (the corresponding unique integer
            ids) used to create the user mapping. If not specified, it will be automatically created internally

    Returns:
        `Ratings` object instantiated thanks to an existing Pandas DataFrame
    """

    def get_value_row_df(row, column, dtype):
        try:
            if isinstance(column, str):
                value = row[column]
            else:
                # it's an int, so we get the column id and then we get the corresponding value in the row
                key_dict = interaction_frame.columns[column]
                value = row[key_dict]
        except (KeyError, IndexError) as e:
            if isinstance(e, KeyError):
                raise KeyError(f"Column {column} not found in interaction frame!")
            else:
                raise IndexError(f"Column {column} not found in interaction frame!")

        return dtype(value) if value is not None else None

    obj = cls.__new__(cls)  # Does not call __init__
    super(Ratings, obj).__init__()  # Don't forget to call any polymorphic base class initializers

    # lists that will contain the original data temporarily
    # this is so that the conversion from string ids to integers will be called only once
    # said lists will also be used to create the mappings if not specified in the parameters
    tmp_user_id_column = []
    tmp_item_id_column = []
    tmp_score_column = []
    tmp_timestamp_column = []

    for i, row in enumerate(interaction_frame.to_dict(orient='records')):
        user_id = get_value_row_df(row, user_column, str)
        item_id = get_value_row_df(row, item_column, str)
        score = get_value_row_df(row, score_column, float)
        timestamp = get_value_row_df(row, timestamp_column, int) if timestamp_column is not None else np.nan

        tmp_user_id_column.append(user_id)
        tmp_item_id_column.append(item_id)
        tmp_score_column.append(score)
        tmp_timestamp_column.append(timestamp)

    # create the item_map from the item_id column if not specified
    if item_map is None:
        obj.item_map = StrIntMap(np.array(list(dict.fromkeys(tmp_item_id_column))))
    else:
        obj.item_map = StrIntMap(item_map)

    # create the user_map from the user_id column if not specified
    if user_map is None:
        obj.user_map = StrIntMap(np.array(list(dict.fromkeys(tmp_user_id_column))))
    else:
        obj.user_map = StrIntMap(user_map)

    tmp_user_id_column = np.array(tmp_user_id_column)

    if np.any(tmp_user_id_column == None):
        raise UserNone('User column cannot contain None values') from None

    tmp_item_id_column = np.array(tmp_item_id_column)

    if np.any(tmp_item_id_column == None):
        raise ItemNone('Item column cannot contain None values') from None

    # convert user and item ids and create the uir matrix
    obj._uir = np.array((
        obj.user_map.convert_seq_str2int(tmp_user_id_column),
        obj.item_map.convert_seq_str2int(tmp_item_id_column),
        tmp_score_column, tmp_timestamp_column
    )).T

    obj._uir[:, 2] = obj._uir[:, 2].astype(float)
    obj._uir[:, 3] = obj._uir[:, 3].astype(float)

    # create the utility dictionary user2rows
    obj._user2rows = {
        user_idx: np.where(obj._uir[:, 0] == user_idx)[0]
        for user_idx in obj.unique_user_idx_column
    }

    return obj

from_list(interaction_list, user_map=None, item_map=None) classmethod

Class method which instantiates a Ratings object from an existing list of tuples or a generator of tuples

Furthermore, it is also possible to specify the user and item mapping between original string ids and integer ones. However, differently from the Ratings class documentation, it is possible not only to specify them as dictionaries but also as numpy arrays or StrIntMap objects directly. The end result will be the same independently of the type, but it is suggested to check the StrIntMap class documentation to understand the differences between the three possible types

Examples:

>>> interactions_list = [('u1', 'i1', 5), ('u2', 'i1', 4)]
>>> Ratings.from_list(interactions_list)

or

>>> user_map = {'u1': 0, 'u2': 1}
>>> item_map = {'i1': 0}
>>> interactions_list = [('u1', 'i1', 5), ('u2', 'i1', 4)]
>>> Ratings.from_list(interactions_list, user_map=user_map, item_map=item_map)
PARAMETER DESCRIPTION
interaction_list

List containing tuples or its generator

TYPE: Union[List[Tuple], Iterator]

item_map

dictionary with string keys (the item ids) and integer values (the corresponding unique integer ids) used to create the item mapping. If not specified, it will be automatically created internally

TYPE: Union[Dict[str, int], np.ndarray, StrIntMap] DEFAULT: None

user_map

dictionary with string keys (the user ids) and integer values (the corresponding unique integer ids) used to create the user mapping. If not specified, it will be automatically created internally

TYPE: Union[Dict[str, int], np.ndarray, StrIntMap] DEFAULT: None

RETURNS DESCRIPTION
Ratings

Ratings object instantiated thanks to an existing interaction list

Source code in clayrs/content_analyzer/ratings_manager/ratings.py
@classmethod
@handler_score_not_float
def from_list(cls, interaction_list: Union[List[Tuple], Iterator],
              user_map: Union[Dict[str, int], np.ndarray, StrIntMap] = None,
              item_map: Union[Dict[str, int], np.ndarray, StrIntMap] = None) -> Ratings:
    """
    Class method which allows to instantiate a `Ratings` object by using an existing list of tuples or its generator

    Furthermore, it is also possible to specify the user and item mapping between original string ids and
    integer ones. However, differently from the `Ratings` class documentation, it is possible not only to specify
    them as dictionaries but also as numpy arrays or `StrIntMap` objects directly. The end result will be the same
    independently of the type, but it is suggested to check the `StrIntMap` class documentation to understand
    the differences between the three possible types

    Examples:

        >>> interactions_list = [('u1', 'i1', 5), ('u2', 'i1', 4)]
        >>> Ratings.from_list(interactions_list)

        or

        >>> user_map = {'u1': 0, 'u2': 1}
        >>> item_map = {'i1': 0}
        >>> interactions_list = [('u1', 'i1', 5), ('u2', 'i1', 4)]
        >>> Ratings.from_list(interactions_list, user_map=user_map, item_map=item_map)

    Args:
        interaction_list: List containing tuples or its generator
        item_map: dictionary with string keys (the item ids) and integer values (the corresponding unique integer
            ids) used to create the item mapping. If not specified, it will be automatically created internally
        user_map: dictionary with string keys (the user ids) and integer values (the corresponding unique integer
            ids) used to create the user mapping. If not specified, it will be automatically created internally

    Returns:
        `Ratings` object instantiated thanks to an existing interaction list
    """
    obj = cls.__new__(cls)  # Does not call __init__
    super(Ratings, obj).__init__()  # Don't forget to call any polymorphic base class initializers

    # lists that will contain the original data temporarily
    # this is so that the conversion from string ids to integers will be called only once
    # said lists will also be used to create the mappings if not specified in the parameters
    tmp_user_id_column = []
    tmp_item_id_column = []
    tmp_score_column = []
    tmp_timestamp_column = []

    for i, interaction in enumerate(interaction_list):

        tmp_user_id_column.append(interaction[0])
        tmp_item_id_column.append(interaction[1])
        tmp_score_column.append(interaction[2])

        if len(interaction) == 4:
            tmp_timestamp_column.append(interaction[3])
        else:
            tmp_timestamp_column.append(np.nan)

    # create the item_map from the item_id column if not specified
    if item_map is None:
        obj.item_map = StrIntMap(np.array(list(dict.fromkeys(tmp_item_id_column))))
    else:
        obj.item_map = StrIntMap(item_map)

    # create the user_map from the user_id column if not specified
    if user_map is None:
        obj.user_map = StrIntMap(np.array(list(dict.fromkeys(tmp_user_id_column))))
    else:
        obj.user_map = StrIntMap(user_map)

    tmp_user_id_column = np.array(tmp_user_id_column)

    if np.any(tmp_user_id_column == None):
        raise UserNone('User column cannot contain None values')

    tmp_item_id_column = np.array(tmp_item_id_column)

    if np.any(tmp_item_id_column == None):
        raise ItemNone('Item column cannot contain None values')

    # convert user and item ids and create the uir matrix
    obj._uir = np.array((
        obj.user_map.convert_seq_str2int(tmp_user_id_column),
        obj.item_map.convert_seq_str2int(tmp_item_id_column),
        tmp_score_column, tmp_timestamp_column
    )).T

    obj._uir[:, 2] = obj._uir[:, 2].astype(float)
    obj._uir[:, 3] = obj._uir[:, 3].astype(float)

    # create the utility dictionary user2rows
    obj._user2rows = {
        user_idx: np.where(obj._uir[:, 0] == user_idx)[0]
        for user_idx in obj.unique_user_idx_column
    }

    return obj

from_uir(uir, user_map, item_map) classmethod

Class method which instantiates a Ratings object from an existing uir matrix

The uir matrix should be a two-dimensional numpy ndarray where each row represents a user interaction. Each row should be in the following format:

[0. 0. 4] or [0. 0. 4 np.nan] (without or with the timestamp)

In the case of a different format for the rows, a ValueError exception will be raised. Furthermore, if the uir matrix is not of dtype np.float64, a TypeError exception will be raised.

In this case the 'user_map' and 'item_map' parameters MUST be specified, since there is no information regarding the original string ids in the uir matrix
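The two checks described above (row format and float dtype) can be sketched in plain Python. Lists of Python floats stand in for a float64 ndarray, and `validate_uir_rows` is a hypothetical helper written only to illustrate which inputs would be rejected; it is not the validation code used by the library.

```python
from typing import List


def validate_uir_rows(rows: List[list]) -> None:
    """Raise if a row is not [user, item, score] or [user, item, score, timestamp] of floats."""
    for row in rows:
        # format check: exactly 3 values, or 4 when a timestamp is present
        if len(row) not in (3, 4):
            raise ValueError(f"Expected 3 or 4 values per row, got {len(row)}")
        # dtype check: every value must be a float (stand-in for np.float64)
        if not all(isinstance(value, float) for value in row):
            raise TypeError("All values of the uir rows must be floats")


# accepted: without and with the timestamp column
validate_uir_rows([[0.0, 0.0, 4.0], [1.0, 0.0, 3.0]])
validate_uir_rows([[0.0, 0.0, 4.0, float('nan')]])

# rejected: integer values instead of floats
try:
    validate_uir_rows([[0, 0, 4]])
except TypeError as err:
    print(f"TypeError: {err}")
```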

Examples:

>>> uir_matrix = np.array([[0., 0., 4.], [1., 0., 3.]])  # float64, as required
>>> user_map = {'u1': 0, 'u2': 1}
>>> item_map = {'i1': 0}
>>> Ratings.from_uir(uir_matrix, user_map=user_map, item_map=item_map)
PARAMETER DESCRIPTION
uir

uir matrix which will be used to create the new Ratings object

TYPE: np.ndarray

item_map

dictionary with string keys (the item ids) and integer values (the corresponding unique integer ids) used to create the item mapping

TYPE: Union[Dict[str, int], np.ndarray, StrIntMap]

user_map

dictionary with string keys (the user ids) and integer values (the corresponding unique integer ids) used to create the user mapping

TYPE: Union[Dict[str, int], np.ndarray, StrIntMap]

RETURNS DESCRIPTION
Ratings

Ratings object instantiated thanks to an existing uir matrix

Source code in clayrs/content_analyzer/ratings_manager/ratings.py
@classmethod
def from_uir(cls, uir: np.ndarray,
             user_map: Union[Dict[str, int], np.ndarray, StrIntMap],
             item_map: Union[Dict[str, int], np.ndarray, StrIntMap]) -> Ratings:
    """
    Class method which instantiates a `Ratings` object from an existing uir matrix

    The uir matrix should be a two-dimensional numpy ndarray where each row represents a user interaction.
    Each row should be in the following format:

    ```
    [0. 0. 4] or [0. 0. 4 np.nan] (without or with the timestamp)
    ```

    If the uir matrix has fewer than three columns, a ValueError exception will be raised; if it has exactly
    three, a timestamp column filled with np.nan is appended.
    Furthermore, if the uir matrix is not of dtype np.float64, a TypeError exception will be raised.

    In this case the 'user_map' and 'item_map' parameters ***MUST*** be specified, since there is no information
    regarding the original string ids in the uir matrix

    Examples:

        >>> uir_matrix = np.array([[0., 0., 4.], [1., 0., 3.]])  # float values, so the dtype is np.float64
        >>> user_map = {'u1': 0, 'u2': 1}
        >>> item_map = {'i1': 0}
        >>> Ratings.from_uir(uir_matrix, user_map=user_map, item_map=item_map)

    Args:
        uir: uir matrix which will be used to create the new `Ratings` object
        item_map: dictionary with string keys (the item ids) and integer values (the corresponding unique integer ids)
            used to create the item mapping
        user_map: dictionary with string keys (the user ids) and integer values (the corresponding unique integer
            ids) used to create the user mapping

    Returns:
        `Ratings` object instantiated thanks to an existing uir matrix
    """
    obj = cls.__new__(cls)  # Does not call __init__
    super(Ratings, obj).__init__()  # Don't forget to call any polymorphic base class initializers

    if uir.shape[0] > 0 and uir.shape[1] > 0:
        if uir.shape[1] < 3:
            raise ValueError('User item ratings matrix should have at least 3 columns '
                             '(one for users, one for items and one for ratings scores)')
        elif uir.shape[1] == 3:
            uir = np.append(uir, np.full((uir.shape[0], 1), fill_value=np.nan), axis=1)

        if uir.dtype != np.float64:
            raise TypeError('User id columns and item id columns should be mapped to their respective integer')
    else:
        uir = np.array([])

    obj._uir = uir

    obj.user_map = StrIntMap(user_map)
    obj.item_map = StrIntMap(item_map)

    obj._user2rows = {
        user_idx: np.where(obj._uir[:, 0] == user_idx)[0]
        for user_idx in obj.unique_user_idx_column
    }

    return obj
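
Since `from_uir` needs the maps to be supplied, the sketch below shows one way they could be built from raw string ids. It is an illustration only: `build_map` is a hypothetical helper, and numbering ids in order of first appearance is an assumption, so the framework's own assignment may differ.

```python
import numpy as np

def build_map(ids):
    """Hypothetical helper: assign 0, 1, 2, ... to ids in order of first appearance."""
    # dict.fromkeys removes duplicates while preserving insertion order
    return {raw_id: idx for idx, raw_id in enumerate(dict.fromkeys(ids))}

# Raw interactions as (user id, item id, score) triples
raw = [('u1', 'i1', 4), ('u2', 'i1', 3), ('u2', 'i32', 2)]

user_map = build_map(u for u, _, _ in raw)
item_map = build_map(i for _, i, _ in raw)

# Replace string ids with their integer ids; float dtype as the uir matrix requires
uir = np.array([[user_map[u], item_map[i], s] for u, i, s in raw], dtype=np.float64)
```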

get_user_interactions(user_idx, head=None, as_indices=False)

Method which returns a two-dimensional numpy array containing all the rows from the uir matrix for a single user, one for each interaction of the user. Then you can easily access the columns of the resulting array to obtain useful information

Examples:

So if the rating frame is the following:

+---------+---------+-------+
| user_id | item_id | score |
+---------+---------+-------+
| u1      | i1      |     4 |
| u1      | i2      |     3 |
| u2      | i5      |     1 |
+---------+---------+-------+

The corresponding uir matrix will be the following:

+----------+----------+--------+-----------+
| user_idx | item_idx | score  | timestamp |
+----------+----------+--------+-----------+
| 0.       | 0.       | 4      | np.nan    |
| 0.       | 1.       | 3      | np.nan    |
| 1.       | 4.       | 1      | np.nan    |
+----------+----------+--------+-----------+
>>> rating_frame.get_user_interactions(0)
np.ndarray([
    [0. 0. 4 np.nan],
    [0. 1. 3 np.nan],
])

So you could easily extract all the ratings that a user has given, for example:

>>> rating_frame.get_user_interactions(0)[:, 2]
np.ndarray([4,
            3])

If you only want the first \(k\) interactions of the user, set head=k. The interactions returned are the first \(k\) according to their order of appearance in the rating frame:

>>> rating_frame.get_user_interactions(0, head=1)
np.ndarray([
    [0. 0. 4 np.nan]
])

If you want the row indices of the uir matrix for the user's interactions instead of the interactions themselves, set as_indices=True. This will return a numpy array containing the indices of the rows of the uir matrix for the interactions of the specified user:

>>> rating_frame.get_user_interactions(0, as_indices=True)
np.ndarray([0, 1])

If you don't know the user_idx for a specific user, you can obtain it using the user map as follows:

>>> user_idx = rating_frame.user_map['u1']
>>> rating_frame.get_user_interactions(user_idx=user_idx)
np.ndarray([
    [0. 0. 4 np.nan],
    [0. 1. 3 np.nan],
])
PARAMETER DESCRIPTION
user_idx

Integer id of the user for which you want to retrieve the interactions

TYPE: int

head

Maximum number of interactions to return for the user. If set to \(k\), only the first \(k\) interactions according to their order of appearance are returned; if None, all of them are returned

TYPE: int DEFAULT: None

as_indices

Instead of returning the user interactions, the indices of the rows in the uir matrix corresponding to interactions for the specified user will be returned

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
np.ndarray

If as_indices=False, numpy ndarray containing the rows from the uir matrix for the specified user, otherwise numpy array containing the indexes of the rows from the uir matrix for the interactions of the specified user

Source code in clayrs/content_analyzer/ratings_manager/ratings.py
def get_user_interactions(self, user_idx: int, head: int = None, as_indices: bool = False) -> np.ndarray:
    """
    Method which returns a two-dimensional numpy array containing all the rows from the uir matrix for a single
    user, one for each interaction of the user.
    Then you can easily access the columns of the resulting array to obtain useful information

    Examples:

        So if the rating frame is the following:

        ```
        +---------+---------+-------+
        | user_id | item_id | score |
        +---------+---------+-------+
        | u1      | i1      |     4 |
        | u1      | i2      |     3 |
        | u2      | i5      |     1 |
        +---------+---------+-------+
        ```

        The corresponding uir matrix will be the following:

        ```
        +----------+----------+--------+-----------+
        | user_idx | item_idx | score  | timestamp |
        +----------+----------+--------+-----------+
        | 0.       | 0.       | 4      | np.nan    |
        | 0.       | 1.       | 3      | np.nan    |
        | 1.       | 4.       | 1      | np.nan    |
        +----------+----------+--------+-----------+
        ```

        >>> rating_frame.get_user_interactions(0)
        np.ndarray([
            [0. 0. 4 np.nan],
            [0. 1. 3 np.nan],
        ])

        So you could easily extract all the ratings that a user has given, for example:

        >>> rating_frame.get_user_interactions(0)[:, 2]
        np.ndarray([4,
                    3])

        If you only want the first $k$ interactions of the user, set `head=k`. The interactions returned are the
        first $k$ according to their order of appearance in the rating frame:

        >>> rating_frame.get_user_interactions(0, head=1)
        np.ndarray([
            [0. 0. 4 np.nan]
        ])

        If you want the row indices of the uir matrix for the user's interactions instead of the
        interactions themselves, set `as_indices=True`. This will return a numpy array containing the indices of
        the rows of the uir matrix for the interactions of the specified user:

        >>> rating_frame.get_user_interactions(0, as_indices=True)
        np.ndarray([0, 1])

        If you don't know the `user_idx` for a specific user, you can obtain it using the user map as follows:

        >>> user_idx = rating_frame.user_map['u1']
        >>> rating_frame.get_user_interactions(user_idx=user_idx)
        np.ndarray([
            [0. 0. 4 np.nan],
            [0. 1. 3 np.nan],
        ])

    Args:
        user_idx: Integer id of the user for which you want to retrieve the interactions
        head: Maximum number of interactions to return for the user. If set to $k$, only the first $k$
            interactions according to their order of appearance are returned; if None, all of them are returned
        as_indices: Instead of returning the user interactions, the indices of the rows in the uir matrix
            corresponding to interactions for the specified user will be returned

    Returns:
        If `as_indices=False`, numpy ndarray containing the rows from the uir matrix for the specified user,
            otherwise numpy array containing the indexes of the rows from the uir matrix for the interactions of the
            specified user

    """
    user_rows = self._user2rows.get(user_idx, [])[:head]
    return user_rows if as_indices else self._uir[user_rows]
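
The two-line body above relies on two Python/NumPy behaviours: slicing with `[:None]` keeps every element, and indexing an array with an empty list yields an empty result rather than an error. A minimal standalone sketch, with hypothetical module-level `uir` and `user2rows` in place of the object's attributes:

```python
import numpy as np

# Hypothetical data matching the example tables above
uir = np.array([[0., 0., 4., np.nan],
                [0., 1., 3., np.nan],
                [1., 4., 1., np.nan]])
user2rows = {0: np.array([0, 1]), 1: np.array([2])}

def interactions(user_idx, head=None, as_indices=False):
    # Unknown users fall back to an empty list; [:None] keeps all rows
    rows = user2rows.get(user_idx, [])[:head]
    return rows if as_indices else uir[rows]

scores_of_user_0 = interactions(0)[:, 2]
first_only = interactions(0, head=1)
row_indices = interactions(0, as_indices=True)
unknown_user = interactions(99)
```

Note that a user index missing from the dictionary yields an empty array instead of raising, so callers may want to check the result's length.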

take_head_all(head)

Method which will retain only the first \(k\) interactions of each user, where \(k\) is the value of head. "First" refers to the order in which the interactions appear in the rating frame.

This method will return a new Ratings object without changing the original.

Examples:

Starting Rating object
+---------+---------+-------+
| user_id | item_id | score |
+---------+---------+-------+
| u1      | i1      |     4 |
| u1      | i2      |     3 |
| u2      | i5      |     1 |
| u2      | i6      |     2 |
+---------+---------+-------+
Starting Rating object: corresponding uir matrix
+----------+----------+--------+-----------+
| user_idx | item_idx | score  | timestamp |
+----------+----------+--------+-----------+
| 0.       | 0.       | 4      | np.nan    |
| 0.       | 1.       | 3      | np.nan    |
| 1.       | 4.       | 1      | np.nan    |
| 1.       | 5.       | 2      | np.nan    |
+----------+----------+--------+-----------+
>>> rating_frame.take_head_all(head=1)
Returned Rating object
+---------+---------+-------+
| user_id | item_id | score |
+---------+---------+-------+
| u1      | i1      |     4 |
| u2      | i5      |     1 |
+---------+---------+-------+
Returned Rating object: corresponding uir matrix
+----------+----------+--------+-----------+
| user_idx | item_idx | score  | timestamp |
+----------+----------+--------+-----------+
| 0.       | 0.       | 4      | np.nan    |
| 1.       | 4.       | 1      | np.nan    |
+----------+----------+--------+-----------+
PARAMETER DESCRIPTION
head

The number of interactions to retain for each user

TYPE: int

RETURNS DESCRIPTION
Ratings

The filtered Ratings object which contains only the first \(k\) interactions for each user

Source code in clayrs/content_analyzer/ratings_manager/ratings.py
def take_head_all(self, head: int) -> Ratings:
    """
    Method which will retain only the first $k$ interactions of each user, where $k$ is the value of `head`.
    "First" refers to the order in which the interactions appear in the rating frame.

    This method will return a new `Ratings` object without changing the original.

    Examples:

        ```title="Starting Rating object"
        +---------+---------+-------+
        | user_id | item_id | score |
        +---------+---------+-------+
        | u1      | i1      |     4 |
        | u1      | i2      |     3 |
        | u2      | i5      |     1 |
        | u2      | i6      |     2 |
        +---------+---------+-------+
        ```

        ```title="Starting Rating object: corresponding uir matrix"
        +----------+----------+--------+-----------+
        | user_idx | item_idx | score  | timestamp |
        +----------+----------+--------+-----------+
        | 0.       | 0.       | 4      | np.nan    |
        | 0.       | 1.       | 3      | np.nan    |
        | 1.       | 4.       | 1      | np.nan    |
        | 1.       | 5.       | 2      | np.nan    |
        +----------+----------+--------+-----------+
        ```

        >>> rating_frame.take_head_all(head=1)

        ```title="Returned Rating object"
        +---------+---------+-------+
        | user_id | item_id | score |
        +---------+---------+-------+
        | u1      | i1      |     4 |
        | u2      | i5      |     1 |
        +---------+---------+-------+
        ```

        ```title="Returned Rating object: corresponding uir matrix"
        +----------+----------+--------+-----------+
        | user_idx | item_idx | score  | timestamp |
        +----------+----------+--------+-----------+
        | 0.       | 0.       | 4      | np.nan    |
        | 1.       | 4.       | 1      | np.nan    |
        +----------+----------+--------+-----------+
        ```

    Args:
        head: The number of interactions to retain for each user

    Returns:
        The filtered Ratings object which contains only the first $k$ interactions for each user
    """
    # np.hstack expects a sequence of arrays: pass a list, not a generator
    cut_rows = np.hstack([rows[:head] for rows in self._user2rows.values()])
    new_uir = self._uir[cut_rows]

    return Ratings.from_uir(new_uir, self.user_map.map, self.item_map.map)
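
The row selection performed by this method can be reproduced with NumPy alone. The sketch below uses hypothetical stand-ins for `_uir` and `_user2rows` and collects the per-user slices in a list, since `np.hstack` expects a sequence of arrays.

```python
import numpy as np

# Hypothetical data matching the "Starting Rating object" tables above
uir = np.array([[0., 0., 4., np.nan],
                [0., 1., 3., np.nan],
                [1., 4., 1., np.nan],
                [1., 5., 2., np.nan]])
user2rows = {0: np.array([0, 1]), 1: np.array([2, 3])}

head = 1

# Keep the first `head` row positions of every user, then concatenate them
cut_rows = np.hstack([rows[:head] for rows in user2rows.values()])

# Integer-array indexing copies the selected rows, so `uir` itself is untouched
new_uir = uir[cut_rows]
```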

to_csv(output_directory='.', file_name='ratings_frame', overwrite=False, ids_as_str=True)

Method which will save the Ratings object to a csv file

PARAMETER DESCRIPTION
output_directory

directory which will contain the csv file

TYPE: str DEFAULT: '.'

file_name

Name of the csv file

TYPE: str DEFAULT: 'ratings_frame'

overwrite

If set to True and a csv file exists in the same output directory with the same file name, it will be overwritten

TYPE: bool DEFAULT: False

ids_as_str

If True the original string ids for users and items will be used, otherwise their integer ids

TYPE: bool DEFAULT: True

Source code in clayrs/content_analyzer/ratings_manager/ratings.py
def to_csv(self, output_directory: str = '.', file_name: str = 'ratings_frame', overwrite: bool = False,
           ids_as_str: bool = True):
    """
    Method which will save the `Ratings` object to a `csv` file

    Args:
        output_directory: directory which will contain the csv file
        file_name: Name of the csv file
        overwrite: If set to True and a csv file exists in the same output directory with the same file name, it
            will be overwritten
        ids_as_str: If True the original string ids for users and items will be used, otherwise their integer ids
    """
    Path(output_directory).mkdir(parents=True, exist_ok=True)

    file_name = get_valid_filename(output_directory, file_name, 'csv', overwrite)

    frame = self.to_dataframe(ids_as_str=ids_as_str)
    frame.to_csv(os.path.join(output_directory, file_name), index=False, header=True)
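
To illustrate what `ids_as_str` changes in the written file, here is a sketch that uses only the standard library and NumPy. It is not the framework's implementation: `write_ratings` is a hypothetical function, the data are invented, and the `overwrite` handling done by `get_valid_filename` is left out because its behaviour is not shown here.

```python
import csv
import tempfile
from pathlib import Path

import numpy as np

# Hypothetical uir matrix and string -> integer maps
uir = np.array([[0., 0., 4.], [1., 0., 3.]])
user_map = {'u1': 0, 'u2': 1}
item_map = {'i1': 0}

# Invert the maps so integer ids can be turned back into the original strings
idx2user = {idx: uid for uid, idx in user_map.items()}
idx2item = {idx: iid for iid, idx in item_map.items()}

def write_ratings(path, ids_as_str=True):
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['user_id', 'item_id', 'score'])
        for u, i, score in uir:
            u, i = int(u), int(i)
            if ids_as_str:
                u, i = idx2user[u], idx2item[i]
            writer.writerow([u, i, float(score)])

# Create the output directory if needed, as to_csv does with mkdir(parents=True, exist_ok=True)
out_dir = Path(tempfile.mkdtemp()) / 'out'
out_dir.mkdir(parents=True, exist_ok=True)

str_path = out_dir / 'ratings_frame.csv'
int_path = out_dir / 'ratings_frame_int.csv'
write_ratings(str_path, ids_as_str=True)
write_ratings(int_path, ids_as_str=False)

str_lines = str_path.read_text().splitlines()
int_lines = int_path.read_text().splitlines()
```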

to_dataframe(ids_as_str=True)

Method which will convert the Ratings object to a pandas DataFrame object.

The returned DataFrame object will contain the 'user_id', 'item_id' and 'score' columns and, if at least one interaction has a timestamp, the 'timestamp' column.

PARAMETER DESCRIPTION
ids_as_str

If True, the original string ids for users and items will be used, otherwise their integer ids

TYPE: bool DEFAULT: True

RETURNS DESCRIPTION
pd.DataFrame

The rating frame converted to a pandas DataFrame with the 'user_id', 'item_id' and 'score' columns and optionally the 'timestamp' column

Source code in clayrs/content_analyzer/ratings_manager/ratings.py
def to_dataframe(self, ids_as_str: bool = True) -> pd.DataFrame:
    """
    Method which will convert the `Ratings` object to a `pandas DataFrame object`.

    The returned DataFrame object will contain the 'user_id', 'item_id' and 'score' columns and, if at least one
    interaction has a timestamp, the 'timestamp' column.

    Args:
        ids_as_str: If True, the original string ids for users and items will be used, otherwise their integer ids

    Returns:
        The rating frame converted to a pandas DataFrame with the 'user_id', 'item_id' and 'score' columns and
            optionally the 'timestamp' column

    """
    if ids_as_str:
        will_be_frame = {'user_id': self.user_id_column,
                         'item_id': self.item_id_column,
                         'score': self.score_column}
    else:
        will_be_frame = {'user_id': self.user_idx_column,
                         'item_id': self.item_idx_column,
                         'score': self.score_column}

    if len(self.timestamp_column) != 0:
        will_be_frame['timestamp'] = self.timestamp_column

    return pd.DataFrame(will_be_frame)
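
The column selection above can be sketched without the `Ratings` class. The example below is an approximation under stated assumptions: `to_columns`, `idx2user` and `idx2item` are hypothetical names, and "has a timestamp" is interpreted as "is not NaN", which may not match how the `timestamp_column` property is implemented.

```python
import numpy as np

# Hypothetical uir matrix: only the second interaction carries a timestamp
uir = np.array([[0., 0., 4., np.nan],
                [1., 0., 3., 1234.]])

# Hypothetical integer -> string lookups (position = integer id)
idx2user = np.array(['u1', 'u2'])
idx2item = np.array(['i1'])

def to_columns(uir, ids_as_str=True):
    users = uir[:, 0].astype(int)
    items = uir[:, 1].astype(int)
    columns = {
        'user_id': idx2user[users] if ids_as_str else users,
        'item_id': idx2item[items] if ids_as_str else items,
        'score': uir[:, 2],
    }
    timestamps = uir[:, 3]
    # Assumption: the column is added only if at least one timestamp is not NaN
    if not np.isnan(timestamps).all():
        columns['timestamp'] = timestamps
    return columns

with_ts = to_columns(uir)
without_ts = to_columns(uir[:1])
int_ids = to_columns(uir, ids_as_str=False)
```

A dictionary of equally long columns like `with_ts` is the kind of input that `pandas.DataFrame` accepts directly.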