
Properties from local dataset

PropertiesFromDataset(mode='only_retrieved_evaluated', field_name_list=None)

Bases: ExogenousPropertiesRetrieval

Exogenous technique that expands each content using the raw source itself as the external source.

Two modes are available:

  • If mode='only_retrieved_evaluated', all fields of the content are retrieved from the raw source, except those with a blank value (i.e. '')
JSON raw source
[{'Title': 'Jumanji', 'Year': 1995},
{'Title': 'Toy Story', 'Year': ''}]
json_file = JSONFile(json_path)
PropertiesFromDataset(mode='only_retrieved_evaluated').get_properties(json_file)
# output is a list of PropertiesDict objects with the following values:
# [{'Title': 'Jumanji', 'Year': 1995},
#  {'Title': 'Toy Story'}]
  • If mode='all', all fields of the content are retrieved from the raw source, including those with a blank value
JSON raw source
[{'Title': 'Jumanji', 'Year': 1995},
{'Title': 'Toy Story', 'Year': ''}]
json_file = JSONFile(json_path)
PropertiesFromDataset(mode='all').get_properties(json_file)
# output is a list of PropertiesDict objects with the following values:
# [{'Title': 'Jumanji', 'Year': 1995},
#  {'Title': 'Toy Story', 'Year': ''}]

You can also choose exactly which fields are used to expand each content with the field_name_list parameter:

JSON raw source
[{'Title': 'Jumanji', 'Year': 1995},
{'Title': 'Toy Story', 'Year': ''}]
json_file = JSONFile(json_path)
PropertiesFromDataset(mode='only_retrieved_evaluated',
                      field_name_list=['Title']).get_properties(json_file)
# output is a list of PropertiesDict objects with the following values:
# [{'Title': 'Jumanji'},
#  {'Title': 'Toy Story'}]
PARAMETER DESCRIPTION
mode

Parameter which specifies which properties should be retrieved.

Possible values are ['only_retrieved_evaluated', 'all']:

1. 'only_retrieved_evaluated' will retrieve only the properties which have a
value, discarding those with a blank value (i.e. '')
2. 'all' will retrieve all properties, whether or not they have a
value

TYPE: str DEFAULT: 'only_retrieved_evaluated'

field_name_list

List of fields from the raw source that will be retrieved. Useful if you want to expand each content with only a subset of the properties available in the local dataset.

TYPE: List[str] DEFAULT: None
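
The way mode and field_name_list combine can be illustrated with a small pure-Python sketch. This is not ClayRS's implementation: select_properties is a hypothetical helper written only for illustration. It assumes that field selection is applied before blank values are dropped, and that a requested field missing from a record is silently skipped. It works on plain dicts rather than on PropertiesDict objects.

```python
from typing import Any, Dict, List, Optional


def select_properties(records: List[Dict[str, Any]],
                      mode: str = 'only_retrieved_evaluated',
                      field_name_list: Optional[List[str]] = None) -> List[Dict[str, Any]]:
    """Hypothetical helper mimicking the documented behaviour on plain dicts."""
    if mode not in ('only_retrieved_evaluated', 'all'):
        raise ValueError(f"Unknown mode: {mode!r}")

    selected = []
    for record in records:
        # With no field_name_list, every field of the record is a candidate
        fields = list(record.keys()) if field_name_list is None else field_name_list

        properties = {}
        for name in fields:
            if name not in record:
                # Assumption: fields missing from the raw record are skipped
                continue
            value = record[name]
            # Blank values (i.e. '') are dropped only in 'only_retrieved_evaluated' mode
            if mode == 'only_retrieved_evaluated' and value == '':
                continue
            properties[name] = value
        selected.append(properties)
    return selected


raw_source = [{'Title': 'Jumanji', 'Year': 1995},
              {'Title': 'Toy Story', 'Year': ''}]

print(select_properties(raw_source))
# [{'Title': 'Jumanji', 'Year': 1995}, {'Title': 'Toy Story'}]
print(select_properties(raw_source, mode='all'))
# [{'Title': 'Jumanji', 'Year': 1995}, {'Title': 'Toy Story', 'Year': ''}]
print(select_properties(raw_source, field_name_list=['Title']))
# [{'Title': 'Jumanji'}, {'Title': 'Toy Story'}]
```

The three calls reproduce the outputs shown in the examples above for the same raw source.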

Source code in clayrs/content_analyzer/exogenous_properties_retrieval.py
def __init__(self, mode: str = 'only_retrieved_evaluated', field_name_list: List[str] = None):
    super().__init__(mode)
    self.__field_name_list: List[str] = field_name_list