
Properties from DBPedia ontology

DBPediaMappingTechnique(entity_type, label_field, lang='EN', mode='only_retrieved_evaluated', return_prop_as_uri=False, max_timeout=5)

Bases: ExogenousPropertiesRetrieval

Exogenous technique which expands each content using the DBPedia ontology as an external source

It needs the entity type of the contents for which a mapping is required (e.g. entity_type='dbo:Film') and the field of the raw source that will be used for the actual mapping.

Different modalities are available:

  • If mode='only_retrieved_evaluated', all properties from DBPedia will be retrieved, discarding those with a blank value (i.e. '')

  • If mode='all', all properties from DBPedia plus all properties from the local raw source will be retrieved. If the same property exists both in DBPedia and in the local dataset, the local value is overwritten by the DBPedia value

  • If mode='all_retrieved', all properties from DBPedia, and only those, will be retrieved

  • If mode='original_retrieved', all local properties with their DBPedia value will be retrieved
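The four modes can be read as different ways of combining two property dictionaries. The sketch below is only an illustration of that reading, not the ClayRS implementation: `combine_properties` is a hypothetical helper, the input values are toy data rather than actual DBPedia output, and two points are assumptions (that 'all_retrieved' keeps blank values, and that in 'original_retrieved' a local property missing from DBPedia gets a blank value).

```python
from typing import Dict

VALID_MODES = ('only_retrieved_evaluated', 'all', 'all_retrieved', 'original_retrieved')


def combine_properties(local: Dict[str, str], dbpedia: Dict[str, str], mode: str) -> Dict[str, str]:
    """Hypothetical helper (not part of ClayRS) mimicking the four documented modes."""
    if mode == 'only_retrieved_evaluated':
        # DBPedia properties only, discarding the ones with a blank value
        return {prop: value for prop, value in dbpedia.items() if value != ''}
    if mode == 'all':
        # Local and DBPedia properties together; DBPedia wins on conflict
        return {**local, **dbpedia}
    if mode == 'all_retrieved':
        # DBPedia properties only (assumption: blank values are kept)
        return dict(dbpedia)
    if mode == 'original_retrieved':
        # Local property names with their DBPedia value
        # (assumption: '' when DBPedia has no value for that property)
        return {prop: dbpedia.get(prop, '') for prop in local}
    raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")


# Toy data for illustration only
local = {'title': 'Toy Story', 'director': 'unknown'}
dbpedia = {'director': 'John Lasseter', 'runtime': '', 'starring': 'Tom Hanks'}

results = {mode: combine_properties(local, dbpedia, mode) for mode in VALID_MODES}
for mode, props in results.items():
    print(mode, props)
```

With this toy input, 'only_retrieved_evaluated' drops the blank `runtime`, while 'all' keeps the local `title` and replaces the local `director` with the DBPedia one.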

PARAMETER DESCRIPTION
entity_type

Domain of the contents you want to process (e.g. 'dbo:Film')

TYPE: str

label_field

Field of the raw source that will be used to map each content: the DBPedia node whose rdfs:label property equals the value of this field will be retrieved

TYPE: str

lang

Language of the rdfs:label that should match the value of label_field in the raw source

TYPE: str DEFAULT: 'EN'

mode

Parameter which specifies which properties should be retrieved.

Possible values are ['only_retrieved_evaluated', 'all', 'all_retrieved', 'original_retrieved']:

1. 'only_retrieved_evaluated' will retrieve properties which have a value, discarding the ones with a blank value (i.e. '')
2. 'all' will retrieve all properties from DBPedia + local source, regardless of whether they have a value or not
3. 'all_retrieved' will retrieve all properties from DBPedia only
4. 'original_retrieved' will retrieve all local properties with their DBPedia value

TYPE: str DEFAULT: 'only_retrieved_evaluated'

return_prop_as_uri

If set to True, properties will be returned in their full URI form rather than in their rdfs:label form (e.g. "http://dbpedia.org/ontology/director" rather than "film director")

TYPE: bool DEFAULT: False

max_timeout

When mapping contents to DBPedia, a batch of queries may sometimes take longer than the maximum time allowed by the server, for example because of network issues. In that case the framework will retry the exact same query max_timeout times before raising a TimeoutError

TYPE: int DEFAULT: 5
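The retry behaviour described for max_timeout can be sketched as a small loop. This is a minimal illustration, not the code ClayRS runs: `run_with_retries` and `flaky_query` are hypothetical names, and the sketch assumes that max_timeout counts the total number of attempts (the real framework may count retries differently).

```python
from typing import Callable, TypeVar

T = TypeVar('T')


def run_with_retries(query: Callable[[], T], max_timeout: int = 5) -> T:
    """Hypothetical helper: call `query` up to `max_timeout` times.

    Assumption: `max_timeout` is the total number of attempts. The last
    TimeoutError is re-raised once all attempts have failed.
    """
    if max_timeout < 1:
        raise ValueError("max_timeout must be at least 1")
    for attempt in range(1, max_timeout + 1):
        try:
            return query()
        except TimeoutError:
            if attempt == max_timeout:
                raise  # attempts exhausted: propagate the error
    raise AssertionError("unreachable")  # the loop always returns or raises


# Simulated endpoint that times out twice before answering
calls = {'n': 0}


def flaky_query() -> str:
    calls['n'] += 1
    if calls['n'] < 3:
        raise TimeoutError("simulated slow endpoint")
    return 'ok'


result = run_with_retries(flaky_query, max_timeout=5)
print(result, calls['n'])
```

Here the simulated query succeeds on its third attempt, so no error reaches the caller; with a query that always times out, the TimeoutError would surface after the fifth attempt.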

Source code in clayrs/content_analyzer/exogenous_properties_retrieval.py
def __init__(self, entity_type: str, label_field: str, lang: str = 'EN',
             mode: str = 'only_retrieved_evaluated', return_prop_as_uri: bool = False,
             max_timeout: int = 5):
    super().__init__(mode)

    self._entity_type = entity_type
    self._label_field = label_field
    self._prop_as_uri = return_prop_as_uri
    self._lang = lang
    self._max_timeout = max_timeout

    self._sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    self._sparql.setReturnFormat(JSON)

    self._class_properties = self._get_properties_class()