Raw Source Wrappers

`CSVFile(file_path, separator=',', has_header=True, encoding='utf-8-sig')`

Bases: RawInformationSource

Wrapper for a CSV file. This class can read from a CSV file in which entries are separated by a given separator (`,` by default). You can therefore also read a TSV file, for example, by specifying separator='\t'.

A CSV file typically has a header: in that case, each entry can be referenced by its column header. If the CSV file has no header, specify has_header=False: each entry can then be referenced by a string representing its positional index (e.g. '0' for the entry in the first position, '1' for the entry in the second position, etc.).

You can iterate over the whole content of the raw source with a simple for loop: each row is returned as a dictionary whose keys are the column headers (or strings representing the positional indices, if the file has no header) and whose values are the entries.

Examples:

Consider the following CSV file with header

movie_id,movie_title,release_year
1,Jumanji,1995
2,Toy Story,1995

>>> file = CSVFile(csv_path)
>>> print(list(file))
[{'movie_id': '1', 'movie_title': 'Jumanji', 'release_year': '1995'},
{'movie_id': '2', 'movie_title': 'Toy Story', 'release_year': '1995'}]

Consider the following TSV file with no header

1   Jumanji 1995
2   Toy Story   1995

>>> file = CSVFile(tsv_path, separator='\t', has_header=False)
>>> print(list(file))
[{'0': '1', '1': 'Jumanji', '2': '1995'},
{'0': '2', '1': 'Toy Story', '2': '1995'}]
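
The mapping from rows to dictionaries described above can be reproduced with the standard library alone. The following is a minimal sketch, not the ClayRS implementation: `iter_rows` is a hypothetical helper, and it reads from an in-memory string rather than from a file path.

```python
import csv
import io


def iter_rows(text, separator=",", has_header=True):
    """Yield each row as a dict keyed by header name or positional index.

    Hypothetical helper for illustration only; it is not part of ClayRS.
    """
    reader = csv.reader(io.StringIO(text), delimiter=separator)
    if has_header:
        header = next(reader, None)
        if header is None:
            return  # empty input: nothing to yield
        for row in reader:
            yield dict(zip(header, row))
    else:
        for row in reader:
            # No header: keys are the positional indices, as strings
            yield {str(i): value for i, value in enumerate(row)}


csv_text = "movie_id,movie_title,release_year\n1,Jumanji,1995\n2,Toy Story,1995\n"
tsv_text = "1\tJumanji\t1995\n2\tToy Story\t1995\n"

with_header = list(iter_rows(csv_text))
without_header = list(iter_rows(tsv_text, separator="\t", has_header=False))
print(with_header)
print(without_header)
```

Every value stays a string, as in the examples above; any numeric conversion is left to later processing steps.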

PARAMETER	DESCRIPTION
`file_path`	Path of the CSV file TYPE: `str`
`separator`	Character which separates each entry. Defaults to a comma (`,`); to read from a TSV file, set this parameter to `\t` TYPE: `str` DEFAULT: `','`
`has_header`	Boolean value which specifies whether the file has a header. Default is True TYPE: `bool` DEFAULT: `True`
`encoding`	Define the type of encoding of data stored in the source (example: "utf-8") TYPE: `str` DEFAULT: `'utf-8-sig'`
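
The `'utf-8-sig'` default matters for files that start with a byte order mark (BOM), which some spreadsheet tools write when exporting CSV. The snippet below is a small illustration using only Python's built-in codecs; the sample header line is made up, and the claim that this is why the default was chosen is an assumption.

```python
# Encoding with 'utf-8-sig' prepends a BOM, as some CSV exporters do
data = "movie_id,movie_title\n".encode("utf-8-sig")

plain = data.decode("utf-8")         # the BOM survives as '\ufeff'
stripped = data.decode("utf-8-sig")  # the BOM is removed

first_key_plain = plain.split(",")[0]
first_key_sig = stripped.split(",")[0]
print(repr(first_key_plain))
print(repr(first_key_sig))
```

With plain `'utf-8'`, the first column header would carry the invisible `'\ufeff'` prefix, so a lookup by `'movie_id'` would fail. Decoding with `'utf-8-sig'` also works on files that have no BOM.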

Source code in clayrs/content_analyzer/raw_information_source.py

def __init__(self, file_path: str, separator: str = ',', has_header: bool = True, encoding: str = "utf-8-sig"):
    super().__init__(file_path, encoding)
    self.__has_header = has_header
    self.__separator = separator

`representative_name: str` `property`

Method which returns a meaningful name for the raw source.

In this case it's simply the file name + its extension

RETURNS	DESCRIPTION
`str`	The representative name for the raw source
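
As a rough illustration of "file name + its extension", the sketch below derives such a name from a path with `os.path.basename`. The function name and the sample path are hypothetical, and the actual property may compute its value differently.

```python
import os


def representative_name_sketch(file_path):
    """Return the last path component, i.e. the file name with its extension."""
    return os.path.basename(file_path)


name = representative_name_sketch("datasets/movies/movies_info.csv")
print(name)
```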

`DATFile(file_path, encoding='utf-8')`

Bases: RawInformationSource

Wrapper for a DAT file. This class can read from a DAT file in which entries are separated by the `::` string. Since a DAT file has no header, each entry can be referenced by a string representing its positional index (e.g. '0' for the entry in the first position, '1' for the entry in the second position, etc.).

You can iterate over the whole content of the raw source with a simple for loop: each row will be returned as a dictionary where keys are strings representing the positional indices, values are the entries

Examples:

Consider the following DAT file

10::worker::75011
11::without occupation::76112

>>> file = DATFile(dat_path)
>>> print(list(file))
[{'0': '10', '1': 'worker', '2': '75011'},
{'0': '11', '1': 'without occupation', '2': '76112'}]
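
The positional-index mapping for `::`-separated lines can be sketched in a few lines of plain Python. This is an illustration under the assumption that each line is split on the literal `::` string; `parse_dat_line` is a hypothetical helper, not a ClayRS function.

```python
def parse_dat_line(line, separator="::"):
    """Split one DAT line and key each field by its position (as a string)."""
    fields = line.rstrip("\n").split(separator)
    return {str(i): value for i, value in enumerate(fields)}


dat_text = "10::worker::75011\n11::without occupation::76112\n"
dat_rows = [parse_dat_line(line) for line in dat_text.splitlines()]
print(dat_rows)
```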

PARAMETER	DESCRIPTION
`file_path`	Path of the DAT file TYPE: `str`
`encoding`	Define the type of encoding of data stored in the source (example: "utf-8") TYPE: `str` DEFAULT: `'utf-8'`

Source code in clayrs/content_analyzer/raw_information_source.py

def __init__(self, file_path: str, encoding: str = "utf-8"):
    super().__init__(file_path, encoding)

`representative_name: str` `property`

Method which returns a meaningful name for the raw source.

In this case it's simply the file name + its extension

RETURNS	DESCRIPTION
`str`	The representative name for the raw source

`JSONFile(file_path, encoding='utf-8')`

Bases: RawInformationSource

Wrapper for a JSON file. This class is able to read from a JSON file where each "row" is a dictionary-like object inside a list

You can iterate over the whole content of the raw source with a simple for loop: each row will be returned as a dictionary

Examples:

Consider the following JSON file

[{"Title":"Jumanji","Year":"1995"},
 {"Title":"Toy Story","Year":"1995"}]

>>> file = JSONFile(json_path)
>>> print(list(file))
[{'Title': 'Jumanji', 'Year': '1995'},
 {'Title': 'Toy Story', 'Year': '1995'}]
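
The expected file layout, a top-level list of objects, maps directly onto what `json.loads` returns. The sketch below assumes exactly that layout and reads from a string instead of a file; `iter_json_rows` is a hypothetical name used only here.

```python
import json


def iter_json_rows(text):
    """Yield each element of a top-level JSON list as a dictionary."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a JSON list of objects at the top level")
    for record in records:
        yield record


json_text = '[{"Title":"Jumanji","Year":"1995"}, {"Title":"Toy Story","Year":"1995"}]'
json_rows = list(iter_json_rows(json_text))
print(json_rows)
```

Unlike the CSV and DAT cases, values keep whatever type the JSON document gives them, so a number written without quotes would come back as an `int` or `float`.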

PARAMETER	DESCRIPTION
`file_path`	Path of the JSON file TYPE: `str`
`encoding`	Define the type of encoding of data stored in the source (example: "utf-8") TYPE: `str` DEFAULT: `'utf-8'`

Source code in clayrs/content_analyzer/raw_information_source.py

def __init__(self, file_path: str, encoding: str = "utf-8"):
    super().__init__(file_path, encoding)

`representative_name: str` `property`

Method which returns a meaningful name for the raw source.

In this case it's simply the file name + its extension

RETURNS	DESCRIPTION
`str`	The representative name for the raw source

`SQLDatabase(host, username, password, database_name, table_name, encoding='utf-8')`

Bases: RawInformationSource

Wrapper for a SQL database.

You can iterate over the whole content of the raw source with a simple for loop: each row is returned as a dictionary whose keys are the column names of the table and whose values are the entries.

Examples:

Consider the following SQL table for the database 'movies' on localhost

+----------+-------------+--------------+
| Movie ID | Movie Title | Release Year |
+----------+-------------+--------------+
|        1 | Jumanji     |         1995 |
|        2 | Toy Story   |         1995 |
+----------+-------------+--------------+

>>> file = SQLDatabase(host='127.0.0.1', username='root', password='root',
...                    database_name='movies', table_name='movies_table')
>>> print(list(file))
[{'Movie ID': '1', 'Movie Title': 'Jumanji', 'Release Year': '1995'},
{'Movie ID': '2', 'Movie Title': 'Toy Story', 'Release Year': '1995'}]
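
To show how table rows can become dictionaries keyed by column name without a running MySQL server, the sketch below uses the standard-library `sqlite3` module with an in-memory database as a stand-in. This is a swapped-in technique, not what `SQLDatabase` does internally; the table is recreated from the example above, and values are converted to strings only to mirror the output shown there.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL server (assumption)
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE movies_table '
    '("Movie ID" INTEGER, "Movie Title" TEXT, "Release Year" INTEGER)'
)
conn.executemany(
    "INSERT INTO movies_table VALUES (?, ?, ?)",
    [(1, "Jumanji", 1995), (2, "Toy Story", 1995)],
)

cursor = conn.execute('SELECT * FROM movies_table ORDER BY "Movie ID"')
# cursor.description holds one tuple per column; the name is the first item
columns = [description[0] for description in cursor.description]
sql_rows = [
    {column: str(value) for column, value in zip(columns, record)}
    for record in cursor
]
conn.close()
print(sql_rows)
```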

PARAMETER	DESCRIPTION
`host`	Host IP of the SQL server TYPE: `str`
`username`	Username for the access TYPE: `str`
`password`	Password for the access TYPE: `str`
`database_name`	Name of the database TYPE: `str`
`table_name`	Name of the database table where data is stored TYPE: `str`
`encoding`	Define the type of encoding of data stored in the source (example: "utf-8") TYPE: `str` DEFAULT: `'utf-8'`

Source code in clayrs/content_analyzer/raw_information_source.py

def __init__(self, host: str,
             username: str,
             password: str,
             database_name: str,
             table_name: str,
             encoding: str = "utf-8"):
    super().__init__('', encoding)
    self.__host: str = host
    self.__username: str = username
    self.__password: str = password
    self.__database_name: str = database_name
    self.__table_name: str = table_name

    conn = mysql.connector.connect(host=self.__host,
                                   user=self.__username,
                                   password=self.__password,
                                   charset=self.encoding)
    cursor = conn.cursor()
    query = """USE """ + self.__database_name + """;"""
    cursor.execute(query)
    conn.commit()
    self.__conn = conn

`representative_name: str` `property`

Method which returns a meaningful name for the raw source.

In this case it's the host name followed by the table name

RETURNS	DESCRIPTION
`str`	The representative name for the raw source
