Raw Source Wrappers
CSVFile(file_path, separator=',', has_header=True, encoding='utf-8-sig')
Bases: RawInformationSource
Wrapper for a CSV file. This class can read a CSV file in which entries are separated by a given separator (, by default). It can therefore also read a TSV file, for example, by specifying separator='\t'.
A CSV file typically has a header: in that case, each entry can be referenced by its column header.
If the CSV file has no header, specify has_header=False: each entry can then be referenced by a string representing its positional index (e.g. '0' for the entry in the first position, '1' for the entry in the second position, etc.)
You can iterate over the whole content of the raw source with a simple for loop: each row is returned as a dictionary whose keys are the column headers (or strings representing the positional indices, if the file has no header) and whose values are the entries
Examples:
Consider the following CSV file with header
movie_id,movie_title,release_year
1,Jumanji,1995
2,Toy Story,1995
>>> file = CSVFile(csv_path)
>>> print(list(file))
[{'movie_id': '1', 'movie_title': 'Jumanji', 'release_year': '1995'},
{'movie_id': '2', 'movie_title': 'Toy Story', 'release_year': '1995'}]
Consider the following TSV file with no header
1 Jumanji 1995
2 Toy Story 1995
>>> file = CSVFile(tsv_path, separator='\t', has_header=False)
>>> print(list(file))
[{'0': '1', '1': 'Jumanji', '2': '1995'},
{'0': '2', '1': 'Toy Story', '2': '1995'}]
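The keying behaviour described above can be approximated with the standard library alone. The following is a minimal sketch, not the CSVFile implementation: the helper name iter_rows is hypothetical, and it reads from an in-memory stream instead of a file path so that it stays self-contained.

```python
import csv
import io


def iter_rows(text_stream, separator=",", has_header=True):
    """Yield each row as a dict keyed by header name or positional index.

    Hypothetical helper that mirrors the behaviour described for CSVFile.
    """
    if has_header:
        # DictReader takes the keys from the first line of the stream.
        for row in csv.DictReader(text_stream, delimiter=separator):
            yield dict(row)
    else:
        # No header: use '0', '1', ... as keys.
        for row in csv.reader(text_stream, delimiter=separator):
            yield {str(index): value for index, value in enumerate(row)}


csv_text = "movie_id,movie_title,release_year\n1,Jumanji,1995\n2,Toy Story,1995\n"
tsv_text = "1\tJumanji\t1995\n2\tToy Story\t1995\n"

rows_with_header = list(iter_rows(io.StringIO(csv_text)))
rows_without_header = list(
    iter_rows(io.StringIO(tsv_text), separator="\t", has_header=False)
)
```

In both cases every value stays a string, which matches the outputs shown in the examples above.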
PARAMETER | DESCRIPTION |
---|---|
file_path | Path of the CSV file |
separator | Character which separates each entry. By default it is a comma (,) |
has_header | Boolean value which specifies whether the file has a header. Default is True |
encoding | Encoding of the data stored in the source (example: "utf-8") |
Source code in clayrs/content_analyzer/raw_information_source.py
representative_name: str
property
Method which returns a meaningful name for the raw source.
In this case it's simply the file name + its extension
RETURNS | DESCRIPTION |
---|---|
str | The representative name for the raw source |
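As an illustration of "file name + its extension", the sketch below derives such a name from a path with os.path.basename. It is an assumption about how the name could be computed, not the library's code, and the path used is hypothetical.

```python
import os


def representative_name(file_path):
    """Return the file name with its extension, dropping any directory part."""
    return os.path.basename(file_path)


# Hypothetical path, used only for illustration.
name = representative_name(os.path.join("datasets", "movies_info.csv"))
```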
DATFile(file_path, encoding='utf-8')
Bases: RawInformationSource
Wrapper for a DAT file. This class can read a DAT file in which entries are separated by the :: string.
Since a DAT file has no header, each entry can be referenced with a string representing its positional index
(e.g. '0' for entry in the first position, '1' for the entry in the second position, etc.)
You can iterate over the whole content of the raw source with a simple for loop: each row will be returned as a dictionary where keys are strings representing the positional indices, values are the entries
Examples:
Consider the following DAT file
10::worker::75011
11::without occupation::76112
>>> file = DATFile(dat_path)
>>> print(list(file))
[{'0': '10', '1': 'worker', '2': '75011'},
{'0': '11', '1': 'without occupation', '2': '76112'}]
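The positional keying of a DAT row can be sketched by splitting each line on the :: separator. This is a rough illustration under stated assumptions, not the DATFile implementation: the helper name iter_dat_rows is hypothetical, and skipping blank lines is a choice made here for convenience.

```python
def iter_dat_rows(lines, separator="::"):
    """Yield each non-empty line as a dict keyed by positional index."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            # Assumption of this sketch: blank lines are ignored.
            continue
        yield {str(index): value for index, value in enumerate(line.split(separator))}


dat_lines = ["10::worker::75011\n", "11::without occupation::76112\n"]
dat_rows = list(iter_dat_rows(dat_lines))
```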
PARAMETER | DESCRIPTION |
---|---|
file_path | Path of the DAT file |
encoding | Encoding of the data stored in the source (example: "utf-8") |
Source code in clayrs/content_analyzer/raw_information_source.py
representative_name: str
property
Method which returns a meaningful name for the raw source.
In this case it's simply the file name + its extension
RETURNS | DESCRIPTION |
---|---|
str | The representative name for the raw source |
JSONFile(file_path, encoding='utf-8')
Bases: RawInformationSource
Wrapper for a JSON file. This class is able to read from a JSON file where each "row" is a dictionary-like object inside a list
You can iterate over the whole content of the raw source with a simple for loop: each row will be returned as a dictionary
Examples:
Consider the following JSON file
[{"Title":"Jumanji","Year":"1995"},
{"Title":"Toy Story","Year":"1995"}]
>>> file = JSONFile(json_path)
>>> print(list(file))
[{'Title': 'Jumanji', 'Year': '1995'},
{'Title': 'Toy Story', 'Year': '1995'}]
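The "list of dictionary-like objects" layout can be read with the standard json module. The sketch below is only an approximation of the behaviour described for JSONFile: the helper name iter_json_rows is hypothetical, and the check that the top-level value is a list is an assumption of this sketch.

```python
import io
import json


def iter_json_rows(text_stream):
    """Yield each object of a top-level JSON array as a dictionary."""
    content = json.load(text_stream)
    if not isinstance(content, list):
        # Assumption of this sketch: anything but a list is rejected.
        raise ValueError("expected a JSON array of objects")
    for row in content:
        yield row


json_text = '[{"Title":"Jumanji","Year":"1995"},\n {"Title":"Toy Story","Year":"1995"}]'
json_rows = list(iter_json_rows(io.StringIO(json_text)))
```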
PARAMETER | DESCRIPTION |
---|---|
file_path | Path of the JSON file |
encoding | Encoding of the data stored in the source (example: "utf-8") |
Source code in clayrs/content_analyzer/raw_information_source.py
representative_name: str
property
Method which returns a meaningful name for the raw source.
In this case it's simply the file name + its extension
RETURNS | DESCRIPTION |
---|---|
str | The representative name for the raw source |
SQLDatabase(host, username, password, database_name, table_name, encoding='utf-8')
Bases: RawInformationSource
Wrapper for a SQL database.
You can iterate over the whole content of the raw source with a simple for loop: each row is returned as a dictionary whose keys are the column names of the table and whose values are the entries
Examples:
Consider the following SQL table for the database 'movies' in localhost
+----------+-------------+--------------+
| Movie ID | Movie Title | Release Year |
+----------+-------------+--------------+
| 1 | Jumanji | 1995 |
| 2 | Toy Story | 1995 |
+----------+-------------+--------------+
>>> file = SQLDatabase(host='127.0.0.1', username='root', password='root',
...                    database_name='movies', table_name='movies_table')
>>> print(list(file))
[{'Movie ID': '1', 'Movie Title': 'Jumanji', 'Release Year': '1995'},
{'Movie ID': '2', 'Movie Title': 'Toy Story', 'Release Year': '1995'}]
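Turning table rows into dictionaries keyed by column name can be sketched with any DB-API driver. The example below uses the standard sqlite3 module with an in-memory database as a stand-in, since no server is needed to run it; it is not the SQLDatabase implementation, the helper name iter_table_rows is hypothetical, and converting every value to a string is an assumption made to mirror the output shown above.

```python
import sqlite3


def iter_table_rows(connection, table_name):
    """Yield each row of a table as a dict keyed by column name."""
    # Table names cannot be bound as query parameters, so quote the identifier.
    quoted = '"' + table_name.replace('"', '""') + '"'
    cursor = connection.execute(f"SELECT * FROM {quoted}")
    columns = [description[0] for description in cursor.description]
    for row in cursor:
        # Assumption of this sketch: values are converted to strings.
        yield {column: str(value) for column, value in zip(columns, row)}


connection = sqlite3.connect(":memory:")
connection.execute(
    'CREATE TABLE movies_table ("Movie ID" INTEGER, "Movie Title" TEXT, "Release Year" INTEGER)'
)
connection.executemany(
    "INSERT INTO movies_table VALUES (?, ?, ?)",
    [(1, "Jumanji", 1995), (2, "Toy Story", 1995)],
)
sql_rows = list(iter_table_rows(connection, "movies_table"))
connection.close()
```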
PARAMETER | DESCRIPTION |
---|---|
host | Host IP of the SQL server |
username | Username for the access |
password | Password for the access |
database_name | Name of the database |
table_name | Name of the database table where data is stored |
encoding | Encoding of the data stored in the source (example: "utf-8") |
Source code in clayrs/content_analyzer/raw_information_source.py
representative_name: str
property
Method which returns a meaningful name for the raw source.
In this case it's the host name followed by the table name
RETURNS | DESCRIPTION |
---|---|
str | The representative name for the raw source |