Report class

Report(output_dir='.', ca_report_filename='ca_report', rs_report_filename='rs_report', eva_report_filename='eva_report')

Class that generates YAML reports for the whole experiment (or part of it), depending on the objects passed to the yaml() method.

A report will be generated for each module used (Content Analyzer, RecSys, Evaluation).

PARAMETER DESCRIPTION
output_dir

Path of the folder where the generated reports will be saved. The folder is created if it doesn't exist

TYPE: str DEFAULT: '.'

ca_report_filename

Filename of the Content Analyzer report, without extension ('.yml' is appended)

TYPE: str DEFAULT: 'ca_report'

rs_report_filename

Filename of the RecSys report, without extension ('.yml' is appended)

TYPE: str DEFAULT: 'rs_report'

eva_report_filename

Filename of the evaluation report, without extension ('.yml' is appended)

TYPE: str DEFAULT: 'eva_report'
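
As a rough illustration of how these four parameters combine, the sketch below mirrors the path construction visible in the yaml() source (os.path.join(output_dir, f'{filename}.yml')). The helper report_paths is hypothetical and not part of ClayRS: it only computes where each report would be written and creates no files.

```python
import os


def report_paths(output_dir='.',
                 ca_report_filename='ca_report',
                 rs_report_filename='rs_report',
                 eva_report_filename='eva_report'):
    """Hypothetical helper (not part of ClayRS): path each report would be written to."""
    filenames = {
        'content_analyzer': ca_report_filename,
        'recsys': rs_report_filename,
        'evaluation': eva_report_filename,
    }
    # The '.yml' extension is appended to each filename, as in the yaml() source
    return {module: os.path.join(output_dir, f'{name}.yml')
            for module, name in filenames.items()}


# Custom output folder and a custom name for the RecSys report only
paths = report_paths(output_dir='reports', rs_report_filename='my_rs')
for module, path in paths.items():
    print(f'{module}: {path}')
```

Filenames are passed without extension, so passing 'my_rs.yml' would presumably yield 'my_rs.yml.yml'.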

Source code in clayrs/utils/report.py
def __init__(self, output_dir: str = '.',
             ca_report_filename: str = 'ca_report',
             rs_report_filename: str = 'rs_report',
             eva_report_filename: str = 'eva_report'):

    self._output_dir = output_dir
    self._ca_report_filename = ca_report_filename
    self._rs_report_filename = rs_report_filename
    self._eva_report_filename = eva_report_filename

yaml(content_analyzer=None, original_ratings=None, partitioning_technique=None, recsys=None, eval_model=None)

Main method responsible for generating the YAML reports, based on the objects passed to it:

  • If content_analyzer is set, then the report for the Content Analyzer will be produced
  • If at least one of original_ratings, partitioning_technique, or recsys is set, then the report for the RecSys module will be produced
  • If eval_model is set, then the report for the evaluation module will be produced

PLEASE NOTE: when the recsys parameter is set, the last experiment conducted is documented. If no experiment has been conducted in the current run, a ValueError exception is raised. The same applies to eval_model.
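
The selection rules listed above can be summarised in a small sketch. The function reports_to_generate is a hypothetical stand-in, not a ClayRS function: it reproduces only the "is not None" checks found in the yaml() source and returns the default names of the reports that would be written, without building any report.

```python
def reports_to_generate(content_analyzer=None, original_ratings=None,
                        partitioning_technique=None, recsys=None, eval_model=None):
    """Hypothetical stand-in (not part of ClayRS) for the checks made by yaml()."""
    reports = []
    if content_analyzer is not None:
        reports.append('ca_report')
    # Any one of the three RecSys-related arguments is enough
    if any(arg is not None for arg in (original_ratings, partitioning_technique, recsys)):
        reports.append('rs_report')
    if eval_model is not None:
        reports.append('eva_report')
    return reports


# Placeholder objects stand in for real Ratings / Partitioning / EvalModel instances
ratings, pt, em = object(), object(), object()

partial_rs = reports_to_generate(original_ratings=ratings, partitioning_technique=pt)
rs_and_eva = reports_to_generate(partitioning_technique=pt, eval_model=em)
nothing = reports_to_generate()

print(partial_rs)
print(rs_and_eva)
print(nothing)
```

With no arguments nothing is selected, which matches the source: none of the three branches runs, so no file is written.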

Examples:

  • Generate a report for the Content Analyzer module
>>> from clayrs import content_analyzer as ca
>>> from clayrs import utils as ut
>>> # movies_ca_config = ...  # user defined configuration
>>> content_a = ca.ContentAnalyzer(movies_ca_config)
>>> content_a.fit()  # generate and serialize contents
>>> ut.Report().yaml(content_analyzer=content_a)  # generate yaml
  • Generate a partial report for the RecSys module
>>> from clayrs import content_analyzer as ca
>>> from clayrs import utils as ut
>>> from clayrs import recsys as rs
>>> ratings = ca.Ratings(ca.CSVFile(ratings_path))
>>> pt = rs.HoldOutPartitioning()
>>> [train], [test] = pt.split_all(ratings)
>>> ut.Report().yaml(original_ratings=ratings, partitioning_technique=pt)
  • Generate a full report for the RecSys module and evaluation module
>>> from clayrs import content_analyzer as ca
>>> from clayrs import utils as ut
>>> from clayrs import recsys as rs
>>> from clayrs import evaluation as eva
>>>
>>> # Generate recommendations
>>> ratings = ca.Ratings(ca.CSVFile(ratings_path))
>>> pt = rs.HoldOutPartitioning()
>>> [train], [test] = pt.split_all(ratings)
>>> alg = rs.CentroidVector()
>>> cbrs = rs.ContentBasedRS(alg, train_set=train, items_directory=items_path)
>>> rank = cbrs.fit_rank(test, n_recs=10)
>>>
>>> # Evaluate recommendations and generate report
>>> em = eva.EvalModel([rank], [test], metric_list=[eva.Precision(), eva.Recall()])
>>> ut.Report().yaml(original_ratings=ratings,
...                  partitioning_technique=pt,
...                  recsys=cbrs,
...                  eval_model=em)
PARAMETER DESCRIPTION
content_analyzer

ContentAnalyzer object used to generate complex representations in the experiment

TYPE: ContentAnalyzer DEFAULT: None

original_ratings

Ratings object representing the original dataset

TYPE: Ratings DEFAULT: None

partitioning_technique

Partitioning object used to split the original dataset

TYPE: Partitioning DEFAULT: None

recsys

RecSys object used to produce recommendations/score predictions. The latest experiment run is documented; if no experiment has been run, a ValueError is raised

TYPE: RecSys DEFAULT: None

eval_model

EvalModel object used to evaluate the generated predictions. The latest evaluation run is documented; if no evaluation has been run, an exception is raised

TYPE: EvalModel DEFAULT: None

Source code in clayrs/utils/report.py
def yaml(self, content_analyzer: ContentAnalyzer = None,
         original_ratings: Ratings = None,
         partitioning_technique: Partitioning = None,
         recsys: RecSys = None,
         eval_model: EvalModel = None):
    """
    Main method responsible for generating the `YAML` reports, based on the objects passed to it:

    * If `content_analyzer` is set, then the report for the Content Analyzer will be produced
    * If at least one of `original_ratings`, `partitioning_technique`, or `recsys` is set, then the report for the
    RecSys module will be produced
    * If `eval_model` is set, then the report for the evaluation module will be produced

    **PLEASE NOTE**: when the `recsys` parameter is set, the last experiment conducted is documented. If no
    experiment has been conducted in the current run, a `ValueError` exception is raised. The same applies to
    `eval_model`.

    Examples:

        * Generate a report for the Content Analyzer module
        >>> from clayrs import content_analyzer as ca
        >>> from clayrs import utils as ut
        >>> # movies_ca_config = ...  # user defined configuration
        >>> content_a = ca.ContentAnalyzer(movies_ca_config)
        >>> content_a.fit()  # generate and serialize contents
        >>> ut.Report().yaml(content_analyzer=content_a)  # generate yaml

        * Generate a partial report for the RecSys module
        >>> from clayrs import content_analyzer as ca
        >>> from clayrs import utils as ut
        >>> from clayrs import recsys as rs
        >>> ratings = ca.Ratings(ca.CSVFile(ratings_path))
        >>> pt = rs.HoldOutPartitioning()
        >>> [train], [test] = pt.split_all(ratings)
        >>> ut.Report().yaml(original_ratings=ratings, partitioning_technique=pt)

        * Generate a full report for the RecSys module and evaluation module
        >>> from clayrs import content_analyzer as ca
        >>> from clayrs import utils as ut
        >>> from clayrs import recsys as rs
        >>> from clayrs import evaluation as eva
        >>>
        >>> # Generate recommendations
        >>> ratings = ca.Ratings(ca.CSVFile(ratings_path))
        >>> pt = rs.HoldOutPartitioning()
        >>> [train], [test] = pt.split_all(ratings)
        >>> alg = rs.CentroidVector()
        >>> cbrs = rs.ContentBasedRS(alg, train_set=train, items_directory=items_path)
        >>> rank = cbrs.fit_rank(test, n_recs=10)
        >>>
        >>> # Evaluate recommendations and generate report
        >>> em = eva.EvalModel([rank], [test], metric_list=[eva.Precision(), eva.Recall()])
        >>> ut.Report().yaml(original_ratings=ratings,
        ...                  partitioning_technique=pt,
        ...                  recsys=cbrs,
        ...                  eval_model=em)

    Args:
        content_analyzer: `ContentAnalyzer` object used to generate complex representations in the experiment
        original_ratings: `Ratings` object representing the original dataset
        partitioning_technique: `Partitioning` object used to split the original dataset
        recsys: `RecSys` object used to produce recommendations/score predictions. Please note that the latest
            experiment run will be documented. If no experiment is run, then an exception is thrown
        eval_model: `EvalModel` object used to evaluate predictions generated. Please note that the latest
            evaluation run will be documented. If no evaluation is run, then an exception is thrown
    """

    def represent_none(self, _):
        return self.represent_scalar('tag:yaml.org,2002:null', 'null')

    def dump_yaml(output_path, data):
        # output_path is the full path of the report file, not a directory
        with open(output_path, 'w') as yaml_file:
            pyaml.dump(data, yaml_file, sort_dicts=False, safe=True)

    # None values will be represented as 'null' in yaml file.
    # without this, they will simply be represented as an empty string
    pyaml.add_representer(type(None), represent_none)

    if content_analyzer is not None:
        ca_dict = self._report_ca_module(content_analyzer)

        # create folder if it doesn't exist
        Path(self.output_dir).mkdir(parents=True, exist_ok=True)

        output_dir = os.path.join(self.output_dir, f'{self._ca_report_filename}.yml')
        dump_yaml(output_dir, ca_dict)

    if original_ratings is not None or partitioning_technique is not None or recsys is not None:
        rs_dict = self._report_rs_module(original_ratings, partitioning_technique, recsys)

        # create folder if it doesn't exist
        Path(self.output_dir).mkdir(parents=True, exist_ok=True)

        output_dir = os.path.join(self.output_dir, f'{self._rs_report_filename}.yml')
        dump_yaml(output_dir, rs_dict)

    if eval_model is not None:
        eva_dict = self._report_eva_module(eval_model)

        # create folder if it doesn't exist
        Path(self.output_dir).mkdir(parents=True, exist_ok=True)

        output_dir = os.path.join(self.output_dir, f'{self._eva_report_filename}.yml')
        dump_yaml(output_dir, eva_dict)