Word Embeddings

The following snippet builds a technique that produces embeddings at word granularity:

from clayrs import content_analyzer as ca

# obtain word embeddings using pre-trained model 'glove-twitter-50'
# from Gensim library
ca.WordEmbeddingTechnique(embedding_source=ca.Gensim('glove-twitter-50'))

`WordEmbeddingTechnique(embedding_source)`

Bases: StandardEmbeddingTechnique

Class that uses a word-granularity embedding source to produce word embeddings.

PARAMETER DESCRIPTION

embedding_source

Any WordEmbedding model

TYPE: Union[WordEmbeddingLoader, WordEmbeddingLearner, str]

Source code in clayrs/content_analyzer/field_content_production_techniques/embedding_technique/embedding_technique.py

def __init__(self, embedding_source: Union[WordEmbeddingLoader, WordEmbeddingLearner, str]):
    # if isinstance(embedding_source, str):
    #     embedding_source = self.from_str_to_embedding_source(embedding_source, WordEmbeddingLoader)
    super().__init__(embedding_source)
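To make "word granularity" concrete: a word-level source yields one vector per token rather than a single vector for the whole text. The sketch below is a toy illustration in plain Python, not ClayRS code. The lookup table `TOY_VECTORS`, the helper `embed_words`, and the zero-vector fallback for unknown words are all assumptions made for the example.

```python
from typing import Dict, List

# Toy lookup table standing in for a pre-trained model (hypothetical values).
TOY_VECTORS: Dict[str, List[float]] = {
    "good": [0.1, 0.3],
    "movie": [0.7, 0.2],
}


def embed_words(text: str, vectors: Dict[str, List[float]], dim: int = 2) -> List[List[float]]:
    """Return one vector per whitespace-separated token, in input order."""
    zero = [0.0] * dim  # assumption: unknown words map to a zero vector
    return [list(vectors.get(token, zero)) for token in text.lower().split()]


# Three tokens in, three rows out: the result is a (words x dimensions) matrix.
matrix = embed_words("Good movie tonight", TOY_VECTORS)
print(len(matrix), len(matrix[0]))
```

With a real source such as `ca.Gensim('glove-twitter-50')`, the vectors would come from the pre-trained model and have 50 dimensions instead of 2.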

Word Embedding models

`Gensim(model_name='glove-twitter-25')`

Bases: WordEmbeddingLoader

Class that produces word embeddings using gensim pre-trained models.

The model will be automatically downloaded using the gensim downloader api if not present locally.

PARAMETER DESCRIPTION

model_name

Name of the model to load/download

TYPE: str DEFAULT: 'glove-twitter-25'

Source code in clayrs/content_analyzer/embeddings/embedding_loader/gensim.py

def __init__(self, model_name: str = 'glove-twitter-25'):
    super().__init__(model_name)

`GensimDoc2Vec(reference=None, auto_save=True, **kwargs)`

Bases: GensimWordEmbeddingLearner

Class that implements the Doc2Vec model using the Gensim library.

To load a pre-trained local Doc2Vec model, pass its path in the reference parameter. Otherwise, a Doc2Vec model is trained from scratch on the preprocessed corpus of the contents being complexly represented.

To save the model once trained, set the path in the reference parameter and set auto_save=True. If reference is None, the trained model is not saved and is only used to produce contents in the current run.

Additional parameters for the model itself can be passed as keyword arguments; check the Gensim documentation to see what can be customized.

PARAMETER DESCRIPTION

reference

Path of the model to load, or where the trained model will be saved if auto_save=True. If None, the trained model won't be saved after training and will only be used to produce contents in the current run.

TYPE: str DEFAULT: None

auto_save

If True, the model will be saved at the path specified in the reference parameter.

TYPE: bool DEFAULT: True

Source code in clayrs/content_analyzer/embeddings/embedding_learner/doc2vec.py

def __init__(self, reference: str = None, auto_save: bool = True, **kwargs):
    super().__init__(reference, auto_save, ".kv", **kwargs)
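The interplay between `reference` and `auto_save` described above applies to every learner class on this page, and it can be summarised as a small decision function. This is a minimal sketch of the documented behaviour, not the ClayRS implementation: the function name `plan_model_action` and its string results are hypothetical, and it assumes that "a model to load" means a file already exists at `reference`.

```python
import os
import tempfile
from typing import Optional


def plan_model_action(reference: Optional[str], auto_save: bool = True) -> str:
    """Describe what a learner would do for a given reference/auto_save pair."""
    if reference is not None and os.path.isfile(reference):
        return "load"            # a saved model exists at that path
    if reference is not None and auto_save:
        return "train_and_save"  # train from scratch, then persist to reference
    return "train_only"          # model lives only for the current run


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_model.kv")
    first_run = plan_model_action(path)   # nothing on disk yet
    open(path, "w").close()               # pretend the model was saved
    second_run = plan_model_action(path)  # the file now exists

print(first_run, second_run, plan_model_action(None))
```

In practice this means that pointing `reference` at the same path on every run trains the model once and reuses it afterwards, provided `auto_save` is left at its default.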

`GensimFastText(reference=None, auto_save=True, **kwargs)`

Bases: GensimWordEmbeddingLearner

Class that implements the FastText model using the Gensim library.

To load a pre-trained local FastText model, pass its path in the reference parameter. Otherwise, a FastText model is trained from scratch on the preprocessed corpus of the contents being complexly represented.

To save the model once trained, set the path in the reference parameter and set auto_save=True. If reference is None, the trained model is not saved and is only used to produce contents in the current run.

Additional parameters for the model itself can be passed as keyword arguments; check the Gensim documentation to see what can be customized.

PARAMETER DESCRIPTION

reference

Path of the model to load, or where the trained model will be saved if auto_save=True. If None, the trained model won't be saved after training and will only be used to produce contents in the current run.

TYPE: str DEFAULT: None

auto_save

If True, the model will be saved at the path specified in the reference parameter.

TYPE: bool DEFAULT: True

Source code in clayrs/content_analyzer/embeddings/embedding_learner/fasttext.py

def __init__(self, reference: str = None, auto_save: bool = True, **kwargs):
    super().__init__(reference, auto_save, ".kv", **kwargs)

`GensimRandomIndexing(reference=None, auto_save=True, **kwargs)`

Bases: GensimDocumentEmbeddingLearner

Class that implements the RandomIndexing model using the Gensim library.

To load a pre-trained local RandomIndexing model, pass its path in the reference parameter. Otherwise, a RandomIndexing model is trained from scratch on the preprocessed corpus of the contents being complexly represented.

To save the model once trained, set the path in the reference parameter and set auto_save=True. If reference is None, the trained model is not saved and is only used to produce contents in the current run.

Additional parameters for the model itself can be passed as keyword arguments; check the Gensim documentation to see what can be customized.

PARAMETER DESCRIPTION

reference

Path of the model to load, or where the trained model will be saved if auto_save=True. If None, the trained model won't be saved after training and will only be used to produce contents in the current run.

TYPE: str DEFAULT: None

auto_save

If True, the model will be saved at the path specified in the reference parameter.

TYPE: bool DEFAULT: True

Source code in clayrs/content_analyzer/embeddings/embedding_learner/random_indexing.py

def __init__(self, reference: str = None, auto_save: bool = True, **kwargs):
    super().__init__(reference, auto_save, ".model", **kwargs)

`GensimWord2Vec(reference=None, auto_save=True, **kwargs)`

Bases: GensimWordEmbeddingLearner

Class that implements the Word2Vec model using the Gensim library.

To load a pre-trained local Word2Vec model, pass its path in the reference parameter. Otherwise, a Word2Vec model is trained from scratch on the preprocessed corpus of the contents being complexly represented.

To save the model once trained, set the path in the reference parameter and set auto_save=True. If reference is None, the trained model is not saved and is only used to produce contents in the current run.

Additional parameters for the model itself can be passed as keyword arguments; check the Gensim documentation to see what can be customized.

PARAMETER DESCRIPTION

reference

Path of the model to load, or where the trained model will be saved if auto_save=True. If None, the trained model won't be saved after training and will only be used to produce contents in the current run.

TYPE: str DEFAULT: None

auto_save

If True, the model will be saved at the path specified in the reference parameter.

TYPE: bool DEFAULT: True

Source code in clayrs/content_analyzer/embeddings/embedding_learner/word2vec.py

def __init__(self, reference: str = None, auto_save: bool = True, **kwargs):
    super().__init__(reference, auto_save, ".kv", **kwargs)
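All four learner constructors follow the same pattern: they forward `reference`, `auto_save`, a fixed string (`".kv"` or `".model"`), and any extra keyword arguments to the parent class. The sketch below mimics that pattern with hypothetical stand-in classes (`ToyLearner`, `ToyWord2Vec`); it is not ClayRS code. It assumes that the fixed string is a file extension appended to `reference` when missing, which this page does not confirm. The keyword arguments `vector_size` and `window` are Gensim Word2Vec parameters, used here only as sample values.

```python
from typing import Any, Dict, Optional


class ToyLearner:
    """Hypothetical stand-in for the shared parent class."""

    def __init__(self, reference: Optional[str], auto_save: bool,
                 extension: str, **kwargs: Any):
        # Assumption: the extension is appended when the path lacks it.
        if reference is not None and not reference.endswith(extension):
            reference += extension
        self.reference = reference
        self.auto_save = auto_save
        # Extra keyword arguments are kept so they can be handed to the model.
        self.model_kwargs: Dict[str, Any] = kwargs


class ToyWord2Vec(ToyLearner):
    """Mirrors the constructor shape shown above, with a fixed '.kv' suffix."""

    def __init__(self, reference: Optional[str] = None,
                 auto_save: bool = True, **kwargs: Any):
        super().__init__(reference, auto_save, ".kv", **kwargs)


learner = ToyWord2Vec("models/w2v", vector_size=50, window=5)
print(learner.reference, learner.model_kwargs)
```

Keeping the suffix in the subclass lets each learner pick its own on-disk format while the parent handles loading and saving uniformly.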