Word Embeddings
The following example obtains word-level embeddings:
```python
from clayrs import content_analyzer as ca

# obtain word embeddings using the pre-trained model 'glove-twitter-50'
# from the Gensim library
ca.WordEmbeddingTechnique(embedding_source=ca.Gensim('glove-twitter-50'))
```
WordEmbeddingTechnique(embedding_source)
Bases: StandardEmbeddingTechnique
Class that makes use of a word granularity embedding source to produce word embeddings
PARAMETER | DESCRIPTION |
---|---|
embedding_source | Any |
Source code in clayrs/content_analyzer/field_content_production_techniques/embedding_technique/embedding_technique.py
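As a rough illustration of what "word granularity" means, the sketch below maps each token of a text to its own vector, so a text of n tokens yields n vectors. The lookup table `TOY_VECTORS`, the function `embed_words`, and the zero-vector fallback for unknown tokens are all assumptions made for this example, not ClayRS internals.

```python
from typing import Dict, List

# Toy lookup table standing in for a pre-trained model (hypothetical values).
TOY_VECTORS: Dict[str, List[float]] = {
    "good": [0.1, 0.3, 0.5],
    "movie": [0.2, 0.0, 0.4],
}
DIM = 3


def embed_words(text: str) -> List[List[float]]:
    """Return one vector per whitespace-separated token, in order."""
    rows = []
    for token in text.lower().split():
        # Assumption: tokens missing from the table map to a zero vector.
        rows.append(TOY_VECTORS.get(token, [0.0] * DIM))
    return rows


matrix = embed_words("Good unseen movie")
print(len(matrix), "vectors of size", len(matrix[0]))
```

The result keeps one row per word, which is what distinguishes this granularity from sentence or document embeddings, where the whole text collapses into a single vector.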
Word Embedding models
Gensim(model_name='glove-twitter-25')
Bases: WordEmbeddingLoader
Class that produces word embeddings using gensim pre-trained models.
The model will be automatically downloaded using the gensim downloader api if not present locally.
PARAMETER | DESCRIPTION |
---|---|
model_name | Name of the model to load/download |
Source code in clayrs/content_analyzer/embeddings/embedding_loader/gensim.py
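The download-if-missing behaviour described above matches what Gensim's own downloader API provides. The sketch below calls `gensim.downloader.load` directly; the wrapper `load_pretrained` and its `loader` parameter are hypothetical names introduced here so the example can be tried without network access, and they are not part of ClayRS.

```python
from typing import Any, Callable, Optional


def load_pretrained(model_name: str = "glove-twitter-25",
                    loader: Optional[Callable[[str], Any]] = None) -> Any:
    """Load a pre-trained model by name, downloading it on first use."""
    if loader is None:
        # Imported lazily so the function can be defined without gensim installed.
        import gensim.downloader as api
        # api.load fetches the model into the local gensim data folder
        # if it is not already there, then loads it from disk.
        loader = api.load
    return loader(model_name)


# Offline demonstration with a stand-in loader (hypothetical):
fake = load_pretrained("glove-twitter-25", loader=lambda name: {"name": name})
print(fake)
```

With Gensim installed and a network connection, calling `load_pretrained("glove-twitter-25")` without a `loader` should return the pre-trained vectors for that model.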
GensimDoc2Vec(reference=None, auto_save=True, **kwargs)
Bases: GensimWordEmbeddingLearner
Class that implements the Doc2Vec model using the Gensim library.
To load a pre-trained local Doc2Vec model, pass its path in the reference parameter.
Otherwise, a Doc2Vec model will be trained from scratch on the preprocessed corpus of the contents to be represented.
To save the model once trained, set the path in the reference parameter and set auto_save=True. If reference is None, the trained model won't be saved after training and will only be used to produce contents in the current run.
Additional parameters for the model itself can be passed as keyword arguments; check the Gensim documentation to see what else can be customized.
PARAMETER | DESCRIPTION |
---|---|
reference | Path of the model to load, or path where the trained model will be saved if auto_save=True |
auto_save | If True, the model will be saved in the path specified in reference |
Source code in clayrs/content_analyzer/embeddings/embedding_learner/doc2vec.py
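The interplay between reference and auto_save shared by the learner classes can be summarised with a small stand-alone sketch. It does not use ClayRS or Gensim: `train` and `load_or_train` are hypothetical helpers, the "model" is a plain token counter stored as JSON, and the rule "load if a file already exists at reference" is an assumption about how loading is triggered.

```python
import json
import os
import tempfile
from typing import Dict, List, Optional


def train(corpus: List[List[str]]) -> Dict[str, int]:
    """Stand-in for real training: counts token occurrences."""
    model: Dict[str, int] = {}
    for doc in corpus:
        for token in doc:
            model[token] = model.get(token, 0) + 1
    return model


def load_or_train(corpus: List[List[str]],
                  reference: Optional[str] = None,
                  auto_save: bool = True) -> Dict[str, int]:
    # Assumption: a model already stored at `reference` is loaded, not retrained.
    if reference is not None and os.path.isfile(reference):
        with open(reference) as f:
            return json.load(f)
    model = train(corpus)
    # Saving requires both a path and auto_save=True.
    if reference is not None and auto_save:
        with open(reference, "w") as f:
            json.dump(model, f)
    return model


corpus = [["a", "b"], ["a"]]
path = os.path.join(tempfile.mkdtemp(), "toy_model.json")
first = load_or_train(corpus, reference=path)    # trains, then saves to `path`
second = load_or_train([], reference=path)       # finds the file and loads it
unsaved = load_or_train(corpus, reference=None)  # trains, used in this run only
```

The third call mirrors the documented case where reference is None: the model exists only for the current run and nothing is written to disk.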
GensimFastText(reference=None, auto_save=True, **kwargs)
Bases: GensimWordEmbeddingLearner
Class that implements the FastText model using the Gensim library.
To load a pre-trained local FastText model, pass its path in the reference parameter.
Otherwise, a FastText model will be trained from scratch on the preprocessed corpus of the contents to be represented.
To save the model once trained, set the path in the reference parameter and set auto_save=True. If reference is None, the trained model won't be saved after training and will only be used to produce contents in the current run.
Additional parameters for the model itself can be passed as keyword arguments; check the Gensim documentation to see what else can be customized.
PARAMETER | DESCRIPTION |
---|---|
reference | Path of the model to load, or path where the trained model will be saved if auto_save=True |
auto_save | If True, the model will be saved in the path specified in reference |
Source code in clayrs/content_analyzer/embeddings/embedding_learner/fasttext.py
GensimRandomIndexing(reference=None, auto_save=True, **kwargs)
Bases: GensimDocumentEmbeddingLearner
Class that implements the RandomIndexing model using the Gensim library.
To load a pre-trained local RandomIndexing model, pass its path in the reference parameter.
Otherwise, a RandomIndexing model will be trained from scratch on the preprocessed corpus of the contents to be represented.
To save the model once trained, set the path in the reference parameter and set auto_save=True. If reference is None, the trained model won't be saved after training and will only be used to produce contents in the current run.
Additional parameters for the model itself can be passed as keyword arguments; check the Gensim documentation to see what else can be customized.
PARAMETER | DESCRIPTION |
---|---|
reference | Path of the model to load, or path where the trained model will be saved if auto_save=True |
auto_save | If True, the model will be saved in the path specified in reference |
Source code in clayrs/content_analyzer/embeddings/embedding_learner/random_indexing.py
GensimWord2Vec(reference=None, auto_save=True, **kwargs)
Bases: GensimWordEmbeddingLearner
Class that implements the Word2Vec model using the Gensim library.
To load a pre-trained local Word2Vec model, pass its path in the reference parameter.
Otherwise, a Word2Vec model will be trained from scratch on the preprocessed corpus of the contents to be represented.
To save the model once trained, set the path in the reference parameter and set auto_save=True. If reference is None, the trained model won't be saved after training and will only be used to produce contents in the current run.
Additional parameters for the model itself can be passed as keyword arguments; check the Gensim documentation to see what else can be customized.
PARAMETER | DESCRIPTION |
---|---|
reference | Path of the model to load, or path where the trained model will be saved if auto_save=True |
auto_save | If True, the model will be saved in the path specified in reference |
Source code in clayrs/content_analyzer/embeddings/embedding_learner/word2vec.py
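To make the role of `**kwargs` in these signatures more concrete, here is a minimal sketch of how extra keyword arguments might be kept apart from reference and auto_save and handed to the underlying model. `ToyLearner` and `FakeModel` are hypothetical classes written for this example; the actual forwarding logic in ClayRS may differ. The keyword names `vector_size`, `window` and `min_count` are real Gensim 4.x Word2Vec parameters, and the path `w2v_model.bin` is a placeholder.

```python
from typing import Any, Dict, Optional


class FakeModel:
    """Hypothetical stand-in for a Gensim model; it only records its arguments."""

    def __init__(self, **params: Any) -> None:
        self.params = params


class ToyLearner:
    """Hypothetical learner mirroring the (reference, auto_save, **kwargs) signature."""

    def __init__(self, reference: Optional[str] = None,
                 auto_save: bool = True, **kwargs: Any) -> None:
        self.reference = reference
        self.auto_save = auto_save
        # Everything else is treated as a model hyperparameter.
        self.model_kwargs: Dict[str, Any] = kwargs

    def build_model(self) -> FakeModel:
        # Only the extra keyword arguments reach the model constructor.
        return FakeModel(**self.model_kwargs)


learner = ToyLearner(reference="w2v_model.bin", auto_save=False,
                     vector_size=100, window=5, min_count=1)
model = learner.build_model()
print(model.params)
```

Under this reading, a call such as `GensimWord2Vec(reference=..., vector_size=100)` would configure where the model lives through reference and how it is trained through the remaining keyword arguments.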