Document Embeddings
The following snippet shows how to obtain embeddings at document granularity:
from clayrs import content_analyzer as ca
# obtain document embeddings by training an LDA model
# on the corpus of contents to complexly represent
ca.DocumentEmbeddingTechnique(embedding_source=ca.GensimLDA())
DocumentEmbeddingTechnique(embedding_source)
Bases: StandardEmbeddingTechnique
Class that uses a document granularity embedding source to produce document embeddings.
PARAMETER | DESCRIPTION |
---|---|
embedding_source | Any document granularity embedding source. |
Source code in clayrs/content_analyzer/field_content_production_techniques/embedding_technique/embedding_technique.py
Document Embedding models
GensimLatentSemanticAnalysis(reference=None, auto_save=True, **kwargs)
Bases: GensimDocumentEmbeddingLearner
Class that implements Latent Semantic Analysis (LSA), also known as Latent Semantic Indexing (LSI), using the Gensim library.
To load a pre-trained local LSA model, pass its path in the reference parameter. Otherwise, an LSA model is trained from scratch on the preprocessed corpus of the contents to complexly represent.
To save the model once trained, set the path in the reference parameter and set auto_save=True. If reference is None, the trained model is not saved after training and is only used to produce contents in the current run.
Additional parameters for the model itself can be passed through **kwargs; check the Gensim documentation to see what else can be customized.
PARAMETER | DESCRIPTION |
---|---|
reference | Path of the model to load, or where the trained model will be saved if auto_save=True. Default: None. |
auto_save | If True, the model will be saved in the path specified in the reference parameter. Default: True. |
Source code in clayrs/content_analyzer/embeddings/embedding_learner/latent_semantic_analysis.py
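The interplay between reference and auto_save described above can be sketched in plain Python. This is only one plausible reading of the description, not ClayRS code: ToyDocumentLearner, its fit method, the loaded_from_disk flag, and the JSON file format are all hypothetical, and "training" here is just token counting.

```python
import json
import os
import tempfile
from typing import Dict, List, Optional


class ToyDocumentLearner:
    """Hypothetical stand-in for a document embedding learner (not ClayRS code)."""

    def __init__(self, reference: Optional[str] = None, auto_save: bool = True):
        self.reference = reference
        self.auto_save = auto_save
        self.model: Optional[Dict[str, int]] = None
        self.loaded_from_disk = False

    def fit(self, corpus: List[List[str]]) -> None:
        # Case 1: a model already exists at `reference`, so load it.
        if self.reference is not None and os.path.isfile(self.reference):
            with open(self.reference, encoding="utf-8") as f:
                self.model = json.load(f)
            self.loaded_from_disk = True
            return

        # Case 2: train from scratch (assumption: counting tokens stands in
        # for fitting a real LSA or LDA model).
        counts: Dict[str, int] = {}
        for doc in corpus:
            for token in doc:
                counts[token] = counts.get(token, 0) + 1
        self.model = counts

        # Save only when a path is given and auto_save is enabled.
        if self.reference is not None and self.auto_save:
            with open(self.reference, "w", encoding="utf-8") as f:
                json.dump(self.model, f)


corpus = [["space", "ship"], ["space", "station"]]
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "toy_model.json")

    trainer = ToyDocumentLearner(reference=model_path, auto_save=True)
    trainer.fit(corpus)  # trains, then saves to model_path
    saved = os.path.isfile(model_path)

    loader = ToyDocumentLearner(reference=model_path)
    loader.fit([])  # finds the saved model and loads it instead of training

    ephemeral = ToyDocumentLearner(reference=None)
    ephemeral.fit(corpus)  # trains, nothing is written to disk
```

In this sketch a learner with reference=None always retrains, which mirrors the statement that such a model is only used in the current run.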
GensimLDA(reference=None, auto_save=True, **kwargs)
Bases: GensimDocumentEmbeddingLearner
Class that implements Latent Dirichlet Allocation (LDA) using the Gensim library.
To load a pre-trained local LDA model, pass its path in the reference parameter. Otherwise, an LDA model is trained from scratch on the preprocessed corpus of the contents to complexly represent.
To save the model once trained, set the path in the reference parameter and set auto_save=True. If reference is None, the trained model is not saved after training and is only used to produce contents in the current run.
Additional parameters for the model itself can be passed through **kwargs; check the Gensim documentation to see what else can be customized.
PARAMETER | DESCRIPTION |
---|---|
reference | Path of the model to load, or where the trained model will be saved if auto_save=True. Default: None. |
auto_save | If True, the model will be saved in the path specified in the reference parameter. Default: True. |
Source code in clayrs/content_analyzer/embeddings/embedding_learner/lda.py
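To make concrete what a document granularity embedding from LDA looks like, here is a minimal, dependency-free sketch that yields one topic-proportion vector per document. It uses a toy collapsed Gibbs sampler, which is a swapped-in technique chosen for brevity and not how Gensim trains its model. The function name lda_document_embeddings and all of its parameters are assumptions made for illustration and are not part of ClayRS or Gensim.

```python
import random
from typing import List


def lda_document_embeddings(
    docs: List[List[str]],
    num_topics: int = 2,
    alpha: float = 0.1,
    beta: float = 0.01,
    iterations: int = 200,
    seed: int = 0,
) -> List[List[float]]:
    """Toy collapsed Gibbs sampler for LDA; returns one topic vector per document."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]  # tokens of doc d in topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # word w in topic k
    topic_total = [0] * num_topics  # all tokens in topic k
    assignments: List[List[int]] = []

    # Give every token a random initial topic.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k_old = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k_old] -= 1
                topic_word[k_old][wid] -= 1
                topic_total[k_old] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                k_new = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k_new
                doc_topic[d][k_new] += 1
                topic_word[k_new][wid] += 1
                topic_total[k_new] += 1

    # Smoothed topic proportions act as the document-level embedding.
    return [
        [
            (doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha)
            for k in range(num_topics)
        ]
        for d, doc in enumerate(docs)
    ]


corpus = [
    ["rocket", "orbit", "rocket", "launch"],
    ["orbit", "launch", "rocket"],
    ["pasta", "sauce", "pasta", "basil"],
    ["basil", "sauce", "pasta"],
]
embeddings = lda_document_embeddings(corpus, num_topics=2)
```

Each entry of embeddings has length num_topics and sums to 1, so every document is represented by a single fixed-size vector regardless of how many tokens it contains.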