Combine Embeddings
The classes below let you obtain embeddings of coarser granularity from models that return embeddings of finer granularity (e.g. sentence embeddings from a model that returns word embeddings).
from clayrs import content_analyzer as ca
# obtain sentence embeddings combining token embeddings with a
# centroid technique
ca.Word2SentenceEmbedding(embedding_source=ca.Gensim('glove-twitter-50'),
                          combining_technique=ca.Centroid())
Word2SentenceEmbedding(embedding_source, combining_technique)
Bases: CombiningSentenceEmbeddingTechnique
Class that makes use of a word granularity embedding source to produce sentence embeddings
PARAMETER | DESCRIPTION |
---|---|
embedding_source | Any |
combining_technique | Technique used to combine embeddings of finer granularity (word-level) to obtain embeddings of coarser granularity (sentence-level) |
Source code in clayrs/content_analyzer/field_content_production_techniques/embedding_technique/embedding_technique.py
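To make the word-to-sentence step concrete, here is a minimal NumPy sketch of the idea, not the ClayRS implementation. The `toy_vectors` dictionary, the whitespace tokenization and the `sentence_embedding` helper are all hypothetical names introduced for illustration; a real embedding source would be a pretrained model such as the Gensim one in the example above.

```python
import numpy as np

# Hypothetical toy word vectors (3 dimensions each), standing in for a
# pretrained word-level embedding source.
toy_vectors = {
    "good": np.array([1.0, 0.0, 2.0]),
    "movie": np.array([3.0, 2.0, 0.0]),
}


def centroid(embedding_matrix):
    """Column-wise mean: one value per hidden dimension."""
    return embedding_matrix.mean(axis=0)


def sentence_embedding(sentence, vectors, combine):
    """Stack one row per known token, then reduce the matrix to one vector."""
    rows = [vectors[tok] for tok in sentence.lower().split() if tok in vectors]
    embedding_matrix = np.vstack(rows)  # shape: (n_tokens, n_dims)
    return combine(embedding_matrix)


sent_vec = sentence_embedding("Good movie", toy_vectors, centroid)
```

Passing a different `combine` function changes how the word rows are reduced, which mirrors the role of the `combining_technique` parameter.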
Word2DocEmbedding(embedding_source, combining_technique)
Bases: CombiningDocumentEmbeddingTechnique
Class that makes use of a word granularity embedding source to produce embeddings of document granularity
PARAMETER | DESCRIPTION |
---|---|
embedding_source | Any |
combining_technique | Technique used to combine embeddings of finer granularity (word-level) to obtain embeddings of coarser granularity (doc-level) |
Source code in clayrs/content_analyzer/field_content_production_techniques/embedding_technique/embedding_technique.py
Sentence2DocEmbedding(embedding_source, combining_technique)
Bases: CombiningDocumentEmbeddingTechnique
Class that makes use of a sentence granularity embedding source to produce embeddings of document granularity
PARAMETER | DESCRIPTION |
---|---|
embedding_source | Any |
combining_technique | Technique used to combine embeddings of finer granularity (sentence-level) to obtain embeddings of coarser granularity (doc-level) |
Source code in clayrs/content_analyzer/field_content_production_techniques/embedding_technique/embedding_technique.py
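Combining sentence vectors into a document vector is not always equivalent to combining all word vectors directly. The sketch below illustrates this with plain NumPy and a centroid; the matrices are made-up values, and the per-sentence vectors are computed from word rows only for the sake of the comparison (with a sentence-level source they would come from the model itself).

```python
import numpy as np

# Hypothetical word-level matrices for a two-sentence document
sentence_1 = np.array([[0.0, 0.0], [2.0, 4.0]])  # 2 words
sentence_2 = np.full((4, 2), 6.0)                # 4 words

# sentence -> document: combine one vector per sentence
sentence_vectors = np.vstack([sentence_1.mean(axis=0),
                              sentence_2.mean(axis=0)])
doc_from_sentences = sentence_vectors.mean(axis=0)

# word -> document: combine every word row at once
doc_from_words = np.vstack([sentence_1, sentence_2]).mean(axis=0)
```

With a centroid, the sentence-level route weights every sentence equally, while the word-level route weights every word equally, so longer sentences have more influence in the second case.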
Combining Techniques
Centroid
Bases: CombiningTechnique
This class computes the centroid vector of a matrix.
combine(embedding_matrix)
Calculates the centroid of the input matrix
PARAMETER | DESCRIPTION |
---|---|
embedding_matrix | Two-dimensional NumPy array whose rows are words and whose columns are hidden dimensions; its centroid is computed |
RETURNS | DESCRIPTION |
---|---|
np.ndarray | Centroid vector of the input matrix |
Source code in clayrs/content_analyzer/field_content_production_techniques/embedding_technique/combining_technique.py
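As a rough illustration, the centroid of a matrix can be computed as the column-wise mean of its rows. This is a sketch under that assumption; the `centroid` function below is a hypothetical stand-in, and the library's own `combine` may differ in details such as how missing values are handled.

```python
import numpy as np


def centroid(embedding_matrix: np.ndarray) -> np.ndarray:
    """Average over rows (axis 0), leaving one value per hidden dimension."""
    return embedding_matrix.mean(axis=0)


# 3 words, 2 hidden dimensions (made-up values)
embedding_matrix = np.array([[1.0, 2.0],
                             [3.0, 4.0],
                             [5.0, 6.0]])
centroid_vector = centroid(embedding_matrix)
```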
Sum
Bases: CombiningTechnique
This class computes the sum vector of a matrix.
combine(embedding_matrix)
Calculates the sum vector of the input matrix
PARAMETER | DESCRIPTION |
---|---|
embedding_matrix | Two-dimensional NumPy array whose rows are words and whose columns are hidden dimensions; its sum vector is computed |
RETURNS | DESCRIPTION |
---|---|
np.ndarray | Sum vector of the input matrix |
Source code in clayrs/content_analyzer/field_content_production_techniques/embedding_technique/combining_technique.py
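A sum vector can be sketched in NumPy as a column-wise sum over the rows. The snippet below is an illustration with made-up values, not the library's code; it also checks how the sum relates to the centroid of the same matrix.

```python
import numpy as np

# 3 words, 2 hidden dimensions (made-up values)
embedding_matrix = np.array([[1.0, 2.0],
                             [3.0, 4.0],
                             [5.0, 6.0]])

# Add the rows together, leaving one value per hidden dimension
sum_vector = embedding_matrix.sum(axis=0)

# The sum equals the column-wise mean scaled by the number of rows
n_rows = embedding_matrix.shape[0]
centroid_vector = embedding_matrix.mean(axis=0)
```

Because the sum is the centroid multiplied by the number of rows, the two vectors point in the same direction and differ only in magnitude. The choice matters for magnitude-sensitive measures such as Euclidean distance, but not for cosine similarity.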
SingleToken(token_index)
Bases: CombiningTechnique
Class which takes a specific row as representative of the whole matrix
PARAMETER | DESCRIPTION |
---|---|
token_index | Index of the row of the matrix to take |
Source code in clayrs/content_analyzer/field_content_production_techniques/embedding_technique/combining_technique.py
combine(embedding_matrix)
Takes the row with index token_index (set in the constructor) from the input embedding_matrix
PARAMETER | DESCRIPTION |
---|---|
embedding_matrix | Two-dimensional NumPy array whose rows are words and whose columns are hidden dimensions; the single token is extracted from it |
RETURNS | DESCRIPTION |
---|---|
np.ndarray | Single row as representative of the whole matrix |
RAISES | DESCRIPTION |
---|---|
IndexError | Exception raised when |
Source code in clayrs/content_analyzer/field_content_production_techniques/embedding_technique/combining_technique.py
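Selecting a single row can be sketched with ordinary NumPy indexing. The `take_row` helper is hypothetical and only mimics the behaviour described above; the example also shows that NumPy itself raises `IndexError` when the requested row lies outside the matrix, which is assumed here to be the kind of situation the exception covers.

```python
import numpy as np


def take_row(embedding_matrix: np.ndarray, token_index: int) -> np.ndarray:
    """Return the row at token_index as the representative vector."""
    return embedding_matrix[token_index]


# 3 words, 2 hidden dimensions (made-up values)
embedding_matrix = np.array([[1.0, 2.0],
                             [3.0, 4.0],
                             [5.0, 6.0]])
first_row = take_row(embedding_matrix, 0)

# Row 3 does not exist in a 3-row matrix, so NumPy raises IndexError
try:
    take_row(embedding_matrix, 3)
    out_of_range_raised = False
except IndexError:
    out_of_range_raised = True
```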