Combine Embeddings
The classes below let you obtain embeddings of coarser granularity from models that return embeddings of finer granularity (e.g. sentence embeddings from a model that returns word embeddings).
from clayrs import content_analyzer as ca
# obtain sentence embeddings combining token embeddings with a
# centroid technique
ca.Word2SentenceEmbedding(embedding_source=ca.Gensim('glove-twitter-50'),
                          combining_technique=ca.Centroid())
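To make the idea concrete, here is a minimal, library-independent sketch of what combining word embeddings into a sentence embedding amounts to. It assumes a centroid (column-wise mean) combination; `toy_word_vectors` and `sentence_embedding` are hypothetical names with made-up values, and none of this is ClayRS's actual implementation.

```python
import numpy as np

# Hypothetical toy vocabulary: each word maps to a 3-dimensional vector.
# A real embedding source (e.g. a pretrained model) would supply these instead.
toy_word_vectors = {
    "good": np.array([1.0, 0.0, 2.0]),
    "movie": np.array([3.0, 2.0, 0.0]),
}


def sentence_embedding(sentence: str) -> np.ndarray:
    """Combine the word vectors of a sentence into one vector via their centroid."""
    tokens = sentence.lower().split()
    # Stack the word vectors: rows are words, columns are embedding dimensions
    matrix = np.vstack([toy_word_vectors[token] for token in tokens])
    # Centroid: average over the rows, one value per dimension
    return matrix.mean(axis=0)


sentence_vector = sentence_embedding("Good movie")
print(sentence_vector.tolist())  # [2.0, 1.0, 1.0]
```

The sentence vector has the same dimensionality as the word vectors, regardless of how many words the sentence contains.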
Word2SentenceEmbedding(embedding_source, combining_technique)
Bases: CombiningSentenceEmbeddingTechnique
Class that makes use of a word granularity embedding source to produce sentence embeddings
| PARAMETER | DESCRIPTION |
|---|---|
| embedding_source | Any word-granularity embedding source |
| combining_technique | Technique used to combine embeddings of finer granularity (word-level) to obtain embeddings of coarser granularity (sentence-level) |
Source code in clayrs/content_analyzer/field_content_production_techniques/embedding_technique/embedding_technique.py
Word2DocEmbedding(embedding_source, combining_technique)
Bases: CombiningDocumentEmbeddingTechnique
Class that makes use of a word granularity embedding source to produce embeddings of document granularity
| PARAMETER | DESCRIPTION |
|---|---|
| embedding_source | Any word-granularity embedding source |
| combining_technique | Technique used to combine embeddings of finer granularity (word-level) to obtain embeddings of coarser granularity (doc-level) |
Source code in clayrs/content_analyzer/field_content_production_techniques/embedding_technique/embedding_technique.py
Sentence2DocEmbedding(embedding_source, combining_technique)
Bases: CombiningDocumentEmbeddingTechnique
Class that makes use of a sentence granularity embedding source to produce embeddings of document granularity
| PARAMETER | DESCRIPTION |
|---|---|
| embedding_source | Any sentence-granularity embedding source |
| combining_technique | Technique used to combine embeddings of finer granularity (sentence-level) to obtain embeddings of coarser granularity (doc-level) |
Source code in clayrs/content_analyzer/field_content_production_techniques/embedding_technique/embedding_technique.py
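The same combination step applies one level up, where each row of the matrix is a sentence embedding rather than a word embedding. The sketch below illustrates how the combining technique can be swapped independently of the embedding source; `doc_embedding` is a hypothetical helper with made-up input values, not part of ClayRS.

```python
from typing import Callable

import numpy as np


def doc_embedding(sentence_matrix: np.ndarray,
                  combine: Callable[[np.ndarray], np.ndarray]) -> np.ndarray:
    """Hypothetical helper: reduce a (sentences x dimensions) matrix to one document vector."""
    return combine(sentence_matrix)


# Two sentence embeddings of dimension 3 (made-up values)
sentence_matrix = np.array([[1.0, 0.0, 2.0],
                            [3.0, 2.0, 0.0]])

# Same input, two different combining strategies
doc_by_centroid = doc_embedding(sentence_matrix, lambda m: m.mean(axis=0))
doc_by_sum = doc_embedding(sentence_matrix, lambda m: m.sum(axis=0))

print(doc_by_centroid.tolist())  # [2.0, 1.0, 1.0]
print(doc_by_sum.tolist())       # [4.0, 2.0, 2.0]
```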
Combining Techniques
Centroid
Bases: CombiningTechnique
This class computes the centroid vector of a matrix.
combine(embedding_matrix)
Calculates the centroid of the input matrix
| PARAMETER | DESCRIPTION |
|---|---|
| embedding_matrix | Two-dimensional NumPy array, where rows are words and columns are hidden dimensions, whose centroid will be calculated |

| RETURNS | DESCRIPTION |
|---|---|
| np.ndarray | Centroid vector of the input matrix |
Source code in clayrs/content_analyzer/field_content_production_techniques/embedding_technique/combining_technique.py
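As a rough illustration of the operation described above, the following NumPy sketch assumes the centroid is the column-wise arithmetic mean of the rows. The input values are made up, and ClayRS's own implementation may differ in details (for example, in how missing values are handled).

```python
import numpy as np

# Rows are words, columns are hidden dimensions (3 words, 2 dimensions)
embedding_matrix = np.array([[1.0, 2.0],
                             [3.0, 4.0],
                             [5.0, 6.0]])

# Average over the rows: one value per hidden dimension
centroid = embedding_matrix.mean(axis=0)

print(centroid.tolist())  # [3.0, 4.0]
```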
Sum
Bases: CombiningTechnique
This class computes the sum vector of a matrix.
combine(embedding_matrix)
Calculates the sum vector of the input matrix
| PARAMETER | DESCRIPTION |
|---|---|
| embedding_matrix | Two-dimensional NumPy array, where rows are words and columns are hidden dimensions, whose sum vector will be calculated |

| RETURNS | DESCRIPTION |
|---|---|
| np.ndarray | Sum vector of the input matrix |
Source code in clayrs/content_analyzer/field_content_production_techniques/embedding_technique/combining_technique.py
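The sum vector can be sketched in NumPy as a column-wise sum over the rows. This is a minimal illustration with made-up values, assuming a plain element-wise sum; it is not taken from the ClayRS source.

```python
import numpy as np

# Rows are words, columns are hidden dimensions (3 words, 2 dimensions)
embedding_matrix = np.array([[1.0, 2.0],
                             [3.0, 4.0],
                             [5.0, 6.0]])

# Add up the rows: one value per hidden dimension
sum_vector = embedding_matrix.sum(axis=0)

print(sum_vector.tolist())  # [9.0, 12.0]
```

Unlike the centroid, the magnitude of the sum vector grows with the number of rows, so longer texts yield larger vectors.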
SingleToken(token_index)
Bases: CombiningTechnique
Class which takes a specific row as representative of the whole matrix
| PARAMETER | DESCRIPTION |
|---|---|
| token_index | Index of the row of the matrix to take |
Source code in clayrs/content_analyzer/field_content_production_techniques/embedding_technique/combining_technique.py
combine(embedding_matrix)
Takes the row with index token_index (set in the constructor) from the input embedding_matrix
| PARAMETER | DESCRIPTION |
|---|---|
| embedding_matrix | Two-dimensional NumPy array, where rows are words and columns are hidden dimensions, from which the single token will be extracted |

| RETURNS | DESCRIPTION |
|---|---|
| np.ndarray | Single row as representative of the whole matrix |

| RAISES | DESCRIPTION |
|---|---|
| IndexError | Raised when token_index is out of bounds for the input matrix |
Source code in clayrs/content_analyzer/field_content_production_techniques/embedding_technique/combining_technique.py
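A minimal sketch of the row selection and its failure mode, using plain NumPy indexing. `take_row` is a hypothetical stand-in for the behaviour described above, not the ClayRS method itself, and the input values are made up.

```python
import numpy as np


def take_row(embedding_matrix: np.ndarray, token_index: int) -> np.ndarray:
    """Hypothetical helper: return one row as representative of the whole matrix."""
    # NumPy raises IndexError if token_index is out of bounds for the rows
    return embedding_matrix[token_index]


# Rows are words, columns are hidden dimensions (3 words, 2 dimensions)
embedding_matrix = np.array([[0.1, 0.2],
                             [0.3, 0.4],
                             [0.5, 0.6]])

first_row = take_row(embedding_matrix, 0)
print(first_row.tolist())  # [0.1, 0.2]

try:
    take_row(embedding_matrix, 3)  # only rows 0..2 exist
    out_of_bounds = False
except IndexError:
    out_of_bounds = True
```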