Contextualized Embeddings
The following shows how to obtain embeddings of finer granularity from models that are also able to return embeddings of coarser granularity (e.g. word embeddings from a model that also returns sentence embeddings).
For now, only models working at the sentence and token level are implemented.
from clayrs import content_analyzer as ca
# obtain word (token) embeddings from a model that also
# returns sentence embeddings
ca.Sentence2WordEmbedding(embedding_source=ca.BertTransformers('bert-base-uncased'))
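For context, the technique is passed to the content analyzer like any other field production technique. The sketch below assumes the usual ClayRS workflow (`JSONFile`, `ItemAnalyzerConfig`, `FieldConfig`, `ContentAnalyzer`); the file name, id and field name are placeholders for your own dataset.

```python
from clayrs import content_analyzer as ca

# Sketch only: 'items_info.json', 'item_id' and 'plot' are placeholders
config = ca.ItemAnalyzerConfig(
    source=ca.JSONFile('items_info.json'),
    id='item_id',
    output_directory='items_codified/'
)

# Represent the 'plot' field with token-level contextualized embeddings
# obtained from a BERT source that also works at sentence granularity
config.add_single_config(
    'plot',
    ca.FieldConfig(
        ca.Sentence2WordEmbedding(
            embedding_source=ca.BertTransformers('bert-base-uncased')
        )
    )
)

ca.ContentAnalyzer(config).fit()
```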
Sentence2WordEmbedding(embedding_source)
Bases: DecombiningInWordsEmbeddingTechnique
Class that makes use of a sentence granularity embedding source to produce an embedding matrix with word granularity.
Source code in clayrs/content_analyzer/field_content_production_techniques/embedding_technique/embedding_technique.py
produce_single_repr(field_data)
Produces a single matrix where each row is the embedding representation of a token of the sentence, and each column corresponds to a dimension of the chosen model's hidden state.
| PARAMETER | DESCRIPTION |
|---|---|
| `field_data` | Textual data to complexly represent |
| RETURNS | DESCRIPTION |
|---|---|
| `EmbeddingField` | Embedding for each token of the sentence |
Source code in clayrs/content_analyzer/field_content_production_techniques/embedding_technique/embedding_technique.py
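As a rough illustration of the returned shape, the sketch below assumes the method can be called directly and that the resulting `EmbeddingField` exposes its matrix via a `.value` attribute; in practice the framework invokes this method internally for each field.

```python
from clayrs import content_analyzer as ca

technique = ca.Sentence2WordEmbedding(
    embedding_source=ca.BertTransformers('bert-base-uncased')
)

# Assumed direct call for illustration: one row per token of the sentence,
# one column per hidden dimension (768 for bert-base-uncased)
emb_field = technique.produce_single_repr("the fellowship of the ring")
print(emb_field.value.shape)  # e.g. (n_tokens, 768); .value is assumed here
```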
Models able to return sentence and token embeddings
BertTransformers(model_name='bert-base-uncased', vec_strategy=CatStrategy(1), pooling_strategy=Centroid())
Bases: Transformers
Class that produces sentence/token embeddings using any BERT model from Hugging Face.
| PARAMETER | DESCRIPTION |
|---|---|
| `model_name` | Name of the embeddings model to download or path where the model is stored locally |
| `vec_strategy` | Strategy which will be used to combine each output layer to obtain a single one |
| `pooling_strategy` | Strategy which will be used to combine the embedding representation of each token into a single one, representing the embedding of the whole sentence |
Source code in clayrs/content_analyzer/embeddings/embedding_loader/transformer.py
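A sketch of constructing the BERT source with the default strategies from the signature spelled out; whether `CatStrategy` and `Centroid` are exposed under the `ca` namespace is an assumption here.

```python
from clayrs import content_analyzer as ca

# Sketch: BertTransformers with its defaults made explicit.
# CatStrategy(1) combines the output layers (here, the last one);
# Centroid() averages token embeddings when a whole-sentence vector is needed.
bert_source = ca.BertTransformers(
    model_name='bert-base-uncased',
    vec_strategy=ca.CatStrategy(1),    # assumed exposed via `ca`
    pooling_strategy=ca.Centroid()     # assumed exposed via `ca`
)

# Use it as the sentence-granularity source for token-level embeddings
ca.Sentence2WordEmbedding(embedding_source=bert_source)
```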
T5Transformers(model_name='t5-small', vec_strategy=CatStrategy(1), pooling_strategy=Centroid())
Bases: Transformers
Class that produces sentence/token embeddings using any T5 model from Hugging Face.
| PARAMETER | DESCRIPTION |
|---|---|
| `model_name` | Name of the embeddings model to download or path where the model is stored locally |
| `vec_strategy` | Strategy which will be used to combine each output layer to obtain a single one |
| `pooling_strategy` | Strategy which will be used to combine the embedding representation of each token into a single one, representing the embedding of the whole sentence |
Source code in clayrs/content_analyzer/embeddings/embedding_loader/transformer.py
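The same pattern works with the T5 source ('t5-small' is the default from the signature above); this is a sketch, not a prescribed configuration.

```python
from clayrs import content_analyzer as ca

# Sketch: token-level embeddings from a T5 source instead of BERT
ca.Sentence2WordEmbedding(
    embedding_source=ca.T5Transformers('t5-small')
)
```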