Sentence Embeddings
Use the following to obtain sentence-granularity embeddings:
from clayrs import content_analyzer as ca
# obtain sentence embeddings using the pre-trained model
# 'paraphrase-distilroberta-base-v1' from the SBERT library
ca.SentenceEmbeddingTechnique(embedding_source=ca.Sbert('paraphrase-distilroberta-base-v1'))
SentenceEmbeddingTechnique(embedding_source)
Bases: StandardEmbeddingTechnique
Class that makes use of a sentence granularity embedding source to produce sentence embeddings
PARAMETER | DESCRIPTION |
---|---|
embedding_source | Any sentence-granularity embedding source, such as ca.Sbert |
Source code in clayrs/content_analyzer/field_content_production_techniques/embedding_technique/embedding_technique.py
Sentence Embedding models
BertTransformers(model_name='bert-base-uncased', vec_strategy=CatStrategy(1), pooling_strategy=Centroid())
Bases: Transformers
Class that produces sentence/token embeddings using any BERT model from Hugging Face.
PARAMETER | DESCRIPTION |
---|---|
model_name | Name of the embeddings model to download or path where the model is stored locally |
vec_strategy | Strategy which will be used to combine each output layer to obtain a single one |
pooling_strategy | Strategy which will be used to combine the embedding representation of each token into a single one, representing the embedding of the whole sentence |
Source code in clayrs/content_analyzer/embeddings/embedding_loader/transformer.py
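The roles of vec_strategy and pooling_strategy can be illustrated with a minimal plain-Python sketch. It is not the ClayRS implementation: the function names `concat_last_layers` and `centroid` are hypothetical, and it assumes that a concatenation strategy such as CatStrategy(n) joins the last n output layers of each token, and that Centroid() takes the element-wise mean over the token vectors.

```python
from typing import List

Vector = List[float]


def concat_last_layers(layers: List[List[Vector]], n: int) -> List[Vector]:
    """Hypothetical stand-in for a concatenation vec_strategy.

    layers[l][t] is the vector of token t at output layer l.
    For each token, the vectors from the last n layers are concatenated.
    """
    if not 1 <= n <= len(layers):
        raise ValueError("n must be between 1 and the number of layers")
    last = layers[-n:]
    num_tokens = len(last[0])
    return [
        [value for layer in last for value in layer[t]]
        for t in range(num_tokens)
    ]


def centroid(token_vectors: List[Vector]) -> Vector:
    """Hypothetical stand-in for centroid pooling: element-wise mean over tokens."""
    if not token_vectors:
        raise ValueError("at least one token vector is required")
    count = len(token_vectors)
    return [sum(column) / count for column in zip(*token_vectors)]


# Toy model output: 2 layers, 3 tokens, 2 dimensions per token.
layers = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],  # earlier layer
    [[2.0, 4.0], [4.0, 2.0], [0.0, 0.0]],  # last layer
]

tokens_last = concat_last_layers(layers, n=1)      # one 2-d vector per token
sentence_vec = centroid(tokens_last)               # one 2-d sentence vector
tokens_last_two = concat_last_layers(layers, n=2)  # one 4-d vector per token
```

Under these assumptions, the first step fixes the dimensionality of each token vector (concatenating more layers gives longer vectors), while the second step collapses the variable number of tokens into one fixed-size sentence embedding.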
Sbert(model_name_or_file_path='paraphrase-distilroberta-base-v1')
Bases: SentenceEmbeddingLoader
Class that produces sentence embeddings using SBERT.
The model is downloaded automatically if it is not present locally.
PARAMETER | DESCRIPTION |
---|---|
model_name_or_file_path | Name of the model to download or path where the model is stored locally |
Source code in clayrs/content_analyzer/embeddings/embedding_loader/sbert.py
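Because model_name_or_file_path accepts either a model name or a local path, it can help to check in advance which of the two a given string would be. The sketch below is only an illustration of that distinction: `describe_model_source` is a hypothetical helper, not part of ClayRS or SBERT, and it assumes that a string naming an existing filesystem path is treated as a local model while anything else is treated as a model name to download.

```python
import os
import tempfile


def describe_model_source(model_name_or_file_path: str) -> str:
    """Hypothetical helper: report how the argument would likely be interpreted.

    Returns "local" when the string points at an existing path,
    and "download" otherwise (assumed to be a model name).
    """
    if os.path.exists(model_name_or_file_path):
        return "local"
    return "download"


# A directory that exists on disk is classified as a local model path.
with tempfile.TemporaryDirectory() as model_dir:
    local_result = describe_model_source(model_dir)

# A path that does not exist is classified as a model name to download.
missing_path = os.path.join(tempfile.gettempdir(), "no-such-model-dir-for-sketch")
download_result = describe_model_source(missing_path)
```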
T5Transformers(model_name='t5-small', vec_strategy=CatStrategy(1), pooling_strategy=Centroid())
Bases: Transformers
Class that produces sentence/token embeddings using a T5 model.
PARAMETER | DESCRIPTION |
---|---|
model_name | Name of the embeddings model to download or path where the model is stored locally |
vec_strategy | Strategy which will be used to combine each output layer to obtain a single one |
pooling_strategy | Strategy which will be used to combine the embedding representation of each token into a single one, representing the embedding of the whole sentence |
Source code in clayrs/content_analyzer/embeddings/embedding_loader/transformer.py