Sentence Embeddings

With the following technique you can obtain embeddings at sentence granularity:

from clayrs import content_analyzer as ca

# obtain sentence embeddings using the pre-trained model
# 'paraphrase-distilroberta-base-v1' from the SBERT library
ca.SentenceEmbeddingTechnique(embedding_source=ca.Sbert('paraphrase-distilroberta-base-v1'))

SentenceEmbeddingTechnique(embedding_source)

Bases: StandardEmbeddingTechnique

Class that makes use of a sentence-granularity embedding source to produce sentence embeddings.

PARAMETER DESCRIPTION
embedding_source

Any SentenceEmbedding model

TYPE: Union[SentenceEmbeddingLoader, SentenceEmbeddingLearner, str]

Source code in clayrs/content_analyzer/field_content_production_techniques/embedding_technique/embedding_technique.py
def __init__(self, embedding_source: Union[SentenceEmbeddingLoader, SentenceEmbeddingLearner, str]):
    # if isinstance(embedding_source, str):
    #     embedding_source = self.from_str_to_embedding_source(embedding_source, SentenceEmbeddingLoader)
    super().__init__(embedding_source)
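
As a quick usage sketch (assuming BertTransformers is exposed under the same ca namespace as Sbert, as the listing in the next section suggests), the technique accepts any of the sentence embedding sources documented below:

from clayrs import content_analyzer as ca

# sentence embeddings backed by an SBERT model
ca.SentenceEmbeddingTechnique(embedding_source=ca.Sbert('paraphrase-distilroberta-base-v1'))

# the same technique, backed by a BERT model from Hugging Face
ca.SentenceEmbeddingTechnique(embedding_source=ca.BertTransformers('bert-base-uncased'))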

Sentence Embedding models

BertTransformers(model_name='bert-base-uncased', vec_strategy=CatStrategy(1), pooling_strategy=Centroid())

Bases: Transformers

Class that produces sentence/token embeddings using any BERT model from Hugging Face.

PARAMETER DESCRIPTION
model_name

Name of the embeddings model to download or path where the model is stored locally

TYPE: str DEFAULT: 'bert-base-uncased'

vec_strategy

Strategy which will be used to combine each output layer to obtain a single one

TYPE: VectorStrategy DEFAULT: CatStrategy(1)

pooling_strategy

Strategy which will be used to combine the embedding representation of each token into a single one, representing the embedding of the whole sentence

TYPE: CombiningTechnique DEFAULT: Centroid()

Source code in clayrs/content_analyzer/embeddings/embedding_loader/transformer.py
def __init__(self, model_name: str = 'bert-base-uncased',
             vec_strategy: VectorStrategy = CatStrategy(1),
             pooling_strategy: CombiningTechnique = Centroid()):
    super().__init__(model_name, vec_strategy, pooling_strategy)
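
To make the role of the two strategies concrete, here is a minimal sketch; it assumes CatStrategy and Centroid are importable from the ca namespace and that CatStrategy(4) combines the last 4 output layers:

from clayrs import content_analyzer as ca

# concatenate the last 4 output layers for each token, then average the
# token vectors (Centroid) to obtain a single sentence embedding
bert_source = ca.BertTransformers(model_name='bert-base-uncased',
                                  vec_strategy=ca.CatStrategy(4),
                                  pooling_strategy=ca.Centroid())

ca.SentenceEmbeddingTechnique(embedding_source=bert_source)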

Sbert(model_name_or_file_path='paraphrase-distilroberta-base-v1')

Bases: SentenceEmbeddingLoader

Class that produces sentence embeddings using SBERT.

The model will be automatically downloaded if not present locally.

PARAMETER DESCRIPTION
model_name_or_file_path

Name of the model to download or path where the model is stored locally

TYPE: str DEFAULT: 'paraphrase-distilroberta-base-v1'

Source code in clayrs/content_analyzer/embeddings/embedding_loader/sbert.py
def __init__(self, model_name_or_file_path: str = 'paraphrase-distilroberta-base-v1'):
    super().__init__(model_name_or_file_path)
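
As a minimal sketch, the parameter can be either a model name to download or a local path (the path below is purely illustrative):

from clayrs import content_analyzer as ca

# download (if not already cached) an SBERT model by name
ca.SentenceEmbeddingTechnique(embedding_source=ca.Sbert('paraphrase-distilroberta-base-v1'))

# or load a compatible model already stored on disk (hypothetical path)
ca.SentenceEmbeddingTechnique(embedding_source=ca.Sbert('path/to/local/sbert_model'))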

T5Transformers(model_name='t5-small', vec_strategy=CatStrategy(1), pooling_strategy=Centroid())

Bases: Transformers

Class that produces sentence/token embeddings using any T5 model from Hugging Face.

PARAMETER DESCRIPTION
model_name

Name of the embeddings model to download or path where the model is stored locally

TYPE: str DEFAULT: 't5-small'

vec_strategy

Strategy which will be used to combine each output layer to obtain a single one

TYPE: VectorStrategy DEFAULT: CatStrategy(1)

pooling_strategy

Strategy which will be used to combine the embedding representation of each token into a single one, representing the embedding of the whole sentence

TYPE: CombiningTechnique DEFAULT: Centroid()

Source code in clayrs/content_analyzer/embeddings/embedding_loader/transformer.py
def __init__(self, model_name: str = 't5-small',
             vec_strategy: VectorStrategy = CatStrategy(1),
             pooling_strategy: CombiningTechnique = Centroid()):
    super().__init__(model_name, vec_strategy, pooling_strategy)
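
A minimal sketch with the defaults spelled out (again assuming CatStrategy and Centroid are exposed under the ca namespace):

from clayrs import content_analyzer as ca

# T5-based sentence source using the default layer and pooling strategies
t5_source = ca.T5Transformers(model_name='t5-small',
                              vec_strategy=ca.CatStrategy(1),
                              pooling_strategy=ca.Centroid())

ca.SentenceEmbeddingTechnique(embedding_source=t5_source)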