Evolution of semantic similarity—a survey

D Chandrasekaran, V Mago - ACM Computing Surveys (CSUR), 2021 - dl.acm.org
Estimating the semantic similarity between text data is one of the challenging and open
research problems in the field of Natural Language Processing (NLP). The versatility of …

C-Pack: Packed resources for general Chinese embeddings

S Xiao, Z Liu, P Zhang, N Muennighoff, D Lian… - Proceedings of the 47th …, 2024 - dl.acm.org
We introduce C-Pack, a package of resources that significantly advances the field of general
text embeddings for Chinese. C-Pack includes three critical resources. 1) C-MTP is a …

MTEB: Massive text embedding benchmark

N Muennighoff, N Tazi, L Magne, N Reimers - arXiv preprint arXiv …, 2022 - arxiv.org
Text embeddings are commonly evaluated on a small set of datasets from a single task not
covering their possible applications to other tasks. It is unclear whether state-of-the-art …

AnglE-optimized text embeddings

X Li, J Li - arXiv preprint arXiv:2309.12871, 2023 - arxiv.org
High-quality text embedding is pivotal in improving semantic textual similarity (STS) tasks,
which are crucial components in Large Language Model (LLM) applications. However, a …

The 2019 n2c2/OHNLP track on clinical semantic textual similarity: overview

Y Wang, S Fu, F Shen, S Henry… - JMIR medical …, 2020 - medinform.jmir.org
Background: Semantic textual similarity is a common task in the general English domain to
assess the degree to which the underlying semantics of 2 text segments are equivalent to …

Sentence-T5: Scalable sentence encoders from pre-trained text-to-text models

J Ni, GH Abrego, N Constant, J Ma, KB Hall… - arXiv preprint arXiv …, 2021 - arxiv.org
We provide the first exploration of sentence embeddings from text-to-text transformers (T5).
Sentence embeddings are broadly useful for language processing tasks. While T5 achieves …

SimCSE: Simple contrastive learning of sentence embeddings

T Gao, X Yao, D Chen - arXiv preprint arXiv:2104.08821, 2021 - arxiv.org
This paper presents SimCSE, a simple contrastive learning framework that greatly advances
state-of-the-art sentence embeddings. We first describe an unsupervised approach, which …

ConSERT: A contrastive framework for self-supervised sentence representation transfer

Y Yan, R Li, S Wang, F Zhang, W Wu, W Xu - arXiv preprint arXiv …, 2021 - arxiv.org
Learning high-quality sentence representations benefits a wide range of natural language
processing tasks. Though BERT-based pre-trained language models achieve high …

On the sentence embeddings from pre-trained language models

B Li, H Zhou, J He, M Wang, Y Yang, L Li - arXiv preprint arXiv:2011.05864, 2020 - arxiv.org
Pre-trained contextual representations like BERT have achieved great success in natural
language processing. However, the sentence embeddings from the pre-trained language …

Whitening sentence representations for better semantics and faster retrieval

J Su, J Cao, W Liu, Y Ou - arXiv preprint arXiv:2103.15316, 2021 - arxiv.org
Pre-training models such as BERT have achieved great success in many natural language
processing tasks. However, how to obtain better sentence representation through these pre …