MTEB: Massive text embedding benchmark

N Muennighoff, N Tazi, L Magne, N Reimers - arXiv preprint arXiv …, 2022 - arxiv.org
Text embeddings are commonly evaluated on a small set of datasets from a single task, not
covering their possible applications to other tasks. It is unclear whether state-of-the-art …

Language-agnostic BERT sentence embedding

F Feng, Y Yang, D Cer, N Arivazhagan… - arXiv preprint arXiv …, 2020 - arxiv.org
While BERT is an effective method for learning monolingual sentence embeddings for
semantic similarity and embedding-based transfer learning (Reimers and Gurevych, 2019) …
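
As an illustration of embedding-based semantic similarity with a language-agnostic sentence encoder, here is a minimal sketch using the sentence-transformers library and the publicly released "sentence-transformers/LaBSE" checkpoint; the library, model identifier, and example sentences are assumptions for illustration, not taken from the entry above.

```python
# Sketch: cross-lingual semantic similarity with a multilingual sentence encoder.
# Assumes the sentence-transformers package and the public "sentence-transformers/LaBSE"
# checkpoint; model name and sentences are illustrative.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/LaBSE")

sentences = [
    "The cat sits on the mat.",          # English
    "Die Katze sitzt auf der Matte.",    # German translation of the first sentence
    "Stock prices fell sharply today.",  # unrelated English sentence
]
embeddings = model.encode(sentences, normalize_embeddings=True)

# Cosine similarity of the translation pair should exceed that of the unrelated pair.
print(util.cos_sim(embeddings[0], embeddings[1]))  # expected: high (translations)
print(util.cos_sim(embeddings[0], embeddings[2]))  # expected: low (different meaning)
```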

XTREME: A massively multilingual multi-task benchmark for evaluating cross-lingual generalisation

J Hu, S Ruder, A Siddhant, G Neubig… - International …, 2020 - proceedings.mlr.press
Much recent progress in applications of machine learning models to NLP has been driven
by benchmarks that evaluate models across a wide variety of tasks. However, these broad …

Massively multilingual sentence embeddings for zero-shot cross-lingual transfer and beyond

M Artetxe, H Schwenk - … of the Association for Computational Linguistics, 2019 - direct.mit.edu
We introduce an architecture to learn joint multilingual sentence representations for 93
languages, belonging to more than 30 different families and written in 28 different scripts …
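
The zero-shot cross-lingual transfer setting named in the title can be sketched as follows: fit a classifier on embeddings of labelled English text only, then apply it unchanged to another language. The encoder, toy data, and scikit-learn classifier below are illustrative assumptions, not the original LASER toolkit.

```python
# Sketch: zero-shot cross-lingual transfer with language-agnostic sentence embeddings.
# A classifier is fit on English examples only and applied directly to another language.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

encoder = SentenceTransformer("sentence-transformers/LaBSE")  # any multilingual encoder

train_texts = ["I loved this movie.", "Terrible film, total waste of time."]
train_labels = [1, 0]  # 1 = positive, 0 = negative

clf = LogisticRegression().fit(encoder.encode(train_texts), train_labels)

# No Spanish training data was used; transfer works to the extent that translations
# land close together in the shared embedding space.
print(clf.predict(encoder.encode(["Una película maravillosa."])))  # expected: [1]
```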

Pre-training via paraphrasing

M Lewis, M Ghazvininejad, G Ghosh… - Advances in …, 2020 - proceedings.neurips.cc
We introduce MARGE, a pre-trained sequence-to-sequence model learned with an
unsupervised multi-lingual multi-document paraphrasing objective. MARGE provides an …

CCMatrix: Mining billions of high-quality parallel sentences on the web

H Schwenk, G Wenzek, S Edunov, E Grave… - arXiv preprint arXiv …, 2019 - arxiv.org
We show that margin-based bitext mining in a multilingual sentence space can be applied to
monolingual corpora of billions of sentences. We use ten snapshots of a curated …

Rethinking embedding coupling in pre-trained language models

HW Chung, T Fevry, H Tsai, M Johnson… - arXiv preprint arXiv …, 2020 - arxiv.org
We re-evaluate the standard practice of sharing weights between input and output
embeddings in state-of-the-art pre-trained language models. We show that decoupled …
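
A minimal PyTorch sketch of what coupled versus decoupled input/output embeddings mean in practice; the toy model and hyperparameters are chosen purely for illustration and do not reproduce the architectures studied in the paper.

```python
# Sketch: coupled vs. decoupled input/output embeddings in a toy language-model head.
import torch
import torch.nn as nn

vocab_size, d_model = 32000, 512

class TinyLM(nn.Module):
    def __init__(self, tie_embeddings: bool):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)              # input embedding
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)   # output projection
        if tie_embeddings:
            # Coupled: input and output embeddings share one weight matrix.
            self.lm_head.weight = self.embed.weight

    def forward(self, token_ids):
        hidden = self.embed(token_ids)   # stand-in for a full transformer body
        return self.lm_head(hidden)      # logits over the vocabulary

coupled = TinyLM(tie_embeddings=True)
decoupled = TinyLM(tie_embeddings=False)
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(coupled), count(decoupled))  # decoupling adds vocab_size * d_model parameters
```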

Margin-based parallel corpus mining with multilingual sentence embeddings

M Artetxe, H Schwenk - arXiv preprint arXiv:1811.01136, 2018 - arxiv.org
Machine translation is highly sensitive to the size and quality of the training data, which has
led to an increasing interest in collecting and filtering large parallel corpora. In this paper, we …
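
The margin criterion at the heart of this line of work scores a candidate pair by its cosine similarity relative to the average similarity of each sentence to its nearest neighbours in the other language. Below is a small NumPy sketch of such a ratio-margin score; the brute-force similarity matrix and toy vectors are illustrative, whereas production mining over large corpora relies on approximate nearest-neighbour indexes (e.g. FAISS).

```python
# Sketch of a ratio-margin score for bitext mining: cosine similarity of a candidate pair,
# normalized by the average similarity of each sentence to its k nearest neighbours.
import numpy as np

def margin_scores(src, tgt, k=4):
    """src, tgt: L2-normalized embedding matrices (n_src x d, n_tgt x d)."""
    sim = src @ tgt.T                                      # cosine similarities
    # Average similarity to the k nearest neighbours, in both directions.
    knn_src = np.sort(sim, axis=1)[:, -k:].mean(axis=1)    # per source sentence
    knn_tgt = np.sort(sim, axis=0)[-k:, :].mean(axis=0)    # per target sentence
    denom = (knn_src[:, None] + knn_tgt[None, :]) / 2.0
    return sim / denom                                     # higher margin -> more likely a translation pair

rng = np.random.default_rng(0)
src = rng.normal(size=(8, 16))
tgt = rng.normal(size=(10, 16))
src /= np.linalg.norm(src, axis=1, keepdims=True)
tgt /= np.linalg.norm(tgt, axis=1, keepdims=True)
print(margin_scores(src, tgt).shape)  # (8, 10) score matrix; keep pairs above a threshold
```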

X-FACTR: Multilingual factual knowledge retrieval from pretrained language models

Z Jiang, A Anastasopoulos, J Araki… - Proceedings of the …, 2020 - aclanthology.org
Language models (LMs) have proven surprisingly successful at capturing factual
knowledge by completing cloze-style fill-in-the-blank questions such as “Punta Cana is …
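
A cloze-style probe of this kind can be reproduced with a stock fill-mask pipeline; the model choice and the completed prompt below are assumptions for illustration, not the prompts or decoding strategies used in X-FACTR itself.

```python
# Sketch: probing a pretrained multilingual LM with a cloze-style prompt.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-multilingual-cased")

# This model's mask token is "[MASK]"; multi-token answers (common for many languages
# and entity names) need the more elaborate decoding strategies studied in X-FACTR.
for pred in fill("Punta Cana is located in the [MASK] Republic."):
    print(f'{pred["token_str"]:>15}  {pred["score"]:.3f}')
```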

A primer on pretrained multilingual language models

S Doddapaneni, G Ramesh, MM Khapra… - arXiv preprint arXiv …, 2021 - arxiv.org
Multilingual Language Models (MLLMs) such as mBERT, XLM, XLM-R, etc. have
emerged as a viable option for bringing the power of pretraining to a large number of …