AMMUS: A survey of transformer-based pretrained models in natural language processing

KS Kalyan, A Rajasekharan, S Sangeetha - arXiv preprint arXiv …, 2021 - arxiv.org
Transformer-based pretrained language models (T-PTLMs) have achieved great success in
almost every NLP task. The evolution of these models started with GPT and BERT. These …

The Flores-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation

N Goyal, C Gao, V Chaudhary, PJ Chen… - Transactions of the …, 2022 - direct.mit.edu
One of the biggest challenges hindering progress in low-resource and multilingual machine
translation is the lack of good evaluation benchmarks. Current evaluation benchmarks either …

Beyond English-centric multilingual machine translation

A Fan, S Bhosale, H Schwenk, Z Ma, A El-Kishky… - Journal of Machine …, 2021 - jmlr.org
Existing work in translation demonstrated the potential of massively multilingual machine
translation by training a single model able to translate between any pair of languages …

Documenting large webtext corpora: A case study on the colossal clean crawled corpus

J Dodge, M Sap, A Marasović, W Agnew… - arXiv preprint arXiv …, 2021 - arxiv.org
Large language models have led to remarkable progress on many NLP tasks, and
researchers are turning to ever-larger text corpora to train them. Some of the largest corpora …

DeepNet: Scaling transformers to 1,000 layers

H Wang, S Ma, L Dong, S Huang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
In this paper, we propose a simple yet effective method to stabilize extremely deep
Transformers. Specifically, we introduce a new normalization function (DeepNorm) to modify …

Findings of the 2021 conference on machine translation (WMT21)

F Akhbardeh, A Arkhangorodsky, M Biesialska… - Proceedings of the sixth …, 2021 - cris.fbk.eu
This paper presents the results of the news translation task, the multilingual low-resource
translation for Indo-European languages, the triangular translation task, and the automatic …

WikiMatrix: Mining 135M parallel sentences in 1620 language pairs from Wikipedia

H Schwenk, V Chaudhary, S Sun, H Gong… - arXiv preprint arXiv …, 2019 - arxiv.org
We present an approach based on multilingual sentence embeddings to automatically
extract parallel sentences from the content of Wikipedia articles in 85 languages, including …

Language varieties of Italy: Technology challenges and opportunities

A Ramponi - Transactions of the Association for Computational …, 2024 - direct.mit.edu
Italy is characterized by a one-of-a-kind linguistic diversity landscape in Europe, which
implicitly encodes local knowledge, cultural traditions, artistic expressions, and history of its …

PanGu-Σ: Towards trillion parameter language model with sparse heterogeneous computing

X Ren, P Zhou, X Meng, X Huang, Y Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
The scaling of large language models has greatly improved natural language
understanding, generation, and reasoning. In this work, we develop a system that trained a …

Samanantar: The Largest Publicly Available Parallel Corpora Collection for 11 Indic Languages

G Ramesh, S Doddapaneni, A Bheemaraj… - Transactions of the …, 2022 - direct.mit.edu
We present Samanantar, the largest publicly available parallel corpora collection for Indic
languages. The collection contains a total of 49.7 million sentence pairs between English …