- Academic Search

Word sense disambiguation: A survey

R Navigli - ACM Computing Surveys (CSUR), 2009 - dl.acm.org

Word sense disambiguation (WSD) is the ability to identify the meaning of words in context
in a computational manner. WSD is considered an AI-complete problem, that is, a task …

Cited by: 3117; 23 versions

Statistical machine translation

A Lopez - ACM Computing Surveys (CSUR), 2008 - dl.acm.org

Statistical machine translation (SMT) treats the translation of natural language as a machine
learning problem. By examining many samples of human-produced translation, SMT …

Cited by: 750; 10 versions

[BOOK][B] Pretrained transformers for text ranking: BERT and beyond

J Lin, R Nogueira, A Yates - 2022 - books.google.com

The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in
response to a query. Although the most common formulation of text ranking is search …

Cited by: 529; 11 versions

WikiMatrix: Mining 135M parallel sentences in 1620 language pairs from Wikipedia

H Schwenk, V Chaudhary, S Sun, H Gong… - arXiv preprint arXiv …, 2019 - arxiv.org

We present an approach based on multilingual sentence embeddings to automatically
extract parallel sentences from the content of Wikipedia articles in 85 languages, including …

Cited by: 366; 5 versions

ParaCrawl: Web-scale acquisition of parallel corpora

M Bañón, P Chen, B Haddow, K Heafield, H Hoang… - 2020 - strathprints.strath.ac.uk

We report on methods to create the largest publicly available parallel corpora by crawling
the web, using open source software. We empirically compare alternative methods and …

Cited by: 274; 17 versions

CCMatrix: Mining billions of high-quality parallel sentences on the web

H Schwenk, G Wenzek, S Edunov, E Grave… - arXiv preprint arXiv …, 2019 - arxiv.org

We show that margin-based bitext mining in a multilingual sentence space can be applied to
monolingual corpora of billions of sentences. We are using ten snapshots of a curated …

Cited by: 241; 5 versions

Revolt: Collaborative crowdsourcing for labeling machine learning datasets

JC Chang, S Amershi, E Kamar - … of the 2017 CHI conference on human …, 2017 - dl.acm.org

Crowdsourcing provides a scalable and efficient way to construct labeled datasets for
training machine learning systems. However, creating comprehensive label guidelines for …

Cited by: 342; 5 versions

CCAligned: A massive collection of cross-lingual web-document pairs

A El-Kishky, V Chaudhary, F Guzmán… - arXiv preprint arXiv …, 2019 - arxiv.org

Cross-lingual document alignment aims to identify pairs of documents in two distinct
languages that are of comparable content or translations of each other. In this paper, we …

Cited by: 184; 9 versions

[BOOK][B] Translation-driven corpora: Corpus resources for descriptive and applied translation studies

F Zanettin - 2014 - taylorfrancis.com

Electronic texts and text analysis tools have opened up a wealth of opportunities to higher
education and language service providers, but learning to use these resources continues to …

Cited by: 465; 5 versions

[BOOK][B] Handbook of natural language processing

N Indurkhya, FJ Damerau - 2010 - taylorfrancis.com

The Handbook of Natural Language Processing, Second Edition presents practical tools
and techniques for implementing natural language processing in computer systems. Along …

Cited by: 1031; 5 versions
