Four approaches to low-resource multilingual NMT: The Helsinki submission to the AmericasNLP 2023 shared task
The Helsinki-NLP team participated in the AmericasNLP 2023 Shared Task with 6
submissions for all 11 language pairs arising from 4 different multilingual systems. We …
FastSpell: the LangId Magic Spell
Language identification is a crucial component in the automated production of language
resources, particularly in multilingual and big data contexts. However, commonly used …
LIMIT: Language identification, misidentification, and translation using hierarchical models in 350+ languages
Knowing the language of an input text/audio is a necessary first step for using almost every
NLP tool such as taggers, parsers, or translation systems. Language identification is a well …
Geographically-informed language identification
This paper develops an approach to language identification in which the set of languages
considered by the model depends on the geographic origin of the text in question. Given that …
GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages
The need for large text corpora has increased with the advent of pretrained language
models and, in particular, the discovery of scaling laws for these models. Most available …
Transliteration Model for Egyptian Words
In this paper, we describe token-based transliteration models for Egyptian words. We
explain how we created them using an automatic alignment method we devised based on …
Tuning HeLI-OTS for Guarani-Spanish Code Switching Analysis
This article describes a system created for the first subtask of the GUA-SPA-Guarani-
Spanish Code Switching Analysis shared task held as part of the IberLEF 2023 evaluation …
Script-Agnostic Language Identification
Language identification is used as the first step in many data collection and crawling efforts
because it allows us to sort online text into language-specific buckets. However, many …
Multi-label Scandinavian Language Identification (SLIDE)
M Fedorova, JS Frydenberg, V Handford… - arXiv preprint arXiv …, 2025 - arxiv.org
Identifying closely related languages at sentence level is difficult, in particular because it is
often impossible to assign a sentence to a single language. In this paper, we focus on multi …
Murre24: Dialect Identification of Finnish Internet Forum Messages
This paper presents Murre24, a collection of dialectal messages posted on the largest
Finnish internet forum, Suomi24. The messages posted in Finnish on the forum between …