Survey of low-resource machine translation
We present a survey covering the state of the art in low-resource machine translation (MT)
research. There are currently around 7,000 languages spoken in the world and almost all …
Cross-lingual name tagging and linking for 282 languages
The ambitious goal of this work is to develop a cross-lingual name tagging and linking
framework for 282 languages that exist in Wikipedia. Given a document in any of these …
Between words and characters: A brief history of open-vocabulary modeling and tokenization in NLP
What are the units of text that we want to model? From bytes to multi-word expressions, text
can be analyzed and generated at many granularities. Until recently, most natural language …
Languages through the looking glass of BPE compression
Byte-pair encoding (BPE) is widely used in NLP for performing subword tokenization. It
uncovers redundant patterns for compressing the data, and hence alleviates the sparsity …
The SIGMORPHON 2022 shared task on morpheme segmentation
The SIGMORPHON 2022 shared task on morpheme segmentation challenged systems to
decompose a word into a sequence of morphemes and covered most types of morphology …
Linguistically Motivated Vocabulary Reduction for Neural Machine Translation from Turkish to English
The necessity of using a fixed-size word vocabulary in order to control the model complexity
in state-of-the-art neural machine translation (NMT) systems is an important bottleneck on …
Unity and disunity in evolutionary sciences: process-based analogies open common research avenues for biology and linguistics
Background: For a long time biologists and linguists have been noticing surprising
similarities between the evolution of life forms and languages. Most of the proposed …
Fortification of neural morphological segmentation models for polysynthetic minimal-resource languages
Morphological segmentation for polysynthetic languages is challenging, because a word
may consist of many individual morphemes and training data can be extremely scarce …
BPE vs. morphological segmentation: A case study on machine translation of four polysynthetic languages
Morphologically-rich polysynthetic languages present a challenge for NLP systems due to
data sparsity, and a common strategy to handle this issue is to apply subword segmentation …
A corpus investigation of syntactic embedding in Pirahã
The Pirahã language has been at the center of recent debates in linguistics, in large part
because it is claimed not to exhibit recursion, a purported universal of human language …