Automatic language identification in texts: A survey

T Jauhiainen, M Lui, M Zampieri, T Baldwin… - Journal of Artificial …, 2019 - jair.org
Language identification ("LI") is the problem of determining the natural language that a
document or part thereof is written in. Automatic LI has been extensively researched for over …

CLDFBench: Give your cross-linguistic data a lift

R Forkel, JM List - 12th Conference on Language Resources and …, 2020 - pure.mpg.de
While the amount of cross-linguistic data is constantly increasing, most datasets produced
today and in the past cannot be considered FAIR (findable, accessible, interoperable, and …

Unsupervised dependency parsing with transferring distribution via parallel guidance and entropy regularization

X Ma, F …

… a cookbook for MT in crisis situations

W Lewis, R Munro, S Vogel - … of the Sixth Workshop on Statistical …, 2011 - aclanthology.org
In this paper, we propose that MT is an important technology in crisis events, something that
can and should be an integral part of a rapid-response infrastructure. By integrating MT …

Wav2Gloss: Generating interlinear glossed text from speech

T He, K Choi, L Tjuatja, NR Robinson, J Shi… - arXiv preprint arXiv …, 2024 - arxiv.org
Thousands of the world's languages are in danger of extinction--a tremendous threat to
cultural identities and human language diversity. Interlinear Glossed Text (IGT) is a form of …

IGT2P: From interlinear glossed texts to paradigms

S Moeller, L Liu, C Yang… - Proceedings of the …, 2020 - aclanthology.org
An intermediate step in the linguistic analysis of an under-documented language is to find
and organize inflected forms that are attested in natural speech. From this data, linguists …

Practical Natural Language Processing for Low-Resource Languages.

BP King - 2015 - deepblue.lib.umich.edu
As the Internet and World Wide Web have continued to gain widespread adoption, the
linguistic diversity represented has also been growing. Simultaneously the field of …

Linguistic typology in natural language processing

EM Bender - Linguistic Typology, 2016 - degruyter.com
This paper explores the ways in which the field of natural language processing (NLP) can
and does benefit from work in linguistic typology. I describe the recent increase in interest in …

Automating gloss generation in interlinear glossed text

A McMillan-Major - Society for Computation in …, 2020 - openpublishing.library.umass.edu
Interlinear Glossed Text (IGT) is a rich data type produced by linguists for the
purposes of presenting an analysis of a language's semantic and grammatical properties. I …

The Intercontinental Dictionary Series–a rich and principled database for language comparison

L Borin, B Comrie, A Saxena - Approaches to measuring linguistic …, 2013 - degruyter.com
The lexicon of a language is perhaps its most salient characteristic and the most obvious
expression of the connection that language bears to the world. The lexicon also reflects the …