Automatic language identification in texts: A survey

T Jauhiainen, M Lui, M Zampieri, T Baldwin… - Journal of Artificial …, 2019 - jair.org
Language identification ("LI") is the problem of determining the natural language that a
document or part thereof is written in. Automatic LI has been extensively researched for over …

Findings of the VarDial evaluation campaign 2017

M Zampieri, S Malmasi, N Ljubešić… - Proceedings of the …, 2017 - aclanthology.org
We present the results of the VarDial Evaluation Campaign on Natural Language
Processing (NLP) for Similar Languages, Varieties and Dialects, which we organized as part …

Discriminating between similar languages and Arabic dialect identification: A report on the third DSL shared task

S Malmasi, M Zampieri, N Ljubešić… - Proceedings of the …, 2016 - aclanthology.org
We present the results of the third edition of the Discriminating between Similar Languages
(DSL) shared task, which was organized as part of the VarDial'2016 workshop at …

Identifying depression on Reddit: The effect of training data

I Pirina, Ç Çöltekin - Proceedings of the 2018 EMNLP workshop …, 2018 - aclanthology.org
This paper presents a set of classification experiments for identifying depression in posts
gathered from social media platforms. In addition to the data gathered previously by other …

Enhancing Portuguese Variety Identification with Cross-Domain Approaches

H Sousa, R Almeida, P Silvano, I Cantante… - arXiv preprint arXiv …, 2025 - arxiv.org
Recent advances in natural language processing have raised expectations for generative
models to produce coherent text across diverse language varieties. In the particular case of …

From language identification to language distance

P Gamallo, JR Pichel, I Alegria - Physica A: Statistical Mechanics and its …, 2017 - Elsevier
In this paper, we define two quantitative distances to measure how far apart two languages
are. The distance measure that we have identified as more accurate is based on the …

Tübingen-Oslo at SemEval-2018 task 2: SVMs perform better than RNNs in emoji prediction

Ç Çöltekin, T Rama - … of the 12th international workshop on …, 2018 - aclanthology.org
This paper describes our participation in the SemEval-2018 task Multilingual Emoji
Prediction. We participated in both English and Spanish subtasks, experimenting with …

An electrocardiographic system with anthropometrics via machine learning to screen left ventricular hypertrophy among young adults

GM Lin, K Liu - IEEE Journal of Translational Engineering in …, 2020 - ieeexplore.ieee.org
The prevalence of physiological and pathological left ventricular hypertrophy (LVH) among
young adults is about 5%. The use of electrocardiographic (ECG) voltage criteria and machine …

Experiments with universal CEFR classification

S Vajjala, T Rama - arXiv preprint arXiv:1804.06636, 2018 - arxiv.org
The Common European Framework of Reference (CEFR) guidelines describe language
proficiency of learners on a scale of 6 levels. While the description of CEFR guidelines is …

When sparse traditional models outperform dense neural networks: the curious case of discriminating between similar languages

M Medvedeva, M Kroon, B Plank - … of the Fourth Workshop on NLP …, 2017 - aclanthology.org
We present the results of our participation in the VarDial 4 shared task on discriminating
closely related languages. Our submission includes simple traditional models using linear …