Text preprocessing for text mining in organizational research: Review and recommendations

L Hickman, S Thapa, L Tay, M Cao… - Organizational …, 2022 - journals.sagepub.com
Recent advances in text mining have provided new methods for capitalizing on the
voluminous natural language text data created by organizations, their employees, and their …

Word translation without parallel data

G Lample, A Conneau, MA Ranzato… - International …, 2018 - openreview.net
State-of-the-art methods for learning cross-lingual word embeddings have relied on
bilingual dictionaries or parallel corpora. Recent studies showed that the need for parallel …

Adversarial training for unsupervised bilingual lexicon induction

M Zhang, Y Liu, H Luan, M Sun - … of the 55th Annual Meeting of …, 2017 - aclanthology.org
Word embeddings are well known to capture linguistic regularities of the language on which
they are trained. Researchers also observe that these regularities can transfer across …

Massively multilingual transfer for NER

A Rahimi, Y Li, T Cohn - arXiv preprint arXiv:1902.00193, 2019 - arxiv.org
In cross-lingual transfer, NLP models over one or more source languages are applied to a
low-resource target language. While most prior work has used a single source model or a …

Semantic specialization of distributional word vector spaces using monolingual and cross-lingual constraints

N Mrkšić, I Vulić, D Ó Séaghdha, I Leviant… - Transactions of the …, 2017 - direct.mit.edu
We present Attract-Repel, an algorithm for improving the semantic quality of word
vectors by injecting constraints extracted from lexical resources. Attract-Repel facilitates the …

WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models

B Minixhofer, F Paischer, N Rekabsaz - arXiv preprint arXiv:2112.06598, 2021 - arxiv.org
Large pretrained language models (LMs) have become the central building block of many
NLP applications. Training these models requires ever more computational resources and …

Expanding pretrained models to thousands more languages via lexicon-based adaptation

X Wang, S Ruder, G Neubig - arXiv preprint arXiv:2203.09435, 2022 - arxiv.org
The performance of multilingual pretrained models is highly dependent on the availability of
monolingual or parallel text present in a target language. Thus, the majority of the world's …

Modeling language variation and universals: A survey on typological linguistics for natural language processing

EM Ponti, H O'Horan, Y Berzak, I Vulić… - Computational …, 2019 - direct.mit.edu
Linguistic typology aims to capture structural and semantic variation across the world's
languages. A large-scale typology could provide excellent guidance for multilingual Natural …