Probabilistic topic modeling in multilingual settings: An overview of its methodology and applications

I Vulić, W De Smet, J Tang, MF Moens - Information Processing & …, 2015 - Elsevier
Probabilistic topic models are unsupervised generative models which model document
content as a two-step generation process, that is, documents are observed as mixtures of …

A survey of cross-lingual word embedding models

S Ruder, I Vulić, A Søgaard - Journal of Artificial Intelligence Research, 2019 - jair.org
Cross-lingual representations of words enable us to reason about word meaning in
multilingual contexts and are a key facilitator of cross-lingual transfer when developing …

Adversarial training for unsupervised bilingual lexicon induction

M Zhang, Y Liu, H Luan, M Sun - … of the 55th Annual Meeting of …, 2017 - aclanthology.org
Word embeddings are well known to capture linguistic regularities of the language on which
they are trained. Researchers also observe that these regularities can transfer across …

Earth mover's distance minimization for unsupervised bilingual lexicon induction

M Zhang, Y Liu, H Luan, M Sun - Proceedings of the 2017 …, 2017 - aclanthology.org
Cross-lingual natural language processing hinges on the premise that there exists
invariance across languages. At the word level, researchers have identified such invariance …

Improving machine translation performance by exploiting non-parallel corpora

DS Munteanu, D Marcu - Computational Linguistics, 2005 - direct.mit.edu
We present a novel method for discovering parallel sentences in comparable, non-parallel
corpora. We train a maximum entropy classifier that, given a pair of sentences, can reliably …

Bilingual word embeddings from non-parallel document-aligned data applied to bilingual lexicon induction

I Vulić, MF Moens - Proceedings of the 53rd Annual Meeting of …, 2015 - lirias.kuleuven.be
We propose a simple yet effective approach to learning bilingual word embeddings (BWEs)
from non-parallel document-aligned data (based on the omnipresent skip-gram model), and …

Improving word translation via two-stage contrastive learning

Y Li, F Liu, N Collier, A Korhonen, I Vulić - arXiv preprint arXiv:2203.08307, 2022 - arxiv.org
Word translation or bilingual lexicon induction (BLI) is a key cross-lingual task, aiming to
bridge the lexical gap between different languages. In this work, we propose a robust and …

Extracting parallel sub-sentential fragments from non-parallel corpora

DS Munteanu, D Marcu - … of the 21st international conference on …, 2006 - aclanthology.org
We present a novel method for extracting parallel sub-sentential fragments from
comparable, non-parallel bilingual corpora. By analyzing potentially similar sentence pairs …

Bilingual distributed word representations from document-aligned comparable data

I Vulić, MF Moens - Journal of Artificial Intelligence Research, 2016 - jair.org
We propose a new model for learning bilingual word representations from nonparallel
document-aligned data. Following the recent advances in word representation learning, our …

On bilingual lexicon induction with large language models

Y Li, A Korhonen, I Vulić - arXiv preprint arXiv:2310.13995, 2023 - arxiv.org
Bilingual Lexicon Induction (BLI) is a core task in multilingual NLP that still, to a large extent,
relies on calculating cross-lingual word representations. Inspired by the global paradigm …