Unsupervised learning of morphology

H Hammarström, L Borin - Computational Linguistics, 2011 - direct.mit.edu
This article surveys work on Unsupervised Learning of Morphology. We define
Unsupervised Learning of Morphology as the problem of inducing a description (of some …

BPE-dropout: Simple and effective subword regularization

I Provilkov, D Emelianenko, E Voita - arXiv preprint arXiv:1910.13267, 2019 - arxiv.org
Subword segmentation is widely used to address the open vocabulary problem in machine
translation. The dominant approach to subword segmentation is Byte Pair Encoding (BPE) …

Inducing the morphological lexicon of a natural language from unannotated text

MJP Creutz, KH Lagus - International and Interdisciplinary …, 2005 - researchportal.helsinki.fi
This work presents an algorithm for the unsupervised learning, or induction, of a simple
morphology of a natural language. A probabilistic maximum a posteriori model is utilized …

Task-adaptive tokenization: Enhancing long-form text generation efficacy in mental health and beyond

S Liu, N Deng, S Sabour, Y Jia, M Huang… - arXiv preprint arXiv …, 2023 - arxiv.org
We propose task-adaptive tokenization as a way to adapt the generation pipeline to the
specifics of a downstream task and enhance long-form generation in mental health. Inspired …

Meaningless yet meaningful: Morphology grounded subword-level NMT

T Banerjee, P Bhattacharyya - Proceedings of the second …, 2018 - aclanthology.org
We explore the use of two independent subsystems, Byte Pair Encoding (BPE) and
Morfessor, as basic units for subword-level neural machine translation (NMT). We show that …

Combining morpheme-based machine translation with post-processing morpheme prediction

A Clifton, A Sarkar - Proceedings of the 49th Annual Meeting of …, 2011 - aclanthology.org
This paper extends the training and tuning regime for phrase-based statistical machine
translation to obtain fluent translations into morphologically complex languages (we build an …

[BOOK] Induction of the morphology of natural language: Unsupervised morpheme segmentation with application to automatic speech recognition

M Creutz - 2006 - aaltodoc.aalto.fi
In order to develop computer applications that successfully process natural language data
(text and speech), one needs good models of the vocabulary and grammar of as many …

Unsupervised Morpheme Analysis Evaluation by a Comparison to a Linguistic Gold Standard: Morpho Challenge 2007

M Kurimo, MJP Creutz… - CLEF 2007 Workshop, 2007 - researchportal.helsinki.fi
This paper presents the evaluation of Morpho Challenge Competition 1 (linguistic gold
standard). Competition 2 (information retrieval) is described in a companion paper. In …

Automated grammar engineering for verbal morphology

DA Wax - 2014 - digital.lib.washington.edu
This study examines the cross-linguistic potential for the automatic analysis of verbal
morphology and creation of implemented formal grammars using the Grammar Matrix …

TituLLMs: A Family of Bangla LLMs with Comprehensive Benchmarking

SK Nahin, RN Nandi, S Sarker, QS Muhtaseem… - arXiv preprint arXiv …, 2025 - arxiv.org
In this paper, we present TituLLMs, the first large pretrained Bangla LLMs, available in 1B
and 3B parameter sizes. Due to computational constraints during both training and …