An embarrassingly simple method to mitigate undesirable properties of pretrained language model tokenizers

V Hofmann, H Schuetze, JB Pierrehumbert - 2022 - ora.ox.ac.uk
We introduce FLOTA (Few Longest Token Approximation), a simple yet effective method to
improve the tokenization of pretrained language models (PLMs). FLOTA uses the …

A survey on syntactic processing techniques

X Zhang, R Mao, E Cambria - Artificial Intelligence Review, 2023 - Springer
Computational syntactic processing is a fundamental technique in natural language
processing. It normally serves as a pre-processing method to transform natural language …

Encoder-decoder methods for text normalization

M Lusetti, T Ruzsics, A Göhring, T Samardžić, E Stark - 2018 - zora.uzh.ch
Text normalization is the task of mapping non-canonical language, typical of speech
transcription and computer-mediated communication, to a standardized writing. It is an up …

Fortification of neural morphological segmentation models for polysynthetic minimal-resource languages

K Kann, M Mager, I Meza-Ruiz, H Schütze - arXiv preprint arXiv …, 2018 - arxiv.org
Morphological segmentation for polysynthetic languages is challenging, because a word
may consist of many individual morphemes and training data can be extremely scarce …

Morphological Processing of Low-Resource Languages: Where We Are and What's Next

A Wiemerslage, M Silfverberg, C Yang… - arXiv preprint arXiv …, 2022 - arxiv.org
Automatic morphological processing can aid downstream natural language processing
applications, especially for low-resource languages, and assist language documentation …

Canonical and surface morphological segmentation for Nguni languages

T Moeng, S Reay, A Daniels, J Buys - Southern African Conference for …, 2021 - Springer
Morphological Segmentation involves decomposing words into morphemes, the smallest
meaning-bearing units of language. This is an important NLP task for morphologically-rich …

Deep convolutional networks for supervised morpheme segmentation of Russian language

A Sorokin, A Kravtsova - Artificial Intelligence and Natural Language: 7th …, 2018 - Springer

Tackling the low-resource challenge for canonical segmentation

M Mager, Ö Çetinoğlu, K Kann - arXiv preprint arXiv:2010.02804, 2020 - arxiv.org
Canonical morphological segmentation consists of dividing words into their standardized
morphemes. Here, we are interested in approaches for the task when training data is limited …

Computational morphology with neural network approaches

L Liu - arXiv preprint arXiv:2105.09404, 2021 - arxiv.org
Neural network approaches have been applied to computational morphology with great
success, improving the performance of most tasks by a large margin and providing new …

Convolutional neural networks for low-resource morpheme segmentation: baseline or state-of-the-art?

A Sorokin - Proceedings of the 16th Workshop on Computational …, 2019 - aclanthology.org
We apply convolutional neural networks to the task of shallow morpheme segmentation
using low-resource datasets for 5 different languages. We show that both in fully supervised …