Review the state-of-the-art technologies of semantic segmentation based on deep learning

Y Mo, Y Wu, X Yang, F Liu, Y Liao - Neurocomputing, 2022 - Elsevier
The goal of semantic segmentation is to segment the input image according to semantic
information and predict the semantic category of each pixel from a given label set. With the …

Between words and characters: A brief history of open-vocabulary modeling and tokenization in NLP

SJ Mielke, Z Alyafeai, E Salesky, C Raffel… - arXiv preprint arXiv …, 2021 - arxiv.org
What are the units of text that we want to model? From bytes to multi-word expressions, text
can be analyzed and generated at many granularities. Until recently, most natural language …

The SIGMORPHON 2022 shared task on morpheme segmentation

K Batsuren, G Bella, A Arora, V Martinović… - arXiv preprint arXiv …, 2022 - arxiv.org
The SIGMORPHON 2022 shared task on morpheme segmentation challenged systems to
decompose a word into a sequence of morphemes and covered most types of morphology …

Dynamic acoustic unit augmentation with BPE-dropout for low-resource end-to-end speech recognition

A Laptev, A Andrusenko, I Podluzhny, A Mitrofanov… - Sensors, 2021 - mdpi.com
With the rapid development of speech assistants, adapting server-intended automatic
speech recognition (ASR) solutions to a direct device has become crucial. For on-device …

MorphPiece: Moving away from Statistical Language Representation

H Jabbar - arXiv preprint arXiv:2307.07262, 2023 - arxiv.org
Tokenization is a critical part of modern NLP pipelines. However, contemporary tokenizers
for Large Language Models are based on statistical analysis of text corpora, without much …

Exploring segmentation approaches for neural machine translation of code-switched Egyptian Arabic-English text

M Gaser, M Mager, I Hamed, N Habash… - arXiv preprint arXiv …, 2022 - arxiv.org
Data sparsity is one of the main challenges posed by code-switching (CS), which is further
exacerbated in the case of morphologically rich languages. For the task of machine …

Research on the Uyghur morphological segmentation model with an attention mechanism

G Abudouwaili, K Abiderexiti, Y Shen… - Connection …, 2022 - Taylor & Francis
Morphological segmentation is a basic task in agglutinative language information
processing, dividing words into the smallest semantic unit morphemes. There are two types …

Improving tokenisation by alternative treatment of spaces

E Gow-Smith, HT Madabushi, C Scarton… - arXiv preprint arXiv …, 2022 - arxiv.org
Tokenisation is the first step in almost all NLP tasks, and state-of-the-art transformer-based
language models all use subword tokenisation algorithms to process input text. Existing …

The current state of Finnish NLP

M Hämäläinen, K Alnajjar - arXiv preprint arXiv:2109.11326, 2021 - arxiv.org
There are a lot of tools and resources available for processing Finnish. In this paper, we
survey recent papers focusing on Finnish NLP related to many different subcategories of …

The University of Helsinki and Aalto University submissions to the WMT 2020 news and low-resource translation tasks

Y Scherrer, SA Grönroos, S Virpioja - Conference on Machine …, 2020 - research.aalto.fi
This paper describes the joint participation of University of Helsinki and Aalto University to
two shared tasks of WMT 2020: the news translation between Inuktitut and English and the …