Review the state-of-the-art technologies of semantic segmentation based on deep learning
The goal of semantic segmentation is to segment the input image according to semantic
information and predict the semantic category of each pixel from a given label set. With the …
Between words and characters: A brief history of open-vocabulary modeling and tokenization in NLP
What are the units of text that we want to model? From bytes to multi-word expressions, text
can be analyzed and generated at many granularities. Until recently, most natural language …
The SIGMORPHON 2022 shared task on morpheme segmentation
The SIGMORPHON 2022 shared task on morpheme segmentation challenged systems to
decompose a word into a sequence of morphemes and covered most types of morphology …
Dynamic acoustic unit augmentation with BPE-dropout for low-resource end-to-end speech recognition
With the rapid development of speech assistants, adapting server-intended automatic
speech recognition (ASR) solutions to a direct device has become crucial. For on-device …
MorphPiece: Moving away from Statistical Language Representation
H Jabbar - arXiv preprint arXiv:2307.07262, 2023 - arxiv.org
Tokenization is a critical part of modern NLP pipelines. However, contemporary tokenizers
for Large Language Models are based on statistical analysis of text corpora, without much …
Exploring segmentation approaches for neural machine translation of code-switched Egyptian Arabic-English text
Data sparsity is one of the main challenges posed by code-switching (CS), which is further
exacerbated in the case of morphologically rich languages. For the task of machine …
Research on the Uyghur morphological segmentation model with an attention mechanism
G Abudouwaili, K Abiderexiti, Y Shen… - Connection …, 2022 - Taylor & Francis
Morphological segmentation is a basic task in agglutinative language information
processing, dividing words into the smallest semantic unit morphemes. There are two types …
Improving tokenisation by alternative treatment of spaces
Tokenisation is the first step in almost all NLP tasks, and state-of-the-art transformer-based
language models all use subword tokenisation algorithms to process input text. Existing …
The current state of Finnish NLP
There are a lot of tools and resources available for processing Finnish. In this paper, we
survey recent papers focusing on Finnish NLP related to many different subcategories of …
The University of Helsinki and Aalto University submissions to the WMT 2020 news and low-resource translation tasks
This paper describes the joint participation of University of Helsinki and Aalto University to
two shared tasks of WMT 2020: the news translation between Inuktitut and English and the …