Encoder-decoder methods for text normalization

M Lusetti, T Ruzsics, A Göhring, T Samardžić, E Stark - 2018 - zora.uzh.ch
Text normalization is the task of mapping non-canonical language, typical of speech
transcription and computer-mediated communication, to a standardized writing. It is an up …

ArchiMob - a corpus of spoken Swiss German

T Samardžić, Y Scherrer, E Glaser - Proceedings of the Tenth …, 2016 - aclanthology.org
Swiss dialects of German are, unlike most dialects of well standardised languages, widely
used in everyday communication. Despite this fact, automatic processing of Swiss German is …

From the paft to the fiiture: a fully automatic NMT and word embeddings method for OCR post-correction

M Hämäläinen, S Hengchen - arXiv preprint arXiv:1910.05535, 2019 - arxiv.org
A great deal of historical corpora suffer from errors introduced by the OCR (optical character
recognition) methods used in the digitization process. Correcting these errors manually is a …

Digitising Swiss German: how to process and study a polycentric spoken language

Y Scherrer, T Samardžić, E Glaser - Language Resources and Evaluation, 2019 - Springer
Swiss dialects of German are, unlike many dialects of other standardised languages, widely
used in everyday communication. Despite this fact, automatic processing of Swiss German is …

Machine translation of low-resource spoken dialects: Strategies for normalizing Swiss German

PE Honnet, A Popescu-Belis, C Musat… - arXiv preprint arXiv …, 2017 - arxiv.org
The goal of this work is to design a machine translation (MT) system for a low-resource
family of dialects, collectively known as Swiss German, which are widely spoken in …

Dialect text normalization to normative standard Finnish

N Partanen, M Hämäläinen… - Workshop on Noisy …, 2019 - researchportal.helsinki.fi
We compare different LSTMs and transformer models in terms of their effectiveness in
normalizing dialectal Finnish into the normative standard Finnish. As dialect is the common …

Automatic normalisation of the Swiss German ArchiMob corpus using character-level machine translation

Y Scherrer, N Ljubešić - Proceedings of the 13th conference on …, 2016 - academia.edu
The Swiss German dialect corpus ArchiMob poses great challenges for NLP and
corpus linguistic research due to the massive amount of variation found in the transcriptions …

Normalization of historical texts with neural network models

M Bollmann - 2018 - hss-opus.ub.ruhr-uni-bochum.de
With the increasing availability of digitized resources of historical documents, interest in
effective natural language processing (NLP) for these documents is on the rise. However …

German dialect identification in interview transcriptions

S Malmasi, M Zampieri - Proceedings of the Fourth Workshop on …, 2017 - aclanthology.org
This paper presents three systems submitted to the German Dialect Identification (GDI) task
at the VarDial Evaluation Campaign 2017. The task consists of training models to identify the …

Normalizing early English letters to present-day English spelling

M Hämäläinen, T Säily, J Rueter… - Proceedings of the …, 2018 - aclanthology.org
This paper presents multiple methods for normalizing the most deviant and infrequent
historical spellings in a corpus consisting of personal correspondence from the 15th to the …