Survey of post-OCR processing approaches

TTH Nguyen, A Jatowt, M Coustaty… - ACM Computing Surveys …, 2021 - dl.acm.org
Optical character recognition (OCR) is one of the most popular techniques used for
converting printed documents into machine-readable ones. While OCR engines can do well …

A survey on neural speech synthesis

X Tan, T Qin, F Soong, TY Liu - arXiv preprint arXiv:2106.15561, 2021 - arxiv.org
Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …

Multilingual processing of speech via web services

T Kisler, U Reichel, F Schiel - Computer Speech & Language, 2017 - Elsevier
A new software paradigm 'Software as a Service' based on web services is proposed for
multilingual linguistic tools and exemplified with the BAS CLARIN web services. Instead of …

Librispeech: an ASR corpus based on public domain audio books

V Panayotov, G Chen, D Povey… - 2015 IEEE international …, 2015 - ieeexplore.ieee.org
This paper introduces a new corpus of read English speech, suitable for training and
evaluating speech recognition systems. The LibriSpeech corpus is derived from audiobooks …

Speech Emotion Recognition Using Spectrogram & Phoneme Embedding

P Yenigalla, A Kumar, S Tripathi, C Singh, S Kar… - …, 2018 - abhayk1201.github.io
This paper proposes a speech emotion recognition method based on phoneme sequence
and spectrogram. Both phoneme sequence and spectrogram retain emotion contents of …

Automatic language identification in texts: A survey

T Jauhiainen, M Lui, M Zampieri, T Baldwin… - Journal of Artificial …, 2019 - jair.org
Language identification ("LI") is the problem of determining the natural language that a
document or part thereof is written in. Automatic LI has been extensively researched for over …

Report on the 11th IWSLT evaluation campaign

M Cettolo, J Niehues, S Stüker… - Proceedings of the …, 2014 - aclanthology.org
The paper overviews the 11th evaluation campaign organized by the IWSLT workshop. The
2014 evaluation offered multiple tracks on lecture transcription and translation based on the …

Weakly supervised learning with multi-stream CNN-LSTM-HMMs to discover sequential parallelism in sign language videos

O Koller, NC Camgoz, H Ney… - IEEE transactions on …, 2019 - ieeexplore.ieee.org
In this work we present a new approach to the field of weakly supervised learning in the
video domain. Our method is relevant to sequence learning problems which can be split up …

Grapheme-to-phoneme conversion using long short-term memory recurrent neural networks

K Rao, F Peng, H Sak… - 2015 IEEE International …, 2015 - ieeexplore.ieee.org
Grapheme-to-phoneme (G2P) models are key components in speech recognition and
text-to-speech systems as they describe how words are pronounced. We propose a G2P model …

Data driven word pronunciation learning and scoring with crowd sourcing based on the word's phonemes pronunciation scores

F Peng, F Beaufays, B Strope, X Lei… - US Patent …, 2017 - Google Patents
Methods, systems, and apparatus, including computer programs encoded on a computer
storage medium, for determining pronunciations for particular terms. The methods, systems …