Survey of post-OCR processing approaches
Optical character recognition (OCR) is one of the most popular techniques used for
converting printed documents into machine-readable ones. While OCR engines can do well …
A survey on neural speech synthesis
Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …
Multilingual processing of speech via web services
A new software paradigm 'Software as a Service' based on web services is proposed for
multilingual linguistic tools and exemplified with the BAS CLARIN web services. Instead of …
Librispeech: An ASR corpus based on public domain audio books
This paper introduces a new corpus of read English speech, suitable for training and
evaluating speech recognition systems. The LibriSpeech corpus is derived from audiobooks …
Speech Emotion Recognition Using Spectrogram & Phoneme Embedding
This paper proposes a speech emotion recognition method based on phoneme sequence
and spectrogram. Both phoneme sequence and spectrogram retain emotion contents of …
Automatic language identification in texts: A survey
Language identification ("LI") is the problem of determining the natural language that a
document or part thereof is written in. Automatic LI has been extensively researched for over …
Report on the 11th IWSLT evaluation campaign
The paper overviews the 11th evaluation campaign organized by the IWSLT workshop. The
2014 evaluation offered multiple tracks on lecture transcription and translation based on the …
Weakly supervised learning with multi-stream CNN-LSTM-HMMs to discover sequential parallelism in sign language videos
In this work we present a new approach to the field of weakly supervised learning in the
video domain. Our method is relevant to sequence learning problems which can be split up …
Grapheme-to-phoneme conversion using long short-term memory recurrent neural networks
Grapheme-to-phoneme (G2P) models are key components in speech recognition and text-to-
speech systems as they describe how words are pronounced. We propose a G2P model …
Data driven word pronunciation learning and scoring with crowd sourcing based on the word's phonemes pronunciation scores
Methods, systems, and apparatus, including computer programs encoded on a computer
storage medium, for determining pronunciations for particular terms. The methods, systems …