Hybrid Language Models Using Mixed Types of Sub-Lexical Units for Open Vocabulary German LVCSR

MAB Shaik, AED Mousa, R Schlüter, H Ney - Interspeech, 2011 - academia.edu
German is a highly inflected language with a large number of words derived from the same
root. It makes use of a high degree of word compounding leading to high Out-of-vocabulary …

Hybrid sub-word segmentation for handling long tail in morphologically rich low resource languages

S Manghat, S Manghat, T Schultz - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
Dealing with Out Of Vocabulary (OOV) words or unseen words is one of the main issues of
Machine Translation (MT) as well as automatic speech recognition (ASR) systems. For …

Using morpheme and syllable based sub-words for Polish LVCSR

MAB Shaik, AED Mousa, R Schlüter… - 2011 IEEE International …, 2011 - ieeexplore.ieee.org
Polish is a synthetic language with a high morpheme-per-word ratio. It makes use of a high
degree of inflection leading to high out-of-vocabulary (OOV) rates, and high Language …

A usage of the syllable unit based on morphological statistics in Korean large vocabulary continuous speech recognition system

HC Ri - International Journal of Speech Technology, 2019 - Springer
In large vocabulary continuous speech recognition (LVCSR), it is important in improving the
system's performance to determine reasonably the recognition unit. In Korean continuous …

Investigating the Use of Mixed-Units Based Modeling for Improving Uyghur Speech Recognition

P Hu, S Huang, Z Lv - SLTU, 2018 - isca-archive.org
Uyghur is a highly agglutinative language with a large number of words derived from the
same root. For such languages the use of subwords in speech recognition becomes a …

Broadcast news transcription in Central-East European languages

B Tarján, T Mozsolics, A Balog… - 2012 IEEE 3rd …, 2012 - ieeexplore.ieee.org
This paper addresses two main issues. First, how to develop broadcast news transcription
systems for Central-East European languages in a short time if only restricted language …

Morpheme Based Factored Language Models for German LVCSR

AED Mousa, MAB Shaik, R Schlüter, H Ney - INTERSPEECH, 2011 - researchgate.net
German is a highly inflectional language, where a large number of words can be generated
from the same root. It makes a liberal use of compounding leading to high Out-of-vocabulary …

Feature-rich sub-lexical language models using a maximum entropy approach for German LVCSR

MAB Shaik, AED Mousa, R Schlüter, H Ney - Interspeech, 2013 - academia.edu
German is a morphologically rich language having a high degree of word inflections,
derivations and compounding. This leads to high out-of-vocabulary (OOV) rates and poor …

RWTH LVCSR systems for Quaero and EU-Bridge: German, Polish, Spanish and Portuguese

MAB Shaik, Z Tüske, MA Tahir, M Nußbaum-Thom… - …, 2014 - isca-archive.org
In this paper, German, Polish, Spanish, and Portuguese large vocabulary
continuous speech recognition (LVCSR) systems developed by the RWTH Aachen …

The RWTH Aachen German and English LVCSR systems for IWSLT-2013

MAB Shaik, Z Tüske, S Wiesler… - Proceedings of the …, 2013 - aclanthology.org
In this paper, German and English large vocabulary continuous speech recognition
(LVCSR) systems developed by the RWTH Aachen University for the IWSLT-2013 …