Sub-lexical language models with word level pronunciation lexicons

H Sak, M Saraclar - US Patent 9,292,489, 2016 - Google Patents
An automatic speech recognition (ASR) system and method are provided for using sub-
lexical language models together with word level pronunciation lexicons. These approaches …

Lattice indexing for spoken term detection

D Can, M Saraclar - IEEE Transactions on Audio, Speech, and …, 2011 - ieeexplore.ieee.org
This paper considers the problem of constructing an efficient inverted index for the spoken
term detection (STD) task. More specifically, we construct a deterministic weighted finite …

Morfessor 2.0: Toolkit for statistical morphological segmentation

P Smit, S Virpioja, SA Grönroos… - The 14th Conference of …, 2014 - aaltodoc.aalto.fi
Morfessor is a family of probabilistic machine learning methods for finding the morphological
segmentation from raw text data. Recent developments include the development of semi …

Multilingual speech recognition for Turkic languages

S Mussakhojayeva, K Dauletbek, R Yeshpanov… - Information, 2023 - mdpi.com
The primary aim of this study was to contribute to the development of multilingual automatic
speech recognition for lower-resourced Turkic languages. Ten languages—Azerbaijani …

Advances in subword-based HMM-DNN speech recognition across languages

P Smit, S Virpioja, M Kurimo - Computer Speech & Language, 2021 - Elsevier
We describe a novel way to implement subword language models in speech recognition
systems based on weighted finite state transducers, hidden Markov models, and deep …

Spoken content retrieval: A survey of techniques and technologies

M Larson, GJF Jones - Foundations and Trends® in …, 2012 - nowpublishers.com
Speech media, that is, digital audio and video containing spoken content, has blossomed in
recent years. Large collections are accruing on the Internet as well as in private and …

Improved subword modeling for WFST-based speech recognition

P Smit, S Virpioja, M Kurimo - Interspeech, 2017 - research.aalto.fi
Because in agglutinative languages the number of observed word forms is very high,
subword units are often utilized in speech recognition. However, the proper use of subword …

Resources for Turkish natural language processing: A critical survey

Ç Çöltekin, AS Doğruöz, Ö Çetinoğlu - Language Resources and …, 2023 - Springer
This paper presents a comprehensive survey of corpora and lexical resources available for
Turkish. We review a broad range of resources, focusing on the ones that are publicly …

Alternative structures for character-level RNNs

P Bojanowski, A Joulin, T Mikolov - arXiv preprint arXiv:1511.06303, 2015 - arxiv.org
Recurrent neural networks are convenient and efficient models for language modeling.
However, when applied on the level of characters instead of words, they suffer from several …

A detailed survey of Turkish automatic speech recognition

RS Arslan, N Barışçı - Turkish journal of electrical …, 2020 - journals.tubitak.gov.tr
Significant improvements have been made in automatic speech recognition (ASR) systems
in terms of both the general technology and the software used. Despite these …