Unsupervised acoustic unit discovery for speech synthesis using discrete latent-variable neural networks

R Eloff, A Nortje, B van Niekerk, A Govender… - arXiv preprint arXiv …, 2019 - arxiv.org
For our submission to the ZeroSpeech 2019 challenge, we apply discrete latent-variable
neural networks to unlabelled speech and use the discovered units for speech synthesis …

Early phonetic learning without phonetic categories: Insights from large-scale simulations on realistic input

T Schatz, NH Feldman, S Goldwater… - Proceedings of the …, 2021 - National Acad Sciences
Before they even speak, infants become attuned to the sounds of the language(s) they hear,
processing native phonetic contrasts more easily than nonnative ones. For example …

Do self-supervised speech models develop human-like perception biases?

J Millet, E Dunbar - arXiv preprint arXiv:2205.15819, 2022 - arxiv.org
Self-supervised models for speech processing form representational spaces without using
any external labels. Increasingly, they appear to be a feasible way of at least partially …

How familiar does that sound? Cross-lingual representational similarity analysis of acoustic word embeddings

BM Abdullah, I Zaitova, T Avgustinova… - arXiv preprint arXiv …, 2021 - arxiv.org
How do neural networks "perceive" speech sounds from unknown languages? Does the
typological similarity between the model's training language (L1) and an unknown language …

Automatic speech recognition in taxi call service systems

S Rustamov, N Akhundova, A Valizada - … 2019, London, UK, August 19–20 …, 2019 - Springer
In this research, the application of automatic speech recognition systems in taxi call services
is investigated. In comparison with traditional query handling systems such as live agents …

Rediscovering the Slavic continuum in representations emerging from neural models of spoken language identification

BM Abdullah, J Kudera, T Avgustinova… - arXiv preprint arXiv …, 2020 - arxiv.org
Deep neural networks have been employed for various spoken language recognition tasks,
including tasks that are multilingual by definition such as spoken language identification. In …

Comparing unsupervised speech learning directly to human performance in speech perception

J Millet, N Jurov, E Dunbar - CogSci 2019, 41st Annual Meeting of …, 2019 - hal.science
We compare the performance of humans (English and French listeners) versus an
unsupervised speech model in a perception experiment (ABX discrimination task). Although …