[HTML] Unsupervised automatic speech recognition: A review
Automatic Speech Recognition (ASR) systems can be trained to achieve
remarkable performance given large amounts of manually transcribed speech, but large …
WavChat: A survey of spoken dialogue models
Recent advancements in spoken dialogue models, exemplified by systems like GPT-4o,
have captured significant attention in the speech domain. Compared to traditional three-tier …
Towards unsupervised phone and word segmentation using self-supervised vector-quantized neural networks
We investigate segmenting and clustering speech into low-bitrate phone-like sequences
without supervision. We specifically constrain pretrained self-supervised vector-quantized …
Introducing Meta‐analysis in the Evaluation of Computational Models of Infant Language Development
Computational models of child language development can help us understand the cognitive
underpinnings of the language learning process, which occurs along several linguistic …
Unsupervised word segmentation using k nearest neighbors
In this paper, we propose an unsupervised kNN-based approach for word segmentation in
speech utterances. Our method relies on self-supervised pre-trained speech …
[PDF] Evaluation of computational models of infant language development against robust empirical data from meta-analyses: what, why, and how?
Computational models of child language development can help us understand the cognitive
underpinnings of the language learning process. One advantage of computational modeling …
Character-based PCFG induction for modeling the syntactic acquisition of morphologically rich languages
Unsupervised PCFG induction models, which build syntactic structures from raw text, can be
used to evaluate the extent to which syntactic knowledge can be acquired from distributional …
Exploring the encoding of linguistic representations in the Fully-Connected Layer of generative CNNs for Speech
BF Šegedin, G Beguš - arXiv preprint arXiv:2501.07726, 2025 - arxiv.org
Interpretability work on the convolutional layers of CNNs has primarily focused on computer
vision, but some studies also explore correspondences between the latent space and the …
Bigger is not Always Better: The Effect of Context Size on Speech Pre-Training
It has been generally assumed in the automatic speech recognition (ASR) literature that it is
better for models to have access to wider context windows. Yet, many of the potential …
FAtNet: Cost-Effective Approach Towards Mitigating the Linguistic Bias in Speaker Verification Systems
Linguistic bias in Deep Neural Network (DNN) based Natural Language Processing
(NLP) systems is a critical problem that needs attention. The problem further intensifies in …