The zero resource speech challenge 2015
Abstract The Interspeech 2015 Zero Resource Speech Challenge aims at discovering
subword and word units from raw speech. The challenge provides the first unified and open …
Recent developments in spoken term detection: a survey
A Mandal, KR Prasanna Kumar, P Mitra - International Journal of Speech …, 2014 - Springer
Spoken term detection (STD) provides an efficient means for content based indexing of
speech. However, achieving high detection performance, faster speed, detecting out-of …
Segmental contrastive predictive coding for unsupervised word segmentation
Automatic detection of phoneme or word-like units is one of the core objectives in zero-
resource speech processing. Recent attempts employ self-supervised training methods …
Self-supervised language learning from raw audio: Lessons from the zero resource speech challenge
E Dunbar, N Hamilakis… - IEEE Journal of Selected …, 2022 - ieeexplore.ieee.org
Recent progress in self-supervised or unsupervised machine learning has opened the
possibility of building a full speech processing system from raw audio without using any …
Spoken content retrieval—beyond cascading speech recognition with text retrieval
Spoken content retrieval refers to directly indexing and retrieving spoken content based on
the audio rather than text descriptions. This potentially eliminates the requirement of …
Unsupervised speech segmentation and variable rate representation learning using segmental contrastive predictive coding
Typically, unsupervised segmentation of speech into the phone- and word-like units are
treated as separate tasks and are often done via different methods which do not fully …
Parallel inference of Dirichlet process Gaussian mixture models for unsupervised acoustic modeling: a feasibility study
We adopt a Dirichlet process Gaussian mixture model (DPGMM) for unsupervised acoustic
modeling and represent speech frames with Gaussian posteriorgrams. The model performs …
ODSQA: Open-domain spoken question answering dataset
CH Lee, SM Wang, HC Chang… - 2018 IEEE Spoken …, 2018 - ieeexplore.ieee.org
Reading comprehension by machine has been widely studied, but machine comprehension
of spoken content is still a less investigated problem. In this paper, we release Open-Domain …
Acoustic segment modeling with spectral clustering methods
This paper presents a study of spectral clustering-based approaches to acoustic segment
modeling (ASM). ASM aims at finding the underlying phoneme-like speech units and …
Building an ASR system for a low-research language through the adaptation of a high-resource language ASR system: preliminary results
For many languages in the world, not enough (annotated) speech data is available to train
an ASR system. We here propose a new three-step method to build an ASR system for such …