Spoken content retrieval—beyond cascading speech recognition with text retrieval

L Lee, J Glass, H Lee, C Chan - IEEE/ACM Transactions on …, 2015 - ieeexplore.ieee.org
Spoken content retrieval refers to directly indexing and retrieving spoken content based on
the audio rather than text descriptions. This potentially eliminates the requirement of …

Parallel inference of Dirichlet process Gaussian mixture models for unsupervised acoustic modeling: a feasibility study.

H Chen, CC Leung, L Xie…

… relationships from audio embeddings

DR Liu, KY Chen, HY Lee, L Lee - arXiv preprint arXiv:1804.00316, 2018 - arxiv.org
Unsupervised discovery of acoustic tokens from audio corpora without annotation and
learning vector representations for these tokens have been widely studied. Although these …

The spoken web search task at MediaEval 2012

F Metze, X Anguera, E Barnard… - … on Acoustics, Speech …, 2013 - ieeexplore.ieee.org
In this paper, we describe the “Spoken Web Search” Task, which was held as part of the
2012 MediaEval benchmark evaluation campaign. The purpose of this task was to perform …

Learning acoustic word embeddings with temporal context for query-by-example speech search

Y Yuan, CC Leung, L Xie, H Chen, B Ma… - arXiv preprint arXiv …, 2018 - arxiv.org
We propose to learn acoustic word embeddings with temporal context for query-by-example
(QbE) speech search. The temporal context includes the leading and trailing word …

Unsupervised Bottleneck Features for Low-Resource Query-by-Example Spoken Term Detection.

H Chen, CC Leung, L Xie, B Ma, H Li - Interspeech, 2016 - researchgate.net
We propose a framework which ports Dirichlet process Gaussian mixture model (DPGMM) based
labels to deep neural network (DNN). The DNN trained using the unsupervised labels is …

Using parallel tokenizers with DTW matrix combination for low-resource spoken term detection

H Wang, T Lee, CC Leung, B Ma… - 2013 IEEE International …, 2013 - ieeexplore.ieee.org
Recently the posteriorgram-based template matching framework has been successfully
applied to query-by-example spoken term detection tasks for low-resource languages. This …