Deep clustering: Discriminative embeddings for segmentation and separation
We address the problem of "cocktail-party" source separation in a deep learning framework
called deep clustering. Previous deep network approaches to separation have shown …
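Deep clustering assigns an embedding to every time-frequency bin and trains it so that bins dominated by the same source end up close together. At test time the embeddings are clustered (for example with k-means) to form the separation masks. The following is a minimal NumPy sketch of the affinity objective ||V V^T - Y Y^T||_F^2. It assumes V holds unit-normalized embeddings (one row per bin) and Y holds one-hot source labels. The function names are hypothetical. The expanded form avoids building the N x N affinity matrices.

```python
import numpy as np

def deep_clustering_loss(V, Y):
    """Affinity loss ||V V^T - Y Y^T||_F^2 without forming N x N matrices.

    V: (N, D) embeddings, one row per time-frequency bin (assumed unit-norm).
    Y: (N, C) one-hot source assignment for each bin.
    Uses ||A - B||^2 = ||V^T V||^2 - 2 ||V^T Y||^2 + ||Y^T Y||^2.
    """
    return (np.sum((V.T @ V) ** 2)
            - 2.0 * np.sum((V.T @ Y) ** 2)
            + np.sum((Y.T @ Y) ** 2))

def naive_affinity_loss(V, Y):
    """Reference version that builds the full N x N affinity matrices."""
    diff = V @ V.T - Y @ Y.T
    return np.sum(diff ** 2)

# Usage: 50 bins, 4-dim embeddings, 2 sources (random toy data).
rng = np.random.default_rng(0)
V = rng.normal(size=(50, 4))
V /= np.linalg.norm(V, axis=1, keepdims=True)  # unit-normalize each embedding
Y = np.eye(2)[rng.integers(0, 2, size=50)]      # one-hot labels per bin
loss = deep_clustering_loss(V, Y)
```

The expanded form costs O(N D^2) instead of O(N^2). This matters because N, the number of time-frequency bins, is large for real spectrograms.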
Audio-visual scene analysis with self-supervised multisensory features
The thud of a bouncing ball, the onset of speech as lips open: when visual and audio events
occur together, it suggests that there might be a common, underlying event that produced …
[BOOK] Automatic speech recognition
Automatic Speech Recognition (ASR), which aims to enable natural human–machine
interaction, has been an intensive research area for decades. Many core technologies, such …
Multitalker speech separation with utterance-level permutation invariant training of deep recurrent neural networks
In this paper, we propose the utterance-level permutation invariant training (uPIT) technique.
uPIT is a practically applicable, end-to-end, deep-learning-based solution for speaker …
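Permutation invariant training resolves the label-permutation problem by scoring every assignment of network outputs to reference speakers and training on the best one. The utterance-level variant fixes that assignment once for the whole utterance. The following is a minimal NumPy sketch. It assumes a mean-squared-error criterion on spectrogram-shaped arrays of shape (sources, frames, bins). The name `upit_mse` is hypothetical and is not taken from the paper.

```python
import itertools
import numpy as np

def upit_mse(estimates, targets):
    """Utterance-level permutation-invariant MSE (illustrative sketch).

    estimates, targets: arrays of shape (S, T, F) for S sources.
    Returns (best_loss, best_perm), where best_perm[i] is the index of the
    target paired with estimate i for the entire utterance.
    """
    num_sources = estimates.shape[0]
    best_loss, best_perm = np.inf, None
    # Brute force over all S! output-to-speaker assignments.
    for perm in itertools.permutations(range(num_sources)):
        # One permutation is applied to every frame of the utterance.
        loss = np.mean((estimates - targets[list(perm)]) ** 2)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm

# Usage: outputs that are the references in swapped order.
rng = np.random.default_rng(1)
targets = rng.normal(size=(2, 10, 5))
loss, perm = upit_mse(targets[[1, 0]], targets)  # perm is (1, 0), loss is 0.0
```

The assignment is fixed per utterance rather than per segment, so each output channel keeps tracking the same speaker throughout. The exhaustive search over S! permutations is practical only for a small number of speakers.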
Permutation invariant training of deep models for speaker-independent multi-talker speech separation
We propose a novel deep learning training criterion, named permutation invariant training
(PIT), for speaker independent multi-talker speech separation, commonly known as the …
Selective cortical representation of attended speaker in multi-talker speech perception
Humans possess a remarkable ability to attend to a single speaker's voice in a multi-talker
background. How the auditory system manages to extract intelligible speech under such …
Past review, current progress, and challenges ahead on the cocktail party problem
The cocktail party problem, i.e., tracing and recognizing the speech of a specific speaker
when multiple speakers talk simultaneously, is one of the critical problems yet to be solved …
Single-channel multi-speaker separation using deep clustering
Deep clustering is a recently introduced deep learning architecture that uses discriminatively
trained embeddings as the basis for clustering. It was recently applied to spectrogram …
Recent progresses in deep learning based acoustic models
In this paper, we summarize recent progress made in deep learning based acoustic
models and the motivation and insights behind the surveyed techniques. We first discuss …
Treefrogs as animal models for research on auditory scene analysis and the cocktail party problem
MA Bee - International Journal of Psychophysiology, 2015 - Elsevier
The perceptual analysis of acoustic scenes involves binding together sounds from the same
source and separating them from other sounds in the environment. In large social groups …