Adaptation algorithms for neural network-based speech recognition: An overview
We present a structured overview of adaptation algorithms for neural network-based speech
recognition, considering both hybrid hidden Markov model/neural network systems and end …
FSD50K: an open dataset of human-labeled sound events
Most existing datasets for sound event recognition (SER) are relatively small and/or domain-
specific, with the exception of AudioSet, based on over 2 M tracks from YouTube videos and …
Teacher-student architecture for knowledge learning: A survey
Although Deep Neural Networks (DNNs) have shown a strong capacity to solve large-scale
problems in many areas, such DNNs with voluminous parameters are hard to be deployed …
Music source separation with band-split RNN
The performance of music source separation (MSS) models has been greatly improved in
recent years thanks to the development of novel neural network architectures and training …
Show, attend and distill: Knowledge distillation via attention-based feature matching
Knowledge distillation extracts general knowledge from a pretrained teacher
network and provides guidance to a target student network. Most studies manually tie …
FReTAL: Generalizing deepfake detection using knowledge distillation and representation learning
As GAN-based video and image manipulation technologies become more sophisticated and
easily accessible, there is an urgent need for effective deepfake detection technologies …
Recent progresses in deep learning based acoustic models
In this paper, we summarize recent progresses made in deep learning based acoustic
models and the motivation and insights behind the surveyed techniques. We first discuss …
A survey of transformer-based multimodal pre-trained modals
With the broad industrialization of Artificial Intelligence (AI), we observe a large fraction of
real-world AI applications are multimodal in nature in terms of relevant data and ways of …
Wav2vec-switch: Contrastive learning from original-noisy speech pairs for robust speech recognition
The goal of self-supervised learning (SSL) for automatic speech recognition (ASR) is to
learn good speech representations from a large amount of unlabeled speech for the …
A survey of reasoning with foundation models
Reasoning, a crucial ability for complex problem-solving, plays a pivotal role in various real-
world settings such as negotiation, medical diagnosis, and criminal investigation. It serves …