Adaptation algorithms for neural network-based speech recognition: An overview

P Bell, J Fainberg, O Klejch, J Li… - IEEE Open Journal …, 2020 - ieeexplore.ieee.org
We present a structured overview of adaptation algorithms for neural network-based speech
recognition, considering both hybrid hidden Markov model/neural network systems and end …

FSD50K: an open dataset of human-labeled sound events

E Fonseca, X Favory, J Pons, F Font… - IEEE/ACM Transactions …, 2021 - ieeexplore.ieee.org
Most existing datasets for sound event recognition (SER) are relatively small and/or domain-
specific, with the exception of AudioSet, based on over 2 M tracks from YouTube videos and …

Teacher-student architecture for knowledge learning: A survey

C Hu, X Li, D Liu, X Chen, J Wang, X Liu - arXiv preprint arXiv:2210.17332, 2022 - arxiv.org
Although Deep Neural Networks (DNNs) have shown a strong capacity to solve large-scale
problems in many areas, such DNNs with voluminous parameters are hard to be deployed …

Music source separation with band-split RNN

Y Luo, J Yu - IEEE/ACM Transactions on Audio, Speech, and …, 2023 - ieeexplore.ieee.org
The performance of music source separation (MSS) models has been greatly improved in
recent years thanks to the development of novel neural network architectures and training …

Show, attend and distill: Knowledge distillation via attention-based feature matching

M Ji, B Heo, S Park - Proceedings of the AAAI Conference on Artificial …, 2021 - ojs.aaai.org
Knowledge distillation extracts general knowledge from a pretrained teacher
network and provides guidance to a target student network. Most studies manually tie …

FReTAL: Generalizing deepfake detection using knowledge distillation and representation learning

M Kim, S Tariq, SS Woo - … of the IEEE/CVF conference on …, 2021 - openaccess.thecvf.com
As GAN-based video and image manipulation technologies become more sophisticated and
easily accessible, there is an urgent need for effective deepfake detection technologies …

Recent progresses in deep learning based acoustic models

D Yu, J Li - IEEE/CAA Journal of Automatica Sinica, 2017 - ieeexplore.ieee.org
In this paper, we summarize recent progresses made in deep learning based acoustic
models and the motivation and insights behind the surveyed techniques. We first discuss …

A survey of transformer-based multimodal pre-trained modals

X Han, YT Wang, JL Feng, C Deng, ZH Chen… - Neurocomputing, 2023 - Elsevier
With the broad industrialization of Artificial Intelligence (AI), we observe a large fraction of
real-world AI applications are multimodal in nature in terms of relevant data and ways of …

Wav2vec-switch: Contrastive learning from original-noisy speech pairs for robust speech recognition

Y Wang, J Li, H Wang, Y Qian… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
The goal of self-supervised learning (SSL) for automatic speech recognition (ASR) is to
learn good speech representations from a large amount of unlabeled speech for the …

A survey of reasoning with foundation models

J Sun, C Zheng, E Xie, Z Liu, R Chu, J Qiu, J Xu… - arXiv preprint arXiv …, 2023 - arxiv.org
Reasoning, a crucial ability for complex problem-solving, plays a pivotal role in various real-
world settings such as negotiation, medical diagnosis, and criminal investigation. It serves …