A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

SpeechBrain: A general-purpose speech toolkit

M Ravanelli, T Parcollet, P Plantinga, A Rouhe… - arXiv preprint arXiv …, 2021 - arxiv.org
SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the
research and development of neural speech processing technologies by being simple …

Squeezeformer: An efficient transformer for automatic speech recognition

S Kim, A Gholami, A Shaw, N Lee… - Advances in …, 2022 - proceedings.neurips.cc
The recently proposed Conformer model has become the de facto backbone model for
various downstream speech tasks based on its hybrid attention-convolution architecture that …

Audio-visual efficient conformer for robust speech recognition

M Burchi, R Timofte - Proceedings of the IEEE/CVF Winter …, 2023 - openaccess.thecvf.com
End-to-end Automatic Speech Recognition (ASR) systems based on neural
networks have seen large improvements in recent years. The availability of large scale hand …

Hi-Fi multi-speaker English TTS dataset

E Bakhturina, V Lavrukhin, B Ginsburg… - arXiv preprint arXiv …, 2021 - arxiv.org
This paper introduces a new multi-speaker English dataset for training text-to-speech
models. The dataset is based on LibriVox audiobooks and Project Gutenberg texts, both in …

Efficient conformer: Progressive downsampling and grouped attention for automatic speech recognition

M Burchi, V Vielzeuf - 2021 IEEE Automatic Speech …, 2021 - ieeexplore.ieee.org
The recently proposed Conformer architecture has shown state-of-the-art performances in
Automatic Speech Recognition by combining convolution with attention to model both local …

Zero-query adversarial attack on black-box automatic speech recognition systems

Z Fang, T Wang, L Zhao, S Zhang, B Li, Y Ge… - Proceedings of the …, 2024 - dl.acm.org
In recent years, extensive research has been conducted on the vulnerability of ASR systems,
revealing that black-box adversarial example attacks pose significant threats to real-world …

A comparative study on non-autoregressive modelings for speech-to-text generation

Y Higuchi, N Chen, Y Fujita, H Inaguma… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
Non-autoregressive (NAR) models simultaneously generate multiple outputs in a sequence,
which significantly reduces the inference speed at the cost of accuracy drop compared to …

Novel speech recognition systems applied to forensics within child exploitation: Wav2vec2.0 vs. Whisper

JC Vásquez-Correa, A Álvarez Muniain - Sensors, 2023 - mdpi.com
The growth in online child exploitation material is a significant challenge for European Law
Enforcement Agencies (LEAs). One of the most important sources of such online information …

SoftCorrect: Error correction with soft detection for automatic speech recognition

Y Leng, X Tan, W Liu, K Song, R Wang, XY Li… - Proceedings of the …, 2023 - ojs.aaai.org
Error correction in automatic speech recognition (ASR) aims to correct those incorrect words
in sentences generated by ASR models. Since recent ASR models usually have low word …