A review of deep learning techniques for speech processing
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …
SpeechBrain: A general-purpose speech toolkit
SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the
research and development of neural speech processing technologies by being simple …
Squeezeformer: An efficient transformer for automatic speech recognition
The recently proposed Conformer model has become the de facto backbone model for
various downstream speech tasks based on its hybrid attention-convolution architecture that …
Audio-visual efficient conformer for robust speech recognition
End-to-end Automatic Speech Recognition (ASR) systems based on neural
networks have seen large improvements in recent years. The availability of large scale hand …
Hi-Fi multi-speaker English TTS dataset
This paper introduces a new multi-speaker English dataset for training text-to-speech
models. The dataset is based on LibriVox audiobooks and Project Gutenberg texts, both in …
Efficient conformer: Progressive downsampling and grouped attention for automatic speech recognition
The recently proposed Conformer architecture has shown state-of-the-art performance in
Automatic Speech Recognition by combining convolution with attention to model both local …
Zero-query adversarial attack on black-box automatic speech recognition systems
In recent years, extensive research has been conducted on the vulnerability of ASR systems,
revealing that black-box adversarial example attacks pose significant threats to real-world …
A comparative study on non-autoregressive modelings for speech-to-text generation
Non-autoregressive (NAR) models simultaneously generate multiple outputs in a sequence,
which significantly speeds up inference at the cost of an accuracy drop compared to …
Novel speech recognition systems applied to forensics within child exploitation: Wav2vec2.0 vs. Whisper
The growth in online child exploitation material is a significant challenge for European Law
Enforcement Agencies (LEAs). One of the most important sources of such online information …
SoftCorrect: Error correction with soft detection for automatic speech recognition
Error correction in automatic speech recognition (ASR) aims to correct those incorrect words
in sentences generated by ASR models. Since recent ASR models usually have low word …