A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Self-supervised speech representation learning: A review

A Mohamed, H Lee, L Borgholt… - IEEE Journal of …, 2022 - ieeexplore.ieee.org
Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …

SpeechBrain: A general-purpose speech toolkit

M Ravanelli, T Parcollet, P Plantinga, A Rouhe… - arXiv preprint arXiv …, 2021 - arxiv.org
SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the
research and development of neural speech processing technologies by being simple …

SUPERB: Speech processing universal performance benchmark

S Yang, PH Chi, YS Chuang, CIJ Lai… - arXiv preprint arXiv …, 2021 - arxiv.org
Self-supervised learning (SSL) has proven vital for advancing research in natural language
processing (NLP) and computer vision (CV). The paradigm pretrains a shared model on …

Decoupled contrastive learning

CH Yeh, CY Hong, YC Hsu, TL Liu, Y Chen… - European conference on …, 2022 - Springer
Contrastive learning (CL) is one of the most successful paradigms for self-supervised
learning (SSL). In a principled way, it considers two augmented “views” of the same image …

Layer-wise analysis of a self-supervised speech representation model

A Pasad, JC Chou, K Livescu - 2021 IEEE Automatic Speech …, 2021 - ieeexplore.ieee.org
Recently proposed self-supervised learning approaches have been successful for pre-training
speech representation models. The utility of these learned representations has been …

SLURP: A spoken language understanding resource package

E Bastianelli, A Vanzo, P Swietojanski… - arXiv preprint arXiv …, 2020 - arxiv.org
Spoken Language Understanding infers semantic meaning directly from audio data, and
thus promises to reduce error propagation and misunderstandings in end-user applications …

MASSIVE: A 1M-example multilingual natural language understanding dataset with 51 typologically-diverse languages

J FitzGerald, C Hench, C Peris, S Mackie… - arXiv preprint arXiv …, 2022 - arxiv.org
We present the MASSIVE dataset--Multilingual Amazon SLU resource package (SLURP) for
Slot-filling, Intent classification, and Virtual assistant Evaluation. MASSIVE contains 1M …

Ecosystem-level analysis of deployed machine learning reveals homogeneous outcomes

C Toups, R Bommasani, K Creel… - Advances in …, 2024 - proceedings.neurips.cc
Machine learning is traditionally studied at the model level: researchers measure
and improve the accuracy, robustness, bias, efficiency, and other dimensions of specific …

ESPnet-SLU: Advancing spoken language understanding through ESPnet

S Arora, S Dalmia, P Denisov, X Chang… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
As Automatic Speech Recognition (ASR) systems are getting better, there is an increasing
interest in using the ASR output to do downstream Natural Language Processing (NLP) …