A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Few-shot learning for medical text: A review of advances, trends, and opportunities

Y Ge, Y Guo, S Das, MA Al-Garadi, A Sarker - Journal of Biomedical …, 2023 - Elsevier
Background: Few-shot learning (FSL) is a class of machine learning methods that require
small numbers of labeled instances for training. With many medical topics having limited …
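
As a hedged illustration of the FSL setting this review covers, here is a minimal nearest-class-centroid baseline over sentence embeddings, trained from a handful of labeled examples per class. The encoder checkpoint and the toy medical-style snippets are illustrative assumptions, not drawn from the paper.

```python
# Minimal few-shot baseline: nearest-class-centroid over sentence embeddings.
# Model name and toy examples are illustrative, not from the review.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# A 2-way, 3-shot "support set" of labeled medical-style snippets (toy data).
support = {
    "adverse_event": [
        "Patient reported severe nausea after the second dose.",
        "Rash and itching developed within hours of infusion.",
        "Dizziness persisted for two days post vaccination.",
    ],
    "no_event": [
        "Patient tolerated the medication well.",
        "No complaints at follow-up visit.",
        "Vitals stable, treatment continuing as planned.",
    ],
}

# One prototype (mean embedding) per class.
prototypes = {
    label: model.encode(texts).mean(axis=0) for label, texts in support.items()
}

def classify(text: str) -> str:
    """Assign the label whose prototype is closest in cosine similarity."""
    q = model.encode([text])[0]
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(prototypes, key=lambda label: cos(q, prototypes[label]))

print(classify("Complained of headache and vomiting after starting the drug."))
```

With only a few shots per class, averaging embeddings into prototypes fits no parameters at all, which is why centroid baselines are a standard FSL reference point.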

Beyond the imitation game: Quantifying and extrapolating the capabilities of language models

A Srivastava, A Rastogi, A Rao, AAM Shoeb… - arXiv preprint arXiv …, 2022 - arxiv.org
Language models demonstrate both quantitative improvement and new qualitative
capabilities with increasing scale. Despite their potentially transformative impact, these new …

SUPERB: Speech processing universal performance benchmark

S Yang, PH Chi, YS Chuang, CIJ Lai… - arXiv preprint arXiv …, 2021 - arxiv.org
Self-supervised learning (SSL) has proven vital for advancing research in natural language
processing (NLP) and computer vision (CV). The paradigm pretrains a shared model on …
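
A sketch of the evaluation protocol SUPERB describes: the shared SSL upstream stays frozen while only a lightweight task head is trained per downstream task. The wav2vec 2.0 checkpoint, the mean-pooling probe, and the 10-class head are illustrative assumptions.

```python
# SUPERB-style probing sketch: a frozen SSL upstream (wav2vec 2.0 here) feeds
# a lightweight trainable head. Checkpoint and head are illustrative.
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
upstream = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
upstream.eval()  # frozen: only the head below would be trained

head = torch.nn.Linear(upstream.config.hidden_size, 10)  # e.g., 10 keyword classes

waveform = torch.randn(16000)  # 1 s of fake 16 kHz audio as a stand-in
inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    hidden = upstream(**inputs).last_hidden_state  # (1, frames, hidden)

logits = head(hidden.mean(dim=1))  # mean-pool over frames, then classify
print(logits.shape)  # torch.Size([1, 10])
```

Keeping the upstream frozen is what lets one pretrained model be compared fairly across many tasks with minimal per-task compute.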

Multi-task pre-training for plug-and-play task-oriented dialogue system

Y Su, L Shu, E Mansimov, A Gupta, D Cai… - arXiv preprint arXiv …, 2021 - arxiv.org
Pre-trained language models have recently been shown to benefit task-oriented dialogue
(TOD) systems. Despite their success, existing methods often formulate this task as a …
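
A sketch of the plug-and-play, text-to-text formulation: each TOD subtask becomes a task prompt prepended to the dialogue context, and a single seq2seq model generates the answer. The prompt strings and the t5-small checkpoint are illustrative stand-ins, not the paper's exact format or model.

```python
# Unified text-to-text TOD sketch: one seq2seq model handles dialogue state
# tracking, action prediction, and response generation via task prompts.
# Prompts and checkpoint are assumed stand-ins for illustration.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

context = "user: i need a cheap italian restaurant in the city centre"
for task_prompt in (
    "translate dialogue to belief state:",    # dialogue state tracking
    "translate dialogue to system action:",   # policy / action prediction
    "translate dialogue to system response:", # response generation
):
    ids = tokenizer(f"{task_prompt} {context}", return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=32)
    print(task_prompt, "->", tokenizer.decode(out[0], skip_special_tokens=True))
```

An off-the-shelf t5-small will not produce useful TOD outputs without the multi-task pre-training the paper describes; the sketch only shows the interface.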

Data augmentation using pre-trained transformer models

V Kumar, A Choudhary, E Cho - arXiv preprint arXiv:2003.02245, 2020 - arxiv.org
Pre-trained language models such as BERT have provided significant gains
across different NLP tasks. In this paper, we study different types of transformer based pre …
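
One recipe from the family this paper studies, sketched with the masked-LM (autoencoder) variant: mask a word in a labeled example and let a pretrained model propose in-context replacements. The paper also compares autoregressive and seq2seq models; the example utterance below is illustrative.

```python
# Masked-LM data augmentation sketch: replace one word with in-context
# predictions from a pretrained BERT. Example sentence is illustrative.
import random
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def augment(sentence: str, n: int = 3) -> list[str]:
    """Mask one random word and return top masked-LM rewrites."""
    words = sentence.split()
    i = random.randrange(len(words))
    original = words[i]
    words[i] = fill_mask.tokenizer.mask_token  # "[MASK]" for BERT
    candidates = fill_mask(" ".join(words), top_k=n + 1)
    # Keep rewrites whose predicted token differs from the original word.
    return [c["sequence"] for c in candidates if c["token_str"] != original][:n]

for aug in augment("book a table for two at an italian restaurant"):
    print(aug)
```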

BERT for joint intent classification and slot filling

Q Chen, Z Zhuo, W Wang - arXiv preprint arXiv:1902.10909, 2019 - arxiv.org
Intent classification and slot filling are two essential tasks for natural language
understanding. They often suffer from small-scale human-labeled training data, resulting in …
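
A sketch of the joint architecture the title refers to: a single BERT encoder whose pooled [CLS] output feeds an intent classifier while the per-token hidden states feed a slot tagger. The label-set sizes below are placeholders.

```python
# Joint intent + slot model sketch: shared BERT encoder, two linear heads.
# Label counts are illustrative placeholders.
import torch
from transformers import BertModel, BertTokenizerFast

NUM_INTENTS, NUM_SLOTS = 7, 21  # placeholder label-set sizes

class JointBert(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        hidden = self.bert.config.hidden_size
        self.intent_head = torch.nn.Linear(hidden, NUM_INTENTS)
        self.slot_head = torch.nn.Linear(hidden, NUM_SLOTS)

    def forward(self, **inputs):
        out = self.bert(**inputs)
        intent_logits = self.intent_head(out.pooler_output)   # utterance level
        slot_logits = self.slot_head(out.last_hidden_state)   # token level
        return intent_logits, slot_logits

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = JointBert()
batch = tokenizer("play jazz in the kitchen", return_tensors="pt")
intent_logits, slot_logits = model(**batch)
print(intent_logits.shape, slot_logits.shape)  # (1, 7) and (1, seq_len, 21)
```

Training would minimize the sum of the intent and slot cross-entropy losses, so the shared encoder benefits both tasks despite the small labeled datasets the abstract mentions.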

Efficient intent detection with dual sentence encoders

I Casanueva, T Temčinas, D Gerz… - arXiv preprint arXiv …, 2020 - arxiv.org
Building conversational systems in new domains and with added functionality requires
resource-efficient models that work under low-data regimes (i.e., in few-shot setups) …
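
A sketch in the spirit of this recipe: a fixed pretrained sentence encoder with only a small classifier trained on top, which is what makes the approach cheap in low-data regimes. The paper uses dual encoders such as ConveRT and USE; the sentence-transformers checkpoint and the toy banking-style utterances here are assumed stand-ins.

```python
# Resource-efficient intent detection sketch: frozen sentence encoder plus a
# small trained classifier. Encoder and toy utterances are stand-ins.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # stays frozen

train_texts = [
    "i lost my card", "my card was stolen",            # few-shot examples
    "what is my balance", "how much money do i have",
]
train_labels = ["lost_card", "lost_card", "check_balance", "check_balance"]

# Encode once; only the lightweight classifier is fit (fast, low-data).
clf = LogisticRegression(max_iter=1000)
clf.fit(encoder.encode(train_texts), train_labels)

print(clf.predict(encoder.encode(["someone took my debit card"])))
```

Because the encoder is never fine-tuned, adding a new domain only requires fitting the small head, which takes seconds even on CPU.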

A fine-tuned wav2vec 2.0/HuBERT benchmark for speech emotion recognition, speaker verification and spoken language understanding

Y Wang, A Boumadane, A Heba - arXiv preprint arXiv:2111.02735, 2021 - arxiv.org
Speech self-supervised models such as wav2vec 2.0 and HuBERT are making revolutionary
progress in Automatic Speech Recognition (ASR). However, they have not been totally …
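
A sketch of the downstream setup this benchmark evaluates: a pretrained wav2vec 2.0 encoder with a classification head fine-tuned end to end, for example on speech emotion recognition. The checkpoint, the four-label head, and the random waveform are illustrative assumptions; the paper compares several fine-tuning configurations.

```python
# Fine-tuning sketch: wav2vec 2.0 encoder + classification head, trained end
# to end for a task like emotion recognition. Inputs/labels are stand-ins.
import torch
from transformers import AutoFeatureExtractor, Wav2Vec2ForSequenceClassification

extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2ForSequenceClassification.from_pretrained(
    "facebook/wav2vec2-base", num_labels=4  # e.g., angry/happy/neutral/sad
)

waveform = torch.randn(16000)  # fake 1 s clip at 16 kHz as a stand-in
inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")

# One forward/backward step; a real run would loop over a labeled dataset.
labels = torch.tensor([2])
loss = model(**inputs, labels=labels).loss
loss.backward()
print(float(loss))
```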

SLURP: A spoken language understanding resource package

E Bastianelli, A Vanzo, P Swietojanski… - arXiv preprint arXiv …, 2020 - arxiv.org
Spoken Language Understanding infers semantic meaning directly from audio data, and
thus promises to reduce error propagation and misunderstandings in end-user applications …
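
For contrast with the direct audio-to-semantics route SLURP targets, here is a sketch of the conventional cascade, ASR followed by text NLU, in which recognition errors propagate into intent prediction. Both checkpoints, the candidate labels, and the audio path are illustrative placeholders; SLURP pairs audio with transcripts and semantic annotations precisely so the two routes can be compared.

```python
# Cascaded SLU baseline sketch: transcribe first, then classify the
# transcript. Models, labels, and the audio path are placeholders.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")
nlu = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

transcript = asr("utterance.wav")["text"]  # placeholder audio file path
result = nlu(
    transcript,
    candidate_labels=["set_alarm", "play_music", "weather_query"],
)
print(transcript, "->", result["labels"][0])  # top-scoring intent
```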