A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

FLEURS: Few-shot learning evaluation of universal representations of speech

A Conneau, M Ma, S Khanuja, Y Zhang… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
We introduce FLEURS, the Few-shot Learning Evaluation of Universal Representations of
Speech benchmark. FLEURS is an n-way parallel speech dataset in 102 languages built on …

An introduction to deep learning in natural language processing: Models, techniques, and tools

I Lauriola, A Lavelli, F Aiolli - Neurocomputing, 2022 - Elsevier
Abstract Natural Language Processing (NLP) is a branch of artificial intelligence that
involves the design and implementation of systems and algorithms able to interact through …

VoxPopuli: A large-scale multilingual speech corpus for representation learning, semi-supervised learning and interpretation

C Wang, M Riviere, A Lee, A Wu, C Talnikar… - arXiv preprint arXiv …, 2021 - arxiv.org
We introduce VoxPopuli, a large-scale multilingual corpus providing 100K hours of
unlabelled speech data in 23 languages. It is the largest open data to date for unsupervised …

Gender bias in machine translation

B Savoldi, M Gaido, L Bentivogli, M Negri… - Transactions of the …, 2021 - direct.mit.edu
Abstract Machine translation (MT) technology has facilitated our daily tasks by providing
accessible shortcuts for gathering, processing, and communicating information. However, it …

Fast Conformer with linearly scalable attention for efficient speech recognition

D Rekesh, NR Koluguri, S Kriman… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
Conformer-based models have become the dominant end-to-end architecture for speech
processing tasks. With the objective of enhancing the conformer architecture for efficient …

Reproducing Whisper-style training using an open-source toolkit and publicly available data

Y Peng, J Tian, B Yan, D Berrebbi… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
Pre-training speech models on large volumes of data has achieved remarkable success.
OpenAI Whisper is a multilingual multitask model trained on 680k hours of supervised …

Findings of the IWSLT 2022 Evaluation Campaign

A Anastasopoulos, L Barrault, L Bentivogli… - Proceedings of the 19th …, 2022 - cris.fbk.eu
The evaluation campaign of the 19th International Conference on Spoken Language
Translation featured eight shared tasks: (i) Simultaneous speech translation, (ii) Offline …

The multilingual TEDx corpus for speech recognition and translation

E Salesky, M Wiesner, J Bremerman, R Cattoni… - arXiv preprint arXiv …, 2021 - arxiv.org
We present the Multilingual TEDx corpus, built to support speech recognition (ASR) and
speech translation (ST) research across many non-English source languages. The corpus is …

SALM: Speech-augmented language model with in-context learning for speech recognition and translation

Z Chen, H Huang, A Andrusenko… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
We present a novel Speech Augmented Language Model (SALM) with multitask and in-
context learning capabilities. SALM comprises a frozen text LLM, an audio encoder, a …