A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

A review of speaker diarization: Recent advances with deep learning

TJ Park, N Kanda, D Dimitriadis, KJ Han… - Computer Speech & …, 2022 - Elsevier
Speaker diarization is a task to label audio or video recordings with classes that correspond
to speaker identity, or in short, a task to identify “who spoke when”. In the early years …

Speech model pre-training for end-to-end spoken language understanding

L Lugosch, M Ravanelli, P Ignoto, VS Tomar… - arXiv preprint arXiv …, 2019 - arxiv.org
Whereas conventional spoken language understanding (SLU) systems map speech to text,
and then text to intent, end-to-end SLU systems map speech directly to intent through a …

Integrated deep learning method for workload and resource prediction in cloud systems

J Bi, S Li, H Yuan, MC Zhou - Neurocomputing, 2021 - Elsevier
Cloud computing providers face several challenges in precisely forecasting large-scale
workload and resource time series. Such prediction can help them to achieve intelligent …

SpecAugment on large scale datasets

DS Park, Y Zhang, CC Chiu, Y Chen… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
Recently, SpecAugment, an augmentation scheme for automatic speech recognition that
acts directly on the spectrogram of input utterances, has shown to be highly effective in …

Generation of Large-Scale Simulated Utterances in Virtual Rooms to Train Deep-Neural Networks for Far-Field Speech Recognition in Google Home.

C Kim, A Misra, KK Chin, T Hughes, A Narayanan… - …, 2017 - research.google.com
We describe the structure and application of an acoustic room simulator to generate
large-scale simulated data for training deep neural networks for far-field speech recognition. The …

Far-field automatic speech recognition

R Haeb-Umbach, J Heymann, L Drude… - Proceedings of the …, 2020 - ieeexplore.ieee.org
The machine recognition of speech spoken at a distance from the microphones, known as
far-field automatic speech recognition (ASR), has received a significant increase in attention …

Sparks of large audio models: A survey and outlook

S Latif, M Shoukat, F Shamshad, M Usama… - arXiv preprint arXiv …, 2023 - arxiv.org
This survey paper provides a comprehensive overview of the recent advancements and
challenges in applying large language models to the field of audio signal processing. Audio …

Speech processing for digital home assistants: Combining signal processing with deep-learning techniques

R Haeb-Umbach, S Watanabe… - IEEE Signal …, 2019 - ieeexplore.ieee.org
Once a popular theme of futuristic science fiction or far-fetched technology forecasts, digital
home assistants with a spoken language interface have become a ubiquitous commodity …

An attention pooling based representation learning method for speech emotion recognition

P Li, Y Song, IV McLoughlin, W Guo, LR Dai - 2018 - kar.kent.ac.uk
This paper proposes an attention pooling based representation learning method for speech
emotion recognition (SER). The emotional representation is learned in an end-to-end …