A review of deep learning techniques for speech processing
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …
A review of speaker diarization: Recent advances with deep learning
Speaker diarization is a task to label audio or video recordings with classes that correspond
to speaker identity, or in short, a task to identify “who spoke when”. In the early years …
Speech model pre-training for end-to-end spoken language understanding
Whereas conventional spoken language understanding (SLU) systems map speech to text,
and then text to intent, end-to-end SLU systems map speech directly to intent through a …
Integrated deep learning method for workload and resource prediction in cloud systems
Cloud computing providers face several challenges in precisely forecasting large-scale
workload and resource time series. Such prediction can help them to achieve intelligent …
SpecAugment on large scale datasets
Recently, SpecAugment, an augmentation scheme for automatic speech recognition that
acts directly on the spectrogram of input utterances, has shown to be highly effective in …
Generation of Large-Scale Simulated Utterances in Virtual Rooms to Train Deep-Neural Networks for Far-Field Speech Recognition in Google Home.
We describe the structure and application of an acoustic room simulator to generate large-
scale simulated data for training deep neural networks for far-field speech recognition. The …
Far-field automatic speech recognition
The machine recognition of speech spoken at a distance from the microphones, known as
far-field automatic speech recognition (ASR), has received a significant increase in attention …
Sparks of large audio models: A survey and outlook
This survey paper provides a comprehensive overview of the recent advancements and
challenges in applying large language models to the field of audio signal processing. Audio …
Speech processing for digital home assistants: Combining signal processing with deep-learning techniques
Once a popular theme of futuristic science fiction or far-fetched technology forecasts, digital
home assistants with a spoken language interface have become a ubiquitous commodity …
An attention pooling based representation learning method for speech emotion recognition
P Li, Y Song, IV McLoughlin, W Guo, LR Dai - 2018 - kar.kent.ac.uk
This paper proposes an attention pooling based representation learning method for speech
emotion recognition (SER). The emotional representation is learned in an end-to-end …