- Academic Search

A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier

The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Cited by: 227

[PDF] arxiv.org

A review of speaker diarization: Recent advances with deep learning

TJ Park, N Kanda, D Dimitriadis, KJ Han… - Computer Speech & …, 2022 - Elsevier

Speaker diarization is a task to label audio or video recordings with classes that correspond
to speaker identity, or in short, a task to identify “who spoke when”. In the early years …

Cited by: 419

[PDF] arxiv.org

Speech model pre-training for end-to-end spoken language understanding

L Lugosch, M Ravanelli, P Ignoto, VS Tomar… - arxiv preprint arxiv …, 2019 - arxiv.org

Whereas conventional spoken language understanding (SLU) systems map speech to text,
and then text to intent, end-to-end SLU systems map speech directly to intent through a …

Cited by: 385

[PDF] google.com

Integrated deep learning method for workload and resource prediction in cloud systems

J Bi, S Li, H Yuan, MC Zhou - Neurocomputing, 2021 - Elsevier

Cloud computing providers face several challenges in precisely forecasting large-scale
workload and resource time series. Such prediction can help them to achieve intelligent …

Cited by: 129

[PDF] arxiv.org

SpecAugment on large scale datasets

DS Park, Y Zhang, CC Chiu, Y Chen… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org

Recently, SpecAugment, an augmentation scheme for automatic speech recognition that
acts directly on the spectrogram of input utterances, has shown to be highly effective in …

Cited by: 174

[PDF] google.com

[PDF] Generation of Large-Scale Simulated Utterances in Virtual Rooms to Train Deep-Neural Networks for Far-Field Speech Recognition in Google Home.

C Kim, A Misra, KK Chin, T Hughes, A Narayanan… - …, 2017 - research.google.com

We describe the structure and application of an acoustic room simulator to generate
large-scale simulated data for training deep neural networks for far-field speech recognition. The …

Cited by: 276

[PDF] arxiv.org

Far-field automatic speech recognition

R Haeb-Umbach, J Heymann, L Drude… - Proceedings of the …, 2020 - ieeexplore.ieee.org

The machine recognition of speech spoken at a distance from the microphones, known as
far-field automatic speech recognition (ASR), has received a significant increase in attention …

Cited by: 121

[PDF] arxiv.org

Sparks of large audio models: A survey and outlook

S Latif, M Shoukat, F Shamshad, M Usama… - arxiv preprint arxiv …, 2023 - arxiv.org

This survey paper provides a comprehensive overview of the recent advancements and
challenges in applying large language models to the field of audio signal processing. Audio …

Cited by: 35

[PDF] uni-paderborn.de

Speech processing for digital home assistants: Combining signal processing with deep-learning techniques

R Haeb-Umbach, S Watanabe… - IEEE Signal …, 2019 - ieeexplore.ieee.org

Once a popular theme of futuristic science fiction or far-fetched technology forecasts, digital
home assistants with a spoken language interface have become a ubiquitous commodity …

Cited by: 199

[PDF] kent.ac.uk

An attention pooling based representation learning method for speech emotion recognition

P Li, Y Song, IV McLoughlin, W Guo, LR Dai - 2018 - kar.kent.ac.uk

This paper proposes an attention pooling based representation learning method for speech
emotion recognition (SER). The emotional representation is learned in an end-to-end …

Cited by: 197

Acoustic Modeling for Google Home.

A review of deep learning techniques for speech processing
