- Academic Search

A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier

The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Cited by: 228

Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com

Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Cited by: 438

WavLM: Large-scale self-supervised pre-training for full stack speech processing

S Chen, C Wang, Z Chen, Y Wu, S Liu… - IEEE Journal of …, 2022 - ieeexplore.ieee.org

Self-supervised learning (SSL) achieves great success in speech recognition, while limited
exploration has been attempted for other speech processing tasks. As speech signal …

Cited by: 1820

SpeechT5: Unified-modal encoder-decoder pre-training for spoken language processing

J Ao, R Wang, L Zhou, C Wang, S Ren, Y Wu… - arXiv preprint arXiv …, 2021 - arxiv.org

Motivated by the success of T5 (Text-To-Text Transfer Transformer) in pre-trained natural
language processing models, we propose a unified-modal SpeechT5 framework that …

Cited by: 249

A fine-tuned wav2vec 2.0/HuBERT benchmark for speech emotion recognition, speaker verification and spoken language understanding

Y Wang, A Boumadane, A Heba - arXiv preprint arXiv:2111.02735, 2021 - arxiv.org

Speech self-supervised models such as wav2vec 2.0 and HuBERT are making revolutionary
progress in Automatic Speech Recognition (ASR). However, they have not been totally …

Cited by: 183

MFA-Conformer: Multi-scale feature aggregation conformer for automatic speaker verification

Y Zhang, Z Lv, H Wu, S Zhang, P Hu, Z Wu… - arXiv preprint arXiv …, 2022 - arxiv.org

In this paper, we present Multi-scale Feature Aggregation Conformer (MFA-Conformer), an
easy-to-implement, simple but effective backbone for automatic speaker verification based …

Cited by: 151

Squeezeformer: An efficient transformer for automatic speech recognition

S Kim, A Gholami, A Shaw, N Lee… - Advances in …, 2022 - proceedings.neurips.cc

The recently proposed Conformer model has become the de facto backbone model for
various downstream speech tasks based on its hybrid attention-convolution architecture that …

Cited by: 132

Audio self-supervised learning: A survey

S Liu, A Mallol-Ragolta, E Parada-Cabaleiro, K Qian… - Patterns, 2022 - cell.com

Similar to humans' cognitive ability to generalize knowledge and skills, self-supervised
learning (SSL) targets discovering general representations from large-scale data. This …

Cited by: 127

Transformers in speech processing: A survey

S Latif, A Zaidi, H Cuayahuitl, F Shamshad… - arXiv preprint arXiv …, 2023 - arxiv.org

The remarkable success of transformers in the field of natural language processing has
sparked the interest of the speech-processing community, leading to an exploration of their …

Cited by: 68

ChatVideo: A tracklet-centric multimodal and versatile video understanding system

J Wang, D Chen, C Luo, X Dai, L Yuan, Z Wu… - arXiv preprint arXiv …, 2023 - arxiv.org

Existing deep video models are limited by specific tasks, fixed input-output spaces, and poor
generalization capabilities, making it difficult to deploy them in real-world scenarios. In this …

Cited by: 51
