Google Scholar

A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier

The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Cited by 240 · All 7 versions

[PDF] arxiv.org

Self-supervised speech representation learning: A review

A Mohamed, H Lee, L Borgholt… - IEEE Journal of …, 2022 - ieeexplore.ieee.org

Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …

Cited by 406 · All 10 versions

[PDF] mlr.press

Robust speech recognition via large-scale weak supervision

A Radford, JW Kim, T Xu, G Brockman… - International …, 2023 - proceedings.mlr.press

We study the capabilities of speech processing systems trained simply to predict large
amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual …

Cited by 3928 · All 11 versions

AudioLDM 2: Learning holistic audio generation with self-supervised pretraining

H Liu, Y Yuan, X Liu, X Mei, Q Kong… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org

Although audio generation shares commonalities across different types of audio, such as
speech, music, and sound effects, designing models for each type requires careful …

Cited by 141 · All 8 versions

[PDF] arxiv.org

WavLM: Large-scale self-supervised pre-training for full stack speech processing

S Chen, C Wang, Z Chen, Y Wu, S Liu… - IEEE Journal of …, 2022 - ieeexplore.ieee.org

Self-supervised learning (SSL) achieves great success in speech recognition, while limited
exploration has been attempted for other speech processing tasks. As speech signal …

Cited by 1857 · All 7 versions

[PDF] thecvf.com

Ego4D: Around the world in 3,000 hours of egocentric video

K Grauman, A Westbury, E Byrne… - Proceedings of the …, 2022 - openaccess.thecvf.com

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It
offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household …

Cited by 1006 · All 20 versions

[PDF] arxiv.org

WenetSpeech: A 10000+ hours multi-domain Mandarin corpus for speech recognition

B Zhang, H Lv, P Guo, Q Shao, C Yang… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

In this paper, we present WenetSpeech, a multi-domain Mandarin corpus consisting of
10000+ hours high-quality labeled speech, 2400+ hours weakly labeled speech, and about …

Cited by 221 · All 6 versions

[PDF] arxiv.org

The singing voice conversion challenge 2023

WC Huang, LP Violeta, S Liu, J Shi… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

We present the latest iteration of the voice conversion challenge (VCC) series, a bi-annual
scientific event aiming to compare and understand different voice conversion (VC) systems …

Cited by 58 · All 4 versions

[PDF] arxiv.org

FunCodec: A fundamental, reproducible and integrable open-source toolkit for neural speech codec

Z Du, S Zhang, K Hu, S Zheng - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org

This paper presents FunCodec, a fundamental neural speech codec toolkit, which is an
extension of the open-source speech processing toolkit FunASR. FunCodec provides …

Cited by 53 · All 3 versions

[PDF] arxiv.org

Connecting speech encoder and large language model for ASR

W Yu, C Tang, G Sun, X Chen, T Tan… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

The impressive capability and versatility of large language models (LLMs) have aroused
increasing attention in automatic speech recognition (ASR), with several pioneering studies …

Cited by 49 · All 3 versions

