Google Scholar

A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier

The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Cited by 242. Versions: 7

[PDF] arxiv.org

A complete survey on generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 all you need?

C Zhang, C Zhang, S Zheng, Y Qiao, C Li… - arXiv preprint arXiv …, 2023 - arxiv.org

As ChatGPT goes viral, generative AI (AIGC, aka AI-generated content) has made headlines
everywhere because of its ability to analyze and create text, images, and beyond. With such …

Cited by 210. Versions: 4

[PDF] arxiv.org

WavLM: Large-scale self-supervised pre-training for full stack speech processing

S Chen, C Wang, Z Chen, Y Wu, S Liu… - IEEE Journal of …, 2022 - ieeexplore.ieee.org

Self-supervised learning (SSL) achieves great success in speech recognition, while limited
exploration has been attempted for other speech processing tasks. As speech signal …

Cited by 1868. Versions: 7

[PDF] arxiv.org

XLS-R: Self-supervised cross-lingual speech representation learning at scale

A Babu, C Wang, A Tjandra, K Lakhotia, Q Xu… - arXiv preprint arXiv …, 2021 - arxiv.org

This paper presents XLS-R, a large-scale model for cross-lingual speech representation
learning based on wav2vec 2.0. We train models with up to 2B parameters on nearly half a …

Cited by 719. Versions: 6

[PDF] arxiv.org

Self-supervised speech representation learning: A review

A Mohamed, H Lee, L Borgholt… - IEEE Journal of …, 2022 - ieeexplore.ieee.org

Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …

Cited by 409. Versions: 10

[PDF] arxiv.org

SUPERB: Speech processing Universal PERformance Benchmark

S Yang, PH Chi, YS Chuang, CIJ Lai… - arXiv preprint arXiv …, 2021 - arxiv.org

Self-supervised learning (SSL) has proven vital for advancing research in natural language
processing (NLP) and computer vision (CV). The paradigm pretrains a shared model on …

Cited by 970. Versions: 11

[PDF] arxiv.org

Learning audio-visual speech representation by masked multimodal cluster prediction

B Shi, WN Hsu, K Lakhotia, A Mohamed - arXiv preprint arXiv:2201.02184, 2022 - arxiv.org

Video recordings of speech contain correlated audio and visual information, providing a
strong signal for speech representation learning from the speaker's lip movements and the …

Cited by 331. Versions: 3

[PDF] aaai.org

SSAST: Self-supervised audio spectrogram transformer

Y Gong, CI Lai, YA Chung, J Glass - … of the AAAI Conference on Artificial …, 2022 - ojs.aaai.org

Recently, neural networks based purely on self-attention, such as the Vision Transformer
(ViT), have been shown to outperform deep learning models constructed with convolutional …

Cited by 317. Versions: 11

[PDF] nature.com

Digital medicine and the curse of dimensionality

V Berisha, C Krantsevich, PR Hahn, S Hahn… - NPJ digital …, 2021 - nature.com

Digital health data are multimodal and high-dimensional. A patient's health state can be
characterized by a multitude of signals including medical imaging, clinical variables …

Cited by 263. Versions: 10

[PDF] arxiv.org

Wav2CLIP: Learning robust audio representations from CLIP

HH Wu, P Seetharaman, K Kumar… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

We propose Wav2CLIP, a robust audio representation learning method by distilling from
Contrastive Language-Image Pre-training (CLIP). We systematically evaluate Wav2CLIP on …

Cited by 289. Versions: 10

Multi-task self-supervised learning for robust speech recognition