Google Scholar

Self-supervised speech representation learning: A review

A Mohamed, H Lee, L Borgholt… - IEEE Journal of …, 2022 - ieeexplore.ieee.org

Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …

Cited by 406

[PDF] arxiv.org

Sparks of large audio models: A survey and outlook

S Latif, M Shoukat, F Shamshad, M Usama… - arxiv preprint arxiv …, 2023 - arxiv.org

This survey paper provides a comprehensive overview of the recent advancements and
challenges in applying large language models to the field of audio signal processing. Audio …

Cited by 35

[PDF] arxiv.org

The Llama 3 herd of models

A Dubey, A Jauhri, A Pandey, A Kadian… - arxiv preprint arxiv …, 2024 - arxiv.org

Modern artificial intelligence (AI) systems are powered by foundation models. This paper
presents a new set of foundation models, called Llama 3. It is a herd of language models …

Cited by 2447

[PDF] arxiv.org

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

G Team, P Georgiev, VI Lei, R Burnell, L Bai… - arxiv preprint arxiv …, 2024 - arxiv.org

In this report, we introduce the Gemini 1.5 family of models, representing the next generation
of highly compute-efficient multimodal models capable of recalling and reasoning over fine …

Cited by 997

[PDF] jmlr.org

Scaling speech technology to 1,000+ languages

V Pratap, A Tjandra, B Shi, P Tomasello, A Babu… - Journal of Machine …, 2024 - jmlr.org

Expanding the language coverage of speech technology has the potential to improve
access to information for many more people. However, current speech technology is …

Cited by 297

[PDF] mlr.press

Robust speech recognition via large-scale weak supervision

A Radford, JW Kim, T Xu, G Brockman… - International …, 2023 - proceedings.mlr.press

We study the capabilities of speech processing systems trained simply to predict large
amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual …

Cited by 3882

[PDF] arxiv.org

Google USM: Scaling automatic speech recognition beyond 100 languages

Y Zhang, W Han, J Qin, Y Wang, A Bapna… - arxiv preprint arxiv …, 2023 - arxiv.org

We introduce the Universal Speech Model (USM), a single large model that performs
automatic speech recognition (ASR) across 100+ languages. This is achieved by pre …

Cited by 298

[PDF] arxiv.org

WavLM: Large-scale self-supervised pre-training for full stack speech processing

S Chen, C Wang, Z Chen, Y Wu, S Liu… - IEEE Journal of …, 2022 - ieeexplore.ieee.org

Self-supervised learning (SSL) achieves great success in speech recognition, while limited
exploration has been attempted for other speech processing tasks. As speech signal …

Cited by 1856

[PDF] arxiv.org

XLS-R: Self-supervised cross-lingual speech representation learning at scale

A Babu, C Wang, A Tjandra, K Lakhotia, Q Xu… - arxiv preprint arxiv …, 2021 - arxiv.org

This paper presents XLS-R, a large-scale model for cross-lingual speech representation
learning based on wav2vec 2.0. We train models with up to 2B parameters on nearly half a …

Cited by 723

[PDF] arxiv.org

Dawn of the transformer era in speech emotion recognition: closing the valence gap

J Wagner, A Triantafyllopoulos… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org

Recent advances in transformer-based architectures have shown promise in several
machine learning tasks. In the audio domain, such architectures have been successfully …

Cited by 335

