Google Scholar

Large language models for time series: A survey

X Zhang, RR Chowdhury, RK Gupta… - arXiv preprint arXiv …, 2024 - arxiv.org

Large Language Models (LLMs) have seen significant use in domains such as natural
language processing and computer vision. Going beyond text, image and graphics, LLMs …

Cited by 51 · All 10 versions

[PDF] arxiv.org

Sparks of large audio models: A survey and outlook

S Latif, M Shoukat, F Shamshad, M Usama… - arXiv preprint arXiv …, 2023 - arxiv.org

This survey paper provides a comprehensive overview of the recent advancements and
challenges in applying large language models to the field of audio signal processing. Audio …

Cited by 35 · All 4 versions

[PDF] neurips.cc

High-fidelity audio compression with improved RVQGAN

R Kumar, P Seetharaman, A Luebs… - Advances in Neural …, 2023 - proceedings.neurips.cc

Language models have been successfully used to model natural signals, such as
images, speech, and music. A key component of these models is a high quality neural …

Cited by 264 · All 5 versions

[PDF] arxiv.org

High fidelity neural audio compression

A Défossez, J Copet, G Synnaeve, Y Adi - arXiv preprint arXiv:2210.13438, 2022 - arxiv.org

We introduce a state-of-the-art real-time, high-fidelity, audio codec leveraging neural
networks. It consists in a streaming encoder-decoder architecture with quantized latent …

Cited by 725 · All 3 versions

[PDF] arxiv.org

MERT: Acoustic music understanding model with large-scale self-supervised training

Y Li, R Yuan, G Zhang, Y Ma, X Chen, H Yin… - arXiv preprint arXiv …, 2023 - arxiv.org

Self-supervised learning (SSL) has recently emerged as a promising paradigm for training
generalisable models on large-scale data in the fields of vision, text, and speech. Although …

Cited by 101 · All 7 versions

[PDF] arxiv.org

Audio Flamingo: A novel audio language model with few-shot learning and dialogue abilities

Z Kong, A Goel, R Badlani, W Ping, R Valle… - arXiv preprint arXiv …, 2024 - arxiv.org

Augmenting large language models (LLMs) to understand audio--including non-speech
sounds and non-verbal speech--is critically important for diverse real-world applications of …

Cited by 70 · All 8 versions

[PDF] ieee.org

The Internet of Sounds: Convergent trends, insights, and future directions

L Turchet, M Lagrange, C Rottondi… - IEEE Internet of …, 2023 - ieeexplore.ieee.org

Current sound-based practices and systems developed in both academia and industry point
to convergent research trends that bring together the field of Sound and Music Computing …

Cited by 75 · All 14 versions

[PDF] arxiv.org

The VoicePrivacy 2024 challenge evaluation plan

N Tomashenko, X Miao, P Champion, S Meyer… - arXiv preprint arXiv …, 2024 - arxiv.org

The task of the challenge is to develop a voice anonymization system for speech data which
conceals the speaker's voice identity while protecting linguistic content and emotional states …

Cited by 99 · All 18 versions

[PDF] arxiv.org

WavTokenizer: an efficient acoustic discrete codec tokenizer for audio language modeling

S Ji, Z Jiang, W Wang, Y Chen, M Fang, J Zuo… - arXiv preprint arXiv …, 2024 - arxiv.org

Language models have been effectively applied to modeling natural signals, such as
images, video, speech, and audio. A crucial component of these models is the codec …

Cited by 27 · All 3 versions

[PDF] arxiv.org

Music understanding LLaMA: Advancing text-to-music generation with question answering and captioning

S Liu, AS Hussain, C Sun… - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org

Text-to-music generation (T2M-Gen) faces a major obstacle due to the scarcity of large-scale
publicly available music datasets with natural language captions. To address this, we …

Cited by 49 · All 5 versions

The mtg-jamendo dataset for automatic music tagging
