MERT: Acoustic music understanding model with large-scale self-supervised training

AudioLDM 2: Learning holistic audio generation with self-supervised pretraining

H Liu, Y Yuan, X Liu, X Mei, Q Kong… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org

Although audio generation shares commonalities across different types of audio, such as
speech, music, and sound effects, designing models for each type requires careful …

Cited by 141

Foundation models for music: A survey

Y Ma, A Øland, A Ragni, BMS Del Sette, C Saitis… - arXiv preprint arXiv …, 2024 - arxiv.org

In recent years, foundation models (FMs) such as large language models (LLMs) and latent
diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This …

Cited by 12

Audio Flamingo: A novel audio language model with few-shot learning and dialogue abilities

Z Kong, A Goel, R Badlani, W Ping, R Valle… - arXiv preprint arXiv …, 2024 - arxiv.org

Augmenting large language models (LLMs) to understand audio--including non-speech
sounds and non-verbal speech--is critically important for diverse real-world applications of …

Cited by 66

Multimodal pretraining, adaptation, and generation for recommendation: A survey

Q Liu, J Zhu, Y Yang, Q Dai, Z Du, XM Wu… - Proceedings of the 30th …, 2024 - dl.acm.org

Personalized recommendation serves as a ubiquitous channel for users to discover
information tailored to their interests. However, traditional recommendation models primarily …

Cited by 23

Adapting Frechet Audio Distance for generative music evaluation

A Gui, H Gamper, S Braun… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

The growing popularity of generative music models underlines the need for perceptually
relevant, objective music quality metrics. The Frechet Audio Distance (FAD) is commonly …

Cited by 53

Music Understanding LLaMA: Advancing text-to-music generation with question answering and captioning

S Liu, AS Hussain, C Sun… - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org

Text-to-music generation (T2M-Gen) faces a major obstacle due to the scarcity of large-scale
publicly available music datasets with natural language captions. To address this, we …

Cited by 44

MUGen: Multi-modal Music Understanding and Generation with the Power of Large Language Models

S Liu, AS Hussain, C Sun, Y Shan - arXiv preprint arXiv:2311.11255, 2023 - arxiv.org

The current landscape of research leveraging large language models (LLMs) is
experiencing a surge. Many works harness the powerful reasoning capabilities of these …

Cited by 31

MARBLE: Music audio representation benchmark for universal evaluation

R Yuan, Y Ma, Y Li, G Zhang, X Chen… - Advances in …, 2023 - proceedings.neurips.cc

In the era of extensive intersection between art and Artificial Intelligence (AI), such as image
generation and fiction co-creation, AI for music remains relatively nascent, particularly in …

Cited by 24

LLMs meet multimodal generation and editing: A survey

Y He, Z Liu, J Chen, Z Tian, H Liu, X Chi, R Liu… - arXiv preprint arXiv …, 2024 - arxiv.org

With the recent advancement in large language models (LLMs), there is a growing interest in
combining LLMs with multimodal learning. Previous surveys of multimodal large language …

Cited by 16

Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

L Chen, Z Wang, S Ren, L Li, H Zhao, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org

Building on the foundations of language modeling in natural language processing, Next
Token Prediction (NTP) has evolved into a versatile training objective for machine learning …

Cited by 2
