Google Scholar

Foundations & trends in multimodal machine learning: Principles, challenges, and open questions

PP Liang, A Zadeh, LP Morency - ACM Computing Surveys, 2024 - dl.acm.org

Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

Cited by 99

Towards audio language modeling--an overview

H Wu, X Chen, YC Lin, K Chang, HL Chung… - arXiv preprint arXiv …, 2024 - arxiv.org

Neural audio codecs are initially introduced to compress audio data into compact codes to
reduce transmission latency. Researchers recently discovered the potential of codecs as …

Cited by 29

AudioLDM 2: Learning holistic audio generation with self-supervised pretraining

H Liu, Y Yuan, X Liu, X Mei, Q Kong… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org

Although audio generation shares commonalities across different types of audio, such as
speech, music, and sound effects, designing models for each type requires careful …

Cited by 142

Audiobox: Unified audio generation with natural language prompts

A Vyas, B Shi, M Le, A Tjandra, YC Wu, B Guo… - arXiv preprint arXiv …, 2023 - arxiv.org

Audio is an essential part of our life, but creating it often requires expertise and is
time-consuming. Research communities have made great progress over the past year advancing …

Cited by 91

UniAudio: An audio foundation model toward universal audio generation

D Yang, J Tian, X Tan, R Huang, S Liu, X Chang… - arXiv preprint arXiv …, 2023 - arxiv.org

Large Language models (LLM) have demonstrated the capability to handle a variety of
generative tasks. This paper presents the UniAudio system, which, unlike prior task-specific …

Cited by 113

Discrete flow matching

I Gat, T Remez, N Shaul, F Kreuk… - Advances in …, 2025 - proceedings.neurips.cc

Despite Flow Matching and diffusion models having emerged as powerful
generative paradigms for continuous variables such as images and videos, their application …

Cited by 34

AnyGPT: Unified multimodal LLM with discrete sequence modeling

J Zhan, J Dai, J Ye, Y Zhou, D Zhang, Z Liu… - arXiv preprint arXiv …, 2024 - arxiv.org

We introduce AnyGPT, an any-to-any multimodal language model that utilizes discrete
representations for the unified processing of various modalities, including speech, text …

Cited by 94

SoundStorm: Efficient parallel audio generation

Z Borsos, M Sharifi, D Vincent, E Kharitonov… - arXiv preprint arXiv …, 2023 - arxiv.org

We present SoundStorm, a model for efficient, non-autoregressive audio generation.
SoundStorm receives as input the semantic tokens of AudioLM, and relies on bidirectional …

Cited by 102

LauraGPT: Listen, attend, understand, and regenerate audio with GPT

Z Du, J Wang, Q Chen, Y Chu, Z Gao, Z Li, K Hu… - arXiv preprint arXiv …, 2023 - arxiv.org

Generative Pre-trained Transformer (GPT) models have achieved remarkable performance
on various natural language processing tasks, and have shown great potential as …

Cited by 69

Music ControlNet: Multiple time-varying controls for music generation

SL Wu, C Donahue, S Watanabe… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org

Text-to-music generation models are now capable of generating high-quality music audio in
broad styles. However, text control is primarily suitable for the manipulation of global musical …

Cited by 58


Simple and controllable music generation

Foundations & trends in multimodal machine learning: Principles, challenges, and open questions
