Sparks of large audio models: A survey and outlook

S Latif, M Shoukat, F Shamshad, M Usama… - arXiv preprint arXiv …, 2023 - arxiv.org

This survey paper provides a comprehensive overview of the recent advancements and
challenges in applying large language models to the field of audio signal processing. Audio …

Cited by 35 · All 4 versions · HTML version

[PDF] arxiv.org

WavCaps: A ChatGPT-assisted weakly-labelled audio captioning dataset for audio-language multimodal research

X Mei, C Meng, H Liu, Q Kong, T Ko… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org

The advancement of audio-language (AL) multimodal learning tasks has been significant in
recent years, yet the limited size of existing audio-language datasets poses challenges for …

Cited by 155 · All 3 versions

[PDF] arxiv.org

Audiobox: Unified audio generation with natural language prompts

A Vyas, B Shi, M Le, A Tjandra, YC Wu, B Guo… - arXiv preprint arXiv …, 2023 - arxiv.org

Audio is an essential part of our life, but creating it often requires expertise and is time-
consuming. Research communities have made great progress over the past year advancing …

Cited by 89 · All 2 versions · HTML version

[HTML] mdpi.com

Video and audio deepfake datasets and open issues in deepfake technology: being ahead of the curve

Z Akhtar, TL Pendyala, VS Athmakuri - Forensic Sciences, 2024 - mdpi.com

The revolutionary breakthroughs in Machine Learning (ML) and Artificial Intelligence (AI) are
extensively being harnessed across a diverse range of domains, e.g., forensic science …

Cited by 7

[PDF] arxiv.org

Masked generative video-to-audio transformers with enhanced synchronicity

S Pascual, C Yeh, I Tsiamas, J Serrà - European Conference on Computer …, 2024 - Springer

Video-to-audio (V2A) generation leverages visual-only video features to render
plausible sounds that match the scene. Importantly, the generated sound onsets should …

Cited by 8 · All 6 versions

[PDF] arxiv.org

Improving text-to-audio models with synthetic captions

Z Kong, S Lee, D Ghosal, N Majumder… - arXiv preprint arXiv …, 2024 - arxiv.org

It is an open challenge to obtain high-quality training data, especially captions, for text-to-audio
models. Although prior methods have leveraged text-only language models to …

Cited by 9 · All 7 versions · HTML version

[PDF] arxiv.org

DITTO: Diffusion inference-time T-optimization for music generation

Z Novack, J McAuley, T Berg-Kirkpatrick… - arXiv preprint arXiv …, 2024 - arxiv.org

We propose Diffusion Inference-Time T-Optimization (DITTO), a general-purpose framework
for controlling pre-trained text-to-music diffusion models at inference time via …

Cited by 25 · All 3 versions · HTML version

[PDF] arxiv.org

PicoAudio: Enabling precise timestamp and frequency controllability of audio events in text-to-audio generation

Z Xie, X Xu, Z Wu, M Wu - arXiv preprint arXiv:2407.02869, 2024 - arxiv.org

Recently, audio generation tasks have attracted considerable research interest. Precise
temporal controllability is essential to integrate audio generation with real applications. In …

Cited by 11 · All 2 versions · HTML version

[PDF] arxiv.org

LCFed: An efficient clustered federated learning framework for heterogeneous data

Y Zhang, H Chen, Z Lin, Z Chen, J Zhao - arXiv preprint arXiv:2501.01850, 2025 - arxiv.org

Clustered federated learning (CFL) addresses the performance challenges posed by data
heterogeneity in federated learning (FL) by organizing edge devices with similar data …

Cited by 4 · All 2 versions · HTML version

[PDF] arxiv.org

MusicFlow: Cascaded flow matching for text guided music generation

KR Prajwal, B Shi, M Lee, A Vyas, A Tjandra… - arXiv preprint arXiv …, 2024 - arxiv.org

We introduce MusicFlow, a cascaded text-to-music generation model based on flow
matching. Based on self-supervised representations to bridge between text descriptions and …

Cited by 4 · HTML version
