Long-form music generation with latent diffusion

Z Evans, JD Parker, CJ Carr, Z Zukowski… - arXiv preprint arXiv …, 2024 - arxiv.org
Audio-based generative models for music have seen great strides recently, but so far have
not managed to produce full-length music tracks with coherent musical structure from text …

WavChat: A survey of spoken dialogue models

S Ji, Y Chen, M Fang, J Zuo, J Lu, H Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in spoken dialogue models, exemplified by systems like GPT-4o,
have captured significant attention in the speech domain. Compared to traditional three-tier …

Enhancing zero-shot text-to-speech synthesis with human feedback

C Chen, Y Hu, W Wu, H Wang, ES Chng… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, text-to-speech (TTS) technology has witnessed impressive advancements,
particularly with large-scale training datasets, showcasing human-level speech quality and …

Emo-DPO: Controllable emotional speech synthesis through direct preference optimization

X Gao, C Zhang, Y Chen, H Zhang, NF Chen - arXiv preprint arXiv …, 2024 - arxiv.org
Current emotional text-to-speech (TTS) models predominantly conduct supervised training
to learn the conversion from text and desired emotion to its emotional speech, focusing on a …

Crafting Creative Melodies: A User-Centric Approach for Symbolic Music Generation

S Dadman, BA Bremdal - Electronics, 2024 - mdpi.com
Composing coherent and structured music is one of the main challenges in symbolic music
generation. Our research aims to propose a user-centric framework design that promotes a …

Seed-Music: A unified framework for high quality and controlled music generation

Y Bai, H Chen, J Chen, Z Chen, Y Deng… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce Seed-Music, a suite of music generation systems capable of producing high-quality
music with fine-grained style control. Our unified framework leverages both auto …

MusicScore: A Dataset for Music Score Modeling and Generation

Y Lin, Z Dai, Q Kong - arXiv preprint arXiv:2406.11462, 2024 - arxiv.org
Music scores are written representations of music and contain rich information about musical
components. The visual information on music scores includes notes, rests, staff lines, clefs …

Dynamic normativity: Necessary and sufficient conditions for value alignment

NK Corrêa - arXiv preprint arXiv:2406.11039, 2024 - arxiv.org
The critical inquiry pervading the realm of Philosophy, and perhaps extending its influence
across all Humanities disciplines, revolves around the intricacies of morality and normativity …

Video Echoed in Harmony: Learning and Sampling Video-Integrated Chord Progression Sequences for Controllable Video Background Music Generation

X Tong, S Chen, P Yu, N Liu, H Qv, T Ma… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
Automatically generating video background music mitigates the inefficiency and time-consuming
drawbacks of current manual video editing. Two key challenges hinder the …

Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation

X Di, Z Chen, Y Liang, J Zheng, Y Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large-scale text-to-speech (TTS) models have made significant progress recently. However,
they still fall short in the generation of Chinese dialectal speech. To address this, we propose …