A review of deep learning techniques for speech processing
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …
Diffusion Models for Image Restoration and Enhancement--A Comprehensive Survey
Image restoration (IR) has been an indispensable and challenging task in the low-level
vision field, which strives to improve the subjective quality of images distorted by various …
Make-An-Audio: Text-to-audio generation with prompt-enhanced diffusion models
Large-scale multimodal generative modeling has created milestones in text-to-image and
text-to-video generation. Its application to audio still lags behind for two main reasons: the …
Voicebox: Text-guided multilingual universal speech generation at scale
Large-scale generative models such as GPT and DALL-E have revolutionized the research
community. These models not only generate high fidelity outputs, but are also generalists …
AudioGPT: Understanding and generating speech, music, sound, and talking head
Large language models (LLMs) have exhibited remarkable capabilities across a variety of
domains and tasks, challenging our understanding of learning and cognition. Despite the …
Universal guidance for diffusion models
Typical diffusion models are trained to accept a particular form of conditioning, most
commonly text, and cannot be conditioned on other modalities without retraining. In this …
Text-to-audio generation using instruction-tuned LLM and latent diffusion model
The immense scale of the recent large language models (LLMs) allows many interesting
properties, such as instruction- and chain-of-thought-based fine-tuning, that has significantly …
ProDiff: Progressive fast diffusion model for high-quality text-to-speech
Denoising diffusion probabilistic models (DDPMs) have recently achieved leading
performances in many generative tasks. However, the inherited iterative sampling process …
How to backdoor diffusion models?
Diffusion models are state-of-the-art deep learning empowered generative models that are
trained based on the principle of learning forward and reverse diffusion processes via …
Editing implicit assumptions in text-to-image diffusion models
Text-to-image diffusion models often make implicit assumptions about the world when
generating images. While some assumptions are useful (e.g., the sky is blue), they can also …