A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Diffusion Models for Image Restoration and Enhancement--A Comprehensive Survey

X Li, Y Ren, X Jin, C Lan, X Wang, W Zeng… - arXiv preprint arXiv …, 2023 - arxiv.org
Image restoration (IR) has been an indispensable and challenging task in the low-level
vision field, which strives to improve the subjective quality of images distorted by various …

Make-an-audio: Text-to-audio generation with prompt-enhanced diffusion models

R Huang, J Huang, D Yang, Y Ren… - International …, 2023 - proceedings.mlr.press
Large-scale multimodal generative modeling has created milestones in text-to-image and
text-to-video generation. Its application to audio still lags behind for two main reasons: the …

Voicebox: Text-guided multilingual universal speech generation at scale

M Le, A Vyas, B Shi, B Karrer, L Sari… - Advances in neural …, 2024 - proceedings.neurips.cc
Large-scale generative models such as GPT and DALL-E have revolutionized the research
community. These models not only generate high fidelity outputs, but are also generalists …

Audiogpt: Understanding and generating speech, music, sound, and talking head

R Huang, M Li, D Yang, J Shi, X Chang, Z Ye… - Proceedings of the …, 2024 - ojs.aaai.org
Large language models (LLMs) have exhibited remarkable capabilities across a variety of
domains and tasks, challenging our understanding of learning and cognition. Despite the …

Universal guidance for diffusion models

A Bansal, HM Chu, A Schwarzschild… - Proceedings of the …, 2023 - openaccess.thecvf.com
Typical diffusion models are trained to accept a particular form of conditioning, most
commonly text, and cannot be conditioned on other modalities without retraining. In this …

Text-to-audio generation using instruction-tuned llm and latent diffusion model

D Ghosal, N Majumder, A Mehrish, S Poria - arXiv preprint arXiv …, 2023 - arxiv.org
The immense scale of the recent large language models (LLM) allows many interesting
properties, such as instruction- and chain-of-thought-based fine-tuning, that has significantly …

Prodiff: Progressive fast diffusion model for high-quality text-to-speech

R Huang, Z Zhao, H Liu, J Liu, C Cui… - Proceedings of the 30th …, 2022 - dl.acm.org
Denoising diffusion probabilistic models (DDPMs) have recently achieved leading
performances in many generative tasks. However, the inherited iterative sampling process …

How to backdoor diffusion models?

SY Chou, PY Chen, TY Ho - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Diffusion models are state-of-the-art deep learning empowered generative models that are
trained based on the principle of learning forward and reverse diffusion processes via …

Editing implicit assumptions in text-to-image diffusion models

H Orgad, B Kawar, Y Belinkov - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Text-to-image diffusion models often make implicit assumptions about the world when
generating images. While some assumptions are useful (e.g., the sky is blue), they can also …