Deep learning modelling techniques: current progress, applications, advantages, and challenges

SF Ahmed, MSB Alam, M Hassan, MR Rozbu… - Artificial Intelligence …, 2023 - Springer
Deep learning (DL) is revolutionizing evidence-based decision-making techniques that can
be applied across various sectors. Specifically, it possesses the ability to utilize two or more …

Multimodal image synthesis and editing: A survey and taxonomy

F Zhan, Y Yu, R Wu, J Zhang, S Lu, L Liu… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
As information exists in various modalities in real world, effective interaction and fusion
among multimodal information plays a key role for the creation and perception of multimodal …

Make-a-video: Text-to-video generation without text-video data

U Singer, A Polyak, T Hayes, X Yin, J An… - arxiv preprint arxiv …, 2022 - arxiv.org
We propose Make-A-Video--an approach for directly translating the tremendous recent
progress in Text-to-Image (T2I) generation to Text-to-Video (T2V). Our intuition is simple …

Show-1: Marrying pixel and latent diffusion models for text-to-video generation

DJ Zhang, JZ Wu, JW Liu, R Zhao, L Ran, Y Gu… - International Journal of …, 2024 - Springer
Significant advancements have been achieved in the realm of large-scale pre-trained text-to-
video Diffusion Models (VDMs). However, previous methods either rely solely on pixel …

[PDF][PDF] Scaling autoregressive models for content-rich text-to-image generation

J Yu, Y Xu, JY Koh, T Luong, G Baid, Z Wang… - arxiv preprint arxiv …, 2022 - 3dvar.com
Abstract We present the Pathways [1] Autoregressive Text-to-Image (Parti) model, which
generates high-fidelity photorealistic images and supports content-rich synthesis involving …

Tifa: Accurate and interpretable text-to-image faithfulness evaluation with question answering

Y Hu, B Liu, J Kasai, Y Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Despite thousands of researchers, engineers, and artists actively working on improving text-
to-image generation models, systems often fail to produce images that accurately align with …

Spatext: Spatio-textual representation for controllable image generation

O Avrahami, T Hayes, O Gafni… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent text-to-image diffusion models are able to generate convincing results of
unprecedented quality. However, it is nearly impossible to control the shapes of different …

Make-a-scene: Scene-based text-to-image generation with human priors

O Gafni, A Polyak, O Ashual, S Sheynin… - … on Computer Vision, 2022 - Springer
Recent text-to-image generation methods provide a simple yet exciting conversion capability
between text and image domains. While these methods have incrementally improved the …

Vector quantized diffusion model for text-to-image synthesis

S Gu, D Chen, J Bao, F Wen, B Zhang… - Proceedings of the …, 2022 - openaccess.thecvf.com
We present the vector quantized diffusion (VQ-Diffusion) model for text-to-image generation.
This method is based on a vector quantized variational autoencoder (VQ-VAE) whose latent …

Layoutdiffusion: Controllable diffusion model for layout-to-image generation

G Zheng, X Zhou, X Li, Z Qi… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recently, diffusion models have achieved great success in image synthesis. However, when
it comes to the layout-to-image generation where an image often has a complex scene of …