Typology of risks of generative text-to-image models

C Bird, E Ungless, A Kasirzadeh - Proceedings of the 2023 AAAI/ACM …, 2023 - dl.acm.org
This paper investigates the direct risks and harms associated with modern text-to-image
generative models, such as DALL-E and Midjourney, through a comprehensive literature …

Metaverse wearables for immersive digital healthcare: a review

K Kim, H Yang, J Lee, WG Lee - Advanced Science, 2023 - Wiley Online Library
The recent exponential growth of metaverse technology has been instrumental in reshaping
a myriad of sectors, not least digital healthcare. This comprehensive review critically …

Text-to-image diffusion models in generative AI: A survey

C Zhang, C Zhang, M Zhang, IS Kweon - arXiv preprint arXiv:2303.07909, 2023 - arxiv.org
This survey reviews text-to-image diffusion models in the context that diffusion models have
emerged to be popular for a wide range of generative tasks. As a self-contained work, this …

TIFA: Accurate and interpretable text-to-image faithfulness evaluation with question answering

Y Hu, B Liu, J Kasai, Y Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Despite thousands of researchers, engineers, and artists actively working on improving text-
to-image generation models, systems often fail to produce images that accurately align with …

A survey of the vision transformers and their CNN-transformer based variants

A Khan, Z Rauf, A Sohail, AR Khan, H Asif… - Artificial Intelligence …, 2023 - Springer
Vision transformers have become popular as a possible substitute to convolutional neural
networks (CNNs) for a variety of computer vision applications. These transformers, with their …

AGIQA-3K: An open database for AI-generated image quality assessment

C Li, Z Zhang, H Wu, W Sun, X Min… - … on Circuits and …, 2023 - ieeexplore.ieee.org
With the rapid advancements of the text-to-image generative model, AI-generated images
(AGIs) have been widely applied to entertainment, education, social media, etc. However …

Multimodal image synthesis and editing: A survey

F Zhan, Y Yu, R Wu, J Zhang, S Lu, L Liu… - arXiv preprint arXiv …, 2022 - pure.mpg.de
As information exists in various modalities in real world, effective interaction and fusion
among multimodal information plays a key role for the creation and perception of multimodal …

DALL-Eval: Probing the reasoning skills and social biases of text-to-image generation models

J Cho, A Zala, M Bansal - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Recently, DALL-E, a multimodal transformer language model, and its variants including
diffusion models have shown high-quality text-to-image generation capabilities. However …

CLIPDraw: Exploring text-to-drawing synthesis through language-image encoders

K Frans, L Soros, O Witkowski - Advances in Neural …, 2022 - proceedings.neurips.cc
CLIPDraw is an algorithm that synthesizes novel drawings from natural language input. It
does not require any additional training; rather, a pre-trained CLIP language-image encoder …

Diffusion models, image super-resolution, and everything: A survey

BB Moser, AS Shanbhag, F Raue… - … on Neural Networks …, 2024 - ieeexplore.ieee.org
Diffusion models (DMs) have disrupted the image super-resolution (SR) field and further
closed the gap between image quality and human perceptual preferences. They are easy to …