On evaluating adversarial robustness of large vision-language models

Y Zhao, T Pang, C Du, X Yang, C Li… - Advances in …, 2023 - proceedings.neurips.cc
Large vision-language models (VLMs) such as GPT-4 have achieved unprecedented
performance in response generation, especially with visual inputs, enabling more creative …

Text-to-image diffusion models in generative ai: A survey

C Zhang, C Zhang, M Zhang, IS Kweon - arXiv preprint arXiv:2303.07909, 2023 - arxiv.org
This survey reviews text-to-image diffusion models in the context that diffusion models have
emerged to be popular for a wide range of generative tasks. As a self-contained work, this …

One transformer fits all distributions in multi-modal diffusion at scale

F Bao, S Nie, K Xue, C Li, S Pu… - International …, 2023 - proceedings.mlr.press
This paper proposes a unified diffusion framework (dubbed UniDiffuser) to fit all distributions
relevant to a set of multi-modal data in one model. Our key insight is: learning diffusion …

Show-o: One single transformer to unify multimodal understanding and generation

J **e, W Mao, Z Bai, DJ Zhang, W Wang, KQ Lin… - arxiv preprint arxiv …, 2024 - arxiv.org
We present a unified transformer, i.e., Show-o, that unifies multimodal understanding and
generation. Unlike fully autoregressive models, Show-o unifies autoregressive and …

Online clustered codebook

C Zheng, A Vedaldi - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Vector Quantisation (VQ) is experiencing a comeback in machine learning, where it is
increasingly used in representation learning. However, optimizing the codevectors in …

A reparameterized discrete diffusion model for text generation

L Zheng, J Yuan, L Yu, L Kong - arXiv preprint arXiv:2302.05737, 2023 - arxiv.org
This work studies discrete diffusion probabilistic models with applications to natural
language generation. We derive an alternative yet equivalent formulation of the sampling …

Diffusion models for non-autoregressive text generation: A survey

Y Li, K Zhou, WX Zhao, JR Wen - arXiv preprint arXiv:2303.06574, 2023 - arxiv.org
Non-autoregressive (NAR) text generation has attracted much attention in the field of natural
language processing, which greatly reduces the inference latency but has to sacrifice the …

Diffuseq-v2: Bridging discrete and continuous text spaces for accelerated seq2seq diffusion models

S Gong, M Li, J Feng, Z Wu, L Kong - arXiv preprint arXiv:2310.05793, 2023 - arxiv.org
Diffusion models have gained prominence in generating high-quality sequences of text.
Nevertheless, current approaches predominantly represent discrete text within a continuous …

What does stable diffusion know about the 3D scene?

G Zhan, C Zheng, W Xie, A Zisserman - 2023 - openreview.net
Recent advances in generative models like Stable Diffusion enable the generation of highly
photo-realistic images. Our objective in this paper is to probe the diffusion network to …

Cocktail: Mixing multi-modality control for text-conditional image generation

M Hu, J Zheng, D Liu, C Zheng, C Wang… - … on Neural Information …, 2023 - openreview.net
Text-conditional diffusion models are able to generate high-fidelity images with diverse
contents. However, linguistic representations frequently exhibit ambiguous descriptions of …