Deepfakes generation and detection: a short survey

Z Akhtar - Journal of Imaging, 2023 - mdpi.com
Advancements in deep learning techniques and the availability of free, large databases
have made it possible, even for non-technical people, to either manipulate or generate …

T2I-CompBench: A comprehensive benchmark for open-world compositional text-to-image generation

K Huang, K Sun, E Xie, Z Li… - Advances in Neural …, 2023 - proceedings.neurips.cc
Despite the stunning ability to generate high-quality images by recent text-to-image models,
current approaches often struggle to effectively compose objects with different attributes and …

Conditional image-to-video generation with latent flow diffusion models

H Ni, C Shi, K Li, SX Huang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Conditional image-to-video (cI2V) generation aims to synthesize a new plausible video
starting from an image (e.g., a person's face) and a condition (e.g., an action class label like …

Collaborative diffusion for multi-modal face generation and editing

Z Huang, KCK Chan, Y Jiang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Diffusion models arise as a powerful generative tool recently. Despite the great progress,
existing diffusion models mainly focus on uni-modal control, i.e., the diffusion process is …

CREPE: Can vision-language foundation models reason compositionally?

Z Ma, J Hong, MO Gul, M Gandhi… - Proceedings of the …, 2023 - openaccess.thecvf.com
A fundamental characteristic common to both human vision and natural language is their
compositional nature. Yet, despite the performance gains contributed by large vision and …

LLMScore: Unveiling the power of large language models in text-to-image synthesis evaluation

Y Lu, X Yang, X Li, XE Wang… - Advances in Neural …, 2023 - proceedings.neurips.cc
Existing automatic evaluation on text-to-image synthesis can only provide an image-text
matching score, without considering the object-level compositionality, which results in poor …

Toward verifiable and reproducible human evaluation for text-to-image generation

M Otani, R Togashi, Y Sawai… - Proceedings of the …, 2023 - openaccess.thecvf.com
Human evaluation is critical for validating the performance of text-to-image generative
models, as this highly cognitive process requires deep comprehension of text and images …

VideoGen: A reference-guided latent diffusion approach for high definition text-to-video generation

X Li, W Chu, Y Wu, W Yuan, F Liu, Q Zhang, F Li… - arXiv preprint arXiv …, 2023 - arxiv.org
In this paper, we present VideoGen, a text-to-video generation approach, which can
generate a high-definition video with high frame fidelity and strong temporal consistency …

T2V-CompBench: A comprehensive benchmark for compositional text-to-video generation

K Sun, K Huang, X Liu, Y Wu, Z Xu, Z Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Text-to-video (T2V) generation models have advanced significantly, yet their ability to
compose different objects, attributes, actions, and motions into a video remains unexplored …

Video and audio deepfake datasets and open issues in deepfake technology: being ahead of the curve

Z Akhtar, TL Pendyala, VS Athmakuri - Forensic Sciences, 2024 - mdpi.com
The revolutionary breakthroughs in Machine Learning (ML) and Artificial Intelligence (AI) are
extensively being harnessed across a diverse range of domains, e.g., forensic science …