Language-guided hierarchical fine-grained image forgery detection and localization

X Guo, X Liu, I Masi, X Liu - International Journal of Computer Vision, 2024 - Springer
Differences in forgery attributes of images generated in CNN-synthesized and image-editing
domains are large, and such differences make a unified image forgery detection and …

A review of multimodal explainable artificial intelligence: Past, present and future

S Sun, W An, F Tian, F Nan, Q Liu, J Liu, N Shah… - arXiv preprint arXiv …, 2024 - arxiv.org
Artificial intelligence (AI) has rapidly developed through advancements in computational
power and the growth of massive datasets. However, this progress has also heightened …

FakeShield: Explainable image forgery detection and localization via multi-modal large language models

Z Xu, X Zhang, R Li, Z Tang, Q Huang… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid development of generative AI is a double-edged sword, which not only facilitates
content creation but also makes image manipulation easier and more difficult to detect …

A Hitchhiker's Guide to Fine-Grained Face Forgery Detection Using Common Sense Reasoning

NM Foteinopoulou, E Ghorbel… - Advances in Neural …, 2025 - proceedings.neurips.cc
Explainability in artificial intelligence is crucial for restoring trust, particularly in areas like
face forgery detection, where viewers often struggle to distinguish between real and …

On learning multi-modal forgery representation for diffusion generated video detection

X Song, X Guo, J Zhang, Q Li, L Bai, X Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large numbers of synthesized videos from diffusion models pose threats to information
security and authenticity, leading to an increasing demand for generated content detection …

FFAA: Multimodal large language model based explainable open-world face forgery analysis assistant

Z Huang, B Xia, Z Lin, Z Mou, W Yang - arXiv preprint arXiv:2408.10072, 2024 - arxiv.org
The rapid advancement of deepfake technologies has sparked widespread public concern,
particularly as face forgery poses a serious threat to public information security. However …

SPARTUN3D: Situated spatial understanding of 3D world in large language models

Y Zhang, Z Xu, Y Shen, P Kordjamshidi… - arXiv preprint arXiv …, 2024 - arxiv.org
Integrating the 3D world into large language models (3D-based LLMs) has been a
promising research direction for 3D scene understanding. However, current 3D-based LLMs …

Narrowing the gap between vision and action in navigation

Y Zhang, P Kordjamshidi - Proceedings of the 32nd ACM International …, 2024 - dl.acm.org
The existing methods for Vision and Language Navigation in the Continuous Environment
(VLN-CE) commonly incorporate a waypoint predictor to discretize the environment. This …

SIDA: Social Media Image Deepfake Detection, Localization and Explanation with Large Multimodal Model

Z Huang, J Hu, X Li, Y He, X Zhao, B Peng… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid advancement of generative models in creating highly realistic images poses
substantial risks for misinformation dissemination. For instance, a synthetic image, when …

MFCLIP: Multi-modal Fine-grained CLIP for Generalizable Diffusion Face Forgery Detection

Y Zhang, T Wang, Z Yu, Z Gao, L Shen… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid development of photo-realistic face generation methods has raised significant
concerns in society and academia, highlighting the urgent need for robust and generalizable …