Language-guided hierarchical fine-grained image forgery detection and localization

X Guo, X Liu, I Masi, X Liu - International Journal of Computer Vision, 2024 - Springer
Differences in forgery attributes of images generated in CNN-synthesized and image-editing
domains are large, and such differences make a unified image forgery detection and …

A review of multimodal explainable artificial intelligence: Past, present and future

S Sun, W An, F Tian, F Nan, Q Liu, J Liu, N Shah… - arXiv preprint arXiv …, 2024 - arxiv.org
Artificial intelligence (AI) has rapidly developed through advancements in computational
power and the growth of massive datasets. However, this progress has also heightened …

FakeShield: Explainable image forgery detection and localization via multi-modal large language models

Z Xu, X Zhang, R Li, Z Tang, Q Huang… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid development of generative AI is a double-edged sword, which not only facilitates
content creation but also makes image manipulation easier and more difficult to detect …

On learning multi-modal forgery representation for diffusion generated video detection

X Song, X Guo, J Zhang, Q Li, L Bai, X Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large numbers of synthesized videos from diffusion models pose threats to information
security and authenticity, leading to an increasing demand for generated content detection …

FFAA: Multimodal large language model based explainable open-world face forgery analysis assistant

Z Huang, B Xia, Z Lin, Z Mou, W Yang - arXiv preprint arXiv:2408.10072, 2024 - arxiv.org
The rapid advancement of deepfake technologies has sparked widespread public concern,
particularly as face forgery poses a serious threat to public information security. However …

Spartun3D: Situated spatial understanding of 3D world in large language models

Y Zhang, Z Xu, Y Shen, P Kordjamshidi… - arXiv preprint arXiv …, 2024 - arxiv.org
Integrating the 3D world into large language models (3D-based LLMs) has been a
promising research direction for 3D scene understanding. However, current 3D-based LLMs …

Narrowing the gap between vision and action in navigation

Y Zhang, P Kordjamshidi - Proceedings of the 32nd ACM International …, 2024 - dl.acm.org
The existing methods for Vision and Language Navigation in the Continuous Environment
(VLN-CE) commonly incorporate a waypoint predictor to discretize the environment. This …

SIDA: Social Media Image Deepfake Detection, Localization and Explanation with Large Multimodal Model

Z Huang, J Hu, X Li, Y He, X Zhao, B Peng… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid advancement of generative models in creating highly realistic images poses
substantial risks for misinformation dissemination. For instance, a synthetic image, when …

MFCLIP: Multi-modal Fine-grained CLIP for Generalizable Diffusion Face Forgery Detection

Y Zhang, T Wang, Z Yu, Z Gao, L Shen… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid development of photo-realistic face generation methods has raised significant
concerns in society and academia, highlighting the urgent need for robust and generalizable …

Attention Consistency Refined Masked Frequency Forgery Representation for Generalizing Face Forgery Detection

D Liu, T Chen, C Peng, N Wang, R Hu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Due to the successful development of deep image generation technology, visual data
forgery detection would play a more important role in social and economic security. Existing …