Language-guided hierarchical fine-grained image forgery detection and localization
Differences in forgery attributes of images generated in CNN-synthesized and image-editing
domains are large, and such differences make a unified image forgery detection and …
A review of multimodal explainable artificial intelligence: Past, present and future
Artificial intelligence (AI) has rapidly developed through advancements in computational
power and the growth of massive datasets. However, this progress has also heightened …
Fakeshield: Explainable image forgery detection and localization via multi-modal large language models
The rapid development of generative AI is a double-edged sword, which not only facilitates
content creation but also makes image manipulation easier and more difficult to detect …
On learning multi-modal forgery representation for diffusion generated video detection
Large numbers of synthesized videos from diffusion models pose threats to information
security and authenticity, leading to an increasing demand for generated content detection …
Ffaa: Multimodal large language model based explainable open-world face forgery analysis assistant
The rapid advancement of deepfake technologies has sparked widespread public concern,
particularly as face forgery poses a serious threat to public information security. However …
Spartun3d: Situated spatial understanding of 3d world in large language models
Integrating the 3D world into large language models (3D-based LLMs) has been a
promising research direction for 3D scene understanding. However, current 3D-based LLMs …
Narrowing the gap between vision and action in navigation
The existing methods for Vision and Language Navigation in the Continuous Environment
(VLN-CE) commonly incorporate a waypoint predictor to discretize the environment. This …
SIDA: Social Media Image Deepfake Detection, Localization and Explanation with Large Multimodal Model
The rapid advancement of generative models in creating highly realistic images poses
substantial risks for misinformation dissemination. For instance, a synthetic image, when …
MFCLIP: Multi-modal Fine-grained CLIP for Generalizable Diffusion Face Forgery Detection
The rapid development of photo-realistic face generation methods has raised significant
concerns in society and academia, highlighting the urgent need for robust and generalizable …
Attention Consistency Refined Masked Frequency Forgery Representation for Generalizing Face Forgery Detection
Due to the successful development of deep image generation technology, visual data
forgery detection would play a more important role in social and economic security. Existing …