AnyFace++: A unified framework for free-style text-to-face synthesis and manipulation

J Sun, Q Deng, Q Li, M Sun, Y Liu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Human faces contain rich semantic information that could hardly be described without a
large vocabulary and complex sentence patterns. However, most existing text-to-image …

Advancing Causal Intervention in Image Captioning With Causal Prompt

Y Yu, Y Kim, YM Ro - IEEE Transactions on Neural Networks …, 2024 - ieeexplore.ieee.org
This article introduces a novel approach, called causal prompting network (CPNet), to
enhance the causal intervention in the context of image captioning. By leveraging visual …

Multimodal Sentiment Analysis Based on Causal Reasoning

F Chen, P Huang, X Ge, J Huang, Z Bao - arXiv preprint arXiv:2412.07292, 2024 - arxiv.org
With the rapid development of multimedia, the shift from unimodal textual sentiment analysis
to multimodal image-text sentiment analysis has obtained academic and industrial attention …

VCD: Visual Causality Discovery for Cross-Modal Question Reasoning

Y Liu, Y Tan, J Luo, W Chen - Chinese Conference on Pattern …, 2023 - Springer
Existing visual question reasoning methods usually fail to explicitly discover the inherent
causal mechanism and ignore jointly modeling cross-modal event temporality and causality …

MEIA: Multimodal Embodied Perception and Interaction in Unknown Environments

Y Liu, X Song, K Jiang, W Chen, J Luo, G Li… - arXiv preprint arXiv …, 2024 - arxiv.org
With the surge in the development of large language models, embodied intelligence has
attracted increasing attention. Nevertheless, prior works on embodied intelligence typically …