HIG: Hierarchical interlacement graph approach to scene graph generation in video understanding
Visual interactivity understanding within visual scenes presents a significant challenge in
computer vision. Existing methods focus on complex interactivities while leveraging a simple …
Style-aware two-stage learning framework for video captioning
Significant progress has been made in video captioning in recent years. However, most
existing methods directly learn from all given captions without distinguishing the styles of …
Towards unified multimodal editing with enhanced knowledge collaboration
The swift advancement in Multimodal LLMs (MLLMs) also presents significant challenges for
effective knowledge editing. Current methods, including intrinsic knowledge editing and …
Contextual Augmented Global Contrast for Multimodal Intent Recognition
Multimodal intent recognition (MIR) aims to perceive the human intent polarity via language,
visual and acoustic modalities. The inherent intent ambiguity makes it challenging to …
Low-rank Prompt Interaction for Continual Vision-Language Retrieval
Research on continual learning in multi-modal tasks has been receiving increasing
attention. However, most existing work overlooks the explicit cross-modal and cross-task …
Semantic Alignment for Multimodal Large Language Models
Research on Multi-modal Large Language Models (MLLMs) towards the multi-image
cross-modal instruction has received increasing attention and made significant progress …
Calibrating Prompt from History for Continual Vision-Language Retrieval and Grounding
T **, W Yan, Y Wang, S Cai, Q Shuai… - Proceedings of the 32nd …, 2024 - dl.acm.org
In the field of machine learning, continual learning is a crucial concept that allows models to
adapt to non-stationary data distributions. However, most of the existing works focus on uni …
CYCLO: Cyclic Graph Transformer Approach to Multi-Object Relationship Modeling in Aerial Videos
Video scene graph generation (VidSGG) has emerged as a transformative approach to
capturing and interpreting the intricate relationships among objects and their temporal …
Subject-Oriented Video Captioning
Describing video content according to users' needs is a long-held goal. Although existing
video captioning methods have made significant progress, the generated captions may not …
: Exploring Embodied Emotion Through A Large-Scale Egocentric Video Dataset
Understanding human emotions is fundamental to enhancing human-computer interaction,
especially for embodied agents that mimic human behavior. Traditional emotion analysis …