- Academic Search

TT Nguyen, P Nguyen, K Luu - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

Visual interactivity understanding within visual scenes presents a significant challenge in
computer vision. Existing methods focus on complex interactivities while leveraging a simple …

Cited by 9 · 6 versions

[HTML] sciencedirect.com

Style-aware two-stage learning framework for video captioning

Y Ma, Z Zhu, Y Qi, A Beheshti, Y Li, L Qing… - Knowledge-Based Systems, 2024 - Elsevier

Significant progress has been made in video captioning in recent years. However, most
existing methods directly learn from all given captions without distinguishing the styles of …

Cited by 5 · 5 versions

[PDF] arxiv.org

Towards unified multimodal editing with enhanced knowledge collaboration

K Pan, Z Fan, J Li, Q Yu, H Fei, S Tang, R Hong… - arXiv preprint arXiv …, 2024 - arxiv.org

The swift advancement in Multimodal LLMs (MLLMs) also presents significant challenges for
effective knowledge editing. Current methods, including intrinsic knowledge editing and …

Cited by 4 · 6 versions

[PDF] thecvf.com

Contextual Augmented Global Contrast for Multimodal Intent Recognition

K Sun, Z Xie, M Ye, H Zhang - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com

Multimodal intent recognition (MIR) aims to perceive the human intent polarity via language,
visual, and acoustic modalities. The inherent intent ambiguity makes it challenging to …

Cited by 4

[PDF] arxiv.org

Low-rank Prompt Interaction for Continual Vision-Language Retrieval

W Yan, Y Wang, W Lin, Z Guo, Z Zhao… - Proceedings of the 32nd …, 2024 - dl.acm.org

Research on continual learning in multi-modal tasks has been receiving increasing
attention. However, most existing work overlooks the explicit cross-modal and cross-task …

Cited by 3 · 5 versions

[PDF] arxiv.org

Semantic Alignment for Multimodal Large Language Models

T Wu, M Li, J Chen, W Ji, W Lin, J Gao… - Proceedings of the …, 2024 - dl.acm.org

Research on Multi-modal Large Language Models (MLLMs) towards the multi-image
cross-modal instruction has received increasing attention and made significant progress …

Cited by 1 · 5 versions

[PDF] openreview.net

Calibrating Prompt from History for Continual Vision-Language Retrieval and Grounding

T Jin, W Yan, Y Wang, S Cai, Q Shuai… - Proceedings of the 32nd …, 2024 - dl.acm.org

In the field of machine learning, continual learning is a crucial concept that allows models to
adapt to non-stationary data distributions. However, most of the existing works focus on uni …

[PDF] arxiv.org

CYCLO: Cyclic Graph Transformer Approach to Multi-Object Relationship Modeling in Aerial Videos

TT Nguyen, P Nguyen, X Li, J Cothren, A Yilmaz… - arXiv preprint arXiv …, 2024 - arxiv.org

Video scene graph generation (VidSGG) has emerged as a transformative approach to
capturing and interpreting the intricate relationships among objects and their temporal …

Cited by 2 · 2 versions

[PDF] arxiv.org

Subject-Oriented Video Captioning

Y Ma, C Teng, Y Qi, G Li, L Qing, Q Wu… - arXiv preprint arXiv …, 2023 - arxiv.org

Describing video content according to users' needs is a long-held goal. Although existing
video captioning methods have made significant progress, the generated captions may not …

[PDF] openreview.net

: Exploring Embodied Emotion Through A Large-Scale Egocentric Video Dataset

W Lin, Y Feng, WK Han, T Jin, Z Zhao, F Wu… - The Thirty-eight … - openreview.net

Understanding human emotions is fundamental to enhancing human-computer interaction,
especially for embodied agents that mimic human behavior. Traditional emotion analysis …

Saved to My library

Exploring group video captioning with efficient relational approximation

HIG: Hierarchical interlacement graph approach to scene graph generation in video understanding