Multi-object hallucination in vision-language models

X Chen, Z Ma, X Zhang, S Xu, S Qian, J Yang… - arxiv preprint arxiv …, 2024 - arxiv.org

Video-Language Alignment via LLM-Based Self-Questioning and Answering

J Chen, K Ma, H Huang, J Shen, H Fang… - arxiv preprint arxiv …, 2024 - arxiv.org
The development of multi-modal models has been rapidly advancing, with some
demonstrating remarkable capabilities. However, annotating video-text pairs remains …

SAVEn-Vid: Synergistic Audio-Visual Integration for Enhanced Understanding in Long Video Context

J Li, S Tao, Y Yan, X Gu, H Xu, X Zheng, Y Lyu… - arxiv preprint arxiv …, 2024 - arxiv.org
Endeavors have been made to explore Large Language Models for video analysis (Video-LLMs), particularly in understanding and interpreting long videos. However, existing Video …

Mitigating Language Bias of LMMs in Social Intelligence Understanding with Virtual Counterfactual Calibration

P Chen, XY Guo, YF Li, X Zhang… - Proceedings of the 2024 …, 2024 - aclanthology.org
Social intelligence is essential for understanding complex human expressions and social
interactions. While large multimodal models (LMMs) have demonstrated remarkable …