Vision-based traffic accident detection and anticipation: A survey

J Fang, J Qiao, J Xue, Z Li - … on Circuits and Systems for Video …, 2023 - ieeexplore.ieee.org

Traffic accident detection and anticipation is an obstinate road safety problem and
painstaking efforts have been devoted. With the rapid growth of video data, Vision-based …

Cited by 25 · All 6 versions

[PDF] acm.org

Lightgt: A light graph transformer for multimedia recommendation

Y Wei, W Liu, F Liu, X Wang, L Nie… - Proceedings of the 46th …, 2023 - dl.acm.org

Multimedia recommendation methods aim to discover the user preference on the multi-
modal information to enhance the collaborative filtering (CF) based recommender system …

Cited by 55 · All 3 versions

[PDF] thecvf.com

Winner: Weakly-supervised hierarchical decomposition and alignment for spatio-temporal video grounding

M Li, H Wang, W Zhang, J Miao… - Proceedings of the …, 2023 - openaccess.thecvf.com

Spatio-temporal video grounding aims to localize the aligned visual tube corresponding to a
language query. Existing techniques achieve such alignment by exploiting dense boundary …

Cited by 35 · All 3 versions

[PDF] aaai.org

Fewer steps, better performance: Efficient cross-modal clip trimming for video moment retrieval using language

X Fang, D Liu, W Fang, P Zhou, Z Xu, W Xu… - Proceedings of the …, 2024 - ojs.aaai.org

Given an untrimmed video and a sentence query, video moment retrieval using language
(VMR) aims to locate a target query-relevant moment. Since the untrimmed video is …

Cited by 14 · All 3 versions

[PDF] thecvf.com

Gradient-regulated meta-prompt learning for generalizable vision-language models

J Li, M Gao, L Wei, S Tang, W Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com

Prompt tuning, a recently emerging paradigm, enables the powerful vision-language pre-
training models to adapt to downstream tasks in a parameter-and data-efficient way, by …

Cited by 27 · All 6 versions

[PDF] thecvf.com

Dual learning with dynamic knowledge distillation for partially relevant video retrieval

J Dong, M Zhang, Z Zhang, X Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com

Almost all previous text-to-video retrieval works assume that videos are pre-trimmed with
short durations. However, in practice, videos are generally untrimmed containing much …

Cited by 14 · All 4 versions

[PDF] neurips.cc

Unified generative and discriminative training for multi-modal large language models

W Chow, J Li, Q Yu, K Pan, H Fei… - Advances in …, 2025 - proceedings.neurips.cc

In recent times, Vision-Language Models (VLMs) have been trained under two
predominant paradigms. Generative training has enabled Multimodal Large Language …

Cited by 2 · All 5 versions

[PDF] acm.org

Online distillation-enhanced multi-modal transformer for sequential recommendation

W Ji, X Liu, A Zhang, Y Wei, Y Ni, X Wang - Proceedings of the 31st ACM …, 2023 - dl.acm.org

Multi-modal recommendation systems, which integrate diverse types of information, have
gained widespread attention in recent years. However, compared to traditional collaborative …

Cited by 18 · All 5 versions

[PDF] openreview.net

Not all inputs are valid: Towards open-set video moment retrieval using language

X Fang, W Fang, D Liu, X Qu, J Dong, P Zhou… - Proceedings of the …, 2024 - dl.acm.org

Video Moment Retrieval (VMR) targets to retrieve the specific moment corresponding to a
sentence query from an untrimmed video. Although recent respectable works have made …

Cited by 7 · All 3 versions

[PDF] arxiv.org

Redundancy-aware transformer for video question answering

Y Li, X Yang, A Zhang, C Feng, X Wang… - Proceedings of the 31st …, 2023 - dl.acm.org

This paper identifies two kinds of redundancy in the current VideoQA paradigm. Specifically,
the current video encoders tend to holistically embed all video clues at different granularities …

Cited by 17 · All 4 versions

