VLAP: Efficient video-language alignment via frame prompting and distilling for video question answering
In this work, we propose an efficient Video-Language Alignment (ViLA) network. Our ViLA
model addresses both efficient frame sampling and effective cross-modal alignment in a …
ViLA: Efficient video-language alignment for video question answering
We propose an efficient Video-Language Alignment (ViLA) network. Our ViLA model
addresses both efficient frame sampling and effective cross-modal alignment in a unified …
Auxiliary modality learning with generalized curriculum distillation
Driven by the need from real-world applications, Auxiliary Modality Learning (AML) offers the
possibility to utilize more information from auxiliary data in training, while only requiring data …
Learning-Based Autonomous Driving With Enhanced Data Efficiency and Policy Training
Y Shen - 2023 - search.proquest.com
Autonomous vehicles are capable of sensing the environment and moving around with little
to no human intervention, enhancing efficiency and safety. Self-driving cars, for instance, will …