Explainable artificial intelligence for autonomous driving: A comprehensive overview and field guide for future research directions
Autonomous driving has achieved significant milestones in research and development over
the last two decades. There is increasing interest in the field as the deployment of …
ChatGPT-like large-scale foundation models for prognostics and health management: A survey and roadmaps
PHM technology is vital in industrial production and maintenance, identifying and predicting
potential equipment failures and damages. This enables proactive maintenance measures …
OpenEQA: Embodied question answering in the era of foundation models
We present a modern formulation of Embodied Question Answering (EQA) as the task of
understanding an environment well enough to answer questions about it in natural …
Video graph transformer for video question answering
This paper proposes a Video Graph Transformer (VGT) model for Video Question Answering
(VideoQA). VGT's uniqueness is two-fold: 1) it designs a dynamic graph transformer …
Can I trust your answer? Visually grounded video question answering
We study visually grounded VideoQA in response to the emerging trends of utilizing
pretraining techniques for video-language understanding. Specifically by forcing vision …
IntentQA: Context-aware video intent reasoning
In this paper, we propose a novel task IntentQA, a special VideoQA task focusing on video
intent reasoning, which has become increasingly important for AI with its advantages in …
Are binary annotations sufficient? video moment retrieval via hierarchical uncertainty-based active learning
Recent research on video moment retrieval has mostly focused on enhancing the
performance of accuracy, efficiency, and robustness, all of which largely rely on the …
Retrieving-to-answer: Zero-shot video question answering with frozen large language models
Video Question Answering (VideoQA) has been significantly advanced from the
scaling of recent Large Language Models (LLMs). The key idea is to convert the visual …
Discovering spatio-temporal rationales for video question answering
This paper strives to solve complex video question answering (VideoQA) which features
long videos containing multiple objects and events at different times. To tackle the challenge …
Contrastive video question answering via video graph transformer
We propose to perform video question answering (VideoQA) in a Contrastive manner via a
Video Graph Transformer model (CoVGT). CoVGT's uniqueness and superiority are three …