Deep reinforcement learning in computer vision: a comprehensive survey
Deep reinforcement learning augments the reinforcement learning framework and utilizes
the powerful representation of deep neural networks. Recent works have demonstrated the …
An analytical study of information extraction from unstructured and multidimensional big data
The process of information extraction (IE) is used to extract useful information from unstructured
or semi-structured data. Big data raises new challenges for IE techniques with the rapid …
UniVTG: Towards unified video-language temporal grounding
Video Temporal Grounding (VTG), which aims to ground target clips from videos
(such as consecutive intervals or disjoint shots) according to custom language queries (e.g., …
TimeChat: A time-sensitive multimodal large language model for long video understanding
This work proposes TimeChat, a time-sensitive multimodal large language model specifically
designed for long video understanding. Our model incorporates two key architectural …
EgoVLPv2: Egocentric video-language pre-training with fusion in the backbone
Video-language pre-training (VLP) has become increasingly important due to its ability to
generalize to various vision and language tasks. However, existing egocentric VLP …
Query-dependent video representation for moment retrieval and highlight detection
Recently, video moment retrieval and highlight detection (MR/HD) have been in the spotlight as
the demand for video understanding has drastically increased. The key objective of MR/HD is …
Video summarization using deep neural networks: A survey
Video summarization technologies aim to create a concise and complete synopsis by
selecting the most informative parts of the video content. Several approaches have been …
Machine remaining useful life prediction via an attention-based deep learning approach
For prognostics and health management of mechanical systems, a core task is to predict the
machine remaining useful life (RUL). Currently, deep structures with automatic feature …
Align and attend: Multimodal summarization with dual contrastive losses
The goal of multimodal summarization is to extract the most important information from
different modalities to form summaries. Unlike unimodal summarization, the multimodal …
End-to-end dense video captioning with masked transformer
Dense video captioning aims to generate text descriptions for all events in an untrimmed
video. This involves both detecting and describing events. Therefore, all previous methods …