HiT: Hierarchical transformer with momentum contrast for video-text retrieval
Video-Text Retrieval has been a hot research topic with the growth of multimedia
data on the internet. Transformer for video-text learning has attracted increasing attention …
Fast video moment retrieval
This paper targets fast video moment retrieval (fast VMR), aiming to localize the target
moment efficiently and accurately as queried by a given natural language sentence. We …
A survey on deep hashing methods
Nearest neighbor search aims at obtaining the samples in the database with the smallest
distances from them to the queries, which is a basic task in a range of fields, including …
Reading-strategy inspired visual representation learning for text-to-video retrieval
This paper aims for the task of text-to-video retrieval, where given a query in the form of a
natural-language sentence, it is asked to retrieve videos which are semantically relevant to …
Partially relevant video retrieval
Current methods for text-to-video retrieval (T2VR) are trained and tested on video-captioning
oriented datasets such as MSVD, MSR-VTT and VATEX. A key property of these datasets is …
Unsupervised cross-modal hashing via semantic text mining
Cross-modal hashing has been widely used in multimedia retrieval tasks due to its fast
retrieval speed and low storage cost. Recently, many deep unsupervised cross-modal …
Complementarity-aware space learning for video-text retrieval
In general, videos are powerful at recording physical patterns (e.g., spatial layout) while texts
are great at describing abstract symbols (e.g., emotion). When video and text are used in …
SEA: Sentence encoder assembly for video retrieval by textual queries
Retrieving unlabeled videos by textual queries, known as Ad-hoc Video Search (AVS), is a
core theme in multimedia data management and retrieval. The success of AVS counts on …
Transferring image-CLIP to video-text retrieval via temporal relations
We present a novel network to transfer the image-language pre-trained model to video-text
retrieval in an end-to-end manner. Leading approaches in the domain of video-and …
GRATIS: Deep learning graph representation with task-specific topology and multi-dimensional edge features
Graph is powerful for representing various types of real-world data. The topology (edges'
presence) and edges' features of a graph decide the message passing mechanism among …