Similarity Graph-correlation Reconstruction Network for unsupervised cross-modal hashing
Existing cross-modal hash retrieval methods can simultaneously enhance retrieval speed
and reduce storage space. However, these methods face a major challenge in determining …
VISIONE at Video Browser Showdown 2023
In this paper, we present the fourth release of VISIONE, a tool for fast and effective video
search on a large-scale dataset. It includes several search functionalities like text search …
Text-to-motion retrieval: Towards joint understanding of human motion data and natural language
Due to recent advances in pose-estimation methods, human motion can be extracted from a
common video in the form of 3D skeleton sequences. Despite wonderful application …
Towards Retrieval-Augmented Architectures for Image Captioning
The objective of image captioning models is to bridge the gap between the visual and
linguistic modalities by generating natural language descriptions that accurately reflect the …
Visione: a large-scale video retrieval system with advanced search functionalities
VISIONE is a large-scale video retrieval system that integrates multiple search
functionalities, including free text search, spatial color and object search, visual and …
Image–Text Matching Model Based on CLIP Bimodal Encoding
Y Zhu, H Xu, A Du, B Wang - Applied Sciences, 2024 - mdpi.com
Image–text matching is a fundamental task in the multimodal research field, connecting
computer vision and natural language processing by aligning visual content with …
Fine-Grained Cross-Modal Semantic Consistency in Natural Conservation Image Data from a Multi-Task Perspective
R Tao, M Zhu, H Cao, H Ren - Sensors, 2024 - mdpi.com
Fine-grained representation is fundamental to species classification based on deep
learning, and in this context, cross-modal contrastive learning is an effective method. The …
VISIONE for newbies: an easier-to-use video retrieval system
This paper presents a revised version of the VISIONE video retrieval system, which offers a
wide range of search functionalities, including free text search, spatial color and object …
Cascaded transformer-based networks for Wikipedia large-scale image-caption matching
With the increasing importance of multimedia and multilingual data in online encyclopedias,
novel methods are needed to fill domain gaps and automatically connect different modalities …
Evaluating Performance and Trends in Interactive Video Retrieval: Insights from the 12th VBS Competition
This paper conducts a thorough examination of the 12th Video Browser Showdown (VBS)
competition, a well-established international benchmarking campaign for interactive video …