Socializing the semantic gap: A comparative survey on image tag assignment, refinement, and retrieval

X Li, T Uricchio, L Ballan, M Bertini… - ACM Computing …, 2016 - dl.acm.org
Where previous reviews on content-based image retrieval emphasize what can be seen in
an image to bridge the semantic gap, this survey considers what people tag about an image …

Image retrieval on real-life images with pre-trained vision-and-language models

Z Liu, C Rodriguez-Opazo… - Proceedings of the …, 2021 - openaccess.thecvf.com
We extend the task of composed image retrieval, where an input query consists of an image
and short textual description of how to modify the image. Existing methods have only been …

Deep multiple instance learning for image classification and auto-annotation

J Wu, Y Yu, C Huang, K Yu - Proceedings of the IEEE …, 2015 - openaccess.thecvf.com
The recent development in learning deep representations has demonstrated its wide
applications in traditional vision tasks like classification and detection. However, there has …

Learning with augmented features for supervised and semi-supervised heterogeneous domain adaptation

W Li, L Duan, D Xu, IW Tsang - IEEE Transactions on Pattern …, 2013 - ieeexplore.ieee.org
In this paper, we study the heterogeneous domain adaptation (HDA) problem, in which the
data from the source domain and the target domain are represented by heterogeneous …

Taxonomy, state-of-the-art, challenges and applications of visual understanding: A review

NY Khanday, SA Sofi - Computer Science Review, 2021 - Elsevier
Since the dawn of Humanity, to communicate both abstract and concrete ideas, visualization
through visual imagery has been an effective way. With the advancement of scientific …

Learning multi-level deep representations for image emotion classification

T Rao, X Li, M Xu - Neural Processing Letters, 2020 - Springer
In this paper, we propose a new deep network that learns multi-level deep representations
for image emotion classification (MldrNet). Image emotion can be recognized through image …

FlexAttention for efficient high-resolution vision-language models

J Li, D Chen, T Cai, P Chen, Y Hong, Z Chen… - … on Computer Vision, 2024 - Springer
Current high-resolution vision-language models encode images as high-resolution image
tokens and exhaustively take all these tokens to compute attention, which significantly …

Discriminative multi-instance multitask learning for 3D action recognition

Y Yang, C Deng, S Gao, W Liu, D Tao… - IEEE Transactions on …, 2016 - ieeexplore.ieee.org
As the prosperity of low-cost and easy-operating depth cameras, skeleton-based human
action recognition has been extensively studied recently. However, most of the existing …

Bi-directional training for composed image retrieval via text prompt learning

Z Liu, W Sun, Y Hong, D Teney… - Proceedings of the …, 2024 - openaccess.thecvf.com
Composed image retrieval searches for a target image based on a multi-modal user query
comprised of a reference image and modification text describing the desired changes …

Towards automatic construction of diverse, high-quality image datasets

Y Yao, J Zhang, F Shen, L Liu, F Zhu… - … on Knowledge and …, 2019 - ieeexplore.ieee.org
The availability of labeled image datasets has been shown critical for high-level image
understanding, which continuously drives the progress of feature designing and models …