RS-CLIP: Zero shot remote sensing scene classification via contrastive vision-language supervision

X Li, C Wen, Y Hu, N Zhou - … Journal of Applied Earth Observation and …, 2023 - Elsevier
Zero-shot remote sensing scene classification aims to solve the scene classification problem
on unseen categories and has attracted considerable research attention in the remote sensing …

Multimodal intelligence: Representation learning, information fusion, and applications

C Zhang, Z Yang, X He, L Deng - IEEE Journal of Selected …, 2020 - ieeexplore.ieee.org
Deep learning methods have revolutionized speech recognition, image recognition, and
natural language processing since 2010. Each of these tasks involves a single modality in …

Conditional prompt learning for vision-language models

K Zhou, J Yang, CC Loy, Z Liu - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
With the rise of powerful pre-trained vision-language models like CLIP, it becomes essential
to investigate ways to adapt these models to downstream datasets. A recently proposed …

Learning to prompt for vision-language models

K Zhou, J Yang, CC Loy, Z Liu - International Journal of Computer Vision, 2022 - Springer
Large pre-trained vision-language models like CLIP have shown great potential in learning
representations that are transferable across a wide range of downstream tasks. Different …

Learning transferable visual models from natural language supervision

A Radford, JW Kim, C Hallacy… - International …, 2021 - proceedings.mlr.press
State-of-the-art computer vision systems are trained to predict a fixed set of predetermined
object categories. This restricted form of supervision limits their generality and usability since …

Task residual for tuning vision-language models

T Yu, Z Lu, X Jin, Z Chen… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Large-scale vision-language models (VLMs) pre-trained on billion-level data have learned
general visual representations and broad visual concepts. In principle, the well-learned …

Progressive semantic-visual mutual adaption for generalized zero-shot learning

M Liu, F Li, C Zhang, Y Wei, H Bai… - Proceedings of the …, 2023 - openaccess.thecvf.com
Generalized Zero-Shot Learning (GZSL) identifies unseen categories by knowledge
transferred from the seen domain, relying on the intrinsic interactions between visual and …

A survey of zero-shot learning: Settings, methods, and applications

W Wang, VW Zheng, H Yu, C Miao - ACM Transactions on Intelligent …, 2019 - dl.acm.org
Most machine-learning methods focus on classifying instances whose classes have already
been seen in training. In practice, many applications require classifying instances whose …

f-VAEGAN-D2: A feature generating framework for any-shot learning

Y Xian, S Sharma, B Schiele… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com
When labeled training data is scarce, a promising data augmentation approach is to
generate visual features of unknown classes using their attributes. To learn the class …

TN-ZSTAD: Transferable network for zero-shot temporal activity detection

L Zhang, X Chang, J Liu, M Luo, Z Li… - … on Pattern Analysis …, 2022 - ieeexplore.ieee.org
An integral part of video analysis and surveillance is temporal activity detection, which
aims to simultaneously recognize and localize activities in long untrimmed videos …