Tool learning with foundation models

Y Qin, S Hu, Y Lin, W Chen, N Ding, G Cui… - ACM Computing …, 2024 - dl.acm.org
Humans possess an extraordinary ability to create and utilize tools. With the advent of
foundation models, artificial intelligence systems have the potential to be equally adept in …

Vision-language models in remote sensing: Current progress and future trends

X Li, C Wen, Y Hu, Z Yuan… - IEEE Geoscience and …, 2024 - ieeexplore.ieee.org
The remarkable achievements of ChatGPT and Generative Pre-trained Transformer 4 (GPT-
4) have sparked a wave of interest and research in the field of large language models …

Open-vocabulary panoptic segmentation with text-to-image diffusion models

J Xu, S Liu, A Vahdat, W Byeon… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present ODISE: Open-vocabulary DIffusion-based panoptic SEgmentation, which unifies
pre-trained text-image diffusion and discriminative models to perform open-vocabulary …

Vision-language models for vision tasks: A survey

J Zhang, J Huang, S **, S Lu - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
Most visual recognition studies rely heavily on crowd-labelled data in deep neural networks
(DNNs) training, and they usually train a DNN for each single visual recognition task …

Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com
Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …

Side adapter network for open-vocabulary semantic segmentation

M Xu, Z Zhang, F Wei, H Hu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
This paper presents a new framework for open-vocabulary semantic segmentation with the
pre-trained vision-language model, named SAN. Our approach models the semantic …

Convolutions die hard: Open-vocabulary segmentation with single frozen convolutional clip

Q Yu, J He, X Deng, X Shen… - Advances in Neural …, 2023 - proceedings.neurips.cc
Open-vocabulary segmentation is a challenging task requiring segmenting and recognizing
objects from an open set of categories in diverse environments. One way to address this …

Maple: Multi-modal prompt learning

MU Khattak, H Rasheed, M Maaz… - Proceedings of the …, 2023 - openaccess.thecvf.com
Pre-trained vision-language (VL) models such as CLIP have shown excellent generalization
ability to downstream tasks. However, they are sensitive to the choice of input text prompts …

Self-regulating prompts: Foundational model adaptation without forgetting

MU Khattak, ST Wasim, M Naseer… - Proceedings of the …, 2023 - openaccess.thecvf.com
Prompt learning has emerged as an efficient alternative for fine-tuning foundational models,
such as CLIP, for various downstream tasks. Conventionally trained using the task-specific …

Open-vocabulary semantic segmentation with mask-adapted clip

F Liang, B Wu, X Dai, K Li, Y Zhao… - Proceedings of the …, 2023 - openaccess.thecvf.com
Open-vocabulary semantic segmentation aims to segment an image into semantic regions
according to text descriptions, which may not have been seen during training. Recent two …