Google Scholar

Tool learning with foundation models

Y Qin, S Hu, Y Lin, W Chen, N Ding, G Cui… - ACM Computing …, 2024 - dl.acm.org

Humans possess an extraordinary ability to create and utilize tools. With the advent of
foundation models, artificial intelligence systems have the potential to be equally adept in …

Cited by 311 · All 10 versions

[PDF] arxiv.org

Foundation Models Defining a New Era in Vision: a Survey and Outlook

M Awais, M Naseer, S Khan, RM Anwer… - … on Pattern Analysis …, 2025 - ieeexplore.ieee.org

Vision systems that see and reason about the compositional nature of visual scenes are
fundamental to understanding our world. The complex relations between objects and their …

Cited by 144 · All 4 versions

[PDF] thecvf.com

Open-vocabulary panoptic segmentation with text-to-image diffusion models

J Xu, S Liu, A Vahdat, W Byeon… - Proceedings of the …, 2023 - openaccess.thecvf.com

We present ODISE: Open-vocabulary DIffusion-based panoptic SEgmentation, which unifies
pre-trained text-image diffusion and discriminative models to perform open-vocabulary …

Cited by 433 · All 8 versions

[PDF] thecvf.com

Repurposing diffusion-based image generators for monocular depth estimation

B Ke, A Obukhov, S Huang, N Metzger… - Proceedings of the …, 2024 - openaccess.thecvf.com

Monocular depth estimation is a fundamental computer vision task. Recovering 3D depth
from a single image is geometrically ill-posed and requires scene understanding so it is not …

Cited by 261 · All 6 versions

[PDF] arxiv.org

Vision-language models for vision tasks: A survey

J Zhang, J Huang, S Jin, S Lu - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org

Most visual recognition studies rely heavily on crowd-labelled data in deep neural networks
(DNNs) training, and they usually train a DNN for each single visual recognition task …

Cited by 466 · All 11 versions

[PDF] nowpublishers.com

Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com

Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …

Cited by 229 · All 7 versions

[PDF] thecvf.com

Side adapter network for open-vocabulary semantic segmentation

M Xu, Z Zhang, F Wei, H Hu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

This paper presents a new framework for open-vocabulary semantic segmentation with the
pre-trained vision-language model, named SAN. Our approach models the semantic …

Cited by 272 · All 6 versions

[PDF] neurips.cc

Convolutions die hard: Open-vocabulary segmentation with single frozen convolutional CLIP

Q Yu, J He, X Deng, X Shen… - Advances in Neural …, 2023 - proceedings.neurips.cc

Open-vocabulary segmentation is a challenging task requiring segmenting and recognizing
objects from an open set of categories in diverse environments. One way to address this …

Cited by 130 · All 7 versions

[PDF] thecvf.com

OpenScene: 3D scene understanding with open vocabularies

S Peng, K Genova, C Jiang… - Proceedings of the …, 2023 - openaccess.thecvf.com

Traditional 3D scene understanding approaches rely on labeled 3D datasets to train a
model for a single task with supervision. We propose OpenScene, an alternative approach …

Cited by 284 · All 6 versions

[PDF] arxiv.org

Foundation models in robotics: Applications, challenges, and the future

R Firoozi, J Tucker, S Tian… - … Journal of Robotics …, 2023 - journals.sagepub.com

We survey applications of pretrained foundation models in robotics. Traditional deep
learning models in robotics are trained on small datasets tailored for specific tasks, which …

Cited by 137 · All 5 versions

Saved in My Library

Extract free dense labels from CLIP

Tool learning with foundation models

Foundation Models Defining a New Era in Vision: a Survey and Outlook

Open-vocabulary panoptic segmentation with text-to-image diffusion models

Repurposing diffusion-based image generators for monocular depth estimation

Vision-language models for vision tasks: A survey

Multimodal foundation models: From specialists to general-purpose assistants

Side adapter network for open-vocabulary semantic segmentation

Convolutions die hard: Open-vocabulary segmentation with single frozen convolutional CLIP

OpenScene: 3D scene understanding with open vocabularies

Foundation models in robotics: Applications, challenges, and the future