- Academic Search

Foundation Models Defining a New Era in Vision: a Survey and Outlook

M Awais, M Naseer, S Khan, RM Anwer… - … on Pattern Analysis …, 2025 - ieeexplore.ieee.org

Vision systems that see and reason about the compositional nature of visual scenes are
fundamental to understanding our world. The complex relations between objects and their …

Cited by 136. All 2 versions.

[PDF] neurips.cc

Visual instruction tuning

H Liu, C Li, Q Wu, YJ Lee - Advances in neural information …, 2024 - proceedings.neurips.cc

Instruction tuning large language models (LLMs) using machine-generated instruction-
following data has been shown to improve zero-shot capabilities on new tasks, but the idea …

Cited by 5071. All 15 versions.

[PDF] thecvf.com

Improved baselines with visual instruction tuning

H Liu, C Li, Y Li, YJ Lee - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com

Large multimodal models (LMM) have recently shown encouraging progress with visual
instruction tuning. In this paper we present the first systematic study to investigate the design …

Cited by 1772. All 5 versions.

[PDF] thecvf.com

Open-vocabulary panoptic segmentation with text-to-image diffusion models

J Xu, S Liu, A Vahdat, W Byeon… - Proceedings of the …, 2023 - openaccess.thecvf.com

We present ODISE: Open-vocabulary DIffusion-based panoptic SEgmentation, which unifies
pre-trained text-image diffusion and discriminative models to perform open-vocabulary …

Cited by 430. All 6 versions.

[PDF] arxiv.org

SDXL: Improving latent diffusion models for high-resolution image synthesis

D Podell, Z English, K Lacey, A Blattmann… - arXiv preprint arXiv …, 2023 - arxiv.org

We present SDXL, a latent diffusion model for text-to-image synthesis. Compared to
previous versions of Stable Diffusion, SDXL leverages a three times larger UNet backbone …

Cited by 1617. All 4 versions.

[PDF] arxiv.org

Yi: Open foundation models by 01.AI

A Young, B Chen, C Li, C Huang, G Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org

We introduce the Yi model family, a series of language and multimodal models that
demonstrate strong multi-dimensional capabilities. The Yi model family is based on 6B and …

Cited by 348. All 2 versions.

[PDF] neurips.cc

Emergent correspondence from image diffusion

L Tang, M Jia, Q Wang, CP Phoo… - Advances in Neural …, 2023 - proceedings.neurips.cc

Finding correspondences between images is a fundamental problem in computer vision. In
this paper, we show that correspondence emerges in image diffusion models without any …

Cited by 302. All 12 versions.

[PDF] neurips.cc

Convolutions die hard: Open-vocabulary segmentation with single frozen convolutional CLIP

Q Yu, J He, X Deng, X Shen… - Advances in Neural …, 2024 - proceedings.neurips.cc

Open-vocabulary segmentation is a challenging task requiring segmenting and recognizing
objects from an open set of categories in diverse environments. One way to address this …

Cited by 127. All 5 versions.

[PDF] arxiv.org

MiniGPT-v2: Large language model as a unified interface for vision-language multi-task learning

J Chen, D Zhu, X Shen, X Li, Z Liu, P Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org

Large language models have shown their remarkable capabilities as a general interface for
various language-related applications. Motivated by this, we target to build a unified …

Cited by 500. All 6 versions.

[PDF] neurips.cc

Pick-a-Pic: An open dataset of user preferences for text-to-image generation

Y Kirstain, A Polyak, U Singer… - Advances in …, 2023 - proceedings.neurips.cc

The ability to collect a large dataset of human preferences from text-to-image users is
usually limited to companies, making such datasets inaccessible to the public. To address …

Cited by 264. All 5 versions.

Openclip, July 2021

Foundation Models Defining a New Era in Vision: a Survey and Outlook