- Academic Search

A systematic survey of prompt engineering in large language models: Techniques and applications

P Sahoo, AK Singh, S Saha, V Jain, S Mondal… - arXiv preprint arXiv …, 2024 - arxiv.org

Prompt engineering has emerged as an indispensable technique for extending the
capabilities of large language models (LLMs) and vision-language models (VLMs). This …

Cited by 319; 4 versions

Vision-language models for vision tasks: A survey

J Zhang, J Huang, S Jin, S Lu - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org

Most visual recognition studies rely heavily on crowd-labelled data in deep neural networks
(DNNs) training, and they usually train a DNN for each single visual recognition task …

Cited by 421; 9 versions

Open-vocabulary semantic segmentation with mask-adapted CLIP

F Liang, B Wu, X Dai, K Li, Y Zhao… - Proceedings of the …, 2023 - openaccess.thecvf.com

Open-vocabulary semantic segmentation aims to segment an image into semantic regions
according to text descriptions, which may not have been seen during training. Recent two …

Cited by 466; 9 versions

Visual prompt multi-modal tracking

J Zhu, S Lai, X Chen, D Wang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Visible-modal object tracking gives rise to a series of downstream multi-modal tracking
tributaries. To inherit the powerful representations of the foundation model, a natural modus …

Cited by 205; 6 versions

Vision transformer adapter for dense predictions

Z Chen, Y Duan, W Wang, J He, T Lu, J Dai… - arXiv preprint arXiv …, 2022 - arxiv.org

This work investigates a simple yet powerful adapter for Vision Transformer (ViT). Unlike
recent visual transformers that introduce vision-specific inductive biases into their …

Cited by 615; 3 versions

Repurposing diffusion-based image generators for monocular depth estimation

B Ke, A Obukhov, S Huang, N Metzger… - Proceedings of the …, 2024 - openaccess.thecvf.com

Monocular depth estimation is a fundamental computer vision task. Recovering 3D depth
from a single image is geometrically ill-posed and requires scene understanding so it is not …

Cited by 239; 3 versions

SimDA: Simple diffusion adapter for efficient video generation

Z Xing, Q Dai, H Hu, Z Wu… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

The recent wave of AI-generated content has witnessed the great development and success
of Text-to-Image (T2I) technologies. By contrast Text-to-Video (T2V) still falls short of …

Cited by 63; 3 versions

CORA: Adapting CLIP for open-vocabulary detection with region prompting and anchor pre-matching

X Wu, F Zhu, R Zhao, H Li - … of the IEEE/CVF conference on …, 2023 - openaccess.thecvf.com

Open-vocabulary detection (OVD) is an object detection task aiming at detecting objects
from novel categories beyond the base categories on which the detector is trained. Recent …

Cited by 128; 5 versions

Align your prompts: Test-time prompting with distribution alignment for zero-shot generalization

J Abdul Samadh, MH Gani, N Hussein… - Advances in …, 2023 - proceedings.neurips.cc

The promising zero-shot generalization of vision-language models such as CLIP has led to
their adoption using prompt learning for numerous downstream tasks. Previous works have …

Cited by 48; 2 versions

Towards large-scale 3D representation learning with multi-dataset point prompt training

X Wu, Z Tian, X Wen, B Peng, X Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com

The rapid advancement of deep learning models is often attributed to their ability to leverage
massive training data. In contrast such privilege has not yet fully benefited 3D deep learning …

Cited by 35; 3 versions

A systematic survey of prompt engineering in large language models: Techniques and applications
