- Academic Search

J Wang, Z Liu, L Zhao, Z Wu, C Ma, S Yu, H Dai… - Meta-Radiology, 2023 - Elsevier

Visual prompt engineering is a fundamental methodology in the field of visual and image
artificial general intelligence. As the development of large vision models progresses, the …

บันทึก อ้างอิง อ้างโดย151 บทความที่เกี่ยวข้อง ทั้งหมด 5 ฉบับ

[Free GPT-4]
[DeepSeek]

[PDF] nowpublishers.com

Vision-language pre-training: Basics, recent advances, and future trends

Z Gan, L Li, C Li, L Wang, Z Liu… - Foundations and Trends …, 2022 - nowpublishers.com

This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …

บันทึก อ้างอิง อ้างโดย198 บทความที่เกี่ยวข้อง ทั้งหมด 7 ฉบับ ดูในรูปแบบ HTML

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

Vision-language models for vision tasks: A survey

J Zhang, J Huang, S **, S Lu - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org

Most visual recognition studies rely heavily on crowd-labelled data in deep neural networks
(DNNs) training, and they usually train a DNN for each single visual recognition task …

บันทึก อ้างอิง อ้างโดย466 บทความที่เกี่ยวข้อง ทั้งหมด 11 ฉบับ

[Free GPT-4]
[DeepSeek]

[PDF] nowpublishers.com

Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com

Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …

บันทึก อ้างอิง อ้างโดย229 บทความที่เกี่ยวข้อง ทั้งหมด 7 ฉบับ ดูในรูปแบบ HTML

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

MM1: methods, analysis and insights from multimodal LLM pre-training

B McKinzie, Z Gan, JP Fauconnier, S Dodge… - … on Computer Vision, 2024 - Springer

In this work, we discuss building performant Multimodal Large Language Models (MLLMs).
In particular, we study the importance of various architecture components and data choices …

บันทึก อ้างอิง อ้างโดย199 บทความที่เกี่ยวข้อง ทั้งหมด 7 ฉบับ

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

Segment and Recognize Anything at Any Granularity

F Li, H Zhang, P Sun, X Zou, S Liu, C Li, J Yang… - … on Computer Vision, 2024 - Springer

In this work, we introduce Semantic-SAM, an augmented image segmentation foundation for
segmenting and recognizing anything at desired granularities. Compared to the …

บันทึก อ้างอิง อ้างโดย178 บทความที่เกี่ยวข้อง ทั้งหมด 4 ฉบับ

[Free GPT-4]
[DeepSeek]

[PDF] thecvf.com

Openscene: 3d scene understanding with open vocabularies

S Peng, K Genova, C Jiang… - Proceedings of the …, 2023 - openaccess.thecvf.com

Traditional 3D scene understanding approaches rely on labeled 3D datasets to train a
model for a single task with supervision. We propose OpenScene, an alternative approach …

บันทึก อ้างอิง อ้างโดย284 บทความที่เกี่ยวข้อง ทั้งหมด 6 ฉบับ ดูในรูปแบบ HTML

[Free GPT-4]
[DeepSeek]

[PDF] thecvf.com

Maple: Multi-modal prompt learning

MU Khattak, H Rasheed, M Maaz… - Proceedings of the …, 2023 - openaccess.thecvf.com

Pre-trained vision-language (VL) models such as CLIP have shown excellent generalization
ability to downstream tasks. However, they are sensitive to the choice of input text prompts …

บันทึก อ้างอิง อ้างโดย710 บทความที่เกี่ยวข้อง ทั้งหมด 9 ฉบับ ดูในรูปแบบ HTML

[Free GPT-4]
[DeepSeek]

[PDF] thecvf.com

Self-regulating prompts: Foundational model adaptation without forgetting

MU Khattak, ST Wasim, M Naseer… - Proceedings of the …, 2023 - openaccess.thecvf.com

Prompt learning has emerged as an efficient alternative for fine-tuning foundational models,
such as CLIP, for various downstream tasks. Conventionally trained using the task-specific …

บันทึก อ้างอิง อ้างโดย164 บทความที่เกี่ยวข้อง ทั้งหมด 8 ฉบับ ดูในรูปแบบ HTML

[Free GPT-4]
[DeepSeek]

[PDF] thecvf.com

Visual-language prompt tuning with knowledge-guided context optimization

H Yao, R Zhang, C Xu - … of the IEEE/CVF conference on …, 2023 - openaccess.thecvf.com

Prompt tuning is an effective way to adapt the pretrained visual-language model (VLM) to
the downstream task using task-related textual tokens. Representative CoOp-based works …

บันทึก อ้างอิง อ้างโดย203 บทความที่เกี่ยวข้อง ทั้งหมด 8 ฉบับ ดูในรูปแบบ HTML

สร้างการแจ้งเตือน

อ้างอิง

การค้นหาขั้นสูง

บันทึกไปยังคลังของฉันแล้ว

Denseclip: Language-guided dense prediction with context-aware prompting

[HTML][HTML] Review of large vision models and visual prompt engineering

Vision-language pre-training: Basics, recent advances, and future trends

Vision-language models for vision tasks: A survey

Multimodal foundation models: From specialists to general-purpose assistants

MM1: methods, analysis and insights from multimodal LLM pre-training

Segment and Recognize Anything at Any Granularity

Openscene: 3d scene understanding with open vocabularies

Maple: Multi-modal prompt learning

Self-regulating prompts: Foundational model adaptation without forgetting

Visual-language prompt tuning with knowledge-guided context optimization