- Academic Search

The rise and potential of large language model based agents: A survey

Z Xi, W Chen, X Guo, W He, Y Ding, B Hong… - Science China …, 2025 - Springer

For a long time, researchers have sought artificial intelligence (AI) that matches or exceeds
human intelligence. AI agents, which are artificial entities capable of sensing the …

Cited by 729; 4 versions

[PDF] techrxiv.org

Large language models: a comprehensive survey of its applications, challenges, limitations, and future prospects

MU Hadi, R Qureshi, A Shah, M Irfan, A Zafar… - Authorea …, 2023 - techrxiv.org

Within the vast expanse of computerized language processing, a revolutionary entity known
as Large Language Models (LLMs) has emerged, wielding immense power in its capacity to …

Cited by 266; 5 versions

[PDF] neurips.cc

VisionLLM: Large language model is also an open-ended decoder for vision-centric tasks

W Wang, Z Chen, X Chen, J Wu… - Advances in …, 2024 - proceedings.neurips.cc

Large language models (LLMs) have notably accelerated progress towards artificial general
intelligence (AGI), with their impressive zero-shot capacity for user-tailored tasks, endowing …

Cited by 445; 6 versions

[PDF] thecvf.com

Sequential modeling enables scalable learning for large vision models

Y Bai, X Geng, K Mangalam, A Bar… - Proceedings of the …, 2024 - openaccess.thecvf.com

We introduce a novel sequential modeling approach which enables learning a Large Vision
Model (LVM) without making use of any linguistic data. To do this we define a common …

Cited by 145; 3 versions

[PDF] thecvf.com

Images speak in images: A generalist painter for in-context visual learning

X Wang, W Wang, Y Cao, C Shen… - Proceedings of the …, 2023 - openaccess.thecvf.com

In-context learning, as a new paradigm in NLP, allows the model to rapidly adapt to various
tasks with only a handful of prompts and examples. But in computer vision, the difficulties for …

Cited by 240; 5 versions

[PDF] nowpublishers.com

Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com

Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …

Cited by 214; 6 versions

[PDF] openreview.net

Unified-IO: A unified model for vision, language, and multi-modal tasks

J Lu, C Clark, R Zellers, R Mottaghi… - The Eleventh …, 2022 - openreview.net

We propose Unified-IO, a model that performs a large variety of AI tasks spanning classical
computer vision tasks, including pose estimation, object detection, depth estimation and …

Cited by 403; 3 versions

[PDF] thecvf.com

InstructDiffusion: A generalist modeling interface for vision tasks

Z Geng, B Yang, T Hang, C Li, S Gu… - Proceedings of the …, 2024 - openaccess.thecvf.com

We present InstructDiffusion, a unified and generic framework for aligning computer vision
tasks with human instructions. Unlike existing approaches that integrate prior knowledge …

Cited by 87; 3 versions

[PDF] arxiv.org

SegGPT: Segmenting everything in context

X Wang, X Zhang, Y Cao, W Wang, C Shen… - arXiv preprint arXiv …, 2023 - arxiv.org

We present SegGPT, a generalist model for segmenting everything in context. We unify
various segmentation tasks into a generalist in-context learning framework that …

Cited by 254; 2 versions

[PDF] thecvf.com

What does CLIP know about a red circle? Visual prompt engineering for VLMs

A Shtedritski, C Rupprecht… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Large-scale Vision-Language Models, such as CLIP, learn powerful image-text
representations that have found numerous applications, from zero-shot classification to text …

Cited by 126; 7 versions

Saved in My library

Visual prompting via image inpainting

The rise and potential of large language model based agents: A survey