- Academic Search

M Moor, O Banerjee, ZSH Abad, HM Krumholz… - Nature, 2023 - nature.com

The exceptionally rapid development of highly flexible, reusable artificial intelligence (AI)
models is likely to usher in newfound capabilities in medicine. We propose a new paradigm …

保存引用被引用数: 1072 関連記事全 17 バージョン

[Free GPT-4]

[PDF] github.io

The rise and potential of large language model based agents: A survey

Z **, W Chen, X Guo, W He, Y Ding, B Hong… - Science China …, 2025 - Springer

For a long time, researchers have sought artificial intelligence (AI) that matches or exceeds
human intelligence. AI agents, which are artificial entities capable of sensing the …

保存引用被引用数: 730 関連記事全 4 バージョン

[Free GPT-4]

[PDF] researchhub.com

[PDF][PDF] Qwen-vl: A versatile vision-language model for understanding, localization, text reading, and beyond

J Bai, S Bai, S Yang, S Wang… - arxiv preprint …, 2023 - storage.prod.researchhub.com

In this work, we introduce the Qwen-VL series, a set of large-scale vision-language models
(LVLMs) designed to perceive and understand both texts and images. Starting from the …

保存引用被引用数: 546 関連記事 HTMLバージョン

[Free GPT-4]

[PDF] neurips.cc

Segment everything everywhere all at once

X Zou, J Yang, H Zhang, F Li, L Li… - Advances in …, 2024 - proceedings.neurips.cc

In this work, we present SEEM, a promotable and interactive model for segmenting
everything everywhere all at once in an image. In SEEM, we propose a novel and versatile …

保存引用被引用数: 529 関連記事全 5 バージョン HTMLバージョン

[Free GPT-4]

[PDF] neurips.cc

Visionllm: Large language model is also an open-ended decoder for vision-centric tasks

W Wang, Z Chen, X Chen, J Wu… - Advances in …, 2024 - proceedings.neurips.cc

Large language models (LLMs) have notably accelerated progress towards artificial general
intelligence (AGI), with their impressive zero-shot capacity for user-tailored tasks, endowing …

保存引用被引用数: 447 関連記事全 6 バージョン HTMLバージョン

[Free GPT-4]

[PDF] arxiv.org

A comprehensive survey on pretrained foundation models: A history from bert to chatgpt

C Zhou, Q Li, C Li, J Yu, Y Liu, G Wang… - International Journal of …, 2024 - Springer

Abstract Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks across different data modalities. A PFM (eg, BERT, ChatGPT, GPT-4) is …

保存引用被引用数: 609 関連記事全 2 バージョン

[Free GPT-4]

[PDF] neurips.cc

Objaverse-xl: A universe of 10m+ 3d objects

M Deitke, R Liu, M Wallingford, H Ngo… - Advances in …, 2024 - proceedings.neurips.cc

Natural language processing and 2D vision models have attained remarkable proficiency on
many tasks primarily by escalating the scale of training data. However, 3D vision tasks have …

保存引用被引用数: 291 関連記事全 6 バージョン HTMLバージョン

[Free GPT-4]

[PDF] arxiv.org

Qwen-vl: A frontier large vision-language model with versatile abilities

J Bai, S Bai, S Yang, S Wang, S Tan, P Wang… - arxiv preprint arxiv …, 2023 - arxiv.org

In this work, we introduce the Qwen-VL series, a set of large-scale vision-language models
(LVLMs) designed to perceive and understand both texts and images. Starting from the …

保存引用被引用数: 1012 関連記事全 2 バージョン HTMLバージョン

[Free GPT-4]

[PDF] thecvf.com

mplug-owl2: Revolutionizing multi-modal large language model with modality collaboration

Q Ye, H Xu, J Ye, M Yan, A Hu, H Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com

Abstract Multi-modal Large Language Models (MLLMs) have demonstrated impressive
instruction abilities across various open-ended tasks. However previous methods have …

保存引用被引用数: 331 関連記事全 4 バージョン HTMLバージョン

[Free GPT-4]

[PDF] stableaiprompts.com

[PDF][PDF] The dawn of lmms: Preliminary explorations with gpt-4v (ision)

Z Yang, L Li, K Lin, J Wang, CC Lin… - arxiv preprint arxiv …, 2023 - stableaiprompts.com

Large multimodal models (LMMs) extend large language models (LLMs) with multi-sensory
skills, such as visual understanding, to achieve stronger generic intelligence. In this paper …

保存引用被引用数: 568 関連記事全 3 バージョン HTMLバージョン

アラートを作成

引用

検索オプション

マイライブラリに保存しました

Unified-io: A unified model for vision, language, and multi-modal tasks

Foundation models for generalist medical artificial intelligence

The rise and potential of large language model based agents: A survey

[PDF][PDF] Qwen-vl: A versatile vision-language model for understanding, localization, text reading, and beyond

Segment everything everywhere all at once

Visionllm: Large language model is also an open-ended decoder for vision-centric tasks

A comprehensive survey on pretrained foundation models: A history from bert to chatgpt

Objaverse-xl: A universe of 10m+ 3d objects

Qwen-vl: A frontier large vision-language model with versatile abilities

mplug-owl2: Revolutionizing multi-modal large language model with modality collaboration

[PDF][PDF] The dawn of lmms: Preliminary explorations with gpt-4v (ision)