A Survey of Multimodel Large Language Models

Z Liang, Y Xu, Y Hong, P Shang, Q Wang… - Proceedings of the 3rd …, 2024 - dl.acm.org
With the widespread application of the Transformer architecture in various modalities,
including vision, the technology of large language models is evolving from a single modality …

A comprehensive survey of continual learning: Theory, method and application

L Wang, X Zhang, H Su, J Zhu - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
To cope with real-world dynamics, an intelligent system needs to incrementally acquire,
update, accumulate, and exploit knowledge throughout its lifetime. This ability, known as …

Visual instruction tuning

H Liu, C Li, Q Wu, YJ Lee - Advances in neural information …, 2023 - proceedings.neurips.cc
Instruction tuning large language models (LLMs) using machine-generated instruction-
following data has been shown to improve zero-shot capabilities on new tasks, but the idea …

Segment anything in medical images

J Ma, Y He, F Li, L Han, C You, B Wang - Nature Communications, 2024 - nature.com
Medical image segmentation is a critical component in clinical practice, facilitating accurate
diagnosis, treatment planning, and disease monitoring. However, existing methods, often …

Depth anything: Unleashing the power of large-scale unlabeled data

L Yang, B Kang, Z Huang, X Xu… - Proceedings of the …, 2024 - openaccess.thecvf.com
This work presents Depth Anything, a highly practical solution for robust monocular
depth estimation. Without pursuing novel technical modules, we aim to build a simple yet …

Rt-2: Vision-language-action models transfer web knowledge to robotic control

A Brohan, N Brown, J Carbajal, Y Chebotar… - arXiv preprint arXiv …, 2023 - arxiv.org
We study how vision-language models trained on Internet-scale data can be incorporated
directly into end-to-end robotic control to boost generalization and enable emergent …

Sharegpt4v: Improving large multi-modal models with better captions

L Chen, J Li, X Dong, P Zhang, C He, J Wang… - … on Computer Vision, 2024 - Springer
Modality alignment serves as the cornerstone for large multi-modal models (LMMs).
However, the impact of different attributes (e.g., data type, quality, and scale) of training data …

Animate anyone: Consistent and controllable image-to-video synthesis for character animation

L Hu - Proceedings of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Character Animation aims to generate character videos from still images through driving
signals. Currently, diffusion models have become the mainstream in visual generation …

Lisa: Reasoning segmentation via large language model

X Lai, Z Tian, Y Chen, Y Li, Y Yuan… - Proceedings of the …, 2024 - openaccess.thecvf.com
Although perception systems have made remarkable advancements in recent years, they still
rely on explicit human instruction or pre-defined categories to identify the target objects …

The dawn of lmms: Preliminary explorations with gpt-4v(ision)

Z Yang, L Li, K Lin, J Wang, CC Lin… - arXiv preprint arXiv …, 2023 - stableaiprompts.com
Large multimodal models (LMMs) extend large language models (LLMs) with multi-sensory
skills, such as visual understanding, to achieve stronger generic intelligence. In this paper …