MM-LLMs: Recent advances in multimodal large language models

D Zhang, Y Yu, J Dong, C Li, D Su, C Chu… - arXiv preprint arXiv …, 2024 - arxiv.org
In the past year, MultiModal Large Language Models (MM-LLMs) have undergone
substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs …

Generative AI and process systems engineering: The next frontier

B Decardi-Nelson, AS Alshehri, A Ajagekar… - Computers & Chemical …, 2024 - Elsevier
This review article explores how emerging generative artificial intelligence (GenAI) models,
such as large language models (LLMs), can enhance solution methodologies within process …

Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com
This monograph presents a comprehensive survey of the taxonomy and evolution of multimodal
foundation models that demonstrate vision and vision-language capabilities, focusing on the transition …

Visual autoregressive modeling: Scalable image generation via next-scale prediction

K Tian, Y Jiang, Z Yuan, B Peng… - Advances in neural …, 2025 - proceedings.neurips.cc
We present Visual AutoRegressive modeling (VAR), a new generation paradigm that
redefines the autoregressive learning on images as coarse-to-fine "next-scale prediction" …
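The coarse-to-fine idea named in the abstract can be pictured as autoregression over token maps of increasing resolution rather than over individual tokens in raster order. The sketch below is only an illustrative outline under assumed names and sizes (predict_next_scale, the scale schedule, and the codebook size are hypothetical, and a random stub stands in for the transformer); it is not the authors' implementation.

# Toy sketch of coarse-to-fine "next-scale" autoregression (illustrative only).
import numpy as np

scales = [1, 2, 4, 8, 16]          # side lengths of successive token maps (assumed schedule)
vocab_size = 4096                  # size of a hypothetical discrete codebook

def predict_next_scale(context_maps, side, rng):
    # Stand-in for a model that conditions on all coarser token maps
    # and emits a (side x side) map of discrete token ids.
    return rng.integers(0, vocab_size, size=(side, side))

rng = np.random.default_rng(0)
context = []                        # token maps generated so far, coarse to fine
for side in scales:
    next_map = predict_next_scale(context, side, rng)
    context.append(next_map)        # each new scale conditions on all previous ones

print([m.shape for m in context])   # [(1, 1), (2, 2), (4, 4), (8, 8), (16, 16)]

The point of the sketch is the loop structure: each step emits an entire higher-resolution token map conditioned on the coarser maps already generated, in contrast to next-token prediction over a flattened raster sequence.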

Generative multimodal models are in-context learners

Q Sun, Y Cui, X Zhang, F Zhang, Q Yu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Humans can easily solve multimodal tasks in context with only a few demonstrations or
simple instructions, which current multimodal systems largely struggle to imitate. In this work …

Ferret: Refer and ground anything anywhere at any granularity

H You, H Zhang, Z Gan, X Du, B Zhang, Z Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce Ferret, a new Multimodal Large Language Model (MLLM) capable of
understanding spatial referring of any shape or granularity within an image and accurately …

Unified-IO 2: Scaling autoregressive multimodal models with vision, language, audio, and action

J Lu, C Clark, S Lee, Z Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present Unified-IO 2, a multimodal and multi-skill unified model capable of following
novel instructions. Unified-IO 2 can use text, images, audio, and/or videos as input and can …

Emu: Generative pretraining in multimodality

Q Sun, Q Yu, Y Cui, F Zhang, X Zhang, Y Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
We present Emu, a Transformer-based multimodal foundation model, which can seamlessly
generate images and texts in multimodal context. This omnivore model can take in any …

An image is worth 32 tokens for reconstruction and generation

Q Yu, M Weber, X Deng, X Shen… - Advances in Neural …, 2025 - proceedings.neurips.cc
Recent advancements in generative models have highlighted the crucial role of image
tokenization in the efficient synthesis of high-resolution images. Tokenization, which …
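The role tokenization plays here can be illustrated with a minimal vector-quantization sketch: encoder outputs are snapped to the nearest codebook entries, so an image is represented by a short sequence of discrete token ids (32 in this toy, echoing the title). The codebook size, latent dimensionality, and nearest-neighbour lookup are assumptions for illustration, not the paper's tokenizer.

# Toy sketch of image tokenization via nearest-codebook vector quantization (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(1024, 16))        # 1024 codes, 16-dim each (assumed sizes)
latents = rng.normal(size=(32, 16))           # e.g. 32 encoder latents per image

# Nearest-neighbour assignment: each latent becomes one discrete token id.
dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
token_ids = dists.argmin(axis=1)              # shape (32,): the image as 32 tokens

# "Detokenization" looks the codes back up; a learned decoder would map
# these code embeddings back to pixels.
recovered = codebook[token_ids]
print(token_ids.shape, recovered.shape)       # (32,) (32, 16)

A generative model then only has to predict these short discrete sequences, which is what makes compact tokenizers attractive for high-resolution synthesis.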

DreamLLM: Synergistic multimodal comprehension and creation

R Dong, C Han, Y Peng, Z Qi, Z Ge, J Yang… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper presents DreamLLM, a learning framework that first achieves versatile
Multimodal Large Language Models (MLLMs) empowered with frequently overlooked …