Google Scholar

MM-LLMs: Recent advances in multimodal large language models

D Zhang, Y Yu, J Dong, C Li, D Su, C Chu… - arXiv preprint arXiv …, 2024 - arxiv.org

In the past year, MultiModal Large Language Models (MM-LLMs) have undergone
substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs …

Cited by 229 - All 6 versions

[PDF] arxiv.org

The revolution of multimodal large language models: a survey

D Caffagni, F Cocchi, L Barsellotti, N Moratelli… - arXiv preprint arXiv …, 2024 - arxiv.org

Connecting text and visual modalities plays an essential role in generative intelligence. For
this reason, inspired by the success of large language models, significant research efforts …

Cited by 46 - All 9 versions

[PDF] openreview.net

NExT-GPT: Any-to-any multimodal LLM

S Wu, H Fei, L Qu, W Ji, TS Chua - Forty-first International …, 2024 - openreview.net

While recently Multimodal Large Language Models (MM-LLMs) have made exciting strides,
they mostly fall prey to the limitation of only input-side multimodal understanding, without the …

Cited by 512 - All 6 versions

[PDF] nowpublishers.com

Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com

Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …

Cited by 231 - All 7 versions

[PDF] thecvf.com

Generative multimodal models are in-context learners

Q Sun, Y Cui, X Zhang, F Zhang, Q Yu… - Proceedings of the …, 2024 - openaccess.thecvf.com

Humans can easily solve multimodal tasks in context with only a few demonstrations or
simple instructions, which current multimodal systems largely struggle to imitate. In this work …

Cited by 209 - All 6 versions

[PDF] arxiv.org

SEED-Bench: Benchmarking multimodal LLMs with generative comprehension

B Li, R Wang, G Wang, Y Ge, Y Ge, Y Shan - arXiv preprint arXiv …, 2023 - arxiv.org

Based on powerful Large Language Models (LLMs), recent generative Multimodal Large
Language Models (MLLMs) have gained prominence as a pivotal research area, exhibiting …

Cited by 460 - All 2 versions

[PDF] arxiv.org

MM1: methods, analysis and insights from multimodal LLM pre-training

B McKinzie, Z Gan, JP Fauconnier, S Dodge… - … on Computer Vision, 2024 - Springer

In this work, we discuss building performant Multimodal Large Language Models (MLLMs).
In particular, we study the importance of various architecture components and data choices …

Cited by 203 - All 7 versions

[PDF] arxiv.org

Trustworthy LLMs: a survey and guideline for evaluating large language models' alignment

Y Liu, Y Yao, JF Ton, X Zhang, R Guo, H Cheng… - arXiv preprint arXiv …, 2023 - arxiv.org

Ensuring alignment, which refers to making models behave in accordance with human
intentions [1, 2], has become a critical task before deploying large language models (LLMs) …

Cited by 290 - All 3 versions

[PDF] arxiv.org

Ferret: Refer and ground anything anywhere at any granularity

H You, H Zhang, Z Gan, X Du, B Zhang, Z Wang… - arXiv preprint arXiv …, 2023 - arxiv.org

We introduce Ferret, a new Multimodal Large Language Model (MLLM) capable of
understanding spatial referring of any shape or granularity within an image and accurately …

Cited by 253 - All 4 versions

[PDF] thecvf.com

Unified-IO 2: Scaling autoregressive multimodal models with vision, language, audio, and action

J Lu, C Clark, S Lee, Z Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com

We present Unified-IO 2, a multimodal and multi-skill unified model capable of following
novel instructions. Unified-IO 2 can use text, images, audio, and/or videos as input and can …

Cited by 126 - All 7 versions

Generating images with multimodal language models

MM-LLMs: Recent advances in multimodal large language models