- Academic Search

C Chen, K Shu - AI Magazine, 2024 - Wiley Online Library

Misinformation such as fake news and rumors is a serious threat for information ecosystems
and public trust. The emergence of large language models (LLMs) has great potential to …

Cited by 125 - All 4 versions

[PDF] oup.com

A Survey of Multimodel Large Language Models

Z Liang, Y Xu, Y Hong, P Shang, Q Wang… - Proceedings of the 3rd …, 2024 - dl.acm.org

With the widespread application of the Transformer architecture in various modalities,
including vision, the technology of large language models is evolving from a single modality …

Cited by 151 - All 7 versions

[PDF] arxiv.org

MM1: methods, analysis and insights from multimodal LLM pre-training

B McKinzie, Z Gan, JP Fauconnier, S Dodge… - … on Computer Vision, 2024 - Springer

In this work, we discuss building performant Multimodal Large Language Models (MLLMs).
In particular, we study the importance of various architecture components and data choices …

Cited by 179 - All 2 versions

[PDF] thecvf.com

Generative multimodal models are in-context learners

Q Sun, Y Cui, X Zhang, F Zhang, Q Yu… - Proceedings of the …, 2024 - openaccess.thecvf.com

Humans can easily solve multimodal tasks in context with only a few demonstrations or
simple instructions which current multimodal systems largely struggle to imitate. In this work …

Cited by 197 - All 3 versions

[PDF] arxiv.org

CogVLM: Visual expert for pretrained language models

W Wang, Q Lv, W Yu, W Hong, J Qi, Y Wang… - arxiv preprint arxiv …, 2023 - arxiv.org

We introduce CogVLM, a powerful open-source visual language foundation model. Different
from the popular shallow alignment method which maps image features into the input space …

Cited by 551 - All 3 versions

[PDF] arxiv.org

GPT4RoI: Instruction tuning large language model on region-of-interest

S Zhang, P Sun, S Chen, M Xiao, W Shao… - arxiv preprint arxiv …, 2023 - arxiv.org

Instruction tuning large language model (LLM) on image-text pairs has achieved
unprecedented vision-language multimodal abilities. However, their vision-language …

Cited by 206 - All 3 versions

[PDF] nowpublishers.com

Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com

Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …

Cited by 212 - All 6 versions

[PDF] thecvf.com

GLaMM: Pixel grounding large multimodal model

H Rasheed, M Maaz, S Shaji… - Proceedings of the …, 2024 - openaccess.thecvf.com

Large Multimodal Models (LMMs) extend Large Language Models to the vision
domain. Initial LMMs used holistic images and text prompts to generate ungrounded textual …

Cited by 153 - All 3 versions

[PDF] thecvf.com

Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action

J Lu, C Clark, S Lee, Z Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com

We present Unified-IO 2, a multimodal and multi-skill unified model capable of following
novel instructions. Unified-IO 2 can use text, images, audio, and/or videos as input and can …

Cited by 103 - All 3 versions

[PDF] arxiv.org

The all-seeing project v2: Towards general relation comprehension of the open world

W Wang, Y Ren, H Luo, T Li, C Yan, Z Chen… - … on Computer Vision, 2024 - Springer

We present the All-Seeing Project V2: a new model and dataset designed for
understanding object relations in images. Specifically, we propose the All-Seeing Model V2 …

Cited by 35 - All 3 versions

Ferret: Refer and ground anything anywhere at any granularity

Combating misinformation in the age of LLMs: Opportunities and challenges
