- Academic Search

D Zhang, Y Yu, J Dong, C Li, D Su, C Chu… - arXiv preprint arXiv …, 2024 - arxiv.org

In the past year, MultiModal Large Language Models (MM-LLMs) have undergone
substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs …

Cited by 205

[PDF] oup.com

A Survey of Multimodel Large Language Models

Z Liang, Y Xu, Y Hong, P Shang, Q Wang… - Proceedings of the 3rd …, 2024 - dl.acm.org

With the widespread application of the Transformer architecture in various modalities,
including vision, the technology of large language models is evolving from a single modality …

Cited by 153 (Web of Science: 1)

[PDF] arxiv.org

Adversarial diffusion distillation

A Sauer, D Lorenz, A Blattmann… - European Conference on …, 2024 - Springer

Abstract We introduce Adversarial Diffusion Distillation (ADD), a novel training approach that
efficiently samples large-scale foundational image diffusion models in just 1–4 steps while …

Cited by 281

[PDF] nature.com

A foundation model for clinical-grade computational pathology and rare cancers detection

E Vorontsov, A Bozkurt, A Casson, G Shaikovski… - Nature medicine, 2024 - nature.com

The analysis of histopathology images with artificial intelligence aims to enable clinical
decision support systems and precision medicine. The success of such applications …

Cited by 87

[PDF] arxiv.org

How far are we to gpt-4v? closing the gap to commercial multimodal models with open-source suites

Z Chen, W Wang, H Tian, S Ye, Z Gao, E Cui… - Science China …, 2024 - Springer

In this paper, we introduce InternVL 1.5, an open-source multimodal large language model
(MLLM) to bridge the capability gap between open-source and proprietary commercial …

Cited by 350

[PDF] arxiv.org

MM1: methods, analysis and insights from multimodal LLM pre-training

B McKinzie, Z Gan, JP Fauconnier, S Dodge… - … on Computer Vision, 2024 - Springer

In this work, we discuss building performant Multimodal Large Language Models (MLLMs).
In particular, we study the importance of various architecture components and data choices …

Cited by 180

[PDF] arxiv.org

A survey on multimodal large language models

S Yin, C Fu, S Zhao, K Li, X Sun, T Xu… - arXiv preprint arXiv …, 2023 - arxiv.org

Multimodal Large Language Model (MLLM) recently has been a new rising research
hotspot, which uses powerful Large Language Models (LLMs) as a brain to perform …

Cited by 1066

[PDF] ieee.org

End-to-end autonomous driving: Challenges and frontiers

L Chen, P Wu, K Chitta, B Jaeger… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

The autonomous driving community has witnessed a rapid growth in approaches that
embrace an end-to-end algorithm framework, utilizing raw sensor input to generate vehicle …

Cited by 242

[PDF] thecvf.com

Anydoor: Zero-shot object-level image customization

X Chen, L Huang, Y Liu, Y Shen… - Proceedings of the …, 2024 - openaccess.thecvf.com

This work presents AnyDoor, a diffusion-based image generator with the power to teleport
target objects to new scenes at user-specified locations with desired shapes. Instead of …

Cited by 212

Internvideo2: Scaling foundation models for multimodal video understanding

Y Wang, K Li, X Li, J Yu, Y He, G Chen, B Pei… - … on Computer Vision, 2024 - Springer

We introduce InternVideo2, a new family of video foundation models (ViFM) that achieve the
state-of-the-art results in video recognition, video-text tasks, and video-centric dialogue. Our …

Cited by 118
