- Academic Search

MM-LLMs: Recent advances in multimodal large language models

D Zhang, Y Yu, J Dong, C Li, D Su, C Chu… - arXiv preprint arXiv …, 2024 - arxiv.org

In the past year, MultiModal Large Language Models (MM-LLMs) have undergone
substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs …

Cited by 205

The (r)evolution of multimodal large language models: A survey

D Caffagni, F Cocchi, L Barsellotti, N Moratelli… - arXiv preprint arXiv …, 2024 - arxiv.org

Connecting text and visual modalities plays an essential role in generative intelligence. For
this reason, inspired by the success of large language models, significant research efforts …

Cited by 43

UniIR: Training and benchmarking universal multimodal information retrievers

C Wei, Y Chen, H Chen, H Hu, G Zhang, J Fu… - … on Computer Vision, 2024 - Springer

Existing information retrieval (IR) models often assume a homogeneous format, limiting their
applicability to diverse user needs, such as searching for images with text descriptions …

Cited by 31

Kosmos-2.5: A multimodal literate model

T Lv, Y Huang, J Chen, Y Zhao, Y Jia, L Cui… - arXiv preprint arXiv …, 2023 - arxiv.org

The automatic reading of text-intensive images represents a significant advancement toward
achieving Artificial General Intelligence (AGI). In this paper we present KOSMOS-2.5, a …

Cited by 50

AnyGPT: Unified multimodal LLM with discrete sequence modeling

J Zhan, J Dai, J Ye, Y Zhou, D Zhang, Z Liu… - arXiv preprint arXiv …, 2024 - arxiv.org

We introduce AnyGPT, an any-to-any multimodal language model that utilizes discrete
representations for the unified processing of various modalities, including speech, text …

Cited by 85

EarthGPT: A universal multi-modal large language model for multi-sensor image comprehension in remote sensing domain

W Zhang, M Cai, T Zhang, Y Zhuang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Multimodal large language models (MLLMs) have demonstrated remarkable success in
vision and visual-language tasks within the natural image domain. Owing to the significant …

Cited by 80

Recommendation with generative models

Y Deldjoo, Z He, J McAuley, A Korikov… - arXiv preprint arXiv …, 2024 - arxiv.org

Generative models are a class of AI models capable of creating new instances of data by
learning and sampling from their statistical distributions. In recent years, these models have …

Cited by 8

UnifiedMLLM: Enabling unified representation for multi-modal multi-tasks with large language model

Z Li, W Wang, YQ Cai, X Qi, P Wang, D Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org

Significant advancements have recently been achieved in the field of multi-modal large
language models (MLLMs), demonstrating their remarkable capabilities in understanding …

Cited by 6

Multi-modal generative AI: Multi-modal LLM, diffusion and beyond

H Chen, X Wang, Y Zhou, B Huang, Y Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org

Multi-modal generative AI has received increasing attention in both academia and industry.
Particularly, two dominant families of techniques are: i) The multi-modal large language …

Cited by 6

Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

L Chen, Z Wang, S Ren, L Li, H Zhao, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org

Building on the foundations of language modeling in natural language processing, Next
Token Prediction (NTP) has evolved into a versatile training objective for machine learning …

Cited by 2

CoDi-2: In-Context Interleaved and Interactive Any-to-Any Generation

MM-LLMs: Recent advances in multimodal large language models
