DreamLLM: Synergistic multimodal comprehension and creation

R Dong, C Han, Y Peng, Z Qi, Z Ge, J Yang… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper presents DreamLLM, a learning framework that first achieves versatile
Multimodal Large Language Models (MLLMs) empowered with frequently overlooked …

Macaw-LLM: Multi-modal language modeling with image, audio, video, and text integration

C Lyu, M Wu, L Wang, X Huang, B Liu, Z Du… - arXiv preprint arXiv …, 2023 - arxiv.org
Although instruction-tuned large language models (LLMs) have exhibited remarkable
capabilities across various NLP tasks, their effectiveness on other data modalities beyond …

MiniGPT-5: Interleaved vision-and-language generation via generative vokens

K Zheng, X He, XE Wang - arXiv preprint arXiv:2310.02239, 2023 - arxiv.org
Large Language Models (LLMs) have garnered significant attention for their advancements
in natural language processing, demonstrating unparalleled prowess in text comprehension …

Multimodal federated learning: Concept, methods, applications and future directions

W Huang, D Wang, X Ouyang, J Wan, J Liu, T Li - Information Fusion, 2024 - Elsevier
Multimodal learning mines and analyzes multimodal data in reality to better understand and
appreciate the world around people. However, how to exploit this rich multimodal data …

MMDialog: A large-scale multi-turn dialogue dataset towards multi-modal open-domain conversation

J Feng, Q Sun, C Xu, P Zhao, Y Yang, C Tao… - arXiv preprint arXiv …, 2022 - arxiv.org
Responding with multi-modal content has been recognized as an essential capability for an
intelligent conversational agent. In this paper, we introduce the MMDialog dataset to better …

EasyGen: Easing multimodal generation with BiDiffuser and LLMs

X Zhao, B Liu, Q Liu, G Shi, XM Wu - Proceedings of the 62nd …, 2024 - aclanthology.org
We present EasyGen, an efficient model designed to enhance multimodal understanding
and generation by harnessing the capabilities of diffusion models and large language …

BI-MDRG: Bridging image history in multimodal dialogue response generation

HS Yoon, E Yoon, JTJ Tee, K Zhang, YJ Heo… - … on Computer Vision, 2024 - Springer
Multimodal Dialogue Response Generation (MDRG) is a recently proposed task
where the model needs to generate responses in texts, images, or a blend of both based on …

PACE: Unified multi-modal dialogue pre-training with progressive and compositional experts

Y Li, B Hui, ZC Yin, M Yang, F Huang, Y Li - arXiv preprint arXiv …, 2023 - arxiv.org
Perceiving multi-modal information and fulfilling dialogues with humans is a long-term goal
of artificial intelligence. Pre-training is commonly regarded as an effective approach for multi …

DialogCC: An Automated Pipeline for Creating High-Quality Multi-Modal Dialogue Dataset

YJ Lee, B Ko, HG Kim, J Hyeon… - Proceedings of the 2024 …, 2024 - aclanthology.org
As sharing images in instant messages is a crucial factor, there has been active research
on learning image-text multi-modal dialogue models. However, training a well …

The Zeno's Paradox of 'Low-Resource' Languages

HH Nigatu, AL Tonja, B Rosman, T Solorio… - arXiv preprint arXiv …, 2024 - arxiv.org
The disparity in the languages commonly studied in Natural Language Processing (NLP) is
typically reflected by referring to languages as low vs high-resourced. However, there is …