DreamLLM: Synergistic multimodal comprehension and creation
This paper presents DreamLLM, a learning framework that first achieves versatile
Multimodal Large Language Models (MLLMs) empowered with frequently overlooked …
Macaw-LLM: Multi-modal language modeling with image, audio, video, and text integration
Although instruction-tuned large language models (LLMs) have exhibited remarkable
capabilities across various NLP tasks, their effectiveness on other data modalities beyond …
MiniGPT-5: Interleaved vision-and-language generation via generative vokens
Large Language Models (LLMs) have garnered significant attention for their advancements
in natural language processing, demonstrating unparalleled prowess in text comprehension …
Multimodal federated learning: Concept, methods, applications and future directions
Multimodal learning mines and analyzes multimodal data in reality to better understand and
appreciate the world around people. However, how to exploit this rich multimodal data …
MMDialog: A large-scale multi-turn dialogue dataset towards multi-modal open-domain conversation
Responding with multi-modal content has been recognized as an essential capability for an
intelligent conversational agent. In this paper, we introduce the MMDialog dataset to better …
EasyGen: Easing multimodal generation with BiDiffuser and LLMs
We present EasyGen, an efficient model designed to enhance multimodal understanding
and generation by harnessing the capabilities of diffusion models and large language …
BI-MDRG: Bridging image history in multimodal dialogue response generation
Multimodal Dialogue Response Generation (MDRG) is a recently proposed task
where the model needs to generate responses in texts, images, or a blend of both based on …
PaCE: Unified multi-modal dialogue pre-training with progressive and compositional experts
Perceiving multi-modal information and fulfilling dialogues with humans is a long-term goal
of artificial intelligence. Pre-training is commonly regarded as an effective approach for multi …
DialogCC: An Automated Pipeline for Creating High-Quality Multi-Modal Dialogue Dataset
As sharing images in an instant message is a crucial factor, there has been active research
on learning image-text multi-modal dialogue models. However, training a well …
The Zeno's Paradox of 'Low-Resource' Languages
The disparity in the languages commonly studied in Natural Language Processing (NLP) is
typically reflected by referring to languages as low vs high-resourced. However, there is …