A survey on potentials, pathways and challenges of large language models in new-generation intelligent manufacturing

C Zhang, Q Xu, Y Yu, G Zhou, K Zeng, F Chang… - Robotics and Computer …, 2025 - Elsevier
Nowadays, Industry 5.0 starts to gain attention, which advocates that intelligent
manufacturing should adequately consider the roles and needs of humans. In this context …

Momentor: Advancing video large language model with fine-grained temporal reasoning

L Qian, J Li, Y Wu, Y Ye, H Fei, TS Chua… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) demonstrate remarkable proficiency in comprehending and
handling text-based tasks. Many efforts are being made to transfer these attributes to video …

Evaluating the impact of environmental semantic distractions on multimodal large language models

S Kuhozido, G Dunfield, E Ostrich, C Waterhouse - 2024 - researchsquare.com
Multimodal models integrating visual and textual data have transformed artificial intelligence
applications by providing more holistic and contextually aware responses. However, the …

Auto-encoding morph-tokens for multimodal LLM

K Pan, S Tang, J Li, Z Fan, W Chow, S Yan… - arXiv preprint arXiv …, 2024 - arxiv.org
For multimodal LLMs, the synergy of visual comprehension (textual output) and generation
(visual output) presents an ongoing challenge. This is due to a conflicting objective: for …

Unified generative and discriminative training for multi-modal large language models

W Chow, J Li, Q Yu, K Pan, H Fei… - Advances in …, 2025 - proceedings.neurips.cc
In recent times, Vision-Language Models (VLMs) have been trained under two
predominant paradigms. Generative training has enabled Multimodal Large Language …

Fact: Teaching MLLMs with faithful, concise and transferable rationales

M Gao, S Chen, L Pang, Y Yao, J Dang… - Proceedings of the …, 2024 - dl.acm.org
The remarkable performance of Multimodal Large Language Models (MLLMs) has
demonstrated their proficient understanding capabilities in handling various visual tasks …

The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective

Z Qin, D Chen, W Zhang, L Yao, Y Huang… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid development of large language models (LLMs) has been witnessed in recent
years. Based on the powerful LLMs, multi-modal LLMs (MLLMs) extend the modality from …

A survey on multimodal benchmarks: In the era of large AI models

L Li, G Chen, H Shi, J Xiao, L Chen - arXiv preprint arXiv:2409.18142, 2024 - arxiv.org
The rapid evolution of Multimodal Large Language Models (MLLMs) has brought substantial
advancements in artificial intelligence, significantly enhancing the capability to understand …

ACDC: Autoregressive coherent multimodal generation using diffusion correction

H Chung, D Lee, JC Ye - arXiv preprint arXiv:2410.04721, 2024 - arxiv.org
Autoregressive models (ARMs) and diffusion models (DMs) represent two leading
paradigms in generative modeling, each excelling in distinct areas: ARMs in global context …

WALL-E: World alignment by rule learning improves world model-based LLM agents

S Zhou, T Zhou, Y Yang, G Long, D Ye, J Jiang… - arXiv preprint arXiv …, 2024 - arxiv.org
Can large language models (LLMs) directly serve as powerful world models for model-
based agents? While the gaps between the prior knowledge of LLMs and the specified …