Foundation Models Defining a New Era in Vision: a Survey and Outlook

M Awais, M Naseer, S Khan, RM Anwer… - … on Pattern Analysis …, 2025 - ieeexplore.ieee.org
Vision systems that see and reason about the compositional nature of visual scenes are
fundamental to understanding our world. The complex relations between objects and their …

Large models for time series and spatio-temporal data: A survey and outlook

M Jin, Q Wen, Y Liang, C Zhang, S Xue, X Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Temporal data, notably time series and spatio-temporal data, are prevalent in real-world
applications. They capture dynamic system measurements and are produced in vast …

Large scale foundation models for intelligent manufacturing applications: a survey

H Zhang, SD Semujju, Z Wang, X Lv, K Xu… - Journal of Intelligent …, 2025 - Springer
Although the applications of artificial intelligence especially deep learning have greatly
improved various aspects of intelligent manufacturing, they still face challenges for broader …

Retrieval-augmented generation for AI-generated content: A survey

P Zhao, H Zhang, Q Yu, Z Wang, Y Geng, F Fu… - arXiv preprint arXiv …, 2024 - arxiv.org
The development of Artificial Intelligence Generated Content (AIGC) has been facilitated by
advancements in model algorithms, scalable foundation model architectures, and the …

When urban region profiling meets large language models

Y Yan, H Wen, S Zhong, W Chen, H Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
Urban region profiling from web-sourced data is of utmost importance for urban planning
and sustainable development. We are witnessing a rising trend of LLMs for various fields …

Unlocking memorization in large language models with dynamic soft prompting

Z Wang, R Bao, Y Wu, J Taylor, C Xiao, F Zheng… - arXiv preprint arXiv …, 2024 - arxiv.org
Pretrained large language models (LLMs) have revolutionized natural language processing
(NLP) tasks such as summarization, question answering, and translation. However, LLMs …

UrbanCLIP: Learning text-enhanced urban region profiling with contrastive language-image pretraining from the web

Y Yan, H Wen, S Zhong, W Chen, H Chen… - Proceedings of the …, 2024 - dl.acm.org
Urban region profiling from web-sourced data is of utmost importance for urban computing.
We are witnessing a blossom of LLMs for various fields, especially in multi-modal data …

LifelongMemory: Leveraging LLMs for Answering Queries in Egocentric Videos

Y Wang, Y Yang, M Ren - arXiv preprint arXiv:2312.05269, 2023 - arxiv.org
The egocentric video natural language query (NLQ) task involves localizing a temporal
window in an egocentric video that provides an answer to a posed query, which has wide …

ChatCam: Embracing LLMs for Contextual Chatting-to-Camera with Interest-Oriented Video Summarization

K Xiao, Y Gao, F Li, W Xu, P Chen… - Proceedings of the ACM on …, 2024 - dl.acm.org
Cameras are ubiquitous in society, with users increasingly looking to extract insights about
the physical world. Current human-to-camera interaction methods, while advanced, still …

Video Question Answering: A survey of the state-of-the-art

PJ Jeshmol, BC Kovoor - Journal of Visual Communication and Image …, 2024 - Elsevier
Video Question Answering (VideoQA) emerges as a prominent trend in the domain
of Artificial Intelligence, Computer Vision, and Natural Language Processing. It involves …