Foundation Models Defining a New Era in Vision: A Survey and Outlook
Vision systems that see and reason about the compositional nature of visual scenes are
fundamental to understanding our world. The complex relations between objects and their …
Large models for time series and spatio-temporal data: A survey and outlook
Temporal data, notably time series and spatio-temporal data, are prevalent in real-world
applications. They capture dynamic system measurements and are produced in vast …
Large scale foundation models for intelligent manufacturing applications: a survey
H Zhang, SD Semujju, Z Wang, X Lv, K Xu… - Journal of Intelligent …, 2025 - Springer
Although the applications of artificial intelligence especially deep learning have greatly
improved various aspects of intelligent manufacturing, they still face challenges for broader …
Retrieval-augmented generation for AI-generated content: A survey
The development of Artificial Intelligence Generated Content (AIGC) has been facilitated by
advancements in model algorithms, scalable foundation model architectures, and the …
When urban region profiling meets large language models
Urban region profiling from web-sourced data is of utmost importance for urban planning
and sustainable development. We are witnessing a rising trend of LLMs for various fields …
Unlocking memorization in large language models with dynamic soft prompting
Pretrained large language models (LLMs) have revolutionized natural language processing
(NLP) tasks such as summarization, question answering, and translation. However, LLMs …
UrbanCLIP: Learning text-enhanced urban region profiling with contrastive language-image pretraining from the web
Urban region profiling from web-sourced data is of utmost importance for urban computing.
We are witnessing a blossom of LLMs for various fields, especially in multi-modal data …
LifelongMemory: Leveraging LLMs for Answering Queries in Egocentric Videos
The egocentric video natural language query (NLQ) task involves localizing a temporal
window in an egocentric video that provides an answer to a posed query, which has wide …
ChatCam: Embracing LLMs for Contextual Chatting-to-Camera with Interest-Oriented Video Summarization
Cameras are ubiquitous in society, with users increasingly looking to extract insights about
the physical world. Current human-to-camera interaction methods, while advanced, still …
Video Question Answering: A survey of the state-of-the-art
PJ Jeshmol, BC Kovoor - Journal of Visual Communication and Image …, 2024 - Elsevier
Video Question Answering (VideoQA) emerges as a prominent trend in the domain
of Artificial Intelligence, Computer Vision, and Natural Language Processing. It involves …