Domain generalization for semantic segmentation: a survey
Deep neural networks (DNNs) have proven explicit contributions in making autonomous
driving cars and related tasks such as semantic segmentation, motion tracking, object …
Unveiling deception in Arabic: optimization of deceptive text detection across formal and informal genres
In recent years, social media has significantly influenced how we share information and
exchange messages. However, a significant issue arises from the fast dissemination of …
What Makes Multimodal In-Context Learning Work?
Large Language Models have demonstrated remarkable performance across
various tasks exhibiting the capacity to swiftly acquire new skills such as through In-Context …
Large language model-brained GUI agents: A survey
GUIs have long been central to human-computer interaction, providing an intuitive and
visually-driven way to access and interact with digital systems. The advent of LLMs …
Rethinking the evaluation protocol of domain generalization
Domain generalization aims to solve the challenge of Out-of-Distribution (OOD)
generalization by leveraging common knowledge learned from multiple training domains to …
Auto-Encoding Morph-Tokens for Multimodal LLM
For multimodal LLMs, the synergy of visual comprehension (textual output) and generation
(visual output) presents an ongoing challenge. This is due to a conflicting objective: for …
Counterfactually Augmented Event Matching for De-biased Temporal Sentence Grounding
Temporal Sentence Grounding (TSG), which aims to localize events in untrimmed videos
with a given language query, has been widely studied in the last decades. However …
Many-Shot In-Context Learning in Multimodal Foundation Models
Large language models are well-known to be effective at few-shot in-context learning (ICL).
Recent advancements in multimodal foundation models have enabled unprecedentedly …
A Picture is Worth A Thousand Numbers: Enabling LLMs Reason about Time Series via Visualization
Large language models (LLMs), with demonstrated reasoning abilities across multiple
domains, are largely underexplored for time-series reasoning (TsR), which is ubiquitous in …
TopViewRS: Vision-Language Models as Top-View Spatial Reasoners
Top-view perspective denotes a typical way in which humans read and reason over different
types of maps, and it is vital for localization and navigation of humans as well as of non …