- Academic Search

Domain generalization for semantic segmentation: a survey

TH Rafi, R Mahjabin, E Ghosh, YW Ko… - Artificial Intelligence …, 2024 - Springer

Deep neural networks (DNNs) have proven explicit contributions in making autonomous
driving cars and related tasks such as semantic segmentation, motion tracking, object …

Cited by: 3

[PDF] ieee.org

Unveiling deception in Arabic: optimization of deceptive text detection across formal and informal genres

F Alhayan, H Himdi, B Alharbi - IEEE Access, 2024 - ieeexplore.ieee.org

In recent years, social media has significantly influenced how we share information and
exchange messages. However, a significant issue arises from the fast dissemination of …

Cited by: 1

[PDF] thecvf.com

What Makes Multimodal In-Context Learning Work?

FB Baldassini, M Shukor, M Cord… - Proceedings of the …, 2024 - openaccess.thecvf.com

Large Language Models have demonstrated remarkable performance across
various tasks exhibiting the capacity to swiftly acquire new skills such as through In-Context …

Cited by: 13

[PDF] arxiv.org

Large language model-brained GUI agents: A survey

C Zhang, S He, J Qian, B Li, L Li, S Qin, Y Kang… - arXiv preprint arXiv …, 2024 - arxiv.org

GUIs have long been central to human-computer interaction, providing an intuitive and
visually-driven way to access and interact with digital systems. The advent of LLMs …

Cited by: 5

[PDF] thecvf.com

Rethinking the evaluation protocol of domain generalization

H Yu, X Zhang, R Xu, J Liu, Y He… - Proceedings of the …, 2024 - openaccess.thecvf.com

Domain generalization aims to solve the challenge of Out-of-Distribution (OOD)
generalization by leveraging common knowledge learned from multiple training domains to …

Cited by: 4

[PDF] arxiv.org

Auto-Encoding Morph-Tokens for Multimodal LLM

K Pan, S Tang, J Li, Z Fan, W Chow, S Yan… - arXiv preprint arXiv …, 2024 - arxiv.org

For multimodal LLMs, the synergy of visual comprehension (textual output) and generation
(visual output) presents an ongoing challenge. This is due to a conflicting objective: for …

Cited by: 11

[PDF] openreview.net

Counterfactually Augmented Event Matching for De-biased Temporal Sentence Grounding

X Jiang, Z Wei, S Li, X Xu, J Song… - Proceedings of the 32nd …, 2024 - dl.acm.org

Temporal Sentence Grounding (TSG), which aims to localize events in untrimmed videos
with a given language query, has been widely studied in the last decades. However …

Cited by: 1

[PDF] arxiv.org

Many-Shot In-Context Learning in Multimodal Foundation Models

Y Jiang, J Irvin, JH Wang, MA Chaudhry… - arXiv preprint arXiv …, 2024 - arxiv.org

Large language models are well-known to be effective at few-shot in-context learning (ICL).
Recent advancements in multimodal foundation models have enabled unprecedentedly …

Cited by: 19

[PDF] arxiv.org

A Picture is Worth A Thousand Numbers: Enabling LLMs Reason about Time Series via Visualization

H Liu, C Liu, BA Prakash - arXiv preprint arXiv:2411.06018, 2024 - arxiv.org

Large language models (LLMs), with demonstrated reasoning abilities across multiple
domains, are largely underexplored for time-series reasoning (TsR), which is ubiquitous in …

Cited by: 1

[PDF] arxiv.org

TopViewRS: Vision-Language Models as Top-View Spatial Reasoners

C Li, C Zhang, H Zhou, N Collier, A Korhonen… - arXiv preprint arXiv …, 2024 - arxiv.org

Top-view perspective denotes a typical way in which humans read and reason over different
types of maps, and it is vital for localization and navigation of humans as well as of non …

Cited by: 5

On the out-of-distribution generalization of multimodal large language models

Domain generalization for semantic segmentation: a survey
