Google Scholar

Prism: A framework for decoupling and assessing the capabilities of vlms

Y Qiao, H Duan, X Fang, J Yang… - Advances in …, 2025 - proceedings.neurips.cc

Vision Language Models (VLMs) demonstrate remarkable proficiency in addressing
a wide array of visual questions, which requires strong perception and reasoning faculties …

Cited by 13

A peek into token bias: Large language models are not yet genuine reasoners

B Jiang, Y Xie, Z Hao, X Wang, T Mallick, WJ Su… - arXiv preprint arXiv …, 2024 - arxiv.org

This study introduces a hypothesis-testing framework to assess whether large language
models (LLMs) possess genuine reasoning abilities or primarily depend on token bias. We …

Cited by 30

Evaluating ChatGPT-4 Vision on Brazil's National Undergraduate Computer Science Exam

NC Mendonça - ACM Transactions on Computing Education, 2024 - dl.acm.org

The recent integration of visual capabilities into Large Language Models (LLMs) has the
potential to play a pivotal role in science and technology education, where visual elements …

Cited by 11

What is the visual cognition gap between humans and multimodal llms?

X Cao, B Lai, W Ye, Y Ma, J Heintz, J Chen… - arXiv preprint arXiv …, 2024 - arxiv.org

Recently, Multimodal Large Language Models (MLLMs) have shown great promise in
language-guided perceptual tasks such as recognition, segmentation, and object detection …

Cited by 6

Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?

A Wüst, T Tobiasch, L Helff, DS Dhami… - arXiv preprint arXiv …, 2024 - arxiv.org

Recently, newly developed Vision-Language Models (VLMs), such as OpenAI's GPT-4o,
have emerged, seemingly demonstrating advanced reasoning capabilities across text and …

Cited by 2

Image-to-Text Logic Jailbreak: Your Imagination can Help You Do Anything

X Zou, K Li, Y Chen - arXiv preprint arXiv:2407.02534, 2024 - arxiv.org

Large Visual Language Models (VLMs) such as GPT-4V have achieved remarkable
success in generating comprehensive and nuanced responses. Researchers have …

Cited by 3

VisFactor: Benchmarking Fundamental Visual Cognition in Multimodal Large Language Models

JT Huang, D Dai, JY Huang, Y Yuan, X Liu… - arXiv preprint arXiv …, 2025 - arxiv.org

Multimodal Large Language Models (MLLMs) have demonstrated remarkable
advancements in multimodal understanding; however, their fundamental visual cognitive …

Visual scratchpads: Enabling global reasoning in vision

A Lotfi, E Fini, S Bengio, M Nabi, E Abbe - arXiv preprint arXiv:2410.08165, 2024 - arxiv.org

Modern vision models have achieved remarkable success in benchmarks where local
features provide critical information about the target. There is now a growing interest in …

Towards Learning to Reason: Comparing LLMs with Neuro-Symbolic on Arithmetic Relations in Abstract Reasoning

M Hersche, G Camposampiero, R Wattenhofer… - arXiv preprint arXiv …, 2024 - arxiv.org

This work compares large language models (LLMs) and neuro-symbolic approaches in
solving Raven's progressive matrices (RPM), a visual abstract reasoning test that involves …

Benchmarking Visual Cognition of Multimodal LLMs via Matrix Reasoning

X Cao, B Lai, W Ye, Y Ma, J Heintz, M Huang, J Chen… - openreview.net

Recently, Multimodal Large Language Models (MLLMs) and Vision Language Models
(VLMs) have shown great promise in language-guided perceptual tasks such as recognition …

How Far Are We from Intelligent Visual Deductive Reasoning?

Prism: A framework for decoupling and assessing the capabilities of vlms
