BLINK: Multimodal large language models can see but not perceive
We introduce Blink, a new benchmark for multimodal language models (LLMs) that focuses
on core visual perception abilities not found in other evaluations. Most of the Blink tasks can …
Evaluating text-to-visual generation with image-to-text generation
Despite significant progress in generative AI, comprehensive evaluation remains
challenging because of the lack of effective metrics and standardized benchmarks. For …
Explainable and interpretable multimodal large language models: A comprehensive survey
The rapid development of Artificial Intelligence (AI) has revolutionized numerous fields, with
large language models (LLMs) and computer vision (CV) systems driving advancements in …
SEED-X: Multimodal models with unified multi-granularity comprehension and generation
The rapid evolution of multimodal foundation models has demonstrated significant
progress in vision-language understanding and generation, e.g., our previous work SEED …
Task me anything
Benchmarks for large multimodal language models (MLMs) now serve to simultaneously
assess the general capabilities of models instead of evaluating for a specific capability. As a …
LHRS-Bot: Empowering remote sensing with VGI-enhanced large multimodal language model
The revolutionary capabilities of large language models (LLMs) have paved the way for
multimodal large language models (MLLMs) and fostered diverse applications across …
Kangaroo: A powerful video-language model supporting long-context video input
Rapid advancements have been made in extending Large Language Models (LLMs) to
Large Multi-modal Models (LMMs). However, extending input modality of LLMs to video data …
VL-Trojan: Multimodal instruction backdoor attacks against autoregressive visual language models
Autoregressive Visual Language Models (VLMs) demonstrate remarkable few-shot
learning capabilities within a multimodal context. Recently, multimodal instruction tuning has …
SciFIBench: Benchmarking large multimodal models for scientific figure interpretation
Large multimodal models (LMMs) have proven flexible and generalisable across many tasks
and fields. Although they have strong potential to aid scientific research, their capabilities in …
VHELM: A holistic evaluation of vision language models
Current benchmarks for assessing vision-language models (VLMs) often focus on their
perception or problem-solving capabilities and neglect other critical aspects such as …