Google Scholar

Z Lin, D Pathak, B Li, J Li, X Xia, G Neubig… - … on Computer Vision, 2024 - Springer

Despite significant progress in generative AI, comprehensive evaluation remains
challenging because of the lack of effective metrics and standardized benchmarks. For …

Cited by 65

[PDF] thecvf.com

GPT-4V(ision) is a human-aligned evaluator for text-to-3D generation

T Wu, G Yang, Z Li, K Zhang, Z Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com

Despite recent advances in text-to-3D generative methods there is a notable absence of
reliable evaluation metrics. Existing metrics usually focus on a single criterion each such as …

Cited by 69

[PDF] arxiv.org

GPT-4V(ision) is a generalist web agent, if grounded

B Zheng, B Gou, J Kil, H Sun, Y Su - arXiv preprint arXiv:2401.01614, 2024 - arxiv.org

The recent development on large multimodal models (LMMs), especially GPT-4V(ision) and
Gemini, has been quickly expanding the capability boundaries of multimodal models …

Cited by 152

[PDF] arxiv.org

GPT-4V in Wonderland: Large multimodal models for zero-shot smartphone GUI navigation

A Yan, Z Yang, W Zhu, K Lin, L Li, J Wang… - arXiv preprint arXiv …, 2023 - arxiv.org

We present MM-Navigator, a GPT-4V-based agent for the smartphone graphical user
interface (GUI) navigation task. MM-Navigator can interact with a smartphone screen as …

Cited by 81

[PDF] arxiv.org

Sapiens: Foundation for human vision models

R Khirodkar, T Bagautdinov, J Martinez… - … on Computer Vision, 2024 - Springer

We present Sapiens, a family of models for four fundamental human-centric vision tasks: 2D
pose estimation, body-part segmentation, depth estimation, and surface normal prediction …

Cited by 13

[PDF] arxiv.org

LLaVA-Critic: Learning to evaluate multimodal models

T Xiong, X Wang, D Guo, Q Ye, H Fan, Q Gu… - arXiv preprint arXiv …, 2024 - arxiv.org

We introduce LLaVA-Critic, the first open-source large multimodal model (LMM) designed as
a generalist evaluator to assess performance across a wide range of multimodal tasks …

Cited by 18

[PDF] arxiv.org

A Survey on LLM-as-a-Judge

J Gu, X Jiang, Z Shi, H Tan, X Zhai, C Xu, W Li… - arXiv preprint arXiv …, 2024 - arxiv.org

Accurate and consistent evaluation is crucial for decision-making across numerous fields,
yet it remains a challenging task due to inherent subjectivity, variability, and scale. Large …

Cited by 12

[PDF] arxiv.org

DreamBench++: A human-aligned benchmark for personalized image generation

Y Peng, Y Cui, H Tang, Z Qi, R Dong, J Bai… - arXiv preprint arXiv …, 2024 - arxiv.org

Personalized image generation holds great promise in assisting humans in everyday work
and life due to its impressive function in creatively generating personalized content …

Cited by 14

[PDF] arxiv.org

GPT-4V(ision) as a social media analysis engine

H Lyu, J Huang, D Zhang, Y Yu, X Mou, J Pan… - arXiv preprint arXiv …, 2023 - arxiv.org

Recent research has offered insights into the extraordinary capabilities of Large Multimodal
Models (LMMs) in various general vision and language tasks. There is growing interest in …

Cited by 32

[PDF] neurips.cc

Image2Struct: Benchmarking Structure Extraction for Vision-Language Models

J Roberts, T Lee, CH Wong… - Advances in …, 2025 - proceedings.neurips.cc

We introduce Image2Struct, a benchmark to evaluate vision-language models
(VLMs) on extracting structure from images. Our benchmark 1) captures real-world use …

Cited by 1

GPT-4V(ision) as a generalist evaluator for vision-language tasks

Evaluating text-to-visual generation with image-to-text generation
