The Llama 3 herd of models

A Dubey, A Jauhri, A Pandey, A Kadian… - arXiv preprint arXiv …, 2024 - arxiv.org
Modern artificial intelligence (AI) systems are powered by foundation models. This paper
presents a new set of foundation models, called Llama 3. It is a herd of language models …

Benchmark evaluations, applications, and challenges of large vision language models: A survey

Z Li, X Wu, H Du, H Nghiem, G Shi - arXiv preprint arXiv:2501.02189, 2025 - arxiv.org
Multimodal Vision Language Models (VLMs) have emerged as a transformative technology
at the intersection of computer vision and natural language processing, enabling machines …

Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

L Chen, Z Wang, S Ren, L Li, H Zhao, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Building on the foundations of language modeling in natural language processing, Next
Token Prediction (NTP) has evolved into a versatile training objective for machine learning …

UniBench: Visual reasoning requires rethinking vision-language beyond scaling

H Al-Tahan, Q Garrido, R Balestriero… - arXiv preprint arXiv …, 2024 - arxiv.org
Significant research efforts have been made to scale and improve vision-language model
(VLM) training approaches. Yet, with an ever-growing number of benchmarks, researchers …

AI safety in generative AI large language models: A survey

J Chua, Y Li, S Yang, C Wang, L Yao - arXiv preprint arXiv:2407.18369, 2024 - arxiv.org
Large Language Models (LLMs) such as ChatGPT that exhibit generative AI capabilities are
facing accelerated adoption and innovation. The increased presence of Generative AI (GAI) …

Vision language model for interpretable and fine-grained detection of safety compliance in diverse workplaces

Z Chen, H Chen, M Imani, R Chen, F Imani - Expert Systems with …, 2025 - Elsevier
Workplace accidents due to personal protective equipment (PPE) non-compliance raise
serious safety concerns and lead to legal liabilities, financial penalties, and reputational …

Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?

A Wüst, T Tobiasch, L Helff, DS Dhami… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, newly developed Vision-Language Models (VLMs), such as OpenAI's GPT-4o,
have emerged, seemingly demonstrating advanced reasoning capabilities across text and …

FOCUS -- Federated Finetuning of Vision-Language Foundation Models with Optimal Client Layer Updating Strategy via Multi-objective Meta-Heuristics

P Saha, F Wagner, D Mishra, C Peng, A Thakur… - arXiv preprint arXiv …, 2024 - arxiv.org
Effective training of large Vision-Language Models (VLMs) on resource-constrained client
devices in Federated Learning (FL) requires the usage of parameter-efficient fine-tuning …

Evaluation and comparison of visual language models for transportation engineering problems

S Prajapati, T Singh, C Hegde… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent developments in vision language models (VLM) have shown great potential for
diverse applications related to image understanding. In this study, we have explored state-of …

OmnixR: Evaluating omni-modality language models on reasoning across modalities

L Chen, H Hu, M Zhang, Y Chen, Z Wang, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce OmnixR, an evaluation suite designed to benchmark SoTA Omni-modality
Language Models, such as GPT-4o and Gemini. Evaluating OLMs, which integrate multiple …