LLaVA-OneVision: Easy visual task transfer

B Li, Y Zhang, D Guo, R Zhang, F Li, H Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
We present LLaVA-OneVision, a family of open large multimodal models (LMMs) developed
by consolidating our insights into data, models, and visual representations in the LLaVA …

A survey on the use of large language models (LLMs) in fake news

E Papageorgiou, C Chronis, I Varlamis, Y Himeur - Future Internet, 2024 - mdpi.com
The proliferation of fake news and fake profiles on social media platforms poses significant
threats to information integrity and societal trust. Traditional detection methods, including …

RULER: What's the Real Context Size of Your Long-Context Language Models?

CP Hsieh, S Sun, S Kriman, S Acharya… - arXiv preprint arXiv …, 2024 - arxiv.org
The needle-in-a-haystack (NIAH) test, which examines the ability to retrieve a piece of
information (the "needle") from long distractor texts (the "haystack"), has been widely …

Qwen2.5-Coder technical report

B Hui, J Yang, Z Cui, J Yang, D Liu, L Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
In this report, we introduce the Qwen2.5-Coder series, a significant upgrade from its
predecessor, CodeQwen1.5. This series includes six models: Qwen2.5-Coder-(0.5B/1.5 …

Video instruction tuning with synthetic data

Y Zhang, J Wu, W Li, B Li, Z Ma, Z Liu, C Li - arXiv preprint arXiv …, 2024 - arxiv.org
The development of video large multimodal models (LMMs) has been hindered by the
difficulty of curating large amounts of high-quality raw data from the web. To address this, we …

Molmo and PixMo: Open weights and open data for state-of-the-art multimodal models

M Deitke, C Clark, S Lee, R Tripathi, Y Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
Today's most advanced multimodal models remain proprietary. The strongest open-weight
models rely heavily on synthetic data from proprietary VLMs to achieve good performance …

Qwen2.5-Math technical report: Toward mathematical expert model via self-improvement

A Yang, B Zhang, B Hui, B Gao, B Yu, C Li… - arXiv preprint arXiv …, 2024 - arxiv.org
In this report, we present a series of math-specific large language models: Qwen2.5-Math
and Qwen2.5-Math-Instruct-1.5B/7B/72B. The core innovation of the Qwen2.5 series lies in …

mPLUG-Owl3: Towards long image-sequence understanding in multi-modal large language models

J Ye, H Xu, H Liu, A Hu, M Yan, Q Qian, J Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Multi-modal Large Language Models (MLLMs) have demonstrated remarkable capabilities
in executing instructions for a variety of single-image tasks. Despite this progress, significant …

Large language model inference acceleration: A comprehensive hardware perspective

J Li, J Xu, S Huang, Y Chen, W Li, J Liu, Y Lian… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have demonstrated remarkable capabilities across various
fields, from natural language understanding to text generation. Compared to non-generative …

Graph retrieval-augmented generation: A survey

B Peng, Y Zhu, Y Liu, X Bo, H Shi, C Hong… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, Retrieval-Augmented Generation (RAG) has achieved remarkable success in
addressing the challenges of Large Language Models (LLMs) without necessitating …
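
To make the graph-retrieval pattern this survey covers concrete, below is a minimal Python sketch under toy assumptions: a hand-built triple store, naive substring entity matching, and a prompt assembled from retrieved facts. None of this reproduces any specific system from the survey:

# Toy graph RAG: match query entities against a knowledge graph, pull their
# local neighborhood, and serialize the retrieved triples into the prompt.
# The graph contents and hop logic are hypothetical illustrations.
from collections import defaultdict

triples = [
    ("Marie Curie", "won", "Nobel Prize in Physics"),
    ("Marie Curie", "born_in", "Warsaw"),
    ("Warsaw", "capital_of", "Poland"),
]
neighbors = defaultdict(list)
for h, r, t in triples:
    neighbors[h].append((h, r, t))
    neighbors[t].append((h, r, t))

def retrieve_subgraph(query: str, hops: int = 1) -> list[tuple]:
    """Return triples within `hops` of any entity mentioned in the query."""
    frontier = {e for e in neighbors if e.lower() in query.lower()}
    retrieved, seen = [], set()
    for _ in range(hops):
        next_frontier = set()
        for entity in frontier:
            for triple in neighbors[entity]:
                if triple not in seen:
                    seen.add(triple)
                    retrieved.append(triple)
                    next_frontier.update((triple[0], triple[2]))
        frontier = next_frontier
    return retrieved

query = "Where was Marie Curie born?"
facts = "\n".join(f"{h} --{r}--> {t}" for h, r, t in retrieve_subgraph(query))
prompt = f"Facts:\n{facts}\n\nQuestion: {query}"
# `prompt` would be sent to an LLM; grounding the answer in structured,
# retrieved graph facts is what distinguishes graph RAG from plain
# text-chunk retrieval.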