ColPali: Efficient document retrieval with vision language models

M Faysse, H Sibille, T Wu, B Omrani… - The Thirteenth …, 2024 - openreview.net
Documents are visually rich structures that convey information through text, but also figures,
page layouts, tables, or even fonts. Since modern retrieval systems mainly rely on the textual …

Auxiliary task demands mask the capabilities of smaller language models

J Hu, MC Frank - arXiv preprint arXiv:2404.02418, 2024 - arxiv.org
Developmental psychologists have argued about when cognitive capacities such as
language understanding or theory of mind emerge. These debates often hinge on the …

VERISCORE: Evaluating the factuality of verifiable claims in long-form text generation

Y Song, Y Kim, M Iyyer - arXiv preprint arXiv:2406.19276, 2024 - arxiv.org
Existing metrics for evaluating the factuality of long-form text, such as FACTSCORE (Min et
al., 2023) and SAFE (Wei et al., 2024), decompose an input text into "atomic claims" and …

FinanceMath: Knowledge-intensive math reasoning in finance domains

Y Zhao, H Liu, Y Long, R Zhang, C Zhao… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce FinanceMath, a novel benchmark designed to evaluate LLMs' capabilities in
solving knowledge-intensive math reasoning problems. Compared to prior works, this study …

Fast state restoration in LLM serving with HCache

S Gao, Y Chen, J Shu - arXiv preprint arXiv:2410.05004, 2024 - arxiv.org
The growing complexity of LLM usage today, e.g., multi-round conversation and
retrieval-augmented generation (RAG), makes contextual states (i.e., KV cache) reusable across user …

On the Diversity of Synthetic Data and its Impact on Training Large Language Models

H Chen, A Waheed, X Li, Y Wang, J Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
The rise of Large Language Models (LLMs) has accentuated the need for diverse,
high-quality pre-training data. Synthetic data emerges as a viable solution to the challenges of …

ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models

J Chen, T Zhang, S Huang, Y Niu, L Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite the recent breakthroughs achieved by Large Vision Language Models (LVLMs) in
understanding and responding to complex visual-textual contexts, their inherent …

Decoder-only streaming transformer for simultaneous translation

S Guo, S Zhang, Y Feng - arXiv preprint arXiv:2406.03878, 2024 - arxiv.org
Simultaneous Machine Translation (SiMT) generates translation while reading source
tokens, essentially producing the target prefix based on the source prefix. To achieve good …

Evaluating language models as risk scores

AF Cruz, M Hardt, C Mendler-Dünner - arXiv preprint arXiv:2407.14614, 2024 - arxiv.org
Current question-answering benchmarks predominantly focus on accuracy in realizable
prediction tasks. Conditioned on a question and answer-key, does the most likely token …

Automated Text Scoring in the Age of Generative AI for the GPU-poor

CM Ormerod, A Kwako - arXiv preprint arXiv:2407.01873, 2024 - arxiv.org
Current research on generative language models (GLMs) for automated text scoring (ATS)
has focused almost exclusively on querying proprietary models via Application …