AntiLeak-Bench: Preventing data contamination by automatically constructing benchmarks with updated real-world knowledge

X Wu, L Pan, Y Xie, R Zhou, S Zhao, Y Ma… - arXiv preprint arXiv …, 2024 - arxiv.org
Data contamination hinders fair LLM evaluation by introducing test data into newer models'
training sets. Existing studies solve this challenge by updating benchmarks with newly …

RAGChecker: A fine-grained framework for diagnosing retrieval-augmented generation

D Ru, L Qiu, X Hu, T Zhang, P Shi, S Chang… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite Retrieval-Augmented Generation (RAG) showing promising capability in leveraging
external knowledge, a comprehensive evaluation of RAG systems is still challenging due to …

Query Routing for Homogeneous Tools: An Instantiation in the RAG Scenario

F Mu, Y Jiang, L Zhang, L Liuchu, W Li… - Findings of the …, 2024 - aclanthology.org
Current research on tool learning primarily focuses on selecting the most effective tool from
a wide array of options, often overlooking cost-effectiveness, a crucial factor in human …

Adaptive Selection for Homogeneous Tools: An Instantiation in the RAG Scenario

F Mu, Y Jiang, L Zhang, C Liu, W Li, P Xie… - arXiv preprint arXiv …, 2024 - arxiv.org
Current research on tool learning primarily focuses on selecting the most effective tool from
a wide array of options, often overlooking cost-effectiveness, a crucial factor in human …