AntiLeak-Bench: Preventing data contamination by automatically constructing benchmarks with updated real-world knowledge
Data contamination hinders fair LLM evaluation by introducing test data into newer models'
training sets. Existing studies solve this challenge by updating benchmarks with newly …
RAGChecker: A fine-grained framework for diagnosing retrieval-augmented generation
Despite Retrieval-Augmented Generation (RAG) showing promising capability in leveraging
external knowledge, a comprehensive evaluation of RAG systems is still challenging due to …
Query Routing for Homogeneous Tools: An Instantiation in the RAG Scenario
Current research on tool learning primarily focuses on selecting the most effective tool from
a wide array of options, often overlooking cost-effectiveness, a crucial factor in human …
Adaptive Selection for Homogeneous Tools: An Instantiation in the RAG Scenario
Current research on tool learning primarily focuses on selecting the most effective tool from
a wide array of options, often overlooking cost-effectiveness, a crucial factor in human …