Antileak-bench: Preventing data contamination by automatically constructing benchmarks with updated real-world knowledge
Data contamination hinders fair LLM evaluation by introducing test data into newer models'
training sets. Existing studies solve this challenge by updating benchmarks with newly …
Towards effective neural topic modeling
X Wu - 2024 - dr.ntu.edu.sg
Over the past few decades, the world has witnessed an unprecedented explosion of
information. Of these, a substantial portion consists of unlabeled textual data, such as …