AntiLeak-Bench: Preventing data contamination by automatically constructing benchmarks with updated real-world knowledge

X Wu, L Pan, Y Xie, R Zhou, S Zhao, Y Ma… - arXiv preprint arXiv …, 2024 - arxiv.org
Data contamination hinders fair LLM evaluation by introducing test data into newer models'
training sets. Existing studies solve this challenge by updating benchmarks with newly …

Towards effective neural topic modeling

X Wu - 2024 - dr.ntu.edu.sg
Over the past few decades, the world has witnessed an unprecedented explosion of
information. Of these, a substantial portion consists of unlabeled textual data, such as …