A survey on evaluation of large language models
Large language models (LLMs) are gaining increasing popularity in both academia and
industry, owing to their unprecedented performance in various applications. As LLMs …
Scientific large language models: A survey on biological & chemical domains
Large Language Models (LLMs) have emerged as a transformative power in enhancing
natural language comprehension, representing a significant stride toward artificial general …
A survey of large language models
Language is essentially a complex, intricate system of human expressions governed by
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …
Inadequacies of large language model benchmarks in the era of generative artificial intelligence
The rapid rise in popularity of Large Language Models (LLMs) with emerging capabilities
has spurred public curiosity to evaluate and compare different LLMs, leading many …
EvalCrafter: Benchmarking and evaluating large video generation models
The vision and language generative models have been overgrown in recent years. For
video generation various open-sourced models and public-available services have been …
SuperCLUE: A comprehensive Chinese large language model benchmark
Large language models (LLMs) have shown the potential to be integrated into human daily
lives. Therefore, user preference is the most critical criterion for assessing LLMs' …
Learning or self-aligning? Rethinking instruction fine-tuning
Instruction Fine-tuning (IFT) is a critical phase in building large language models (LLMs).
Previous works mainly focus on the IFT's role in the transfer of behavioral norms and the …
SciAssess: Benchmarking LLM proficiency in scientific literature analysis
Recent breakthroughs in Large Language Models (LLMs) have revolutionized scientific
literature analysis. However, existing benchmarks fail to adequately evaluate the proficiency …
Can Large Language Models Understand Real-World Complex Instructions?
Large language models (LLMs) can understand human instructions, showing their potential
for pragmatic applications beyond traditional NLP tasks. However, they still struggle with …
SeaEval for multilingual foundation models: From cross-lingual alignment to cultural reasoning
We present SeaEval, a benchmark for multilingual foundation models. In addition to
characterizing how these models understand and reason with natural language, we also …