- Academic Search

AI alignment: A comprehensive survey

J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arxiv preprint arxiv …, 2023 - arxiv.org

AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, so do risks from misalignment. To provide a comprehensive …

Cited by 247 - All 4 versions

[PDF] arxiv.org

Benchmark data contamination of large language models: A survey

C Xu, S Guan, D Greene, M Kechadi - arxiv preprint arxiv:2406.04244, 2024 - arxiv.org

The rapid development of Large Language Models (LLMs) like GPT-4, Claude-3, and
Gemini has transformed the field of natural language processing. However, it has also …

Cited by 30 - All 4 versions

[PDF] arxiv.org

Benchmarking benchmark leakage in large language models

R Xu, Z Wang, RZ Fan, P Liu - arxiv preprint arxiv:2404.18824, 2024 - arxiv.org

Amid the expanding use of pre-training data, the phenomenon of benchmark dataset
leakage has become increasingly prominent, exacerbated by opaque training processes …

Cited by 59 - All 3 versions

[PDF] jmlr.org

PromptBench: A unified library for evaluation of large language models

K Zhu, Q Zhao, H Chen, J Wang, X Xie - Journal of Machine Learning …, 2024 - jmlr.org

The evaluation of large language models (LLMs) is crucial to assess their performance and
mitigate potential security risks. In this paper, we introduce PromptBench, a unified library to …

Cited by 32 - All 5 versions

[PDF] arxiv.org

KIEval: A knowledge-grounded interactive evaluation framework for large language models

Z Yu, C Gao, W Yao, Y Wang, W Ye, J Wang… - arxiv preprint arxiv …, 2024 - arxiv.org

Automatic evaluation methods for large language models (LLMs) are hindered by data
contamination, leading to inflated assessments of their effectiveness. Existing strategies …

Cited by 23 - All 7 versions

[PDF] neurips.cc

DARG: Dynamic evaluation of large language models via adaptive reasoning graph

Z Zhang, J Chen, D Yang - Advances in Neural Information …, 2025 - proceedings.neurips.cc

The current paradigm of evaluating Large Language Models (LLMs) through static
benchmarks comes with significant limitations, such as vulnerability to data contamination …

Cited by 4 - All 3 versions

[PDF] arxiv.org

The landscape of emerging AI agent architectures for reasoning, planning, and tool calling: A survey

T Masterman, S Besen, M Sawtell, A Chao - arxiv preprint arxiv …, 2024 - arxiv.org

This survey paper examines the recent advancements in AI agent implementations, with a
focus on their ability to achieve complex goals that require enhanced reasoning, planning …

Cited by 29 - All 3 versions

[PDF] arxiv.org

NPHardEval: Dynamic benchmark on reasoning ability of large language models via complexity classes

L Fan, W Hua, L Li, H Ling, Y Zhang - arxiv preprint arxiv:2312.14890, 2023 - arxiv.org

Complex reasoning ability is one of the most important features of current LLMs, which has
also been leveraged to play an integral role in complex decision-making tasks. Therefore …

Cited by 27 - All 6 versions

[PDF] arxiv.org

GraphInstruct: Empowering large language models with graph understanding and reasoning capability

Z Luo, X Song, H Huang, J Lian, C Zhang… - arxiv preprint arxiv …, 2024 - arxiv.org

Evaluating and enhancing the general capabilities of large language models (LLMs) has
been an important research topic. Graph is a common data structure in the real world, and …

Cited by 23 - All 4 versions

[PDF] neurips.cc

Co-occurrence is not factual association in language models

X Zhang, M Li, J Wu - Advances in Neural Information …, 2025 - proceedings.neurips.cc

Pretrained language models can encode a large amount of knowledge and utilize it for
various reasoning tasks, yet they can still struggle to learn novel factual knowledge …

Cited by 3 - All 3 versions
