Tool learning with foundation models
Humans possess an extraordinary ability to create and utilize tools. With the advent of
foundation models, artificial intelligence systems have the potential to be equally adept in …
AI alignment: A comprehensive survey
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, so do risks from misalignment. To provide a comprehensive …
Detecting hallucinations in large language models using semantic entropy
Large language model (LLM) systems, such as ChatGPT or Gemini, can show impressive
reasoning and question-answering capabilities but often 'hallucinate' false outputs and …
Connecting the dots in trustworthy Artificial Intelligence: From AI principles, ethics, and key requirements to responsible AI systems and regulation
Trustworthy Artificial Intelligence (AI) is based on seven technical requirements
sustained over three main pillars that should be met throughout the system's entire life cycle …
Open problems and fundamental limitations of reinforcement learning from human feedback
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems
to align with human goals. RLHF has emerged as the central method used to finetune state …
GPQA: A graduate-level Google-proof Q&A benchmark
We present GPQA, a challenging dataset of 448 multiple-choice questions written by domain
experts in biology, physics, and chemistry. We ensure that the questions are high-quality …
RLAIF: Scaling reinforcement learning from human feedback with AI feedback
Reinforcement learning from human feedback (RLHF) is an effective technique for aligning
large language models (LLMs) to human preferences, but gathering high-quality human …
Tree of attacks: Jailbreaking black-box LLMs automatically
A Mehrotra, M Zampetakis… - Advances in …, 2025 - proceedings.neurips.cc
While Large Language Models (LLMs) display versatile functionality, they continue
to generate harmful, biased, and toxic content, as demonstrated by the prevalence of human …
Guiding pretraining in reinforcement learning with large language models
Reinforcement learning algorithms typically struggle in the absence of a dense, well-shaped
reward function. Intrinsically motivated exploration methods address this limitation by …
Beyond the imitation game: Quantifying and extrapolating the capabilities of language models
Language models demonstrate both quantitative improvement and new qualitative
capabilities with increasing scale. Despite their potentially transformative impact, these new …