From generation to judgment: Opportunities and challenges of LLM-as-a-judge
Assessment and evaluation have long been critical challenges in artificial intelligence (AI)
and natural language processing (NLP). However, traditional methods, whether matching …
Future events as backdoor triggers: Investigating temporal vulnerabilities in LLMs
Backdoors are hidden behaviors that are only triggered once an AI system has been
deployed. Bad actors looking to create successful backdoors must design them to avoid …
Is your LLM outdated? Evaluating LLMs at temporal generalization
C Zhu, N Chen, Y Gao, Y Zhang, P Tiwari… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid advancement of Large Language Models (LLMs) highlights the urgent need for
evolving evaluation methodologies that keep pace with improvements in language …
Enhancing logical reasoning in large language models through graph-based synthetic data
Despite recent advances in training and prompting strategies for Large Language Models
(LLMs), these models continue to face challenges with complex logical reasoning tasks that …
Graph Reasoning with LLMs (GReaL)
Graphs are a powerful tool for representing and analyzing complex relationships in real-
world applications. Large Language Models (LLMs) have demonstrated impressive …
Perceive the Passage of Time: A Systematic Evaluation of Large Language Model in Temporal Relativity
S Chen, Y Zheng, S Li, Q Cheng… - Proceedings of the 31st …, 2025 - aclanthology.org
Temporal perception is crucial for Large Language Models (LLMs) to effectively understand
the world. However, current benchmarks primarily focus on temporal reasoning, falling short …
ActionReasoningBench: Reasoning about Actions with and without Ramification Constraints
Reasoning about Actions and Change (RAC) has historically played a pivotal role in solving
foundational AI problems, such as the frame problem. It has driven advancements in AI …
ChroKnowledge: Unveiling Chronological Knowledge of Language Models in Multiple Domains
Large language models (LLMs) have significantly impacted many aspects of our lives.
However, assessing and ensuring their chronological knowledge remains challenging …
VCBench: A Controllable Benchmark for Symbolic and Abstract Challenges in Video Cognition
Recent advancements in Large Video-Language Models (LVLMs) have driven the
development of benchmarks designed to assess cognitive abilities in video-based tasks …
Time Awareness in Large Language Models: Benchmarking Fact Recall Across Time
Who is the US President? The answer changes depending on when the question is asked.
While large language models (LLMs) are evaluated on various reasoning tasks, they often …