From generation to judgment: Opportunities and challenges of LLM-as-a-judge
Assessment and evaluation have long been critical challenges in artificial intelligence (AI)
and natural language processing (NLP). However, traditional methods, whether matching …
Future events as backdoor triggers: Investigating temporal vulnerabilities in LLMs
Backdoors are hidden behaviors that are only triggered once an AI system has been
deployed. Bad actors looking to create successful backdoors must design them to avoid …
Enhancing logical reasoning in large language models through graph-based synthetic data
Despite recent advances in training and prompting strategies for Large Language Models
(LLMs), these models continue to face challenges with complex logical reasoning tasks that …
Graph Reasoning with LLMs (GReaL)
Graphs are a powerful tool for representing and analyzing complex relationships in real-
world applications. Large Language Models (LLMs) have demonstrated impressive …
ActionReasoningBench: Reasoning about Actions with and without Ramification Constraints
Reasoning about Actions and Change (RAC) has historically played a pivotal role in solving
foundational AI problems, such as the frame problem. It has driven advancements in AI …
ChroKnowledge: Unveiling Chronological Knowledge of Language Models in Multiple Domains
Large language models (LLMs) have significantly impacted many aspects of our lives.
However, assessing and ensuring their chronological knowledge remains challenging …
Perceive the Passage of Time: A Systematic Evaluation of Large Language Model in Temporal Relativity
S Chen, Y Zheng, S Li, Q Cheng… - Proceedings of the 31st …, 2025 - aclanthology.org
Temporal perception is crucial for Large Language Models (LLMs) to effectively understand
the world. However, current benchmarks primarily focus on temporal reasoning, falling short …
VCBench: A Controllable Benchmark for Symbolic and Abstract Challenges in Video Cognition
Recent advancements in Large Video-Language Models (LVLMs) have driven the
development of benchmarks designed to assess cognitive abilities in video-based tasks …
Time Awareness in Large Language Models: Benchmarking Fact Recall Across Time
Who is the US President? The answer changes depending on when the question is asked.
While large language models (LLMs) are evaluated on various reasoning tasks, they often …
Benchmarking and Improving Large Vision-Language Models for Fundamental Visual Graph Understanding and Reasoning
Large Vision-Language Models (LVLMs) have demonstrated remarkable performance
across diverse tasks. Despite great success, recent studies show that LVLMs encounter …