Google Scholar

From generation to judgment: Opportunities and challenges of LLM-as-a-judge

D Li, B Jiang, L Huang, A Beigi, C Zhao, Z Tan… - arXiv preprint arXiv …, 2024 - arxiv.org

Assessment and evaluation have long been critical challenges in artificial intelligence (AI)
and natural language processing (NLP). However, traditional methods, whether matching …

Cited by 20 · All 3 versions

Future events as backdoor triggers: Investigating temporal vulnerabilities in LLMs

S Price, A Panickssery, S Bowman… - arXiv preprint arXiv …, 2024 - arxiv.org

Backdoors are hidden behaviors that are only triggered once an AI system has been
deployed. Bad actors looking to create successful backdoors must design them to avoid …

Cited by 5 · All 2 versions

Is your LLM outdated? Evaluating LLMs at temporal generalization

C Zhu, N Chen, Y Gao, Y Zhang, P Tiwari… - arXiv preprint arXiv …, 2024 - arxiv.org

The rapid advancement of Large Language Models (LLMs) highlights the urgent need for
evolving evaluation methodologies that keep pace with improvements in language …

Cited by 3

Enhancing logical reasoning in large language models through graph-based synthetic data

J Zhou, A Ghaddar, G Zhang, L Ma, Y Hu, S Pal… - arXiv preprint arXiv …, 2024 - arxiv.org

Despite recent advances in training and prompting strategies for Large Language Models
(LLMs), these models continue to face challenges with complex logical reasoning tasks that …

Cited by 2 · All 3 versions

Graph Reasoning with LLMs (GReaL)

A Tsitsulin, B Perozzi, B Fatemi… - Proceedings of the 30th …, 2024 - dl.acm.org

Graphs are a powerful tool for representing and analyzing complex relationships in
real-world applications. Large Language Models (LLMs) have demonstrated impressive …

Cited by 1

Perceive the Passage of Time: A Systematic Evaluation of Large Language Model in Temporal Relativity

S Chen, Y Zheng, S Li, Q Cheng… - Proceedings of the 31st …, 2025 - aclanthology.org

Temporal perception is crucial for Large Language Models (LLMs) to effectively understand
the world. However, current benchmarks primarily focus on temporal reasoning, falling short …

ActionReasoningBench: Reasoning about Actions with and without Ramification Constraints

D Handa, P Dolin, S Kumbhar, TC Son… - arXiv preprint arXiv …, 2024 - arxiv.org

Reasoning about Actions and Change (RAC) has historically played a pivotal role in solving
foundational AI problems, such as the frame problem. It has driven advancements in AI …

Cited by 1 · All 4 versions

ChroKnowledge: Unveiling Chronological Knowledge of Language Models in Multiple Domains

Y Park, C Yoon, J Park, D Lee, M Jeong… - arXiv preprint arXiv …, 2024 - arxiv.org

Large language models (LLMs) have significantly impacted many aspects of our lives.
However, assessing and ensuring their chronological knowledge remains challenging …

All 2 versions

VCBench: A Controllable Benchmark for Symbolic and Abstract Challenges in Video Cognition

C Li, Q Chen, Z Li, F Tao, Y Zhang - arXiv preprint arXiv:2411.09105, 2024 - arxiv.org

Recent advancements in Large Video-Language Models (LVLMs) have driven the
development of benchmarks designed to assess cognitive abilities in video-based tasks …

All 2 versions

Time Awareness in Large Language Models: Benchmarking Fact Recall Across Time

D Herel, V Bartek, T Mikolov - arXiv preprint arXiv:2409.13338, 2024 - arxiv.org

Who is the US President? The answer changes depending on when the question is asked.
While large language models (LLMs) are evaluated on various reasoning tasks, they often …

All 2 versions
