Chatbot Arena: An open platform for evaluating LLMs by human preference

WL Chiang, L Zheng, Y Sheng… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have unlocked new capabilities and applications; however,
evaluating the alignment with human preferences still poses significant challenges. To …

Generalization or memorization: Data contamination and trustworthy evaluation for large language models

Y Dong, X Jiang, H Liu, Z Jin, B Gu, M Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent statements about the impressive capabilities of large language models (LLMs) are
usually supported by evaluating on open-access benchmarks. Considering the vast size and …

Spiking-PhysFormer: Camera-based remote photoplethysmography with parallel spike-driven transformer

M Liu, J Tang, Y Chen, H Li, J Qi, S Li, K Wang, J Gan… - Neural Networks, 2025 - Elsevier
Artificial neural networks (ANNs) can help camera-based remote photoplethysmography
(rPPG) in measuring cardiac activity and physiological signals from facial videos, such as …

Key-point-driven data synthesis with its enhancement on mathematical reasoning

Y Huang, X Liu, Y Gong, Z Gou, Y Shen… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have shown great potential in complex reasoning tasks, yet
their performance is often hampered by the scarcity of high-quality, reasoning-focused …

EvoCodeBench: An Evolving Code Generation Benchmark Aligned with Real-World Code Repositories

J Li, G Li, X Zhang, Y Dong, Z Jin - arXiv preprint arXiv:2404.00599, 2024 - arxiv.org
How to evaluate Large Language Models (LLMs) in code generation is an open question.
Existing benchmarks demonstrate poor alignment with real-world code repositories and are …

LiveCodeBench: Holistic and contamination free evaluation of large language models for code

N Jain, K Han, A Gu, WD Li, F Yan, T Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) applied to code-related applications have emerged as a
prominent field, attracting significant interest from both academia and industry. However, as …

Can Language Models Solve Olympiad Programming?

Q Shi, M Tang, K Narasimhan, S Yao - arXiv preprint arXiv:2404.10952, 2024 - arxiv.org
Computing olympiads contain some of the most challenging problems for humans, requiring
complex algorithmic reasoning and puzzle solving, in addition to generating efficient code …

Benchmark Data Contamination of Large Language Models: A Survey

C Xu, S Guan, D Greene, M Kechadi - arXiv preprint arXiv:2406.04244, 2024 - arxiv.org
The rapid development of Large Language Models (LLMs) like GPT-4, Claude-3, and
Gemini has transformed the field of natural language processing. However, it has also …

Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs

Z Zeng, P Chen, H Jiang, J Jia - arXiv preprint arXiv:2312.17080, 2023 - arxiv.org
In this work, we introduce a novel evaluation paradigm for Large Language Models, one that
challenges them to engage in meta-reasoning. This approach addresses critical …

Real-time Fake News from Adversarial Feedback

S Chen, Y Huang, B Dhingra - arXiv preprint arXiv:2410.14651, 2024 - arxiv.org
We show that existing evaluations for fake news detection based on conventional sources,
such as claims on fact-checking websites, result in high accuracies over time for LLM-based …