Google Scholar

Benchmark data contamination of large language models: A survey

C Xu, S Guan, D Greene, M Kechadi - arXiv preprint arXiv:2406.04244, 2024 - arxiv.org

The rapid development of Large Language Models (LLMs) like GPT-4, Claude-3, and
Gemini has transformed the field of natural language processing. However, it has also …

Cited by 28

An Empirical Analysis of Uncertainty in Large Language Model Evaluations

Q Xie, Q Li, Z Yu, Y Zhang, Y Zhang, L Yang - arXiv preprint arXiv …, 2025 - arxiv.org

As LLM-as-a-Judge emerges as a new paradigm for assessing large language models
(LLMs), concerns have been raised regarding the alignment, bias, and stability of LLM …

Outcome-Refining Process Supervision for Code Generation

Z Yu, W Gu, Y Wang, Z Zeng, J Wang, W Ye… - arXiv preprint arXiv …, 2024 - arxiv.org

Large Language Models have demonstrated remarkable capabilities in code generation, yet
they often struggle with complex programming tasks that require deep algorithmic …

SAGEval: The frontiers of Satisfactory Agent based NLG Evaluation for reference-free open-ended text

R Ghosh, T Yao, L Chen, S Hasan, T Chen… - arXiv preprint arXiv …, 2024 - arxiv.org

Large Language Model (LLM) integrations into applications like Microsoft365 suite and
Google Workspace for creating/processing documents, emails, presentations, etc. has led to …

Saved to my library

Freeeval: A modular framework for trustworthy and efficient evaluation of large language models

Benchmark data contamination of large language models: A survey

An Empirical Analysis of Uncertainty in Large Language Model Evaluations

Outcome-Refining Process Supervision for Code Generation

SAGEval: The frontiers of Satisfactory Agent based NLG Evaluation for reference-free open-ended text