From generation to judgment: Opportunities and challenges of LLM-as-a-judge

D Li, B Jiang, L Huang, A Beigi, C Zhao, Z Tan… - arXiv preprint arXiv …, 2024 - arxiv.org
Assessment and evaluation have long been critical challenges in artificial intelligence (AI)
and natural language processing (NLP). However, traditional methods, whether matching …

Leveraging large language models for NLG evaluation: Advances and challenges

Z Li, X Xu, T Shen, C Xu, JC Gu, Y Lai… - Proceedings of the …, 2024 - aclanthology.org
In the rapidly evolving domain of Natural Language Generation (NLG) evaluation,
introducing Large Language Models (LLMs) has opened new avenues for assessing …

Meta-rewarding language models: Self-improving alignment with LLM-as-a-meta-judge

T Wu, W Yuan, O Golovneva, J Xu, Y Tian, J Jiao… - arXiv preprint arXiv …, 2024 - rivista.ai
Large Language Models (LLMs) are rapidly surpassing human knowledge in
many domains. While improving these models traditionally relies on costly human data …

Foundational autoraters: Taming large language models for better automatic evaluation

T Vu, K Krishna, S Alzubi, C Tar, M Faruqui… - arXiv preprint arXiv …, 2024 - arxiv.org
As large language models (LLMs) advance, it becomes more challenging to reliably
evaluate their output due to the high costs of human evaluation. To make progress towards …

AudioBench: A universal benchmark for audio large language models

B Wang, X Zou, G Lin, S Sun, Z Liu, W Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce AudioBench, a universal benchmark designed to evaluate Audio Large
Language Models (AudioLLMs). It encompasses 8 distinct tasks and 26 datasets, among …

Recommendation with generative models

Y Deldjoo, Z He, J McAuley, A Korikov… - arXiv preprint arXiv …, 2024 - arxiv.org
Generative models are a class of AI models capable of creating new instances of data by
learning and sampling from their statistical distributions. In recent years, these models have …

Cheating automatic LLM benchmarks: Null models achieve high win rates

X Zheng, T Pang, C Du, Q Liu, J Jiang, M Lin - arXiv preprint arXiv …, 2024 - arxiv.org
Automatic LLM benchmarks, such as AlpacaEval 2.0, Arena-Hard-Auto, and MT-Bench,
have become popular for evaluating language models due to their cost-effectiveness and …

Self-generated critiques boost reward modeling for language models

Y Yu, Z Chen, A Zhang, L Tan, C Zhu, RY Pang… - arXiv preprint arXiv …, 2024 - arxiv.org
Reward modeling is crucial for aligning large language models (LLMs) with human
preferences, especially in reinforcement learning from human feedback (RLHF). However …

CopyBench: Measuring literal and non-literal reproduction of copyright-protected text in language model generation

T Chen, A Asai, N Mireshghallah, S Min… - arXiv preprint arXiv …, 2024 - arxiv.org
Evaluating the degree of reproduction of copyright-protected content by language models
(LMs) is of significant interest to the AI and legal communities. Although both literal and non …

Learning to refine with fine-grained natural language feedback

M Wadhwa, X Zhao, JJ Li, G Durrett - arXiv preprint arXiv:2407.02397, 2024 - arxiv.org
Recent work has explored the capability of large language models (LLMs) to identify and
correct errors in LLM-generated responses. These refinement approaches frequently …