From generation to judgment: Opportunities and challenges of LLM-as-a-judge

D Li, B Jiang, L Huang, A Beigi, C Zhao, Z Tan… - arXiv preprint arXiv …, 2024 - arxiv.org
Assessment and evaluation have long been critical challenges in artificial intelligence (AI)
and natural language processing (NLP). However, traditional methods, whether matching …

Large language models for data annotation and synthesis: A survey

Z Tan, D Li, S Wang, A Beigi, B Jiang… - arXiv preprint arXiv …, 2024 - arxiv.org
Data annotation and synthesis generally refer to the labeling or generating of raw data with
relevant information, which could be used for improving the efficacy of machine learning …

Can LLMs Learn from Previous Mistakes? Investigating LLMs' Errors to Boost for Reasoning

Y Tong, D Li, S Wang, Y Wang, F Teng… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent works have shown the benefits to LLMs of fine-tuning on gold-standard Chain-of-Thought
(CoT) rationales or using them as correct examples in few-shot prompting. While …

Weak-to-strong reasoning

Y Yang, Y Ma, P Liu - arXiv preprint arXiv:2407.13647, 2024 - arxiv.org
When large language models (LLMs) exceed human-level capabilities, it becomes
increasingly challenging to provide full-scale and accurate supervision for these models …

Language Model Preference Evaluation with Multiple Weak Evaluators

Z Hu, J Zhang, Z Xiong, A Ratner, H Xiong… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite the remarkable success of Large Language Models (LLMs), evaluating their outputs'
quality regarding preference remains a critical challenge. Existing works usually leverage a …