Google Scholar

A survey on evaluation of large language models

Y Chang, X Wang, J Wang, Y Wu, L Yang… - ACM transactions on …, 2024 - dl.acm.org

Large language models (LLMs) are gaining increasing popularity in both academia and
industry, owing to their unprecedented performance in various applications. As LLMs …

Cited by 2311; 8 versions

[PDF] arxiv.org

Survey on factuality in large language models: Knowledge, retrieval and domain-specificity

C Wang, X Liu, Y Yue, X Tang, T Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org

This survey addresses the crucial issue of factuality in Large Language Models (LLMs). As
LLMs find applications across diverse domains, the reliability and accuracy of their outputs …

Cited by 187; 2 versions

[PDF] utk.edu

[PDF] TrustLLM: Trustworthiness in large language models

L Sun, Y Huang, H Wang, S Wu, Q Zhang… - arXiv preprint arXiv …, 2024 - mosis.eecs.utk.edu

Large language models (LLMs), exemplified by ChatGPT, have gained considerable
attention for their excellent natural language processing capabilities. Nonetheless, these …

Cited by 262; 6 versions

[PDF] arxiv.org

PandaLM: An automatic evaluation benchmark for LLM instruction tuning optimization

Y Wang, Z Yu, Z Zeng, L Yang, C Wang, H Chen… - arXiv preprint arXiv …, 2023 - arxiv.org

Instruction tuning large language models (LLMs) remains a challenging task, owing to the
complexity of hyperparameter selection and the difficulty involved in evaluating the tuned …

Cited by 205; 4 versions

[HTML] mlr.press

[HTML] Position: TrustLLM: Trustworthiness in large language models

Y Huang, L Sun, H Wang, S Wu… - International …, 2024 - proceedings.mlr.press

Large language models (LLMs) have gained considerable attention for their excellent
natural language processing capabilities. Nonetheless, these LLMs present many …

Cited by 61; 11 versions

[PDF] arxiv.org

Does fine-tuning LLMs on new knowledge encourage hallucinations?

Z Gekhman, G Yona, R Aharoni, M Eyal… - arXiv preprint arXiv …, 2024 - arxiv.org

When large language models are aligned via supervised fine-tuning, they may encounter
new factual information that was not acquired through pre-training. It is often conjectured that …

Cited by 79; 4 versions

[PDF] arxiv.org

Investigating the factual knowledge boundary of large language models with retrieval augmentation

R Ren, Y Wang, Y Qu, WX Zhao, J Liu, H Tian… - arXiv preprint arXiv …, 2023 - arxiv.org

Knowledge-intensive tasks (e.g., open-domain question answering (QA)) require a
substantial amount of factual knowledge and often rely on external information for …

Cited by 98; 3 versions

[PDF] aclanthology.org

Leave no document behind: Benchmarking long-context LLMs with extended multi-doc QA

M Wang, L Chen, F Cheng, S Liao… - Proceedings of the …, 2024 - aclanthology.org

Long-context modeling capabilities of Large Language Models (LLMs) have garnered
widespread attention, leading to the emergence of LLMs with ultra-context windows …

Cited by 34; 4 versions

[PDF] mit.edu

Beyond prompt brittleness: Evaluating the reliability and consistency of political worldviews in LLMs

T Ceron, N Falk, A Barić, D Nikolaev… - Transactions of the …, 2024 - direct.mit.edu

Due to the widespread use of large language models (LLMs), we need to understand
whether they embed a specific “worldview” and what these views reflect. Recent studies …

Cited by 12; 6 versions

[PDF] polyu.edu.hk

Unveiling the clinical incapabilities: a benchmarking study of GPT-4V(ision) for ophthalmic multimodal image analysis

P Xu, X Chen, Z Zhao, D Shi - British Journal of Ophthalmology, 2024 - bjo.bmj.com

Purpose: To evaluate the capabilities and incapabilities of a GPT-4V(ision)-based chatbot in
interpreting ocular multimodal images. Methods: We developed a digital ophthalmologist app …

Cited by 17; 6 versions
