- Academic Search

A systematic survey and critical review on evaluating large language models: Challenges, limitations, and recommendations

MTR Laskar, S Alqahtani, MS Bari… - Proceedings of the …, 2024 - aclanthology.org

Abstract Large Language Models (LLMs) have recently gained significant attention due to
their remarkable capabilities in performing diverse tasks across various domains. However …

Cited by 16

[PDF] wiley.com

Natural Language Generation for Visualizations: State of the Art, Challenges and Future Directions

E Hoque, MS Islam - Computer Graphics Forum, 2024 - Wiley Online Library

Natural language and visualization are two complementary modalities of human
communication that play a crucial role in conveying information effectively. While …

Cited by 1

[PDF] arxiv.org

Alleviating hallucinations of large language models through induced hallucinations

Y Zhang, L Cui, W Bi, S Shi - ar…

[PDF] researchgate.net

[PDF] Developing a framework for auditing large language models using human-in-the-loop

M Amirizaniani, J Yao, A Lavergne… - arxiv preprint arxiv …, 2024 - researchgate.net

* Work does not relate to position at Amazon. Authors' addresses: Maryam Amirizaniani,
amaryam@ uw. edu, University of Washington, Seattle, WA, USA; Jihan Yao, jihany2@ uw …

Cited by 11

[PDF] arxiv.org

GenAudit: Fixing Factual Errors in Language Model Outputs with Evidence

K Krishna, S Ramprasad, P Gupta, BC Wallace… - arxiv preprint arxiv …, 2024 - arxiv.org

LLMs can generate factually incorrect statements even when provided access to reference
documents. Such errors can be dangerous in high-stakes applications (e.g., document …

Cited by 6

[PDF] arxiv.org

ProVision: Programmatically Scaling Vision-centric Instruction Data for Multimodal Language Models

J Zhang, L Xue, L Song, J Wang, W Huang… - arxiv preprint arxiv …, 2024 - arxiv.org

With the rise of multimodal applications, instruction data has become critical for training
multimodal language models capable of understanding complex image-based queries …

Cited by 1

[PDF] aclanthology.org

An Audit on the Perspectives and Challenges of Hallucinations in NLP

PN Venkit, T Chakravorti, V Gupta… - Proceedings of the …, 2024 - aclanthology.org

We audit how hallucination in large language models (LLMs) is characterized in peer-
reviewed literature, using a critical examination of 103 publications across NLP research …

Cited by 1

A Comparative Analysis of Text-Based Explainable Recommender Systems

A Ariza-Casabona, L Boratto, M Salamó - Proceedings of the 18th ACM …, 2024 - dl.acm.org

One way to increase trust among users towards recommender systems is to provide the
recommendation along with a textual explanation. In the literature, extraction-based …

Cited by 1


Delucionqa: Detecting hallucinations in domain-specific question answering
