A systematic survey and critical review on evaluating large language models: Challenges, limitations, and recommendations

MTR Laskar, S Alqahtani, MS Bari… - Proceedings of the …, 2024 - aclanthology.org
Large Language Models (LLMs) have recently gained significant attention due to
their remarkable capabilities in performing diverse tasks across various domains. However …

A comprehensive evaluation of large language models on benchmark biomedical text processing tasks

I Jahan, MTR Laskar, C Peng, JX Huang - Computers in biology and …, 2024 - Elsevier
Recently, Large Language Models (LLMs) have demonstrated impressive
capability to solve a wide range of tasks. However, despite their success across various …

Investigating hallucinations in pruned large language models for abstractive summarization

G Chrysostomou, Z Zhao, M Williams… - Transactions of the …, 2024 - direct.mit.edu
Despite the remarkable performance of generative large language models (LLMs) on
abstractive summarization, they face two significant challenges: their considerable size and …

Cognitive overload: Jailbreaking large language models with overloaded logical thinking

N Xu, F Wang, B Zhou, BZ Li, C Xiao… - arXiv preprint arXiv …, 2023 - arxiv.org
While large language models (LLMs) have demonstrated increasing power, they have also
given rise to a wide range of harmful behaviors. As representatives, jailbreak attacks can …

CADS: A Systematic Literature Review on the Challenges of Abstractive Dialogue Summarization

F Kirstein, JP Wahle, B Gipp, T Ruas - Journal of Artificial Intelligence …, 2025 - jair.org
Abstractive dialogue summarization is the task of distilling conversations into informative
and concise summaries. Although focused reviews have been conducted on this topic, there …

Akal Badi ya Bias: An Exploratory Study of Gender Bias in Hindi Language Technology

R Hada, S Husain, V Gumma, H Diddee… - The 2024 ACM …, 2024 - dl.acm.org
Existing research in measuring and mitigating gender bias predominantly centers on
English, overlooking the intricate challenges posed by non-English languages and the …

Tiny Titans: Can Smaller Large Language Models Punch Above Their Weight in the Real World for Meeting Summarization?

XY Fu, MTR Laskar, E Khasanova, C Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have demonstrated impressive capabilities to solve a wide
range of tasks without being explicitly fine-tuned on task-specific datasets. However …

What's Wrong? Refining Meeting Summaries with LLM Feedback

F Kirstein, T Ruas, B Gipp - arXiv preprint arXiv:2407.11919, 2024 - arxiv.org
Meeting summarization has become a critical task since digital encounters have become a
common practice. Large language models (LLMs) show great potential in summarization …

Exploring the opportunities of large language models for summarizing palliative care consultations: A pilot comparative study

X Chen, W Zhou, R Hoda, A Li, C Bain… - Digital Health, 2024 - journals.sagepub.com
Recent developments in the field of large language models have showcased
impressive achievements in their ability to perform natural language processing tasks …

TutoAI: a cross-domain framework for AI-assisted mixed-media tutorial creation on physical tasks

Y Chen, VI Morariu, A Truong, Z Liu - … of the CHI Conference on Human …, 2024 - dl.acm.org
Mixed-media tutorials, which integrate videos, images, text, and diagrams to teach
procedural skills, offer more browsable alternatives than timeline-based videos. However …