A systematic survey and critical review on evaluating large language models: Challenges, limitations, and recommendations

MTR Laskar, S Alqahtani, MS Bari… - Proceedings of the …, 2024 - aclanthology.org
Large Language Models (LLMs) have recently gained significant attention due to
their remarkable capabilities in performing diverse tasks across various domains. However …

A systematic study and comprehensive evaluation of ChatGPT on benchmark datasets

MTR Laskar, MS Bari, M Rahman… - arXiv preprint arXiv …, 2023 - arxiv.org
The development of large language models (LLMs) such as ChatGPT has brought a lot of
attention recently. However, their evaluation in the benchmark academic datasets remains …

Utilizing BERT for Information Retrieval: Survey, Applications, Resources, and Challenges

J Wang, JX Huang, X Tu, J Wang, AJ Huang… - ACM Computing …, 2024 - dl.acm.org
Recent years have witnessed a substantial increase in the use of deep learning to solve
various natural language processing (NLP) problems. Early deep learning models were …

BioBART: Pretraining and evaluation of a biomedical generative language model

H Yuan, Z Yuan, R Gan, J Zhang, Y Xie… - arXiv preprint arXiv …, 2022 - arxiv.org
Pretrained language models have served as important backbones for natural language
processing. Recently, in-domain pretraining has been shown to benefit various domain …

A comprehensive evaluation of large language models on benchmark biomedical text processing tasks

I Jahan, MTR Laskar, C Peng, JX Huang - Computers in biology and …, 2024 - Elsevier
Recently, Large Language Models (LLMs) have demonstrated impressive
capability to solve a wide range of tasks. However, despite their success across various …

DEPTWEET: A typology for social media texts to detect depression severities

M Kabir, T Ahmed, MB Hasan, MTR Laskar… - Computers in Human …, 2023 - Elsevier
Mental health research through data-driven methods has been hindered by a lack of
standard typology and scarcity of adequate data. In this study, we leverage the clinical …

Evaluation of ChatGPT on biomedical tasks: A zero-shot comparison with fine-tuned generative transformers

I Jahan, MTR Laskar, C Peng, J Huang - arXiv preprint arXiv:2306.04504, 2023 - arxiv.org
ChatGPT is a large language model developed by OpenAI. Despite its impressive
performance across various tasks, no prior work has investigated its capability in the …

Building real-world meeting summarization systems using large language models: A practical perspective

MTR Laskar, XY Fu, C Chen, SB Tn - arXiv preprint arXiv:2310.19233, 2023 - arxiv.org
This paper studies how to effectively build meeting summarization systems for real-world
usage using large language models (LLMs). For this purpose, we conduct an extensive …

Unsupervised domain adaptation via progressive positioning of target-class prototypes

Y Du, Y Zhou, Y Xie, D Zhou, J Shi, Y Lei - Knowledge-Based Systems, 2023 - Elsevier
Domain adaptation transfers knowledge from the source domain to the target
domain. The existing methods reduce the domain discrepancy by aligning domain …

ChartSumm: A comprehensive benchmark for automatic chart summarization of long and short summaries

R Rahman, R Hasan, AA Farhad, MTR Laskar… - arXiv preprint arXiv …, 2023 - arxiv.org
Automatic chart to text summarization is an effective tool for the visually impaired people
along with providing precise insights of tabular data in natural language to the user. A large …