A systematic survey and critical review on evaluating large language models: Challenges, limitations, and recommendations
Large Language Models (LLMs) have recently gained significant attention due to
their remarkable capabilities in performing diverse tasks across various domains. However …
A systematic study and comprehensive evaluation of ChatGPT on benchmark datasets
The development of large language models (LLMs) such as ChatGPT has brought a lot of
attention recently. However, their evaluation in the benchmark academic datasets remains …
Utilizing BERT for Information Retrieval: Survey, Applications, Resources, and Challenges
Recent years have witnessed a substantial increase in the use of deep learning to solve
various natural language processing (NLP) problems. Early deep learning models were …
BioBART: Pretraining and evaluation of a biomedical generative language model
Pretrained language models have served as important backbones for natural language
processing. Recently, in-domain pretraining has been shown to benefit various domain …
A comprehensive evaluation of large language models on benchmark biomedical text processing tasks
Recently, Large Language Models (LLMs) have demonstrated impressive
capability to solve a wide range of tasks. However, despite their success across various …
DEPTWEET: A typology for social media texts to detect depression severities
Mental health research through data-driven methods has been hindered by a lack of
standard typology and scarcity of adequate data. In this study, we leverage the clinical …
Evaluation of ChatGPT on biomedical tasks: A zero-shot comparison with fine-tuned generative transformers
ChatGPT is a large language model developed by OpenAI. Despite its impressive
performance across various tasks, no prior work has investigated its capability in the …
Building real-world meeting summarization systems using large language models: A practical perspective
This paper studies how to effectively build meeting summarization systems for real-world
usage using large language models (LLMs). For this purpose, we conduct an extensive …
Unsupervised domain adaptation via progressive positioning of target-class prototypes
Domain adaptation transfers knowledge from the source domain to the target
domain. The existing methods reduce the domain discrepancy by aligning domain …
ChartSumm: A comprehensive benchmark for automatic chart summarization of long and short summaries
Automatic chart to text summarization is an effective tool for the visually impaired people
along with providing precise insights of tabular data in natural language to the user. A large …