Pruning as a Domain-specific LLM Extractor

N Zhang, Y Liu, X Zhao, W Cheng, R Bao… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have exhibited remarkable proficiency across a wide array
of NLP tasks. However, the escalation in model size also engenders substantial deployment …

MEDVOC: Vocabulary Adaptation for Fine-tuning Pre-trained Language Models on Medical Text Summarization

G Balde, S Roy, M Mondal, N Ganguly - arXiv preprint arXiv:2405.04163, 2024 - arxiv.org
This work presents a dynamic vocabulary adaptation strategy, MEDVOC, for fine-tuning pre-
trained language models (PLMs) like BertSumAbs, BART, and PEGASUS for improved …

Adaptive BPE Tokenization for Enhanced Vocabulary Adaptation in Finetuning Pretrained Language Models

G Balde, S Roy, M Mondal, N Ganguly - arXiv preprint arXiv:2410.03258, 2024 - arxiv.org
In this work, we show a fundamental limitation of vocabulary adaptation approaches that use the Byte-Pair Encoding (BPE) tokenization scheme for fine-tuning pretrained language models …

Self-supervised Segment Contrastive Learning for Medical Document Representation

WA Abro, H Kteich, Z Bouraoui - International Conference on Artificial …, 2024 - Springer
Learning high-quality text embeddings is vital for biomedical topic classification and many
other NLP tasks. Contrastive learning has shown remarkable performance in generating …