Take Care of Your Prompt Bias! Investigating and Mitigating Prompt Bias in Factual Knowledge Extraction

Z Xu, K Peng, L Ding, D Tao, X Lu - arXiv preprint arXiv:2403.09963, 2024 - arxiv.org
Recent research shows that pre-trained language models (PLMs) suffer from "prompt bias"
in factual knowledge extraction, i.e., prompts tend to introduce biases toward specific labels …
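
The snippet points at label bias introduced by the prompt itself. For context, here is a minimal, hypothetical sketch (not the paper's method) of one common debiasing idea: estimate the prompt's label prior from a content-free probe and divide it out of the prediction before renormalizing.

```python
import numpy as np

def calibrate(label_probs: np.ndarray, bias_probs: np.ndarray) -> np.ndarray:
    """Divide out the prompt's prior over labels and renormalize.

    label_probs: model's label distribution for a real query under the prompt.
    bias_probs:  model's label distribution for a content-free probe
                 (e.g. the subject replaced by "N/A") under the same prompt.
    """
    debiased = label_probs / (bias_probs + 1e-12)
    return debiased / debiased.sum()

# Toy example: the prompt skews heavily toward label 0 regardless of input.
bias = np.array([0.7, 0.2, 0.1])   # distribution on the content-free probe
pred = np.array([0.5, 0.4, 0.1])   # distribution on a real query
print(calibrate(pred, bias))       # label 1 dominates after debiasing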

Panda: Prompt transfer meets knowledge distillation for efficient model adaptation

Q Zhong, L Ding, J Liu, B Du… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Prompt Transfer (PoT) is a recently proposed approach to improve prompt-tuning by
initializing the target prompt with the existing prompt trained on similar source tasks …
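
For context, a minimal PyTorch sketch of the prompt-transfer idea the snippet describes: the target task's soft prompt is initialized from a prompt already trained on a similar source task rather than from scratch. The `SoftPrompt` class and all shapes are illustrative assumptions; PANDA's knowledge-distillation component is omitted.

```python
from typing import Optional
import torch
import torch.nn as nn

PROMPT_LEN, HIDDEN = 20, 768

class SoftPrompt(nn.Module):
    """A trainable sequence of prompt embeddings prepended to the input."""
    def __init__(self, init: Optional[torch.Tensor] = None):
        super().__init__()
        weight = init.clone() if init is not None else torch.randn(PROMPT_LEN, HIDDEN) * 0.02
        self.embeddings = nn.Parameter(weight)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # Prepend the soft prompt to every sequence in the batch.
        batch = input_embeds.size(0)
        prompt = self.embeddings.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)

# Prompt transfer: reuse a prompt trained on a similar source task
# instead of a random initialization for the target task.
source_prompt = SoftPrompt()                      # stands in for a trained source prompt
target_prompt = SoftPrompt(init=source_prompt.embeddings.detach())

x = torch.randn(4, 16, HIDDEN)                    # dummy token embeddings
print(target_prompt(x).shape)                     # torch.Size([4, 36, 768])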

Prompt-learning for cross-lingual relation extraction

C Hsu, C Zan, L Ding, L Wang, X Wang… - … Joint Conference on …, 2023 - ieeexplore.ieee.org
Relation Extraction (RE) is a crucial Information Extraction task that entails predicting
relationships between entities within a given sentence. However, extending pre-trained RE …
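
To illustrate the prompt-learning framing the snippet refers to, here is a hypothetical cloze-style template that casts RE as masked-token prediction; the template wording and verbalizers are assumptions for illustration, not the paper's.

```python
# Hypothetical verbalizers: relation labels mapped to the words a masked
# language model would be scored on at the [MASK] position.
RELATION_VERBALIZERS = {
    "founded": "founded",
    "born_in": "born in",
    "works_for": "works for",
}

def build_re_prompt(sentence: str, head: str, tail: str, mask_token: str = "[MASK]") -> str:
    # The template turns RE into the fill-in-the-blank task the PLM was pretrained on.
    return f"{sentence} In this sentence, {head} {mask_token} {tail}."

print(build_re_prompt("Steve Jobs started Apple in 1976.", "Steve Jobs", "Apple"))
# -> "Steve Jobs started Apple in 1976. In this sentence, Steve Jobs [MASK] Apple."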

Divide, conquer, and combine: Mixture of semantic-independent experts for zero-shot dialogue state tracking

Q Wang, L Ding, Y Cao, Y Zhan, Z Lin, S Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Zero-shot transfer learning for Dialogue State Tracking (DST) helps to handle a variety of
task-oriented dialogue domains without the cost of collecting in-domain data. Existing works …

Zero-shot sharpness-aware quantization for pre-trained language models

M Zhu, Q Zhong, L Shen, L Ding, J Liu, B Du… - arXiv preprint arXiv …, 2023 - arxiv.org
Quantization is a promising approach for reducing memory overhead and accelerating
inference, especially in large pre-trained language model (PLM) scenarios. While having no …
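
As background for the snippet, a minimal sketch of plain symmetric int8 post-training quantization; the paper's zero-shot, sharpness-aware procedure is not reproduced here.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: w ≈ scale * q, with q in [-127, 127]."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
print("max abs reconstruction error:", np.abs(w - dequantize(q, s)).max())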

Diversifying the mixture-of-experts representation for language models with orthogonal optimizer

B Liu, L Ding, L Shen, K Peng, Y Cao, D Cheng… - arXiv preprint arXiv …, 2023 - arxiv.org
The Mixture of Experts (MoE) has emerged as a highly successful technique in deep
learning, based on the principle of divide-and-conquer to maximize model capacity without …
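
A rough sketch of the general idea of keeping MoE experts diverse via an orthogonality penalty on their flattened weights; this is a generic regularizer for illustration, not the paper's orthogonal optimizer.

```python
import torch
import torch.nn.functional as F

def expert_orthogonality_penalty(expert_weights: list) -> torch.Tensor:
    """Penalize overlap between flattened expert weights so experts stay diverse.

    A generic regularizer added to the task loss; the paper's actual
    orthogonal optimizer may differ.
    """
    flat = torch.stack([w.flatten() for w in expert_weights])   # (E, D)
    flat = F.normalize(flat, dim=1)
    gram = flat @ flat.T                                        # (E, E) cosine similarities
    off_diag = gram - torch.eye(gram.size(0))
    return (off_diag ** 2).sum()

experts = [torch.randn(64, 64, requires_grad=True) for _ in range(4)]
penalty = expert_orthogonality_penalty(experts)
penalty.backward()                                              # gradients flow to each expert
print(float(penalty))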

Bag of tricks for effective language model pretraining and downstream adaptation: A case study on GLUE

Q Zhong, L Ding, K Peng, J Liu, B Du, L Shen… - arXiv preprint arXiv …, 2023 - arxiv.org
This technical report briefly describes our JDExplore d-team's submission Vega v1 on the
General Language Understanding Evaluation (GLUE) leaderboard, where GLUE is a …

JSI and WüNLP at the DIALECT-COPA Shared Task: In-Context Learning From Just a Few Dialectal Examples Gets You Quite Far

N Ljubešić, T Kuzman, P Rupnik, I Vulić… - Proceedings of the …, 2024 - aclanthology.org
The paper presents the JSI and WüNLP systems submitted to the DIALECT-COPA shared
task on causal commonsense reasoning in dialectal texts. Jointly, we compare LLM-based …
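
To illustrate the in-context-learning setup the snippet mentions, a hypothetical few-shot COPA-style prompt builder; the demonstration data and template wording are invented for illustration.

```python
# Hypothetical few-shot prompt construction for COPA-style causal reasoning:
# each demonstration shows a premise, two alternatives, and the correct choice,
# and the model is asked to continue the pattern for a new instance.
def build_copa_prompt(demos, premise, choice1, choice2, question):
    lines = []
    for d in demos:
        lines.append(
            f"Premise: {d['premise']}\nQuestion: what was the {d['question']}?\n"
            f"A: {d['choice1']}\nB: {d['choice2']}\nAnswer: {d['answer']}\n"
        )
    lines.append(
        f"Premise: {premise}\nQuestion: what was the {question}?\n"
        f"A: {choice1}\nB: {choice2}\nAnswer:"
    )
    return "\n".join(lines)

demos = [{
    "premise": "The man lost his balance on the ladder.",
    "question": "effect",
    "choice1": "He fell off the ladder.",
    "choice2": "He climbed up the ladder.",
    "answer": "A",
}]
print(build_copa_prompt(demos, "My body cast a shadow over the grass.",
                        "The sun was rising.", "The grass was cut.", "cause"))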

MPMoE: Memory Efficient MoE for Pre-trained Models with Adaptive Pipeline Parallelism

Z Zhang, Y Xia, H Wang, D Yang, C Hu… - … on Parallel and …, 2024 - ieeexplore.ieee.org
In recent years, the Mixture-of-Experts (MoE) technique has gained widespread popularity
as a means to scale pre-trained models to exceptionally large sizes. Dynamic activation of …
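
For context on the dynamic activation of experts the snippet mentions, a minimal top-k gated MoE layer in PyTorch; MPMoE's memory and pipeline-parallelism optimizations are not modeled, and all sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal top-k gated MoE layer: each token is routed to k of E experts."""
    def __init__(self, dim=64, num_experts=4, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))

    def forward(self, x):                                   # x: (tokens, dim)
        scores = self.gate(x)                               # (tokens, E)
        topk_val, topk_idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(topk_val, dim=-1)               # renormalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):                          # only the selected experts run
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

print(TopKMoE()(torch.randn(8, 64)).shape)                  # torch.Size([8, 64])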

PAD-net: An efficient framework for dynamic networks

S He, L Ding, D Dong, B Liu, F Yu, D Tao - arXiv preprint arXiv:2211.05528, 2022 - arxiv.org
Dynamic networks, e.g., Dynamic Convolution (DY-Conv) and the Mixture of Experts (MoE),
have been extensively explored as they can considerably improve the model's …
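
As background on the dynamic networks named in the snippet, a minimal dynamic-convolution sketch in which K static kernels are mixed per input by a small attention head; the layer layout and hyperparameters are illustrative assumptions, not PAD-net itself.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv2d(nn.Module):
    """Minimal dynamic convolution: K parallel kernels are mixed per input
    via attention over a pooled feature, then applied as a single conv."""
    def __init__(self, in_ch=3, out_ch=8, k_size=3, num_kernels=4):
        super().__init__()
        self.kernels = nn.Parameter(torch.randn(num_kernels, out_ch, in_ch, k_size, k_size) * 0.02)
        self.attn = nn.Linear(in_ch, num_kernels)
        self.padding = k_size // 2

    def forward(self, x):                                    # x: (B, C, H, W)
        pooled = x.mean(dim=(2, 3))                          # (B, C) global average pool
        alpha = F.softmax(self.attn(pooled), dim=-1)         # (B, K) per-input kernel weights
        # Mix the K static kernels into one input-dependent kernel per sample.
        mixed = torch.einsum("bk,koihw->boihw", alpha, self.kernels)
        outs = [F.conv2d(x[i:i + 1], mixed[i], padding=self.padding) for i in range(x.size(0))]
        return torch.cat(outs, dim=0)

print(DynamicConv2d()(torch.randn(2, 3, 16, 16)).shape)      # torch.Size([2, 8, 16, 16])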