Take Care of Your Prompt Bias! Investigating and Mitigating Prompt Bias in Factual Knowledge Extraction
Recent research shows that pre-trained language models (PLMs) suffer from "prompt bias"
in factual knowledge extraction, i.e., prompts tend to introduce biases toward specific labels …
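The snippet describes prompts pushing the model toward certain labels regardless of the input. A minimal way to observe this effect, shown below as an illustrative probe rather than the paper's own method, is to feed a content-free subject into a relational cloze prompt and inspect the masked-token distribution; the model name and template are assumptions.

```python
# Illustrative probe for prompt bias (not the paper's method): feed a
# content-free placeholder subject into a relational cloze prompt and see
# which objects the masked LM prefers regardless of any real input.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_name = "bert-base-cased"  # any masked LM works for this sketch
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name).eval()

# "N/A" stands in for a real subject; a strong preference here reflects
# bias carried by the prompt itself rather than factual knowledge.
prompt = f"N/A was born in {tokenizer.mask_token} ."
inputs = tokenizer(prompt, return_tensors="pt")
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]

with torch.no_grad():
    probs = model(**inputs).logits[0, mask_pos].squeeze(0).softmax(-1)

top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.convert_ids_to_tokens(int(idx)):>12}  {p.item():.3f}")
```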
Panda: Prompt transfer meets knowledge distillation for efficient model adaptation
Prompt Transfer (PoT) is a recently proposed approach to improve prompt-tuning by
initializing the target prompt with an existing prompt trained on similar source tasks …
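The mechanism described above, initializing the target task's soft prompt from a prompt trained on a source task, can be sketched roughly as follows; the checkpoint name, prompt length, and hidden size are placeholders, and the knowledge-distillation component of PANDA is not shown.

```python
# Hedged sketch of prompt transfer: initialize the target task's soft prompt
# from a prompt already trained on a similar source task, then tune it.
# File names and dimensions here are placeholders, not from the paper.
import torch
import torch.nn as nn

prompt_len, hidden = 20, 768

class SoftPrompt(nn.Module):
    """Trainable prompt embeddings prepended to the frozen PLM's input."""
    def __init__(self, prompt_len: int, hidden: int):
        super().__init__()
        self.embed = nn.Parameter(torch.randn(prompt_len, hidden) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        batch = input_embeds.size(0)
        prompt = self.embed.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)

target_prompt = SoftPrompt(prompt_len, hidden)

# Prompt transfer: copy the source-task prompt instead of random init.
source_state = torch.load("source_prompt.pt")   # hypothetical checkpoint
target_prompt.load_state_dict(source_state)

# Only the prompt is updated during target-task tuning; the PLM stays frozen.
optimizer = torch.optim.AdamW(target_prompt.parameters(), lr=1e-3)
```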
Prompt-learning for cross-lingual relation extraction
Relation Extraction (RE) is a crucial Information Extraction task that entails predicting
relationships between entities within a given sentence. However, extending pre-trained RE …
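For context, prompt-learning for RE typically casts the task as a cloze query answered by label words; the template and verbalizer below are assumptions for illustration, not the paper's cross-lingual design.

```python
# Illustrative cloze-style RE prompt (template and label words are
# assumptions for this sketch, not the paper's cross-lingual design).
def build_re_prompt(sentence: str, head: str, tail: str, mask: str = "[MASK]") -> str:
    """Wrap a sentence so a masked LM fills in a relation label word."""
    return f"{sentence} The relation between {head} and {tail} is {mask}."

# Verbalizer mapping relation labels to label words the LM can predict.
verbalizer = {"org:founded_by": "founder", "per:city_of_birth": "birthplace"}

print(build_re_prompt("Steve Jobs co-founded Apple in 1976.", "Apple", "Steve Jobs"))
```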
Divide, conquer, and combine: Mixture of semantic-independent experts for zero-shot dialogue state tracking
Zero-shot transfer learning for Dialogue State Tracking (DST) helps to handle a variety of
task-oriented dialogue domains without the cost of collecting in-domain data. Existing works …
Zero-shot sharpness-aware quantization for pre-trained language models
Quantization is a promising approach for reducing memory overhead and accelerating
inference, especially in large pre-trained language model (PLM) scenarios. While having no …
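As background on the quantization step itself, a generic symmetric per-tensor scheme looks roughly like the sketch below; the zero-shot and sharpness-aware parts of the paper are not reproduced here.

```python
# Generic symmetric per-tensor weight quantization (background sketch; the
# zero-shot, sharpness-aware parts of the paper are not reproduced here).
import torch

def quantize_symmetric(w: torch.Tensor, num_bits: int = 8):
    """Round float weights to a signed integer grid and return the dequantized copy."""
    qmax = 2 ** (num_bits - 1) - 1           # e.g. 127 for int8
    scale = w.abs().max() / qmax             # per-tensor scale
    w_q = torch.clamp(torch.round(w / scale), -qmax, qmax) * scale
    return w_q, scale

w = torch.randn(768, 768)
w_q, scale = quantize_symmetric(w)
print("max abs quantization error:", (w - w_q).abs().max().item())
```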
Diversifying the mixture-of-experts representation for language models with orthogonal optimizer
The Mixture of Experts (MoE) has emerged as a highly successful technique in deep
learning, based on the principle of divide-and-conquer to maximize model capacity without …
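The divide-and-conquer principle can be illustrated with a minimal softly gated MoE layer; the layer below is a generic sketch and does not include the paper's orthogonal optimizer.

```python
# Minimal MoE layer with a learned gate over expert FFNs (a generic sketch
# of divide-and-conquer; the orthogonal optimizer itself is not shown).
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    def __init__(self, hidden: int, num_experts: int = 4):
        super().__init__()
        self.gate = nn.Linear(hidden, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden, 4 * hidden), nn.GELU(), nn.Linear(4 * hidden, hidden))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, hidden); each token gets a soft mixture of experts.
        weights = self.gate(x).softmax(dim=-1)                       # (B, S, E)
        outputs = torch.stack([e(x) for e in self.experts], dim=-1)  # (B, S, H, E)
        return (outputs * weights.unsqueeze(2)).sum(dim=-1)

layer = MoELayer(hidden=256)
print(layer(torch.randn(2, 10, 256)).shape)  # torch.Size([2, 10, 256])
```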
Bag of tricks for effective language model pretraining and downstream adaptation: A case study on GLUE
This technical report briefly describes our JDExplore d-team's submission Vega v1 on the
General Language Understanding Evaluation (GLUE) leaderboard, where GLUE is a …
JSI and WüNLP at the DIALECT-COPA Shared Task: In-Context Learning From Just a Few Dialectal Examples Gets You Quite Far
The paper presents the JSI and WüNLP systems submitted to the DIALECT-COPA shared
task on causal commonsense reasoning in dialectal texts. Jointly, we compare LLM-based …
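In-context learning from a few dialectal examples amounts to packing labeled instances into the prompt ahead of the query; the template below is a generic COPA-style sketch, not the exact format used by the submitted systems.

```python
# Generic few-shot COPA-style prompt assembly (the format is an assumption,
# not the exact template of the submitted systems).
def build_copa_prompt(examples: list[dict], query: dict) -> str:
    def block(ex: dict, answer: str = "") -> str:
        return (f"Premise: {ex['premise']}\n"
                f"What was the {ex['question']}?\n"   # question is 'cause' or 'effect'
                f"1) {ex['choice1']}\n2) {ex['choice2']}\n"
                f"Answer: {answer}")
    shots = [block(ex, str(ex["answer"])) for ex in examples]
    return "\n\n".join(shots + [block(query)])
```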
MPMoE: Memory Efficient MoE for Pre-trained Models with Adaptive Pipeline Parallelism
In recent years, the Mixture-of-Experts (MoE) technique has gained widespread popularity
as a means to scale pre-trained models to exceptionally large sizes. Dynamic activation of …
PAD-net: An efficient framework for dynamic networks
Dynamic networks, e.g., Dynamic Convolution (DY-Conv) and the Mixture of Experts (MoE),
have been extensively explored as they can considerably improve the model's …
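As a reference point for what "dynamic" means here, a DY-Conv-style layer mixes several parallel kernels with input-dependent attention weights; the sketch below is generic and does not show PAD-Net's partitioning of dynamic and static parameters.

```python
# Generic DY-Conv-style layer: K parallel kernels mixed per input by learned
# attention weights (PAD-Net's dynamic/static partitioning is not shown).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv2d(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, k: int = 3, num_kernels: int = 4):
        super().__init__()
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(in_ch, num_kernels)
        )
        self.weight = nn.Parameter(torch.randn(num_kernels, out_ch, in_ch, k, k) * 0.02)
        self.padding = k // 2

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        pi = self.attn(x).softmax(dim=-1)                     # (B, K) kernel weights per sample
        w = torch.einsum("bk,koihw->boihw", pi, self.weight)  # aggregated kernel per sample
        # Apply each sample's own kernel (clear, though not the fastest path).
        return torch.cat([F.conv2d(x[i:i + 1], w[i], padding=self.padding)
                          for i in range(x.size(0))])

layer = DynamicConv2d(16, 32)
print(layer(torch.randn(2, 16, 8, 8)).shape)  # torch.Size([2, 32, 8, 8])
```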