Take Care of Your Prompt Bias! Investigating and Mitigating Prompt Bias in Factual Knowledge Extraction
Recent research shows that pre-trained language models (PLMs) suffer from "prompt bias"
in factual knowledge extraction, i.e., prompts tend to introduce biases toward specific labels …
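The snippet describes prompts pushing the model toward certain labels regardless of the input. A minimal way to observe this effect, shown below as an illustrative probe rather than the paper's own method, is to feed a content-free subject into a relational cloze prompt and inspect the masked-token distribution; the model name and template are assumptions.

```python
# Illustrative probe for prompt bias (not the paper's method): feed a
# content-free placeholder subject into a relational cloze prompt and see
# which objects the masked LM prefers regardless of any real input.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_name = "bert-base-cased"  # any masked LM works for this sketch
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name).eval()

# "N/A" stands in for a real subject; a strong preference here reflects
# bias carried by the prompt itself rather than factual knowledge.
prompt = f"N/A was born in {tokenizer.mask_token} ."
inputs = tokenizer(prompt, return_tensors="pt")
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]

with torch.no_grad():
    probs = model(**inputs).logits[0, mask_pos].squeeze(0).softmax(-1)

top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.convert_ids_to_tokens(int(idx)):>12}  {p.item():.3f}")
```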
Panda: Prompt transfer meets knowledge distillation for efficient model adaptation
Prompt Transfer (PoT) is a recently proposed approach to improve prompt-tuning by
initializing the target prompt with an existing prompt trained on similar source tasks …
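The mechanism described above, initializing the target task's soft prompt from a prompt trained on a source task, can be sketched roughly as follows; the checkpoint name, prompt length, and hidden size are placeholders, and the knowledge-distillation component of PANDA is not shown.

```python
# Hedged sketch of prompt transfer: initialize the target task's soft prompt
# from a prompt already trained on a similar source task, then tune it.
# File names and dimensions here are placeholders, not from the paper.
import torch
import torch.nn as nn

prompt_len, hidden = 20, 768

class SoftPrompt(nn.Module):
    """Trainable prompt embeddings prepended to the frozen PLM's input."""
    def __init__(self, prompt_len: int, hidden: int):
        super().__init__()
        self.embed = nn.Parameter(torch.randn(prompt_len, hidden) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        batch = input_embeds.size(0)
        prompt = self.embed.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)

target_prompt = SoftPrompt(prompt_len, hidden)

# Prompt transfer: copy the source-task prompt instead of random init.
source_state = torch.load("source_prompt.pt")   # hypothetical checkpoint
target_prompt.load_state_dict(source_state)

# Only the prompt is updated during target-task tuning; the PLM stays frozen.
optimizer = torch.optim.AdamW(target_prompt.parameters(), lr=1e-3)
```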
Prompt-learning for cross-lingual relation extraction
Relation Extraction (RE) is a crucial Information Extraction task that entails predicting
relationships between entities within a given sentence. However, extending pre-trained RE …
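For context, prompt-learning for RE typically casts the task as a cloze query answered by label words; the template and verbalizer below are assumptions for illustration, not the paper's cross-lingual design.

```python
# Illustrative cloze-style RE prompt (template and label words are
# assumptions for this sketch, not the paper's cross-lingual design).
def build_re_prompt(sentence: str, head: str, tail: str, mask: str = "[MASK]") -> str:
    """Wrap a sentence so a masked LM fills in a relation label word."""
    return f"{sentence} The relation between {head} and {tail} is {mask}."

# Verbalizer mapping relation labels to label words the LM can predict.
verbalizer = {"org:founded_by": "founder", "per:city_of_birth": "birthplace"}

print(build_re_prompt("Steve Jobs co-founded Apple in 1976.", "Apple", "Steve Jobs"))
```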
Divide, conquer, and combine: Mixture of semantic-independent experts for zero-shot dialogue state tracking
Zero-shot transfer learning for Dialogue State Tracking (DST) helps to handle a variety of
task-oriented dialogue domains without the cost of collecting in-domain data. Existing works …
Zero-shot sharpness-aware quantization for pre-trained language models
Quantization is a promising approach for reducing memory overhead and accelerating
inference, especially in large pre-trained language model (PLM) scenarios. While having no …
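As background on the quantization step itself, a generic symmetric per-tensor scheme looks roughly like the sketch below; the zero-shot and sharpness-aware parts of the paper are not reproduced here.

```python
# Generic symmetric per-tensor weight quantization (background sketch; the
# zero-shot, sharpness-aware parts of the paper are not reproduced here).
import torch

def quantize_symmetric(w: torch.Tensor, num_bits: int = 8):
    """Round float weights to a signed integer grid and return the dequantized copy."""
    qmax = 2 ** (num_bits - 1) - 1           # e.g. 127 for int8
    scale = w.abs().max() / qmax             # per-tensor scale
    w_q = torch.clamp(torch.round(w / scale), -qmax, qmax) * scale
    return w_q, scale

w = torch.randn(768, 768)
w_q, scale = quantize_symmetric(w)
print("max abs quantization error:", (w - w_q).abs().max().item())
```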
Diversifying the mixture-of-experts representation for language models with orthogonal optimizer
The Mixture of Experts (MoE) has emerged as a highly successful technique in deep
learning, based on the principle of divide-and-conquer to maximize model capacity without …
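The divide-and-conquer principle can be illustrated with a minimal softly gated MoE layer; the layer below is a generic sketch and does not include the paper's orthogonal optimizer.

```python
# Minimal MoE layer with a learned gate over expert FFNs (a generic sketch
# of divide-and-conquer; the orthogonal optimizer itself is not shown).
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    def __init__(self, hidden: int, num_experts: int = 4):
        super().__init__()
        self.gate = nn.Linear(hidden, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden, 4 * hidden), nn.GELU(), nn.Linear(4 * hidden, hidden))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, hidden); each token gets a soft mixture of experts.
        weights = self.gate(x).softmax(dim=-1)                       # (B, S, E)
        outputs = torch.stack([e(x) for e in self.experts], dim=-1)  # (B, S, H, E)
        return (outputs * weights.unsqueeze(2)).sum(dim=-1)

layer = MoELayer(hidden=256)
print(layer(torch.randn(2, 10, 256)).shape)  # torch.Size([2, 10, 256])
```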
Bag of tricks for effective language model pretraining and downstream adaptation: A case study on GLUE
This technical report briefly describes our JDExplore d-team's submission Vega v1 on the
General Language Understanding Evaluation (GLUE) leaderboard, where GLUE is a …
JSI and WüNLP at the DIALECT-COPA Shared Task: In-Context Learning From Just a Few Dialectal Examples Gets You Quite Far
The paper presents the JSI and WüNLP systems submitted to the DIALECT-COPA shared
task on causal commonsense reasoning in dialectal texts. Jointly, we compare LLM-based …
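In-context learning from a few dialectal examples amounts to packing labeled instances into the prompt ahead of the query; the template below is a generic COPA-style sketch, not the exact format used by the submitted systems.

```python
# Generic few-shot COPA-style prompt assembly (the format is an assumption,
# not the exact template of the submitted systems).
def build_copa_prompt(examples: list[dict], query: dict) -> str:
    def block(ex: dict, answer: str = "") -> str:
        return (f"Premise: {ex['premise']}\n"
                f"What was the {ex['question']}?\n"   # question is 'cause' or 'effect'
                f"1) {ex['choice1']}\n2) {ex['choice2']}\n"
                f"Answer: {answer}")
    shots = [block(ex, str(ex["answer"])) for ex in examples]
    return "\n\n".join(shots + [block(query)])
```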
MPMoE: Memory Efficient MoE for Pre-trained Models with Adaptive Pipeline Parallelism
In recent years, the Mixture-of-Experts (MoE) technique has gained widespread popularity
as a means to scale pre-trained models to exceptionally large sizes. Dynamic activation of …
PAD-net: An efficient framework for dynamic networks
Dynamic networks, e.g., Dynamic Convolution (DY-Conv) and the Mixture of Experts (MoE),
have been extensively explored as they can considerably improve the model's …
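As a reference point for what "dynamic" means here, a DY-Conv-style layer mixes several parallel kernels with input-dependent attention weights; the sketch below is generic and does not show PAD-Net's partitioning of dynamic and static parameters.

```python
# Generic DY-Conv-style layer: K parallel kernels mixed per input by learned
# attention weights (PAD-Net's dynamic/static partitioning is not shown).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv2d(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, k: int = 3, num_kernels: int = 4):
        super().__init__()
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(in_ch, num_kernels)
        )
        self.weight = nn.Parameter(torch.randn(num_kernels, out_ch, in_ch, k, k) * 0.02)
        self.padding = k // 2

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        pi = self.attn(x).softmax(dim=-1)                     # (B, K) kernel weights per sample
        w = torch.einsum("bk,koihw->boihw", pi, self.weight)  # aggregated kernel per sample
        # Apply each sample's own kernel (clear, though not the fastest path).
        return torch.cat([F.conv2d(x[i:i + 1], w[i], padding=self.padding)
                          for i in range(x.size(0))])

layer = DynamicConv2d(16, 32)
print(layer(torch.randn(2, 16, 8, 8)).shape)  # torch.Size([2, 32, 8, 8])
```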