Large language model alignment: A survey T Shen, R Jin, Y Huang, C Liu, W Dong, Z Guo, X Wu, Y Liu, D Xiong arXiv preprint arXiv:2309.15025, 2023 | 149 | 2023 |
DEPN: Detecting and editing privacy neurons in pretrained language models X Wu, J Li, M Xu, W Dong, S Wu, C Bian, D Xiong Proceedings of the 2023 Conference on Empirical Methods in Natural Language …, 2023 | 70 | 2023 |
Exploring Multilingual Human Value Concepts in Large Language Models: Is Value Alignment Consistent, Transferable and Controllable across Languages? S Xu, W Dong, Z Guo, X Wu, D Xiong Findings of the Association for Computational Linguistics: EMNLP 2024, 1771–1793, 2024 | 10 | 2024 |
FewFedWeight: Few-shot federated learning framework across multiple NLP tasks W Dong, X Wu, J Li, S Wu, C Bian, D Xiong arXiv preprint arXiv:2212.08354, 2022 | 8 | 2022 |
Action unit detection with joint adaptive attention and graph relation C Zhang, J Song, Q Zhang, W Dong, R Ding, Z Liu Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2021 | 6 | 2021 |
IRCAN: Mitigating knowledge conflicts in LLM generation via identifying and reweighting context-aware neurons D Shi, R Jin, T Shen, W Dong, X Wu, D Xiong The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024 | 5 | 2024 |
Swing distillation: A privacy-preserving knowledge distillation framework J Li, X Wu, W Dong, S Wu, C Bian, D Xiong arXiv preprint arXiv:2212.08349, 2022 | 4 | 2022 |
ConTrans: Weak-to-Strong Alignment Engineering via Concept Transplantation W Dong, X Wu, R Jin, S Xu, D Xiong arXiv preprint arXiv:2405.13578, 2024 | 3 | 2024 |
Mitigating privacy seesaw in large language models: Augmented privacy neuron editing via activation patching X Wu, W Dong, S Xu, D Xiong Findings of the Association for Computational Linguistics: ACL 2024, 5319–5332, 2024 | 2 | 2024 |