Jiaxin Wen
Verified email at mails.tsinghua.edu.cn
Title | Cited by | Year
Unveiling the implicit toxicity in large language models
J Wen, P Ke, H Sun, Z Zhang, C Li, J Bai, M Huang
arXiv preprint arXiv:2311.17391, 2023
Cited by 59 | 2023
Robustness testing of language understanding in task-oriented dialog
J Liu, R Takanobu, J Wen, D Wan, H Li, W Nie, C Li, W Peng, M Huang
arXiv preprint arXiv:2012.15262, 2020
Cited by 53 | 2020
AugESC: Dialogue augmentation with large language models for emotional support conversation
C Zheng, S Sabour, J Wen, Z Zhang, M Huang
arXiv preprint arXiv:2202.13047, 2022
Cited by 50 | 2022
A chatbot for mental health support: exploring the impact of Emohaa on reducing mental distress in China
S Sabour, W Zhang, X Xiao, Y Zhang, Y Zheng, J Wen, J Zhao, M Huang
Frontiers in Digital Health 5, 1133987, 2023
Cited by 49 | 2023
EVA2.0: Investigating open-domain Chinese dialogue systems with large-scale pre-training
Y Gu, J Wen, H Sun, Y Song, P Ke, C Zheng, Z Zhang, J Yao, L Liu, X Zhu, ...
Machine Intelligence Research 20 (2), 207-219, 2023
Cited by 45 | 2023
Ethicist: Targeted training data extraction through loss smoothed soft prompting and calibrated confidence estimation
Z Zhang, J Wen, M Huang
arXiv preprint arXiv:2307.04401, 2023
Cited by 29 | 2023
Persona-guided planning for controlling the protagonist's persona in story generation
Z Zhang, J Wen, J Guan, M Huang
arXiv preprint arXiv:2204.10703, 2022
Cited by 23 | 2022
AugESC: Large-scale data augmentation for emotional support conversation with pre-trained language models
C Zheng, S Sabour, J Wen, M Huang
arXiv preprint arXiv:2202.13047, 2022
Cited by 17 | 2022
AutoCAD: Automatically generating counterfactuals for mitigating shortcut learning
J Wen, Y Zhu, J Zhang, J Zhou, M Huang
arXiv preprint arXiv:2211.16202, 2022
Cited by 13 | 2022
Language models learn to mislead humans via RLHF
J Wen, R Zhong, A Khan, E Perez, J Steinhardt, M Huang, SR Bowman, ...
arXiv preprint arXiv:2409.12822, 2024
Cited by 12 | 2024
Learning task decomposition to assist humans in competitive programming
J Wen, R Zhong, P Ke, Z Shao, H Wang, M Huang
Proceedings of the 62nd Annual Meeting of the Association for Computational …, 2024
Cited by 4 | 2024
AdaptiveBackdoor: Backdoored language model agents that detect human overseers
H Wang, R Zhong, J Wen, J Steinhardt
ICML 2024 Next Generation of AI Safety Workshop, 2024
Cited by 3 | 2024
Robustness testing of language understanding in dialog systems
J Liu, R Takanobu, J Wen, D Wan, W Nie, H Li, C Li, W Peng, M Huang
CoRR, abs, 2012
Cited by 3 | 2012
CodePlan: Unlocking reasoning potential in large language models by scaling code-form planning
J Wen, J Guan, H Wang, W Wu, M Huang
CoRR, 2024
Cited by 2 | 2024
Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats
J Wen, V Hebbar, C Larson, A Bhatt, A Radhakrishnan, M Sharma, ...
arXiv preprint arXiv:2411.17693, 2024
2024
Unlocking Reasoning Potential in Large Language Models by Scaling Code-form Planning
J Wen, J Guan, H Wang, W Wu, M Huang
arXiv preprint arXiv:2409.12452, 2024
2024
Re3Dial: Retrieve, Reorganize and Rescale Conversations for Long-Turn Open-Domain Dialogue Pre-training
J Wen, H Zhou, J Guan, J Zhou, M Huang
Proceedings of the 2023 Conference on Empirical Methods in Natural Language …, 2023
2023
SmartBackdoor: Malicious Language Model Agents that Avoid Being Caught
H Wang, R Zhong, J Wen, J Steinhardt