Yancheng He
Alibaba Group
Verified email at taobao.com
Title
Cited by
Year
MT-Bench-101: A fine-grained benchmark for evaluating large language models in multi-turn dialogues
G Bai, J Liu, X Bu, Y He, J Liu, Z Zhou, Z Lin, W Su, T Ge, B Zheng, ...
arXiv preprint arXiv:2402.14762, 2024
53 citations, 2024
GraphReader: Building graph-based agent to enhance long-context abilities of large language models
S Li, Y He, H Guo, X Bu, G Bai, J Liu, J Liu, X Qu, Y Li, W Ouyang, W Su, ...
arXiv preprint arXiv:2406.14550, 2024
18 citations, 2024
Using auxiliary tasks in multimodal fusion of wav2vec 2.0 and BERT for multimodal emotion recognition
D Sun, Y He, J Han
ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2023
17 citations, 2023
Chinese SimpleQA: A Chinese factuality evaluation for large language models
Y He, S Li, J Liu, Y Tan, W Wang, H Huang, X Bu, H Guo, C Hu, B Zheng, ...
arXiv preprint arXiv:2411.07140, 2024
9 citations, 2024
Aspect-Sentiment-Multiple-Opinion Triplet Extraction
F Wang, Y Li, S Zhong, C Yin, Y He
Natural Language Processing and Chinese Computing: 10th CCF International …, 2021
4 citations, 2021
Token preference optimization with self-calibrated visual-anchored rewards for hallucination mitigation
J Gu, Y Wang, M Cao, P Bu, J Song, Y He, S Li, B Zheng
arXiv preprint arXiv:2412.14487, 2024
2 citations, 2024
MuSC: Improving Complex Instruction Following with Multi-granularity Self-Contrastive Training
H Huang, J Liu, Y He, S Li, B Xu, C Zhu, M Yang, T Zhao
arXiv preprint arXiv:2502.11541, 2025
1 citation, 2025
ChineseEcomQA: A Scalable E-commerce Concept Evaluation Benchmark for Large Language Models
H Chen, K Lv, C Hu, Y Li, Y Yuan, Y He, X Zhang, L Liu, S Liu, W Su, ...
arXiv preprint arXiv:2502.20196, 2025
2025
Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?
Y He, S Li, J Liu, W Wang, X Bu, G Zhang, Z Peng, Z Zhang, W Su, ...
arXiv preprint arXiv:2502.19361, 2025
2025
AIR: Complex Instruction Generation via Automatic Iterative Refinement
W Liu, Y He, H Huang, C Hu, J Liu, S Li, W Su, B Zheng
arXiv preprint arXiv:2502.17787, 2025
2025
CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models
A Zhang, M Dong, J Liu, W Zhang, Y Wang, J Yang, G Zhang, T Liu, ...
arXiv preprint arXiv:2502.16614, 2025
2025
"See the World, Discover Knowledge": A Chinese Factuality Evaluation for Large Vision Language Models
J Gu, Y Wang, P Bu, C Wang, Z Wang, T Song, D Wei, J Yuan, Y Zhao, ...
arXiv preprint arXiv:2502.11718, 2025
2025
Chinese SafetyQA: A Safety Short-form Factuality Benchmark for Large Language Models
Y Tan, B Zheng, B Zheng, K Cao, H Jing, J Wei, J Liu, Y He, W Su, X Zhu, ...
arXiv preprint arXiv:2412.15265, 2024
2024
WiS Platform: Enhancing Evaluation of LLM-Based Multi-Agent Systems Through Game-Based Analysis
C Hu, J Zheng, Y He, H Guo, J Jiang, H Zhu, K Sun, Y Jiang, W Su, ...
arXiv preprint arXiv:2412.03359, 2024
2024
2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision
S Li, Y He, H Huang, X Bu, J Liu, H Guo, W Wang, J Gu, W Su, B Zheng
arXiv preprint arXiv:2410.19720, 2024
2024
HITMI&T at SemEval-2022 Task 4: Investigating Task-Adaptive Pretraining And Attention Mechanism On PCL Detection
Z Liu, Y He, F Zhuang, B Xu
Proceedings of the 16th International Workshop on Semantic Evaluation …, 2022
2022
Articles 1–16