A survey of neural code intelligence: Paradigms, advances and beyond

Q Sun, Z Chen, F Xu, K Cheng, C Ma, Z Yin… - arXiv preprint arXiv …, 2024 - arxiv.org
Neural Code Intelligence--leveraging deep learning to understand, generate, and optimize
code--holds immense potential for transformative impacts on the whole society. Bridging the …

Evaluating and aligning CodeLLMs on human preference

J Yang, J Yang, K Jin, Y Miao, L Zhang, L Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
Code large language models (codeLLMs) have made significant strides in code generation.
Most previous code-related benchmarks, which consist of various programming exercises …

VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation

Z Luo, H Wu, D Li, J Ma, M Kankanhalli, J Li - arXiv preprint arXiv …, 2024 - arxiv.org
Large multimodal models (LMMs) with advanced video analysis capabilities have recently
garnered significant attention. However, most evaluations rely on traditional methods like …

CITYWALK: Enhancing LLM-Based C++ Unit Test Generation via Project-Dependency Awareness and Language-Specific Knowledge

Y Zhang, Q Lu, K Liu, W Dou, J Zhu, L Qian… - arXiv preprint arXiv …, 2025 - arxiv.org
Unit testing plays a pivotal role in the software development lifecycle, as it ensures code
quality. However, writing high-quality unit tests remains a time-consuming task for …

One-to-many testing for code generation from (just) natural language

M Uniyal, M Singh, G Verbruggen… - Findings of the …, 2024 - aclanthology.org
MBPP is a popular dataset for evaluating the task of code generation from natural language.
Despite its popularity, there are three problems: (1) it relies on providing test cases to …