Where Are Large Language Models for Code Generation on GitHub?

X Yu, L Liu, X Hu, JW Keung, J Liu, X Xia - arXiv preprint arXiv:2406.19544, 2024 - arxiv.org
The increasing use of Large Language Models (LLMs) in software development has
garnered significant attention from researchers assessing the quality of the code they …

AutoGLM: Autonomous Foundation Agents for GUIs

X Liu, B Qin, D Liang, G Dong, H Lai, H Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
We present AutoGLM, a new series in the ChatGLM family, designed to serve as foundation
agents for autonomous control of digital devices through Graphical User Interfaces (GUIs) …

Diagnosing and Remedying Knowledge Deficiencies in LLMs via Label-free Curricular Meaningful Learning

K Xiong, X Ding, L Du, J Ying, T Liu, B Qin… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) are versatile and demonstrate impressive generalization
ability by mining and learning information from extensive unlabeled text. However, they still …

On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective

Y Huang, C Gao, S Wu, H Wang, X Wang… - arXiv preprint arXiv …, 2025 - arxiv.org
Generative Foundation Models (GenFMs) have emerged as transformative tools. However,
their widespread adoption raises critical concerns regarding trustworthiness across …

How Well Do LLMs Generate Code for Different Application Domains? Benchmark and Evaluation

D Zheng, Y Wang, E Shi, H Zhang, Z Zheng - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, an increasing number of AI-driven programming assistants powered by code
LLMs have been integrated into various real-world software development environments …

FVEval: Understanding Language Model Capabilities in Formal Verification of Digital Hardware

M Kang, M Liu, GB Hamad, S Suhaib, H Ren - arXiv preprint arXiv …, 2024 - arxiv.org
The remarkable reasoning and code generation capabilities of large language models
(LLMs) have spurred significant interest in applying LLMs to enable task automation in …

DataSciBench: An LLM Agent Benchmark for Data Science

D Zhang, S Zhoubian, M Cai, F Li, L Yang… - arXiv preprint arXiv …, 2025 - arxiv.org
This paper presents DataSciBench, a comprehensive benchmark for evaluating Large
Language Model (LLM) capabilities in data science. Recent related benchmarks have …

LLMs for Malware Offense and Defense

K Roach - kellyroach.com
This survey paper explores the use of large language models (LLMs) in generating and
defending against malware. The paper summarizes news reports and research papers on …