A survey on large language models for software engineering

Q Zhang, C Fang, Y Xie, Y Zhang, Y Yang… - arxiv preprint arxiv …, 2023 - arxiv.org
Software Engineering (SE) is the systematic design, development, maintenance, and
management of software applications underpinning the digital infrastructure of our modern …

Smarter, better, faster, longer: A modern bidirectional encoder for fast, memory efficient, and long context finetuning and inference

B Warner, A Chaffin, B Clavié, O Weller… - arxiv preprint arxiv …, 2024 - arxiv.org
Encoder-only transformer models such as BERT offer a great performance-size tradeoff for
retrieval and classification tasks with respect to larger decoder-only models. Despite being …

O1 Embedder: Let Retrievers Think Before Action

R Yan, Z Liu, D Lian - arxiv preprint arxiv:2502.07555, 2025 - arxiv.org
The growing power of large language models (LLMs) has revolutionized how people access
and utilize information. Notably, the LLMs excel at performing fine-grained data …

CoRNStack: High-Quality Contrastive Data for Better Code Ranking

T Suresh, RG Reddy, Y Xu, Z Nussbaum… - arxiv preprint arxiv …, 2024 - arxiv.org
Effective code retrieval plays a crucial role in advancing code generation, bug fixing, and
software maintenance, particularly as software systems increase in complexity. While current …

A Deep Dive Into Large Language Model Code Generation Mistakes: What and Why?

QH Chen, J Li, J Deng, J Yu, JTJ Chen… - arxiv preprint arxiv …, 2024 - arxiv.org
Recent advancements in Large Language Models (LLMs) have led to their widespread
application in automated code generation. However, these models can still generate …

CoRNStack: High-Quality Contrastive Data for Better Code Retrieval and Reranking

T Suresh, RG Reddy, Y Xu, Z Nussbaum… - … Conference on Learning … - openreview.net
Effective code retrieval plays a crucial role in advancing code generation, bug fixing, and
software maintenance, particularly as software systems increase in complexity. While current …