On LLMs-Driven Synthetic Data Generation, Curation, and Evaluation: A Survey

L Long, R Wang, R Xiao, J Zhao, X Ding… - arXiv preprint arXiv …, 2024 - arxiv.org
Within the evolving landscape of deep learning, the dilemma of data quantity and quality has
been a long-standing problem. The recent advent of Large Language Models (LLMs) offers …

When to Stop? Towards Efficient Code Generation in LLMs with Excess Token Prevention

L Guo, Y Wang, E Shi, W Zhong, H Zhang… - Proceedings of the 33rd …, 2024 - dl.acm.org
Code generation aims to automatically generate code snippets that meet given natural
language requirements and plays an important role in software development. Although …
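
The title's "excess token prevention" suggests cutting decoding short once a complete snippet has been produced. As a generic, hypothetical illustration only (this is not the paper's actual stopping criterion), one could test whether the accumulated output already parses:

```python
import ast

# Hypothetical early-stopping check: stop decoding once the accumulated
# output already parses as complete Python. A generic sketch, not the
# criterion proposed in the paper.
def is_complete_snippet(text: str) -> bool:
    try:
        ast.parse(text)
        return True
    except SyntaxError:  # also catches IndentationError
        return False

generated = ""
for token in ["def add(a, b):", "\n    return a + b"]:  # stand-in for LLM tokens
    generated += token
    if is_complete_snippet(generated):
        break  # any further tokens would be excess
print(generated)
```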

Supervised Knowledge Makes Large Language Models Better In-Context Learners

L Yang, S Zhang, Z Yu, G Bao, Y Wang, J Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) exhibit emerging in-context learning abilities through
prompt engineering. The recent progress in large-scale generative models has further …
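
As a minimal sketch of what such prompt-engineered in-context learning looks like (the sentiment task and demonstration format are illustrative assumptions, not taken from the paper), few-shot prompting concatenates labeled demonstrations ahead of the query:

```python
# Minimal sketch of few-shot in-context learning via prompt engineering.
# The task, demonstrations, and format are illustrative assumptions.
def build_icl_prompt(demonstrations, query):
    """Concatenate labeled demonstrations with the unlabeled query."""
    lines = []
    for text, label in demonstrations:
        lines.append(f"Review: {text}\nSentiment: {label}")
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

demos = [
    ("The plot was gripping from start to finish.", "positive"),
    ("A dull script and wooden acting.", "negative"),
]
prompt = build_icl_prompt(demos, "Beautiful cinematography but a hollow story.")
print(prompt)  # sent to an LLM, which completes the final label
```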

Hide and Seek in Noise Labels: Noise-Robust Collaborative Active Learning with LLMs-Powered Assistance

B Yuan, Y Chen, Y Zhang, W Jiang - Proceedings of the 62nd …, 2024 - aclanthology.org
Learning from noisy labels (LNL) is a challenge that arises in many real-world scenarios
where collected training data can contain incorrect or corrupted labels. Most existing …

Actively Learn from LLMs with Uncertainty Propagation for Generalized Category Discovery

J Liang, L Liao, H Fei, B Li, J Jiang - Proceedings of the 2024 …, 2024 - aclanthology.org
Generalized category discovery faces a key issue: the lack of supervision for new and
unseen data categories. Traditional methods typically combine supervised pretraining with …

Filling Memory Gaps: Enhancing Continual Semantic Parsing via SQL Syntax Variance-Guided LLMs without Real Data Replay

R Liu, J Zhang, Y Song, Y Zhang, B Yang - arXiv preprint arXiv …, 2024 - arxiv.org
Continual Semantic Parsing (CSP) aims to train parsers to convert natural language
questions into SQL across tasks with limited annotated examples, adapting to the real-world …

MMM: Multilingual Mutual Reinforcement Effect Mix Datasets & Test with Open-domain Information Extraction Large Language Models

C Gan, Q Yin, X He, H Wei, Y Liang, Y Lim… - arXiv preprint arXiv …, 2024 - arxiv.org
The Mutual Reinforcement Effect (MRE) represents a promising avenue in information
extraction and multitasking research. Nevertheless, its applicability has been constrained …

Enhancing Text Classification through LLM-Driven Active Learning and Human Annotation

H Rouzegar, M Makrehchi - arXiv preprint arXiv:2406.12114, 2024 - arxiv.org
In the context of text classification, the financial burden of annotation exercises for creating
training data is a critical issue. Active learning techniques, particularly those rooted in …

Self-Training for Sample-Efficient Active Learning for Text Classification with Pre-Trained Language Models

C Schröder, G Heyer - arXiv preprint arXiv:2406.09206, 2024 - arxiv.org
Active learning is an iterative labeling process used to obtain a small labeled subset
despite the absence of labeled data, thereby enabling the training of a model for supervised tasks …
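
A minimal sketch of the pool-based loop the abstract describes, using uncertainty sampling with a linear classifier (the classifier, features, and toy data are assumptions; the paper's contribution pairs such a loop with self-training on pre-trained language models):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Pool-based active learning with uncertainty sampling: repeatedly train on
# the labeled subset, then query the pool example the model is least sure about.
texts = ["great movie", "terrible plot", "loved it", "boring", "fantastic", "awful"]
labels = [1, 0, 1, 0, 1, 0]           # oracle labels (revealed only when queried)
X = TfidfVectorizer().fit_transform(texts)

labeled = [0, 1]                      # indices whose labels are already known
pool = [i for i in range(len(texts)) if i not in labeled]

for _ in range(2):                    # two labeling rounds
    clf = LogisticRegression().fit(X[labeled], [labels[i] for i in labeled])
    probs = clf.predict_proba(X[pool])
    margins = np.abs(probs[:, 1] - 0.5)     # low margin = high uncertainty
    query = pool[int(np.argmin(margins))]   # most uncertain pool example
    labeled.append(query)                   # oracle provides labels[query]
    pool.remove(query)
```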

Multi-Stage Balanced Distillation: Addressing Long-Tail Challenges in Sequence-Level Knowledge Distillation

Y Zhou, J Zhu, P Xu, X Liu, X Wang, D Koutra… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have significantly advanced various natural language
processing tasks, but deploying them remains computationally expensive. Knowledge …
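
As a heavily simplified sketch of sequence-level distillation in general (the model names and single prompt are illustrative assumptions; the paper's multi-stage, long-tail-balanced procedure is not reproduced here), the student is trained on sequences the teacher generates rather than on gold references:

```python
# Sequence-level knowledge distillation sketch: sample a target sequence
# from the teacher, then take a standard causal-LM gradient step on the
# student using that sequence as the training target.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

teacher = AutoModelForCausalLM.from_pretrained("gpt2-medium")  # illustrative
student = AutoModelForCausalLM.from_pretrained("gpt2")
tok = AutoTokenizer.from_pretrained("gpt2")

prompt = tok("Summarize: the meeting covered Q3 results.", return_tensors="pt")
with torch.no_grad():                       # teacher produces the target sequence
    target = teacher.generate(**prompt, max_new_tokens=32, do_sample=True)

# Causal-LM loss on the teacher's output (labels are shifted internally)
out = student(input_ids=target, labels=target)
out.loss.backward()                         # one distillation gradient step
```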