Scaling LLM inference with optimized sample compute allocation

K Zhang, S Zhou, D Wang, WY Wang, L Li - arXiv preprint arXiv …, 2024 - arxiv.org
Sampling is a basic operation in many inference-time algorithms of large language models
(LLMs). To scale up inference efficiently with limited compute, it is crucial to find an optimal …

LLM-AutoDA: Large Language Model-Driven Automatic Data Augmentation for Long-tailed Problems

P Wang, Z Zhao, HB Wen, F Wang… - Advances in …, 2025 - proceedings.neurips.cc
The long-tailed distribution is the underlying nature of real-world data, and it presents
unprecedented challenges for training deep learning models. Existing long-tailed learning …

The inherent predisposition of popular LLM services: Analysis of classification bias in GPT-4o mini, Mistral NeMo and Gemini 1.5 Flash

C De Nadai - 2024 - diva-portal.org
Abstract: LLMs (Large Language Models) are today the most popular form of neural network
for generating and classifying text. These models are used in everything from chat systems …