Automated design of agentic systems

S Hu, C Lu, J Clune - arXiv preprint arXiv …
… powerful general-purpose agents,
wherein Foundation Models are used as modules within agentic systems (eg Chain-of …

Automating the Search for Artificial Life with Foundation Models

A Kumar, C Lu, L Kirsch, Y Tang, KO Stanley… - arXiv preprint arXiv …, 2024 - arxiv.org
With the recent Nobel Prize awarded for radical advances in protein discovery, foundation
models (FMs) for exploring large combinatorial spaces promise to revolutionize many …

Learning Loss Landscapes in Preference Optimization

C Alfano, S Sapora, JN Foerster, P Rebeschini… - arXiv preprint arXiv …, 2024 - arxiv.org
We present an empirical study investigating how specific properties of preference datasets,
such as mixed-quality or noisy data, affect the performance of Preference Optimization (PO) …

Right Now, Wrong Then: Non-Stationary Direct Preference Optimization under Preference Drift

S Son, W Bankes, SR Chowdhury, B Paige… - arXiv preprint arXiv …, 2024 - arxiv.org
Reinforcement learning from human feedback (RLHF) aligns Large Language Models
(LLMs) with human preferences. However, these preferences can often change over time …

RAGulator: Lightweight Out-of-Context Detectors for Grounded Text Generation

I Poey, J Liu, Q Zhong, A Chenailler - arXiv preprint arXiv:2411.03920, 2024 - arxiv.org
Real-time detection of out-of-context LLM outputs is crucial for enterprises looking to safely
adopt RAG applications. In this work, we train lightweight models to discriminate LLM …

A Flexible Large Language Models Guardrail Development Methodology Applied to Off-Topic Prompt Detection

G Chua, SY Chan, S Khoo - arXiv preprint arXiv:2411.12946, 2024 - arxiv.org
Large Language Models are prone to off-topic misuse, where users may prompt these
models to perform tasks beyond their intended scope. Current guardrails, which often rely on …

The AI CUDA Engineer: Agentic CUDA Kernel Discovery, Optimization and Composition

RT Lange, A Prasad, Q Sun, M Faldor, Y Tang, D Ha - 2025 - pub.sakana.ai
The demand for computational power in machine learning has increased exponentially over
the past decade, driven by the rising complexity of deep learning models and the need for …

Quality-Diversity Self-Play: Open-Ended Strategy Innovation via Foundation Models

A Dharna, C Lu, J Clune - NeurIPS 2024 Workshop on Open-World Agents - openreview.net
Multi-agent dynamics have powered innovation from time immemorial, such as scientific
innovations during the space race or predator-prey dynamics in the natural world. The …

Let Large Language Models Find the Data to Train Themselves

F Wan, D Cai, S Huang, X Quan, M Wang - openreview.net
The current iterative development process for large language models (LLMs) is heavily data-
centric, relying on human researchers and engineers to manually analyze model …

Beyond Benchmarking: Automated Capability Discovery via Model Self-Exploration

C Lu, S Hu, J Clune - Language Gamification-NeurIPS 2024 Workshop - openreview.net
Large language and foundation models have become ubiquitous as general-purpose
assistants, exhibiting diverse capabilities across a wide variety of domains through training …