Fine-tuning large vision-language models as decision-making agents via reinforcement learning

S Zhai, H Bai, Z Lin, J Pan, P Tong… - Advances in …, 2025 - proceedings.neurips.cc
Large vision-language models (VLMs) fine-tuned on specialized visual instruction-following
data have exhibited impressive language reasoning capabilities across various scenarios …

Object goal navigation using goal-oriented semantic exploration

DS Chaplot, DP Gandhi, A Gupta… - Advances in Neural …, 2020 - proceedings.neurips.cc
This work studies the problem of object goal navigation which involves navigating to an
instance of the given object category in unseen environments. End-to-end learning-based …

Evolving curricula with regret-based environment design

J Parker-Holder, M Jiang, M Dennis… - International …, 2022 - proceedings.mlr.press
Training generally-capable agents with reinforcement learning (RL) remains a significant
challenge. A promising avenue for improving the robustness of RL agents is through the use …

Embodied intelligence via learning and evolution

A Gupta, S Savarese, S Ganguli, L Fei-Fei - Nature communications, 2021 - nature.com
The intertwined processes of learning and evolution in complex environmental niches have
resulted in a remarkable diversity of morphological forms. Moreover, many aspects of animal …

Learning to explore using active neural SLAM

DS Chaplot, D Gandhi, S Gupta, A Gupta… - arXiv preprint arXiv …, 2020 - arxiv.org
This work presents a modular and hierarchical approach to learn policies for exploring 3D
environments, called 'Active Neural SLAM'. Our approach leverages the strengths of both …

Character controllers using motion VAEs

HY Ling, F Zinno, G Cheng… - ACM Transactions on …, 2020 - dl.acm.org
A fundamental problem in computer animation is that of realizing purposeful and realistic
human movement given a sufficiently-rich set of motion capture clips. We learn data-driven …

Recurrent independent mechanisms

A Goyal, A Lamb, J Hoffmann, S Sodhani… - arXiv preprint arXiv …, 2019 - arxiv.org
Learning modular structures which reflect the dynamics of the environment can lead to better
generalization and robustness to changes which only affect a few of the underlying causes …

Reward constrained policy optimization

C Tessler, DJ Mankowitz, S Mannor - arXiv preprint arXiv:1805.11074, 2018 - arxiv.org
Solving tasks in Reinforcement Learning is no easy feat. As the goal of the agent is to
maximize the accumulated reward, it often learns to exploit loopholes and misspecifications …

Unsupervised state representation learning in Atari

A Anand, E Racah, S Ozair, Y Bengio… - Advances in neural …, 2019 - proceedings.neurips.cc
State representation learning, or the ability to capture latent generative factors of an
environment, is crucial for building intelligent agents that can perform a wide variety of tasks …

Why generalization in RL is difficult: Epistemic POMDPs and implicit partial observability

D Ghosh, J Rahme, A Kumar, A Zhang… - Advances in neural …, 2021 - proceedings.neurips.cc
Generalization is a central challenge for the deployment of reinforcement learning (RL)
systems in the real world. In this paper, we show that the sequential structure of the RL …