On Transforming Reinforcement Learning With Transformers: The Development Trajectory

S Hu, L Shen, Y Zhang, Y Chen… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Transformers, originally devised for natural language processing (NLP), have also achieved
significant successes in computer vision (CV). Due to their strong expressive power …

Integrating reinforcement learning with foundation models for autonomous robotics: Methods and perspectives

A Moroncelli, V Soni, AA Shahid, M Maccarini… - arXiv preprint arXiv …, 2024 - arxiv.org
Foundation models (FMs), large deep learning models pre-trained on vast, unlabeled
datasets, exhibit powerful capabilities in understanding complex patterns and generating …

Q-value regularized transformer for offline reinforcement learning

S Hu, Z Fan, C Huang, L Shen, Y Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in offline reinforcement learning (RL) have underscored the
capabilities of Conditional Sequence Modeling (CSM), a paradigm that learns the action …
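
As a rough illustration of the general idea of pairing conditional sequence modeling with value regularization (a minimal sketch only, not this paper's exact algorithm; the module names, the separately trained critic, and the q_weight coefficient are assumptions for illustration):

import torch
import torch.nn.functional as F

def csm_training_step(seq_model, critic, batch, q_weight=0.1):
    """One gradient step on a (return-to-go, state, action) sequence batch."""
    rtg, states, actions = batch["rtg"], batch["states"], batch["actions"]

    # The sequence model predicts actions conditioned on returns-to-go
    # and the state history, as in conditional sequence modeling (CSM).
    pred_actions = seq_model(rtg, states, actions)

    # Standard CSM objective: imitate the dataset actions.
    bc_loss = F.mse_loss(pred_actions, actions)

    # Hypothetical regularizer: nudge predicted actions toward ones the
    # critic scores highly (the critic is assumed to be trained separately).
    q_loss = -critic(states, pred_actions).mean()

    loss = bc_loss + q_weight * q_loss
    loss.backward()
    return loss.item()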

HarmoDT: Harmony Multi-Task Decision Transformer for Offline Reinforcement Learning

S Hu, Z Fan, L Shen, Y Zhang, Y Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
The purpose of offline multi-task reinforcement learning (MTRL) is to develop a unified policy
applicable to diverse tasks without the need for online environmental interaction. Recent …

Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement

Z Wang, L Zhang, W Wu, Y Zhu, D Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
A longstanding goal of artificial general intelligence is to build highly capable generalists that
can learn from diverse experiences and generalize to unseen tasks. The language and vision …

Context-former: Stitching via latent conditioned sequence modeling

Z Zhang, J Xu, J Liu, Z Zhuang, D Wang, M Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Offline reinforcement learning (RL) algorithms can learn better policies than the behavior
policy by stitching suboptimal trajectories into more optimal ones …

Pre-trained Language Models Improve the Few-shot Prompt Ability of Decision Transformer

Y Yang, P Xu - arXiv preprint arXiv:2408.01402, 2024 - arxiv.org
Decision Transformer (DT) has emerged as a promising class of algorithms in offline
reinforcement learning (RL) tasks, leveraging pre-collected datasets and Transformer's …

Is Mamba Compatible with Trajectory Optimization in Offline Reinforcement Learning?

Y Dai, O Ma, L Zhang, X Liang, S Hu, M Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Transformer-based trajectory optimization methods have demonstrated exceptional
performance in offline reinforcement learning (offline RL), yet they pose challenges due to …

Task-Aware Harmony Multi-Task Decision Transformer for Offline Reinforcement Learning

Z Fan, S Hu, Y Zhou, L Shen, Y Zhang, Y Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
The purpose of offline multi-task reinforcement learning (MTRL) is to develop a unified policy
applicable to diverse tasks without the need for online environmental interaction. Recent …

Hierarchical Prompt Decision Transformer: Improving Few-Shot Policy Generalization with Global and Adaptive

Z Wang, H Wang, Y Qi - arXiv preprint arXiv:2412.00979, 2024 - arxiv.org
Decision transformers recast reinforcement learning as a conditional sequence generation
problem, offering a simple but effective alternative to traditional value- or policy-based …
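
To make the "conditional sequence generation" framing concrete, the following is a minimal Decision-Transformer-style sketch in which returns-to-go, states, and actions are embedded as an interleaved token sequence and a causal Transformer predicts the next action; the dimensions and module names are illustrative assumptions, not details taken from any of the papers listed above:

import torch
import torch.nn as nn

class TinyDecisionTransformer(nn.Module):
    def __init__(self, state_dim, act_dim, d_model=128, n_layers=2, n_heads=4, max_len=20):
        super().__init__()
        self.embed_rtg = nn.Linear(1, d_model)
        self.embed_state = nn.Linear(state_dim, d_model)
        self.embed_action = nn.Linear(act_dim, d_model)
        self.embed_time = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.predict_action = nn.Linear(d_model, act_dim)

    def forward(self, rtg, states, actions, timesteps):
        # rtg: (B, T, 1), states: (B, T, state_dim), actions: (B, T, act_dim)
        B, T = states.shape[:2]
        t_emb = self.embed_time(timesteps)  # (B, T, d_model)
        # Interleave (return-to-go, state, action) tokens per timestep.
        tokens = torch.stack(
            [self.embed_rtg(rtg) + t_emb,
             self.embed_state(states) + t_emb,
             self.embed_action(actions) + t_emb], dim=2
        ).reshape(B, 3 * T, -1)
        # Causal mask so each token attends only to the past.
        mask = nn.Transformer.generate_square_subsequent_mask(3 * T)
        h = self.encoder(tokens, mask=mask)
        # Predict the action from the state-token position at each step.
        return self.predict_action(h[:, 1::3])

At inference time such a model is conditioned on a desired return-to-go and the state history, and the predicted action is executed, which is the sense in which policy learning becomes autoregressive sequence generation.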