Survey on large language model-enhanced reinforcement learning: Concept, taxonomy, and methods
With extensive pretrained knowledge and high-level general capabilities, large language
models (LLMs) emerge as a promising avenue to augment reinforcement learning (RL) in …
Aligning cyber space with physical world: A comprehensive survey on embodied AI
Embodied Artificial Intelligence (Embodied AI) is crucial for achieving Artificial General
Intelligence (AGI) and serves as a foundation for various applications that bridge cyberspace …
DriveLM: Driving with graph visual question answering
We study how vision-language models (VLMs) trained on web-scale data can be integrated
into end-to-end driving systems to boost generalization and enable interactivity with human …
Photorealistic video generation with diffusion models
We present WALT, a diffusion transformer for photorealistic video generation from text
prompts. Our approach has two key design decisions. First, we use a causal encoder to …
Foundation models in robotics: Applications, challenges, and the future
We survey applications of pretrained foundation models in robotics. Traditional deep
learning models in robotics are trained on small datasets tailored for specific tasks, which …
Octo: An open-source generalist robot policy
Large policies pretrained on diverse robot datasets have the potential to transform robotic
learning: instead of training new policies from scratch, such generalist robot policies may be …
VideoPoet: A large language model for zero-shot video generation
We present VideoPoet, a language model capable of synthesizing high-quality video, with
matching audio, from a large variety of conditioning signals. VideoPoet employs a decoder …
Large language models as commonsense knowledge for large-scale task planning
Large-scale task planning is a major challenge. Recent work exploits large language
models (LLMs) directly as a policy and shows surprisingly interesting results. This paper …
NetLLM: Adapting large language models for networking
Many networking tasks now employ deep learning (DL) to solve complex prediction and
optimization problems. However, current design philosophy of DL-based algorithms entails …
FMB: A functional manipulation benchmark for generalizable robotic learning
In this paper, we propose a real-world benchmark for studying robotic learning in the context
of functional manipulation: a robot needs to accomplish complex long-horizon behaviors by …