Perceiver-Actor: A multi-task transformer for robotic manipulation
Transformers have revolutionized vision and natural language processing with their ability to
scale with large datasets. But in robotic manipulation, data is both limited and expensive …
Blind image quality assessment via vision-language correspondence: A multitask learning perspective
We aim at advancing blind image quality assessment (BIQA), which predicts the human
perception of image quality without any reference information. We develop a general and …
Instruction-driven history-aware policies for robotic manipulations
In human environments, robots are expected to accomplish a variety of manipulation tasks
given simple natural language instructions. Yet, robotic manipulation is extremely …
ManiGaussian: Dynamic Gaussian splatting for multi-task robotic manipulation
Performing language-conditioned robotic manipulation tasks in unstructured environments
is in high demand for general intelligent robots. Conventional robotic manipulation …
ChainedDiffuser: Unifying trajectory diffusion and keypose prediction for robotic manipulation
We present ChainedDiffuser, a policy architecture that unifies action keypose prediction and
trajectory diffusion generation for learning robot manipulation from demonstrations. Our …
Prismer: A vision-language model with an ensemble of experts
Recent vision-language models have shown impressive multi-modal generation
capabilities. However, typically they require training huge models on massive datasets. As a …
Act3D: Infinite resolution action detection transformer for robotic manipulation
3D perceptual representations are well suited for robot manipulation as they easily encode
occlusions and simplify spatial reasoning. Many manipulation tasks require high spatial …
Act3D: 3D feature field transformers for multi-task robotic manipulation
3D perceptual representations are well suited for robot manipulation as they easily encode
occlusions and simplify spatial reasoning. Many manipulation tasks require high spatial …
Image quality-aware diagnosis via meta-knowledge co-embedding
Medical images usually suffer from image degradation in clinical practice, leading to
decreased performance of deep learning-based models. To resolve this problem, most …
ForkMerge: Mitigating negative transfer in auxiliary-task learning
Auxiliary-Task Learning (ATL) aims to improve the performance of the target task by
leveraging the knowledge obtained from related tasks. Occasionally, learning multiple tasks …