Perceiver-actor: A multi-task transformer for robotic manipulation

M Shridhar, L Manuelli, D Fox - Conference on Robot …, 2023 - proceedings.mlr.press
Transformers have revolutionized vision and natural language processing with their ability to
scale with large datasets. But in robotic manipulation, data is both limited and expensive …

Blind image quality assessment via vision-language correspondence: A multitask learning perspective

W Zhang, G Zhai, Y Wei, X Yang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
We aim at advancing blind image quality assessment (BIQA), which predicts the human
perception of image quality without any reference information. We develop a general and …

Instruction-driven history-aware policies for robotic manipulations

PL Guhur, S Chen, RG Pinel… - … on Robot Learning, 2023 - proceedings.mlr.press
In human environments, robots are expected to accomplish a variety of manipulation tasks
given simple natural language instructions. Yet, robotic manipulation is extremely …

ManiGaussian: Dynamic Gaussian splatting for multi-task robotic manipulation

G Lu, S Zhang, Z Wang, C Liu, J Lu, Y Tang - European Conference on …, 2024 - Springer
Performing language-conditioned robotic manipulation tasks in unstructured environments
is in high demand for general intelligent robots. Conventional robotic manipulation …

ChainedDiffuser: Unifying trajectory diffusion and keypose prediction for robotic manipulation

Z Xian, N Gkanatsios, T Gervet, TW Ke… - … Annual Conference on …, 2023 - openreview.net
We present ChainedDiffuser, a policy architecture that unifies action keypose prediction and
trajectory diffusion generation for learning robot manipulation from demonstrations. Our …

Prismer: A vision-language model with an ensemble of experts

S Liu, L Fan, E Johns, Z Yu, C Xiao… - arXiv preprint arXiv …, 2023 - authors.library.caltech.edu
Recent vision-language models have shown impressive multi-modal generation
capabilities. However, typically they require training huge models on massive datasets. As a …

Act3D: Infinite resolution action detection transformer for robotic manipulation

T Gervet, Z Xian, N Gkanatsios… - arXiv preprint arXiv …, 2023 - arxiv.org
3D perceptual representations are well suited for robot manipulation as they easily encode
occlusions and simplify spatial reasoning. Many manipulation tasks require high spatial …

Act3D: 3D feature field transformers for multi-task robotic manipulation

T Gervet, Z Xian, N Gkanatsios… - 7th Annual Conference …, 2023 - openreview.net
3D perceptual representations are well suited for robot manipulation as they easily encode
occlusions and simplify spatial reasoning. Many manipulation tasks require high spatial …

Image quality-aware diagnosis via meta-knowledge co-embedding

H Che, S Chen, H Chen - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Medical images usually suffer from image degradation in clinical practice, leading to
decreased performance of deep learning-based models. To resolve this problem, most …

ForkMerge: Mitigating negative transfer in auxiliary-task learning

J Jiang, B Chen, J Pan, X Wang, D Liu… - Advances in …, 2024 - proceedings.neurips.cc
Auxiliary-Task Learning (ATL) aims to improve the performance of the target task by
leveraging the knowledge obtained from related tasks. Occasionally, learning multiple tasks …