Foundations & trends in multimodal machine learning: Principles, challenges, and open questions
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …
PIQA: Reasoning about physical commonsense in natural language
To apply eyeshadow without a brush, should I use a cotton swab or a toothpick? Questions
requiring this kind of physical commonsense pose a challenge to today's natural language …
Experience grounds language
Language understanding research is held back by a failure to relate language to the
physical world it describes and to the social interactions it facilitates. Despite the incredible …
Robots that use language
This article surveys the use of natural language in robotics from a robotics point of view. To
use human language, robots must map words to aspects of the physical world, mediated by …
A review of robot learning for manipulation: Challenges, representations, and algorithms
A key challenge in intelligent robotics is creating robots that are capable of directly
interacting with the world around them to achieve their goals. The last decade has seen …
Goal driven discovery of distributional differences via language descriptions
Exploring large corpora can generate useful discoveries but is time-consuming for humans.
We formulate a new task, D5, that automatically discovers differences between two large …
Statler: State-maintaining language models for embodied reasoning
There has been a significant research interest in employing large language models to
empower intelligent robots with complex reasoning. Existing work focuses on harnessing …
Embodied BERT: A transformer model for embodied, language-guided visual task completion
Language-guided robots performing home and office tasks must navigate in and interact
with the world. Grounding language instructions against visual observations and actions to …
Language grounding with 3D objects
Seemingly simple natural language requests to a robot are generally underspecified, for
example, "Can you bring me the wireless mouse?" Flat images of candidate mice may not …