Foundations & trends in multimodal machine learning: Principles, challenges, and open questions

PP Liang, A Zadeh, LP Morency - ACM Computing Surveys, 2024 - dl.acm.org
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

PP Liang, A Zadeh, LP Morency - arXiv preprint arXiv:2209.03430, 2022 - arxiv.org
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

PIQA: Reasoning about physical commonsense in natural language

Y Bisk, R Zellers, J Gao, Y Choi - … of the AAAI conference on artificial …, 2020 - ojs.aaai.org
To apply eyeshadow without a brush, should I use a cotton swab or a toothpick? Questions
requiring this kind of physical commonsense pose a challenge to today's natural language …

Experience grounds language

Y Bisk, A Holtzman, J Thomason, J Andreas… - arXiv preprint arXiv …, 2020 - arxiv.org
Language understanding research is held back by a failure to relate language to the
physical world it describes and to the social interactions it facilitates. Despite the incredible …

Robots that use language

S Tellex, N Gopalan, H Kress-Gazit… - Annual Review of …, 2020 - annualreviews.org
This article surveys the use of natural language in robotics from a robotics point of view. To
use human language, robots must map words to aspects of the physical world, mediated by …

A review of robot learning for manipulation: Challenges, representations, and algorithms

O Kroemer, S Niekum, G Konidaris - Journal of Machine Learning Research, 2021 - jmlr.org
A key challenge in intelligent robotics is creating robots that are capable of directly
interacting with the world around them to achieve their goals. The last decade has seen …

Goal driven discovery of distributional differences via language descriptions

R Zhong, P Zhang, S Li, J Ahn… - Advances in Neural …, 2023 - proceedings.neurips.cc
Exploring large corpora can generate useful discoveries but is time-consuming for humans.
We formulate a new task, D5, that automatically discovers differences between two large …

Statler: State-maintaining language models for embodied reasoning

T Yoneda, J Fang, P Li, H Zhang… - … on Robotics and …, 2024 - ieeexplore.ieee.org
There has been a significant research interest in employing large language models to
empower intelligent robots with complex reasoning. Existing work focuses on harnessing …

Embodied BERT: A transformer model for embodied, language-guided visual task completion

A Suglia, Q Gao, J Thomason, G Thattai… - arXiv preprint arXiv …, 2021 - arxiv.org
Language-guided robots performing home and office tasks must navigate in and interact
with the world. Grounding language instructions against visual observations and actions to …

Language grounding with 3D objects

J Thomason, M Shridhar, Y Bisk… - … on Robot Learning, 2022 - proceedings.mlr.press
Seemingly simple natural language requests to a robot are generally underspecified, for
example "Can you bring me the wireless mouse?" Flat images of candidate mice may not …