A survey of embodied AI: From simulators to research tasks

J Duan, S Yu, HL Tan, H Zhu… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
There has been an emerging paradigm shift from the era of “internet AI” to “embodied AI,”
where AI algorithms and agents no longer learn from datasets of images, videos, or text …

Ego4D: Around the world in 3,000 hours of egocentric video

K Grauman, A Westbury, E Byrne… - Proceedings of the …, 2022 - openaccess.thecvf.com
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It
offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household …

ThreeDWorld: A platform for interactive multi-modal physical simulation

C Gan, J Schwartz, S Alter, D Mrowca… - arXiv preprint arXiv …, 2020 - arxiv.org
We introduce ThreeDWorld (TDW), a platform for interactive multi-modal physical simulation.
TDW enables simulation of high-fidelity sensory data and physical interactions between …

Look, listen, and act: Towards audio-visual embodied navigation

C Gan, Y Zhang, J Wu, B Gong… - … on Robotics and …, 2020 - ieeexplore.ieee.org
A crucial ability of mobile intelligent agents is to integrate the evidence from multiple sensory
inputs in an environment and to make a sequence of actions to reach their goals. In this …

Sep-Stereo: Visually guided stereophonic audio generation by associating source separation

H Zhou, X Xu, D Lin, X Wang, Z Liu - … , Glasgow, UK, August 23–28, 2020 …, 2020 - Springer
Stereophonic audio is an indispensable ingredient to enhance human auditory experience.
Recent research has explored the usage of visual information as guidance to generate …

VisualEchoes: Spatial image representation learning through echolocation

R Gao, C Chen, Z Al-Halah, C Schissler… - Computer Vision–ECCV …, 2020 - Springer
Several animal species (e.g., bats, dolphins, and whales) and even visually impaired humans
have the remarkable ability to perform echolocation: a biological sonar used to perceive …

Vision-language navigation: a survey and taxonomy

W Wu, T Chang, X Li, Q Yin, Y Hu - Neural Computing and Applications, 2024 - Springer
Vision-language navigation (VLN) tasks require an agent to follow language instructions
from a human guide to navigate in previously unseen environments using visual …

See, hear, explore: Curiosity via audio-visual association

V Dean, S Tulsiani, A Gupta - Advances in neural …, 2020 - proceedings.neurips.cc
Exploration is one of the core challenges in reinforcement learning. A common formulation
of curiosity-driven exploration uses the difference between the real future and the future …

Audio-visual floorplan reconstruction

S Purushwalkam, SVA Gari, VK Ithapu… - Proceedings of the …, 2021 - openaccess.thecvf.com
Given only a few glimpses of an environment, how much can we infer about its entire
floorplan? Existing methods can map only what is visible or immediately apparent from …

Language-guided audio-visual source separation via trimodal consistency

R Tan, A Ray, A Burns, BA Plummer… - Proceedings of the …, 2023 - openaccess.thecvf.com
We propose a self-supervised approach for learning to perform audio source separation in
videos based on natural language queries, using only unlabeled video and audio pairs as …