- Academic Search

Soundspaces 2.0: A simulation platform for visual-acoustic learning

C Chen, C Schissler, S Garg… - Advances in …, 2022 - proceedings.neurips.cc

Abstract We introduce SoundSpaces 2.0, a platform for on-the-fly geometry-based audio
rendering for 3D environments. Given a 3D mesh of a real-world environment …

Cited by 87

Semantic audio-visual navigation

C Chen, Z Al-Halah, K Grauman - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com

Recent work on audio-visual navigation assumes a constantly-sounding target and restricts
the role of audio to signaling the target's position. We introduce semantic audio-visual …

Cited by 116

Toward practical monocular indoor depth estimation

CY Wu, J Wang, M Hall… - Proceedings of the …, 2022 - openaccess.thecvf.com

The majority of prior monocular depth estimation methods without groundtruth depth
guidance focus on driving scenarios. We show that such methods generalize poorly to …

Cited by 68

Few-shot audio-visual learning of environment acoustics

S Majumder, C Chen, Z Al-Halah… - Advances in Neural …, 2022 - proceedings.neurips.cc

Room impulse response (RIR) functions capture how the surrounding physical environment
transforms the sounds heard by a listener, with implications for various applications in AR …

Cited by 47

Pathdreamer: A world model for indoor navigation

JY Koh, H Lee, Y Yang, J Baldridge… - Proceedings of the …, 2021 - openaccess.thecvf.com

People navigating in unfamiliar buildings take advantage of myriad visual, spatial and
semantic cues to efficiently achieve their navigation goals. Towards equipping …

Cited by 77

Move2hear: Active audio-visual source separation

S Majumder, Z Al-Halah… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com

We introduce the active audio-visual source separation problem, where an agent must move
intelligently in order to better isolate the sounds coming from an object of interest in its …

Cited by 52

A²Nav: Action-Aware Zero-Shot Robot Navigation by Exploiting Vision-and-Language Ability of Foundation Models

P Chen, X Sun, H Zhi, R Zeng, TH Li, G Liu… - arXiv preprint arXiv …, 2023 - arxiv.org

We study the task of zero-shot vision-and-language navigation (ZS-VLN), a practical yet
challenging problem in which an agent learns to navigate following a path described by …

Cited by 22

Listening human behavior: 3d human pose estimation with acoustic signals

Y Shibata, Y Kawashima, M Isogawa… - Proceedings of the …, 2023 - openaccess.thecvf.com

Given only acoustic signals without any high-level information, such as voices or sounds of
scenes/actions, how much can we infer about the behavior of humans? Unlike existing …

Cited by 16

Context understanding in computer vision: A survey

X Wang, Z Zhu - Computer Vision and Image Understanding, 2023 - Elsevier

Contextual information plays an important role in many computer vision tasks, such as object
detection, video action detection, image classification, etc. Recognizing a single object or …

Cited by 42

Disentangled counterfactual learning for physical audiovisual commonsense reasoning

C Lv, S Zhang, Y Tian, M Qi… - Advances in Neural …, 2024 - proceedings.neurips.cc

In this paper, we propose a Disentangled Counterfactual Learning (DCL) approach for
physical audiovisual commonsense reasoning. The task aims to infer objects' physics …

Cited by 15

Audio-visual floorplan reconstruction

Soundspaces 2.0: A simulation platform for visual-acoustic learning
