Affordances from human videos as a versatile representation for robotics

S Bahl, R Mendonca, L Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Building a robot that can understand and learn to interact by watching humans has inspired
several vision problems. However, despite some successful results on static datasets, it …

Human activity recognition (HAR) using deep learning: Review, methodologies, progress and future research directions

P Kumar, S Chauhan, LK Awasthi - Archives of Computational Methods in …, 2024 - Springer
Human activity recognition is essential in many domains, including the medical and smart
home sectors. We conduct a comprehensive survey of deep learning approaches and the current state …

Retrospectives on the Embodied AI Workshop

M Deitke, D Batra, Y Bisk, T Campari, AX Chang… - arXiv preprint arXiv …, 2022 - arxiv.org
We present a retrospective on the state of Embodied AI research. Our analysis focuses on
13 challenges presented at the Embodied AI Workshop at CVPR. These challenges are …

Egocentric audio-visual object localization

C Huang, Y Tian, A Kumar… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Humans naturally perceive surrounding scenes by unifying sound and sight in a first-person
view. Likewise, machines can approach human-like intelligence by learning with …

Self-supervised visual learning from interactions with objects

A Aubret, C Teulière, J Triesch - European Conference on Computer …, 2024 - Springer
Self-supervised learning (SSL) has revolutionized visual representation learning, but has
not achieved the robustness of human vision. A reason for this could be that SSL does not …

Hyperbolic audio-visual zero-shot learning

J Hong, Z Hayder, J Han, P Fang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Audio-visual zero-shot learning aims to classify samples consisting of a pair of
corresponding audio and video sequences from classes that are not present during training …

SoundingActions: Learning how actions sound from narrated egocentric videos

C Chen, K Ashutosh, R Girdhar… - Proceedings of the …, 2024 - openaccess.thecvf.com
We propose a novel self-supervised embedding to learn how actions sound from narrated
in-the-wild egocentric videos. Whereas existing methods rely on curated data with known …

HiCMAE: Hierarchical Contrastive Masked Autoencoder for Self-Supervised Audio-Visual Emotion Recognition

L Sun, Z Lian, B Liu, J Tao - Information Fusion, 2024 - Elsevier
Audio-Visual Emotion Recognition (AVER) has garnered increasing attention in
recent years for its critical role in creating emotion-aware intelligent machines. Previous …

Multi-task learning of object states and state-modifying actions from web videos

T Souček, JB Alayrac, A Miech, I Laptev… - IEEE Transactions on …, 2024 - computer.org
We aim to learn to temporally localize object state changes and the corresponding
state-modifying actions by observing people interacting with objects in long uncurated web …

Interaction region visual transformer for egocentric action anticipation

D Roy, R Rajendiran… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Human-object interaction (HOI) and temporal dynamics along the motion paths are the most
important visual cues for egocentric action anticipation. In particular, interaction regions …