Google Scholar

3 results (0.02 sec)

Learning by Aligning 2D Skeleton Sequences and Multi-modality Fusion

QH Tran, M Ahmed, M Popattia, MH Ahmed… - … on Computer Vision, 2024 - Springer

This paper presents a self-supervised temporal video alignment framework which is useful
for several fine-grained human activity understanding applications. In contrast with the state …

Cited by 1

[PDF] arxiv.org

Video LLMs for Temporal Reasoning in Long Videos

FJ Fateh, U Ahmed, H Khan, MZ Zia… - arXiv preprint arXiv …, 2024 - arxiv.org

This paper introduces TemporalVLM, a video large language model capable of effective
temporal reasoning and fine-grained understanding in long videos. At the core, our …

[PDF] arxiv.org

Understanding via Gaze: Gaze-based Task Decomposition for Imitation Learning of Robot Manipulation

R Takizawa, Y Ohmura, Y Kuniyoshi - arXiv preprint arXiv:2501.15071, 2025 - arxiv.org

In imitation learning for robotic manipulation, decomposing object manipulation tasks into
multiple semantic actions is essential. This decomposition enables the reuse of learned …
