Μελετητής Google

Z **ng, Q Feng, H Chen, Q Dai, H Hu, H Xu… - ACM Computing …, 2024 - dl.acm.org

The recent wave of AI-generated content (AIGC) has witnessed substantial success in
computer vision, with the diffusion model playing a crucial role in this achievement. Due to …

Αποθήκευση Παράθεση Γίνεται αναφορά σε 92 Σχετικά άρθρα Όλες οι 3 εκδοχές

[Free GPT-4]

[PDF] arxiv.org

Foundation Models Defining a New Era in Vision: a Survey and Outlook

M Awais, M Naseer, S Khan, RM Anwer… - … on Pattern Analysis …, 2025 - ieeexplore.ieee.org

Vision systems that see and reason about the compositional nature of visual scenes are
fundamental to understanding our world. The complex relations between objects and their …

Αποθήκευση Παράθεση Γίνεται αναφορά σε 135 Σχετικά άρθρα Όλες οι 2 εκδοχές

[Free GPT-4]

[PDF] thecvf.com

Segment anything

A Kirillov, E Mintun, N Ravi, H Mao… - Proceedings of the …, 2023 - openaccess.thecvf.com

Abstract We introduce the Segment Anything (SA) project: a new task, model, and dataset for
image segmentation. Using our efficient model in a data collection loop, we built the largest …

Αποθήκευση Παράθεση Γίνεται αναφορά σε 8315 Σχετικά άρθρα Όλες οι 12 εκδοχές Προβολή ως HTML

[Free GPT-4]

[PDF] thecvf.com

Imagebind: One embedding space to bind them all

R Girdhar, A El-Nouby, Z Liu, M Singh… - Proceedings of the …, 2023 - openaccess.thecvf.com

We present ImageBind, an approach to learn a joint embedding across six different
modalities-images, text, audio, depth, thermal, and IMU data. We show that all combinations …

Αποθήκευση Παράθεση Γίνεται αναφορά σε 838 Σχετικά άρθρα Όλες οι 7 εκδοχές Προβολή ως HTML

[Free GPT-4]

[PDF] thecvf.com

Open-vocabulary panoptic segmentation with text-to-image diffusion models

J Xu, S Liu, A Vahdat, W Byeon… - Proceedings of the …, 2023 - openaccess.thecvf.com

We present ODISE: Open-vocabulary DIffusion-based panoptic SEgmentation, which unifies
pre-trained text-image diffusion and discriminative models to perform open-vocabulary …

Αποθήκευση Παράθεση Γίνεται αναφορά σε 428 Σχετικά άρθρα Όλες οι 6 εκδοχές Προβολή ως HTML

[Free GPT-4]

[PDF] neurips.cc

Egoschema: A diagnostic benchmark for very long-form video language understanding

K Mangalam, R Akshulakov… - Advances in Neural …, 2023 - proceedings.neurips.cc

We introduce EgoSchema, a very long-form video question-answering dataset, and
benchmark to evaluate long video understanding capabilities of modern vision and …

Αποθήκευση Παράθεση Γίνεται αναφορά σε 164 Σχετικά άρθρα Όλες οι 5 εκδοχές Προβολή ως HTML

[Free GPT-4]

[PDF] arxiv.org

Voxposer: Composable 3d value maps for robotic manipulation with language models

W Huang, C Wang, R Zhang, Y Li, J Wu… - arxiv preprint arxiv …, 2023 - arxiv.org

Large language models (LLMs) are shown to possess a wealth of actionable knowledge that
can be extracted for robot manipulation in the form of reasoning and planning. Despite the …

Αποθήκευση Παράθεση Γίνεται αναφορά σε 463 Σχετικά άρθρα Όλες οι 6 εκδοχές Προβολή ως HTML

[Free GPT-4]

[PDF] neurips.cc

Embodiedgpt: Vision-language pre-training via embodied chain of thought

Y Mu, Q Zhang, M Hu, W Wang… - Advances in …, 2024 - proceedings.neurips.cc

Embodied AI is a crucial frontier in robotics, capable of planning and executing action
sequences for robots to accomplish long-horizon tasks in physical environments. In this …

Αποθήκευση Παράθεση Γίνεται αναφορά σε 194 Σχετικά άρθρα Όλες οι 7 εκδοχές Προβολή ως HTML

[Free GPT-4]

[PDF] mlr.press

Scaling up and distilling down: Language-guided robot skill acquisition

H Ha, P Florence, S Song - Conference on Robot Learning, 2023 - proceedings.mlr.press

We present a framework for robot skill acquisition, which 1) efficiently scale up data
generation of language-labelled robot data and 2) effectively distills this data down into a …

Αποθήκευση Παράθεση Γίνεται αναφορά σε 124 Σχετικά άρθρα Όλες οι 7 εκδοχές Προβολή ως HTML

[Free GPT-4]

[PDF] thecvf.com

Sequential modeling enables scalable learning for large vision models

Y Bai, X Geng, K Mangalam, A Bar… - Proceedings of the …, 2024 - openaccess.thecvf.com

We introduce a novel sequential modeling approach which enables learning a Large Vision
Model (LVM) without making use of any linguistic data. To do this we define a common …

Αποθήκευση Παράθεση Γίνεται αναφορά σε 146 Σχετικά άρθρα Όλες οι 3 εκδοχές Προβολή ως HTML

Παράθεση

Σύνθετη αναζήτηση

Αποθηκεύτηκε στη Βιβλιοθήκη μου

A survey on video diffusion models

Foundation Models Defining a New Era in Vision: a Survey and Outlook

Segment anything

Imagebind: One embedding space to bind them all

Open-vocabulary panoptic segmentation with text-to-image diffusion models

Egoschema: A diagnostic benchmark for very long-form video language understanding

Voxposer: Composable 3d value maps for robotic manipulation with language models

Embodiedgpt: Vision-language pre-training via embodied chain of thought

Scaling up and distilling down: Language-guided robot skill acquisition

Sequential modeling enables scalable learning for large vision models