Multimodal research in vision and language: A review of current and emerging trends

S Uppal, S Bhagat, D Hazarika, N Majumder, S Poria… - Information …, 2022 - Elsevier
Deep learning and its applications have catalyzed impactful research and development
across the diverse range of modalities present in real-world data. More recently, this has …

On transforming reinforcement learning with transformers: The development trajectory

S Hu, L Shen, Y Zhang, Y Chen… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Transformers, originally devised for natural language processing (NLP), have also produced
significant successes in computer vision (CV). Due to their strong expressive power …

LLM-Planner: Few-shot grounded planning for embodied agents with large language models

CH Song, J Wu, C Washington… - Proceedings of the …, 2023 - openaccess.thecvf.com
This study focuses on using large language models (LLMs) as planners for embodied
agents that can follow natural language instructions to complete complex tasks in a visually …

How much can CLIP benefit vision-and-language tasks?

S Shen, LH Li, H Tan, M Bansal, A Rohrbach… - arXiv preprint arXiv …, 2021 - arxiv.org
Most existing Vision-and-Language (V&L) models rely on pre-trained visual encoders, using
a relatively small set of manually-annotated data (as compared to web-crawled data), to …

History aware multimodal transformer for vision-and-language navigation

S Chen, PL Guhur, C Schmid… - Advances in Neural …, 2021 - proceedings.neurips.cc
Vision-and-language navigation (VLN) aims to build autonomous visual agents that follow
instructions and navigate in real scenes. To remember previously visited locations and …

Think global, act local: Dual-scale graph transformer for vision-and-language navigation

S Chen, PL Guhur, M Tapaswi… - Proceedings of the …, 2022 - openaccess.thecvf.com
Following language instructions to navigate in unseen environments is a challenging
problem for autonomous embodied agents. The agent not only needs to ground languages …

Bird's-eye-view scene graph for vision-language navigation

R Liu, X Wang, W Wang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Vision-language navigation (VLN), which requires an agent to navigate 3D
environments following human instructions, has shown great advances. However, current …

PanoGen: Text-conditioned panoramic environment generation for vision-and-language navigation

J Li, M Bansal - Advances in Neural Information Processing …, 2023 - proceedings.neurips.cc
Vision-and-Language Navigation requires the agent to follow language instructions
to navigate through 3D environments. One main challenge in Vision-and-Language …

VLN BERT: A recurrent vision-and-language BERT for navigation

Y Hong, Q Wu, Y Qi… - Proceedings of the …, 2021 - openaccess.thecvf.com
Accuracy of many visiolinguistic tasks has benefited significantly from the application of
vision-and-language (V&L) BERT. However, its application for the task of vision-and …

Room-Across-Room: Multilingual vision-and-language navigation with dense spatiotemporal grounding

A Ku, P Anderson, R Patel, E Ie, J Baldridge - arXiv preprint arXiv …, 2020 - arxiv.org
We introduce Room-Across-Room (RxR), a new Vision-and-Language Navigation (VLN)
dataset. RxR is multilingual (English, Hindi, and Telugu) and larger (more paths and …