- Academic Search

Foundation models in robotics: Applications, challenges, and the future

R Firoozi, J Tucker, S Tian… - … Journal of Robotics …, 2023 - journals.sagepub.com

We survey applications of pretrained foundation models in robotics. Traditional deep
learning models in robotics are trained on small datasets tailored for specific tasks, which …

Cited by 140 - All 5 versions

Filling the image information gap for VQA: Prompting large language models to proactively ask questions

Z Wang, C Chen, P Li, Y Liu - arXiv preprint arXiv:2311.11598, 2023 - arxiv.org

Large Language Models (LLMs) demonstrate impressive reasoning ability and the
maintenance of world knowledge not only in natural language tasks, but also in some vision …

Cited by 16 - All 6 versions

Lang2LTL-2: Grounding spatiotemporal navigation commands using large language and vision-language models

JX Liu, A Shah, G Konidaris, S Tellex… - 2024 IEEE/RSJ …, 2024 - ieeexplore.ieee.org

Grounding spatiotemporal navigation commands to structured task specifications enables
autonomous robots to understand a broad range of natural language and solve long-horizon …

Cited by 3 - All 8 versions

Knowledge acquisition disentanglement for knowledge-based visual question answering with large language models

W An, F Tian, J Nie, W Shi, H Lin, Y Chen… - arXiv preprint arXiv …, 2024 - arxiv.org

Knowledge-based Visual Question Answering (KVQA) requires both image and world
knowledge to answer questions. Current methods first retrieve knowledge from the image …

Cited by 3 - All 4 versions

Enhancing few-shot KB-VQA with panoramic image captions guided by Large Language Models

P Qiang, H Tan, X Li, D Wang, R Li, X Sun, H Zhang… - Neurocomputing, 2025 - Elsevier

Current state-of-the-art (SOTA) KB-VQA techniques involve transforming images into image
captions as prompts to harness the potent reasoning capabilities of large language models …

LLM-based Hierarchical Concept Decomposition for Interpretable Fine-Grained Image Classification

R Qu, M Yatskar - arXiv preprint arXiv:2405.18672, 2024 - arxiv.org

(Renyi Qu's Master's Thesis) Recent advancements in interpretable models for vision-language
tasks have achieved competitive performance; however, their interpretability often …

All 2 versions

Learning to Ask Denotative and Connotative Questions for Knowledge-based VQA

X Xing, P Xiong, L Fan, Y Li, Y Wu - Findings of the Association for …, 2024 - aclanthology.org

Large language models (LLMs) have attracted increasing attention due to its prominent
performance on various tasks. Recent works seek to leverage LLMs on knowledge-based …

Zero-Shot End-To-End Spoken Question Answering In Medical Domain

Y Labrak, A Moumen, R Dufour, M Rouvier - arXiv preprint arXiv …, 2024 - arxiv.org

In the rapidly evolving landscape of spoken question-answering (SQA), the integration of
large language models (LLMs) has emerged as a transformative development …

All 9 versions

LLM as an Art Director (LaDi): Using LLMs to improve Text-to-Media Generators

A Roush, E Zakirov, A Shirokov, P Lunina… - arXiv preprint arXiv …, 2023 - arxiv.org

Recent advancements in text-to-image generation have revolutionized numerous fields,
including art and cinema, by automating the generation of high-quality, context-aware …

Cited by 1 - All 2 versions

Zero-shot visual question answering with language model feedback

Foundation models in robotics: Applications, challenges, and the future