Foundation Models Defining a New Era in Vision: a Survey and Outlook

M Awais, M Naseer, S Khan, RM Anwer… - … on Pattern Analysis …, 2025 - ieeexplore.ieee.org
Vision systems that see and reason about the compositional nature of visual scenes are
fundamental to understanding our world. The complex relations between objects and their …

When large language models meet personalization: Perspectives of challenges and opportunities

J Chen, Z Liu, X Huang, C Wu, Q Liu, G Jiang, Y Pu… - World Wide Web, 2024 - Springer
The advent of large language models marks a revolutionary breakthrough in artificial
intelligence. With the unprecedented scale of training and model parameters, the capability …

PaLM-E: An embodied multimodal language model

D Driess, F Xia, MSM Sajjadi, C Lynch… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models excel at a wide range of complex tasks. However, enabling general
inference in the real world, e.g., for robotics problems, raises the challenge of grounding. We …

Language is not all you need: Aligning perception with language models

S Huang, L Dong, W Wang, Y Hao… - Advances in …, 2023 - proceedings.neurips.cc
A big convergence of language, multimodal perception, action, and world modeling is a key
step toward artificial general intelligence. In this work, we introduce KOSMOS-1, a …

RT-2: Vision-language-action models transfer web knowledge to robotic control

A Brohan, N Brown, J Carbajal, Y Chebotar… - arXiv preprint arXiv …, 2023 - arxiv.org
We study how vision-language models trained on Internet-scale data can be incorporated
directly into end-to-end robotic control to boost generalization and enable emergent …

Augmented language models: a survey

G Mialon, R Dessì, M Lomeli, C Nalmpantis… - arXiv preprint arXiv …, 2023 - arxiv.org
This survey reviews works in which language models (LMs) are augmented with reasoning
skills and the ability to use tools. The former is defined as decomposing a potentially …

Kosmos-2: Grounding multimodal large language models to the world

Z Peng, W Wang, L Dong, Y Hao, S Huang… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce Kosmos-2, a Multimodal Large Language Model (MLLM), enabling new
capabilities of perceiving object descriptions (e.g., bounding boxes) and grounding text to the …

OpenAGI: When LLM meets domain experts

Y Ge, W Hua, K Mei, J Tan, S Xu… - Advances in Neural …, 2023 - proceedings.neurips.cc
Human Intelligence (HI) excels at combining basic skills to solve complex tasks. This
capability is vital for Artificial Intelligence (AI) and should be embedded in comprehensive AI …

RT-2: Vision-language-action models transfer web knowledge to robotic control

B Zitkovich, T Yu, S Xu, P Xu, T Xiao… - … on Robot Learning, 2023 - proceedings.mlr.press
We study how vision-language models trained on Internet-scale data can be incorporated
directly into end-to-end robotic control to boost generalization and enable emergent …

Multimodal chain-of-thought reasoning in language models

Z Zhang, A Zhang, M Li, H Zhao, G Karypis… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have shown impressive performance on complex reasoning
by leveraging chain-of-thought (CoT) prompting to generate intermediate reasoning chains …