Google Академія

M Awais, M Naseer, S Khan, RM Anwer… - … on Pattern Analysis …, 2025 - ieeexplore.ieee.org

Vision systems that see and reason about the compositional nature of visual scenes are
fundamental to understanding our world. The complex relations between objects and their …

Зберегти Послатися Цитовано в 144 джерелах Пов’язані статті Кількість версій: 4

[Free GPT-4]
[DeepSeek]

[HTML] sciencedirect.com

[HTML][HTML] Large language models for human–robot interaction: A review

C Zhang, J Chen, J Li, Y Peng, Z Mao - Biomimetic Intelligence and …, 2023 - Elsevier

The fusion of large language models and robotic systems has introduced a transformative
paradigm in human–robot interaction, offering unparalleled capabilities in natural language …

Зберегти Послатися Цитовано в 146 джерелах Пов’язані статті Кількість версій: 3

[Free GPT-4]
[DeepSeek]

[PDF] thecvf.com

Imagebind: One embedding space to bind them all

R Girdhar, A El-Nouby, Z Liu, M Singh… - Proceedings of the …, 2023 - openaccess.thecvf.com

We present ImageBind, an approach to learn a joint embedding across six different
modalities-images, text, audio, depth, thermal, and IMU data. We show that all combinations …

Зберегти Послатися Цитовано в 856 джерелах Пов’язані статті Кількість версій: 10 Показати у форматі HTML

[Free GPT-4]
[DeepSeek]

[PDF] neurips.cc

Motiongpt: Human motion as a foreign language

B Jiang, X Chen, W Liu, J Yu, G Yu… - Advances in Neural …, 2023 - proceedings.neurips.cc

Though the advancement of pre-trained large language models unfolds, the exploration of
building a unified model for language and other multimodal data, such as motion, remains …

Зберегти Послатися Цитовано в 267 джерелах Пов’язані статті Кількість версій: 6 Показати у форматі HTML

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

Large-scale contrastive language-audio pretraining with feature fusion and keyword-to-caption augmentation

Y Wu, K Chen, T Zhang, Y Hui… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

Contrastive learning has shown remarkable success in the field of multimodal
representation learning. In this paper, we propose a pipeline of contrastive language-audio …

Зберегти Послатися Цитовано в 537 джерелах Пов’язані статті Кількість версій: 9

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

Beats: Audio pre-training with acoustic tokenizers

S Chen, Y Wu, C Wang, S Liu, D Tompkins… - arxiv preprint arxiv …, 2022 - arxiv.org

The massive growth of self-supervised learning (SSL) has been witnessed in language,
vision, speech, and audio domains over the past few years. While discrete label prediction is …

Зберегти Послатися Цитовано в 289 джерелах Пов’язані статті Кількість версій: 9 Показати у форматі HTML

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

Foundation models in robotics: Applications, challenges, and the future

R Firoozi, J Tucker, S Tian… - … Journal of Robotics …, 2023 - journals.sagepub.com

We survey applications of pretrained foundation models in robotics. Traditional deep
learning models in robotics are trained on small datasets tailored for specific tasks, which …

Зберегти Послатися Цитовано в 135 джерелах Пов’язані статті Кількість версій: 5

[Free GPT-4]
[DeepSeek]

[PDF] thecvf.com

Onellm: One framework to align all modalities with language

J Han, K Gong, Y Zhang, J Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com

Multimodal large language models (MLLMs) have gained significant attention due to their
strong multimodal understanding capability. However existing works rely heavily on modality …

Зберегти Послатися Цитовано в 102 джерелах Пов’язані статті Кількість версій: 6 Показати у форматі HTML

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

Pandagpt: One model to instruction-follow them all

Y Su, T Lan, H Li, J Xu, Y Wang, D Cai - arxiv preprint arxiv:2305.16355, 2023 - arxiv.org

We present PandaGPT, an approach to emPower large lANguage moDels with visual and
Auditory instruction-following capabilities. Our pilot experiments show that PandaGPT can …

Зберегти Послатися Цитовано в 275 джерелах Пов’язані статті Кількість версій: 5 Показати у форматі HTML

[Free GPT-4]
[DeepSeek]

[PDF] neurips.cc

Pengi: An audio language model for audio tasks

S Deshmukh, B Elizalde, R Singh… - Advances in Neural …, 2023 - proceedings.neurips.cc

In the domain of audio processing, Transfer Learning has facilitated the rise of Self-
Supervised Learning and Zero-Shot Learning techniques. These approaches have led to …

Зберегти Послатися Цитовано в 147 джерелах Пов’язані статті Кількість версій: 7 Показати у форматі HTML

Створити сповіщення

Послатися

Розширений пошук

Збережено в моїй бібліотеці

Audioclip: Extending clip to image, text and audio

Foundation Models Defining a New Era in Vision: a Survey and Outlook

[HTML][HTML] Large language models for human–robot interaction: A review

Imagebind: One embedding space to bind them all

Motiongpt: Human motion as a foreign language

Large-scale contrastive language-audio pretraining with feature fusion and keyword-to-caption augmentation

Beats: Audio pre-training with acoustic tokenizers

Foundation models in robotics: Applications, challenges, and the future

Onellm: One framework to align all modalities with language

Pandagpt: One model to instruction-follow them all

Pengi: An audio language model for audio tasks