Google Scholar

BRAVE: Broadening the visual encoding of vision-language models

OF Kar, A Tonioni, P Poklukar, A Kulshrestha… - … on Computer Vision, 2024 - Springer

Vision-language models (VLMs) are typically composed of a vision encoder, e.g., CLIP, and a
language model (LM) that interprets the encoded features to solve downstream tasks …

Cited by 27 · All 2 versions

[PDF] arxiv.org

SlowFast-LLaVA: A strong training-free baseline for video large language models

M Xu, M Gao, Z Gan, HY Chen, Z Lai, H Gang… - arXiv preprint arXiv …, 2024 - arxiv.org

We propose SlowFast-LLaVA (or SF-LLaVA for short), a training-free video large language
model (LLM) that can jointly capture detailed spatial semantics and long-range temporal …

Cited by 30 · All 3 versions

[PDF] arxiv.org

OmniGen: Unified image generation

S Xiao, Y Wang, J Zhou, H Yuan, X Xing, R Yan… - arXiv preprint arXiv …, 2024 - arxiv.org

In this work, we introduce OmniGen, a new diffusion model for unified image generation.
Unlike popular diffusion models (e.g., Stable Diffusion), OmniGen no longer requires …

Cited by 25 · All 3 versions

[PDF] techrxiv.org

Multi-modal LLMs in agriculture: A comprehensive review

R Sapkota, R Qureshi, SZ Hassan, J Shutske… - Authorea …, 2024 - techrxiv.org

Given the rapid emergence and applications of Large Language Models (LLMs) across
various scientific fields, insights regarding their applicability in agriculture are still only …

Cited by 5 · All 2 versions

[PDF] arxiv.org

PixWizard: Versatile image-to-image visual assistant with open-language instructions

W Lin, X Wei, R Zhang, L Zhuo, S Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org

This paper presents a versatile image-to-image visual assistant, PixWizard, designed for
image generation, manipulation, and translation based on free-form language instructions …

Cited by 3 · All 3 versions

[PDF] rsc.org

Towards a science exocortex

KG Yager - Digital Discovery, 2024 - pubs.rsc.org

Artificial intelligence (AI) methods are poised to revolutionize intellectual work, with
generative AI enabling automation of text analysis, text generation, and simple decision …

Cited by 2 · All 4 versions

[PDF] arxiv.org

Advancing Semantic Future Prediction through Multimodal Visual Sequence Transformers

E Karypidis, I Kakogeorgiou, S Gidaris… - arXiv preprint arXiv …, 2025 - arxiv.org

Semantic future prediction is important for autonomous systems navigating dynamic
environments. This paper introduces FUTURIST, a method for multimodal future semantic …

All 2 versions

[HTML] sciencedirect.com

Enhancing foundation models for scientific discovery via multimodal knowledge graph representations

V Lopez, L Hoang, M Martinez-Galindo… - Journal of Web …, 2025 - Elsevier

Foundation Models (FMs) hold transformative potential to accelerate scientific
discovery, yet reaching their full capacity in complex, highly multimodal domains such as …

All 2 versions

[PDF] arxiv.org

BiFold: Bimanual Cloth Folding with Language Guidance

O Barbany, A Colomé, C Torras - arXiv preprint arXiv:2501.16458, 2025 - arxiv.org

Cloth folding is a complex task due to the inevitable self-occlusions of clothes, their
complicated dynamics, and the disparate materials, geometries, and textures that garments …

[PDF] arxiv.org

Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions

J Zhang, O Liu, T Yu, J Hu, W Neiswanger - arXiv preprint arXiv …, 2024 - arxiv.org

Multimodal large language models (MLLMs) have made rapid progress in recent years, yet
continue to struggle with low-level visual perception (LLVP)--particularly the ability to …

4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities

BRAVE: Broadening the visual encoding of vision-language models
