Scaling proprioceptive-visual learning with heterogeneous pre-trained transformers

L Wang, X Chen, J Zhao, K He - Advances in Neural …, 2025 - proceedings.neurips.cc
One of the roadblocks for training generalist robotic models today is heterogeneity. Previous
robot learning methods often collect data to train with one specific embodiment for one task …

Mini-InternVL: a flexible-transfer pocket multi-modal model with 5% parameters and 90% performance

Z Gao, Z Chen, E Cui, Y Ren, W Wang, J Zhu, H Tian… - Visual Intelligence, 2024 - Springer
Multi-modal large language models (MLLMs) have demonstrated impressive performance in
vision-language tasks across a wide range of domains. However, the large model scale and …

Enhancing the reasoning ability of multimodal large language models via mixed preference optimization

W Wang, Z Chen, W Wang, Y Cao, Y Liu, Z Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
Existing open-source multimodal large language models (MLLMs) generally follow a
training process involving pre-training and supervised fine-tuning. However, these models …

Aria: An open multimodal native mixture-of-experts model

D Li, Y Liu, H Wu, Y Wang, Z Shen, B Qu, X Niu… - arXiv preprint arXiv …, 2024 - arxiv.org
Information comes in diverse modalities. Multimodal native AI models are essential to
integrate real-world information and deliver comprehensive understanding. While …

Your mixture-of-experts LLM is secretly an embedding model for free

Z Li, T Zhou - arXiv preprint arXiv:2410.10814, 2024 - arxiv.org
While large language models (LLMs) excel on generation tasks, their decoder-only
architecture often limits their potential as embedding models if no further representation …

Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models

W Liang, L Yu, L Luo, S Iyer, N Dong, C Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
The development of large language models (LLMs) has expanded to multi-modal systems
capable of processing text, images, and speech within a unified framework. Training these …

LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models

NV Nguyen, TT Doan, L Tran, V Nguyen… - arXiv preprint arXiv …, 2024 - arxiv.org
Mixture of Experts (MoE) plays an important role in the development of more efficient and
effective large language models (LLMs). Due to the enormous resource requirements …

A Survey of Embodied AI in Healthcare: Techniques, Applications, and Opportunities

Y Liu, X Cao, T Chen, Y Jiang, J You, M Wu… - arXiv preprint arXiv …, 2025 - arxiv.org
Healthcare systems worldwide face persistent challenges in efficiency, accessibility, and
personalization. Powered by modern AI technologies such as multimodal large language …

LlamaFusion: Adapting Pretrained Language Models for Multimodal Generation

W Shi, X Han, C Zhou, W Liang, XV Lin… - arXiv preprint arXiv …, 2024 - arxiv.org
We present LlamaFusion, a framework for empowering pretrained text-only large language
models (LLMs) with multimodal generative capabilities, enabling them to understand and …

EVEv2: Improved Baselines for Encoder-Free Vision-Language Models

H Diao, X Li, Y Cui, Y Wang, H Deng, T Pan… - arXiv preprint arXiv …, 2025 - arxiv.org
Existing encoder-free vision-language models (VLMs) are rapidly narrowing the
performance gap with their encoder-based counterparts, highlighting the promising potential …