Scaling proprioceptive-visual learning with heterogeneous pre-trained transformers
L Wang, X Chen, J Zhao, K He - Advances in Neural …, 2025 - proceedings.neurips.cc
One of the roadblocks for training generalist robotic models today is heterogeneity. Previous
robot learning methods often collect data to train with one specific embodiment for one task …
Mini-InternVL: a flexible-transfer pocket multi-modal model with 5% parameters and 90% performance
Multi-modal large language models (MLLMs) have demonstrated impressive performance in
vision-language tasks across a wide range of domains. However, the large model scale and …
Enhancing the reasoning ability of multimodal large language models via mixed preference optimization
Existing open-source multimodal large language models (MLLMs) generally follow a
training process involving pre-training and supervised fine-tuning. However, these models …
Aria: An open multimodal native mixture-of-experts model
Information comes in diverse modalities. Multimodal native AI models are essential to
integrate real-world information and deliver comprehensive understanding. While …
Your Mixture-of-Experts LLM is secretly an embedding model for free
While large language models (LLMs) excel on generation tasks, their decoder-only
architecture often limits their potential as embedding models if no further representation …
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models
The development of large language models (LLMs) has expanded to multi-modal systems
capable of processing text, images, and speech within a unified framework. Training these …
LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models
Mixture of Experts (MoE) plays an important role in the development of more efficient and
effective large language models (LLMs). Due to the enormous resource requirements …
A Survey of Embodied AI in Healthcare: Techniques, Applications, and Opportunities
Healthcare systems worldwide face persistent challenges in efficiency, accessibility, and
personalization. Powered by modern AI technologies such as multimodal large language …
LlamaFusion: Adapting Pretrained Language Models for Multimodal Generation
We present LlamaFusion, a framework for empowering pretrained text-only large language
models (LLMs) with multimodal generative capabilities, enabling them to understand and …
EVEv2: Improved Baselines for Encoder-Free Vision-Language Models
Existing encoder-free vision-language models (VLMs) are rapidly narrowing the
performance gap with their encoder-based counterparts, highlighting the promising potential …