Knowledge graphs meet multi-modal learning: A comprehensive survey

Z Chen, Y Zhang, Y Fang, Y Geng, L Guo… - arXiv preprint arXiv …, 2024 - arxiv.org
Knowledge Graphs (KGs) play a pivotal role in advancing various AI applications, with the
semantic web community's exploration into multi-modal dimensions unlocking new avenues …

Qwen2.5 technical report

A Yang, B Yang, B Zhang, B Hui, B Zheng, B Yu… - arXiv preprint arXiv …, 2024 - arxiv.org
In this report, we introduce Qwen2.5, a comprehensive series of large language models
(LLMs) designed to meet diverse needs. Compared to previous iterations, Qwen2.5 has …

MM1: methods, analysis and insights from multimodal LLM pre-training

B McKinzie, Z Gan, JP Fauconnier, S Dodge… - … on Computer Vision, 2024 - Springer
In this work, we discuss building performant Multimodal Large Language Models (MLLMs).
In particular, we study the importance of various architecture components and data choices …

LlamaFactory: Unified efficient fine-tuning of 100+ language models

Y Zheng, R Zhang, J Zhang, Y Ye, Z Luo… - arXiv preprint arXiv …, 2024 - arxiv.org
Efficient fine-tuning is vital for adapting large language models (LLMs) to downstream tasks.
However, it requires non-trivial efforts to implement these methods on different models. We …
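
The per-model wiring this snippet alludes to usually looks like the short sketch below, which attaches LoRA adapters to a causal LM with the Hugging Face peft library; the checkpoint name, target modules, and hyperparameters are illustrative assumptions, not values taken from the paper.

```python
# Illustrative sketch of the per-model PEFT boilerplate that unified frameworks
# such as LlamaFactory abstract away. Checkpoint name, target modules, and
# hyperparameters are assumptions, not from the paper.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")  # assumed checkpoint

lora_cfg = LoraConfig(
    r=8,                                   # rank of the low-rank update
    lora_alpha=16,                         # scaling applied to the update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections; names are model-specific
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)     # freezes base weights, adds trainable adapters
model.print_trainable_parameters()         # only the small LoRA matrices are trainable
```

Because target module names, chat templates, and data formats differ across model families, a unified framework hides this boilerplate behind a single declarative configuration.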

PiSSA: Principal singular values and singular vectors adaptation of large language models

F Meng, Z Wang, M Zhang - Advances in Neural …, 2025 - proceedings.neurips.cc
To parameter-efficiently fine-tune (PEFT) large language models (LLMs), the low-rank
adaptation (LoRA) method approximates the model changes $\Delta W\in\mathbb …
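
A minimal sketch of the idea the snippet describes: LoRA approximates the weight change $\Delta W$ with low-rank factors, and PiSSA initializes those factors from the principal singular values and vectors of the frozen weight rather than from zero. The matrix sizes and rank below are illustrative assumptions, not the authors' implementation.

```python
# Sketch of PiSSA-style initialization: factor the frozen weight W by SVD, make the
# top-r principal components the trainable low-rank adapter, and freeze the residual.
# Matrix sizes and rank are illustrative assumptions.
import torch

def pissa_init(W: torch.Tensor, r: int):
    """Split W into trainable principal factors (A, B) and a frozen residual W_res."""
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    sqrt_s = torch.sqrt(S[:r])
    A = U[:, :r] * sqrt_s            # (m, r), trainable
    B = sqrt_s[:, None] * Vh[:r]     # (r, n), trainable
    W_res = W - A @ B                # frozen residual holds the non-principal part
    return A, B, W_res

# The effective weight during fine-tuning is W_res + A @ B, so training starts from
# the directions carrying most of W's energy instead of from a zero update.
W = torch.randn(512, 256)
A, B, W_res = pissa_init(W, r=16)
assert torch.allclose(W_res + A @ B, W, atol=1e-4)
```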

MiniCache: KV cache compression in depth dimension for large language models

A Liu, J Liu, Z Pan, Y He, R Haffari… - Advances in Neural …, 2025 - proceedings.neurips.cc
A critical approach for efficiently deploying computationally demanding large language
models (LLMs) is Key-Value (KV) caching. The KV cache stores key-value states of …
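
For context, the sketch below shows the plain KV cache the snippet refers to: each decoding step appends the new token's key/value states so earlier tokens are not re-encoded, and memory grows with layers × sequence length × heads × head dimension, which is what depth-wise compression methods such as MiniCache target. The tensor shapes are illustrative assumptions; this is not the paper's compression scheme.

```python
# Minimal KV cache sketch for autoregressive decoding (not MiniCache's compression):
# keys/values are appended per step so attention over the prefix is never recomputed.
# Tensor shapes are illustrative assumptions.
import torch

class KVCache:
    def __init__(self):
        self.k = None  # (batch, heads, seq_len, head_dim)
        self.v = None

    def update(self, k_new: torch.Tensor, v_new: torch.Tensor):
        """Append this step's key/value states and return the full cache."""
        if self.k is None:
            self.k, self.v = k_new, v_new
        else:
            self.k = torch.cat([self.k, k_new], dim=2)
            self.v = torch.cat([self.v, v_new], dim=2)
        return self.k, self.v

cache = KVCache()
for _ in range(4):                      # four decoding steps
    k_t = torch.randn(1, 8, 1, 64)      # current token's keys: (batch, heads, 1, head_dim)
    v_t = torch.randn(1, 8, 1, 64)
    k_all, v_all = cache.update(k_t, v_t)
print(k_all.shape)                      # torch.Size([1, 8, 4, 64]): one entry per decoded token
```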

DeepSeek-V3 technical report

A Liu, B Feng, B Xue, B Wang, B Wu, C Lu… - arXiv preprint arXiv …, 2024 - arxiv.org
We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B
total parameters, of which 37B are activated for each token. To achieve efficient inference and cost …
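
The 37B-of-671B figure reflects sparse expert activation; a generic top-k routing layer, sketched below, shows how only a few experts run per token. This is not DeepSeek-V3's actual router or gating scheme; the expert count, k, and dimensions are assumptions.

```python
# Generic top-k MoE routing sketch: each token is sent to k of n_experts experts,
# so only a fraction of the layer's parameters is activated per token. This is not
# DeepSeek-V3's router; expert count, k, and sizes are illustrative assumptions.
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                         # x: (tokens, d_model)
        scores = self.gate(x)                     # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)
        weights = weights.softmax(dim=-1)         # normalize over the k selected experts
        out = torch.zeros_like(x)
        for slot in range(self.k):                # each token runs only k experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

moe = TopKMoE()
y = moe(torch.randn(16, 64))                      # 16 tokens, each using 2 of 8 experts
print(y.shape)                                    # torch.Size([16, 64])
```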

A survey on mixture of experts

W Cai, J Jiang, F Wang, J Tang, S Kim… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have achieved unprecedented advances across
diverse fields, ranging from natural language processing to computer vision and beyond …

LLaMA-MoE: Building mixture-of-experts from LLaMA with continual pre-training

T Zhu, X Qu, D Dong, J Ruan, J Tong… - Proceedings of the …, 2024 - aclanthology.org
Mixture-of-Experts (MoE) has gained increasing popularity as a promising
framework for scaling up large language models (LLMs). However, training MoE from …
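
One way to read "building mixture-of-experts from LLaMA" is sketched below: carve the experts out of an existing dense FFN by partitioning its intermediate neurons, then continue pre-training. The even random split and the simplified (non-gated) FFN are assumptions for illustration, not the paper's exact construction.

```python
# Hedged sketch of building experts by partitioning a dense FFN's intermediate
# neurons. The random, even split and the simplified (non-gated) FFN are
# illustrative assumptions, not the paper's exact recipe.
import torch
import torch.nn as nn

def split_ffn_into_experts(w_up: torch.Tensor, w_down: torch.Tensor, n_experts: int):
    """Partition a dense FFN (d_model -> d_ff -> d_model) into n_experts smaller FFNs.

    w_up:   (d_ff, d_model) weight of the up projection
    w_down: (d_model, d_ff) weight of the down projection
    """
    d_ff = w_up.shape[0]
    perm = torch.randperm(d_ff)                    # assumed: random neuron assignment
    experts = []
    for idx in perm.chunk(n_experts):              # each expert gets a disjoint neuron slice
        up = nn.Linear(w_up.shape[1], len(idx), bias=False)
        down = nn.Linear(len(idx), w_down.shape[0], bias=False)
        up.weight.data.copy_(w_up[idx])            # rows of the up projection
        down.weight.data.copy_(w_down[:, idx])     # matching columns of the down projection
        experts.append(nn.Sequential(up, nn.SiLU(), down))
    return nn.ModuleList(experts)

experts = split_ffn_into_experts(torch.randn(1024, 256), torch.randn(256, 1024), n_experts=4)
print(len(experts), experts[0][0].weight.shape)    # 4 experts, each holding 256 of 1024 neurons
```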

OpenMoE: An early effort on open mixture-of-experts language models

F Xue, Z Zheng, Y Fu, J Ni, Z Zheng, W Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
To help the open-source community have a better understanding of Mixture-of-Experts
(MoE) based large language models (LLMs), we train and release OpenMoE, a series of …