Knowledge graphs meet multi-modal learning: A comprehensive survey

Z Chen, Y Zhang, Y Fang, Y Geng, L Guo… - arXiv preprint arXiv …, 2024 - arxiv.org
Knowledge Graphs (KGs) play a pivotal role in advancing various AI applications, with the
semantic web community's exploration into multi-modal dimensions unlocking new avenues …

Qwen2.5 technical report

A Yang, B Yang, B Zhang, B Hui, B Zheng, B Yu… - arXiv preprint arXiv …, 2024 - arxiv.org
In this report, we introduce Qwen2.5, a comprehensive series of large language models
(LLMs) designed to meet diverse needs. Compared to previous iterations, Qwen2.5 has …

MM1: methods, analysis and insights from multimodal LLM pre-training

B McKinzie, Z Gan, JP Fauconnier, S Dodge… - … on Computer Vision, 2024 - Springer
In this work, we discuss building performant Multimodal Large Language Models (MLLMs).
In particular, we study the importance of various architecture components and data choices …

LlamaFactory: Unified efficient fine-tuning of 100+ language models

Y Zheng, R Zhang, J Zhang, Y Ye, Z Luo… - arXiv preprint arXiv …, 2024 - arxiv.org
Efficient fine-tuning is vital for adapting large language models (LLMs) to downstream tasks.
However, it requires non-trivial efforts to implement these methods on different models. We …
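
The per-model wiring this snippet alludes to usually looks like the short sketch below, which attaches LoRA adapters to a causal LM with the Hugging Face peft library; the checkpoint name, target modules, and hyperparameters are illustrative assumptions, not values taken from the paper.

```python
# Illustrative sketch of the per-model PEFT boilerplate that unified frameworks
# such as LlamaFactory abstract away. Checkpoint name, target modules, and
# hyperparameters are assumptions, not from the paper.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")  # assumed checkpoint

lora_cfg = LoraConfig(
    r=8,                                   # rank of the low-rank update
    lora_alpha=16,                         # scaling applied to the update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections; names are model-specific
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)     # freezes base weights, adds trainable adapters
model.print_trainable_parameters()         # only the small LoRA matrices are trainable
```

Because target module names, chat templates, and data formats differ across model families, a unified framework hides this boilerplate behind a single declarative configuration.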

PiSSA: Principal singular values and singular vectors adaptation of large language models

F Meng, Z Wang, M Zhang - Advances in Neural …, 2025 - proceedings.neurips.cc
To parameter-efficiently fine-tune (PEFT) large language models (LLMs), the low-rank
adaptation (LoRA) method approximates the model changes $\Delta W\in\mathbb …
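
A minimal sketch of the idea the snippet describes: LoRA approximates the weight change $\Delta W$ with low-rank factors, and PiSSA initializes those factors from the principal singular values and vectors of the frozen weight rather than from zero. The matrix sizes and rank below are illustrative assumptions, not the authors' implementation.

```python
# Sketch of PiSSA-style initialization: factor the frozen weight W by SVD, make the
# top-r principal components the trainable low-rank adapter, and freeze the residual.
# Matrix sizes and rank are illustrative assumptions.
import torch

def pissa_init(W: torch.Tensor, r: int):
    """Split W into trainable principal factors (A, B) and a frozen residual W_res."""
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    sqrt_s = torch.sqrt(S[:r])
    A = U[:, :r] * sqrt_s            # (m, r), trainable
    B = sqrt_s[:, None] * Vh[:r]     # (r, n), trainable
    W_res = W - A @ B                # frozen residual holds the non-principal part
    return A, B, W_res

# The effective weight during fine-tuning is W_res + A @ B, so training starts from
# the directions carrying most of W's energy instead of from a zero update.
W = torch.randn(512, 256)
A, B, W_res = pissa_init(W, r=16)
assert torch.allclose(W_res + A @ B, W, atol=1e-4)
```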

MiniCache: KV cache compression in depth dimension for large language models

A Liu, J Liu, Z Pan, Y He, R Haffari… - Advances in Neural …, 2025 - proceedings.neurips.cc
A critical approach for efficiently deploying computationally demanding large language
models (LLMs) is Key-Value (KV) caching. The KV cache stores key-value states of …
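
For context, the sketch below shows the plain KV cache the snippet refers to: each decoding step appends the new token's key/value states so earlier tokens are not re-encoded, and memory grows with layers × sequence length × heads × head dimension, which is what depth-wise compression methods such as MiniCache target. The tensor shapes are illustrative assumptions; this is not the paper's compression scheme.

```python
# Minimal KV cache sketch for autoregressive decoding (not MiniCache's compression):
# keys/values are appended per step so attention over the prefix is never recomputed.
# Tensor shapes are illustrative assumptions.
import torch

class KVCache:
    def __init__(self):
        self.k = None  # (batch, heads, seq_len, head_dim)
        self.v = None

    def update(self, k_new: torch.Tensor, v_new: torch.Tensor):
        """Append this step's key/value states and return the full cache."""
        if self.k is None:
            self.k, self.v = k_new, v_new
        else:
            self.k = torch.cat([self.k, k_new], dim=2)
            self.v = torch.cat([self.v, v_new], dim=2)
        return self.k, self.v

cache = KVCache()
for _ in range(4):                      # four decoding steps
    k_t = torch.randn(1, 8, 1, 64)      # current token's keys: (batch, heads, 1, head_dim)
    v_t = torch.randn(1, 8, 1, 64)
    k_all, v_all = cache.update(k_t, v_t)
print(k_all.shape)                      # torch.Size([1, 8, 4, 64]): one entry per decoded token
```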

DeepSeek-V3 technical report

A Liu, B Feng, B Xue, B Wang, B Wu, C Lu… - arXiv preprint arXiv …, 2024 - arxiv.org
We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B
total parameters, of which 37B are activated for each token. To achieve efficient inference and cost …
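
The 37B-of-671B figure reflects sparse expert activation; a generic top-k routing layer, sketched below, shows how only a few experts run per token. This is not DeepSeek-V3's actual router or gating scheme; the expert count, k, and dimensions are assumptions.

```python
# Generic top-k MoE routing sketch: each token is sent to k of n_experts experts,
# so only a fraction of the layer's parameters is activated per token. This is not
# DeepSeek-V3's router; expert count, k, and sizes are illustrative assumptions.
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                         # x: (tokens, d_model)
        scores = self.gate(x)                     # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)
        weights = weights.softmax(dim=-1)         # normalize over the k selected experts
        out = torch.zeros_like(x)
        for slot in range(self.k):                # each token runs only k experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

moe = TopKMoE()
y = moe(torch.randn(16, 64))                      # 16 tokens, each using 2 of 8 experts
print(y.shape)                                    # torch.Size([16, 64])
```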

A survey on mixture of experts

W Cai, J Jiang, F Wang, J Tang, S Kim… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have achieved unprecedented advances across
diverse fields, ranging from natural language processing to computer vision and beyond …

LLaMA-MoE: Building mixture-of-experts from LLaMA with continual pre-training

T Zhu, X Qu, D Dong, J Ruan, J Tong… - Proceedings of the …, 2024 - aclanthology.org
Mixture-of-Experts (MoE) has gained increasing popularity as a promising
framework for scaling up large language models (LLMs). However, training MoE from …
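
One way to read "building mixture-of-experts from LLaMA" is sketched below: carve the experts out of an existing dense FFN by partitioning its intermediate neurons, then continue pre-training. The even random split and the simplified (non-gated) FFN are assumptions for illustration, not the paper's exact construction.

```python
# Hedged sketch of building experts by partitioning a dense FFN's intermediate
# neurons. The random, even split and the simplified (non-gated) FFN are
# illustrative assumptions, not the paper's exact recipe.
import torch
import torch.nn as nn

def split_ffn_into_experts(w_up: torch.Tensor, w_down: torch.Tensor, n_experts: int):
    """Partition a dense FFN (d_model -> d_ff -> d_model) into n_experts smaller FFNs.

    w_up:   (d_ff, d_model) weight of the up projection
    w_down: (d_model, d_ff) weight of the down projection
    """
    d_ff = w_up.shape[0]
    perm = torch.randperm(d_ff)                    # assumed: random neuron assignment
    experts = []
    for idx in perm.chunk(n_experts):              # each expert gets a disjoint neuron slice
        up = nn.Linear(w_up.shape[1], len(idx), bias=False)
        down = nn.Linear(len(idx), w_down.shape[0], bias=False)
        up.weight.data.copy_(w_up[idx])            # rows of the up projection
        down.weight.data.copy_(w_down[:, idx])     # matching columns of the down projection
        experts.append(nn.Sequential(up, nn.SiLU(), down))
    return nn.ModuleList(experts)

experts = split_ffn_into_experts(torch.randn(1024, 256), torch.randn(256, 1024), n_experts=4)
print(len(experts), experts[0][0].weight.shape)    # 4 experts, each holding 256 of 1024 neurons
```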

OpenMoE: An early effort on open mixture-of-experts language models

F Xue, Z Zheng, Y Fu, J Ni, Z Zheng, W Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
To help the open-source community have a better understanding of Mixture-of-Experts
(MoE) based large language models (LLMs), we train and release OpenMoE, a series of …