MoE-CAP: Cost-Accuracy-Performance Benchmarking for Mixture-of-Experts Systems

Y Fu, Y Jiang, Y Huang, P Nie, Z Lu, L Xue… - arXiv preprint arXiv …, 2024 - arxiv.org
The sparse Mixture-of-Experts (MoE) architecture is increasingly favored for scaling Large
Language Models (LLMs) efficiently; however, MoE systems rely on heterogeneous …