LLM Inference Serving: Survey of Recent Advances and Opportunities
B Li, Y Jiang, V Gadepally, D Tiwari - arXiv preprint, 2024 - arxiv.org
Mixture of experts (MoE) is a popular technique in deep learning that improves model
capacity with conditionally-activated parallel neural network modules (experts). However …
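A minimal sketch of the conditionally-activated experts described in this snippet, using top-k token routing. The class and parameter names (SimpleMoE, num_experts, top_k) are illustrative assumptions, not taken from the surveyed papers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMoE(nn.Module):
    """Illustrative MoE layer: each token activates only top_k of num_experts."""
    def __init__(self, d_model=256, d_hidden=512, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)          # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                                      # x: (tokens, d_model)
        scores = self.router(x)                                # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)         # keep top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            rows, slots = (idx == e).nonzero(as_tuple=True)    # tokens routed to expert e
            if rows.numel() == 0:
                continue                                       # expert inactive for this batch
            out[rows] += weights[rows, slots, None] * expert(x[rows])
        return out

tokens = torch.randn(16, 256)
print(SimpleMoE()(tokens).shape)   # torch.Size([16, 256])
```

Because only top_k experts run per token, total parameters grow with num_experts while per-token compute stays roughly constant, which is the capacity/compute trade-off these papers build on.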
Optimizing Mixture-of-Experts Inference Time Combining Model Deployment and Communication Scheduling
As machine learning models scale in size and complexity, their computational requirements
become a significant barrier. Mixture-of-Experts (MoE) models alleviate this issue by …
FLAME: Fully Leveraging MoE Sparsity for Transformer on FPGA
X Lin, H Tian, W Xue, L Ma, J Cao, M Zhang… - Proceedings of the 61st …, 2024 - dl.acm.org
The MoE (Mixture-of-Experts) mechanism has been widely adopted in transformer-based models
to facilitate further expansion of model parameter size and enhance generalization …
APTMoE: Affinity-Aware Pipeline Tuning for MoE Models on Bandwidth-Constrained GPU Nodes
Recently, the sparsely-gated Mixture-Of-Experts (MoE) architecture has garnered significant
attention. To benefit a wider audience, fine-tuning MoE models on more affordable clusters …