From Google Gemini to OpenAI Q* (Q-Star): A survey of reshaping the generative artificial intelligence (AI) research landscape

TR McIntosh, T Susnjak, T Liu, P Watters… - arXiv preprint arXiv …, 2023 - arxiv.org
This comprehensive survey explored the evolving landscape of generative Artificial
Intelligence (AI), with a specific focus on the transformative impacts of Mixture of Experts …

A survey on scheduling techniques in computing and network convergence

S Tang, Y Yu, H Wang, G Wang, W Chen… - … Surveys & Tutorials, 2023 - ieeexplore.ieee.org
The computing demand for massive applications has led to the ubiquitous deployment of
computing power. This trend results in the urgent need for higher-level computing resource …

A survey on mixture of experts

W Cai, J Jiang, F Wang, J Tang, S Kim… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have garnered unprecedented advancements across
diverse fields, ranging from natural language processing to computer vision and beyond …

MegaBlocks: Efficient sparse training with mixture-of-experts

T Gale, D Narayanan, C Young… - … of Machine Learning …, 2023 - proceedings.mlsys.org
We present MegaBlocks, a system for efficient Mixture-of-Experts (MoE) training on GPUs.
Our system is motivated by the limitations of current frameworks, which restrict the dynamic …

Enhancing Simplified Chinese poetry comprehension in LLaMA-7B: A novel approach to mimic mixture of experts effect

Y Zhang, X Chen - 2023 - researchsquare.com
This study explored the potential of manual augmentation in enhancing the comprehension
and translation capabilities of large language models, specifically focusing on the LLaMA …

Accelerating distributed MoE training and inference with Lina

J Li, Y Jiang, Y Zhu, C Wang, H Xu - 2023 USENIX Annual Technical …, 2023 - usenix.org
Scaling model parameters improves model quality at the price of high computation
overhead. Sparsely activated models, usually in the form of Mixture of Experts (MoE) …

Pre-gated MoE: An algorithm-system co-design for fast and scalable mixture-of-expert inference

R Hwang, J Wei, S Cao, C Hwang… - 2024 ACM/IEEE 51st …, 2024 - ieeexplore.ieee.org
Large language models (LLMs) based on transformers have made significant strides in
recent years, the success of which is driven by scaling up their model size. Despite their high …

ScheMoE: An extensible mixture-of-experts distributed training system with tasks scheduling

S Shi, X Pan, Q Wang, C Liu, X Ren, Z Hu… - Proceedings of the …, 2024 - dl.acm.org
In recent years, large-scale models have been scaled to trillions of parameters with
sparsely activated mixture-of-experts (MoE), which significantly improves the model quality …

Janus: A unified distributed training framework for sparse mixture-of-experts models

J Liu, JH Wang, Y Jiang - Proceedings of the ACM SIGCOMM 2023 …, 2023 - dl.acm.org
Scaling models to large sizes to improve performance has become a trend in deep learning, and
sparsely activated Mixture-of-Expert (MoE) is a promising architecture to scale models …

A hybrid tensor-expert-data parallelism approach to optimize mixture-of-experts training

S Singh, O Ruwase, AA Awan, S Rajbhandari… - Proceedings of the 37th …, 2023 - dl.acm.org
Mixture-of-Experts (MoE) is a neural network architecture that adds sparsely activated expert
blocks to a base model, increasing the number of parameters without impacting …
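
Note: the entries above repeatedly refer to sparsely activated Mixture-of-Experts layers, in which a router sends each token to only a few expert blocks, so the parameter count grows with the number of experts while per-token compute grows only with the number of experts selected. The following is a minimal NumPy sketch of that idea with top-k gating; the dimensions, the two-layer ReLU experts, and the softmax-over-selected-experts gating are illustrative assumptions, not the design of any specific system surveyed here.

# Minimal sketch of a sparsely activated MoE layer with top-k gating (NumPy).
# All sizes and parameter initializations are hypothetical, for illustration only.
import numpy as np

rng = np.random.default_rng(0)

d_model, d_hidden = 8, 16   # toy dimensions
num_experts, top_k = 4, 2   # only top_k experts are evaluated per token

# Each expert is a small two-layer MLP with random toy weights.
experts = [
    (rng.standard_normal((d_model, d_hidden)) * 0.1,
     rng.standard_normal((d_hidden, d_model)) * 0.1)
    for _ in range(num_experts)
]
router_w = rng.standard_normal((d_model, num_experts)) * 0.1  # gating weights

def moe_forward(x):
    """Route each token to its top_k experts and mix their outputs."""
    logits = x @ router_w                        # (tokens, num_experts)
    top_idx = np.argsort(-logits, axis=-1)[:, :top_k]
    out = np.zeros_like(x)
    for t, token in enumerate(x):                # per-token dispatch (toy loop)
        chosen = top_idx[t]
        gate = np.exp(logits[t, chosen])
        gate /= gate.sum()                       # softmax over the selected experts
        for g, e in zip(gate, chosen):
            w1, w2 = experts[e]
            out[t] += g * (np.maximum(token @ w1, 0.0) @ w2)  # ReLU MLP expert
    return out

tokens = rng.standard_normal((3, d_model))       # batch of 3 toy tokens
print(moe_forward(tokens).shape)                 # (3, 8): params scale with num_experts,
                                                 # per-token compute only with top_k

In practice this per-token dispatch is batched and sharded across devices, which is the systems problem the MegaBlocks, Lina, ScheMoE, Janus, and hybrid tensor-expert-data parallelism entries above target.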