Challenges and applications of large language models

J Kaddour, J Harris, M Mozes, H Bradley… - arxiv preprint arxiv …, 2023 - arxiv.org

From google gemini to openai q* (q-star): A survey of reshaping the generative artificial intelligence (ai) research landscape
TR McIntosh, T Susnjak, T Liu, P Watters… - arxiv preprint arxiv …, 2023 - arxiv.org
This comprehensive survey explored the evolving landscape of generative Artificial
Intelligence (AI), with a specific focus on the transformative impacts of Mixture of Experts …

Deepseekmoe: Towards ultimate expert specialization in mixture-of-experts language models

D Dai, C Deng, C Zhao, RX Xu, H Gao, D Chen… - arxiv preprint arxiv …, 2024 - arxiv.org
In the era of large language models, Mixture-of-Experts (MoE) is a promising architecture for
managing computational costs when scaling up model parameters. However, conventional …

Orca: A distributed serving system for Transformer-Based generative models

GI Yu, JS Jeong, GW Kim, S Kim, BG Chun - 16th USENIX Symposium …, 2022 - usenix.org
Large-scale Transformer-based models trained for generation tasks (e.g., GPT-3) have
recently attracted huge interest, emphasizing the need for system support for serving models …

Efficient large language models: A survey

Z Wan, X Wang, C Liu, S Alam, Y Zheng, J Liu… - arxiv preprint arxiv …, 2023 - arxiv.org
Large Language Models (LLMs) have demonstrated remarkable capabilities in important
tasks such as natural language understanding and language generation, and thus have the …

Deepspeed-inference: enabling efficient inference of transformer models at unprecedented scale

RY Aminabadi, S Rajbhandari, AA Awan… - … Conference for High …, 2022 - ieeexplore.ieee.org
The landscape of transformer model inference is increasingly diverse in model size, model
characteristics, latency and throughput requirements, hardware requirements, etc. With such …

Making ai less "thirsty": Uncovering and addressing the secret water footprint of ai models

P Li, J Yang, MA Islam, S Ren - arxiv preprint arxiv:2304.03271, 2023 - arxiv.org
The growing carbon footprint of artificial intelligence (AI) models, especially large ones such
as GPT-3, has been undergoing public scrutiny. Unfortunately, however, the equally …

Towards efficient generative large language model serving: A survey from algorithms to systems

X Miao, G Oliaro, Z Zhang, X Cheng, H **… - arxiv preprint arxiv …, 2023 - arxiv.org
In the rapidly evolving landscape of artificial intelligence (AI), generative large language
models (LLMs) stand at the forefront, revolutionizing how we interact with our data. However …

A review of sparse expert models in deep learning

W Fedus, J Dean, B Zoph - arxiv preprint arxiv:2209.01667, 2022 - arxiv.org
Sparse expert models are a thirty-year-old concept re-emerging as a popular architecture in
deep learning. This class of architecture encompasses Mixture-of-Experts, Switch …

A survey on mixture of experts

W Cai, J Jiang, F Wang, J Tang, S Kim… - arxiv preprint arxiv …, 2024 - arxiv.org
Large language models (LLMs) have driven unprecedented advances across
diverse fields, ranging from natural language processing to computer vision and beyond …