Gmlake: Efficient and transparent gpu memory defragmentation for large-scale dnn training with virtual memory stitching

C Guo, R Zhang, J Xu, J Leng, Z Liu, Z Huang… - Proceedings of the 29th …, 2024‏ - dl.acm.org
Large-scale deep neural networks (DNNs), such as large language models (LLMs), have
revolutionized the artificial intelligence (AI) field and become increasingly popular. However …

A survey on dynamic neural networks for natural language processing

C Xu, J McAuley - arxiv preprint arxiv:2202.07101, 2022‏ - arxiv.org
Effectively scaling large Transformer models is a main driver of recent advances in natural
language processing. Dynamic neural networks, as an emerging research direction, are …