Pre-trained language models and their applications

H Wang, J Li, H Wu, E Hovy, Y Sun - Engineering, 2023 - Elsevier
Pre-trained language models have achieved striking success in natural language
processing (NLP), leading to a paradigm shift from supervised learning to pre-training …

A review of sparse expert models in deep learning

W Fedus, J Dean, B Zoph - arXiv preprint arXiv:2209.01667, 2022 - arxiv.org
Sparse expert models are a thirty-year-old concept re-emerging as a popular architecture in
deep learning. This class of architecture encompasses Mixture-of-Experts, Switch …
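
To make the surveyed architecture class concrete, here is a minimal NumPy sketch of a dense Mixture-of-Experts layer for a single token, assuming the standard formulation: a learned gate produces softmax weights over experts, and the layer output is the gate-weighted combination of the expert outputs. All names and sizes are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts = 8, 4
x = rng.normal(size=d_model)                     # one token's hidden state

# Each "expert" stands in for a feed-forward block (here: one matrix).
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]
W_gate = rng.normal(size=(d_model, n_experts))   # gating network weights

# Gating network: softmax over per-expert logits.
logits = x @ W_gate
gates = np.exp(logits - logits.max())
gates /= gates.sum()

# Dense MoE output: gate-weighted combination of all expert outputs.
y = sum(g * (x @ E) for g, E in zip(gates, experts))
```

Sparse variants such as Switch (below) save compute by evaluating only the top-scoring expert(s) instead of all of them.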

GPT3.int8(): 8-bit matrix multiplication for transformers at scale

T Dettmers, M Lewis, Y Belkada… - Advances in Neural …, 2022 - proceedings.neurips.cc
Large language models have been widely adopted but require significant GPU memory for
inference. We develop a procedure for Int8 matrix multiplication for feed-forward and …
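
The core idea can be sketched in a few lines: quantize activations and weights to int8 with per-row and per-column absmax scales, multiply in integer arithmetic, then dequantize. This illustrative NumPy sketch omits the paper's mixed-precision decomposition for outlier features; all names and sizes are assumptions.

```python
import numpy as np

def quantize_absmax(x, axis):
    # Symmetric absmax quantization to int8 along the given axis.
    scale = np.abs(x).max(axis=axis, keepdims=True) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 16)).astype(np.float32)  # activations
B = rng.normal(size=(16, 8)).astype(np.float32)  # weights

Aq, a_scale = quantize_absmax(A, axis=1)         # per-row scales
Bq, b_scale = quantize_absmax(B, axis=0)         # per-column scales

# Multiply in integer arithmetic (accumulate in int32), then dequantize.
C = (Aq.astype(np.int32) @ Bq.astype(np.int32)) * a_scale * b_scale

print(np.abs(C - A @ B).max())                   # small quantization error
```

The memory saving comes from storing the weights in 8 bits; the float scales are negligible by comparison.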

PaLM: Scaling language modeling with Pathways

A Chowdhery, S Narang, J Devlin, M Bosma… - Journal of Machine …, 2023 - jmlr.org
Large language models have been shown to achieve remarkable performance across a
variety of natural language tasks using few-shot learning, which drastically reduces the …
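
"Few-shot learning" here means in-context learning: worked examples are placed directly in the prompt and the model completes the pattern with no gradient updates. A minimal illustration (the translation-style prompt is hypothetical, in the style popularized by the GPT-3 paper):

```python
# A few-shot prompt concatenates worked examples with a new query; the
# model is expected to continue the pattern without any fine-tuning.
prompt = (
    "Translate English to French.\n"
    "sea otter => loutre de mer\n"
    "cheese => fromage\n"
    "plush giraffe => "   # the model completes from here
)
```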

Orca: A distributed serving system for Transformer-Based generative models

GI Yu, JS Jeong, GW Kim, S Kim, BG Chun - 16th USENIX Symposium …, 2022 - usenix.org
Large-scale Transformer-based models trained for generation tasks (e.g., GPT-3) have
recently attracted huge interest, emphasizing the need for system support for serving models …

On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

AlpaServe: Statistical multiplexing with model parallelism for deep learning serving

Z Li, L Zheng, Y Zhong, V Liu, Y Sheng, X Jin… - … USENIX Symposium on …, 2023 - usenix.org
Model parallelism is conventionally viewed as a method to scale a single large deep
learning model beyond the memory limits of a single device. In this paper, we demonstrate …
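
The statistical-multiplexing argument can be illustrated with a toy simulation: if two models each receive bursty traffic, dedicating one GPU per model leaves the other GPU idle during a burst, while sharding both models across both GPUs lets either burst draw on the aggregate capacity. This NumPy sketch ignores model-parallel communication overheads; the traffic model and capacities are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 10_000                      # time steps
# Bursty demand for two models (requests per step): occasional spikes.
load_a = rng.poisson(lam=np.where(rng.random(T) < 0.1, 8, 0.5))
load_b = rng.poisson(lam=np.where(rng.random(T) < 0.1, 8, 0.5))

cap = 5                         # per-GPU capacity (requests per step)

# Dedicated placement: model A only on GPU 0, model B only on GPU 1.
drop_dedicated = (np.maximum(load_a - cap, 0).sum()
                  + np.maximum(load_b - cap, 0).sum())

# Multiplexed placement: both models sharded across both GPUs, so
# bursts for either model share the pooled capacity 2 * cap.
drop_multiplexed = np.maximum(load_a + load_b - 2 * cap, 0).sum()

print(drop_dedicated, drop_multiplexed)  # multiplexing drops fewer requests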

Content-aware local GAN for photo-realistic super-resolution

JK Park, S Son, KM Lee - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Recently, GANs have successfully contributed to making single-image super-resolution (SISR)
methods produce more realistic images. However, natural images have complex distribution …

Efficient large-scale language model training on GPU clusters using Megatron-LM

D Narayanan, M Shoeybi, J Casper… - Proceedings of the …, 2021 - dl.acm.org
Large language models have led to state-of-the-art accuracies across several tasks.
However, training these models efficiently is challenging because: a) GPU memory capacity …
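
One building block of such training systems is tensor (intra-layer) model parallelism, of which Megatron-LM's column-parallel linear layer is a well-known instance. A minimal NumPy sketch with two simulated devices; sizes are illustrative, and the real system replaces the concatenation with an all-gather across GPUs.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 16, 8
x = rng.normal(size=(4, d_in))   # a batch of activations
W = rng.normal(size=(d_in, d_out))

# Column-parallel split: each device holds half of W's output columns
# and computes its slice of the output independently.
W0, W1 = np.split(W, 2, axis=1)
y0 = x @ W0                      # computed on "device 0"
y1 = x @ W1                      # computed on "device 1"

# Concatenating the partial outputs (an all-gather in practice)
# reproduces the un-partitioned result exactly.
y = np.concatenate([y0, y1], axis=1)
assert np.allclose(y, x @ W)
```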

Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity

W Fedus, B Zoph, N Shazeer - Journal of Machine Learning Research, 2022 - jmlr.org
In deep learning, models typically reuse the same parameters for all inputs. Mixture of
Experts (MoE) models defy this and instead select different parameters for each incoming …
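
A hedged sketch of the routing rule the paper is known for: each token is sent to the single expert with the highest router probability (top-1 "switch" routing), and that expert's output is scaled by the gate probability so the router stays differentiable. Names and sizes are illustrative; capacity limits and the load-balancing loss are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d_model, n_experts = 6, 8, 4
x = rng.normal(size=(n_tokens, d_model))
W_router = rng.normal(size=(d_model, n_experts))
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

# Router: softmax over expert logits, then pick the top-1 expert per token.
logits = x @ W_router
probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs /= probs.sum(axis=-1, keepdims=True)
chosen = probs.argmax(axis=-1)

# Each token is processed by exactly one expert; the output is scaled
# by that expert's gate probability.
y = np.stack([probs[i, chosen[i]] * (x[i] @ experts[chosen[i]])
              for i in range(n_tokens)])
```

Because only one expert runs per token, parameter count scales with the number of experts while per-token compute stays roughly constant.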