- Academic Search

A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT

C Zhou, Q Li, C Li, J Yu, Y Liu, G Wang… - International Journal of …, 2024 - Springer

Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks across different data modalities. A PFM (e.g., BERT, ChatGPT, GPT-4) is …

Cited by 605

A survey of GPT-3 family large language models including ChatGPT and GPT-4

KS Kalyan - Natural Language Processing Journal, 2024 - Elsevier

Large language models (LLMs) are a special class of pretrained language models (PLMs)
obtained by scaling model size, pretraining corpus and computation. LLMs, because of their …

Cited by 253

LLM-Pruner: On the structural pruning of large language models

X Ma, G Fang, X Wang - Advances in Neural Information …, 2023 - proceedings.neurips.cc

Large language models (LLMs) have shown remarkable capabilities in language
understanding and generation. However, such impressive capability typically comes with a …

Cited by 486

A survey on vision transformer

K Han, Y Wang, H Chen, X Chen, J Guo… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

Transformer, first applied to the field of natural language processing, is a type of deep neural
network mainly based on the self-attention mechanism. Thanks to its strong representation …

Cited by 2650
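The snippet above describes the Transformer as a deep neural network based mainly on the self-attention mechanism. The sketch below shows single-head scaled dot-product self-attention in plain Python, with no dependencies. It is not code from the survey. The function names and the toy weight matrices (identity projections) are illustrative assumptions only.

```python
import math


def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    # Nested-list matrix product: (n x k) @ (k x m) -> (n x m).
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    x: n x d_model input sequence; w_q, w_k: d_model x d_k; w_v: d_model x d_v.
    Returns the n x d_v output and the n x n attention weights.
    """
    q = matmul(x, w_q)  # queries
    k = matmul(x, w_k)  # keys
    v = matmul(x, w_v)  # values
    d_k = len(w_q[0])
    scores = matmul(q, transpose(k))  # row i: query i against every key
    # Scale by sqrt(d_k), then normalize each row into attention weights.
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    # Each output token is a weighted average of the value vectors.
    return matmul(weights, v), weights


# Toy usage: three 2-dimensional tokens, identity projections (hypothetical values).
identity = [[1.0, 0.0], [0.0, 1.0]]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = self_attention(x, identity, identity, identity)
```

Every token attends to every other token in the same sequence, so each row of `attn` is a probability distribution over the sequence. Transformers usually run several such heads in parallel with learned projections. This sketch keeps one head and fixed matrices for readability.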

Taxonomy of risks posed by language models

L Weidinger, J Uesato, M Rauh, C Griffin… - Proceedings of the …, 2022 - dl.acm.org

Responsible innovation on large-scale Language Models (LMs) requires foresight into and
in-depth understanding of the risks these models may pose. This paper develops a …

Cited by 609

DeBERTaV3: Improving DeBERTa using ELECTRA-style pre-training with gradient-disentangled embedding sharing

P He, J Gao, W Chen - arXiv preprint arXiv:2111.09543, 2021 - arxiv.org

This paper presents a new pre-trained language model, DeBERTaV3, which improves the
original DeBERTa model by replacing masked language modeling (MLM) with replaced token …

Cited by 1013

Knowledge distillation: A survey

J Gou, B Yu, SJ Maybank, D Tao - International Journal of Computer Vision, 2021 - Springer

In recent years, deep neural networks have been successful in both industry and academia,
especially for computer vision tasks. The great success of deep learning is mainly due to its …

Cited by 3292

ZeroQuant: Efficient and affordable post-training quantization for large-scale transformers

Z Yao, R Yazdani Aminabadi… - Advances in …, 2022 - proceedings.neurips.cc

How to efficiently serve ever-larger trained natural language models in practice has become
exceptionally challenging even for powerful cloud servers due to their prohibitive …

Cited by 385

GLM-130B: An open bilingual pre-trained model

A Zeng, X Liu, Z Du, Z Wang, H Lai, M Ding… - arXiv preprint arXiv …, 2022 - arxiv.org

We introduce GLM-130B, a bilingual (English and Chinese) pre-trained language model
with 130 billion parameters. It is an attempt to open-source a 100B-scale model at least as …

Cited by 597

BEIR: A heterogenous benchmark for zero-shot evaluation of information retrieval models

N Thakur, N Reimers, A Rücklé, A Srivastava… - arXiv preprint arXiv …, 2021 - arxiv.org

Existing neural information retrieval (IR) models have often been studied in homogeneous
and narrow settings, which has considerably limited insights into their out-of-distribution …

Cited by 919
