Lightweight deep learning for resource-constrained environments: A survey
Over the past decade, deep learning has come to dominate many domains of artificial intelligence, including natural language processing and computer vision …
Pre-trained models for natural language processing: A survey
Recently, the emergence of pre-trained models (PTMs) has brought natural language
processing (NLP) to a new era. In this survey, we provide a comprehensive review of PTMs …
GPT3.int8(): 8-bit matrix multiplication for transformers at scale
Large language models have been widely adopted but require significant GPU memory for
inference. We develop a procedure for Int8 matrix multiplication for feed-forward and …
QuIP: 2-bit quantization of large language models with guarantees
This work studies post-training parameter quantization in large language models (LLMs).
We introduce quantization with incoherence processing (QuIP), a new method based on the …
Q-diffusion: Quantizing diffusion models
Diffusion models have achieved great success in image synthesis through iterative noise
estimation using deep neural networks. However, the slow inference, high memory …
ZeroQuant: Efficient and affordable post-training quantization for large-scale transformers
How to efficiently serve ever-larger trained natural language models in practice has become
exceptionally challenging even for powerful cloud servers due to their prohibitive …
SqueezeLLM: Dense-and-sparse quantization
Generative Large Language Models (LLMs) have demonstrated remarkable results for a
wide range of tasks. However, deploying these models for inference has been a significant …
A white paper on neural network quantization
While neural networks have advanced the frontiers in many applications, they often come at
a high computational cost. Reducing the power and latency of neural network inference is …
A survey of quantization methods for efficient neural network inference
This chapter surveys approaches to quantizing the numerical values in deep neural network computations, covering the advantages and disadvantages of current methods …
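The scheme at the core of the surveys above is uniform affine quantization: map floats to b-bit integers via a scale and a zero point. The sketch below is a standard textbook formulation for illustration, not the specific scheme of any one paper listed here.

```python
import numpy as np

def quantize(x, bits=8):
    """Asymmetric uniform quantization onto [0, 2^bits - 1].
    Textbook post-training scheme, shown for illustration only."""
    qmax = 2 ** bits - 1
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / qmax if hi > lo else 1.0  # step size between levels
    zero_point = round(-lo / scale)               # integer that represents 0.0
    q = np.clip(np.round(x / scale) + zero_point, 0, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover an approximation of the original floats."""
    return (q.astype(np.float32) - zero_point) * scale

x = np.linspace(-1.0, 1.0, 101, dtype=np.float32)
q, s, z = quantize(x)
round_trip_err = np.abs(dequantize(q, s, z) - x).max()  # bounded by ~scale/2
```

The round-trip error is bounded by about half the scale, which is why lowering the bit width (as in the 2-bit and dense-and-sparse methods above) requires extra machinery such as incoherence processing or separating out sensitive values.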
DeepSpeed-MoE: Advancing mixture-of-experts inference and training to power next-generation AI scale
As the training of giant dense models hits the boundary on the availability and capability of
the hardware resources today, Mixture-of-Experts (MoE) models have become one of the …