J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arXiv
…layers of pre-trained transformer models
Transformer-based NLP models are trained using hundreds of millions or even billions of parameters, limiting their applicability in computationally constrained environments. While …