Beyond efficiency: A systematic survey of resource-efficient large language models

G Bai, Z Chai, C Ling, S Wang, J Lu, N Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
The burgeoning field of Large Language Models (LLMs), exemplified by sophisticated
models like OpenAI's ChatGPT, represents a significant advancement in artificial …

Introduction to Transformers: an NLP perspective

T Xiao, J Zhu - arXiv preprint arXiv:2311.17633, 2023 - arxiv.org
Transformers have dominated empirical machine learning models of natural language
processing. In this paper, we introduce basic concepts of Transformers and present key …
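Since the snippet is cut off before any technical content, a minimal sketch of the operation at the heart of the Transformers this paper introduces may help: scaled dot-product attention, softmax(QKᵀ/√d_k)V. The single-head, unbatched NumPy version below is an illustrative assumption, not code from the paper.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Core Transformer operation: softmax(Q K^T / sqrt(d_k)) V.

    Q, K: (seq_len, d_k); V: (seq_len, d_v).
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # pairwise attention logits
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # weighted sum of values

# Toy usage: 4 tokens, 8-dimensional queries/keys/values.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
out = scaled_dot_product_attention(Q, K, V)           # shape (4, 8)
```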

RoBERTa-CoA: RoBERTa-based effective finetuning method using co-attention

JH Kim, SW Park, JY Kim, J Park, SH Jung… - IEEE Access, 2023 - ieeexplore.ieee.org
In the field of natural language processing, artificial intelligence (AI) technology has been
utilized to solve various problems, such as text classification, similarity measurement …
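The snippet is truncated before it reaches the co-attention mechanism named in the title. As a rough sketch of the general idea, the code below lets each of two encoded texts attend over the other; the plain dot-product affinity is an assumption, since the paper's actual CoA layer presumably adds learned projections on top of the RoBERTa outputs.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(A, B):
    """Minimal co-attention: each sequence attends over the other.

    A: (n, d) token representations of text 1 (e.g., RoBERTa outputs).
    B: (m, d) token representations of text 2.
    """
    affinity = A @ B.T / np.sqrt(A.shape[-1])   # (n, m) similarity matrix
    A_ctx = softmax(affinity, axis=1) @ B       # text-1 tokens enriched with text 2
    B_ctx = softmax(affinity.T, axis=1) @ A     # text-2 tokens enriched with text 1
    return A_ctx, B_ctx

A = np.random.randn(5, 768)
B = np.random.randn(7, 768)
A_ctx, B_ctx = co_attention(A, B)               # shapes (5, 768) and (7, 768)
```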

Ultra Memory-Efficient On-FPGA Training of Transformers via Tensor-Compressed Optimization

J Tian, J Lu, H Li, X Wang, I Young, Z Zhang - arXiv preprint arXiv …, 2025 - arxiv.org
Transformer models have achieved state-of-the-art performance across a wide range of
machine learning tasks. There is growing interest in training transformers on resource …
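The abstract cuts off before the method, but the memory argument behind tensor-compressed training can be illustrated: store a linear layer's weight as small tensor-train cores and apply it without ever materializing the dense matrix. The two-core factorization, shapes, and rank below are illustrative assumptions, not the paper's FPGA design.

```python
import numpy as np

# A weight of shape (m1*m2, n1*n2) is stored as two small TT cores instead of
# one dense matrix: parameters drop from m1*m2*n1*n2 to r*(m1*n1 + m2*n2).
m1, m2, n1, n2, r = 16, 16, 32, 32, 4
G1 = np.random.randn(m1, n1, r) * 0.1   # first TT core
G2 = np.random.randn(r, m2, n2) * 0.1   # second TT core

def tt_linear(x):
    """y = W x with W in tensor-train form, never forming W explicitly."""
    X = x.reshape(n1, n2)                    # fold the input vector
    T = np.einsum('ab,rcb->acr', X, G2)      # contract second core: (n1, m2, r)
    Y = np.einsum('acr,jar->jc', T, G1)      # contract first core:  (m1, m2)
    return Y.reshape(m1 * m2)

x = np.random.randn(n1 * n2)
y = tt_linear(x)            # shape (256,); a dense W would need 262,144 params
print(G1.size + G2.size)    # 4,096 parameters in TT form, a 64x reduction
```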

Diagonal Gaussian mixture models and higher order tensor decompositions

B Guo, J Nie, Z Yang - arXiv preprint arXiv:2401.01337, 2024 - arxiv.org
This paper studies how to recover parameters in diagonal Gaussian mixture models using
tensors. High-order moments of the Gaussian mixture model are estimated from samples …
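For orientation, moment-based estimators of this kind rest on a standard identity for diagonal Gaussian mixtures (the notation w_i, μ_i, diag(σ_i²) is assumed here, not taken from the paper): the order-3 moment splits into a rank-k symmetric tensor plus covariance correction terms, and decomposing the symmetric part recovers the mixture parameters.

```latex
% Order-3 moment of x ~ \sum_i w_i \, \mathcal{N}(\mu_i, \mathrm{diag}(\sigma_i^2)):
M_3 = \mathbb{E}[x \otimes x \otimes x]
    = \sum_{i=1}^{k} w_i \Big( \mu_i^{\otimes 3}
      + \sum_{j=1}^{d} \sigma_{ij}^{2}
        \big( \mu_i \otimes e_j \otimes e_j
            + e_j \otimes \mu_i \otimes e_j
            + e_j \otimes e_j \otimes \mu_i \big) \Big)
```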

[Book] Low-Rank Tensorized Neural Networks With Tensor Geometry Optimization

R Solgi - 2024 - search.proquest.com
Deep neural networks have demonstrated significant achievements across various fields,
yet their memory and time complexities present obstacles for implementing them on …
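The snippet stops at the motivation, so here is a back-of-envelope sketch of the parameter-count argument behind low-rank layers in their simplest (matrix) form. The dimensions and rank are illustrative, and the dissertation's tensor geometry optimization, i.e. choosing how weights are folded into higher-order tensors, is not modeled here.

```python
import numpy as np

# Replace a dense weight with low-rank factors and compare parameter counts.
d_out, d_in, rank = 1024, 1024, 32

dense_params = d_out * d_in               # 1,048,576
lowrank_params = rank * (d_out + d_in)    # 65,536: W ~= U @ V
print(lowrank_params / dense_params)      # 0.0625, a 16x reduction

# The forward pass never forms the full matrix:
U = np.random.randn(d_out, rank) * 0.01
V = np.random.randn(rank, d_in) * 0.01
x = np.random.randn(d_in)
y = U @ (V @ x)                           # O(r(d_in + d_out)) instead of O(d_in * d_out)
```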