Yi: Open foundation models by 01.AI

A Young, B Chen, C Li, C Huang, G Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce the Yi model family, a series of language and multimodal models that
demonstrate strong multi-dimensional capabilities. The Yi model family is based on 6B and …

Efficient large language models: A survey

Z Wan, X Wang, C Liu, S Alam, Y Zheng, J Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have demonstrated remarkable capabilities in important
tasks such as natural language understanding and language generation, and thus have the …

Resource-efficient algorithms and systems of foundation models: A survey

M Xu, D Cai, W Yin, S Wang, X Jin, X Liu - ACM Computing Surveys, 2025 - dl.acm.org
Large foundation models, including large language models, vision transformers, diffusion,
and large language model based multimodal models, are revolutionizing the entire machine …

Towards efficient generative large language model serving: A survey from algorithms to systems

X Miao, G Oliaro, Z Zhang, X Cheng, H Jin… - arXiv preprint arXiv …, 2023 - arxiv.org
In the rapidly evolving landscape of artificial intelligence (AI), generative large language
models (LLMs) stand at the forefront, revolutionizing how we interact with our data. However …

The What, Why, and How of Context Length Extension Techniques in Large Language Models – A Detailed Survey

S Pawar, SM Tonmoy, SM Zaman, V Jain… - arXiv preprint arXiv …, 2024 - arxiv.org
The advent of Large Language Models (LLMs) represents a notable breakthrough in Natural
Language Processing (NLP), contributing to substantial progress in both text …

A survey of resource-efficient LLM and multimodal foundation models

M Xu, W Yin, D Cai, R Yi, D Xu, Q Wang, B Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large foundation models, including large language models (LLMs), vision transformers
(ViTs), diffusion, and LLM-based multimodal models, are revolutionizing the entire machine …

ZeroQuant-V2: Exploring post-training quantization in LLMs from comprehensive study to low rank compensation

Z Yao, X Wu, C Li, S Youn, Y He - arXiv preprint arXiv:2303.08302, 2023 - arxiv.org
Post-training quantization (PTQ) has emerged as a promising technique for mitigating
memory consumption and computational costs in large language models (LLMs). However …

Response length perception and sequence scheduling: An LLM-empowered LLM inference pipeline

Z Zheng, X Ren, F Xue, Y Luo… - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
Large language models (LLMs) have revolutionized the field of AI, demonstrating
unprecedented capacity across various tasks. However, the inference process for LLMs …

Exploring post-training quantization in LLMs from comprehensive study to low rank compensation

Z Yao, X Wu, C Li, S Youn, Y He - Proceedings of the AAAI Conference on Artificial Intelligence, 2024 - ojs.aaai.org
Post-training quantization (PTQ) has emerged as a promising technique for mitigating
memory consumption and computational costs in large language models (LLMs). However …

ZeroQuant-FP: A leap forward in LLMs post-training W4A8 quantization using floating-point formats

X Wu, Z Yao, Y He - arXiv preprint arXiv:2307.09782, 2023 - arxiv.org
In the complex domain of large language models (LLMs), striking a balance between
computational efficiency and maintaining model quality is a formidable challenge …