Retrieval-augmented generation for large language models: A survey

Y Gao, Y Xiong, X Gao, K Jia, J Pan, Y Bi, Y Dai… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) demonstrate powerful capabilities, but they still face
challenges in practical applications, such as hallucinations, slow knowledge updates, and …

Mobile edge intelligence for large language models: A contemporary survey

G Qu, Q Chen, W Wei, Z Lin, X Chen… - … Surveys & Tutorials, 2025 - ieeexplore.ieee.org
On-device large language models (LLMs), referring to running LLMs on edge devices, have
raised considerable interest since they are more cost-effective, latency-efficient, and privacy …

A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions

L Huang, W Yu, W Ma, W Zhong, Z Feng… - ACM Transactions on …, 2024 - dl.acm.org
The emergence of large language models (LLMs) has marked a significant breakthrough in
natural language processing (NLP), fueling a paradigm shift in information acquisition …

LoraHub: Efficient cross-task generalization via dynamic LoRA composition

C Huang, Q Liu, BY Lin, T Pang, C Du, M Lin - arXiv preprint arXiv …, 2023 - arxiv.org
Low-rank adaptations (LoRA) are often employed to fine-tune large language models
(LLMs) for new tasks. This paper investigates LoRA composability for cross-task …

Searching for best practices in retrieval-augmented generation

X Wang, Z Wang, X Gao, F Zhang, Y Wu… - Proceedings of the …, 2024 - aclanthology.org
Retrieval-augmented generation (RAG) techniques have proven to be effective in integrating
up-to-date information, mitigating hallucinations, and enhancing response quality …

Skeleton-of-thought: Large language models can do parallel decoding

X Ning, Z Lin, Z Zhou, Z Wang, H Yang… - Proceedings ENLSP …, 2023 - lirias.kuleuven.be
This work aims at decreasing the end-to-end generation latency of large language models
(LLMs). One of the major causes of the high generation latency is the sequential decoding …

Towards efficient generative large language model serving: A survey from algorithms to systems

X Miao, G Oliaro, Z Zhang, X Cheng, H Jin… - arXiv preprint arXiv …, 2023 - arxiv.org
In the rapidly evolving landscape of artificial intelligence (AI), generative large language
models (LLMs) stand at the forefront, revolutionizing how we interact with our data. However …

LongLLMLingua: Accelerating and enhancing LLMs in long context scenarios via prompt compression

H Jiang, Q Wu, X Luo, D Li, CY Lin, Y Yang… - arXiv preprint arXiv …, 2023 - arxiv.org
In long context scenarios, large language models (LLMs) face three main challenges: higher
computational/financial cost, longer latency, and inferior performance. Some studies reveal …

Advancing transformer architecture in long-context large language models: A comprehensive survey

Y Huang, J Xu, J Lai, Z Jiang, T Chen, Z Li… - arXiv preprint arXiv …, 2023 - arxiv.org
With the breakthrough ignited by ChatGPT, Transformer-based Large Language Models (LLMs)
have paved a revolutionary path toward Artificial General Intelligence (AGI) and have been …

Soaring from 4K to 400K: Extending LLM's context with activation beacon

P Zhang, Z Liu, S Xiao, N Shao, Q Ye… - arXiv preprint arXiv …, 2024 - openreview.net
The utilization of long contexts poses a big challenge for large language models due to their
limited context window length. Although the context window can be extended through fine …