Retrieval-augmented generation for large language models: A survey
Y Gao, Y Xiong, X Gao, K Jia, J Pan, Y Bi, Y Dai… - arxiv preprint arxiv …, 2023 - arxiv.org
Large language models (LLMs) demonstrate powerful capabilities, but they still face
challenges in practical applications, such as hallucinations, slow knowledge updates, and …
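The pipeline this survey covers can be sketched in a few lines: retrieve the passages most relevant to a query, then condition generation on them. The function names, toy corpus, and word-overlap retriever below are illustrative assumptions, not from any specific library.

```python
# Minimal RAG sketch: retrieve relevant passages, then build a grounded prompt.
def retrieve(query, corpus, k=2):
    """Rank passages by word overlap with the query (toy retriever)."""
    q = set(query.lower().split())
    return sorted(corpus, key=lambda p: -len(q & set(p.lower().split())))[:k]

def build_prompt(query, passages):
    """Prepend retrieved context so the model can ground its answer."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

corpus = [
    "RAG retrieves external documents to ground LLM answers.",
    "LoRA adds low-rank adapters for cheap fine-tuning.",
    "Quantization reduces model memory footprint.",
]
query = "How does RAG ground answers?"
prompt = build_prompt(query, retrieve(query, corpus))
```

A production retriever would use dense embeddings rather than word overlap, but the retrieve-then-generate shape is the same.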
Mobile edge intelligence for large language models: A contemporary survey
On-device large language models (LLMs), i.e., LLMs that run directly on edge devices, have
raised considerable interest since they are more cost-effective, latency-efficient, and privacy …
A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions
The emergence of large language models (LLMs) has marked a significant breakthrough in
natural language processing (NLP), fueling a paradigm shift in information acquisition …
Lorahub: Efficient cross-task generalization via dynamic lora composition
Low-rank adaptations (LoRA) are often employed to fine-tune large language models
(LLMs) for new tasks. This paper investigates LoRA composability for cross-task …
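The composition idea can be illustrated concretely: each LoRA adapter contributes a low-rank update B_i A_i, and these are merged into the frozen base weight with mixing weights w_i (which LoraHub searches for). The matrices and weights below are random placeholders, a sketch rather than the paper's implementation.

```python
# Sketch of LoRA composition: W' = W + sum_i w_i * (B_i @ A_i).
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2                       # hidden size, LoRA rank
W = rng.normal(size=(d, d))       # frozen base weight
adapters = [(rng.normal(size=(d, r)), rng.normal(size=(r, d)))
            for _ in range(3)]    # three task-specific (B, A) pairs
w = np.array([0.5, 0.3, 0.2])     # mixing weights (learned in LoraHub)

delta = sum(wi * (B @ A) for wi, (B, A) in zip(w, adapters))
W_merged = W + delta              # one composed weight, no extra inference cost
```

Because the update is merged into a single matrix, serving cost is identical to the base model regardless of how many adapters were combined.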
Searching for best practices in retrieval-augmented generation
Retrieval-augmented generation (RAG) techniques have proven to be effective in integrating
up-to-date information, mitigating hallucinations, and enhancing response quality …
Skeleton-of-thought: Large language models can do parallel decoding
This work aims at decreasing the end-to-end generation latency of large language models
(LLMs). One of the major causes of the high generation latency is the sequential decoding …
Towards efficient generative large language model serving: A survey from algorithms to systems
In the rapidly evolving landscape of artificial intelligence (AI), generative large language
models (LLMs) stand at the forefront, revolutionizing how we interact with our data. However …
Longllmlingua: Accelerating and enhancing llms in long context scenarios via prompt compression
In long context scenarios, large language models (LLMs) face three main challenges: higher
computational/financial cost, longer latency, and inferior performance. Some studies reveal …
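The compression idea can be sketched in miniature: score tokens by information content, drop the low-scoring ones, and cap the result at a budget. Real systems such as LongLLMLingua score tokens with a small language model; the stop-word filter below is a deliberately crude stand-in for that scorer.

```python
# Toy prompt compression: drop low-information tokens under a length budget.
STOP = {"the", "a", "an", "of", "to", "is", "and", "in", "that"}

def compress(prompt, budget_ratio=0.6):
    tokens = prompt.split()
    kept = [t for t in tokens if t.lower() not in STOP]  # crude "scorer"
    budget = max(1, int(len(tokens) * budget_ratio))
    return " ".join(kept[:budget])

short = compress("the cost of a long context is high and the latency is worse")
# → "cost long context high latency worse"
```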
Advancing transformer architecture in long-context large language models: A comprehensive survey
With the breakthrough ignited by ChatGPT, Transformer-based Large Language Models (LLMs)
have paved a revolutionary path toward Artificial General Intelligence (AGI) and have been …
Soaring from 4k to 400k: Extending llm's context with activation beacon
The utilization of long contexts poses a significant challenge for large language models due to their
limited context window length. Although the context window can be extended through fine …