Personal LLM agents: Insights and survey about the capability, efficiency and security

Y Li, H Wen, W Wang, X Li, Y Yuan, G Liu, J Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Since the advent of personal computing devices, intelligent personal assistants (IPAs) have
been one of the key technologies that researchers and engineers have focused on, aiming …

Splitwise: Efficient generative LLM inference using phase splitting

P Patel, E Choukse, C Zhang, A Shah… - 2024 ACM/IEEE 51st …, 2024 - ieeexplore.ieee.org
Generative large language model (LLM) applications are growing rapidly, leading to large-
scale deployments of expensive and power-hungry GPUs. Our characterization of LLM …

A survey on large language models for code generation

J Jiang, F Wang, J Shen, S Kim, S Kim - arXiv preprint arXiv:2406.00515, 2024 - arxiv.org
Large Language Models (LLMs), known as Code LLMs in this setting, have achieved remarkable advancements across diverse
code-related tasks, particularly in code generation that generates …

The unreasonable ineffectiveness of the deeper layers

A Gromov, K Tirumala, H Shapourian… - arXiv preprint arXiv …, 2024 - arxiv.org
We empirically study a simple layer-pruning strategy for popular families of open-weight
pretrained LLMs, finding minimal degradation of performance on different question …

LLaVA-PruMerge: Adaptive token reduction for efficient large multimodal models

Y Shang, M Cai, B Xu, YJ Lee, Y Yan - arXiv preprint arXiv:2403.15388, 2024 - arxiv.org
Large Multimodal Models (LMMs) have shown significant visual reasoning capabilities by
connecting a visual encoder and a large language model. LMMs typically take in a fixed and …

TinyLLaVA: A framework of small-scale large multimodal models

B Zhou, Y Hu, X Weng, J Jia, J Luo, X Liu, J Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
We present the TinyLLaVA framework, which provides a unified perspective for designing and
analyzing small-scale Large Multimodal Models (LMMs). We empirically study the effects …