A comprehensive overview of large language models

H Naveed, AU Khan, S Qiu, M Saqib, S Anwar… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in
natural language processing tasks and beyond. This success of LLMs has led to a large …

LlamaFactory: Unified efficient fine-tuning of 100+ language models

Y Zheng, R Zhang, J Zhang, Y Ye, Z Luo… - arXiv preprint arXiv …, 2024 - arxiv.org
Efficient fine-tuning is vital for adapting large language models (LLMs) to downstream tasks.
However, it requires non-trivial efforts to implement these methods on different models. We …

Smaller, weaker, yet better: Training LLM reasoners via compute-optimal sampling

H Bansal, A Hosseini, R Agarwal, VQ Tran… - arXiv preprint arXiv …, 2024 - arxiv.org
Training on high-quality synthetic data from strong language models (LMs) is a common
strategy to improve the reasoning performance of LMs. In this work, we revisit whether this …

RecRanker: Instruction tuning large language model as ranker for top-k recommendation

S Luo, B He, H Zhao, W Shao, Y Qi, Y Huang… - ACM Transactions on …, 2024 - dl.acm.org
Large Language Models (LLMs) have demonstrated remarkable capabilities and have been
extensively deployed across various domains, including recommender systems. Prior …

A comprehensive survey of small language models in the era of large language models: Techniques, enhancements, applications, collaboration with LLMs, and …

F Wang, Z Zhang, X Zhang, Z Wu, T Mo, Q Lu… - arXiv preprint arXiv …, 2024 - ai.radensa.ru
Large language models (LLMs) have demonstrated emergent abilities in text generation,
question answering, and reasoning, facilitating various tasks and domains. Despite their …

DeepSeek-VL2: Mixture-of-experts vision-language models for advanced multimodal understanding

Z Wu, X Chen, Z Pan, X Liu, W Liu, D Dai… - arXiv preprint arXiv …, 2024 - arxiv.org
We present DeepSeek-VL2, an advanced series of large Mixture-of-Experts (MoE) Vision-
Language Models that significantly improves upon its predecessor, DeepSeek-VL, through …

TableGPT2: A large multimodal model with tabular data integration

A Su, A Wang, C Ye, C Zhou, G Zhang, G Zhu… - arXiv preprint arXiv …, 2024 - arxiv.org
The emergence of models like GPTs, Claude, LLaMA, and Qwen has reshaped AI
applications, presenting vast new opportunities across industries. Yet, the integration of …

Derail Yourself: Multi-turn LLM Jailbreak Attack through Self-discovered Clues

Q Ren, H Li, D Liu, Z Xie, X Lu, Y Qiao, L Sha… - arXiv preprint arXiv …, 2024 - arxiv.org
This study exposes the safety vulnerabilities of Large Language Models (LLMs) in multi-turn
interactions, where malicious users can obscure harmful intents across several queries. We …

Memorize step by step: Efficient long-context prefilling with incremental memory and decremental chunk

Z Zeng, Q Guo, X Liu, Z Yin, W Shu… - Proceedings of the …, 2024 - aclanthology.org
The evolution of Large Language Models (LLMs) has led to significant
advancements, with models like Claude and Gemini capable of processing contexts up to 1 …

Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning

W An, X Bi, G Chen, S Chen, C Deng… - … Conference for High …, 2024 - ieeexplore.ieee.org
The rapid progress in Deep Learning (DL) and Large Language Models (LLMs) has
exponentially increased demands of computational power and bandwidth. This, combined …