A comprehensive overview of large language models
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in
natural language processing tasks and beyond. This success of LLMs has led to a large …
Llamafactory: Unified efficient fine-tuning of 100+ language models
Efficient fine-tuning is vital for adapting large language models (LLMs) to downstream tasks.
However, it requires non-trivial efforts to implement these methods on different models. We …
Smaller, weaker, yet better: Training llm reasoners via compute-optimal sampling
Training on high-quality synthetic data from strong language models (LMs) is a common
strategy to improve the reasoning performance of LMs. In this work, we revisit whether this …
Recranker: Instruction tuning large language model as ranker for top-k recommendation
Large Language Models (LLMs) have demonstrated remarkable capabilities and have been
extensively deployed across various domains, including recommender systems. Prior …
A comprehensive survey of small language models in the era of large language models: Techniques, enhancements, applications, collaboration with llms, and …
Large language models (LLM) have demonstrated emergent abilities in text generation,
question answering, and reasoning, facilitating various tasks and domains. Despite their …
Deepseek-vl2: Mixture-of-experts vision-language models for advanced multimodal understanding
We present DeepSeek-VL2, an advanced series of large Mixture-of-Experts (MoE) Vision-
Language Models that significantly improves upon its predecessor, DeepSeek-VL, through …
Tablegpt2: A large multimodal model with tabular data integration
A Su, A Wang, C Ye, C Zhou, G Zhang, G Zhu… - arxiv preprint arxiv …, 2024 - arxiv.org
The emergence of models like GPTs, Claude, LLaMA, and Qwen has reshaped AI
applications, presenting vast new opportunities across industries. Yet, the integration of …
Derail Yourself: Multi-turn LLM Jailbreak Attack through Self-discovered Clues
This study exposes the safety vulnerabilities of Large Language Models (LLMs) in multi-turn
interactions, where malicious users can obscure harmful intents across several queries. We …
Memorize step by step: Efficient long-context prefilling with incremental memory and decremental chunk
The evolution of Large Language Models (LLMs) has led to significant
advancements, with models like Claude and Gemini capable of processing contexts up to 1 …
Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning
W An, X Bi, G Chen, S Chen, C Deng… - … Conference for High …, 2024 - ieeexplore.ieee.org
The rapid progress in Deep Learning (DL) and Large Language Models (LLMs) has
exponentially increased demands of computational power and bandwidth. This, combined …