A comprehensive overview of large language models
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in
natural language processing tasks and beyond. This success of LLMs has led to a large …
Challenges and applications of large language models
Large Language Models (LLMs) went from non-existent to ubiquitous in the machine
learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify …
Scaling data-constrained language models
The current trend of scaling language models involves increasing both parameter count and
training dataset size. Extrapolating this trend suggests that training dataset size may soon be …
DeepSeek-VL: Towards real-world vision-language understanding
We present DeepSeek-VL, an open-source Vision-Language (VL) Model designed for real-
world vision and language understanding applications. Our approach is structured around …
Textbooks are all you need
We introduce phi-1, a new large language model for code, with significantly smaller size
than competing models: phi-1 is a Transformer-based model with 1.3 B parameters, trained …
Focused transformer: Contrastive training for context scaling
Large language models have an exceptional capability to incorporate new information in a
contextual manner. However, the full potential of such an approach is often restrained due to …
Textbooks are all you need ii: phi-1.5 technical report
We continue the investigation into the power of smaller Transformer-based language
models as initiated by TinyStories, a 10 million parameter model that can produce …
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision Language Audio and Action
We present Unified-IO 2, a multimodal and multi-skill unified model capable of following
novel instructions. Unified-IO 2 can use text, images, audio, and/or videos as input and can …
Foundation models and fair use
Existing foundation models are trained on copyrighted material. Deploying these models
can pose both legal and ethical risks when data creators fail to receive appropriate …
Llemma: An open language model for mathematics
We present Llemma, a large language model for mathematics. We continue pretraining
Code Llama on the Proof-Pile-2, a mixture of scientific papers, web data containing …