Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
G Team, P Georgiev, VI Lei, R Burnell, L Bai… - arXiv preprint arXiv…, 2024 - arxiv.org
In this report, we introduce the Gemini 1.5 family of models, representing the next generation
of highly compute-efficient multimodal models capable of recalling and reasoning over fine …
Scaling vision transformers to 22 billion parameters
The scaling of Transformers has driven breakthrough capabilities for language models. At
present, the largest large language models (LLMs) contain upwards of 100B parameters …
A general theoretical paradigm to understand learning from human preferences
The prevalent deployment of learning from human preferences through reinforcement
learning from human feedback (RLHF) relies on two important approximations: the first assumes that pairwise …
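The first approximation named here, substituting pairwise preferences with pointwise rewards, is usually instantiated with the Bradley-Terry model. A minimal sketch under that assumption (the function name and values are illustrative, not from the paper):

```python
import math

def bradley_terry_preference(reward_a: float, reward_b: float) -> float:
    """Probability that completion A is preferred over completion B,
    assuming pairwise preferences arise from pointwise rewards via the
    Bradley-Terry model (the first approximation the abstract names)."""
    return 1.0 / (1.0 + math.exp(reward_b - reward_a))

# Example: a reward gap of 1.0 implies A is preferred ~73% of the time.
print(bradley_terry_preference(2.0, 1.0))  # ~0.731
```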
PaLI: A jointly-scaled multilingual language-image model
Effective scaling and a flexible task interface enable large language models to excel at many
tasks. We present PaLI (Pathways Language and Image model), a model that extends this …
Offline reinforcement learning with implicit Q-learning
Offline reinforcement learning requires reconciling two conflicting aims: learning a policy that
improves over the behavior policy that collected the dataset, while at the same time …
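Implicit Q-learning, the method in the title, reconciles these two aims by fitting the value function with expectile regression, so the learned policy never has to evaluate actions outside the dataset. A minimal PyTorch sketch; the tensor shapes and the tau value are assumptions of this sketch:

```python
import torch

def expectile_loss(q_values: torch.Tensor, v_values: torch.Tensor,
                   tau: float = 0.7) -> torch.Tensor:
    """Expectile regression loss used to fit V toward an upper expectile
    of Q. tau = 0.5 recovers ordinary mean-squared error; tau -> 1
    approaches a maximum over dataset actions."""
    diff = q_values - v_values
    weight = torch.where(diff > 0,
                         torch.full_like(diff, tau),
                         torch.full_like(diff, 1.0 - tau))
    return (weight * diff.pow(2)).mean()

# Example with dummy batch values.
q = torch.tensor([1.0, 2.0, 0.5])
v = torch.tensor([0.8, 1.5, 1.0])
print(expectile_loss(q, v))
```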
Training diffusion models with reinforcement learning
Diffusion models are a class of flexible generative models trained with an approximation to
the log-likelihood objective. However, most use cases of diffusion models are not concerned …
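Treating the denoising chain as a multi-step decision process, as this paper does, turns reward finetuning into a policy-gradient problem in its simplest form. A hedged sketch of a score-function objective; all names and shapes are assumptions of this sketch, not the paper's exact algorithm:

```python
import torch

def policy_gradient_loss(log_probs: torch.Tensor,
                         rewards: torch.Tensor) -> torch.Tensor:
    """Score-function (REINFORCE) objective for finetuning a diffusion
    sampler on a downstream reward. `log_probs` holds the per-step
    log-likelihoods of the sampled denoising actions, shape
    [batch, num_steps]; `rewards` holds the reward of each final sample,
    shape [batch]."""
    advantages = rewards - rewards.mean()  # simple mean baseline
    return -(advantages.unsqueeze(1) * log_probs).sum(dim=1).mean()
```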
Scaling open-vocabulary object detection
M Minderer, A Gritsenko… - Advances in Neural …, 2024 - proceedings.neurips.cc
Open-vocabulary object detection has benefited greatly from pretrained vision-language
models, but is still limited by the amount of available detection training data. While detection …
Structured denoising diffusion models in discrete state-spaces
Denoising diffusion probabilistic models (DDPMs) [Ho et al. 2020] have shown impressive
results on image and waveform generation in continuous state spaces. Here, we introduce …
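One of the structured corruption processes such discrete-state diffusion models use is a uniform transition matrix over token categories. A minimal sketch, with the per-step noise rate beta assumed to come from a schedule:

```python
import numpy as np

def uniform_transition_matrix(num_classes: int, beta: float) -> np.ndarray:
    """One-step forward transition matrix Q_t for a discrete diffusion
    over `num_classes` categories: with probability 1 - beta a token keeps
    its value, otherwise it resamples uniformly. Rows sum to 1."""
    Q = np.full((num_classes, num_classes), beta / num_classes)
    Q += (1.0 - beta) * np.eye(num_classes)
    return Q

# q(x_t | x_{t-1}) is the row of Q indexed by the previous token value.
Q = uniform_transition_matrix(num_classes=4, beta=0.1)
print(Q[2])  # distribution over the next state of a token currently equal to 2
```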
On efficient training of large-scale deep learning models: A literature review
The field of deep learning has witnessed significant progress, particularly in computer vision
(CV), natural language processing (NLP), and speech. The use of large-scale models …
Patch n' Pack: NaViT, a vision transformer for any aspect ratio and resolution
The ubiquitous and demonstrably suboptimal choice of resizing images to a fixed resolution
before processing them with computer vision models has not yet been successfully …
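The alternative the title proposes, Patch n' Pack, packs patch sequences from natively sized images into fixed-length training examples instead of resizing them all to one resolution. A greedy first-fit sketch of the packing step; the policy and names here are assumptions of this sketch:

```python
def pack_patch_sequences(patch_seqs, max_len):
    """Pack patch sequences from variable-resolution images into
    fixed-length token buffers. Each image contributes however many
    patches its native size yields. Returns (buffers, image_ids), where
    image_ids lets attention be masked so tokens only attend within
    their own image."""
    buffers, image_ids = [], []
    for img_id, seq in enumerate(patch_seqs):
        assert len(seq) <= max_len, "a single image exceeds the buffer"
        # first-fit: place the sequence in the first buffer with room
        for buf, ids in zip(buffers, image_ids):
            if len(buf) + len(seq) <= max_len:
                buf.extend(seq)
                ids.extend([img_id] * len(seq))
                break
        else:
            buffers.append(list(seq))
            image_ids.append([img_id] * len(seq))
    return buffers, image_ids
```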