Multimodal chain-of-thought reasoning in language models

Z Zhang, A Zhang, M Li, H Zhao, G Karypis… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have shown impressive performance on complex reasoning
by leveraging chain-of-thought (CoT) prompting to generate intermediate reasoning chains …
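The paper frames CoT as generating a reasoning chain before the answer. A minimal sketch of that rationale-then-answer prompting pattern, assuming a hypothetical text-generation callable `lm(prompt) -> str` (a stand-in, not an API from the paper):

```python
# Two-stage chain-of-thought prompting sketch; `lm` is a hypothetical
# text-generation callable, not an interface defined by the paper.

def cot_answer(lm, question: str, context: str = "") -> str:
    # Stage 1: elicit an intermediate reasoning chain (the rationale).
    rationale = lm(
        f"{context}\nQuestion: {question}\nLet's think step by step:"
    )
    # Stage 2: condition the final answer on the generated rationale.
    return lm(
        f"{context}\nQuestion: {question}\n"
        f"Reasoning: {rationale}\nTherefore, the answer is:"
    )
```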

Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com
This monograph presents a comprehensive survey of the taxonomy and evolution of multimodal
foundation models that demonstrate vision and vision-language capabilities, focusing on …

Holistic evaluation of text-to-image models

T Lee, M Yasunaga, C Meng, Y Mai… - Advances in …, 2024 - proceedings.neurips.cc
The stunning qualitative improvement of text-to-image models has led to their widespread
attention and adoption. However, we lack a comprehensive quantitative understanding of …

Scaling laws for generative mixed-modal language models

A Aghajanyan, L Yu, A Conneau… - International …, 2023 - proceedings.mlr.press
Generative language models define distributions over sequences of tokens that can
represent essentially any combination of data modalities (e.g., any permutation of image …
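To make the "any combination of modalities in one token sequence" idea concrete, here is an illustrative sketch in which discrete image tokens and text tokens are interleaved into a single stream; the marker names and token IDs are invented for illustration:

```python
# Discrete image tokens (e.g., codes from a learned image tokenizer) and text
# tokens interleaved into one stream for a single autoregressive model.
# All marker names and token IDs below are made up.

TEXT, IMAGE = "<text>", "<image>"

def interleave(segments):
    """Flatten (modality_marker, token_ids) pairs into one tagged stream."""
    stream = []
    for marker, token_ids in segments:
        stream.append(marker)        # modality boundary marker
        stream.extend(token_ids)
    return stream

sequence = interleave([
    (TEXT,  [101, 7592, 2088]),      # a text prefix
    (IMAGE, [905, 312, 77, 512]),    # codes for an image patch grid
    (TEXT,  [102]),                  # trailing text
])
print(sequence)
```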

Trusting your evidence: Hallucinate less with context-aware decoding

W Shi, X Han, M Lewis, Y Tsvetkov… - arXiv preprint arXiv …, 2023 - arxiv.org
Language models (LMs) often struggle to pay enough attention to the input context, and
generate texts that are unfaithful or contain hallucinations. To mitigate this issue, we present …
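One contrastive form of context-aware decoding amplifies the difference between the model's output distributions with and without the context. A minimal numpy sketch; the specific rule and the alpha value are assumptions, not quoted from the snippet:

```python
import numpy as np

# Contrastive context-aware decoding sketch:
#   p(y) ∝ softmax[(1 + alpha) * logits(y | context, x) - alpha * logits(y | x)]
# alpha controls how strongly the context is emphasized (illustrative value).

def context_aware_next_token_probs(logits_with_ctx, logits_without_ctx, alpha=0.5):
    adjusted = (1 + alpha) * logits_with_ctx - alpha * logits_without_ctx
    exp = np.exp(adjusted - adjusted.max())  # numerically stable softmax
    return exp / exp.sum()

# Toy 4-token vocabulary: the context supports token 2, so its mass grows.
probs = context_aware_next_token_probs(
    np.array([1.0, 0.5, 3.0, 0.2]),  # logits conditioned on the context
    np.array([1.0, 0.5, 1.0, 0.2]),  # logits without the context
)
print(probs.round(3))
```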

Modular deep learning

J Pfeiffer, S Ruder, I Vulić, EM Ponti - arXiv preprint arXiv:2302.11529, 2023 - arxiv.org
Transfer learning has recently become the dominant paradigm of machine learning. Pre-
trained models fine-tuned for downstream tasks achieve better performance with fewer …
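One concrete instance of the modular approach this survey covers is the bottleneck adapter: a small trainable residual block inserted into an otherwise frozen pre-trained layer. A minimal sketch; dimensions, initialization, and the ReLU choice are illustrative assumptions:

```python
import numpy as np

# Bottleneck adapter sketch: a down-projection, nonlinearity, and
# up-projection added residually to a frozen layer's hidden states.

class BottleneckAdapter:
    def __init__(self, d_model: int, d_bottleneck: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.down = rng.normal(scale=0.02, size=(d_model, d_bottleneck))
        self.up = np.zeros((d_bottleneck, d_model))  # zero-init: no-op at start

    def __call__(self, hidden: np.ndarray) -> np.ndarray:
        # The residual connection preserves the frozen model's behaviour at init.
        return hidden + np.maximum(hidden @ self.down, 0.0) @ self.up

adapter = BottleneckAdapter(d_model=8, d_bottleneck=2)
h = np.ones((1, 8))
print(np.allclose(adapter(h), h))  # True: the adapter starts as the identity
```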

Replug: Retrieval-augmented black-box language models

W Shi, S Min, M Yasunaga, M Seo, R James… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce REPLUG, a retrieval-augmented language modeling framework that treats the
language model (LM) as a black box and augments it with a tuneable retrieval model. Unlike …
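The black-box recipe can be sketched end to end: prepend each retrieved document to the input separately, query the LM for next-token probabilities, and ensemble the resulting distributions with weights derived from the retrieval scores. Both `lm_next_token_probs` and the retriever's (document, score) output format below are hypothetical stand-ins:

```python
import numpy as np

# Retrieve-prepend-ensemble sketch: the LM is only ever called through its
# probability outputs (black box); the weighting by a softmax over retrieval
# scores is how this sketch combines the per-document predictions.

def replug_next_token_probs(lm_next_token_probs, retrieved, x):
    docs, scores = zip(*retrieved)
    scores = np.asarray(scores, dtype=float)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                       # softmax over retrieval scores
    per_doc = np.stack([lm_next_token_probs(doc + "\n" + x) for doc in docs])
    return weights @ per_doc                       # weighted ensemble, shape (V,)
```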