Dissociating language and thought in large language models

K Mahowald, AA Ivanova, IA Blank, N Kanwisher… - Trends in Cognitive …, 2024 - cell.com
Large language models (LLMs) have come closest among all models to date to mastering
human language, yet opinions about their linguistic and cognitive capabilities remain split …

A review of sparse expert models in deep learning

W Fedus, J Dean, B Zoph - arXiv preprint arXiv:2209.01667, 2022 - arxiv.org
Sparse expert models are a thirty-year-old concept re-emerging as a popular architecture in
deep learning. This class of architecture encompasses Mixture-of-Experts, Switch …

The Llama 3 herd of models

A Dubey, A Jauhri, A Pandey, A Kadian… - arXiv preprint arXiv …, 2024 - arxiv.org
Modern artificial intelligence (AI) systems are powered by foundation models. This paper
presents a new set of foundation models, called Llama 3. It is a herd of language models …

Mixtral of Experts

AQ Jiang, A Sablayrolles, A Roux, A Mensch… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model. Mixtral has
the same architecture as Mistral 7B, with the difference that each layer is composed of 8 …

Revisiting class-incremental learning with pre-trained models: Generalizability and adaptivity are all you need

DW Zhou, ZW Cai, HJ Ye, DC Zhan, Z Liu - arXiv preprint arXiv …, 2023 - arxiv.org
Class-incremental learning (CIL) aims to adapt to emerging new classes without forgetting
old ones. Traditional CIL models are trained from scratch to continually acquire knowledge …

Efficient large language models: A survey

Z Wan, X Wang, C Liu, S Alam, Y Zheng, J Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have demonstrated remarkable capabilities in important
tasks such as natural language understanding and language generation, and thus have the …

Content-aware local GAN for photo-realistic super-resolution

JK Park, S Son, KM Lee - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Recently, GANs have successfully contributed to making single-image super-resolution (SISR)
methods produce more realistic images. However, natural images have complex distribution …

Large language models are visual reasoning coordinators

L Chen, B Li, S Shen, J Yang, C Li… - Advances in …, 2024 - proceedings.neurips.cc
Visual reasoning requires multimodal perception and commonsense cognition of the world.
Recently, multiple vision-language models (VLMs) have been proposed with excellent …

Can knowledge graphs reduce hallucinations in LLMs? A survey

G Agrawal, T Kumarage, Z Alghamdi, H Liu - arXiv preprint arXiv …, 2023 - arxiv.org
Contemporary LLMs are prone to producing hallucinations, stemming mainly from the
knowledge gaps within the models. To address this critical limitation, researchers employ …

Modular deep learning

J Pfeiffer, S Ruder, I Vulić, EM Ponti - arXiv preprint arXiv:2302.11529, 2023 - arxiv.org
Transfer learning has recently become the dominant paradigm of machine learning. Pre-
trained models fine-tuned for downstream tasks achieve better performance with fewer …