Dissociating language and thought in large language models
Large language models (LLMs) have come closest among all models to date to mastering
human language, yet opinions about their linguistic and cognitive capabilities remain split …
A review of sparse expert models in deep learning
Sparse expert models are a thirty-year-old concept re-emerging as a popular architecture in
deep learning. This class of architecture encompasses Mixture-of-Experts, Switch …
The Llama 3 herd of models
Modern artificial intelligence (AI) systems are powered by foundation models. This paper
presents a new set of foundation models, called Llama 3. It is a herd of language models …
Mixtral of experts
We introduce Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model. Mixtral has
the same architecture as Mistral 7B, with the difference that each layer is composed of 8 …
Revisiting class-incremental learning with pre-trained models: Generalizability and adaptivity are all you need
Class-incremental learning (CIL) aims to adapt to emerging new classes without forgetting
old ones. Traditional CIL models are trained from scratch to continually acquire knowledge …
Efficient large language models: A survey
Large Language Models (LLMs) have demonstrated remarkable capabilities in important
tasks such as natural language understanding and language generation, and thus have the …
Content-aware local GAN for photo-realistic super-resolution
Recently, GAN has successfully contributed to making single-image super-resolution (SISR)
methods produce more realistic images. However, natural images have complex distribution …
Large language models are visual reasoning coordinators
Visual reasoning requires multimodal perception and commonsense cognition of the world.
Recently, multiple vision-language models (VLMs) have been proposed with excellent …
Can knowledge graphs reduce hallucinations in LLMs?: A survey
The contemporary LLMs are prone to producing hallucinations, stemming mainly from the
knowledge gaps within the models. To address this critical limitation, researchers employ …
Modular deep learning
Transfer learning has recently become the dominant paradigm of machine learning. Pre-
trained models fine-tuned for downstream tasks achieve better performance with fewer …