In-context learning and Occam's razor
A central goal of machine learning is generalization. While the No Free Lunch Theorem
states that we cannot obtain theoretical guarantees for generalization without further …
Feature forgetting in continual representation learning
In continual and lifelong learning, good representation learning can help increase
performance and reduce sample complexity when learning new tasks. There is evidence …
Larger Language Models Provably Generalize Better
Why do larger language models generalize better? To explore this question, we develop
generalization bounds on the pretraining objective of large language models (LLMs) in the …
Information distance for neural network functions
We provide a practical distance measure in the space of functions parameterized by neural
networks. It is based on the classical information distance, and we propose to replace the …
Model information as an analysis tool in deep learning
Information-theoretic perspectives can provide an alternative dimension for analyzing the
learning process and complement the usual performance metrics. Recently, several works …