In-context learning and Occam's razor

E Elmoznino, T Marty, T Kasetty, L Gagnon… - arXiv preprint arXiv …, 2024 - arxiv.org
A central goal of machine learning is generalization. While the No Free Lunch Theorem
states that we cannot obtain theoretical guarantees for generalization without further …

Feature forgetting in continual representation learning

X Zhang, D Dou, J Wu - arXiv preprint arXiv:2205.13359, 2022 - arxiv.org
In continual and lifelong learning, good representation learning can help increase
performance and reduce sample complexity when learning new tasks. There is evidence …

Larger Language Models Provably Generalize Better

MA Finzi, S Kapoor, D Granziol, A Gu… - … Conference on Learning … - openreview.net
Why do larger language models generalize better? To explore this question, we develop
generalization bounds on the pretraining objective of large language models (LLMs) in the …

Information distance for neural network functions

X Zhang, D Dou, J Wu - openreview.net
We provide a practical distance measure in the space of functions parameterized by neural
networks. It is based on the classical information distance, and we propose to replace the …

Model information as an analysis tool in deep learning

X Zhang, D Hu, X Li, D Dou, J Wu - openreview.net
Information-theoretic perspectives can provide an alternative dimension for analyzing the
learning process and complement the usual performance metrics. Recently several works …