Universality, Generalization, and Compression in Machine Learning

Author: Akhtiamov, Danil

Year: 2026

Degree: Dissertation (Ph.D.)

Advisor: Hassibi, Babak

Committee Members: Chandrasekaran, Venkat; Hassibi, Babak; Anandkumar, Anima; Abu-Mostafa, Yaser S.

Option: Computer Science

DOI: 10.7907/3e1y-3d66

Abstract

The primary contribution of this manuscript is a systematic asymptotic analysis of a range of learning and approximation algorithms, together with the development of analytical tools that enable such studies and may be broadly applicable to related problems. To deepen the understanding of generalization and compression, we establish novel Gaussian universality results and combine them with Gaussian comparison inequalities to derive precise asymptotic performance characterizations.

Gaussian universality is a general principle suggesting that, for a wide class of learning problems, key quantities such as training and test performance can be characterized by replacing complicated data or design distributions with high-dimensional Gaussians having matching first- and second-order statistics. This perspective is used to analyze transfer learning in linear models, the performance of the one-bit random features model, and the approximation error of the Randomized Singular Value Decomposition.
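As an informal illustration of this principle (not a result or experiment taken from the thesis), the sketch below fits ridge regression on a Rademacher (±1) design and on a Gaussian design with the same mean and covariance, then compares the resulting test risks. All names (`ridge_risk`, `compare_designs`) and parameter values are hypothetical choices made for the example, and a linear model with isotropic features is assumed.

```python
import numpy as np

def ridge_risk(X, w_star, noise_std, lam, rng):
    """Fit ridge regression on (X, y) and return the exact test risk
    for a fresh test point with identity covariance."""
    n, d = X.shape
    y = X @ w_star + noise_std * rng.standard_normal(n)
    w_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    # With identity feature covariance, risk = ||w_hat - w_star||^2 + noise variance.
    return float(np.sum((w_hat - w_star) ** 2) + noise_std ** 2)

def compare_designs(n=400, d=200, lam=1.0, noise_std=0.5, trials=10, seed=0):
    """Average test risk under a Gaussian design and a Rademacher design
    whose entries share the same mean (0) and variance (1)."""
    rng = np.random.default_rng(seed)
    w_star = rng.standard_normal(d) / np.sqrt(d)
    gauss, rade = [], []
    for _ in range(trials):
        X_g = rng.standard_normal((n, d))           # Gaussian entries
        X_r = rng.choice([-1.0, 1.0], size=(n, d))  # Rademacher entries
        gauss.append(ridge_risk(X_g, w_star, noise_std, lam, rng))
        rade.append(ridge_risk(X_r, w_star, noise_std, lam, rng))
    return float(np.mean(gauss)), float(np.mean(rade))

risk_gauss, risk_rade = compare_designs()
print(f"Gaussian design risk:   {risk_gauss:.4f}")
print(f"Rademacher design risk: {risk_rade:.4f}")
```

Under these assumptions the two averages should be close, and the gap is expected to shrink as `n` and `d` grow proportionally. This is only a numerical sanity check on a toy problem; it does not reproduce the settings analyzed in the thesis.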

Model compression, another central theme in modern machine learning, refers to the reduction of model size while preserving predictive performance. This is typically achieved through techniques such as quantization, sparsification, or low-rank factorization. In this thesis, we investigate one-bit quantization in random features models, sparsification and one-bit compression in regularized linear classification, and low-rank approximation algorithms for matrix-valued optimization problems. Our results demonstrate that, in the over-parameterized regime, aggressive compression is often possible with only minimal degradation in predictive accuracy.
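The three compression techniques named above can be sketched on a single weight matrix as follows. This is a generic illustration using standard textbook constructions, not the algorithms or models studied in the thesis; the function names, the matrix size, and the parameters `k` and `r` are assumptions made for the example.

```python
import numpy as np

def one_bit_quantize(W):
    """Keep only the sign of each entry, rescaled by the mean absolute value
    (the scale that minimizes the squared error for a sign-only code)."""
    scale = np.mean(np.abs(W))
    return scale * np.where(W >= 0, 1.0, -1.0)

def sparsify_top_k(W, k):
    """Keep the k largest-magnitude entries and zero out the rest."""
    flat = W.ravel()
    keep = np.argpartition(np.abs(flat), -k)[-k:]
    out = np.zeros_like(flat)
    out[keep] = flat[keep]
    return out.reshape(W.shape)

def low_rank(W, r):
    """Best rank-r approximation in Frobenius norm via the truncated SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))  # stand-in for a trained weight matrix

W_q = one_bit_quantize(W)
W_s = sparsify_top_k(W, k=256)
W_r = low_rank(W, r=4)

for name, A in [("one-bit", W_q), ("top-k sparse", W_s), ("rank-4", W_r)]:
    rel_err = np.linalg.norm(W - A) / np.linalg.norm(W)
    print(f"{name:>12}: relative Frobenius error = {rel_err:.3f}")
```

The errors printed here measure only how well each scheme reconstructs the weights of a random matrix. They say nothing about predictive accuracy, which is the quantity the thesis characterizes and which depends on the data and the training procedure.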

In addition to the results mentioned above, the present manuscript studies linear denoisers in the proportional regime and develops new Gaussian comparison tools for high-dimensional inference. Taken together, the results of this thesis contribute to a broader picture in which properties of the data distribution, the model structure, and the optimizer jointly determine generalization behavior in modern machine learning.