Aligning and Comparing Vision Representations to Improve Understanding and Performance

Author: Kondapaneni, Neehar

Year: 2025

Degree: Dissertation (Ph.D.)

Advisor: Perona, Pietro

Committee Members: Yue, Yisong; Gkioxari, Georgia; Mac Aodha, Oisin; Perona, Pietro

Option: Computation and Neural Systems

DOI: 10.7907/0crh-zb71

Abstract

Recent advances in large artificial intelligence (AI) models have enabled them to perform a wide range of real-world tasks at skill levels comparable to or surpassing those of humans. In this thesis, we develop methods to compare, analyze, and align data representations from these powerful models. In Part 1, we develop methods for estimating human knowledge during a learning task and for comparing various data representations. These methods are steps towards a system designed to help us learn from AI.

In Part 2, we show how aligning models can be useful in two separate domains. First, we discover and fix a misalignment in the inputs to a powerful foundation model and show that the fix improves performance. Second, we show that biologically inspired object manipulation tasks can be used as a training signal for learning human-aligned representations of number. Our results demonstrate the potential for alignment and comparison methods to improve the overall performance of AI models, improve our understanding of biological intelligence, and help us discover new patterns in the natural world.
