The Brain’s Second Look: Generative Feedback and Dynamic Coding in Primate Vision

Author: Shi, Yuelin

Year: 2026

Degree: Dissertation (Ph.D.)

Advisor: Tsao, Doris Y.

Committee Members: Meister, Markus; Andersen, Richard A.; Perona, Pietro; Tsao, Doris Y.

Option: Neurobiology

Abstract

Vision must infer the latent causes of retinal input from signals that are incomplete, noisy, and often ambiguous. This thesis asks whether primate vision is best understood as a predominantly feedforward computation or as a generative, recurrent process in which higher-order hypotheses help shape sensory representations over time. I propose that the visual brain is best described as performing analysis by synthesis: internal scene hypotheses generate predicted sensory structure, and perception emerges through iterative interactions between these predictions and incoming evidence.

The first part of the thesis develops this proposal at the conceptual and neurobiological levels. Drawing on phenomena such as imagery, dreaming, active vision, bistable perception, perceptual completion, and the effects of prior knowledge, I argue that a purely feedforward account cannot fully explain the coordinated, context-sensitive, and inference-like nature of visual experience. I then review computational families of generative models and neuroscientific evidence for feedback, iterative dynamics, and laminar circuitry, proposing that these features are well suited to implement hypothesis-driven visual inference in the brain.

The second part tests this framework in the macaque face-patch system using simultaneous Neuropixels recordings from face patches ML and AM. To ask whether degraded-face recognition engages feedback-supported inference, I measured the time at which degraded-face responses become aligned with intact identity representations. Under passive viewing of a wide range of degradations, both response timing and intact-to-degraded generalization followed the canonical posterior-to-anterior ordering, with ML preceding AM. Thus, degradation alone did not reveal an anterior-leading signature of top-down inference within inferotemporal cortex. In contrast, learning to associate ambiguous Mooney faces with their intact counterparts selectively reshaped ML population activity, increasing both representational separability and cross-condition generalization for upright Mooney faces. These findings suggest that top-down recruitment in the ventral stream is constrained and may depend not simply on degraded input, but on the availability of learned priors that can disambiguate it.
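As an illustration only, the timing measure described above (when degraded-face responses become aligned with intact identity representations) can be sketched as a cross-condition decoding analysis. The sketch below is not the thesis's actual pipeline. The function names, the correlation-based nearest-template classifier, the array layout, and the accuracy threshold are all assumptions made for this example.

```python
import numpy as np

def cross_condition_accuracy(intact, degraded):
    """Fraction of degraded responses matched to the correct intact identity.

    intact, degraded: arrays of shape (n_identities, n_neurons) for one time
    bin, with row i of both arrays belonging to the same identity (assumed).
    Each degraded population vector is assigned to the intact template with
    the highest Pearson correlation.
    """
    a = intact - intact.mean(axis=1, keepdims=True)
    b = degraded - degraded.mean(axis=1, keepdims=True)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    corr = b @ a.T                      # (n_degraded, n_intact) correlations
    predicted = corr.argmax(axis=1)
    return float(np.mean(predicted == np.arange(len(degraded))))

def generalization_latency(intact_t, degraded_t, times, threshold):
    """First time at which intact-to-degraded generalization reaches threshold.

    intact_t, degraded_t: arrays of shape (n_times, n_identities, n_neurons).
    times: array of length n_times. threshold: hypothetical accuracy criterion.
    Returns (latency or None, accuracy per time bin).
    """
    acc = np.array([cross_condition_accuracy(i, d)
                    for i, d in zip(intact_t, degraded_t)])
    above = np.flatnonzero(acc >= threshold)
    latency = times[above[0]] if above.size else None
    return latency, acc
```

Under these assumptions, running `generalization_latency` separately on ML and AM populations and comparing the two latencies would give one way to ask which area leads.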

The final part shows that high-level visual coding itself is dynamically reformatted over time. Large-scale recordings from ML and AM during viewing of faces and non-face objects revealed that face-selective neurons do not use a single, fixed encoding axis. Instead, responses to faces initially align with a domain-general object code, consistent with rapid face detection, but then undergo a concerted switch within 20 ms. This switch includes reversal of tuning in low dimensions of object space, emergence of new tuning in higher-dimensional face space, increased response sparsity, and improved reconstruction and discrimination of individual faces. The effect is stimulus-gated, appearing for faces but not for non-face objects, and resolves a long-standing debate by showing that inferotemporal face coding is both domain general and domain specific, but at different moments in time.
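Two of the quantities named above, reversal of tuning along an encoding axis and response sparsity, can be made concrete with a short sketch. This is an illustrative example under stated assumptions, not the analysis used in the thesis. It assumes a linear tuning model fit by least squares in separate early and late time windows, and it uses the Treves-Rolls sparseness index in its normalized (Vinje-Gallant) form. All function names are hypothetical.

```python
import numpy as np

def preferred_axis(features, rates):
    """Least-squares linear tuning axis of one neuron in one time window.

    features: (n_stimuli, n_dims) stimulus coordinates in a feature space.
    rates: (n_stimuli,) firing rates in that window.
    """
    X = features - features.mean(axis=0)
    y = rates - rates.mean()
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def axis_cosine(w_early, w_late):
    """Cosine similarity between two tuning axes; values near -1 mean reversal."""
    return float(w_early @ w_late /
                 (np.linalg.norm(w_early) * np.linalg.norm(w_late)))

def sparseness(rates):
    """Normalized Treves-Rolls sparseness for non-negative rates.

    0 when all stimuli evoke the same response, 1 when a single stimulus
    evokes all of the response.
    """
    r = np.asarray(rates, dtype=float)
    n = r.size
    return float((1 - r.mean() ** 2 / np.mean(r ** 2)) / (1 - 1 / n))
```

In this framing, a neuron whose early and late axes have a strongly negative cosine has reversed its tuning between windows, and an increase in `sparseness` from the early to the late window corresponds to the sparser late code described above.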

Together, these studies support a view of primate vision as a dynamic and knowledge-sensitive process. Rather than relying on a static feedforward code alone, the visual system appears to use recurrent computations that allow sensory representations to be revised, sharpened, and reformatted as incoming evidence interacts with internal models and prior knowledge.