Genetic Interrogation of Expression Regulation
Author: Carilli, Maria Theresa Natalina
Year: 2026
Degree: Dissertation (Ph.D.)
Advisor: Pachter, Lior S.
Committee Members: Phillips, Robert B.; Yue, Yisong; Engelhardt, Barbara E.; Pachter, Lior S.
Option: Biochemistry and Molecular Biophysics
DOI: 10.7907/qnzh-rf18
Abstract
Understanding how variation in the genome affects organisms' phenotypes is a central goal of biology. For the past several decades, this goal has been pursued largely through large-scale association tests that link genetic variants across the genome to particular diseases or traits. Because most variants lie in non-coding regions of the genome and act only in specific contexts, an intermediate step emerged: associating variants with gene expression patterns in particular tissues using bulk RNA-sequencing (RNA-seq) data and, with the recent widespread adoption of single-cell RNA-sequencing (scRNA-seq), in particular cell types. However, these association tests are generally restricted to variants within some sequence distance of the gene, because testing tens of millions of distal variants against tens of thousands of genes in hundreds of contexts is computationally and statistically burdensome. Moreover, associating these proximal variants with changes in average gene expression in scRNA-seq data falls far short of pinpointing the cellular processes they may affect. To move beyond the analysis of averages, recent work has advocated a mechanistic approach to analyzing scRNA-seq data: modeling the underlying biological processes of transcription, splicing, and degradation.
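The scale of the burden described above can be made concrete with a back-of-the-envelope count. The following is a minimal sketch that assumes illustrative round numbers matching the orders of magnitude in the text (10 million variants, 20,000 genes, 100 contexts) and uses a simple Bonferroni correction as one conservative example of multiple-testing control. The helper name is hypothetical, and this is not the testing procedure used in the thesis.

```python
def bonferroni_threshold(n_variants: int, n_genes: int, n_contexts: int,
                         alpha: float = 0.05) -> tuple[int, float]:
    """Hypothetical helper: count all variant-gene-context tests and return
    the per-test p-value threshold under a Bonferroni correction."""
    n_tests = n_variants * n_genes * n_contexts
    return n_tests, alpha / n_tests


# Illustrative orders of magnitude (assumed, not measured):
# tens of millions of variants, tens of thousands of genes, hundreds of contexts.
n_tests, threshold = bonferroni_threshold(10_000_000, 20_000, 100)
print(f"{n_tests:.1e} tests, per-test threshold {threshold:.1e}")
# prints "2.0e+13 tests, per-test threshold 2.5e-15"
```

Tens of trillions of tests both cost substantial compute and push the significance threshold so low that only very large effects can pass it. This is why analyses typically restrict attention to variants near each gene.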
In this thesis, we first show how biophysical models can identify changes in biophysical processes that are not accessible at the level of mean expression. We next develop accelerated inference procedures for these models to make their application feasible at the scale required for genetic association tests. We then propose a framework that uses homozygous crosses to test for the presence or absence of proximal or distal gene regulation. Finally, we couple this genetic testing framework with the biophysical models to identify regulatory strategies of biophysical processes across cell types in eight tissues of eight genetically diverse mouse strains.
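To illustrate how a change in a biophysical process can be invisible at the level of mean expression, here is a minimal sketch that assumes the standard bursty transcription model. In this model, transcription occurs in bursts at frequency k, burst sizes are geometric with mean b, and each molecule degrades at rate γ. Its steady-state distribution is negative binomial with mean kb/γ and variance (kb/γ)(1 + b). The class and function names are hypothetical, and the models described above also account for splicing, which this sketch omits.

```python
from dataclasses import dataclass


@dataclass
class BurstyParams:
    # Hypothetical container for the single-species bursty model parameters.
    burst_freq: float   # k: bursts per unit time
    burst_size: float   # b: mean molecules per burst (geometric)
    degradation: float  # gamma: per-molecule degradation rate


def nb_moments(p: BurstyParams) -> tuple[float, float]:
    """Steady-state mean and variance of the negative binomial distribution
    implied by the bursty model."""
    mean = p.burst_freq * p.burst_size / p.degradation
    var = mean * (1.0 + p.burst_size)
    return mean, var


# Two regulatory strategies chosen to give the same mean expression.
frequent_small = BurstyParams(burst_freq=10.0, burst_size=1.0, degradation=1.0)
rare_large = BurstyParams(burst_freq=1.0, burst_size=10.0, degradation=1.0)

for name, params in [("frequent_small", frequent_small), ("rare_large", rare_large)]:
    mean, var = nb_moments(params)
    print(f"{name}: mean={mean:.1f}, variance={var:.1f}, Fano={var / mean:.1f}")
# frequent_small: mean=10.0, variance=20.0, Fano=2.0
# rare_large: mean=10.0, variance=110.0, Fano=11.0
```

Both parameter sets give identical average expression. A test on means alone would therefore miss a shift from frequent, small bursts to rare, large ones. The difference appears only in the shape of the distribution, which a fitted biophysical model can attribute to burst frequency or burst size.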