Mixtures of Latent Variable Models for Interpreting Gene Expression Covariation from Pathways to Transcriptomes

Author: Markarian, Nicholas

Year: 2026

Degree: Dissertation (Ph.D.)

Advisor: Sternberg, Paul W.

Committee Members: Pachter, Lior S.; Sternberg, Paul W.; Bois, Justin S.; Thomson, Matthew

Option: Biology

DOI: 10.7907/dcbd-wy35

Abstract

This thesis centers on interpretable subspace learning and latent variable models for characterizing covariation modulated by categorical variables in the context of biology. First, it introduces k-spaces, a probabilistic model with ties to Principal Component Analysis and k-means clustering, which has implications across different biological analyses through its interpretations as a subspace learning technique, a latent variable model, and a dimension reduction technique. Second, it establishes the problem of simultaneously characterizing gene covariation and expression level in known pathways in human tissue samples, and applies k-spaces to GTEx data to lay the foundations for this line of research. Finally, it outlines a path toward using such data as references for clinical samples from patients.