Illuminating the Regulatory Dark Matter of E. coli with Massively Parallel Reporter Assays
Author: Röschinger, Tom
Year: 2026
Degree: Dissertation (Ph.D.)
Advisor: Phillips, Robert B.
Committee Members: Thomson, Matthew; Rothenberg, Ellen V.; Elowitz, Michael B.; Bois, Justin S.; Phillips, Robert B.
Option: Biochemistry and Molecular Biophysics
DOI: 10.7907/3qy4-em46
Abstract
All cells respond to changes in their environment through the regulation of their genes. Despite decades of effort in Escherichia coli, huge gaps remain in our knowledge of both the function of many genes - the so-called y-ome - and how they are regulated. For roughly 40% of genes, no function has been assigned, and for the majority we do not know which, if any, transcription factors control their expression. Here we describe a joint experimental and theoretical dissection of the regulation of 117 promoters in E. coli across 39 diverse environments, enabling us to identify the binding sites and transcription factors that mediate regulatory control at base-pair resolution.
Using Reg-Seq - a combination of saturation mutagenesis, massively parallel reporter assays, mass spectrometry, and tools from information theory - we go from complete ignorance of a promoter's environment-dependent regulatory architecture to detailed models of its behavior. We first develop the theoretical framework for interpreting the information footprints that are the primary readout of Reg-Seq, using toy models to establish the expected scale of mutual information at binding site positions and how the noise floor scales with sequencing depth. We then describe improvements to the genome-integrated Reg-Seq protocol, including redesigned constructs and an expanded condition panel spanning carbon source shifts, antibiotic stress, anaerobiosis, osmotic shock, and stationary phase.
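As a rough illustration of the toy-model reasoning described above, the sketch below simulates a small mutagenized promoter library and computes a per-position information footprint. It is not the Reg-Seq analysis pipeline. The function names, the binary mutated/wild-type encoding, the two-bin expression readout, and all parameter values are assumptions chosen for illustration. It shows two things: mutual information peaks at the positions of a simulated binding site, and the background (the noise floor of a plug-in estimator) shrinks as the number of sampled variants grows.

```python
import numpy as np

def mutual_information_bits(x, y):
    """Plug-in mutual information (in bits) between two binary integer arrays."""
    joint = np.zeros((2, 2))
    np.add.at(joint, (x, y), 1)              # count co-occurrences
    joint /= joint.sum()                      # joint probability table
    px = joint.sum(axis=1, keepdims=True)     # marginal of x
    py = joint.sum(axis=0, keepdims=True)     # marginal of y
    nz = joint > 0                            # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (px @ py)[nz])))

def footprint(mutated, expr_bin):
    """Mutual information between mutation state and expression bin, per position."""
    return np.array([mutual_information_bits(mutated[:, i], expr_bin)
                     for i in range(mutated.shape[1])])

def simulate(n_variants, length=30, site=slice(10, 16),
             mut_rate=0.1, effect=1.0, seed=0):
    """Toy library (all values hypothetical): each mutation inside `site`
    lowers log expression by `effect`; mutations elsewhere do nothing."""
    rng = np.random.default_rng(seed)
    mutated = (rng.random((n_variants, length)) < mut_rate).astype(int)
    log_expr = (-effect * mutated[:, site].sum(axis=1)
                + rng.normal(0.0, 1.0, n_variants))
    expr_bin = (log_expr > np.median(log_expr)).astype(int)  # low/high bins
    return mutated, expr_bin

if __name__ == "__main__":
    for n in (200, 20000):
        mutated, expr_bin = simulate(n)
        fp = footprint(mutated, expr_bin)
        background = np.r_[fp[:10], fp[16:]].mean()
        print(f"N={n}: site mean = {fp[10:16].mean():.4f} bits, "
              f"background mean = {background:.5f} bits")
```

For two binary variables with no true dependence, the expected plug-in estimate is approximately 1/(2 N ln 2) bits for N sampled variants, so the background falls roughly in inverse proportion to sampling depth in this toy setting.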
As proof of principle, we chose a combination of gold-standard promoters with well-characterized regulation, genes from the y-ome, toxin-antitoxin pairs, and genes hypothesized to be part of regulatory modules. At well-characterized promoters, Reg-Seq recovers known binding sites for LexA, CpxR, MarA, Rob, MprA, and CRP under the expected conditions, while also revealing previously unreported features: new transcription start sites, a novel CRP binding site at the uncharacterized gene yadI, and condition-specific activation patterns not predicted by existing annotations. Extending the method to 34 promoters with no prior regulatory information, we uncover several new features of the regulatory landscape of the y-ome. A cluster of osmotically induced promoters shares a conserved binding motif for an unidentified transcription factor, and genome-wide scanning with this motif identifies a putative regulon spanning osmoprotectant transport, trehalose metabolism, and envelope modification. At the anaerobic gene ybiY, we identify YciT as a repressor - correcting the existing database annotation - and reveal an unidentified activator that does not correspond to any characterized anaerobic regulator. At the cryptic prophage gene yagB, mass spectrometry identifies both XynR and H-NS, and the overlapping architecture of the repressor and sigma^S binding sites explains the stationary phase specificity of expression.
A systematic survey of single-mutation effects across the library reveals that many promoter regions harbor latent sequences one base change away from creating a functional sigma^70 promoter. These de novo promoters are strongly enriched at loci that require a specific activator, consistent with a newly created promoter being detectable only against an otherwise silent background. In a complementary set of results, we find that several loci with multiple annotated transcription start sites resolve to fewer active sites under physiological conditions.
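To make the idea of a sequence "one base change away" from a promoter concrete, the following sketch enumerates single substitutions that lift a sequence's best match to the sigma^70 consensus hexamers (-35 TTGACA, -10 TATAAT, 17 bp spacer) above a cutoff. This is a deliberately crude consensus-matching proxy, not the expression-based survey described above. The fixed spacer, the match-count score, the threshold, and the function names are all assumptions made for illustration.

```python
MINUS35, MINUS10, SPACER = "TTGACA", "TATAAT", 17  # sigma^70 consensus elements

def best_consensus_score(seq):
    """Best number of matches (0 to 12) to the -35 and -10 hexamers
    over all windows with a fixed 17 bp spacer (a simplifying assumption)."""
    span = 6 + SPACER + 6
    best = 0
    for i in range(len(seq) - span + 1):
        m35 = seq[i:i + 6]
        m10 = seq[i + 6 + SPACER:i + span]
        score = (sum(a == b for a, b in zip(m35, MINUS35))
                 + sum(a == b for a, b in zip(m10, MINUS10)))
        best = max(best, score)
    return best

def latent_promoter_mutations(seq, threshold=10):
    """Return (position, wild_type_base, new_base) for every single
    substitution that raises the best score from below `threshold`
    to at least `threshold`. The threshold is a hypothetical cutoff."""
    if best_consensus_score(seq) >= threshold:
        return []  # already scores as promoter-like; nothing is "latent"
    hits = []
    for pos, wt in enumerate(seq):
        for base in "ACGT":
            if base == wt:
                continue
            mutant = seq[:pos] + base + seq[pos + 1:]
            if best_consensus_score(mutant) >= threshold:
                hits.append((pos, wt, base))
    return hits

if __name__ == "__main__":
    # Hypothetical sequence: perfect -35, 17 bp spacer, -10 with one mismatch.
    seq = "GGG" + "TTGACA" + "C" * 17 + "TATAAC" + "GGG"
    print(latent_promoter_mutations(seq, threshold=12))
```

A match count ignores the unequal contribution of individual positions to binding, so a realistic analysis would replace it with a measured energy matrix or with the observed expression shift of each mutant.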
Together, these results demonstrate that Reg-Seq can systematically annotate the regulatory architecture of uncharacterized genes, correct existing annotations, and generate testable hypotheses about transcription factor identity and condition-specificity, bridging the gap between single-gene studies and the largely uncharacterized regulatory landscape of E. coli.