Complexity of Transcriptomic Data Analysis and Implications for Biological Discovery

Author: Luebbert, Laura

Year: 2024

Degree: Dissertation (Ph.D.)

Advisor: Pachter, Lior S.

Committee Members: Van Valen, David A.; Murray, Richard M.; Bjorkman, Pamela J.; Pachter, Lior S.

Option: Biology

DOI: 10.7907/xnw5-v914

Abstract

Over the past decade, the advancement of ‘omics’ technologies has ushered in a new era for the life sciences. Given the high-throughput nature of omics technologies, this era is characterized by unique computational challenges pertaining to data size and dimensionality, as well as technical and biological noise. At the same time, it offers opportunities, as the global, untargeted, and parallel measurement of large amounts of information often yields unexpected insights.

This thesis describes challenges inherent to the omics era of life sciences, particularly highlighting the increasing importance of merging expertise in biology and computer science. It describes the development of multiple software tools designed to address several of these challenges, which were immediately adopted and widely implemented in transcriptomics and proteomics research. Additionally, it contains three chapters focused on unraveling previously unquantifiable information, including the interpretation of sequencing data from organisms with low-quality reference genome assemblies and workflows for identifying novel viruses using single-cell RNA sequencing data that is already being generated at massive scale in research, healthcare, and agriculture.
