Statistical Methods for Gene Differential Expression Analysis of RNA-Sequencing

Author: Yi, Lynn Donglin

Year: 2019

Degree: Dissertation (Ph.D.)

Advisor: Pachter, Lior S.

Committee Members: Chan, David C.; Thomson, Matthew; Pachter, Lior S.; Chandrasekaran, Venkat

Option: Systems Biology

Abstract

RNA-Sequencing ("RNA-Seq") is performed to measure gene expression, often to ask which genes are differentially expressed across biological conditions. Statistical methods have been used to model RNA-Seq quantifications in order to determine differential expression, and have traditionally been divided into gene-level methods and transcript-level methods. There has been little attempt to bridge this statistical divide, although transcript expression and gene expression are inextricably linked biologically. In this thesis, we provide a case study of a comparative differential expression analysis, demonstrating that many differential expression events happen at the isoform level, and that an analysis using only summarized gene quantifications would fail to capture these events. Furthermore, we develop statistical methods that unify transcript-level and gene-level analysis. In bulk RNA-Seq, by using p-value aggregation methods, we are able to translate transcript-level results into gene-level results under a unified framework. For single-cell RNA-Seq, we propose using multiple logistic regression, leveraging the high dimensionality of the data to determine whether the transcript quantifications pertaining to a gene constitute a linear discriminant for cell type. This method combines differential transcript expression analysis and differential gene expression analysis into a unified framework which we call “gene differential expression.” Lastly, we demonstrate that our methods can be used on transcript compatibility counts instead of transcript quantifications in order to bypass ambiguous read assignment and improve accuracy. We show that transcript compatibility counts obtained via transcriptome pseudoalignment are comparable in quantification accuracy to quantifications from genome alignment methods.
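
As a rough illustration of the p-value aggregation idea mentioned in the abstract, the sketch below combines transcript-level p-values into one p-value per gene. The abstract does not name the aggregation method, so Fisher's method is used here purely as an assumed stand-in; it also assumes the transcript-level p-values are independent, which may not hold for transcripts of the same gene. The function names, the transcript-to-gene mapping, and the example values are all hypothetical.

```python
import math
from collections import defaultdict


def fisher_combine(pvalues):
    """Combine p-values with Fisher's method (an assumed choice, not
    necessarily the method used in the thesis)."""
    if not pvalues:
        raise ValueError("need at least one p-value")
    # Clamp to avoid log(0); the floor of 1e-300 is an arbitrary illustrative choice.
    clamped = [min(max(p, 1e-300), 1.0) for p in pvalues]
    k = len(clamped)
    # Fisher's statistic is X = -2 * sum(ln p); we work with X / 2.
    half_stat = -sum(math.log(p) for p in clamped)
    # For a chi-square distribution with 2k degrees of freedom the survival
    # function has the closed form exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


def aggregate_by_gene(transcript_pvalues, transcript_to_gene):
    """Group transcript-level p-values by gene and combine each group."""
    grouped = defaultdict(list)
    for transcript, p in transcript_pvalues.items():
        grouped[transcript_to_gene[transcript]].append(p)
    return {gene: fisher_combine(ps) for gene, ps in grouped.items()}


# Hypothetical input: three transcripts belonging to two genes.
example_pvalues = {"tx1": 0.01, "tx2": 0.02, "tx3": 0.5}
example_map = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
gene_pvalues = aggregate_by_gene(example_pvalues, example_map)
```

A gene with a single transcript keeps that transcript's p-value unchanged, while a gene with several small transcript-level p-values receives a smaller combined p-value than any single transcript alone would give.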

Files

ChapterIV Supplement.pdf (application/pdf)
ChapterIII Supplement.pdf (application/pdf)
5.16.Thesis.pdf (application/pdf)