Statistical Methods for Gene Differential Expression Analysis of RNA-Sequencing
Author: Yi, Lynn Donglin
Year: 2019
Degree: Dissertation (Ph.D.)
Advisor: Pachter, Lior S.
Committee Members: Chan, David C.; Thomson, Matthew; Pachter, Lior S.; Chandrasekaran, Venkat
Option: Systems Biology
DOI: 10.7907/0YE6-2217
Abstract
RNA-Sequencing ("RNA-Seq") is performed to measure gene expression, often to ask which genes are differentially expressed across biological conditions. Statistical methods have been used to model RNA-Seq quantifications in order to determine differential expression, and have traditionally been divided into gene-level methods and transcript-level methods. There has been little attempt to bridge this statistical divide, although transcript expression and gene expression are biologically inextricably linked. In this thesis, we provide a case study of a comparative differential expression analysis, demonstrating that many differential expression events happen at the isoform level, and that an analysis using only summarized gene quantifications would fail to capture these events. Furthermore, we develop statistical methods that unify transcript-level and gene-level analysis. In bulk RNA-Seq, by using p-value aggregation methods, we are able to translate transcript-level results into gene-level results under a unified framework. For single-cell RNA-Seq, we propose using multiple logistic regression, leveraging the high dimensionality of the data to determine whether the transcript quantifications pertaining to a gene constitute a linear discriminant for cell type. This method combines differential transcript expression analysis and differential gene expression analysis into a unified framework which we call “gene differential expression.” Lastly, we demonstrate that our methods can be used on transcript compatibility counts instead of transcript quantifications in order to bypass ambiguous read assignment and improve accuracy. We show that transcript compatibility counts obtained via transcriptome pseudoalignment are comparable in quantification accuracy to quantifications from genome alignment methods.
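As an illustration of the p-value aggregation idea mentioned in the abstract, the following minimal Python sketch combines transcript-level p-values into a single gene-level p-value. It uses Fisher's method only as a familiar example, and that method assumes the transcript-level tests are independent. The abstract does not state which aggregation method the thesis uses. The function names, the transcript-to-gene mapping, and the input values are hypothetical.

```python
import math
from collections import defaultdict


def fisher_combine(pvalues):
    """Combine p-values with Fisher's method (assumes independent tests)."""
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one p-value")
    # Half of the Fisher statistic: -sum(log p). Clamp to avoid log(0).
    half_stat = -sum(math.log(max(p, 1e-300)) for p in pvalues)
    # For 2k degrees of freedom the chi-square survival function has the
    # closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!, so no SciPy is needed.
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


def aggregate_by_gene(transcript_pvalues, transcript_to_gene):
    """Group transcript-level p-values by gene and combine each group."""
    grouped = defaultdict(list)
    for transcript, p in transcript_pvalues.items():
        grouped[transcript_to_gene[transcript]].append(p)
    return {gene: fisher_combine(ps) for gene, ps in grouped.items()}


# Hypothetical transcript-level results and annotation, for illustration only.
example_pvalues = {"tx1": 0.5, "tx2": 0.5, "tx3": 0.01}
example_mapping = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
gene_pvalues = aggregate_by_gene(example_pvalues, example_mapping)
```

A gene with a single transcript keeps that transcript's p-value unchanged, while a gene with several transcripts receives one combined p-value. In practice the resulting gene-level p-values would still need multiple-testing correction.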
Files
- [ChapterIV Supplement.pdf](/11226/07/ChapterIV Supplement.pdf) (application/pdf)
- [ChapterIII Supplement.pdf](/11226/12/ChapterIII Supplement.pdf) (application/pdf)
- 5.16.Thesis.pdf (application/pdf)