Tobias Wängberg: Statistical methods for molecular tissue-profiling
Time: Wed 2023-01-25 15.15 - 16.00
Location: Cramér room, Albano building 1
Doctoral student: Tobias Wängberg
Opponent: Nghia Vu (MEB, KI)
Supervisors: Joanna Tyrcha and Chun-Biu Li
Abstract
Through the advent of sequencing technologies, researchers have been able to map the entire human genome. The next major step is to classify human cells in order to obtain a cellular basis for health and disease. Until now, the taxonomy of human cells has been based mainly on biological function and morphology. Now, thanks to advances in single-cell sequencing, a molecular basis for classification is possible by grouping cells according to their gene expression profiles. The development of statistical tools for the analysis of such data is the main focus of the two papers included in this thesis.
In the first paper we propose a method for visualising high-dimensional sequencing data in a two-dimensional scatter plot. It is demonstrated that the method robustly reveals key structures of the data, such as clusters, hierarchical organisations of cell types and continuous developmental trajectories. The proposed method is shown to outperform existing state-of-the-art methods on simulated and real data sets.
In the second paper we first propose a method for clustering single-cell sequencing data. The method is shown to group the data more accurately than the commonly used Louvain clustering method. The quality of the clustering is monitored by validation indices. Unlike the Louvain method, the proposed method outputs cluster membership probabilities, which take into account the uncertainty in assigning cells to cell types. The proposed method is automatic in the sense that there are no free parameters that need to be set subjectively by the user. In addition, a mathematical model is presented that describes the stochastic evolution of the number of messenger RNA molecules present in a cell. This can serve as a basis for statistical inference and comparison of cell types. Finally, approximations of the stationary distribution are derived based on perturbation analysis.
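As a rough illustration of what a stochastic model of mRNA counts and its stationary distribution can look like, the sketch below simulates a textbook birth-death process with the Gillespie algorithm. This is an assumption chosen for illustration and not the model of the thesis: mRNA is produced at a constant rate and each molecule degrades independently, so the stationary distribution is Poisson with mean k_prod / k_deg. The function name, parameter names and rate values are all hypothetical.

```python
import random


def simulate_mrna(k_prod, k_deg, t_end, seed=0):
    """Gillespie simulation of a simple birth-death process.

    Illustrative assumption, not the thesis model: production at constant
    rate k_prod, degradation at rate k_deg per molecule.
    Returns the time-averaged molecule count over [0, t_end].
    """
    rng = random.Random(seed)
    t, n = 0.0, 0            # current time and mRNA count
    weighted_sum = 0.0       # integral of n(t) dt
    while True:
        birth = k_prod
        death = k_deg * n
        total = birth + death
        dt = rng.expovariate(total)      # waiting time to the next reaction
        if t + dt >= t_end:
            weighted_sum += n * (t_end - t)
            break
        weighted_sum += n * dt
        t += dt
        # Choose which reaction fires, proportionally to its rate.
        if rng.random() * total < birth:
            n += 1
        else:
            n -= 1
    return weighted_sum / t_end


# Hypothetical rates; the stationary mean for this toy model is k_prod / k_deg.
time_avg = simulate_mrna(k_prod=10.0, k_deg=1.0, t_end=2000.0)
print(f"time-averaged count: {time_avg:.2f}, stationary mean: {10.0 / 1.0:.2f}")
```

For this toy model the long-run time average should lie close to the stationary mean. Richer models, for example with bursty production, generally lack such a simple closed form, which is where approximations of the stationary distribution become useful.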