Single-cell RNA sequencing data imputation using similarity preserving network

Abstract

Recent advancements in single-cell RNA sequencing (scRNA-seq) technologies have allowed us to monitor the gene expression of individual cells. This level of detail in monitoring and characterization enables the study of cells in rapidly changing and heterogeneous environments such as early-stage embryos or tumor tissue. However, current scRNA-seq technologies still face many outstanding challenges. Due to the low amount of starting material, a large portion of the expression values in scRNA-seq data is missing and reported as zeros. Moreover, scRNA-seq platforms are trending toward prioritizing high throughput over sequencing depth, which makes the problem more serious in large datasets. These missing values can greatly affect the accuracy of downstream analyses. Here we introduce scINN, a neural network-based approach that can reliably recover the missing values in single-cell data and thus improve the performance of downstream analyses. To impute the dropouts in single-cell data, we build a neural network that consists of two sub-networks: an imputation sub-network and a quality assessment sub-network. We compare scINN with state-of-the-art imputation methods using 10 scRNA-seq datasets with a total of more than 100,000 cells. In an extensive analysis, we demonstrate that scINN outperforms existing imputation methods in improving the identification of cell sub-populations and the quality of transcriptome landscape visualization.

Publication
The 13th International Conference on KNOWLEDGE AND SYSTEMS ENGINEERING (KSE 2021)
Duc Tran
Bioinformatics Scientist