MIA: Multi-cohort Integrated Analysis for Biomarker Identification


Advanced high-throughput technologies have produced vast amounts of biological data. Integrating these data is key to obtaining the statistical power needed to pinpoint the biological mechanisms and biomarkers of the underlying disease. However, existing computational approaches for data integration have two critical drawbacks: they account for neither study bias nor the noisy nature of molecular data. This leads to unreliable and inconsistent results, i.e., results that change drastically when the input is slightly perturbed or when additional datasets are added to the analysis. Here we propose MIA, a multi-cohort integrated approach for biomarker identification that is robust to noise and study bias. We deploy a leave-one-out strategy to prevent any single cohort from having disproportionate influence. We also combine techniques from both p-value-based and effect-size-based meta-analysis to ensure that the identified genes are significantly impacted. We compare MIA with classical approaches (Fisher’s, Stouffer’s, maxP, minP, and the additive method) using 7 microarray and 4 RNA-Seq datasets. For each approach, we construct a disease signature using 3 datasets and then classify patients from the 8 remaining datasets. MIA outperforms all compared approaches, achieving the highest sensitivity and specificity in distinguishing symptomatic patients from healthy controls.
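The abstract names the classical baselines and a leave-one-out strategy but gives no formulas. The Python sketch below illustrates them under stated assumptions. It implements the textbook forms of Fisher’s, Stouffer’s, Tippett’s minP, and Wilkinson’s maxP methods, and it uses Edgington’s sum-of-p as one common form of the "additive method." Only the standard library is used. The `leave_one_out` helper is hypothetical. It recombines the per-cohort p-values with each cohort dropped in turn and keeps the least significant result, which shows how a single cohort can be kept from dominating. It is not MIA’s published algorithm, which also incorporates effect sizes. All function names and the example p-values are illustrative.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def fisher(pvals):
    """Fisher's method: X = -2 * sum(ln p_i) ~ chi-square(2k) under H0."""
    k = len(pvals)
    half_x = -sum(math.log(p) for p in pvals)  # X / 2
    # Chi-square survival function for even df 2k:
    # P(X >= x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 0.0
    for i in range(k):
        if i > 0:
            term *= half_x / i
        total += term
    return math.exp(-half_x) * total


def stouffer(pvals):
    """Stouffer's method: unweighted sum of z-scores, scaled by sqrt(k)."""
    z = sum(_STD_NORMAL.inv_cdf(1 - p) for p in pvals) / math.sqrt(len(pvals))
    return 1 - _STD_NORMAL.cdf(z)


def min_p(pvals):
    """Tippett's minP: P(min of k uniforms <= observed min)."""
    return 1 - (1 - min(pvals)) ** len(pvals)


def max_p(pvals):
    """Wilkinson's maxP: P(max of k uniforms <= observed max)."""
    return max(pvals) ** len(pvals)


def additive(pvals):
    """Edgington's sum-of-p method (assumed form of the additive method):
    Irwin-Hall CDF evaluated at the sum of the p-values."""
    k, s = len(pvals), sum(pvals)
    total = sum(
        (-1) ** j * math.comb(k, j) * (s - j) ** k
        for j in range(math.floor(s) + 1)
    )
    return total / math.factorial(k)


def leave_one_out(pvals, combine):
    """Hypothetical robustness check (not MIA's published algorithm):
    recombine with each cohort dropped in turn and keep the largest,
    i.e. least significant, combined p-value."""
    if len(pvals) < 2:
        raise ValueError("leave-one-out needs at least two cohorts")
    return max(combine(pvals[:i] + pvals[i + 1:]) for i in range(len(pvals)))


if __name__ == "__main__":
    # Hypothetical per-cohort p-values for one gene (not data from the paper).
    gene_pvals = [0.01, 0.03, 0.2]
    for name, method in [("Fisher", fisher), ("Stouffer", stouffer),
                         ("minP", min_p), ("maxP", max_p), ("additive", additive)]:
        print(f"{name:9s} all={method(gene_pvals):.4g} "
              f"leave-one-out={leave_one_out(gene_pvals, method):.4g}")
```

Taking the maximum over the leave-one-out subsets means a gene is reported as significant only if no single cohort drives its significance. The alternating Irwin-Hall sum in `additive` can lose floating-point precision when there are many cohorts.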

Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics
Brian Marks
Undergraduate Student