Motivation: Recent advances in biomedical research have made massive amounts of transcriptomic data from different sources available in public repositories. Because of the heterogeneity among individual experiments, identifying reproducible biomarkers for a given disease from multiple independent studies has become a major challenge. The widely used meta-analysis approaches, such as Fisher’s method, Stouffer’s method, minP and maxP, have at least two major limitations: i) they are sensitive to outliers, and ii) they perform only one statistical test for each individual study, and hence do not fully utilize the potential sample size to gain statistical power.
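For readers unfamiliar with the four classical methods named above, the following minimal sketch shows their standard textbook forms for combining one p-value per study, assuming independent studies and one-sided p-values strictly between 0 and 1. It is not part of GSMA; the function names and the example p-values are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist


def fisher(pvalues):
    """Fisher's method: X = -2 * sum(ln p) follows chi-square with 2k df."""
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # this is X / 2
    # Closed-form chi-square survival function for an even number of df (2k).
    return math.exp(-half_stat) * sum(
        half_stat ** i / math.factorial(i) for i in range(k)
    )


def stouffer(pvalues):
    """Stouffer's method: sum of z-scores divided by sqrt(k) is standard normal."""
    nd = NormalDist()
    z = sum(nd.inv_cdf(1.0 - p) for p in pvalues) / math.sqrt(len(pvalues))
    return 1.0 - nd.cdf(z)


def min_p(pvalues):
    """minP: the smallest of k independent uniform p-values follows Beta(1, k)."""
    return 1.0 - (1.0 - min(pvalues)) ** len(pvalues)


def max_p(pvalues):
    """maxP: the largest of k independent uniform p-values follows Beta(k, 1)."""
    return max(pvalues) ** len(pvalues)


# Hypothetical per-study p-values for one gene: a single extreme study
# among three unremarkable ones.
one_extreme_study = [1e-10, 0.6, 0.7, 0.8]
for name, method in [("Fisher", fisher), ("Stouffer", stouffer),
                     ("minP", min_p), ("maxP", max_p)]:
    print(f"{name}: {method(one_extreme_study):.3g}")
```

In this made-up example a single extreme study is enough to drive Fisher’s method and minP to a very small combined p-value, whereas maxP is determined entirely by the least significant study. This illustrates the sensitivity to individual studies described above, and the fact that each study contributes only one test result.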
Results: Here we propose GSMA, an intra- and inter-level meta-analysis framework that overcomes these limitations and provides a gene signature that is reliable and reproducible across multiple independent studies of a given disease. The approach provides a comprehensive global signature that can be used to understand the underlying biological phenomena, and a smaller test signature that can be used to classify future samples of a given disease. We demonstrate the utility of the framework by constructing disease signatures for influenza and Alzheimer’s disease using 9 data sets comprising 1,108 individuals. These signatures are then validated on 12 independent data sets comprising 912 individuals. The results indicate that the proposed approach performs better than the majority of the existing meta-analysis approaches in terms of both sensitivity and specificity. The proposed signatures could further be used in diagnosis, prognosis and identification of therapeutic targets.
Availability: For review purposes, the source code is currently available at https://bit.ly/2AXg3qS. It will soon be available as a Bioconductor package.