CSIE: cancer subtyping via inference and ensemble

Abstract

While multi-omics integration is the gold standard for precision oncology, its clinical utility is severely hampered by the incomplete data problem, where cost and technical barriers often leave researchers with only single-omics profiles. Our manuscript introduces CSIE (cancer subtyping via inference and ensemble), a framework that bridges this gap by using a novel transformer-based inference module which incorporates systems-level knowledge to accurately infer missing omics layers from gene expression data. Furthermore, CSIE employs an ensemble clustering module that simultaneously integrates multi-omics data via different similarity metrics and clustering algorithms to capture molecular patterns of cancer subtypes. The robustness of CSIE is validated through extensive benchmarking against 12 state-of-the-art methods across 66 cancer datasets with over 15 000 patients and 22 diverse data modalities/platforms. Our results demonstrate that CSIE significantly outperforms existing tools, particularly in scenarios with incomplete data. This work shifts the paradigm from requiring exhaustive data collection to leveraging biological intelligence for data completion, offering a scalable solution for high-resolution cancer subtyping in real-world clinical settings. All source code of CSIE and scripts for regenerating results reported in this article are available at https://github.com/tinnlab/CSIE.
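The ensemble clustering idea described above (several similarity metrics and several clustering algorithms combined into one partition) can be illustrated with a small sketch. This is not the CSIE implementation. It assumes a co-association (consensus) matrix as the combination rule and simple agglomerative clustering as the base algorithm, neither of which is confirmed by the abstract. All function names (`euclidean`, `correlation_distance`, `agglomerate`, `ensemble_cluster`, `toy_layer`) and the toy data are hypothetical.

```python
import math
import random
from itertools import combinations
from statistics import fmean


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def correlation_distance(a, b):
    """1 - Pearson correlation; assumes neither profile is constant."""
    ma, mb = fmean(a), fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 1.0 - cov / math.sqrt(va * vb)


def agglomerate(dist, k, reduce_fn):
    """Naive agglomerative clustering on a full distance matrix.

    reduce_fn turns the pairwise distances between two clusters into one
    score: max gives complete linkage, fmean gives average linkage.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: reduce_fn(
                [dist[i][j] for i in clusters[p[0]] for j in clusters[p[1]]]
            ),
        )
        clusters[a] += clusters[b]  # a < b, so deleting b keeps a valid
        del clusters[b]
    labels = [0] * len(dist)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


def ensemble_cluster(layers, k):
    """Cluster every layer with every (metric, linkage) pair, then cluster
    the patients again on how often they were grouped together."""
    n = len(layers[0])
    runs = 0
    together = [[0.0] * n for _ in range(n)]
    for X in layers:
        for metric in (euclidean, correlation_distance):
            dist = [[metric(a, b) for b in X] for a in X]
            for reduce_fn in (max, fmean):
                labels = agglomerate(dist, k, reduce_fn)
                runs += 1
                for i in range(n):
                    for j in range(n):
                        together[i][j] += labels[i] == labels[j]
    # Patients that are rarely co-clustered end up far apart.
    consensus_dist = [[1.0 - c / runs for c in row] for row in together]
    return agglomerate(consensus_dist, k, fmean)


def toy_layer(rng, n_per_group=6, n_features=8):
    """Synthetic 'omics layer': two patient groups with opposite signals."""
    rows = []
    for group in range(2):
        for _ in range(n_per_group):
            row = [rng.gauss(0, 1) for _ in range(n_features)]
            for f in range(n_features):
                if (f < n_features // 2) == (group == 0):
                    row[f] += 5
            rows.append(row)
    return rows


rng = random.Random(0)
layers = [toy_layer(rng), toy_layer(rng)]  # e.g. measured and inferred layers
labels = ensemble_cluster(layers, k=2)
print(labels)
```

In this sketch each layer contributes four base partitions (two metrics times two linkages), and the final labels come from clustering the co-association matrix, so no single metric or algorithm determines the result.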

Publication
Briefings in Bioinformatics
Dao Tran
PhD Student