QSEA for fuzzy subgraph querying of KEGG pathways


As biological pathway databases continually increase in size and availability, efficient tools and techniques to query these databases are needed to mine useful biological information. A plethora of existing techniques already allow for exact or approximate query matching. Despite initial success, powerful techniques used for XML and RDF query matching have yet to be sufficiently exploited for use in query matching in the bioinformatics domain. In this paper, we employ the transitive closure to focus on matching hierarchical queries, i.e., finding pathways or graphs that possess a query’s overall hierarchical structure. This approach allows for a greater latitude in fuzzy matching by focusing on the overall hierarchies of queries and graphs. Since hierarchies are only inherent in directed acyclic graphs, we have also developed a robust heuristic to heuristically solve the minimum feedback arc set problem. Analysis on 53 H. sapiens and 23 S. cerevisiae cyclic KEGG pathways have shown that our heuristic performs quite favorably. We have implemented the techniques in an easy to use GUI software QSEA (Query Structure Enrichment Analysis). Binaries are freely available at http://code.google.com/p/s-e-a/ for Windows and MAC.

BCB ‘12 Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine