Scalable Algorithms for Mining Maximal Quasi Frequent Subnetworks
Abstract
Frequent graph mining has received considerable attention from researchers. Existing algorithms for frequent subgraph mining do not scale for large networks, and take hours to finish. Mining multiple gene coexpressions networks allows for identifying context-specific modules. Frequent subnetworks represent essential biological modules. In this thesis, we propose two algorithms for mining frequent subgraphs. In the first algorithm, we propose a parallel algorithm for mining maximal frequent subgraphs from gene coexpression networks. Despite the algorithm’s parallelization, it takes much time and it does not allow relaxation. This inspired us to develop a second algorithm that solves those problems. In the second algorithm, we propose a greedy approach for mining approximate frequent subgraphs. Experiments on real tissue-specific RNA-seq expression networks and synthetic data demonstrate the effectiveness of the proposed algorithms. Moreover, biological enrichment analysis shows that the reported patterns are biologically relevant and enriched with known biological processes and KEGG pathways.