Mining Interesting Subnetworks from Graphs with Node Attributes
Abstract
Complex data in many scientific domains, such as social networks, computational biology, and the Internet of Things (IoT), is represented using graphs. With the global expansion of the internet, social networks have grown explosively, with billions of users on Facebook. Similarly, research in bioinformatics has generated massive amounts of genomic data, such as protein-protein interaction networks, from several high-throughput techniques. Because of the large amount of data involved, researchers have turned to data mining techniques to discover meaningful and relevant information in large graphs. One of the most intriguing problems in graphs representing complex data is finding communities, or clusters. The members of a cluster have a high density of edges to other members within the cluster and very few edges to nodes outside it. Real-world graphs often carry additional attribute data characterizing either the nodes or the edges of the graph, such as the age or interests of a person in a social network. Recent research has combined the problem of community detection with subspace similarity over attribute data. For example, in the context of social networks, we might be interested in finding groups of friends who are of similar age and share common interests. The use of attribute data in finding clusters has been shown to be effective in many application areas, such as targeted advertising in social networks and the detection of protein complexes in protein-protein interaction networks, which might be indicative of diseases such as cancer. In this dissertation, we propose multiple algorithms for mining communities with similarity in attributes from node-attributed graphs. Experiments on real-world datasets show that the proposed algorithms are effective in mining meaningful clusters.
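The two criteria described in the abstract, dense connectivity inside a group and similarity on a subset (subspace) of attributes, can be illustrated with a small Python sketch. This is not one of the dissertation's algorithms; it only scores a candidate group that is already given. The toy graph, the function names (`internal_density`, `boundary_edges`, `cohesive_attributes`), and the `max_spread` threshold are all hypothetical and chosen purely for illustration.

```python
from itertools import combinations

# Hypothetical toy friendship graph: undirected edges and numeric node attributes.
edges = {("ann", "bob"), ("ann", "cat"), ("bob", "cat"), ("cat", "dan"), ("dan", "eve")}
attrs = {
    "ann": {"age": 24, "income": 30},
    "bob": {"age": 26, "income": 90},
    "cat": {"age": 25, "income": 55},
    "dan": {"age": 41, "income": 60},
    "eve": {"age": 58, "income": 20},
}

def internal_density(edges, group):
    """Fraction of all possible member pairs in `group` that are connected."""
    members = sorted(group)
    possible = len(members) * (len(members) - 1) // 2
    if possible == 0:
        return 0.0
    present = sum(
        1 for u, v in combinations(members, 2)
        if (u, v) in edges or (v, u) in edges
    )
    return present / possible

def boundary_edges(edges, group):
    """Number of edges with exactly one endpoint inside `group`."""
    group = set(group)
    return sum(1 for u, v in edges if (u in group) != (v in group))

def cohesive_attributes(attrs, group, max_spread):
    """Attributes whose value range within `group` is at most `max_spread`.

    The returned names form the subspace on which the group is similar
    (an assumed, deliberately simple notion of similarity).
    """
    members = sorted(group)
    result = []
    for name in sorted(attrs[members[0]]):
        values = [attrs[m][name] for m in members]
        if max(values) - min(values) <= max_spread:
            result.append(name)
    return result

group = {"ann", "bob", "cat"}
print(internal_density(edges, group))                 # density of the candidate group
print(boundary_edges(edges, group))                   # edges leaving the group
print(cohesive_attributes(attrs, group, max_spread=5))  # attributes the members agree on
```

In this toy example the group is fully connected internally, has a single edge to the rest of the graph, and is similar in age but not in income, which is the kind of pattern a subspace-aware community search would aim to surface.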