STRUCTURAL PATTERN DISCOVERY IN DYNAMIC GRAPHS
MetadataShow full item record
Structural pattern discovery problems, which contain graph isomorphism as a sub-problem, are computationally very hard. The runtime of complete algorithms for discovering structural patterns such as frequent subgraphs and highly compressing subgraphs are exponential with respect to the size of the graph. The inefficiency of these complete algorithms is exacerbated by the increase in size of dynamic graphs. Thus in this dissertation, we study efficient methods of mining such structural patterns in large dynamic labeled graphs. First we look at parallel techniques, where we can accumulate the graph data into a single large graph that we can then partition into parts and process in a parallel manner. We propose SP-Subdue-2 that uses a MPI cluster and a data-parallel approach to find highly compressing subgraphs in a large graph. We evaluate our approach using a large artificial graph and a Nuclear Smuggling dataset. While we outperform previous approaches to the same problem, the method cannot handle data sources which produce an infinite stream of changes. Our objective is to design an algorithm that can continuously process an infinitely changing graph. Hence, we propose StreamFSM, which takes batches of node and edge additions to a graph and continuously finds and reports the frequent subgraphs at the end of each batch. StreamFSM can be easily tuned via its parameters to trade off efficiency and accuracy in its results. StreamFSM outperforms its static counterparts. While StreamFSM finds frequent subgraphs in the entire graph, it requires the user to specify a frequency threshold. This might not always be possible. Hence we propose StreamCompress which discovers the top-K highly compressing subgraphs in a graph stream using the set covering metric. We show that StreamCompress' runtime scales similar to StreamFSM and that it is accurate in discovering known highly compressing subgraphs. Both StreamFSM and StreamCompress, however, require the entire graph to be stored in memory. We address this issue by using a time based fixed size sliding window on the graph stream. We see a significant improvement in the performance of StreamFSM when using this fixed size window.