Supervised Learning in Dynamic Streaming Graphs
With the emergence of networked data, graph classification has received considerable interest during the past years. Most approaches to graph classification focus on designing effective kernels to compute similarities for static graphs. However, they become computationally intractable in terms of time and space when a graph is presented in an incremental fashion with continuous updates, i.e., insertions of nodes and edges. In this thesis, we examine the problem of classification in large-scale and incrementally changing graphs. We focus on two classification scenarios: the graph transaction scenario, and the one large graph scenario. We propose a unified framework to study this problem. The framework consists of three components: a subgraph extraction process, an online version of an existing fast graph kernel, and two kernel-based classifiers (SVM and Perceptron). To classify nodes in a single large graph, we design an entropy-based subgraph extraction strategy to select informative neighbor nodes and discard those with less discriminative power, to facilitate an effective classification process. By retaining the support vectors from each learning step, the classification models built using the kernel-based classifiers can be incrementally updated whenever new changes are made to the subject graph. We demonstrate the advantages of our learning techniques by conducting an empirical evaluation on several large-scale real-world graph datasets. On the other hand, detecting concept drift in data streams has been widely studied in the data mining community. Conventional drift detection methods use classifiers' outputs (e.g., classification accuracy, error rate) as indicators to signal concept changes. As a result, their performance greatly depends on the chosen classifiers. However, there is little work on addressing concept drift in graph-structured data. We present an entropy-based method to effectively detect concept drift in graph streams. Contrary to many related works, we investigate the intrinsic properties of data (i.e., subgraph distribution w.r.t. category labels), instead of monitoring classification outputs. Our approach is combined with the proposed incremental graph learning techniques and tested on synthetic and real-world graph data streams. The experimental results demonstrate the advantage of our method in detecting concept drift as well as improving classification performance.
Showing items related by title, author, creator and subject.
Choudhury, Sutanay (2014)Subgraph search is the problem of searching a data graph for the occurrences of another graph, typically referred to as the query or pattern graph. This thesis is dedicated to studying a specific class of subgraph search, ...
Wu, Changjun (2011)Developing high performance computing solutions for modern day biological problems present a unique set of challenges. The field is experiencing a data revolution due to a rapid introduction of several disruptive experimental ...
Ray, Abhik (2015)Structural pattern discovery problems, which contain graph isomorphism as a sub-problem, are computationally very hard. The runtime of complete algorithms for discovering structural patterns such as frequent subgraphs and ...