Querying graph databases
Real-life data can often be modeled as graphs, in which nodes represent objects and edges represent the relationships between them. Large graph datasets are common in many emerging applications. To fully exploit the wealth of information encoded in graphs, systems for managing and analyzing graph data are critical.
To address the need for complex analysis of graph data, this thesis presents a graph querying toolkit called Periscope/GQ. The toolkit is built on top of a commodity RDBMS. It provides a uniform schema for storing graphs and supports a variety of graph query operations, which users can combine to perform complex analyses on graphs.
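To illustrate what storing and querying a graph in a relational system can look like, the following is a minimal sketch using Python's built-in sqlite3 module. The table names (nodes, edges), column names, sample data, and helper functions are assumptions made for illustration only; they are not the actual Periscope/GQ schema or API.

```python
import sqlite3


def build_graph_db():
    """Create an in-memory database holding a tiny example graph.

    The two-table layout (one row per node, one row per edge) is a
    hypothetical schema, not the one used by Periscope/GQ.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE nodes (id INTEGER PRIMARY KEY, label TEXT);
        CREATE TABLE edges (
            src   INTEGER REFERENCES nodes(id),
            dst   INTEGER REFERENCES nodes(id),
            label TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO nodes VALUES (?, ?)",
        [(1, "protein"), (2, "protein"), (3, "gene")],
    )
    conn.executemany(
        "INSERT INTO edges VALUES (?, ?, ?)",
        [(1, 2, "interacts"), (2, 3, "regulates")],
    )
    return conn


def select_nodes(conn, label):
    """Node selection: ids of all nodes carrying the given label."""
    rows = conn.execute(
        "SELECT id FROM nodes WHERE label = ? ORDER BY id", (label,)
    ).fetchall()
    return [r[0] for r in rows]


def two_hop_targets(conn, start):
    """Simple path search: nodes reachable from `start` in exactly two hops,
    expressed as a self-join on the edge table."""
    rows = conn.execute(
        """
        SELECT DISTINCT e2.dst
        FROM edges AS e1
        JOIN edges AS e2 ON e1.dst = e2.src
        WHERE e1.src = ?
        ORDER BY e2.dst
        """,
        (start,),
    ).fetchall()
    return [r[0] for r in rows]
```

Because both operations are ordinary SQL over the same tables, they can be composed with each other, which is the kind of combination a uniform relational schema makes possible.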
The key feature of Periscope/GQ is its support for sophisticated graph query operations in addition to simple ones such as node/edge selection and path search. In particular, this thesis focuses on two classes of sophisticated queries: graph matching and graph summarization.
The database community has largely focused on exact graph matching problems. However, because real graph datasets are often noisy and incomplete, approximate rather than exact graph matching is required. This thesis presents a novel approximate graph matching technique called SAGA. SAGA employs a flexible graph similarity model and uses an index-based matching algorithm to evaluate matching queries efficiently.
SAGA is effective and efficient for small query graphs (with tens of nodes and edges), but it becomes expensive when applied to large query graphs (with hundreds to thousands of nodes and edges). To handle large query graphs, this thesis proposes TALE. TALE employs a novel indexing technique that achieves high pruning power and scales linearly with the database size. Its matching algorithm uses the index to first match the important nodes in the query, and then extends these matches to produce large graph matches.
Graph summarization techniques are useful for understanding the underlying characteristics of graphs. To summarize large graphs, this thesis introduces an aggregation method that produces summary graphs by grouping nodes based on user-selected node attributes and relationships. It further allows users to control the resolution of the summaries, and provides “drill-down” and “roll-up” operations for navigating among summaries of different resolutions.
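The basic idea of grouping nodes by selected attributes can be sketched as follows. This is a simplified illustration, not the aggregation method developed in the thesis: it groups on attribute values only, ignores the relationship-based refinement, and the function name summarize, the data layout, and the example attributes are all assumptions.

```python
from collections import Counter


def summarize(nodes, edges, attrs):
    """Build an attribute-based summary of a graph.

    nodes: dict mapping node id -> dict of attribute values
    edges: iterable of (source id, target id) pairs
    attrs: list of attribute names to group on (hypothetical interface)

    Returns (group_sizes, group_edges), where each group is identified by
    the tuple of its attribute values and group_edges counts the original
    edges running between each ordered pair of groups.
    """
    # Assign every node to the group defined by its selected attribute values.
    group_of = {n: tuple(a[k] for k in attrs) for n, a in nodes.items()}
    group_sizes = Counter(group_of.values())
    # Collapse each original edge onto the pair of groups it connects.
    group_edges = Counter((group_of[u], group_of[v]) for u, v in edges)
    return dict(group_sizes), dict(group_edges)


# Small made-up example graph.
example_nodes = {
    "a": {"dept": "CS", "role": "student"},
    "b": {"dept": "CS", "role": "faculty"},
    "c": {"dept": "EE", "role": "student"},
}
example_edges = [("a", "b"), ("a", "c"), ("b", "c")]
```

In this sketch, calling summarize with ["dept"] gives a coarser summary than calling it with ["dept", "role"]. Varying the attribute list only loosely mimics moving between resolutions and should not be read as the thesis's drill-down and roll-up mechanism.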