Making sense of graph databases part 5 - Types of graph databases
Main article can be found here.
When doing my research on how graph databases work, I eventually discovered that there are at least two kinds of graph databases that differ on how they store and process graph data. I was surprised because I thought that there could not be many optimal ways of creating a graph database, because I was comparing them to relational database theory which has been standardized for quite some time. This simply means that graph databases are still a little bleeding edge. In this post I will present the different types of graph databases and say a little about how they work.
The two types I found were:
- Triple store based on RDF
- Native graph based on property graph model
- These databases were developed with aim to support the SPARQL query language from the Semantic Web movement.
- This allows of using features like ontologies and reasoning using OWL.
- Often, they store all data in a single relational table with columns subject, predicate, object corresponding to RDF triples. Sometimes, a graph column is also added to relate triples to a named graph.
- Virtuoso falls under this category.
- Triple stores often generates huge tables. To traverse a the graph the database has to do self-joins, which can kill performance.
- Since triple stores use joins to connect graph nodes, they have same disadvantages as relational databases on graph data. Therefore clever tuning and indexing is required, but the performance is still not comparable with native graph databases (I have not found a performance comparison, though).
- One approach (which is used by Virtuoso) is to use B+ trees together with bitmap based indexing on different combinations of the 4 columns - this can generate upto 16 indexes for the single table. To avoid that the indexes take more space than the actual data, duplicate rows are often reused, together with compression of index files. Compression works is extremely well on bitmap index! 1
- Called native graphs because they store data on disc in a graph oriented format that benefits graph query performance, instead of “simulating” a graph on a relational database.
- They often have proprietary query language, instead of a standard.
- Examples: Neo4j, Dex, HypergraphDB
- A recent (2015) survey of such databases can be found here