Esat Erkec

How to implement a graph database in SQL Server 2017

December 19, 2017 by

Introduction

Graph database

A graph database is a type of database whose concept is based on nodes and edges.

Graph databases are based on graph theory (a graph is a diagram of points and lines connected to the points). Nodes represent data or entity and edges represent connections between nodes. Edges own properties that can be related to nodes. This capability allows us to show more complex and deep interactions between our data. Now, to explain this interaction we will show it in a simple diagram

The diagram above shows the basic model of the graph database concept.

The nodes are Andera, Bob and Camila and Follows (edges) provide connections between nodes. This database model cannot be treated as an alternative to a relational database model but faced with some specific problems the graph database model can be alternative and effective.

If you look at the diagram closely, maybe you can design this data model in a relational database by joins but imagine that if you have a lot of nodes and edges then how many joins will you need? And, another consideration could be how this design would perform? For this reason, when handling some business problems we need a graph database.

In the context of social media, for example, there are a lot of social actions like connect, follow etc. and each social action creates a mark. When we combine these marks, it looks like a spider’s web. The graph database model is ideally suited to store this type of data.

SQL Server 2017 and graph database

Microsoft announced graph database in SQL Server 2017. This feature allows us to create graph data models. SQL Server 2017 and graph database architecture contains two types of tables. They include the node table and edge table.

We can demonstrate it with a diagram.

NODE TABLE: Node table defines entity in a graph model.

$NODE_ID: It is an important column in a node table. When a node table is created, this calculated field is automatically generated by the SQL engine. This field describes a given node uniquely. After we create the objects, we will look at the objects through the object explorer in Management Studio. You will see a new folder named as Graph Tables. This folder contains all graph tables.

SQL Server adds a GUID to the end of $NODE_ID column’s name but we can also use this column without GUID extension (pseudo-column). If we do not create a unique constraint or index on $NODE_ID column, the SQL engine automatically creates unique, non-clustered indexes when the node table is created. This guarantees the uniqueness of $NODE_ID column.

EDGE TABLE: An edge table defines connection between node table entities

When we create an edge table, the SQL engine creates three implicit columns.

$EDGE_ID: It defines unique edge in edge table. For this reason, the SQL engine automatically creates a unique non clustered index

$FROM_ID: It defines the starting point for the entity of edge.

$TO_ID: It defines the end point for the entity of edge.

Now we will define an edge connection to the edge table.

The insert statement for the step when Bob follows Andera is

The insert statement for the step when Camila follows Andera is

The insert statement for the step when Camila follows Bob is

The following query explains Camila’s connections

In this query we saw some new T-SQL syntax “MATCH”, “-“, “->”.

“-“sign represents $FROM_ID and “->” sign represents $TO_ID field on edge table.

Have a look at the execution plan of this query

The screenshot is from ApexSQL Plan, a tool to view and analyze SQL Server query execution plans

There is a table scan in the execution plan of this query because there isn’t any index in the $FROM_ID and $TO_ID columns in FlowInfo edge table. We will create a unique non clustered index in these columns and we will look at the execution plan again

After adding an index and then avoiding table scans from the key lookup and nested loop we can see its performance. In case of having a heavy read request on a graph model, we have to add an index.

Now, we will create a little bit more complex example using the graph database model. Imagine that we have an online book application and customers. Customers can read books online and can connect with other customers. And while using the application we want to show a pop-up to our customers that if your connections like this author or authors

After this, we will create our graph database model objects

In the next step, edge tables will be created and connections between nodes are defined

This query shows the book with its author’s name in the adjacent column to it

This query gives an answer to our question mentioned above

Graph database model or relational database model

The major difference between these two database models is how they define relations within your data. In relational database model we can create relationships with primary or foreign keys but, on the other hand, in a graph database model we define connections (edges) and we can add properties to these connections (edges). Let us differentiate it with an example.

Imagine that our customers rank books. We want to store this rank point of books. If we create this model in relational model it will look like:

Now, as in case of graph model

Now we will create node and edge tables. The key point here is that we will create our edge table with a rank property.

The node table is defined as follows:

The edge table with rank property is defined as follows

Firstly, we will insert data in the edge table with connections and rank amount:

“Brian” gives “Lord of the Flies” “5” points.

After this, we will generate some dummy data

These queries will show us which customer likes which book and which book is highly ranked

Another major difference is that the graph database model happens to give better performance in heavy connections. Like, in some business problems, the application needs a complex hierarchy. A graph database would be a compelling option in that case because graph database offers better performance and simple data modeling.

There is the possibility of finding other differences as well but generally these two topics are discussed.

Conclusions

In this article, we discussed graph database and SQL Server 2017 graph database features. SQL Server graph database is a fantastic feature. We can implement both graph database and relational database models in the same database engine. This hybrid architecture allows us to use SQL Server engine capabilities with a graph database. T-SQL syntax support graph database queries. Graph database does some limitations notwithstanding these limitations there are many exceptional features in SQL Server 2017, that make it a compelling technology to consider.


Esat Erkec
Latest posts by Esat Erkec (see all)
168 Views