Gerald Britton

Recursive CTEs and Foreign Key References in SQL Server

May 16, 2018 by

Introduction

Foreign key constraints are a powerful mechanism for preserving referential integrity in a database. They can also represent a challenge when doing bulk table loads, since you need to find a “base” table to start with – that is, a table that has no foreign key constraints defined. Let’s label tables like this as level 0, or ground level if you like. Once that is loaded, you can begin to load other tables that have foreign key references to the base table. We can label those tables level 1, and so on. If you start with table data that already has referentially integrity and load tables by their level numbers — level 0, level 1, level 2 and so on – the load should proceed without problems. Let’s look at a simple example:

This set of three tables are at levels 0, 1, and 2, respectively, since “base” has no FK references, “facts” refers to “base” and “morefacts” has an FK referring to “facts”. Now, imagine that you have new data for all three tables, in the form of INSERT statements:

Now, you know that you can’t insert them that way, or you’ll get an error message like this one:

The INSERT statement conflicted with the FOREIGN KEY constraint “FK__morefacts__facts”

You need to do these in reverse order to preserve referential integrity. This is easy with this little example since we are in total control. Now, imagine that you were asked to load up a database with lots of foreign key relationships, but you didn’t know the levels of any of the tables. How would you proceed? There are a few different ways to tackle the problem and in this article I’m going to leverage the power of recursive Common Table Expressions, or CTEs, to do it.

The System Catalog View sys.foreign_keys

SQL Server now provides almost 300 system catalog views that are useful for all sorts of metadata operations. If I include the dynamic management views the total is almost 500!. sys.foreign_keys, as the name implies, shows you the foreign key relationships for a database. I can combine this view with a recursive CTE to dig out the foreign key relationships in the database. A full discussion of recursive CTEs is outside the scope of this article. See the references section for more detail on how they work. For now, just keep in mind that a recursive CTE has two parts, just like a mathematical recurrence:

  1. A base case
  2. A recursive case that builds on the base case

For our example, a query to get the base case would look like this:

Run this on any database you have access to and observe the results. On my test database I see:

The recursive case builds on this by finding tables referenced by the base case:

This query is almost the same as the one above except that it joins with the base case, matching the parent object id, which is the table containing the FK reference. To the base case referenced object id. In this way we can get the tables referring to the base case tables and continue until there are no more, since this is recursive!

Putting the two queries – the base case and the recursive case – together in a recursive CTE yields this query:

There are two commented lines that I’ll come back to in a moment. Running this on my test database I get:

Here, the column “Depth” represents the level a table is with respect to one referring to it. In this case, the table “morefacts” refers to the table “facts”. So “morefacts” is at ground level (Depth=0) and “facts” is in the first basement (Depth = -1). With such a report I know I need to load the deepest levels first, then those above them and so on until I reach ground level.

Now, let’s look at the two commented lines. The first one, if uncommented, lets me just look at the references from a specific table:

Running that produces a smaller report:

No surprise there. The second commented line is trickier. On my SQL Server instance I also have the sample database WideWorldImporters installed. Let’s try the query with both lines commented on that database. I get an error:

Msg 530, Level 16, State 1, Line 1

The statement terminated. The maximum recursion 100 has been exhausted before statement completion.

The problem is that this database contains a table that is self-referential:

This comes from the fact that the People table contains a hierarchy. Hierarchies can be used to show people in a reporting structure, where an employee points to their manager in the same table. Here it actually looks like a mistake! The foreign key says, “Be sure that a person in the table really is also in the table”. Of course that is always true! For our discussion though, it means that we are chasing our own tail during the recursive part of the query. If I uncomment the second comment:

The problem will disappear:

I just pasted part of the output. It is actually quite a bit longer, which you can verify for yourself.

Bottom up?

The query we’ve been using takes a top-down approach. The problem we had with the People table suggests another approach. Can we find tables that refer to it? We can! We’ll use a bottom up approach. Actually the query changes very little:

I’ve highlighted the changes. Basically, we start with tables that refer to Application. People and work up from there. This query yields the desired result for the WideWorldImporters database, though they are too big to post here (325 lines). The Level goes all the way up to 10, indicating a little of the complexity of the data model used here.

Summary

This brief excursion into recursive CTEs applied to system views shows how easy it can be to tease out the relationships between objects in a SQL Server database. If you’re new to common table expressions, especially the recursive variant, this gives you a simple example to understand how they work. Don’t use them for everything, however! Sometimes developers are tempted to use recursive CTEs in place of cursors or while loops, thinking that there will be some performance advantage. Usually, those hopes are dashed! Internally, recursive CTEs are processed “Row By Agonizing Row”, or RBAR, a term created by Jeff Moden, a veritable super-DBA in the Microsoft SQL Server space.

If you’re new to system catalog views, let this serve as the briefest of introductions to a large topic!

References


Gerald Britton

Gerald Britton

Gerald Britton is a Senior SQL Server Solution Designer, Author, Software Developer, Teacher and a Microsoft Data Platform MVP. He has many years of experience in the IT industry in various roles.

Gerald specializes in solving SQL Server query performance problems especially as they relate to Business Intelligence solutions. He is also a co-author of the eBook "Getting Started With Python" and an avid Python developer, Teacher, and Pluralsight author.

You can find him on LinkedIn, on Twitter at twitter.com/GeraldBritton or @GeraldBritton, and on Pluralsight

View all posts by Gerald Britton
Gerald Britton
65 Views