Managing untrusted foreign keys

March 17, 2016 by Jefferson Elias

Introduction

Intended audience

This article is intended for application developers and database administrators who plan to develop, deploy, and/or assess solutions for Microsoft SQL Server on a Microsoft Windows platform.

Typographical Conventions

Convention                   Meaning
Stylized Consolas Font       Used for blocks of code, commands and script examples. Text should be interpreted exactly as presented.
Consolas Font                Used for inline code, commands or examples. Text should be interpreted exactly as presented.
<italic font in brackets>    Italic text set in angle brackets denotes a variable requiring substitution for a real value.
Italic font                  Used to denote the title of a book, article, or other publication.
Note                         Additional information or caveats.

Overview

The subject of this article is untrusted foreign keys. The term only makes sense once each of its parts is well understood, so the article starts with a definitions section that explains what a key is, what a foreign key is, and how a foreign key can become “untrusted”. Then, we will cover how to discover and repair untrusted foreign keys in a given database. Finally, we will demonstrate in detail how a foreign key influences the execution of a query in three situations: when it does not exist, when it exists and is “trusted”, and when it exists and is “untrusted”.

Definitions

Before going any further, let’s lay the groundwork and answer the following questions: “What is a key?”, “What is a primary key?” and finally “What is a foreign key?”, the last being the topic of this article.

Then, we can talk about the difference between a trusted and an untrusted foreign key.

What is a key for a relational table?

A key is basically an identifier: something that uniquely identifies a record in a relational table. A key may consist of a single attribute or of multiple attributes in combination. Depending on the design of a table, more than one key may identify a record. We will refer to these as candidate keys.

Example:

Let’s have a look at the following table. It represents a list of students with four columns: a random numeric identifier and the first name, last name and sex of the student.

StudentID    FirstName    LastName    Sex
5345664      Adam         Kent        Male
8795165      Jefferson    Elias       Male

There are two candidate keys here: the StudentID, and the combination of FirstName and LastName columns.

Let’s add a table with the courses that are given in a particular school. The table will contain a unique identifier for the course, a title and a description.

CourseId    Title                 CourseDescription
7897        Networks              An introductory course on network topologies and standards
8975        Numerical Analysis    Basics on numerical analysis.

There are also two candidate keys here: the CourseId column alone and the Title column alone.

What is a primary key for a relational table?

The primary key of a relational table is the candidate key considered the most appropriate among all the acceptable keys for this table. It can either be a normal attribute that is guaranteed to be unique, such as a unique random alphanumeric identifier, or a value generated by the DBMS.

In our example, the most appropriate key to identify a given student record is the StudentID column.

What is a foreign key in a relational table?

A foreign key is basically a reference to another table in a DBMS. It comprises all the columns composing the primary key of that “foreign” or “parent” table. Defining a foreign key also creates a referential constraint, which checks that the data provided as a foreign key references an existing primary key value in the parent table. The DBMS must validate this constraint before accepting any modification of the table. So, we can say that a foreign key ensures data integrity.

To carry on with our example, let’s say we have a table called StudentEnrollments, which keeps track of the courses a student has enrolled in for the current year, along with the final mark.

StudentId    CourseId    FinalMark
5345664      7897        16
8795165      8975        9
5345664      8975        18

Here we have two foreign keys:

  1. The StudentId column refers to the primary key of the Students table presented previously;
  2. The CourseId column refers to the primary key of the Courses table, also presented previously.

As a final question for readers, what could possibly be the primary key we can use for this particular table?

Answer: the combination of both foreign keys, i.e. (StudentId,CourseId).

Note
Usually, a foreign key references the primary key of a different table, as shown in the examples above. A foreign key can also reference the primary key of its own table. Such a design can be used to create a hierarchy within a table. Below is an example: a table called department, in which each department has a unique identifier and can be a member of another department.

DeptId    Name          HeadOfDept_id    ParentDeptId
10        Finance       2                (null)
20        Accounting    23               10

Here the Finance department has no parent department but has a child department, Accounting.
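As a rough illustration, such a self-referencing foreign key could be declared as in the sketch below. The table name dbo.Department, the column types and the constraint names are assumptions made for this example only.

```sql
-- Sketch only: types and constraint names are illustrative assumptions.
CREATE TABLE dbo.Department (
    DeptId        INT          NOT NULL,
    Name          NVARCHAR(64) NOT NULL,
    HeadOfDept_id INT          NULL,
    ParentDeptId  INT          NULL,   -- NULL means "no parent department"
    CONSTRAINT PK_Department PRIMARY KEY (DeptId),
    -- The foreign key points back to the primary key of the same table.
    CONSTRAINT FK_Department_Parent
        FOREIGN KEY (ParentDeptId) REFERENCES dbo.Department (DeptId)
);
```

With this definition, a row for Accounting can only be inserted with ParentDeptId = 10 once the Finance row (DeptId = 10) exists.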

Disabling a foreign key constraint or the way to untrust a foreign key

SQL Server allows you to temporarily disable any CHECK or FOREIGN KEY constraint. A common use case for this feature is copying or importing data into a table faster. The load itself will succeed, but whoever performs it must ensure that the copied data does not violate those constraints before enabling them again.

Here is the syntax to be used to disable then enable a constraint:
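A minimal sketch of that syntax is given below. The table name dbo.StudentEnrollments and the constraint name FK_StudentEnrollments_Courses are placeholders chosen for illustration; substitute your own.

```sql
-- Disable the constraint: rows inserted or updated from now on are not checked.
ALTER TABLE dbo.StudentEnrollments
    NOCHECK CONSTRAINT FK_StudentEnrollments_Courses;

-- Option 1: re-enable AND validate every existing row.
-- If validation succeeds, the constraint is enabled and trusted.
ALTER TABLE dbo.StudentEnrollments
    WITH CHECK CHECK CONSTRAINT FK_StudentEnrollments_Courses;

-- Option 2: re-enable WITHOUT validating existing rows.
-- The constraint is enabled but remains untrusted.
ALTER TABLE dbo.StudentEnrollments
    WITH NOCHECK CHECK CONSTRAINT FK_StudentEnrollments_Courses;
```

Note the doubled keyword in option 1: the first CHECK belongs to WITH CHECK (validate existing data), the second to CHECK CONSTRAINT (enable the constraint).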

Specifying WITH CHECK in the statement tells SQL Server to validate the constraint against every single row in the table and then, if validation succeeds, to enable it.

In contrast, specifying WITH NOCHECK, which is the default when re-enabling an existing constraint, means that the constraint is enabled but existing rows are not validated against it. Although this mode is faster to run, it can have severe side effects on performance: SQL Server doesn’t trust the constraint because it has not validated it. We refer to such a foreign key as an “untrusted foreign key”. As a consequence, the query optimizer won’t use the constraint when building execution plans.

Human error can occur: one can forget to re-enable the constraint.

Proof of this optimizer behavior will be given in the demo section.

Detect untrusted foreign keys and take the appropriate action

There is a simple way to detect whether a database contains one or more untrusted foreign keys: query the sys.foreign_keys view in that database and check the is_not_trusted column. If this column is set to 1, the constraint is untrusted.

Here is a possible version for the query:
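The sketch below is one way to write such a query; it is not necessarily the exact query used for the results in this article. It relies only on documented columns of sys.foreign_keys, and the column aliases are our own choice.

```sql
-- List untrusted foreign keys in the current database.
SELECT
    OBJECT_SCHEMA_NAME(fk.parent_object_id) AS SchemaName,
    OBJECT_NAME(fk.parent_object_id)        AS TableName,
    fk.name                                 AS ForeignKeyName,
    fk.is_disabled,      -- 1 = constraint is currently disabled
    fk.is_not_trusted    -- 1 = existing rows were not validated
FROM sys.foreign_keys AS fk
WHERE fk.is_not_trusted = 1
ORDER BY SchemaName, TableName, ForeignKeyName;
```

An empty result set means that every foreign key in the current database is trusted.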

This method is simple but not practical, because you must run it against every single database on a SQL Server instance. Instead, we developed a stored procedure that returns a dataset with all the untrusted foreign keys for every database, plus the code to solve the issue.

The procedure is called [Administration].[GetUntrustedForeignKeys]. It creates a global temporary table you can then reuse in any other procedure. This table will contain the information described above.

You will find below its interface. You can specify a database name. If you don’t, it will run against every accessible database on the instance. There are also some parameters that influence the behavior of the procedure.

The table that is returned has the following form:

In addition, we defined another stored procedure built on the previous one that effectively runs the code defined in the DDL2Resolve column.

It’s called [Administration].[RunCheckUntrustedForeignKeys].
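To give an idea of what a corrective statement looks like, the sketch below generates one ALTER TABLE command per untrusted (but enabled) foreign key in the current database. This is not the code of the procedures above, only a single-database approximation of the idea; the FixStatement alias is our own.

```sql
-- Build one "re-validate" statement per untrusted, enabled foreign key.
SELECT
    fk.name AS ForeignKeyName,
    N'ALTER TABLE '
        + QUOTENAME(OBJECT_SCHEMA_NAME(fk.parent_object_id)) + N'.'
        + QUOTENAME(OBJECT_NAME(fk.parent_object_id))
        + N' WITH CHECK CHECK CONSTRAINT '
        + QUOTENAME(fk.name) + N';' AS FixStatement
FROM sys.foreign_keys AS fk
WHERE fk.is_not_trusted = 1
  AND fk.is_disabled = 0;
```

Each generated statement fails if existing rows violate the constraint, so the offending data has to be corrected first in that case.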

Foreign key and performance (Demo)

Let’s begin this section by creating a database for testing and some tables with primary keys. The tables will be the ones shown in the examples: Students, StudentEnrollments and Courses.

Database Creation Statements.
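A minimal sketch, assuming a hypothetical database name TestUntrustedFK (any unused name will do). GO is the batch separator understood by SSMS and sqlcmd.

```sql
-- Hypothetical database name, used only for this demo.
CREATE DATABASE TestUntrustedFK;
GO

USE TestUntrustedFK;
GO
```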


Creation of the Students table.
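One possible definition, based on the columns shown in the Definitions section. The data types, lengths and the constraint name PK_Students are assumptions.

```sql
-- Students table; types and lengths are illustrative assumptions.
CREATE TABLE dbo.Students (
    StudentID INT          NOT NULL,
    FirstName NVARCHAR(64) NOT NULL,
    LastName  NVARCHAR(64) NOT NULL,
    Sex       VARCHAR(8)   NOT NULL,
    CONSTRAINT PK_Students PRIMARY KEY CLUSTERED (StudentID)
);
```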


Creation of the Courses table.
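One possible definition, again following the columns shown in the Definitions section. The data types, lengths and the constraint name PK_Courses are assumptions.

```sql
-- Courses table; types and lengths are illustrative assumptions.
CREATE TABLE dbo.Courses (
    CourseId          INT           NOT NULL,
    Title             NVARCHAR(128) NOT NULL,
    CourseDescription NVARCHAR(512) NULL,
    CONSTRAINT PK_Courses PRIMARY KEY CLUSTERED (CourseId)
);
```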


Creation of the StudentEnrollments table.
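One possible definition, using the composite primary key discussed in the Definitions section. The data types and the constraint name PK_StudentEnrollments are assumptions, and no foreign key is declared at this stage.

```sql
-- StudentEnrollments table, deliberately created WITHOUT foreign keys.
CREATE TABLE dbo.StudentEnrollments (
    StudentId INT     NOT NULL,
    CourseId  INT     NOT NULL,
    FinalMark TINYINT NULL,   -- NULL until the course has been graded
    CONSTRAINT PK_StudentEnrollments
        PRIMARY KEY CLUSTERED (StudentId, CourseId)
);
```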


As you may have noticed, we haven’t created the foreign key constraints yet. Let’s first get an overview of the way SQL Server handles a query without these foreign keys.

Below is a query that joins the StudentEnrollments table to the other two tables to return the first name and last name of each student enrolled in a course.
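A query of this kind could be written as in the sketch below. It is an assumption about the shape of the test query: it joins the three tables but returns columns from Students only, which is what allows a foreign key to matter later on.

```sql
-- Report I/O and timing information in the Messages tab.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Names of students enrolled in a course.
-- Courses is joined but none of its columns are returned.
SELECT s.FirstName, s.LastName
FROM dbo.StudentEnrollments AS e
INNER JOIN dbo.Students AS s ON s.StudentID = e.StudentId
INNER JOIN dbo.Courses  AS c ON c.CourseId  = e.CourseId;
```

Without a foreign key, SQL Server has to read Courses to confirm that each enrollment has a matching course before returning the row.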

Here is its actual execution plan.

Here are statistics on I/O and time:

Let’s now create the foreign key references and see if there is a difference.

Create the Foreign Key constraint to the Courses table.
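A possible statement is sketched below; the constraint name FK_StudentEnrollments_Courses is an assumption. WITH CHECK is already the default for a new constraint and is spelled out only for clarity.

```sql
-- Every CourseId in StudentEnrollments must exist in Courses.
ALTER TABLE dbo.StudentEnrollments WITH CHECK
    ADD CONSTRAINT FK_StudentEnrollments_Courses
    FOREIGN KEY (CourseId) REFERENCES dbo.Courses (CourseId);
```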


Create the Foreign Key constraint to the Students table.
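A possible statement is sketched below; the constraint name FK_StudentEnrollments_Students is an assumption.

```sql
-- Every StudentId in StudentEnrollments must exist in Students.
ALTER TABLE dbo.StudentEnrollments WITH CHECK
    ADD CONSTRAINT FK_StudentEnrollments_Students
    FOREIGN KEY (StudentId) REFERENCES dbo.Students (StudentID);
```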

The new foreign keys change the execution plan of our test query, as you will see below. But first, here is the query again:

And the resulting actual plan:

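There is no longer any access to the [dbo].[Courses] table introduced by the inner join, thanks to the added foreign key. The trusted constraint guarantees that every CourseId in StudentEnrollments has a matching row in Courses, so when the query returns no column from Courses, the optimizer can skip that join entirely. In conclusion, foreign keys may reduce I/O, CPU and elapsed time for SELECT queries.

Now, we will disable the foreign keys, insert data into the tables, then re-enable the foreign keys. We will see whether this changes the behavior and check whether the foreign keys become untrusted.

Are those foreign keys untrusted? Let’s check using the query shown in the detection section above.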

The query:

Its result:

Conclusion: No, the foreign keys are still trusted at the moment.

Let’s populate the tables.

Are the foreign keys untrusted now?

No, not yet, as shown by the result of our simple detection query:

So, let’s re-enable foreign keys.
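Re-enabling without validation could look like the sketch below. The constraint names are the hypothetical ones used for illustration, not names confirmed by the original scripts. WITH NOCHECK is already the default when re-enabling a constraint; it is written out here to make the behavior explicit.

```sql
-- Re-enable both foreign keys WITHOUT validating existing rows.
ALTER TABLE dbo.StudentEnrollments
    WITH NOCHECK CHECK CONSTRAINT FK_StudentEnrollments_Courses;

ALTER TABLE dbo.StudentEnrollments
    WITH NOCHECK CHECK CONSTRAINT FK_StudentEnrollments_Students;
```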

Are the foreign keys untrusted now?

Definitely yes, as shown by the result of our simple detection query:

Did it change something in the query plan for our test query? Let’s run it again and check its actual execution plan and statistics.

The actual query plan is the following one:

While the statistics are:

Now let’s run our procedure [Administration].[GetUntrustedForeignKeys].

Here is the T-SQL code to run the procedure for the test database alone:

Here is the result given by the procedure:

The DDL2Resolve value for the first untrusted foreign key is:

Let’s run both commands as the number of untrusted foreign keys is limited.

Did we finally solve the problem? Let’s check that there is no untrusted foreign key anymore.

The simple audit query:

Its result:

The test query:

Its actual plan:

Its execution statistics:

As the demo comes to its end, there is a final action to do: cleanup.

Database Cleanup Statements.
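A minimal cleanup sketch, assuming the test database is named TestUntrustedFK (a hypothetical name; use the name of your own test database).

```sql
-- Leave the test database before dropping it.
USE master;
GO

-- Close any remaining connections, then drop the database.
ALTER DATABASE TestUntrustedFK SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
DROP DATABASE TestUntrustedFK;
GO
```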


About Jefferson Elias

Living in Belgium, I obtained a master's degree in Computer Sciences in 2011 at the University of Liege. I'm one of the rare guys out there who started to work as a DBA immediately after graduation, and I have worked at the university hospital of Liege since 2011. Initially involved in Oracle Database administration (those databases are still under my charge), I had the opportunity to learn and manage SQL Server instances in 2013. Since then, I've learned a lot about SQL Server administration and development. I like the job of DBA because you need to have general knowledge in every field of IT. That's the reason why I won't stop learning (and sharing) the products of my learnings.