Haroon Ashraf
Fact and Dimensions in a Data Warehouse

SQL Unit Testing Data Warehouse Extracts with tSQLt

January 7, 2020 by

This article talks about basic concepts of SQL unit testing from a Data Warehouse point of view using tSQLt, a highly productive and acclaimed SQL unit testing framework.

This article is both for beginners to learn the basics and experienced Data Warehouse professionals to review their Data Warehouse unit testing approach in favor of the tSQLt framework.

Additionally, the readers of this article are going to learn the basic tips of applying tSQLt in the SQL unit testing of Data Warehouse Extracts alongside a quick overview of Data Warehouse concepts.

About Data Warehouse

It is very important to first get familiar with Data Warehouse concepts if you are not already familiar.

Bear in mind that Data Warehouse is a massive subject that cannot be covered in one or two articles. However, it is not difficult to get an overview of Data Warehouse keeping in mind the scope of this article.

What is Data Warehouse?

Data Warehouse is a centralized repository or storage of data that is highly optimized for reporting and analysis purposes.

What does Data Warehouse do?

Data Warehouse helps businesses to understand their weaknesses and strengths by providing deep insights into their data by using special storage, architecture and processes.

What else Data Warehouse does?

Data Warehouse also helps in making quick decisions based on the facts provided by it, thereby making business more productive and less susceptible.

Types of Data Warehouse

A traditional Data Warehouse can be built in the form of cubes or data models that can be accessed for reporting and real-time analysis by the business users.

Dimension and facts

A dimension is anything that is of interest to the business such as Product, Customer or Supplier, whereas FACT, as the name indicates, contains facts and figures alongside keys to link with dimensions.

The architecture of Data warehouse

A typical Data Warehouse either follows a star-like schema where a FACT is surrounded by dimensions (like a star) or a snowflake-like schema where a dimension can be directly linked with Fact or can also be linked with another dimension. However, their further details are beyond the scope of this article:

Fact and Dimensions in a Data Warehouse

Data Warehouse Staging Extracts

Let us now talk about Data Warehouse Extracts, particularly staging extracts.

What are Data Warehouse Staging Extracts?

Data Warehouse staging extract is basically a process that copies data from source to Data Warehouse Staging Area where it becomes available for further processing into dimensions and fact(s).

What is Staging Area?

Staging from a Data Warehouse perspective is a one-to-one mapping of data from source to destination where data from multiple sources is stored to be processed further.

Example of Staging

For example, if you have a table named Customer, which consists of only two columns CustomerId and Name, then this table must also be present in staging with the same columns to be populated from the source or multiple sources.

Benefits of Staging Area

The main benefit of the Staging Area is the independence from the source once the data is extracted, which means the Data Warehouse further processing does not need to refer to the original source as long as its staging extract has been captured.

Importance of Staging Extract

The most important part of the initial phase of Data Warehouse processing is the staging extract because this is the entrance to the Data Warehouse database, which then serves the Data Warehouse business intelligence needs.

Data Warehouse Staging Extract

SQL Unit Testing Staging Extract

As mentioned earlier, staging extract(s) is the most important starting point as far as Data Warehouse workflows are concerned, so we should be then somehow SQL unit testing these extracts.

However, it is also crucial to identify what needs to be unit tested and what can be excluded from SQL Unit Testing.

We are first going to clarify the following two things in the context of SQL Unit Testing Data Warehouse Extracts:

  1. What is that we are going to write our SQL Unit Tests against?
  2. What SQL Unit Testing tool or framework is going to be used?

Staging Extract and Source Script

The process of running staging extract (copying data from source to destination) typically depends on the source script, which runs against the source database to extract data, which is then copied over to the staging area.

A traditional source script selects some or all the columns of a table from source (database).

So, the source script is actually the potential object against which we are going to write and run our SQL Unit Tests.

TSQLt – SQL unit testing framework

The most suitable testing framework for SQL Unit Testing Data Warehouse Staging Extract is tSQLt, which is, by default, purpose-built and feature-rich.

Please read the following articles to get started with tSQLt if you are not already familiar with it:

Replicating SQL Unit Testing scenario

As already mentioned in this article that a Data Warehouse extract runs between source and Data Warehouse database, then it is easy to guess that at least two sample databases are required to implement the walkthrough to get a better understanding.

Setup source sample database

Let us first setup source sample database called SQLDevArticlesV4 by running the following T-SQL script against the master database:

Setup Data Warehouse sample database

Once the source database sample is set up the next step is to create the Data Warehouse database SQLDevArticlesDW sample with staging area (schema) and tables similar to the ones in source by running the following T-SQL script:

Quick check

Have a quick look at both source and Data Warehouse sample (database) that you have just created:

Source and Data Warehouse Sample Databases

What is next

We have both samples ready to be used, so could you please guess what next step is?

Well, the answer is to think about the potential object, which helps us to get the source data to be copied over to Data Warehouse.

Those who are already familiar with Data Warehouse practices are well aware that we actually need here Integration Services Packages also called SSIS Packages to perform this Data Warehouse act, but we are more focused to author (build) an object which sources the required data which can then be initiated or activated via SSIS Packages and for which SQL Unit Testing can be applied.

Analyze source script

Analyze the source script for author extract, which is simply selecting all the columns from author tables from the source database as follows:

The output is as follows:

Source Script Output

Source script mapping options

Please remember you can map this simple source script to any one of the following objects in the database:

  • SQL view
  • Stored procedure

Creating GetAuthorExtract object from source script

We are going to create a stored procedure called GetAuthorExtract (database) in the source database as follows:

Test run the procedure

Go for a quick test run of the stored procedure as follows:

The results should match the output shown below:

Test Run the Procedure

So now, you are ready to write a tSQLt Unit Test against this procedure because this is actually the object of interest that takes data from the source and loads it into the destination and we would like to make sure that it must pass SQL unit test to prove it serves the purpose.

Setup tSQLt framework

Run tSQLt setup (by running tSQLt.class.sql script you get when you download it) to install the framework in the same source database (SQLDevArticlesV4) where GetAuthorExtract stored procedure was created.

Please read the article Conventional SQL Unit Testing with tSQLt in Simple Words to install tSQLt if you are not familiar with tSQLt installation.

However, I strongly recommend to please use a test-driven database development approach in your day to day professional SQL unit testing work.

Quick check

In Object Explorer, navigate to SQLDevArticlesV4 | Tables node to view the tSQLt installed objects:

tSQLt Installed

Create unit test class

You have to create a new class (schema) for SQL unit testing with tSQLt as follows:

Create a unit test to check GetAuthorExtract works

In order to make sure that the procedure which extracts data from source works, we have to create a SQL Unit Test, which helps us to determine the reliability of the procedure prior to running it to copy the data from source to Data Warehouse.

Create a SQL Unit Test to check GetAuthorExtract object as follows:

Run SQL Unit Test to check GetAuthorExtract

Finally, run the SQL unit test by the following command:

If you have followed all the steps of the walkthrough then we are expecting this SQL Unit Test to pass as shown below:

SQL Unit Test Passed

Congratulations! You have just learned how to write and run SQL Unit Test against Data Warehouse staging extract and this is also applicable to any other similar scenario where data needs to be extracted from source to destination.

Haroon Ashraf
SQL unit testing, Testing

About Haroon Ashraf

Haroon’s deep interest in logic and reasoning at an early age of his academic career paved his path to become a data professional. He holds BSc and MSc Degrees in Computer Science and also received the OPF merit award. He began his professional life as a computer programmer more than a decade ago, working on his first data venture to migrate and rewrite a public sector database driven examination system from IBM AS400 (DB2) to SQL Server 2000 using VB 6.0 and Classic ASP along with developing reports and archiving many years of data. His work and interest revolves around Database-Centric Architectures and his expertise include database and reports design, development, testing, implementation and migration along with Database Life Cycle Management (DLM). He has also received passing grade to earn DevOps for Databases verified certificate, an area in which he finds particular interest and potential. View all posts by Haroon Ashraf

941 Views