Haroon Ashraf
Typical Data Warehouse Business Intelligence Solution Architecture

The Concept of Test-Driven Data Warehouse Development (TDWD) with tSQLt

May 7, 2020 by

This is a conceptual article consisting of two parts with enough supported material for any data professional or enthusiast with databases or data warehouse development background willing to go a step ahead by using an industry-recognized SQL unit testing framework called tSQLt.

This article also highlights the importance of a not so well known innovative development strategy called Proof of Concept used by specialists (including programmers) to test a new idea or concept by developing the model rather than the actual solution.

Additionally, the readers of this article are going to get familiar with the method of mapping existing database development practices to a basic data warehouse business intelligence solution using tSQLt.

The Concept (Proof of Concept)

Let us begin our journey with the concept or the proof of concept.

What is proof of concept

A proof of concept is like a pilot project, which is all about developing (testing) an idea or concept to check its feasibility or potential to use it as a base to begin actual work if all goes well.

Example 1

One of the most interesting examples of proof of concept is a concept car that is specifically designed to test new technology or design, which may or may not become part of commercial manufacturing to be available for the general public.

Example 2

In the software world, proof of concept is whether the new idea of developing your software is workable or not.

Example 3

In the database or data warehouse world, we can say the proof of concept is to check if the newly proposed way of developing a database or data warehouse is feasible or not.

Example 4

The proof of concept can also be whether the new database development strategy works or test the new way of using existing database development and testing tools (such as tSQLt) to see if this is more productive for the development and testing team and business or not.

Proof of Concept strategies

There are a number of ways called strategies that can be used as a basis to develop or test the proof of concept. Let us have a look at a few of them.

Reverse Engineering Strategy

This strategy is based on reverse engineering of an existing process to improve it further.

Mapping Strategy

This is the strategy used in most common scenarios where the developers map the existing process to a newly proposed process in proof of concept.

Mapping Strategy Example 1

One of the examples of mapping strategies, in general, is, think of some database development scenarios when we map requirements to the stored procedures such that a business requirement received by a development team is generally met by writing a stored procedure to be tested by SQL unit testing framework tSQLt which then satisfies the business requirement when it runs successfully.

Mapping Strategy Example 2

Another good example of mapping strategy is in the context of Data Warehouse business intelligence solutions when business requirements are mapped to data models, which are then exposed to internal and external business users to be used for analysis and reporting.

Point of Interest

An interesting point to note is that we are using a highly productive and commercially in use SQL unit testing framework tSQLt to work on our conceptual model, thereby leaving enough room for implementation, as mentioned in the introductory passage of this article.

Test-Driven Data Warehouse Development (TDWD)

Let us now focus on the test-driven data warehouse development with tSQLt proof of concept by first designing a toolkit also serving as pre-requisites.

Test-driven data warehouse development Tool Kit

The following are required to work on test-driven data warehouse development using tSQLt proof of concept:

  1. SQL Database and Data Warehouse concepts and understanding

  2. Strong T-SQL skills
  3. SQL unit testing using tSQLt skills

  4. Test-driven database development concepts and implementation

Mission Statement

Evaluating Test-driven data warehouse development with tSQLt proof of concept is based on Test-driven database development methodology.

Scope

It is very important to define the scope of the work, and in our case, the scope of the proof of concept is limited to the following essential processes of a traditional data warehouse:

  1. Staging extracts
  2. Transform-Loads

They can also be represented by ETL workflows where E stands for extract(s), and TL stands for Transform-Load(s).

If you would like to know more staging extracts, please refer to SQL Unit Testing Data Warehouse Extracts with tSQLt.

In simple words, a typical data warehouse environment from the data point of view mainly consists of two phases:

  1. Staging environment
  2. BI (business intelligence) environment

In the first phase, data is copied from source to staging environment and then from the staging environment to the BI environment.

Typical Data Warehouse Business Intelligence Solution Architecture

A polite reminder for the beginners to please at this point not to confuse ETL (Extract Transform Load) with ELT (Extract Load Transform) since they are both data movement strategies, and both can work equally with data warehouse solution.

Mapping test-driven database development

We are going to use test-driven database development as a reference to work on test-driven data warehouse development with tSQLt proof of concept.

Set up a sandbox (environment)

The first thing in order to start working on the proof of concept is to set up your sandbox environment, which is going to serve as your mini-research lab.

Set up sample source database

We need to set up a sample database called SQLDevArticlesV5 which is going to serve as a source by running the following T-SQL script against any dev environment:

Please run the following script to check your source sample database:

Authors tabular data

Set up a sample data warehouse

Create a sample database warehouse database called SQLArticlesV5DW as follows:

Add tSQLt to the source database and data warehouse

The next thing is to download and install the tSQLt (SQL unit testing framework) for both source and data warehouse databases from the tsqlt.org.

Please follow the instructions in the article Conventional SQL Unit Testing with tSQLt in Simple Words to install tSQLt if you have not installed it before.

The following output upon running the tSQLt.class.sql script indicates that tSQLt has been installed successfully onto your desired database(s):

tSQLt installed

The sandbox environment to begin the proof of concept with tSQLt is ready to be used if all the above requirements are met.

Multiphase Requirements

The requirements from the business point of view can be as follows:

As a business user, I would like to view strictly tested authors report preferably from a system optimized for reporting and analysis so that further reports can be added to it

Your business analyst receives the requirements and modifies them for you as follows:

The clients are in need of a data warehouse which must be thoroughly tested to be eventually used for multiple reports and analysis beginning with authors report.

You have had a quick chat with your team and come up with a plan as follows:

We must try to adopt a test-driven approach to meet these requirements, which means a proof of concept to see test-driven data warehouse development in operation if approved can be used to meet these specific business requirements considering the available time and effort.

The Proof of Concept journey

At this point, if you are not already familiar with, then please review the test-driven database development concept by going through the article Fundamentals of Test-Driven Database Development (TDDD) with tSQLt unit testing.

Mapping potential object TDDD

Just like test-driven database development where we assume a database object is going to meet the requirements and testing that object with a SQL unit testing framework such as tSQLt can help us even create that object properly since it has to pass to prove that it can meet the requirements.

However, the data warehouse is a bit more complex as compared to a database where we can start with a potential object by creating a SQL unit test to check its functionality, which has to meet the specification, and this effort ultimately helps us to build the requirement specific object without over-engineering.

Yes, we can think of an object called AuthorsReport as in the case of test-driven database development, but then how does it fit in a data warehouse scenario and establishing the proof of concept is the most important question to move forward.

Choosing AuthorsReport Object

There is no problem with choosing the same object in test-driven data warehouse development as we would choose in test-driven database development; however, the logic required to build this object has to go through a couple of more steps.

Create SQL unit test Class

Once tSQLt is installed please create a SQL unit test class called AuthorReportTests in data warehouse database as follows:

Create the first SQL unit test

Create the first SQL unit test for the data warehouse database SQLDevArticlesV5DW to check if an object exists as follows:

Run SQL unit test(s)

Now please run all SQL unit tests by issuing the following tSQLt command:

As per test-driven database development principles that test must fail since the object is not yet created as follows:

Test to check if object exists fails

Create Object (AuthorsReport)

Now we are going to create the desired object in the same data warehouse database SQLDevArticlesV5DW so the basic test to check its presence must pass:

Rerun SQL unit test(s)

Please run the SQL unit tests for the sample data warehouse:

The test output should be as follows:

Test to check if object exists passed

So far, so good, the proof of concept with tSQLt is on the run, but it is a slightly long way before we see it working more robustly.

Conclusion

Congratulations, you have just gone through the halfway, and things seem to be fine, but please be prepared to do some serious work in the next part of this article to get the things on track since this trial must show that it is worth the effort. Stay tuned

Haroon Ashraf
SQL unit testing, Testing

About Haroon Ashraf

Haroon’s deep interest in logic and reasoning at an early age of his academic career paved his path to become a data professional. He holds BSc and MSc Degrees in Computer Science and also received the OPF merit award. He began his professional life as a computer programmer more than a decade ago, working on his first data venture to migrate and rewrite a public sector database driven examination system from IBM AS400 (DB2) to SQL Server 2000 using VB 6.0 and Classic ASP along with developing reports and archiving many years of data. His work and interest revolves around Database-Centric Architectures and his expertise include database and reports design, development, testing, implementation and migration along with Database Life Cycle Management (DLM). He has also received passing grade to earn DevOps for Databases verified certificate, an area in which he finds particular interest and potential. View all posts by Haroon Ashraf

539 Views