Haroon Ashraf
Identifying potential objects in test-driven data warehouse development with tSQLt (concept)

Using tSQLt for Test-Driven Data Warehouse Development (TDWD)

May 21, 2020 by

This is the second part of a conceptual article for data professionals and enthusiasts interested to work on a test-driven data warehouse development concept by using tSQLt an industry recognized SQL unit testing framework.

This article takes you to the next step in the test-driven data warehouse development proof of concept adding more value to the effort we have put into it when the initiative was taken.

Additionally, the readers of this article are going to experience the tremendous capacity of tSQLt when up against proof of concept rather than a traditional database testing scenario.

Next Step Prerequisites

Before we jump to the next step it is better to review the prerequisites so that following the steps ahead becomes easier.

It is highly recommended to go through the first part of the article The Concept of Test-Driven Data Warehouse Development (TDWD) with tSQLt to fully understand the concepts and follow the walkthroughs.

This article assumes the readers have already gone through the following steps:

  1. Knowhow of Proof of concept including the strategies
  2. Scope of the work has been defined and understood
  3. Mapping Strategy has been selected and understood
  4. Business requirements are fully understood
  5. Sandbox environment concept has been understood
  6. Sample database SQLDevArticlesV5 has been setup
  7. Sample data warehouse SQLArticlesV5DW has been setup
  8. tSQLt has been installed on both sample databases
  9. AthorsReport object has been chosen
  10. The business requirements are known and also the reason behind working on test-driven data warehouse development proof of concept
  11. tSQLt test to check AuthorsReport has been written and run successfully

Developer Sandbox Check

We set up a sandbox environment as we began this work (in the first part of the article) consisting of the above-mentioned sample databases with tSQLt installation on both sample databases followed by a SQL unit test to check if the desired object exists (in the data warehouse).

Let us rerun the very first SQL unit test to check if the object still exists by running the following script against the data warehouse SQLArticlesV5DW:

Please ensure that your test output should be the same as shown below:

Developer Sandbox Check

On another note, one would think do we really need to run the SQL unit test to see whether the object exists or not when we can check it under the related database folder in the Object Explorer as it is obvious from the above screenshot.

The answer is very simple that we are running the SQL unit test primarily to check if the object exists or not but also to check the following:

  1. tSQLt is installed and working
  2. There are no other SQL unit tests which we are not aware of
  3. Test-driven data warehouse development is in operation

So the sandbox (environment) check has been done, we are moving to the next step, but can you please guess what the next step is going to be?

AuthorsReport Object functionality breakdown

If I say that the next step is to check if the object is functioning properly or not then this makes sense. A passed SQL unit test confirms that the AuthorsReport object exists but this is not enough as we now need to check if the object shows the authors report or not and this is the beginning of the next step.

However, there are more important things before we check the functionality of the object.

Actually, this is where the things get slightly more complex and the path of test-driven database development seems to be separate from test-driven data warehouse development.

According to test-driven database development, we need to write a unit test using tSQLt to check if the object (AuthorsReport) is working properly such that it is showing the required output or not and in our case, the output is authors report.

To write such a unit test actually requires us to do a lot of things in the background and that is what test-driven data warehouse development is all about from the proof of concept perspective.

If we don’t understand this breakdown of functioning then we are almost lost in the proof of concept so please be focussed on this passage of the article as much as possible.

Rather than creating a SQL unit test straight away, we have to first think of its prototype so that it becomes easier for us to write, test, refactor and retest as per test-driven database development.

Please refer to Fundamentals of Test-Driven Database Development (TDDD) with tSQLt unit testing if you are not sure why we are talking about writing the SQL unit test first before adding the functionality to the object.

Let us call AuthorsReport object as the primary object and breakdown the functionality of the primary object (AuthorsReport) in a reverse engineering style as follows:

  1. We are focussed on an object called AuthorsReport which must show report of the authors
  2. The report of the authors cannot be shown if there is no Author table in the data warehouse
  3. The Author table is not useful if there is no data in it
  4. The data cannot be stored in the Author table if it cannot be pulled from the desired source
  5. The data cannot be pulled from the source if there is no Extract-Transform-Load (ETL) process in operation
  6. The ETL process cannot run if there is no GetAuthorExtract object
  7. GetAuthorExtract cannot extract the data from the source if there is no Author table in the source
  8. The Author table in the source cannot be used to extract the data if there is no data in it

Yes, it is a mind-spinning breakdown even I am feeling the effects but we have to proceed since it is ultimately a business requirement that needs to be satisfied (as we are visualizing the pressure on the development team while working on this project/prototype).

AuthorsReport Object functionality breakdown

If you understand this point then you understand the whole concept because this is the trigger in our proof of concept and it must be fully understood although this is going to be revised further.

Please remember objects can be easily handled by tSQLt.

Secondary Object (GetAuthorsExtract)

Now one thing is clear that we need at least one more object to carry on with our proof of concept and this object is GetAuthorsExtract.

We have identified the secondary object as well and this means our test-driven data warehouse development concept is now based on the following objects:

  1. AuthorsReport
  2. GetAuthorsExtract

Tertiary Object (TransformLoadAuthorsExtract)

We are talking from a typical data warehouse perspective where both staging extracts and transform loads take place. As a reminder staging extract grabs the data from the source and puts it into a staging environment and transforms load then processes that data to put it into BI tables for reporting and analysis.

So we have the following three objects now:

  1. GetAuthorsExtract (to get the data from source database)
  2. TransformLoadAuthorsExtract (to transform data from staging into BI)
  3. AuthorsReport (show data based on Author table)

Identifying potential objects in test-driven data warehouse development with tSQLt (concept)

We installed tSQLt on both source and data warehouse databases in the first part of this article because the second (secondary) object is going to be unit tested in the source database.

Now we have another solid object GetAuthorsExtract and more reasons to move forward but with caution.

Three SQL unit tests to check three objects

We know that the authors’ report can be shown if the above three objects are present and working properly.

As per test-driven data warehouse development, we are going to create SQL unit tests with the help tSQLt framework to check if they exist.

We have already created one such SQL unit test so let us create the rest of the tests.

Create SQL unit test Class for the second object

This SQL unit test is going to be created in the source database (SQLDevArticlesV5) that’s why we installed tSQLt on both source and data warehouse in the beginning (first part of the article) of the proof of concept.

Let us create a new test class AuthorsReportTets in the source database:

Create SQL unit test for the second object

Create a unit test with tSQLt in the source database SQLDevArticlesV5 to check if the object GetAuthorsExtract exists as follows:

Run SQL unit test(s) for the second object

Now please run all SQL unit tests in the source database by issuing the following tSQLt command:

Applying test-driven database development principles and failure of test is as expected:

Expected failed test as the object does not exist

Create a second object (GetAuthorsExtract)

Create the desired object stub (place holder) in the source database SQLDevArticlesV5:

Rerun SQL unit test(s) against the source database

Please run the SQL unit tests against the source database:

The output is as follows:

SQL unit test has passed

Create SQL unit test for the third object

We have to write another SQL unit test for the third object in the data warehouse database this time as follows:

Run SQL unit test(s) for the third object

Time to run the SQL unit tests against the Data Warehouse database by using the same script:

The output is as below:

SQL unit test failed as TransformLoadAuthorsExtract does not exist

The output clearly shows that the second object needs to be created while the first object exists in the database and both objects unit tests must pass for us to proceed further.

Create a third object (GetAuthorsExtract)

As per test-driven database development rules please simply create the object without its functionality as follows:

Rerun SQL unit tests against the data warehouse

Please rerun SQL unit tests to see how the output has changed now:

Both SQL unit tests have passed

Both tests have passed now and all three objects exist in our sandbox environment.

Congratulations, our proof of concept has proved its worth by showing the results at the beginning of this massive project.

Please refer to SQL Unit Testing Data Warehouse Extracts with tSQLt article which can help you to proceed further on your own with proof of concept if you stick to the core concepts of this article.

Conclusion

At this point, we can say that the potential objects required to meet the business requirement (to show authors report) have been successfully identified and developed with the exception of their functionality since the SQL unit tests to check if they exist are all passed.

This is the beauty of test-driven database development as it keeps you focussed on mini objectives by first understanding the total objects required to achieve the objective and then encouraging you to write tests to check if they exist even before you start creating these objects.

Another plus of test-driven approach is that you only do what you are supposed to do and you only work with those things which you should be working with like in our case the whole project revolves around just three objects.

I know it is still a long way but this is a taster session since these types of articles do get pretty complicated as we move along and the more you proceed further the more expertise are required but we have so far done a great job by identifying the objects and trying out the proposed approach (test-driven data warehouse development).

It may be possible that some of the upcoming articles may refer back to our development sandbox where we left it and we may start incorporating the functionality to observe more practical benefits of using tSQLt with this newly proposed and challenging methodology.

Haroon Ashraf
SQL unit testing, Testing

About Haroon Ashraf

Haroon’s deep interest in logic and reasoning at an early age of his academic career paved his path to become a data professional. He holds BSc and MSc Degrees in Computer Science and also received the OPF merit award. He began his professional life as a computer programmer more than a decade ago, working on his first data venture to migrate and rewrite a public sector database driven examination system from IBM AS400 (DB2) to SQL Server 2000 using VB 6.0 and Classic ASP along with developing reports and archiving many years of data. His work and interest revolves around Database-Centric Architectures and his expertise include database and reports design, development, testing, implementation and migration along with Database Life Cycle Management (DLM). He has also received passing grade to earn DevOps for Databases verified certificate, an area in which he finds particular interest and potential. View all posts by Haroon Ashraf

206 Views