This is the second part of a conceptual article for data professionals and enthusiasts interested to work on a test-driven data warehouse development concept by using tSQLt an industry recognized SQL unit testing framework.
This article takes you to the next step in the test-driven data warehouse development proof of concept adding more value to the effort we have put into it when the initiative was taken.
Additionally, the readers of this article are going to experience the tremendous capacity of tSQLt when up against proof of concept rather than a traditional database testing scenario.
Next Step Prerequisites
Before we jump to the next step it is better to review the prerequisites so that following the steps ahead becomes easier.
It is highly recommended to go through the first part of the article The Concept of Test-Driven Data Warehouse Development (TDWD) with tSQLt to fully understand the concepts and follow the walkthroughs.
This article assumes the readers have already gone through the following steps:
- Knowhow of Proof of concept including the strategies
- Scope of the work has been defined and understood
- Mapping Strategy has been selected and understood
- Business requirements are fully understood
- Sandbox environment concept has been understood
- Sample database SQLDevArticlesV5 has been setup
- Sample data warehouse SQLArticlesV5DW has been setup
- tSQLt has been installed on both sample databases
- AthorsReport object has been chosen
- The business requirements are known and also the reason behind working on test-driven data warehouse development proof of concept
- tSQLt test to check AuthorsReport has been written and run successfully
Developer Sandbox Check
We set up a sandbox environment as we began this work (in the first part of the article) consisting of the above-mentioned sample databases with tSQLt installation on both sample databases followed by a SQL unit test to check if the desired object exists (in the data warehouse).
Let us rerun the very first SQL unit test to check if the object still exists by running the following script against the data warehouse SQLArticlesV5DW:
1 2 3 4 5 |
USE SQLDevArticlesV5DW GO -- Run all unit tests tSQLt.RunAll |
Please ensure that your test output should be the same as shown below:
On another note, one would think do we really need to run the SQL unit test to see whether the object exists or not when we can check it under the related database folder in the Object Explorer as it is obvious from the above screenshot.
The answer is very simple that we are running the SQL unit test primarily to check if the object exists or not but also to check the following:
- tSQLt is installed and working
- There are no other SQL unit tests which we are not aware of
- Test-driven data warehouse development is in operation
So the sandbox (environment) check has been done, we are moving to the next step, but can you please guess what the next step is going to be?
AuthorsReport Object functionality breakdown
If I say that the next step is to check if the object is functioning properly or not then this makes sense. A passed SQL unit test confirms that the AuthorsReport object exists but this is not enough as we now need to check if the object shows the authors report or not and this is the beginning of the next step.
However, there are more important things before we check the functionality of the object.
Actually, this is where the things get slightly more complex and the path of test-driven database development seems to be separate from test-driven data warehouse development.
According to test-driven database development, we need to write a unit test using tSQLt to check if the object (AuthorsReport) is working properly such that it is showing the required output or not and in our case, the output is authors report.
To write such a unit test actually requires us to do a lot of things in the background and that is what test-driven data warehouse development is all about from the proof of concept perspective.
If we don’t understand this breakdown of functioning then we are almost lost in the proof of concept so please be focussed on this passage of the article as much as possible.
Rather than creating a SQL unit test straight away, we have to first think of its prototype so that it becomes easier for us to write, test, refactor and retest as per test-driven database development.
Please refer to Fundamentals of Test-Driven Database Development (TDDD) with tSQLt unit testing if you are not sure why we are talking about writing the SQL unit test first before adding the functionality to the object.
Let us call AuthorsReport object as the primary object and breakdown the functionality of the primary object (AuthorsReport) in a reverse engineering style as follows:
- We are focussed on an object called AuthorsReport which must show report of the authors
- The report of the authors cannot be shown if there is no Author table in the data warehouse
- The Author table is not useful if there is no data in it
- The data cannot be stored in the Author table if it cannot be pulled from the desired source
- The data cannot be pulled from the source if there is no Extract-Transform-Load (ETL) process in operation
- The ETL process cannot run if there is no GetAuthorExtract object
- GetAuthorExtract cannot extract the data from the source if there is no Author table in the source
- The Author table in the source cannot be used to extract the data if there is no data in it
Yes, it is a mind-spinning breakdown even I am feeling the effects but we have to proceed since it is ultimately a business requirement that needs to be satisfied (as we are visualizing the pressure on the development team while working on this project/prototype).
If you understand this point then you understand the whole concept because this is the trigger in our proof of concept and it must be fully understood although this is going to be revised further.
Please remember objects can be easily handled by tSQLt.
Secondary Object (GetAuthorsExtract)
Now one thing is clear that we need at least one more object to carry on with our proof of concept and this object is GetAuthorsExtract.
We have identified the secondary object as well and this means our test-driven data warehouse development concept is now based on the following objects:
- AuthorsReport
- GetAuthorsExtract
Tertiary Object (TransformLoadAuthorsExtract)
We are talking from a typical data warehouse perspective where both staging extracts and transform loads take place. As a reminder staging extract grabs the data from the source and puts it into a staging environment and transforms load then processes that data to put it into BI tables for reporting and analysis.
So we have the following three objects now:
- GetAuthorsExtract (to get the data from source database)
- TransformLoadAuthorsExtract (to transform data from staging into BI)
- AuthorsReport (show data based on Author table)
We installed tSQLt on both source and data warehouse databases in the first part of this article because the second (secondary) object is going to be unit tested in the source database.
Now we have another solid object GetAuthorsExtract and more reasons to move forward but with caution.
Three SQL unit tests to check three objects
We know that the authors’ report can be shown if the above three objects are present and working properly.
As per test-driven data warehouse development, we are going to create SQL unit tests with the help tSQLt framework to check if they exist.
We have already created one such SQL unit test so let us create the rest of the tests.
Create SQL unit test Class for the second object
This SQL unit test is going to be created in the source database (SQLDevArticlesV5) that’s why we installed tSQLt on both source and data warehouse in the beginning (first part of the article) of the proof of concept.
Let us create a new test class AuthorsReportTets in the source database:
1 2 3 4 5 6 7 8 |
USE SQLDevArticlesV5; GO -- Creating unit test class AuthorsReportTests CREATE SCHEMA [AuthorsReportTests] Authorization dbo GO EXECUTE sp_addextendedproperty @name='tSQLt.TestClass',@value=1,@level0type='SCHEMA',@level0name='AuthorsReportTests' |
Create SQL unit test for the second object
Create a unit test with tSQLt in the source database SQLDevArticlesV5 to check if the object GetAuthorsExtract exists as follows:
1 2 3 4 5 6 7 8 9 10 11 12 |
USE SQLDevArticlesV5 GO CREATE PROCEDURE AuthorsReportTests.[test to check GetAuthorsExtract exists] AS BEGIN --Assemble --Act --Assert EXEC tSQLt.AssertObjectExists @ObjectName = N'GetAuthorsExtract' END; |
Run SQL unit test(s) for the second object
Now please run all SQL unit tests in the source database by issuing the following tSQLt command:
1 2 3 4 5 |
USE SQLDevArticlesV5 GO -- Run all unit tests tSQLt.RunAll |
Applying test-driven database development principles and failure of test is as expected:
Create a second object (GetAuthorsExtract)
Create the desired object stub (place holder) in the source database SQLDevArticlesV5:
1 2 3 4 5 6 7 8 |
USE SQLDevArticlesV5 GO -- Creating database object GetAuthorsExtract stub (placeholder) in the source database CREATE View GetAuthorsExtract AS SELECT 1 AS Stub ; |
Rerun SQL unit test(s) against the source database
Please run the SQL unit tests against the source database:
1 2 3 4 5 |
USE SQLDevArticlesV5 GO -- Run all unit tests tSQLt.RunAll |
The output is as follows:
Create SQL unit test for the third object
We have to write another SQL unit test for the third object in the data warehouse database this time as follows:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
USE SQLDevArticlesV5dw GO CREATE PROCEDURE AuthorsReportTests.[test to check TransformLoadAuthorsExtract exists] AS BEGIN --Assemble --Act --Assert EXEC tSQLt.AssertObjectExists @ObjectName = N'TransformLoadAuthorsExtract' END; GO |
Run SQL unit test(s) for the third object
Time to run the SQL unit tests against the Data Warehouse database by using the same script:
1 2 3 4 5 |
USE SQLDevArticlesV5DW GO -- Run all unit tests tSQLt.RunAll |
The output is as below:
The output clearly shows that the second object needs to be created while the first object exists in the database and both objects unit tests must pass for us to proceed further.
Create a third object (GetAuthorsExtract)
As per test-driven database development rules please simply create the object without its functionality as follows:
1 2 3 4 5 6 7 8 |
USE SQLDevArticlesV5DW GO -- Creating database object TransformLoadAuthorsExtract stub (placeholder) in the data warehouse (database) CREATE View TransformLoadAuthorsExtract AS SELECT 1 AS Stub ; |
Rerun SQL unit tests against the data warehouse
Please rerun SQL unit tests to see how the output has changed now:
1 2 3 4 5 |
USE SQLDevArticlesV5DW GO -- Run all unit tests tSQLt.RunAll |
Both tests have passed now and all three objects exist in our sandbox environment.
Congratulations, our proof of concept has proved its worth by showing the results at the beginning of this massive project.
Please refer to SQL Unit Testing Data Warehouse Extracts with tSQLt article which can help you to proceed further on your own with proof of concept if you stick to the core concepts of this article.
Conclusion
At this point, we can say that the potential objects required to meet the business requirement (to show authors report) have been successfully identified and developed with the exception of their functionality since the SQL unit tests to check if they exist are all passed.
This is the beauty of test-driven database development as it keeps you focussed on mini objectives by first understanding the total objects required to achieve the objective and then encouraging you to write tests to check if they exist even before you start creating these objects.
Another plus of test-driven approach is that you only do what you are supposed to do and you only work with those things which you should be working with like in our case the whole project revolves around just three objects.
I know it is still a long way but this is a taster session since these types of articles do get pretty complicated as we move along and the more you proceed further the more expertise are required but we have so far done a great job by identifying the objects and trying out the proposed approach (test-driven data warehouse development).
It may be possible that some of the upcoming articles may refer back to our development sandbox where we left it and we may start incorporating the functionality to observe more practical benefits of using tSQLt with this newly proposed and challenging methodology.
Table of contents
- How to create and query the Python PostgreSQL database - August 15, 2024
- SQL Machine Learning in simple words - May 15, 2023
- MySQL Cluster in simple words - February 23, 2023