Aveek Das
Executing the master package - ETL in SSIS

Implementing a Modular ETL in SSIS

November 24, 2020 by

In this article, I am going to demonstrate about implementing the Modular ETL in SSIS practically. In my previous article on Designing a Modular ETL Architecture, I have explained in theory what a modular ETL solution is and how to design one. We have also understood the concepts behind a modular ETL solution and the benefits of it in the world of data warehousing. We have also related the concept of microservices architecture in software development to that of the modular ETL solution.

In this tutorial, I am going to design the entire ETL package and will provide step-by-step guidelines on how to implement the same. Please note that the primary focus of this article is to implement the Modular ETL in SSIS and not to tutor creating the SSIS packages. It is assumed that you have a sound knowledge of developing SSIS packages and the control flow and data flow components within SSIS. For the sake of clear understanding, I have created the following three databases.

Database Name

Description

ApplicationDB

This database is considered as the OLTP database.

DataWarehouse

This can be considered as the warehouse database.

ControlDB

This database is used by the master package to log the executions.

You can use the following script to create the databases and the relevant data.

Now that the databases are ready, let’s go ahead and start creating the packages in SSIS. We will create three packages in SSIS as follows.

Package Name

Description

Orders.dtsx

Extracts and loads the data for the orders module.

Sales.dtsx

Extracts and loads the data for the sales module.

Master.dtsx

Controls the execution flow between the various components of the modules.

Creating the connection managers in SSIS

As we have created the above three packages in SSIS, let us now create the connection managers that we will be using in order to connect and communicate with the database.

Connection Managers - ETL in SSIS

Figure 1 – Connection Managers

Building the Orders and Sales packages

Now that we have our connection managers ready, we can start creating the SSIS packages for Orders and Sales. For the sake of simplicity, I am not going to use any transformations. It will be just extracted and load for both the packages.

Orders Package - ETL in SSIS

Figure 2 – Orders Package

As you can see in the above package, we have two Sequence Control – “Extract” and “Load” which defines the extract and the load components within the package. Let us now try to dive deep and understand the contents of each of the components.

Extract

Data Flow Task – This is used to connect to the source database (ApplicationDB) and load data into the stage tables of the DataWarehouse.

Update ControlDB – This is an audit component that logs the execution to the ExecutionLogs table in the ControlDB.

Load

Execute SQL Task – Runs the stored procedure “usp_Populate_Orders” in the DataWarehouse.

Update ControlDB – This is an audit component that logs the execution to the ExecutionLogs table in the ControlDB.

Once all the components are added, add the control flow as shown in the above figure.

Adding parameters in the packages

One of the important features that we will be using to implement the modular ETL in SSIS is to be leveraging the parameters in SSIS. These parameters can be used to control the flow and execution of the individual modules from the master package also.

Parameters - ETL in SSIS

Figure 3 – Parameters – ETL in SSIS

As you can see in the above figure, we have added two parameters, namely p_Enable_Orders_Extract and p_Enable_Orders_Load. These are Boolean parameters, the default value of which is set to True.

Adding Expressions to the Sequence Control

So far, we have created the parameter in SSIS, however did not use them in the package. In this step, we will use the value of these parameters to enable or disable the Sequence Control components in the package. The reason behind doing this is that we can externally control which component to execute and which one to exclude from the execution pipeline.

Click the Extract component and select Expressions from Properties.

Selecting Expressions - ETL in SSIS

Figure 4 – Selecting Expressions

In the Expressions window, select Disable as the property from the dropdown and in the Expression, add the following.

@[$Package::p_Enable_Orders_Extract] ?False:True

Adding expression property to Disable

Figure 5 – Adding Extract property to Disable

Basically, what we are trying to achieve in the above is to set the Disabled property on or off based on the parameter value. If the parameter value is set to True, which means the component is enabled, then the Disabled property will be set to False. Repeat the same for the load component, just by altering the parameter name.

Adding Load property to Disable

Figure 6 – Adding Load property to Disable

Repeat the above steps for the Sales module and create the parameters. Also, add the expression controls to the individual extract and load components. An important point to note will be the names of the parameters should represent the name of the package and the name of the component. Otherwise, when you will work with multiple packages and components, it will be difficult to keep a track of all the parameters from various packages.

Parameters for the Sales Package

Figure 7 – Parameters for the Sales Package

Creating the Master Package

At this moment, you should have created both the Orders and Sales SSIS packages and now we are good to go ahead with the development of the Master package. You can consider this as a parent package that will execute the Orders and Sales package from within it. In this case, both the Orders and the Sales packages can be considered as child packages.

Master Package - ETL in SSIS

Figure 8 – Master Package

If you look at the package above, there are four Execute Package Tasks contained within two Sequence Control components, one each for extract and load. Connect the individual components as shown in the figure and add the package names to be executed.

Within the master package, we are going to create four variables, which will be used to bind to the parameters from the child packages. By default, we are going to set the value for all the variables to False.

Variables in the master package

Figure 9 – Variables in the master package

Now, when binding the parameters, keep in mind which component should be executed and which one to be disabled. For example, if we want to execute the Extract component of the Orders module, this is how we should bind the parameters.

  • p_Enable_Orders_Extract = True
  • p_Enable_Orders_Load = False

If you recollect, in the previous steps, we have already set the default value for p_Enable_Orders_Extract as True, so now, we will only set the value of p_Enable_Orders_Load to be False.

Binding the variables

Figure 10 – Binding the variables

Repeat the same for the other Execute Package Tasks as well. Please remember to exactly specify which component should be disabled or enabled while executing the packages.

Executing the master package

Now that our development of all three packages has been completed, we can now go ahead and start the execution of the master package. Once you start the execution, you will see that the individual child packages are loaded and executed by the master package.

Executing the master package - ETL in SSIS

Figure 11 – Executing the master package

Also, for your understanding, you can disable and play with the values of the variables in the master package to learn more about the execution sequence. Alternatively, you can also monitor the execution log of the packages by querying the ExecutionLogs table in the ControlDB.

Viewing the ExecutionLogs

Figure 12 – Viewing the ExecutionLogs

Conclusion

In this article, we have implemented the concept of Modular ETL in SSIS. Designing a modular ETL in SSIS is a vast topic and cannot be covered in one article alone. However, I have tried my best to explain the same with simple examples. I have provided the sample solution for a better understanding. You can now get started with this concept and start modifying the above-explained solution as per your requirements. The main idea is to separate the various components of the individual modules such that all those modules can also be executed independently of one another after deployment. This solution explained here is also scalable and you can execute multiple packages in parallel.

Sample SSIS solution

Aveek Das
ETL, Integration Services (SSIS)

About Aveek Das

Aveek is an experienced Data and Analytics Engineer, currently working in Dublin, Ireland. His main areas of technical interest include SQL Server, SSIS/ETL, SSAS, Python, Big Data tools like Apache Spark, Kafka, and cloud technologies such as AWS/Amazon and Azure. He is a prolific author, with over 100 articles published on various technical blogs, including his own blog, and a frequent contributor to different technical forums. In his leisure time, he enjoys amateur photography mostly street imagery and still life. Some glimpses of his work can be found on Instagram. You can also find him on LinkedIn View all posts by Aveek Das

168 Views