SQL Server Integration Services or SSIS is used as an ETL tool to extract-transform-load data from heterogeneous data sources to different databases. After extracting data from the different sources, most often there are a lot of transformations needed. One of the frequent transformations is SSIS Conditional Split.
Let us assume we have a set of employees, who has different payment types such as Permanent, Temporary, and Commission bases. As you know, different payment types need different calculations. Let us assume following is the data set.
Now the requirement is to perform the calculation for different payment types. This means that you need to split this data set into different payment types and they perform the relent calculation.
Let us implement this in SSIS.
First, create an SSIS project in Visual Studio and open the existing DTSX package.
Since this is a data flow task drag and drop Data Flow Task to the control flow as shown in the below image.
Then double-click the Data Flow Task which will open in the data flow pane.
Since we are extracting data from a text a file, let us create a connection to the text file and create a source for it from the Flat file Source.
Since this text file is a comma-separated value, the Delimited format is selected which is the default setting in the flat file connection along with the other default settings.
Following is the sample data set which can be seen after setting up the flat file connection.
Now we are ready to separate incoming data set to different payment types.
SSIS Conditional Split Control
SSIS Conditional Split control can be seen in the SSIS toolbox under the Common category as shown in the below image.
Drag and drop the SSIS Conditional Split control to the data flow and connect with the flat file connection as shown below.
As shown in the above image, SSIS Conditional Split control is renamed to Split for Different Pay Types for better readability.
Next is to configure the SSIS Conditional Split Control which can be done by double-clicking the Conditional Split Control.
Above are most of the important configurations in the SSIS Conditional Split Control. In this configuration page, you need to provide the condition which will be used to split the data set. Scripting in the conditional split configuration needs the VBScript format.
In the above configuration, three conditions are configured. In this configuration, the dataset is divided into three conditions. Developers have the option of drag and drop the column names from the above to the condition which will become much easier for the developers. These conditions can incorporate with inbuilt string functions, mathematical functions, Date/Time functions, NULL functions.
Though the above configurations are fairly straightforward, there can be instances where the split condition can be complexed. When there is a complex condition, there can be instances where one record may fall into multiple conditions. Due to the Priority order, when a record satisfies a condition, it will not be evaluated again.
In the above configuration, there is a Default output name called Other. This is to transfer all records which do not fall into any of the above conditions. This means that all the records coming into the SSIS Conditional Split control will be output from the control.
Next is to configure the output from the SSIS Conditional Split.
As you can see in the above screenshot, there are four paths coming out from the SSIS Conditional Split control.
As seen from the above screenshot, the output is split into four paths and after this, each path is independent of the other path. This means that different transformations can be done for different paths as seen in the following image after executing the package.
As shown in the above screenshot, all eight records are split into four paths. Relevant records can be viewed by enabling data viewer at the relevant path.
Most of the time, developers forget to configure the Other path as shown in the above example. Most of the time, developers will configure the split conditions for their requirement. However, with overtime, when there is a new configuration comes to the data source, this will not be considered. The better option would be, at least move the default path to the audit log so that it can be viewed different times to identify whether those records need to be considered so that conditions can be modified accordingly.
Always remember to configure the error output. This applies to every transformation in SSIS. Since you are dealing with data which you don’t have control over, you don’t know what are the data coming in. Therefore, it is always better to configure the error and redirect to a different target.
- Use Replication to improve the ETL process in SQL Server - November 4, 2021
- Text Classification in Azure Machine Learning using Word Vectors - October 1, 2021
- Building Ensemble Classifiers in Azure Machine Learning - September 21, 2021