Hadi Fadlallah
This image shows the data flow task in the second package

SSIS Derived Columns with Multiple Expressions vs Multiple Transformations

September 25, 2019 by

In this article, we will first give an overview on SSIS derived column transformation, then we will run an experiment to check if there is any difference between adding multiple expressions within one derived column transformation and adding a derived column transformation for each expression.

This article is the sixth article in the SSIS feature series which aims to remove confusion and to illustrate some of the differences between similar features provided by SQL Server Integration Services.

Derived Column Transformation

SSIS Derived Column Transformation is one of the SQL Server Integration Services that can be added within a Data Flow Task, it is used to add a new column to the data pipeline by applying SSIS expressions. The developer can choose whether to add a new derived column or to replace an existing column.

SSIS expressions are a combination of literals, functions, operators that yields a single data value. An expression can be composed of a single value (“abc”) or a variable (@[User::Variable1]) or a function (GETDATE()), or it can be more complex and contains some conditionals (CASE statement) or some mathematical operations or some string manipulation. To learn more about SSIS expressions you can refer to the following article from earlier in the series: SSIS Expression Tasks Vs Evaluating variables as expressions

SSIS Derived Column Transformation can be added from the SSIS toolbox.

This image shows the descirption of SSIS dervied column from the SSIS tollbox

Figure 1 – SSIS Derived Column description from SSIS toolbox

The SSIS Derived Column editor is composed of 4 parts as shown in the image below:

  1. Columns and Variables List
  2. Expressions built-in functions
  3. Derived Columns Grid
  4. Error Handling

This image shows the SSIS derived column transformation editor

Figure 2 – SSIS Derived Column Editor

The first two parts are explained in the previous article that we mentioned above. In this section, we will describe the other parts and we will try to give some of the most known expression and use cases.

Derived Columns Grid

The Derived Columns Grid contains the following configurations:

  • Derived Column Name: Specify the derived column name
  • Derived Column: Select between <add new column> and <replace an existing column> options
  • Expression: Write the SSIS expression that produce the derived column
  • Data Type: This column is read-only and it shows the data type of the expression result
  • Length: This column is read =-only and it shows the column length based on the expression result
  • Scale: This column is read-only and it automatically sets the scale for numeric data based on the data type
  • Precision: This column is read-only and it automatically sets the precision for numeric data based on the data type
  • Code Page: This column can be edited and it automatically sets code page for the DT_STR (string) data type

The Derived Columns Grid contains an empty line that is used to add a new derived column. When start filling the derived column properties within this line a new empty line is added to allow the user to add more columns.

If the expression cannot be applied, its color is changed to red as shown in the image below:

This image shows the SSIS derived column grid from the transformation editor

Figure 3 – Invalid expression example

When you put the mouse cursor over the expression, a tooltip appears with the error description:

This image shows the error message tooltip when expression is invalid

Figure 4 – Expression error message tooltip

To learn more about SSIS Derived Column Transformation you can refer to the following official documentation:

Error Handling

In SSIS Derived column, errors are mostly caused by expression evaluation errors, which occur if expressions that are evaluated at a run time perform invalid operations or become syntactically incorrect because of missing or incorrect data values.

Error output configuration is an option that is included in most of the Data Flow Task components. In this configuration you can select how this component behaves when an evaluation error or a truncation occurs:

  • Fail component
  • Ignore Failure
  • Redirect Row (Error output Flow)

This image shows the error output configuration form in the SSIS derived column transformation

Figure 5 – Error output configuration

In this article, we have considered error output configuration as a part of SSIS Derived Column Transformation since it is required to implement many popular use cases such as TRY_PARSE SQL function (will be described in the next section).

To learn more about Error output configuration in SSIS, you can refer to the following documentation:

Examples and Use cases

In this section, we will list some of the most popular SSIS Derived Column examples and use cases and provide an expression or an external link that contains an example:

  • Concatenating first name and last name into one column:

    [FirstName] + “ “ + [LastName]

  • Extract a part of a string:

    SUBSTRING([Address],1,5)

  • Change date to integer in yyyyMMdd format:

    (DT_I4)((DT_WSTR,4)YEAR([DateColumn]) +

    RIGHT( “00” + (DT_WSTR,2)MONTH([DateColumn]),2) +

    RIGHT( “00” + (DT_WSTR,2)DAY([DateColumn]),2))

  • Remove Leading and Trailing spaces from string:

    LTRIM(RTRIM([Address]))

Multiple Expressions Vs Multiple Derived Column Transformation

After describing the SSIS Derived Column Transformation component, now we will try to make a comparison between adding multiple expressions into one Derived Column Transformation or adding multiple Derived Column Transformations.

Performance

To compare both approaches from a performance perspective we have run the following experiment; we created two packages each one contains one Data Flow Task. We have used the [Person].[Person] table from the AdventureWorks2017 database as source. And we applied the following derived columns expressions in each package:

  • Name:

    [FirstName] + ” ” + [LastName]

  • Full Name:

    [FirstName] + ” ” + [MiddleName] + ” ” + [LastName]

  • Father Full Name:

    [MiddleName] + ” ” + [LastName]

  • NumericDate:

    (DT_I4)((DT_WSTR,4)YEAR(ModifiedDate) + RIGHT(“00” + (DT_WSTR,2)MONTH(ModifiedDate),2) + RIGHT(“00” + (DT_WSTR,2)DAY(ModifiedDate),2))

In the first package we added only one Derived column transformation as shown in the images below:

This image shows the expressions defined in the SSIS derived column editor

Figure 6 – First package derived column editor

This image shows the data flow task of the first package

Figure 7 – First package data flow task

And in the second package, each expression was added in a separate SSIS derived column transformation as shown in the image below:

This image shows the data flow task in the second package

Figure 8 – Second package data flow task

After executing both packages, the results show that adding multiple expressions into one SSIS derived column transformation lasted about 02.797 seconds while adding multiple SSIS derived column transformation lasted 03.329 seconds.

The main cause was that each component has its own execution phases (Validation, Pre Execute, Execute, Post Execute) which requires more time.

Dependency

Note that, we can put all derived columns within one transformation component if they are independent, because if one derived column expression used another one, we must add a component for each one.

Error Handling

The last thing to mention is that using one Transformation component with multiple expression will make the error handling process more difficult when we need to redirect erroneous rows since the error is caused by one of the expressions and it is harder to know which expression caused the error. While when using multiple transformation components, error handling is easier.

Conclusion

In this article, we have described the Derived Column Transformation in SSIS, and we ran an experiment to illustrate the difference between adding multiple expression within one derived column transformation and adding one derived column transformation component for each expression. The results showed that from a performance perspective using the first approach is more efficient while the second provide an easier error handling.

Table of contents

SSIS OLE DB Source: SQL Command vs Table or View
SSIS Expression Tasks vs Evaluating variables as expressions
SSIS OLE DB Destination vs SQL Server Destination
Execute SQL Task in SSIS: SqlStatementSource expression vs Variable Source Type
Execute SQL Tasks in SSIS: Output Parameters vs Result Sets
SSIS Derived Columns with Multiple Expressions vs Multiple Transformations
SSIS Data types: Change from the Advanced Editor vs Data Conversion Transformations
SSIS Connection Managers: OLE DB vs ODBC vs ADO.NET

Hadi Fadlallah

Hadi Fadlallah

Hadi is a Lebanese Researcher, Data Engineer and Business Intelligence Developer.

He has been working with SQL Server for more than 10 years. Also, he's one of the top ETL and SQL Server Integration Services contributors at Stackoverflow.com

Hadi really enjoys learning new things everyday and sharing his knowledge.

View all posts by Hadi Fadlallah
Hadi Fadlallah
236 Views