Hadi Fadlallah
Unpivot input and output rows count

SSIS Pivot transformation vs. Unpivot transformation

September 8, 2021 by

This article explains the SSIS pivot transformation and how it differs from the unpivot transformation.

This article is the thirteenth article in the SSIS feature face-to-face series, which aims to remove confusion and to illustrate some of the differences between similar features provided by SQL Server Integration Services (SSIS).

Introduction

In general, the word “Pivot” means a shaft or a pin on which something turns. In the world of data, pivoting data is a data processing technique that is used to reshape tabular data by converting rows into columns. Pivoting is generally used for data analysis and visualization. Unpivot is the reverse operation of pivot where the columns are converted to rows.

Pivot vs. Unpivot operations

Figure 1 – Pivot vs. Unpivot operations

In SQL Server, there are two built-in relational operators called Pivot and Unpivot that can be used to perform the pivoting and unpivoting operations over relational data. Since these two operations are needed sometimes in the data import phase, two transformations are developed in the SQL Server Integration Services. The following sections will explain the SSIS pivot transformation and the SSIS unpivot transformation and how to use them during the data import phase.

  • Side Note: All examples in this article are made using the Stack overflow 2013 database, SQL Server 2019, Visual Studio 2019, SQL Server Integration Services projects extension version 3.4.

SSIS pivot transformation

As described in the Visual Studio toolbox, the SSIS pivot transformation “Compacts an input data flow by pivoting it on a column value, making it less normalized.” Besides, two essential requirements are mentioned:

  1. “The input data should be sorted by the pivot column because a pivot happens each time data in the pivot column changes.” This note means that if the data is not sorted, this may cause meaningless changes in the component output metadata, leading it to failure
  2. “Duplicate Rows will cause this transformation to fail.” Since duplicate columns’ will be generated in the component output

SSIS Pivot transformation description in the SSIS toolbox

Figure 2 – Pivot transformation description in the SSIS toolbox

Let us consider the following query that returns the number of earned badges per year:

Earned badges per year

Figure 3 – Earned badges per year

To explain the SSIS pivot transformation, we (1) create an Integration Services project, (2) added a package, (3) created an OLE DB connection manager to configure the connection to the Stack Overflow 2013 database.

Within the package control flow, we add a Data Flow Task. As shown in the image below, we added an OLE DB Source and configured it to read from the SQL command mentioned above.

OLE DB Source configuration

Figure 4 – OLE DB Source configuration

Now, let us add a pivot transformation that takes the OLE DB Source output as an input. In the SSIS Pivot transformation editor, we need to select the following columns:

SSIS Pivot transformation editor

Figure 5 – Pivot transformation editor

  • Pivot Key: Values in the input data from this column will become the new column names in the output. In this example, we should select the “Earned_Year” column
  • Set Key: Identifies a group of input rows that will get pivoted into one output row
  • Pivot Value: Values from this column will be mapped into the new pivot output columns

Pivot columns explanation

Figure 6 – Pivot columns explanation

Since SSIS component metadata needs to be defined before execution, the pivot key columns should be defined. If we already know the values in the data, we should write them within the values textbox and click on the “Generate columns now” button as shown below.

Generate output columns

Figure 7 – Generate output columns

If we don’t know all values found in the data, we should check the “Ignore un-matched Pivot Key values and report them after DataFlow execution” option. Execute the DataFlow in the debugger, and copy the value list reported in the debugger’s output Window (this is mentioned as a hint in the SSIS pivot transformation editor).

Ignore un-matched Pivot key option

Figure 8 – Ignore un-matched Pivot key option

To execute the Data Flow task, go to the package control flow, press right-click on the Data Flow task, and click on “Execute Task”.

Executing the Data Flow Task

Figure 9 – Executing the Data Flow Task

After execution is finished, go to the Progress tab and check the SSIS pivot transformation output as shown in the image below.

Reading the SSIS pivot transformation output

Figure 10 – Reading the pivot transformation output

You can copy this output by pressing right-click on the message and click on “Copy message text”.

Copy the SSIS pivot transformation output

Figure 11 – Copy the pivot transformation output

We should paste only the column names into the output column text box and click on “Generate columns now”. We can check the added columns in the “Existing pivoted output columns” text box.

Generating output columns

Figure 12 – Generating output columns

After configuring the SSIS pivot transformation, we will export the result into a Flat file (CSV). As shown in the image below, the number of rows is reduced to the number of distinct values found in the “Set Key” columns.

Executed data flow task

Figure 13 – Executed data flow task

If we open the created CSV file, we can check how the data is transformed to a pivot table.

Data sample from the exported csv file

Figure 14 – Data sample from the exported csv file

SSIS unpivot transformation

As mentioned in the SSIS toolbox, the unpivot transformation “expands an un-normalized data flow into a more normalized version (Values from multiple columns of a single record expand to multiple records in a single column)”.

Unpivot transformation description in the SSIS toolbox

Figure 15 – Unpivot transformation description in the SSIS toolbox

To explain the SSIS unpivot transformation, we will add a new Data Flow task with a Flat File source that reads from the CSV file we created in the previous example. Then we will add an SSIS unpivot transformation component that takes the flat file source’s output as an input.

Now, let us open the SSIS unpivot transformation editor. As shown in the image below, we should configure this transformation as follows:

  • We should check the “Pass through” option for the columns that do not need to be unpivoted
  • We should check all columns that we need to unpivot
  • In the columns grid, we should specify an output column name for each input column. In this example, we used “Times_Earned” as a column name since it was used in the initial SQL query we used
  • We should specify the pivot value output column name where the values will be stored. In this example, we used “Earned_Year” as a column name since it was initially used in this article

SSIS unpivot transformation editor

Figure 16 – SSIS unpivot transformation editor

After configuring the unpivot transformation, we add a Flat file destination to export the unpivoted data to another flat file (CSV). After executing the package, we should note that the unpivot transformation output rows count is higher than the input rows.

Unpivot input and output rows count

Figure 17 – Unpivot input and output rows count

If we open the created CSV file, we can check how pivoted data is transformed back to a normalized table.

Unpivoted data sample

Figure 18 – Unpivoted data sample

ETL, Integration Services (SSIS)

About Hadi Fadlallah

Hadi is an SQL Server professional with more than 10 years of experience. His main expertise is in data integration. He's one of the top ETL and SQL Server Integration Services contributors at Stackoverflow.com . Also, he published several article series about Biml, SSIS features, Search engines, Hadoop, and many other technologies. Besides working with SQL Server, he worked with different data technologies such as NoSQL databases, Hadoop, Apache Spark. He is a MongoDB, Neo4j, and ArangoDB certified professional. On the academic level, Hadi holds two master's degrees in computer science and business computing. Currently, he is a Ph.D. candidate in data science focusing on Big Data quality assessment techniques. Hadi really enjoys learning new things everyday and sharing his knowledge. You can reach him on his personal website. View all posts by Hadi Fadlallah

168 Views