Sifiso W. Ndlovu

Multiple options to transposing rows into columns

January 4, 2016 by

Introduction

One of the primary functions in Business Intelligence is to enable business users to understand the nature of data generated by their business systems. For instance in an insurance industry, a business case in a policy claims department would typically involve understanding the number of submitted vs outstanding documents required to successfully process a claim. A denormalized relational representation of such a business case would look as shown in Table 1:

Table 1

RecKey

PolID

PolNumber

PolType

Effective Date

DocID

DocName

Submitted

Outstanding

1

2

Pol002

Hospital Cover

01-Oct-07

1

Doc A

0

2

2

Pol002

Hospital Cover

01-Oct-07

4

Doc B

0

3

2

Pol002

Hospital Cover

01-Oct-07

5

Doc C

1

4

2

Pol002

Hospital Cover

01-Oct-07

7

Doc D

1

5

2

Pol002

Hospital Cover

01-Oct-07

10

Doc E

1

Note that, although there are 5 entries in Table 1, this is all information about the same policy – Pol002. On an operational reporting side, this may convey an incorrect message that there are 5 policies whose documents are outstanding. The correct way would be to transpose this information that would result in a view that contain a single instance of this policy and all its outstanding/submitted documents as shown in Table 2:

Table 2

PolNumber

PolType

Effective Date

Doc A

Doc B

Doc C

Doc D

Doc E

Pol002

Hospital Cover

01-Oct-07

0

0

1

1

1

The objective of this article is to demonstrate different options that could be utilised to achieve a view that is presented in Table 2. We are also going to use SQL Server Actual Execution Plans, Time and IO Statistics to indicate the cost of using one option over another.

Option #1: PIVOT

If you are already familiar with Transact-SQL (T-SQL), then you may find that the simplest way to transpose rows into columns is to use a PIVOT operator. Figure 1 shows how a script using a PIVOT operator is used to transpose rows into columns.


Figure 1

Vertical Expansion

As we add more policy numbers in our dataset, the script is able to retrieve newly added policy numbers without requiring any changes – as shown in Figure 2.


Figure 2

Horizontal Expansion

The major limitation of using a PIVOT operator is that you need to have a predefined list of all possible values that could be stored in the FOR clause. For instance, my FOR clause looks at values within [DocName] column which are Doc A – Doc E. What will happen if business introduces another requirement (i.e. Doc F) that ought to be submitted for the purposes of processing a Hospital Cover claim? Well, if you rerun the script in Figure 1, you will not see values for Doc F instead you can only see values by editing the FOR clause part of your script to include Doc F column as shown in Figure 3.

However, imagine if business later decides to add 100 more documents that are required to process a claim? It becomes impossible to maintain such a change and if the script is already in production you may have to raise a change request to implement such a change. Thus, although transposing rows using PIVOT operator may seem simple, it may later be difficult to maintain.


Figure 3

Performance Breakdown

The actual estimated plan depicted in Figure 4, indicates that only a single query was executed with majority of the cost used by the Sort operator.


Figure 4

The dtTranspose table was only scanned once with data retrieved from disk (physical reads) and then loaded and read from cache (logical read) as shown in Figure 5.


Figure 5

Option #2: CURSOR

Although the general consensus in the professional community is to stay away from SQL Server Cursors, there are still instances whereby the use of cursors is recommended. I suppose if they were totally useless, Microsoft would have deprecated their usage long ago, right? Anyway, Cursors present us with another option to transpose rows into columns that will result in a similar output as depicted in Table 2. Figure 6 displays a T-SQL code that can be used to transpose rows into columns. The execution result of the script is a print of the @message_T variable that stores the transposed data – obviously you can further take the content from the variable into a permanent SQL Server object i.e. table.


Figure 6

Already looking at the cursor script you can agree that it has more lines of code than the PIVOT option – thus, more lines of code are needed by a Cursor to produce a result similar to Table 2 (that result is show in Figure 7).


Figure 7

Vertical Expansion

Similar to the PIVOT, the cursor has the dynamic capability to append more rows as our dataset expands to include more policy numbers, as shown in Figure 7:


Figure 8

Horizontal Expansion

Unlike the PIVOT, the cursor excels in this area as it is able to expand to include newly added document, Doc F, without altering the script in Figure 6.

Performance Breakdown

The major limitation of transposing rows into columns using CURSOR is a disadvantage that is linked to using cursors in general – they come at significant performance cost. This is because the Cursor generates a separate query for each FETCH NEXT operation – the script in Figure 6 generated that 5 queries as part of the execution plan. In addition to the FETCH NEXT queries, the CURSOR added another query execution for the purposes of retrieving data from the main dataset.


Figure 9

Because of multiple queries generated per FETCH NEXT operation, it is then not surprising to observe (in Figure 10) that the IO statistics of the CURSOR usage was largely spent storing the output of the FETCH NEXT operations in an internal table – Worktable. Reminder that with all its high-maintenance requirement, the PIVOT version of the script never temporarily stored data in Worktables.


Figure 10

Option #3: XML

The XML option to transposing rows into columns is basically an optimal version of the PIVOT in that it addresses the dynamic column limitation. The XML version of the script addresses this limitation by using a combination of XML Path, dynamic T-SQL and some built-in functions (i.e. STUFF, QUOTENAME) as shown in Figure 11.


Figure 11

Vertical Expansion

Similar to the PIVOT and the Cursor, newly added policies are able to be retrieved in the XML version of the script without altering the original script.


Figure 12

Horizontal Expansion

Unlike the PIVOT, newly added documents can be displayed without altering the script.

Performance Breakdown

The execution plan in Figure 13 shows that it took 2 queries to transpose our data – which is a major improvement from the CURSOR option.


Figure 13

In terms of IO, the statistics of the XML version of the script is almost similar to the PIVOT – the only difference is that the XML has a second scan of dtTranspose table but this time from a logical read – data cache.


Figure 14

Option #4: Dynamic SQL

Another alternative to the optimal XML option is to transpose rows into columns using purely dynamic SQL – without XML functions. This option utilises the same built-in functions that are used in the XML version of the script as shown in Figure 15.


Figure 15

Vertical Expansion

Similar to the PIVOT, Cursor and XML options, newly added policies are able to be retrieved in the XML version of the script without altering the original script.

Horizontal Expansion

Similar to the XML option, newly added documents can be displayed without altering the script.

Performance Breakdown

Similar to the XML option, the execution plan in the Dynamic SQL script uses two queries to return the output. However, unlike the XML version of the script, the distribution of the workload in the Dynamic SQL option is shared equally between the two queries.


Figure 16

Conclusion

In this article we’ve had a look at available options to transposing rows into columns in T-SQL. The PIVOT option was shown to be the simplest option with a less expensive query plan but it capabilities were limited when it came to addressing dynamic column expansion. We then looked at the CURSOR as a possible option to transposing rows into columns and found that it addressed all the limitations that were found in the PIVOT yet it ability to generate different query plans for every FETCH NEXT operation was its very own performance disadvantage. Finally, the XML and the Dynamic SQL options proved to be the best optimal options in terms of transposing rows into columns with favourable performance results and effective handling of vertical and horizontal expansions.

References:

Worktables
Statistics
Using PIVOT and UNPIVOT


Sifiso W. Ndlovu

Sifiso W. Ndlovu

Sifiso is a Johannesburg based certified professional within a wide range of Microsoft Technology Competencies such SQL Server and Visual Studio Application Lifecycle Management.

He is the member of the Johannesburg SQL User Group and also hold a Master’s Degree in MCom IT Management from the University of Johannesburg.

He currently works for Clientele Life as an Assistant Manager in Business Software Solutions.

View all posts by Sifiso W. Ndlovu
Sifiso W. Ndlovu
Business Intelligence

About Sifiso W. Ndlovu

Sifiso is a Johannesburg based certified professional within a wide range of Microsoft Technology Competencies such SQL Server and Visual Studio Application Lifecycle Management. He is the member of the Johannesburg SQL User Group and also hold a Master’s Degree in MCom IT Management from the University of Johannesburg. He currently works for Clientele Life as an Assistant Manager in Business Software Solutions. View all posts by Sifiso W. Ndlovu

90,918 Views
  • Ragavendra Ph

    Great One.. Thanks a lot

  • Binoy Vudi

    Brilliant article !! Thank You Sifiso ! And what a great website ! Im creating an account so I can post/subscribe.

  • umairaslam22

    Very nice and informative indeed , comparison is excellent and you covered all methods to take rows into columns.

  • Inayat

    Nice article, but what if my sub table has more than one column? suppose i want to show all records or my sub table as cols in my main tables?

    • Sifiso Ndlovu

      Hi Inayat,

      Thanks for getting in touch, let me play around with that scenario and get back to you, shortly.

      • MVANO NICHOLAS MHLAWULI

        Hi Sifiso. Thank you for this article, any news on Inayat’s question? Thanks 🙂

        • Sifiso Ndlovu

          Just to be clear @disqus_rcw3cbN8nH:disqus @mvanonicholasmhlawuli:disqus you are both referring to the pivot option #1. You just want to see an example with more than 1 pivoted row?

  • GC

    Very Nice Article. Is there a way you can show how to view the output of the cursor as a table format instead of Print Message?

    • Sifiso Ndlovu

      thanks GC, let me have a look and I will post a sample code shortly

      • GC

        Thanks Sifiso..