Sifiso Ndlovu

SQL Server pivoting on non-numeric data types

February 19, 2016 by

Introduction

In the article, Multiple Options to Transposing Rows into Columns, I covered various options available in SQL Server to rotating a given row into columns. One of the options included the use of a PIVOT relational operator. The mandatory requirement of the operator is that you must supply the aggregate function with only a numeric data type. Such a mandatory requirement is usually not an issue as most aggregations and subsequent pivoting is performed against fields of numeric data type. However, sometimes the nature of business reporting requests may be such that you are required to cater for pivoting against non-numeric data types. In this article we take a look at how you can deal with such requirements by introducing a workaround to pivoting on non-numeric fields.

Similarly to the aforementioned article, a denormalized relational representation of our case study looks as shown in Table 1:

RecKey Policy PolType Effective Date DocID DocName Submitted Outstanding Submitted Date Captured By ID
1 Pol002 Hospital Cover 2007/10/01 1 Doc A NULL 1 NULL NULL
2 Pol002 Hospital Cover 2007/10/01 4 Doc B NULL 1 NULL NULL
3 Pol002 Hospital Cover 2007/10/01 5 Doc C 1 NULL 2016/01/13 1
4 Pol002 Hospital Cover 2007/10/01 7 Doc D 1 NULL 2016/01/14 2
5 Pol002 Hospital Cover 2007/10/01 10 Doc E 1 NULL 2016/01/15 1

Table 1: Denormalized relational representation of our claims business study

Workaround #1: Pivot on Dates

A simple business case that can be used to address this workaround is that business may be trying to determine a breakdown of documents captured by date. Ultimately, the aim of such a business case would to have a view of data that looks as shown in Table 2:

Policy PolType Effective Date Doc A Doc B Doc C Doc D Doc E
Pol002 Hospital Cover 01-Oct-07 0 0 13 Jan 2016 14 Jan 2016 15 Jan 2016

Table 2: Pivot on Dates

The first step to pivoting on date fields is to convert your date field to an integer value. Figure 1 shows how we have achieved this through a derived integer field, [Submitted Date INT], which is based off the [Submitted Date] field.


Figure 1: Converting SQL Server date field to integer value

The complete pivot script is depicted in Figure 2:


Figure 2: Pivot on date script

The execution of the script in Figure 3 result into the data shown in Table 3:

Policy PolType Effective Date Doc A Doc B Doc C Doc D Doc E
Pol002 Hospital Cover 01-Oct-07 0 0 20160113 20160114 20160115

Table 3: Results of date pivoted script

However, often business people prefer to view data related dates in a familiar format (i.e. dd/mm/yyyy, dd-mm-yyyy etc.). So we can further extend our script to format the dates according to business friendly format.

There are two ways to achieve this:

1. Formatting SQL Server Date Using Date Dimension

If you write your scripts within a data warehouse or have access to a data warehouse or have a database environment that has a date dimension, then you can make use of the dimension to format the date. Date dimension is a database object that is setup according to a Ralph Kimball’s Data Warehouse design methodology. The dimension usually contain attributes such as date name, quarter, year, and calendar months. Date dimension allows for the breakdown of report data by period over time.

Figure 3 shows some of the content contained in a date dimension.


Figure 3: Sample date dimension data

One of the common ways to lookup data in the date dimension is to use a numeric date key. In our case, we already have derived date column ([Submitted Date INT]) which is in an integer format. Figure 4, shows our revised pivoted script joined to the date dimension.

As it can be seen there are several instances of the dim date to the fields that we are using as pivots. For instance, the first condition is to join dim date to doc a and so on. As argued in the aforementioned article, this way of doing this only suffer from one thing: it is not dynamic – meaning, if business later introduce Doc F as a document that should be submitted in order to process a claim, than a developer would have to refactor the script to include the new doc column and another join condition to dim date to display a custom date format for the newly added column.


Figure 4: Pivoting on date using date dimension

The results of the script depicted in Figure 4 are shown in Table 4:

Policy PolType

Effective Date

Doc A Doc B Doc C Doc D Doc E
Pol002 Hospital Cover 01 Oct 2007 0 0 13 Jan 2016 14 Jan 2016 15 Jan 2016

Table 4: Output of pivoting on date using date dimension

2. Convert Integer to SQL Server Date

An alternative to converting a SQL Server date is not to use a join to date dimension instead you convert the numeric values back to business friendly date using T-SQL Convert function. This is done through a nested convert function in which you first convert the numeric value to variable character (varchar), then convert varchar value to date (which is usually in the format yyyy-mm-dd) and finally convert the format of the date to dd mmm yyyy. The complete syntax looks as shown in Figure 5:


Figure 5: Convert numeric value to date

The complete script is shown in Figure 6.


Figure 6: Pivoting on date using convert function

Workaround #2: Pivot on Varchar Data Types

In my experience working within an insurance industry, I have noticed that there are often business users whose job is solely to receive and capture claim documents submitted by clients. Thus, another fitting business case would be to have a breakdown of claim documents captured by users as shown in Table 5.

Policy PolType Effective Date Doc A Doc B Doc C Doc D Doc E
Pol002 Hospital Cover 01 Oct 2007 0 0 Jane Doe John Doe Jane Doe

Table 5: Breakdown of claim documents captured by users

Similar to pivoting on date data types, the key to pivoting on varchar is identifying a field that can be used as an input parameter in the aggregate function. In our case, every user is assigned a numeric user ID as shown in Figure 7 (this user id can then be used to look-up user full name).


Figure 7: Case study with documents captured by ID information

The revised script that now pivots on captured by ID is shown in Figure 8. It can also be seen that after the pivoting of data, we then join back on user id to the User dimension to retrieve associated user full name.


Figure 8: Pivot against varchar data type script

Conclusion

In this article we’ve had a look at available workarounds to pivoting using non-numeric fields. The key to pivoting using non-numeric fields is that you need to find a way to trick the aggregate function as it strictly takes in numeric values. Thus you ought to find a way to either find a numeric surrogate field (i.e. corresponding user id of user field) or derive such a field (i.e. converting date to integer) to be used as input parameters in the aggregate function.

Reference

Sifiso Ndlovu
168 Views