Sifiso Ndlovu

ETL optimization using SQL TRY Functions

October 24, 2017 by

Introduction

An enterprise data warehouse ETL solution typically includes, amongst other steps, a data transformation step that converts source data from one data type into another. It is during this step that type conversion errors may occur and depending on the type of exception handling techniques implemented in the ETL solution (or lack thereof), frustration may occur for both ETL developers and DBAs when trying to identify and resolve type conversion errors. In this article we take a look at a trio of SQL TRY built-in functions that were introduced in SQL Server 2012, namely, TRY_PARSE, TRY_CAST, and TRY_CONVERT and how they could be utilized to reduce type conversion errors in ETL solutions and thereby saving developers needless troubleshooting exercise.

Challenge

Likewise, the benefits of the SQL TRY functions TRY_PARSE, TRY_CAST, and TRY_CONVERT functions in an ETL solution are better realised by firstly demonstrating the existing limitations of their PARSE, CAST and CONVERT counterparts. To demonstrate the existing limitations, we make use of a football (soccer) related analogy wherein we use data from Table 1 to populate a Nominees dimension that will store the top 3 nominees for the 2016 FIFA Ballon d’Or.

Nominee Club Jersey Number Votes Date of Birth Place of Birth Nationality Height
Antoine Griezmann Atletico Madrid 7 198 21 March 1991 Mâcon France 1.75
Lionel Messi FC Barcelona 10 316 24 June 1987 Rosario Argentina 1.70
Cristiano Ronaldo Real Madrid 7 745 05 February 1985 Funchal Portugal 1.85

Table 1: Sample 2016 FIFA Ballon d’Or Nominee Data

The steps that will be followed to load the Nominees dimension will be according to the ETL acronym which is Extract Transform and Load. During the Extract step, we would import data from Table 1 into a staging table object with all fields defined as variable characters (VARCHAR). The next step – Transform – involves type conversion of fields from our staging table into other data types. For the purposes of this demo, this step converts the data type of the fields highlighted in Table 2.

 Field Staging Dimension
 Nominee Varchar Varchar
 Club Varchar Varchar
 Jersey Number Varchar INT
 Date of Birth Varchar Date
 Place of Birth Varchar Varchar
 Nationality Varchar Varchar
 Height Varchar Numeric (3,2)

Table 2: Staging to dimension (target) mapping

Unlike the Extract step, the Transform step can be implemented in several ways including using a Data Flow Task to convert data between Source and Destination components; using the Data Conversion transformation task to convert data types; or by simply using a T-SQL script in a form of a SQL Server stored procedure or view that employs SQL CONVERT and SQL CAST functions to perform type conversion. Figure 1 indicates how Data Conversion transformation task in SSIS can be used to convert data as per the requirement in Table 2.

Data Conversion transformation

Figure 1

On the other hand, Script 1 demonstrates a T-SQL equivalent option for performing type conversions.

Script 1

The challenge with the traditional T-SQL type conversion function occurs whenever a type conversion is unsuccessful due to an invalid input data format. For instance, say we alter values in the [Date of Birth] staging column to add a suffix “_SQLSHack” as shown in Figure 2.

T-SQL type conversion

Figure 2

Such a change to the [Date of Birth] staging column causes both the SSIS and T-SQL Transform steps to break as per the error messages shown below:

Msg 241, Level 16, State 1, Procedure sp_loadDim, Line 3 [Batch Start Line 21]
Conversion failed when converting date and/or time from character string.

SSIS and T-SQL Transform steps

Figure 3

As a result of the above type conversion errors, the entire ETL solution is likely to break, which is sometimes not an ideal situation, particularly if you only have one window in a day to run your ETLs. It would mean that a mere data type conversion error could result in you having to wait for another day to get an opportunity to run your ETLs again, which could be very inconvenient not just for you but for your business users too.

Solution

Whilst there could be errors in your ETL that are unpredictable thus unavoidable, you could avoid type conversion errors by redirecting all errors into another destination in your Data Flow Task in SSIS. You could also refactor your T-SQL type conversion script to make use of ISDATE and ISNUMERIC functions in order to firstly check whether what you are attempting to convert is in fact of expected data type. However, in this article, I would like to further propose another simpler approach to avoiding type conversion errors which is to make use of the SQL TRY functions such as TRY_PARSE, TRY_CAST, and TRY_CONVERT functions.

Script 2 shows an updated version of Script 1 which replaces the traditional type conversion T-SQL functions with SQL TRY functions TRY_PARSE, TRY_CAST, and TRY_CONVERT functions. Thus, instead of returning an error message shown in Figure 3, a NULL value is returned for all [Date of Birth] data that couldn’t be successfully converted, as shown in Figure 4. This is to indicate that an attempt to convert these values to date were unsuccessful, but instead of throwing an error message, a NULL was returned. This is in essence the benefit of these Try functions, instead of throwing errors, they all return a NULL for the values they couldn’t successfully convert.

Script 2

NULL values in the output of convert functions

Figure 4

In terms of arguments, both TRY_PARSE and TRY_CONVERT functions require the same number of mandatory and optional parameters as was required when using PARSE and CONVERT functions, respectively. However, compared to the CAST function, the TRY_CAST requires two additional parameters. In terms of performance, the TRY_PARSE function may incur additional SQL Server performance overhead as it is not a native SQL Native function. It also doesn’t support type conversion to XML, TEXT and VARBINARY data types.

However, one advantage that SQL TRY function TRY_PARSE has over TRY_CAST and TRY_CONVERT is that it can successfully perform a type conversion whilst TRY_CAST and TRY_CONVERT would have returned a NULL value for the very same input. For instance, according to timeanddate.com, Lionel Messi’s date of birth (1987-06-24) was on a Wednesday. Now, say we alter the date format of his [Date of Birth] staging value into Wednesday, 24 June 1987 and update our T-SQL type conversion script as per Script 3, we get an output shown in Figure 5, indicating that only the SQL TRY function TRY_PARSE managed to successfully convert Messi’s date of birth.

Script 3

SQL TRY functions TRY_PARSE

Figure 5

Another thing to look out for when using these new functions is that, if your ETL involves connections to SQL Azure, then you will have to replace SQL TRY function TRY_CONVERT with either TRY_PARSE or TRY_CAST as TRY_CONVERT is not supported in Azure. Finally, although the biggest advantage of TRY_PARSE or TRY_CAST as TRY_CONVERT functions over their PARSE, CAST and CONVERT counterparts is that whenever they cannot convert a value they return NULL values instead of throwing an error, this is not always true, particularly when performing explicit type conversions using TRY_CAST and SQL TRY function TRY_CONVERT functions. For instance, the execution of Script 4 causes the following error:

Msg 529, Level 16, State 2, Line 2
Explicit conversion from data type int to xml is not allowed.

Script 4

Summary

There are failures during ETL loads that shouldn’t prevent the rest of the ETL from executing and type conversion error is one such failure that should be avoided whenever you can. In this article we have demonstrated that SQL TRY function TRY_PARSE or TRY_CAST as TRY_CONVERT functions make it easy for ETL developers to avoid type conversion errors that could break your entire ETL run.

Sifiso Ndlovu
168 Views