Converting commas or other delimiters to a Table or List in SQL Server using T-SQL

Database developers often need to convert a comma-separated value or other delimited items into a tabular format. Delimiters include pipe “|”, hash “#”, dollar “$” and other characters. SQL Server compatibility level 130, and subsequent versions, support a string_split() function to convert delimiter-separated values to rows (table format). For compatibility levels under 130, developers have previously done this with a user-defined function, which incorporates a While loop or Cursor to extract the data.

Why is comma-separated input required for a procedure or T-SQL query?

Execute a program in an iteration using a loop is a common method in the back-end application as well as the database. When iteration count is smaller than expected, the execution time can be less however the number of iterations is bigger which leads to longer processing time. For example, if the application is fetching data from a database with executing a piece of program in a loop by parsing different input parameter application will need to wrap the database response of each iteration. To handle this type of situation, we can execute the same program with a single execution with parsing the comma or delimiter separated values as an input parameter in the database and make it to tabular format in the T-SQL program to proceed with further logic.

Today, many platforms have been changed to micro-service architecture. Not only application but database design could also be structured in micro-service-based architecture and most of the internal communication between the databases is done with integer references only. That is the reason to get some tricky way to achieve better execution from the database side.

Multiple inputs to the same parameter can be accomplished with comma-separated values in the input parameter of the stored procedure or input to the tabular function, and used with the table in a T-SQL statement. There may be more situations for this strategy when using T-SQL. While programming with SQL Server, a developer must break a string into elements using a separator. Using the split function against the string defines the separator as an input parameter. The entire string will be split and returned as a table. In certain circumstances, the input parameter can be a combination of two or more character-separated values for each set of input. For instance:

N’203616, 198667, 193718, 188769,…’ N’1021|203616$1021|198667$1022|193718$1022|188769$…’ N’1021|203616,198667$1022|193718,188769$…’

There are advantages to single-thread execution with comma-separated values over multi-thread execution in SQL Server:

Single execution for part of the program, resulting in the number of input
Single read or write operation over the database tables instead of several iterations using a loop
One-time server asset use
No more hurdles on the back-end application to set up a response from multiple result sets
One-shot commit means that the database/SQL Server won’t block the transaction and slow things down. Thus, no blocking possibilities
Easy to oversee transaction consistency

Split function using loop

In most cases, a developer writes the table-valued function with a loop to extract the data from comma-separated values to a list or table format. A loop extracts the tuple one by one with the recursion. Performance issues can arise when working with the large input parameter since it will take more time to extract it. Here, we have a simple function with the name of split_string, which is using a loop inside the table-valued function. This function allows two arguments or input parameters (1 – input string and 2 – delimiter):

CREATE FUNCTION split_string

(

@in_string VARCHAR(MAX),

@delimeter VARCHAR(1)

)

RETURNS @list TABLE(tuple VARCHAR(100))

BEGIN

WHILE LEN(@in_string) > 0

BEGIN

INSERT INTO @list(tuple)

SELECT left(@in_string, charindex(@delimiter, @in_string+',') -1) as tuple

SET @in_string = stuff(@in_string, 1, charindex(@delimiter, @in_string + @delimiter), '')

end

RETURN

END

A result-set table returned by this function contains every substring of the input parameter string which is separated by the delimiter. The substrings in the table are in the order in which they happen in the input parameter. If the separator or delimiter doesn’t coordinate with any piece of the input string, then the output table will have only one component. The tabular function will return the result set in a row-column format from the comma-separated string.

1	SELECT * FROM split_string('1001,1002,1003,1004', ',')

Recent versions of SQL Server provide a built-in function string_split() to do the same task with the input parameters of the input string and delimiter. But there is another efficient way to perform this task using XML.

Split function using XML

The Split function using XML logic is a useful method to separate values from a delimited string. Using XML logic with the T-SQL based XQUERY approach results in the T-SQL not being characterized as a recursion using loop or cursor. Essentially, the input parameter string is converted to XML by replacing the XML node in the T-SQL articulation. The resulting XML is used in XQUERY to extract the node value into the table format. It also results in better performance.

For example, we have a comma-separated value with the variable @user_ids in the below T-SQL statement. In a variable, a comma is replaced by the XML tags to make it an XML data type document and then extracted using XQUERY with the XPATH as ‘/root/U’:

DECLARE @user_ids NVARCHAR(MAX) = N'203616, 198667, 193718, 188769, 183820, 178871, 173922, 168973, 164024, 159075, 154126, 149177, 144228, 139279, 134330, 129381, 124432, 119483, 114534, 109585, 104636, 99687, 94738, 89789, 84840, 79891, 74942, 69993, 65044, 60095, 55146'

DECLARE @sql_xml XML = Cast('<root>'+ Replace(@user_ids, ',', '')+ '</root>' AS XML)

SELECT f.x.value('.', 'BIGINT') AS user_id

INTO #users

FROM @sql_xml.nodes('/root/U') f(x)

SELECT *

FROM #users

In the above function, we have used the VARCHAR (100) data type for the extracted value rather than BIGINT. There can be changes in the input parameter string, too. For example, if the input string has one extra ‘,’ at the beginning or end of the string, it returns one extra row with 0 for the BIGINT data type and ” for VARCHAR.

For example,

DECLARE @user_ids VARCHAR(MAX) = N'203616, 198667, 193718, 188769, 183820, 178871,'

DECLARE @sql_xml XML = Cast('<root>'+ Replace(@user_ids, ',', '')+ '</root>' AS XML)

SELECT f.x.value('.', 'BIGINT') AS user_id

FROM @sql_xml.nodes('/root/U') f(x)

To handle this type of ” or 0 data in the result set, we can add the condition below in the T-SQL statement:

DECLARE @user_ids VARCHAR(MAX) = N'203616, 198667, 193718, 188769, 183820, 178871,'

DECLARE @sql_xml XML = Cast('<root>'+ Replace(@user_ids, ',', '')+ '</root>' AS XML)

SELECT f.x.value('.', 'BIGINT') AS user_id

FROM @sql_xml.nodes('/root/U') f(x)

WHERE f.x.value('.', 'BIGINT') <> 0

Here, f.x.value(‘.’, ‘BIGINT’) <> 0 excludes the 0 and f.x.value(‘.’, ‘VARCHAR(100)’) <> ” excludes the ” in the query result set. This T-SQL condition can be added into the table-valued functions with the number of input parameter to handle this extra delimiter.

Function:

CREATE FUNCTION split_string_XML

(

@in_string VARCHAR(MAX),

@delimeter VARCHAR(1)

)

RETURNS @list TABLE(tuple VARCHAR(100))

BEGIN

DECLARE @sql_xml XML = Cast('<root>'+ Replace(@in_string, @delimeter, '')+ '</root>' AS XML)

INSERT INTO @list(tuple)

SELECT f.x.value('.', 'VARCHAR(100)') AS tuple

FROM @sql_xml.nodes('/root/U') f(x)

WHERE f.x.value('.', 'BIGINT') <> 0

RETURN

END

Execution:

1 2	SELECT * FROM split_string_XML('203616,198667,193718,188769,183820,178871,173922,', ',')

Combination of characters in delimiter-separated string

But what about when the input parameter is a combination of two values with multiple separators? In this case, the input parameter must be extracted twice with the help of a subquery, as shown below:

DECLARE @in_string VARCHAR(MAX) = '1021|203616$1021|198667$1022|193718$1022|188769$'

DECLARE @sql_xml XML = Cast('<root>'+ Replace(@in_string, '$', '')+ '</root>' AS XML)

SELECT X.Y.value('(U)[1]', 'VARCHAR(20)') AS role_id,

X.Y.value('(U)[2]', 'VARCHAR(20)') AS user_id

FROM

(

SELECT Cast('<root>'+ Replace(f.x.value('.', 'VARCHAR(MAX)'), '|', '')+ '</root>' AS XML) AS xml_

FROM @sql_xml.nodes('/root/U') f(x)

WHERE f.x.value('.', 'VARCHAR(MAX)') <> ''

OUTER APPLY T.xml_.nodes('root') as X(Y)

This example is similar to the above example but we have utilized pipe “|” as a delimiter with a second delimiter, dollar “$”. Here, the input parameter is the combination of two-columns, role_id, and user_id. As we see below, user_id and role_id are in a separate column in the table result set:

We can use any character in the above function, such as a single character-separated string or combination of character-separated strings. A developer must simply replace the character in the T-SQL statements above. Follow the steps described here to convert any delimited string to a list. Simply remember that the XML function of SQL Server accepts a string input and the input string will be delimited with the specified separator using the tabular function.

Author
Recent Posts

Jignesh Raiyani

Jignesh has good experience in Database Solutions and Architecture, working with multiple customers on Database Design & Architecture, SQL Development, Administration, Query Optimization, Performance Tuning, HA and Disaster Recovery.

View all posts by Jignesh Raiyani

Latest posts by Jignesh Raiyani (see all)

Page Life Expectancy (PLE) in SQL Server - July 17, 2020
How to automate Table Partitioning in SQL Server - July 7, 2020
Configuring SQL Server Always On Availability Groups on AWS EC2 - July 6, 2020

Why is comma-separated input required for a procedure or T-SQL query?

Split function using loop

Split function using XML

Combination of characters in delimiter-separated string

Related posts: