Matija Lah

String Concatenation Done Right – Part 2 – An Effective Technique

April 16, 2015 by

Previously I have introduced string concatenation by demonstrating how not to use Transact-SQL to concatenate strings. In this part I will focus on XML composition, and will demonstrate how to correctly implement it in order to concatenate strings.

I assume that you already understand what XML is, and how it is implemented in SQL Server. You might not know this, but I have written extensively on this very subject in my “regular” blog, at http://milambda.blogspot.com (there is plenty to choose from, once you locate it: http://milambda.blogspot.com/search?q=xml). In the very unlikely case that you might not be a fan of XML, stay tuned – part three of this article is XML-free.

In order to properly utilize XML it needs to be treated as what it is – a complex data type. Once an XML document (or fragment) is composed, the only reliable way of transforming it is to use the built-in XML retrieval methods. So, in order to correct the query I used at the end of part one of this article, I will use the QUERY and the VALUE XML retrieval methods.

Figure 1: XML composition with XML retrieval methods.

The query in Figure 1 retrieves three attributes of a particular book from the dbo.books table and uses a correlated subquery with the FOR XML directive to retrieve the related data about authors from the dbo.authors and dbo.titleauthor tables, which is then concatenated into a single string in the authors column of the outer query. The FOR XML subquery uses the TYPE setting, which instructs the database engine to return an XML document. The QUERY XML retrieval method is then used to extract the concatenated string from the XML document, and finally the VALUE method is used to return the data as a VARCHAR(64) value.

The FOR XML directive can be used to create a concatenated string from data of any type, as long as it can be represented as a string. By using the BASE64 setting of the FOR XML directive you can even represent binary values as strings with a standard, and reversible, conversion technique.

Depending on the way the results of this query are going to be used, you might need some way of distinguishing between the values being concatenated and the values used to delimit them (for instance, to be able to create a normalised representation of the data at the destination). In the example above I’ve used the semi-colon to delimit the co-authors’ names, making it possible to distinguish between individual full names.

The SQL Server implementation of XML composition not only guarantees that the concatenated result will contain all of the corresponding values and in the correct shape, it also guarantees that they will be placed in the XML document in the specified order. In case the concatenated values need to be sorted, with XML composition you have complete control over the ordering: simply use the ORDER BY clause in the FOR XML query, which is also demonstrated in Figure 1.

The FOR XML subquery technique is completely integrated with the database engine; it is therefore governed by exactly the same principles as the outer, “pure”, T-SQL query. All of the regular T-SQL constructs are at your disposal to shape, sort, and correlate, the construction of the delimited string in accordance with your particular requirements.

Of course, you do not have to restrict the use of the XML subquery to the composition of strings; if the client application expects additional information from the database (such as metadata, or related sets) you can certainly use XML documents to pass the data back to the client as a nested “table”, not just a one-dimensional set. In contrast to returning multiple result sets, where multiple queries would need to be restricted by the same criteria, the XML approach might even significantly reduce the I/O as the data is queried once, with the correlated set returned as a column of the single result set.

This might come very handy in situations where the consumer is able to process only a single result set (a single table) that has been returned from the data source.

For instance, the query shown in Figure 2 returns a list of books written by a particular author, including a list of any co-authors.

Figure 2: Titles written by a particular author, including any other authors.

The result of the query, shown in Figure 3 is a single result set, containing an XML column with two additional result sets.

Figure 3: The XML document with the list of books written by a particular author, including a list of co-authors.

At first glance, the string concatenation technique based on XML composition can seem a bit difficult to understand for something seemingly as easy as “sticking a bunch of strings together”, but it is actually quite intuitive. If you have a good understanding of T-SQL queries, and once you get past the occasional quirkiness of the XPath/XQuery syntax, XML composition will manifest itself as a very useful tool.

In part three of this article I will demonstrate another reliable, and efficient, string concatenation technique using native SQL Server capabilities. Note though, that if your DBA dislikes XML, he is going to absolutely loathe the next approach.

Homework

In Figure 1 you have seen an example of how XML composition can be used to represent a set of values as a delimited string. It is now your turn to practice this technique, so for your homework I want you to use XML composition to alter the above query in the following way:

  • In the authors column create a comma-delimited list of each author’s order number (the au_ord column), first name (au_fname), last name (au_lname), phone number (phone), and address attributes (address, city, state and zip).
  • The address will be composed from multiple properties; format it as you see fit, but enclose it in double quotation marks.
  • For books written by multiple authors, make sure the attributes of each author are placed on a separate row of the resulting string.

As shown in Figure 1, the final result set of your query must still contain one row per book title.

The pubs sample database used in this article can be found online, at http://www.microsoft.com/en-us/download/details.aspx?id=23654.


Matija Lah

Matija Lah

Matija Lah, formally a university graduate of law, has been involved in information management since the nineties.

His first introduction to SQL Server in IUS Software d.o.o. Slovenija has later culminated in a new career developing data-centric solutions and consulting. His contributions to the SQL Server community have led to the Microsoft Most Valuable Professional award in 2007 (Windows Server System SQL Server).

Currently, most of Matija's time is spent on projects involving advanced information management, natural language processing, and knowledge management.

View all posts by Matija Lah
Matija Lah
String Concatenation

About Matija Lah

Matija Lah, formally a university graduate of law, has been involved in information management since the nineties. His first introduction to SQL Server in IUS Software d.o.o. Slovenija has later culminated in a new career developing data-centric solutions and consulting. His contributions to the SQL Server community have led to the Microsoft Most Valuable Professional award in 2007 (Windows Server System SQL Server). Currently, most of Matija's time is spent on projects involving advanced information management, natural language processing, and knowledge management. View all posts by Matija Lah

651 Views