Frank Solomon
Run the DYNAMIC_ROW_NUMBER_PARTITIONS stored procedure with a T-SQL EXEC statement, and see the result set.

Lever T-SQL to dynamically define duplicate SQL Server database table rows

May 11, 2020 by

Introduction

Lever T-SQL to handle duplicate rows in SQL Server database tables article highlighted T-SQL features that detect and handle duplicate SQL Server table rows. The techniques work well, but they rely on fixed duplicate row definitions. This article extends those techniques, showing how to define duplicate rows in a dynamic way.

The Sample Database

For this article, we’ll slightly modify the OFFICE_EQUIPMENT_DATABASE database,first described in the earlier article. I built this new database version in SQL Server 2014 Standard Edition, on an updated Windows 10 PC. For this article, we’ll add some rows to the original OFFICE_EQUIPMENT table, and we’ll add a stored procedure. Run this script to build the complete database:

See the earlier article for more details about the script, the table, and the table structure. This article will focus on the T-SQL features in the stored procedure, and will describe how these features offer a flexible, dynamic way to define duplicate table rows.

The Stored Procedure in action

The OFFICE_EQUIPMENT_DATABASE database has one stored procedure DYNAMIC_ROW_NUMBER_PARTITIONS with this code:

The DYNAMIC_ROW_NUMBER_PARTITIONS stored procedure, with a T-SQL PARTITION BY clause that levers the IIF() function.

As seen in the above screenshot, the stored procedure has one integer data type parameter @PARTITION_COLUMN_LIST at line 4. We can manually “paint” and run lines 11 through 14 to test the stored procedure. This screenshot runs line 11 EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 000001 with a 000001 argument value:

Testing the DYNAMIC_ROW_NUMBER_PARTITIONS stored procedure, with a T-SQL EXEC statement.

The result set defined duplicate rows based on the PURCHASE_DATE column values. Later on, we’ll see how different argument values drive these definitions. This screenshot runs line 13 EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 001011 with a 001011 argument value:

Testing the DYNAMIC_ROW_NUMBER_PARTITIONS stored procedure, with a T-SQL EXEC statement.

The result set defined duplicate rows based on the below column values:

  • OFFICE_EQUIPMENT_DESCRIPTION
  • PURCHASE_QUANTITY
  • PURCHASE_DATE

The stored procedure filters out rows with ROW_NUM values below 1, with the T-SQL WHERE clause at line 35. The stored procedure does not necessarily need this filter, but with it, we can more clearly see how the engineering works.

If we comment out line 35 and run line 12 EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 001001, the result set defines duplicate rows based on the OFFICE_EQUIPMENT_DESCRIPTION PURCHASE_DATE column values. The stored procedure returns a result set with 24 rows, as shown in this screenshot:

Comment the DYNAMIC_ROW_NUMBER_PARTITIONS stored procedure, and then run it.

If we restore the line 35 T-SQL WHERE clause and again run line 12, the stored procedure returns the result set shown in this screenshot:

A DYNAMIC_ROW_NUMBER_PARTITIONS stored procedure result set.

If we comment out line 35 and run the stored procedure with line 14 EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 010001 the result set defines duplicate rows based on the OFFICE_EQUIPMENT_NAME and PURCHASE_DATE column values. Compared to EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 001001 this result set looks very different, but it still returns 24 rows, as shown in this screenshot:

Comment the DYNAMIC_ROW_NUMBER_PARTITIONS stored procedure, and then run it.

If we restore the line 35 T-SQL WHERE clause and again run line 14 EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 010001, we will get a different result set, as seen in this screenshot:

A DYNAMIC_ROW_NUMBER_PARTITIONS stored procedure result set.

Clearly, the line 35 WHERE clause helps show how the stored procedure works, because when we disable it, the stored procedure returns 24 rows, for all input parameter values. In these 24-row result sets, the ROW_NUM values show that the stored procedure builds different “groups” based on the different @PARTITION_COLUMN_LIST values. This behavior makes it harder to see how the stored procedure operates.

The Stored Procedure Engineering

In this article, the first screenshot showed the DYNAMIC_ROW_NUMBER_PARTITIONS stored procedure, but as we study its engineering, it will help to see the screenshot again:

The DYNAMIC_ROW_NUMBER_PARTITIONS stored procedure, with a T-SQL PARTITION BY clause that levers the IIF() function.

Lever T-SQL to handle duplicate rows in SQL Server database tables article explained that a T-SQL query with ROW_NUMBER() and PARTITION BY clauses can’t directly filter its result set with a WHERE clause. However, the article also showed that if we place that query in a FROM clause subquery, and then SELECT from it, we can add a WHERE clause to the outer query. With this approach, we can filter the result set. In the DYNAMIC_ROW_NUMBER_PARTITIONS stored procedure, the subquery, between lines 21 and 31, has a ROW_NUMBER() clause at line 23. It also has a PARTITION BY clause between lines 24 and 29. In the stored procedure, the line 35 WHERE clause filters the result set that the subquery returns. At lines 17 and 18, the outer query T‑SQL SELECT clause selects columns from the inner subquery. Line 34 aliases the subquery as ‘TMP’.

The PARTITION BY clause, seen here between lines 24 and 29, builds the duplicate row definition. Each PARTITION BY clause line extracts a specific parameter digit, and indirectly maps to one column in the SELECT clause of lines 21 and 22. We’ll focus on line 25 as an example, and work from the inside out to see the technique.

The DYNAMIC_ROW_NUMBER_PARTITIONS stored procedure returns result sets with these six columns:

  1. OFFICE_EQUIPMENT_ID
  2. OFFICE_EQUIPMENT_NAME
  3. OFFICE_EQUIPMENT_DESCRIPTION
  4. PURCHASE_PRICE
  5. PURCHASE_QUANTITY
  6. PURCHASE_DATE

The test code examples of lines 11 to 14 use six-digit integer arguments:

In the T-SQL PARTITION BY clause, we can map the individual digits in the arguments to specific columns. Then, the PARTITION BY clause can use those values to define duplicate rows, if we use only ones or zeros for each digit in the original argument values.

For example, EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 001001 sets @PARTITION_COLUMN_LIST to 001001. With this value, line 25

maps digit 2 to the OFFICE_EQUIPMENT_NAME column. We’ll unpack this line from the inside out.

The @PARTITION_COLUMN_LIST / 10000 calculation divides @PARTITION_COLUMN_LIST by 10000, and truncates, or throws away, the remainder. Dividing a six-digit integer by 10000 in this way removes the four digits on the right, as seen in this screenshot:

Example T-SQL statements: the / (DIVISION) operator.

The next function RIGHT((@PARTITION_COLUMN_LIST / 10000), 1) extracts the first right digit of 00100 – in this case, a zero (0). The third query in the previous screenshot shows that we need the T‑SQL RIGHT() function, to extract the specific right-most digit. Finally, the IIF() function on the “outside” looks at the value that the T-SQL RIGHT() function returns, as seen here:

The IIF() function short-hands the SQL Server CASE expression. As used here, if the RIGHT() function returns 1, or TRUE, IIF() returns OFFICE_EQUIPMENT_NAME and places the OFFICE_EQUIPMENT_NAME column in the PARTITION BY column list. When the RIGHT function returns 0, or FALSE, IIF() returns NULL. The stored procedure ignores these NULL values as it dynamically builds the PARTITION BY clause, and when IIF() returns a NULL value, the stored procedure never sees the column name in the IIF() function. As a special case, line 29

avoids the division calculation, because the RIGHT() function can directly extract the far-right digit of the @PARTITION_COLUMN_LIST value. With this technique, a six-digit integer maps to all six columns. For example, 001001 maps to all six columns as seen in this diagram:

Mapping @PARTITION_COLUMN_LIST component digits to DYNAMIC_ROW_NUMBER_PARTITIONS subquery columns.

The T-SQL PARTITION BY clause requires at least one column in the line 31 ORDER BY clause, but this ORDER BY clause does not operate on the finished result set. It independently operates inside each row partition, sorting by the columns specified in its column list. As explained above, the line 35 T-SQL WHERE clause removes rows with ROW_NUM values of 1.

The Order of PARTITION BY Columns

The technique behind this stored procedure, in part, involves the below:

  1. The PARTITION BY column list, between lines 24 and 29
  2. The ORDER BY clause, at line 31
  3. The WHERE clause, at line 35

We can easily add an ORDER BY clause after line 35. The stored procedure, and the engineering behind it, all work in a consistent way. However, we should keep the listed aspects in mind, because stored procedures that use them can build result sets that might seem unexpected.

If we run the stored procedure we saw above with this statement EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 001001, we will get this result set as seen in this screenshot:

Run the DYNAMIC_ROW_NUMBER_PARTITIONS stored procedure with a T-SQL EXEC statement, and see the result set.

We can exchange lines 26 and 29, as seen here:

To use:

When we run the stored procedure with the SQL Server statement used above EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 001001, we’ll get this result set:

Result set

Because of the ways, the factors listed above can interact, combined with a potential ORDER BY clause at the outer query, we need to test queries and stored procedures built with this technique, to verify that they operate as expected.

Conclusion

The PARTITION BY clause gives us a powerful tool, right out of the box, to define duplicate rows in a database table. This article showed how to combine the PARTITION BY clause with other available SQL Server tools, to define duplicate rows in a dynamic, granular way.

Frank Solomon
168 Views