Timothy Smith
Image 4

CTEs in SQL Server; Using Common Table Expressions To Solve Rebasing an Identifier Column

January 31, 2019 by

Since we know that the SQL CTE (common table expression) offers us a tool to group and order data in SQL Server, we will see an applied example of using common table expressions to solve the business challenge of re-basing identifier columns. We can think of the business problem like the following: we have a table of foods that we sell with a unique identifier integer associated with the food. As we sell new foods, we insert the food in our list. After a few years, we observe that many of our queries involve the foods grouped alphabetically. However, our food list is just a list of foods that we add to as needed without any grouping. Rather than re-group or re-order through queries using a SQL CTE or subquery, we want to permanently update the identifier.

Because we prefer to use SARGable operators, grouping related items may be a task that will help some of our queries. One way we can solve this is by adding a new grouping field and using it. However, if we’re required to re-base our identifier, adding a new column will only be part of the solution.

Our example data set

For this tip, we’ll be solving this problem using the below example tables with organic foods and an orders table that holds orders from our organic foods table. To provide an extra check, I’ve organized the orders table where every order purchased 2 items and orders are either vegetable or fruit orders. This helps as another check as we work through the problem. We can use this same re-base technique involving common table expressions with any other set of tables where we’re required to re-base an identifier field with some considerations about how our tables are organized:

  1. If we are using foreign key references that are part of the identifier that is being re-based, we must first remove the foreign keys before we proceed. The same applies to other constraints
  2. Since we’re looking at identifiers, this means that primary keys will be involved in at least one table’s re-basement, which affects dependencies, such as replication, auditing, etc. We must first solve these dependencies before proceeding

For populating our experiment data, we’ll execute the below code and review:

Rebasing an identity with Common table expressions

One principle in math is to show every step and to make this understandable, we will be doing this in this tip with the help of SQL CTEs, especially with ordering data before we rebase data. We will add one column to our tbOrganicFoodsList table and one column to our tbOrganicFoodOrders table. If we have any dependencies on our primary or foreign keys here, we would first either remove them (drop and re-create script for later) or disable them. Our next step will be dropping the foreign key and then the primary key.

Now that we’ve removed our constraints, we’ll begin by adding a column to our tbOrganicFoodsList that will save our identifier field before we re-base it. This will help us avoid column re-naming by creating a new column that’s organized, dropping the old column, and then renaming our new column. That is also an acceptable way to solve this challenge.


Our food list with the new column added that we’ll use as a reference.

Using a Common table expression named UpdateOrder, in the below code we order the foods alphabetically in our UpdateOrder CTE and call it an UpdateId as well as select our OrganicFoodId, which we’ll be updating to this new order. From our select statement, we can see the new order of our OrganicFoodId versus the OldId. This is a key step if we have old archives on disk backup files or other servers, as this table will allow us to compare the data from our re-based data to our original data, assuming we don’t rebase archived data. Because of this reason, we’ll notice that we save a copy of this table in this step – the saved data is RebasedDataKey_tbOrganicFoodsList.


Now, our OrganicFoodId is ordered alphabetically and we see the old order.

In our next step, we’ll look at our tables joined and see the results of the OrganicFoodId and OldId with the orders that we’ll be using to create a SQL CTE to update the food list. Since I intentionally grouped the orders so that we could perform a quick check, we can see how the new OrganicFoodId that’s been re-based will appear on the tbOrganicFoodOrders table. From here, we use the same join and only select the two OrganicFoodId columns – one from the table we’ve re-based (tbOrganicFoodsList) and the other from the table we’ll be updating (tbOrganicFoodOrders). To make this clear, I’ve titled the column in the SQL CTE to what they’ll be used for – one needs to be re-based (NeedsRebase) and it will be updated and the other is the base which will be the reference point for the update. We then see the update run against this CTE in SQL Server, which updates the appropriate rows of the tbOrganicFoodOrders table.


We see the order in our query and we also see how naming in the SQL CTE makes it easy to run an update.


Our orders with the new order in place.

Our final step is to remove the OldId column and add our primary key and foreign key constraints:

Some considerations

  • What if we wanted to group the items alphabetically within a category, like alphabetical list of vegetables, fruits, etc? We’d take the same logic only we’d partition it further by more detailed groups. As we’ve seen with CTEs in SQL Server, these make it easy to organize data and work intuitively to update columns from new columns or new designs
  • Since it’s impossible to have all information up front, on the initial design of our tables a best practice to avoid this problem is to consider the columns that will be used in queries and how much space should be allowed between these values. Using the same example with a food list, for each group, we may want up to 10000 or more of range values for each category, even if we only use a fraction of those values. We can apply math operations to SQL CTE order, such as the following creating a grouping identifier on sys.tables:

We have two groups of tables – MS shipped and non-MS shipped and notice our GroupId range of 1000 tables between them.

  • Of the steps required, the most manual will be re-basing the primary table, as our re-basement is determined by how we want to group: alphabetically, by category or status, by range, etc. For instance, if we had 1000 foods and wanted to group them by color, the step of grouping would be manual to get our table with the new grouping. Unfortunately, SQL CTEs and subqueries can’t help us for custom grouping, but once we have them, we can use them to assist with ordering other tables. From there, we would follow the same plug-and-play pattern of updates of the identifier fields through joins on the old identifier field

Conclusion

Because we don’t always know how our data will be used by clients, rebasing an identity field to optimize queries against data comes up in some situations. While there are many approaches to solving this problem, common table expressions offer us one tool to organize our data and solve for this problem. In addition, if we need to add extra “space” to our identifier column or columns, we can use mathematical operations with SQL CTEs to give our data space to grow

Table of contents

CTEs in SQL Server; Querying Common Table Expressions
Inserts and Updates with CTEs in SQL Server (Common Table Expressions)
CTE SQL Deletes; Considerations when Deleting Data with Common Table Expressions in SQL Server
CTEs in SQL Server; Using Common Table Expressions To Solve Rebasing an Identifier Column
Timothy Smith
Common Table Expressions (CTE), T-SQL

About Timothy Smith

Tim manages hundreds of SQL Server and MongoDB instances, and focuses primarily on designing the appropriate architecture for the business model. He has spent a decade working in FinTech, along with a few years in BioTech and Energy Tech. He hosts the West Texas SQL Server Users' Group, as well as teaches courses and writes articles on SQL Server, ETL, and PowerShell. In his free time, he is a contributor to the decentralized financial industry. View all posts by Timothy Smith

168 Views