Esat Erkec
Tempdb spill explanation

How to UPDATE from a SELECT statement in SQL Server

April 29, 2020 by

In this article, we will learn different methods that are used to update the data in a table with the data of other tables. The “UPDATE from SELECT” query structure is the main technique for performing these updates.

An UPDATE query is used to change an existing row or rows in the database. UPDATE queries can change all tables rows, or we can limit the update statement affects for the certain rows with the help of the WHERE clause. Mostly, we use constant values to change the data, such as the following structures.

The full update statement is used to change the whole table data with the same value.

The conditional update statement is used to change the data that satisfies the WHERE condition.

However, for different scenarios, this constant value usage type cannot be enough for us, and we need to use other tables’ data in order to update our table. This type of update statements is a bit complicated than the usual structures. In the following sections, we will learn how to write this type of update query with different methods, but at first, we have to prepare our sample data. So let’s do this.

Preparing the sample data

With the help of the following query, we will create Persons and AddressList tables and populate them with some synthetic data. These two tables have a relationship through the PersonId column, meaning that, in these two tables, the PersonId column value represents the same person.

Preparing sample data

UPDATE from SELECT: Join Method

In this method, the table to be updated will be joined with the reference (secondary) table that contains new row values. So that, we can access the matched data of the reference table based on the specified join type. Lastly, the columns to be updated can be matched with referenced columns and the update process changes these column values.

In the following example, we will update the PersonCityName and PersonPostCode columns data with the City and PostCode columns data of the AdressList table.

UPDATE from a select statement

After the execution of the update from a select query the output of the Persons table will be as shown below;

Result of the update from a select statement

Let’s try to understand the above code:

We typed the table name, which will be updated after the UPDATE statement. After the SET keyword, we specified the column names to be updated, and also, we matched them with the referenced table columns. After the FROM clause, we retyped the table name, which will be updated. After the INNER JOIN clause, we specified the referenced table and joined it to the table to be updated. In addition to this, we can specify a WHERE clause and filter any columns of the referenced or updated table. We can also rewrite the query by using aliases for tables.

Performance Tip:

Indexes are very helpful database objects to improve query performance in SQL Server. Particularly, if we are working on the performance of the update query, we should take into account of this probability. The following execution plan illustrates an execution plan of the previous query. The only difference is that this query updated the 3.000.000 rows of the Persons table. This query was completed within 68 seconds.

Effect of an index on the update statements

We added a non-clustered index on Persons table before to update and the added index involves the PersonCityName and PersonPostCode columns as the index key.

Non-clustered indexes performance affects on the update statements

The following execution plan is demonstrating an execution plan of the same query, but this query was completed within 130 seconds because of the added index, unlike the first one.

Performance benchmark of the update statements

The Index Update and Sort operators consume 74% cost of the execution plan. We have seen this obvious performance difference between the same query because of index usage on the updated columns. As a result, if the updated columns are being used by the indexes, like this, for example, the query performance might be affected negatively. In particular, we should consider this problem if we will update a large number of rows. To overcome this issue, we can disable or remove the index before executing the update query.

On the other hand, a warning sign is seen on the Sort operator, and it indicates something does not go well for this operator. When we hover the mouse over this operator, we can see the warning details.

Tempdb spill explanation

During the execution of the query, the query optimizer calculates a required memory consumption for the query based on the estimated row numbers and row size. However, this consumption estimation can be wrong for a variety of reasons, and if the query requires more memory than the estimation, it uses the tempdb data. This mechanism is called a tempdb spill and causes performance loss. The reason for this: the memory always faster than the tempdb database because the tempdb database uses the disk resources.

You can see this SQL Server 2017: SQL Sort, Spill, Memory and Adaptive Memory Grant Feedback fantastic article for more details about the tempdb spill issue.

UPDATE from SELECT: The MERGE statement

The MERGE statement is used to manipulate (INSERT, UPDATE, DELETE) a target table by referencing a source table for the matched and unmatched rows. The MERGE statement can be very useful for synchronizing the table from any source table.

Now, if we go back to our position, the MERGE statement can be used as an alternative method for updating data in a table with those in another table. In this method, the reference table can be thought of as a source table and the target table will be the table to be updated. The following query can be an example of this usage method.

Performing the update from a select through the merge statement.

Now let’s tackle the previous update from a select query line by line.

We have typed the Persons table after the MERGE statement because it is our target table, which we want to update, and we gave Per alias to it in order to use the rest of the query.

After the USING statement, we have specified the source table.

With the help of this syntax, the join condition is defined between the target and source table.

In this last line of the query, we chose the manipulation method for the matched rows. Individually for this query, we have selected the UPDATE method for the matched rows of the target table. Finally, we added the semicolon (;) sign because the MERGE statements must end with the semicolon signs.

UPDATE from SELECT: Subquery Method

A subquery is an interior query that can be used inside of the DML (SELECT, INSERT, UPDATE and DELETE) statements. The major characteristic of the subquery is, they can only be executed with the external query.

The subquery method is the very basic and easy method to update existing data from other tables’ data. The noticeable difference in this method is, it might be a convenient way to update one column for the tables that have a small number of the rows. Now we will execute the following query and then will analyze it.

Performing the update from a select through the subquery method.

After the execution of the update from a select statement the output of the table will be as below;

Using subqueries for the updates.

As we can see, the PersonCityName column data of the Persons table have been updated with the City column data of the AddressList table for the matched records for the PersonId column. Regarding this method, we should underline the following significant points.

  • If the subquery could not find any matched row, the updated value will be changed to NULL
  • If the subquery finds more than one matched row, the update query will return an error, as shown below:

    An error of the subquery in the update statement.

  • Many times the subquery update method may not offer satisfying performance

Conclusion

In this article, we learned to update the data in a table with the data where they are contained in other tables. The query structure, “UPDATE from SELECT” can be used to perform this type of data update scenario. Also, we can use alternative MERGE statement and subquery methods.

Esat Erkec
SQL commands, T-SQL

About Esat Erkec

Esat Erkec is a SQL Server professional who began his career 8+ years ago as a Software Developer. He is a SQL Server Microsoft Certified Solutions Expert. Most of his career has been focused on SQL Server Database Administration and Development. His current interests are in database administration and Business Intelligence. You can find him on LinkedIn. View all posts by Esat Erkec

153,228 Views