Emil Drkusic
SQL Best Practices - the data model

Learn SQL: SQL Best Practices for Deleting and Updating data

July 7, 2020 by

Deleting and updating data is very common, but if performed without taking care, which could lead to inconsistent data or data loss. Today, we’ll talk about SQL best practices when performing deletes and updates. We’re not talking about regular/expected changes, but rather about manual changes which will be required from time to time. So, let’s start.

Data Model

We’ll use the same data model we’re using in this series.

SQL Best Practices - the data model

Still, we’ll focus only on one table and that is the customer table. We’ll create table backup and update a few rows in this table.

Maybe the most important SQL Best Practice – Create Backups

Creating a backup is not only SQL best practice but also a good habit, and, in my opinion, you should backup table(s) (even the whole database) when you’re performing a large number of data changes. This will allow you two things. First, you’ll be able to compare old and new data and draw a conclusion if everything went as planned. And second – in case something went wrong, you can easily revert everything. If you need a simple way to back up a table, except options on the GUI (which are specific to different tools), you have a very simple SQL command at your disposal.

SQL backup table using SELECT INTO

Please notice here that the keys were not backed up and therefore if you’ll need to recreate the original customer table from the customer_backup table, you’ll need to do one of the following (this is not only SQL best practice but required to keep the referential integrity):

  • Completely delete the customer table (using the command DROP TABLE customer;), and re-create it from the customer_backup table (the same way we’ve created backup). The problem here is that you won’t be able to drop the table if it’s referenced in other tables. In our case, the call table has attribute call.customer_id related to customer.id. The system won’t allow you to perform the DROP statement because this way, you would impact the referential integrity of the database

    SQL DROP TABLE

  • Delete all records from the customer table and insert all records from the customer_backup table. This approach again won’t work if you have records referenced from other tables (as we do have)

    SQL DELETE FROM table

To perform any of the previous two actions, you should first drop constraints, then perform the desired action, and recreate constraints. Before doing that, we should determine all the constraints related to the customer table. I’ll use the query below to do that. You can check more regarding the INFORMATION SCHEMA database in the Learn SQL: The INFORMATION_SCHEMA Database article.

You can see the query result in the picture below.

SQL Server query INFORMATION_SCHEMA

As you can see, we’ve identified 3 keys related to the customer table: customer_pk – primary key of the table, customer_city – relation between tables customer and city (city.id is referenced), and call_customer – relation between tables call and customer (customer.id is referenced).

One useful SQL Server procedure is sp_help. We can use it to get details about the table. For the customer table, the result returned would be like in the picture below.

SQL Server sp_help procedure

Since we have their names, we can easily drop all 3 constraints and recreate them later. To drop them, we can use the following statements:

But, before we drop them, we should “store” create commands, so we can use them after we recreate the customer table from the backup. We can do it by right-clicking on each constraint and export them to a new query window.

Preparing statements

If you’re performing changes on just a few rows, that is something where you can “take a risk”, copy old data to Excel, change them manually and visually confirm if everything went OK. In that case, there is no point in applying SQL best practices mentioned in this article.

Still, from time to time, you’ll get a bunch of data that should be either updated with new values, either deleted from the system. These could be hundreds of rows, but also millions. Of course, in such cases, inspecting changes visually is not the solution, and such cases are good candidates to apply SQL best practices mentioned today.

One thing that you should do before performing mass updates or deletes is to run a select statement using conditions provided. In the ideal situation, you would have provided PK (primary key) or UNIQUE/AK (alternate key) values. This will list all the cases that shall be impacted and also give you a feeling of what shall happen.

When you’re sure these are truly the cases that should be updated/deleted, you’re ready to prepare statements to perform the desired operation. You can do it in 2 ways:

  • Every single update/delete is performed by the UNIQUE value and is limited to exactly one row (using TOP(1) in SQL Server or LIMIT 1 in MySQL). This is pretty safe because you’ll be sure that each command impacts exactly one row. Also, for updates, this is sometimes the only option, because you can expect that you’ll have different values you want to set for different rows

    SQL update statements - updating multiple rows using TOP(1)

  • You could run one statement with all ids listed in it. This will work faster because you have only 1 statement. You have somehow less “control” here, but still, this is completely OK option to go with, especially in cases when you’re working with a really large number of rows. This will work well for deletes and should be considered SQL best practice. For updates, this method shall work only in case you’re updating all rows using the same values

    SQL update statement - updating multiple rows

Use Transactions

Transactions as concepts are extremely important in the database, but for the sake of this article, we’ll just tell that they allow us to perform all statements inside the transaction or none. If any statement fails for any reason, there will be no changes applied. This is not only SQL best practice (you should always use transactions when a whole batch of commands must run successfully), but also sounds very useful in cases we have a large batch of update/delete statements. Let’s take a look at the code that will do the trick.

SQL update statements inside transaction

Final SQL Best Practice – Check what happened

In case you’ve created a backup, you should compare old and new tables.

If you’ve deleted data, you’ll simply search for rows that are present in the old table, and we don’t have them in the new table, using LEFT JOIN. The total number should match what we’ve expected based on the input data and select a statement with the same conditions (if we’ve run it before).

finding deleted values

Since we performed only updates and there were no deletions, the returned result is empty (all customers who are in the old table are also in the new table).

In case we’ve performed updates, you can compare old and new rows by joining them using INNER JOIN. Rows that have differences in any of the attributes are the ones that were impacted with update statements. Of course, you’ll need to compare all attributes that were affected by updates in any of the statements. E.g., if you know you’ve changed only the next_call_date, you can check using only that attribute. But if other attributes were also mentioned in any of the statements, you should incorporate all of them in your check. The example below will check all attributes for differences.

SQL Best Practices - comparing updated values

As expected, we have 2 rows in the final result, and these are exactly 2 ones we’ve updated using previous statements.

SQL Best Practices when performing Updates and Deletes

One of the most important things while working with databases is not to lose or “damage” your data. To avoid that, you should stick to SQL best practices. Before you decide to perform mass deletes/updates of data in your database, it would be good that you back up all tables where changes are expected. After changes are performed, you should compare the old and the new table. If everything went OK, you can delete the backup tables. If there were errors, you should revert things (replace the “live” table with the one previously backed up) and try again (with corrected code).

Table of contents

Learn SQL: CREATE DATABASE & CREATE TABLE Operations
Learn SQL: INSERT INTO TABLE
Learn SQL: Primary Key
Learn SQL: Foreign Key
Learn SQL: SELECT statement
Learn SQL: INNER JOIN vs LEFT JOIN
Learn SQL: SQL Scripts
Learn SQL: Types of relations
Learn SQL: Join multiple tables
Learn SQL: Aggregate Functions
Learn SQL: How to Write a Complex SELECT Query?
Learn SQL: The INFORMATION_SCHEMA Database
Learn SQL: SQL Data Types
Learn SQL: Set Theory
Learn SQL: User-Defined Functions
Learn SQL: User-Defined Stored Procedures
Learn SQL: SQL Views
Learn SQL: SQL Triggers
Learn SQL: Practice SQL Queries
Learn SQL: SQL Query examples
Learn SQL: Create a report manually using SQL queries
Learn SQL: SQL Server date and time functions
Learn SQL: Create SQL Server reports using date and time functions
Learn SQL: SQL Server Pivot Tables
Learn SQL: SQL Server export to Excel
Learn SQL: Intro to SQL Server loops
Learn SQL: SQL Server Cursors
Learn SQL: SQL Best Practices for Deleting and Updating data
Learn SQL: Naming Conventions

Emil Drkusic
SQL commands, T-SQL

About Emil Drkusic

Emil is a database professional with 10+ years of experience in everything related to databases. During the years, he worked in the IT and finance industry and now works as a freelancer. His past and present engagements vary from database design and coding to teaching, consulting, and writing about databases. Also not to forget, BI, creating algorithms, chess, philately, 2 dogs, 2 cats, 1 wife, 1 baby... You can find him on LinkedIn View all posts by Emil Drkusic

658 Views