In general, a Transaction is a single unit of work consists of multiple related tasks that should succeed or fail as one atomic unit. To make the concept of the transaction more familiar and why it should go all or none, imagine one of the most critical transaction examples in our daily life, which is withdrawing money from the ATM.
After a hard-working month, you are happy that your salary is released. As it is late to visit the bank after the working hours, your destination now is the ATM. All of us are familiar with the steps of withdrawing money from the ATM, but it deserves listing the common ones below:
- Insert your card
- Select the language
- Enter your PIN
- Select the transaction type, which is withdraw in our example here
- Enter the amount
- Pull the card
- Take your money
The previous tasks should run as one unit to keep your bank account in consistent state. This means that, if you specify the amount to withdraw from your bank account, it should be guaranteed that the money is getting out of the ATM before confirming the deduction from your account. If any task from the previous tasks fails, the overall withdraw transaction should be cancelled. For example, if any technical or mechanical error occurred after specifying the amount and deducting it from your account, that prevents the money from getting out the ATM, the operation should be canceled and the amount will be returned to your account.
In SQL Server, the Transaction concept is highly required to maintain the integrity of data in a database, especially when executing multiple related tasks sequentially on different tables, databases or servers, or accessing the same records by more than one session concurrently. In all these cases, the transaction should work as one unit of failure or success.
In the ATM withdraw example, the SQL Server Engine will commit the overall transaction, including the amount deduction from the bank account, when all the steps are performed successfully, and the money received by the user. To handle the concurrency, the SQL Server Engine holds a lock on the bank account during that transaction, to guarantee that no other transaction will be performed on the same account at the same time, that may result with dirty read or incorrect operation result, and release the lock when the overall transaction is committed or rolled back. Due to this lock that prevents other transactions from accessing the same records, it is better always better to narrow the scope of the transaction, so that the records will be locked for short period of time, in order not to affect the overall performance of the system.
After understanding the concept of the SQL Server Transaction, we can describe the Transaction using the four ACID properties. ACID is the acronym for Atomicity, Consistency, Isolation and Durability.
Atomicity means that the transaction will succeed, as one unit, if all the separate tasks succeed with no issue. On the other hand, the failure of any single task within this transaction leads to the overall transaction failure and rollback. In other words, Atomicity guarantees that the transaction is all or none.
Consistency guarantees that the transaction will not affect the database consistency and will leave it in a valid state by complying with all database rules, such as foreign keys and constraints, defined on the columns.
Isolation means that, each transaction has it is own boundary, that is separated from the other concurrently executing transactions, and will not be affected by these transactions’ operations.
Durability means that the result of the committed transaction that is written permanently to the database will not be lost even in the case of any abnormal system failure or termination, as there should be mechanism of recovering this data, that we will see later in the next articles of this series.
The states of the transaction can be summarized as follows:
- The running transaction is referred to as the Active transaction
- The transaction that completes its execution successfully without any error is referred to as a Committed transaction
- The transaction that does not complete it is execution successfully is referred to as an Aborted transaction
- The transaction that is not fully committed yet is referred to as a Partially Committed transaction
- If the transaction does not complete its execution, it is referred to as a Failed transaction, that is Aborted without being committed
- If the Partially Committed transaction completes its execution successfully, it will be Committed, otherwise it will be Failed then Aborted
In SQL Server, each single T-SQL statement is considered as an Auto-Commit transaction, that will be committed when the statement is completed successfully and rolled back if the statement is failed. If the T-SQL query uses the BEGIN TRANSACTION statement to control the start the transaction, COMMIT TRANSACTION statement to complete the transaction successfully, and ROLLBACK TRANSACTION statement to abort the transaction, the auto-commit transaction mode will be overridden to become an Explicit Transaction.
Implicit vs explicit transactions
Assume that you want to perform a database change that starts but is not completed unless you explicitly indicate that. To achieve that, you can enforce the opened database connection to work in the Implicit Transaction mode by using the SET IMPLICIT TRANSACTIONS ON|OFF statements. In this case, the command specified after the SET statement will start executing and remain, and locking the database resource, until you explicitly issue a COMMIT statement to complete the transaction successfully or a ROLLBACK statement to abort the transaction and revert all changes.
When the Multiple Active Result Sets (MARS) option is enabled, the implicit or explicit transaction that has multiple batches running at the same time will be running under the Batch-Scoped Transaction mode. In this case, the batch-scoped transaction that is not committed or rolled back after completing the batch will be rolled back automatically by the SQL Server Engine.
Local vs distributed transactions
SQL Server supports both the Local transactions that are processing data from the local database server and the Distributed transactions that are processing data from more than one database server. In distributed transactions, the transaction states, COMMIT and ROLLBACK coordination among the different servers is performed using a transaction manager component called the Microsoft Distributed Transaction Coordinator, also known as MSDTC. To commit the distributed transaction, all participants must complete their related task successfully. If even a single participant fails for any reason, the overall transaction fails, and any changes to data within the scope of the transaction will be rolled back.
Transaction best practices
Using the SQL Server transaction helps maintaining the database integrity and consistency. On the other hand, a badly written transaction may affect the overall performance of your system by locking the database resources for long time. To overcome this issue, it is better to consider the following points when writing a transaction:
- Narrow the scope of the transaction
- Retrieve the data from the tables before opening the transaction if possible
- Access the least amount of data inside the transaction body
- Do not ask for user input inside the body of the transaction
- Use a suitable mode of transactions
- Use as suitable Isolation Level for the transaction
In this article, we have tried to introduce the SQL Server Transaction concept in a simple way, that helps going through the SQL Server Transaction Log details in the coming articles of this series. Stay tuned!
Table of contents
- Azure Data Factory Interview Questions and Answers - February 11, 2021
- How to monitor Azure Data Factory - January 15, 2021
- Using Source Control in Azure Data Factory - January 12, 2021