Prashanth Jayaram

SQL Server Database Recovery Process Internals – database STARTUP Command

February 14, 2017 by

A database recovery process is an essential requirement for database systems, It can be a tedious job and the process of recovery varies on lot of scenarios. The desire to improve recovery results has resulted in various procedures, but understood by few and prone to errors. In this article, I’ll illustrate the impact of stopping the database instance in the middle of a large transaction that was running and discuss several techniques and tools that are available for faster and successful recovery.

Introduction

Let’s discuss a scenario where we have a SAN disk issue and latency is really high and think many jobs on the data warehouse systems started running indefinitely without a success.

The owner of the server felt this might be a server issue and decided to restart the server but from that point he’s invited trouble. Everything comes back but he’s nw not able to query the database because of a block generated by the system process on the user sessions during the recovery process. This leads to an involvement of senior DBA to troubleshoot the problem.

The server came back and databases were up and running, however, they were able to query the other objects but not the one which took part in the recovery process.

After brainstorming sessions, they came to know that the SQL job was purging 25 Million rows from the database. A general rule of thumb is that it will take about the same amount of time, or more, for an aborted transaction to rollback as it has taken to run. The rollback can’t be avoided even with the SQL Server stops and starts. There might be chances of corruption if you try to restart multiple times.

Now, we have to address two issues

  • Work with network team to fix poor I/O subsystems which directly impacting the disk latency
  • The disk latency has a direct impact on the recovery process

The first and foremost important thing is that checking the status of each database

Though the database came online, it doesn’t mean that the recovery of each database is done and it’s available for further operations that are what happened in this case.

One of the most commonly used SQL by any DBA to determine the cause of any problem would be sp_who2. It shows all the session details that are currently established with the database. The first 50 sessions are system processes. The general purpose of sp_who2 is to diagnose

  • Blocking
  • CPU
  • I/O bottleneck
  • WaitTypes

sp_who2

The key information from the below figure is the command type [database STARTUP] and the session_id that indicates this is a system task performing the startup recovery

The advantage of using the enterprise edition leads to the database to come online in the second phase of the database recovery i.e. is the database is available after the REDO phase and also at certain instance during UNDO, the last part of recovery but the data involved in the rollback is locked i.e. is any access to object being recovered would be blocked. This process sometimes referred as “fast recovery”. As we were able to query the other objects but not the one which took part in recovery process

SQL Server Error log

The error log comes very handy when troubleshooting any issues with references to database recovery. This can be very helpful to detect/troubleshoot any current or potential problem areas, including automatic recovery messages (particularly if an instance of SQL Server has been stopped and restarted), kernel messages, or other server-level error messages. In this case, the error log clearly reveals the recovery stages of the BI_DW data warehouse database.

SQL Server 2008 onwards, the two DMV’s sys.dm_exec_requests and sys.dm_tran_database_transactions provides an insight on the different recovery states of the database startup.

The other way is by executing the stored procedure “sp_readerrorlog” to get information about the progress of the database recovery.

The information presented in these DMV’s varies depending on the situation: recovery of databases during server startup, recovery of the database after a attach operation, recovery of the database after a restore operation.


The percent_complete and estimated_completion_time shows the some value that the error log messages indicate about the progress within the stage of recovery. You can also refer other columns to understand other aspects about the recovery.

Database Recovery Phases

Different phases of the recovery are

  • Analysis
  • Redo
  • Undo

Analysis In this phase the transaction log is analyzed and records the information about the last checkpoint and creates on Dirty Page Table, this might capture all the dirty pages details.

Redo It’s a roll forward phase. When this phase completes database becomes online so the database was online

Undo It’s rollback phase. It holds the list of the active transaction from the analysis phase basically undoing the transactions.

Database Startup Sequence

There is an exact sequence that determines which user database will come online first. The error log is the reference to measure the database startup sequence but in any case

  • Master database comes first as it stores system catalog details and configuration details of the user database
  • Resource database comes next
  • TempDb comes online only after model database comes online

The below SQL gives the order of database startups


Conclusion

This is normal for SQL Server when there is an unexpected shutdown when there is an execution of long running transaction. A database goes through 3 phases of recovery and then comes online – Enterprise edition allows the database to come online after second phase.  This means that there is a rollback running at the moment; need to wait for this to complete. Let the server finish the rollback process if a long-running transaction was terminated. Terminating the server process during the rollback of long-running transaction results in long recovery time.

Querying WITH (READUNCOMMITTED) hint to ignore the locks associated with the transaction that is rolling back. The results are considered unreliable, but ironically, the results are probably dirty reads since the blocking process is a rollback.

The best solution is to wait. It’s possible to kill the blocking process by restarting the instance, but then you should be concerned with database integrity. Many cases end up with restoring the database from the backup.

Summary

  • Make sure to check for any active transaction before restarting the database server
  • Setting Target Recovery Time of a Database but it’s not recommended for OLTP systems as it generates extra overhead to maintain that may degrade the performance.
    • Once TARGET_RECOVERY_TIME value is set to >0, the database starts to use it instead of using automatic checkpoint and it is called indirect checkpoint. The value can be specified in seconds or minutes and will define the maximum time to recover the database after a crash.

Recovery

About Prashanth Jayaram

I’m a Database technologist having 11+ years of rich, hands-on experience on Database technologies. I am Microsoft Certified Professional and backed with a Degree in Master of Computer Application.

My specialty lies in designing & implementing High availability solutions and cross-platform DB Migration. The technologies currently working on are SQL Server, PowerShell, Oracle and MongoDB.

View all posts by Prashanth Jayaram

1,866 Views
  • Syed Shanu

    Nice One Prashanth Jayaram

  • Prashanth Jayaram

    Thanks, Syed!!

  • Calin Oprea

    how is the estimated completion time influenced by the fact that SQL is either Standard or Enterprise? Any idea if this is shown anywhere?

  • Prashanth Jayaram

    Hello,
    Thanks for the comment. Let me put this way, the Enterprise Edition always delivers fast performance— enabling many options internally but whereas for Standard Edition there is a limitation in terms of the scalability. For example, buffer pool limitation and to enable column store features beyond 32 GB. If you are working with very limited and small data sets, you may not see the difference between the two editions. I hope this will give a glimpse about SE and EE.

    To answer the second part of the question, I may need to do some research to get the right pointers. I will keep you posted.

    Best Regards,
    Prashanth