
How to take advantage of the SQL Server’s transaction log?

April 11, 2016 by Jefferson Elias

Introduction

SQL Server keeps track of every database modification and every database transaction. This is done in a file called the transaction log, or TLOG. The transaction log is specific to a SQL Server database: there is at least one transaction log per database.

As explained on MSDN, the transaction log is a critical component of the database and, if there is a system failure, the transaction log might be required to bring your database back to a consistent state. The transaction log should never be deleted or moved unless you fully understand the ramifications of doing this.

So, we can conclude that the information contained inside the transaction log is valuable, and perhaps we can even take advantage of it to answer questions like “Who is responsible for this table drop?” or “When did the operation occur?” in order to recover from a data alteration or data loss. Let’s explore which tools can help and what we can actually do with them…

Why get information back from the transaction log?

Let’s say there is an issue: somebody dropped a table or deleted rows in a particular table and you want to get back the version of those rows from just before the mistake. Well, you have done your job and taken backups, but in general you can’t simply restore the last backup (somewhere other than the production database) and copy the data back, because legitimate changes may have happened between that last backup and the moment the erroneous DROP or DELETE query was actually fired, and you want to keep as many of them as possible.

To do this and be sure of the data we get back, we need to know precisely in which transaction the query was executed, so that we can get a Log Sequence Number (LSN) and use the STOPBEFOREMARK option of the RESTORE LOG command, as in the code below.
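As a minimal sketch (the database name, backup path and LSN value here are placeholders, not values from this article), such a restore looks like this:

-- Restore a transaction log backup, stopping just before the transaction
-- identified by the given (placeholder) LSN.
RESTORE LOG SomeDatabase_Copy
    FROM DISK = N'C:\Backups\SomeDatabase_log.trn'
    WITH STOPBEFOREMARK = 'lsn:15000000004800001',
         RECOVERY;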

How to get back information from the transaction log?

The first way to access information inside the transaction log is the DBCC LOG(<DbName>) command. But as you will see in the figure below, we won’t get far with its output: it tells you nothing about the object that was impacted, so it’s not really helpful for recovery.
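For reference, here is a sketch of a call. DBCC LOG is undocumented; the database name is a placeholder and the second argument is the commonly described output detail level (0 to 4):

-- Dump the log records of a database with a fairly detailed output level.
DBCC LOG ('SomeDatabase', 3);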

Fortunately, there is another way. Unfortunately, it requires undocumented SQL Server functions and, since they are not documented by Microsoft, they are not supported and you must use them at your own risk.

The first one we will discuss is called sys.fn_dblog. This function reads the active part of the transaction log for a given log sequence number (LSN) interval. What I call “the active part of the transaction log” is the portion where transaction details are still present and have not been truncated. Truncation can happen automatically if the database is in the simple recovery model, when the log has been explicitly truncated, or after a log backup has occurred.

You will find below a simple way to call it:
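A minimal sketch: passing NULL for both LSN boundaries reads the whole active part of the log of the current database.

-- Read every record currently present in the active part of the transaction log
-- of the database you are connected to.
SELECT *
FROM sys.fn_dblog(NULL, NULL);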

Here is a sample result. As you can see, the first columns are [Current LSN] and [Operation]. The former gives the log sequence number corresponding to the entry in the transaction log, while the latter tells the kind of operation that was performed. There is also a [Transaction ID] column, so we can tell what has been done within a single transaction. Two columns that are not displayed here but look interesting are [AllocUnitId] and [AllocUnitName].

One more word about the way to call the sys.fn_dblog function: as written in the code sample above, you can’t directly use the values returned in the [Current LSN] column as parameters. The reason is that the [Current LSN] value is expressed in hexadecimal, while the function takes decimal expressions of the LSN. So, you need to convert a [Current LSN] value before using it.

There are multiple ways to do it. One is to use a website resource like BinaryHexConverter: first split the [Current LSN] value into three pieces using the ‘:’ character as separator, then convert each individual piece using the website, and finally concatenate everything back with the ‘:’ character, in the same order. The operation is almost the same if you want to use this LSN with the STOPBEFOREMARK option of the RESTORE LOG command: you still split the [Current LSN] the same way, but you concatenate differently. The difference is that the second number is padded with leading zeros so that its length is 10 digits, the third number is padded with leading zeros so that its length is 5 digits, and no separator is used.

As an alternative, you can use the bit of T-SQL code shown below:
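The following is one possible implementation, given as a sketch: it converts a hexadecimal [Current LSN] value into both the ‘:’-separated decimal form and the padded form expected by STOPBEFOREMARK (the sample LSN is only illustrative).

-- Convert a hexadecimal [Current LSN] value (as returned by sys.fn_dblog)
-- into its decimal representations. The sample value below is illustrative.
DECLARE @CurrentLSN VARCHAR(30) = '00000020:000000e7:0001';

DECLARE @Part1 VARCHAR(8) = PARSENAME(REPLACE(@CurrentLSN, ':', '.'), 3),
        @Part2 VARCHAR(8) = PARSENAME(REPLACE(@CurrentLSN, ':', '.'), 2),
        @Part3 VARCHAR(4) = PARSENAME(REPLACE(@CurrentLSN, ':', '.'), 1);

-- Style 2 tells CONVERT to read the string as raw hexadecimal digits.
DECLARE @D1 BIGINT = CONVERT(BIGINT, CONVERT(VARBINARY(8), RIGHT('00000000' + @Part1, 8), 2)),
        @D2 BIGINT = CONVERT(BIGINT, CONVERT(VARBINARY(8), RIGHT('00000000' + @Part2, 8), 2)),
        @D3 BIGINT = CONVERT(BIGINT, CONVERT(VARBINARY(4), RIGHT('0000' + @Part3, 4), 2));

SELECT
    -- Decimal LSN with ':' separators, e.g. 32:231:1
    CAST(@D1 AS VARCHAR(10)) + ':' + CAST(@D2 AS VARCHAR(10)) + ':' + CAST(@D3 AS VARCHAR(10)) AS DecimalLSN,
    -- Padded decimal LSN for STOPBEFOREMARK, e.g. 32000000023100001
    CAST(@D1 AS VARCHAR(10))
        + RIGHT('0000000000' + CAST(@D2 AS VARCHAR(10)), 10)
        + RIGHT('00000' + CAST(@D3 AS VARCHAR(10)), 5) AS StopBeforeMarkLSN;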

So, what kind of operations can be found in there? Below is a partial list of the operations you may encounter and what each one means (a question mark indicates the meaning is unknown):

LOP_ABORT_XACT: Indicates that a transaction was aborted and rolled back.
LOP_BEGIN_CKPT: A checkpoint has begun.
LOP_BEGIN_XACT: Indicates the start of a transaction.
LOP_BUF_WRITE: Writing to buffer.
LOP_COMMIT_XACT: Indicates that a transaction has committed.
LOP_COUNT_DELTA: ?
LOP_CREATE_ALLOCCHAIN: New allocation chain.
LOP_CREATE_INDEX: Creating an index.
LOP_DELETE_ROWS: Rows were deleted from a table.
LOP_DELETE_SPLIT: A page split has occurred; rows have moved physically.
LOP_DELTA_SYSIND: The SYSINDEXES table has been modified.
LOP_DROP_INDEX: Dropping an index.
LOP_END_CKPT: Checkpoint has finished.
LOP_EXPUNGE_ROWS: Rows physically expunged from a page, now free for new rows.
LOP_FILE_HDR_MODIF: SQL Server has grown a database file.
LOP_FORGET_XACT: Shows that a two-phase commit transaction was rolled back.
LOP_FORMAT_PAGE: Write the header of a newly allocated database page.
LOP_HOBT_DDL: ?
LOP_HOBT_DELTA: ?
LOP_IDENT_NEWVAL: New reseed value for an identity column.
LOP_INSERT_ROWS: Insert a row into a user or system table.
LOP_LOCK_XACT: ?
LOP_MARK_DDL: Data Definition Language change (a table schema was modified).
LOP_MARK_SAVEPOINT: Designates that an application has issued a ‘SAVE TRANSACTION’ command.
LOP_MIGRATE_LOCKS: ?
LOP_MODIFY_COLUMNS: Designates that a row was modified as the result of an UPDATE command.
LOP_MODIFY_HEADER: A new data page was created and its header was initialized.
LOP_MODIFY_ROW: Row modification as the result of an UPDATE command.
LOP_PREP_XACT: Transaction is in a two-phase commit protocol.
LOP_SET_BITS: Designates that the DBMS modified space allocation bits as the result of allocating a new extent.
LOP_SET_FREE_SPACE: Designates that a previously allocated extent has been returned to the free pool.
LOP_SORT_BEGIN: A sort begins as part of index creation (LOP_SORT_END marks the end of the sort).
LOP_SORT_EXTENT: Sorting extents as part of building an index.
LOP_UNDO_DELETE_SPLIT: A page split has been undone (rolled back).
LOP_XACT_CKPT: During the checkpoint, open transactions were detected.

So, basically, the transaction log contains information not only about DML and DDL operations but also about SQL Server internal operations like a checkpoint, a buffer write or a page split. But it has a limit, as explained above: only the active part of the log can be read. Let’s prove this with a simple example.
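A sketch of such a batch is shown below (the database name and backup path are assumptions): it counts the log records visible to sys.fn_dblog, takes a log backup, then counts again.

USE SomeDatabase;   -- placeholder database name
GO
-- Number of records currently readable in the active part of the log.
SELECT COUNT(*) AS LogRecordCountBefore
FROM sys.fn_dblog(NULL, NULL);
GO
-- Back up the transaction log, which allows log truncation to occur.
BACKUP LOG SomeDatabase TO DISK = N'C:\Backups\SomeDatabase_log.trn';
GO
-- Number of records readable after the log backup.
SELECT COUNT(*) AS LogRecordCountAfter
FROM sys.fn_dblog(NULL, NULL);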

The figure below is the output of the batch presented above.

You should see a tremendous reduction in the number of returned rows. For me, it went from 545 to 10. This means that the inactive part of the transaction log, which tracked those transactions, has been written to the backup file and the original entries have been freed from the log file: log truncation can occur and the space inside the transaction log can be reused.

You will find demos later in this article. They provide a step-by-step procedure to recover data after a DELETE or DROP operation has occurred, using the sys.fn_dblog function.

Before jumping to the other function, here is another query which just lists committed DML operations. You can customize it to get other information:
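This is a sketch, written under the assumption that only committed transactions are of interest: it keeps the DML log records whose transaction also has a LOP_COMMIT_XACT record in the active log.

SELECT f.[Current LSN],
       f.[Transaction ID],
       f.Operation,
       f.Context,
       f.AllocUnitName
FROM sys.fn_dblog(NULL, NULL) AS f
WHERE f.Operation IN ('LOP_INSERT_ROWS', 'LOP_MODIFY_ROW', 'LOP_DELETE_ROWS')
  AND f.[Transaction ID] IN
      (
          -- Keep only transactions that have actually committed.
          SELECT c.[Transaction ID]
          FROM sys.fn_dblog(NULL, NULL) AS c
          WHERE c.Operation = 'LOP_COMMIT_XACT'
      );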

How to get back information from a transaction log backup file?

For that purpose there is a second undocumented function, sys.fn_dump_dblog, which reads a transaction log backup file instead of the active log. Here again, you must use it at your own risk. According to Paul Randal on his blog post, it seems that every time this function is called, a new SQLOS scheduler and three threads are created, and they do not disappear until a server restart. Still according to Paul Randal, this has been reported as a bug to Microsoft. It is fixed in SQL Server 2012 SP2+ and SQL Server 2014, but the fix won’t be backported to SQL Server 2008 (R2).

This function has a lot of parameters. The first two are the same as for sys.fn_dblog, i.e. an LSN interval. Then it takes a device type, which can be NULL (meaning DISK, the default value), ‘DISK’, ‘TAPE’ or ‘VIRTUAL_DEVICE’. The next parameter is called @seqnum and has a default value of 1; I don’t currently know its meaning. Finally, there are 64 parameters for specifying paths to log backup devices.

Here is its interface:
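Since the function is undocumented, the sketch below reflects how it is commonly described in community posts rather than an official reference:

-- sys.fn_dump_dblog
-- (
--     @start_lsn, @end_lsn,            -- LSN interval, as for sys.fn_dblog
--     @device_type,                    -- NULL (= DISK), 'DISK', 'TAPE' or 'VIRTUAL_DEVICE'
--     @seqnum,                         -- backup sequence number, default 1
--     @fname1, @fname2, ..., @fname64  -- up to 64 backup device paths
-- )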

Let’s now see an example call, assuming we have a transaction log backup file on disk.

When the following query is fired against a database, we get the same kind of result set as the one returned by the sys.fn_dblog function:
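Here is a sketch of such a call; the backup path is an assumption, and all 64 file parameters must be supplied, so DEFAULT is passed for the 63 unused ones.

SELECT [Current LSN], Operation, Context, [Transaction ID], AllocUnitName
FROM sys.fn_dump_dblog
(
    NULL, NULL,                           -- whole LSN range
    N'DISK', 1,                           -- device type and sequence number
    N'C:\Backups\SomeDatabase_log.trn',   -- placeholder path to the log backup
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT
);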

Demos

In the following, we will try two things: recovering data after a DELETE statement occurred, and recovering data after a DROP statement occurred. As the principles are the same for the sys.fn_dblog and sys.fn_dump_dblog functions, we will just use the first one.

Demo setup

Let’s first create a new, empty database and add a simple table inside this database.
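A sketch of such a setup is shown below; only the City table name is taken from the article, while the database name and column definitions are assumptions.

CREATE DATABASE RecoveryTestDb;
GO
USE RecoveryTestDb;
GO
-- A very simple table to play with.
CREATE TABLE dbo.City
(
    CityId   INT IDENTITY(1, 1) PRIMARY KEY,
    CityName VARCHAR(128) NOT NULL
);
GO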

Let’s launch a FULL and a LOG backup to initialize things:
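For instance (the backup paths are assumptions):

BACKUP DATABASE RecoveryTestDb
    TO DISK = N'C:\Backups\RecoveryTestDb_full.bak' WITH INIT;
GO
BACKUP LOG RecoveryTestDb
    TO DISK = N'C:\Backups\RecoveryTestDb_log_01.trn' WITH INIT;
GO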

We now have a table but no data in it. Let’s insert some rows.
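For example (the values are purely illustrative):

INSERT INTO dbo.City (CityName)
VALUES ('Liege'), ('Brussels'), ('Antwerp'), ('Ghent');
GO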
Use case 1: Recover deleted data and get to know who did it

Let’s pretend we run the following operation by mistake. It deletes some data:
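For instance (the WHERE clause is an assumption):

-- The "accidental" deletion.
DELETE FROM dbo.City
WHERE CityName IN ('Liege', 'Ghent');
GO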

We have made a few transactions so far, so the transaction log must be filled with some useful information, and rows have been “unexpectedly” deleted.

The following query calls the sys.fn_dblog function and filters its output to only return information about delete operations on the City table:
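A sketch of that query (column names follow the sys.fn_dblog output):

SELECT [Current LSN],
       Operation,
       Context,
       [Transaction ID],
       AllocUnitName
FROM sys.fn_dblog(NULL, NULL)
WHERE Operation = 'LOP_DELETE_ROWS'
  AND AllocUnitName LIKE '%City%';   -- only records touching the City table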

The output of this query is as follows:

With this query, we can find the LSN at which the delete operation started for transaction ID 0000:00000461. But the transaction was already running at that point! We need to find the LSN corresponding to the beginning of the transaction. This can be obtained by filtering the output on LOP_BEGIN_XACT operations for that transaction only.

The following query will provide this information:
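A possible version of that query, reusing the transaction ID found above:

SELECT [Current LSN],
       Operation,
       [Transaction ID],
       [Transaction Name],
       [Begin Time],
       [Transaction SID]
FROM sys.fn_dblog(NULL, NULL)
WHERE Operation = 'LOP_BEGIN_XACT'
  AND [Transaction ID] = '0000:00000461';   -- transaction found in the previous step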

And the winner is:

We now know, from the [Begin Time] column, when the transaction containing the DELETE operation started, and that its first log record has LSN 0000020:0000e7:0001.

We need to convert this value to decimal using one of the methods described previously and keep the padded result for use with the STOPBEFOREMARK option. In this case, the value to use is 32000000023100001.

By the way, the same query also returns a [Transaction SID] column, which holds the security identifier of the user who ran the DELETE statement.

We can get to know it by running the following query:
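A minimal sketch; the SID literal must be replaced with the [Transaction SID] value returned by the previous query:

DECLARE @TransactionSID VARBINARY(85) = 0x01;   -- placeholder: paste the actual [Transaction SID] here
SELECT SUSER_SNAME(@TransactionSID) AS LoginName;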

This can be a plus to make sure we identified the right DELETE statement.

So, basically, we have the value to use during the last log restore. We first have to restore a full backup to another location and database with the NORECOVERY option, then restore the transaction logs until we reach the expected LSN boundary, i.e. 32000000023100001.

But before that, we need to take a transaction log backup. That’s because we are using sys.fn_dblog, which queries the active part of the transaction log. It wouldn’t be necessary if we were using sys.fn_dump_dblog, since it queries a transaction log backup.
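For instance (name and path are assumptions, consistent with the earlier sketches):

BACKUP LOG RecoveryTestDb
    TO DISK = N'C:\Backups\RecoveryTestDb_log_02.trn' WITH INIT;
GO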

If we had not found the corresponding operation with sys.fn_dblog, we would have used sys.fn_dump_dblog with the last transaction log backup media the same way we did.

Once we have all the transaction log backups needed to recover, we can start restoring. An example code sample is provided below:
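A sketch of the restore sequence; the database names, logical file names and paths are assumptions:

-- 1. Restore the full backup to a copy database, leaving it non-recovered.
RESTORE DATABASE RecoveryTestDb_Copy
    FROM DISK = N'C:\Backups\RecoveryTestDb_full.bak'
    WITH MOVE 'RecoveryTestDb'     TO N'C:\Data\RecoveryTestDb_Copy.mdf',
         MOVE 'RecoveryTestDb_log' TO N'C:\Data\RecoveryTestDb_Copy_log.ldf',
         NORECOVERY;
GO
-- 2. Restore the intermediate log backup(s).
RESTORE LOG RecoveryTestDb_Copy
    FROM DISK = N'C:\Backups\RecoveryTestDb_log_01.trn'
    WITH NORECOVERY;
GO
-- 3. Restore the last log backup, stopping just before the DELETE transaction.
RESTORE LOG RecoveryTestDb_Copy
    FROM DISK = N'C:\Backups\RecoveryTestDb_log_02.trn'
    WITH STOPBEFOREMARK = 'lsn:32000000023100001', RECOVERY;
GO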

If we take a look at the content of the City table in the restored database, we will see that the rows we deleted are back.

The query to check:
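For instance, under the naming assumptions used so far:

SELECT CityId, CityName
FROM RecoveryTestDb_Copy.dbo.City;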


Its output:

In summary, we found the transaction that performed the DELETE operation (this is the hard part of the work in a production environment, where DELETE statements can occur regularly against multiple tables). Then we got back the LSN corresponding to the beginning of the transaction that contains the DELETE statement, translated this LSN to a padded decimal value and restored to that LSN in another database. The next step would be to export the deleted rows from the restored database back to the production database.

Use case 2: Discover the LSN corresponding to the start of a table drop operation

As the instructions to restore to a given LSN are the same whether it comes from the sys.fn_dblog or the sys.fn_dump_dblog function, this section will only address how to retrieve the starting LSN of a transaction which dropped a table.

So, let’s get started by dropping the City table:
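Still under the same naming assumptions:

USE RecoveryTestDb;
GO
-- The "accidental" table drop.
DROP TABLE dbo.City;
GO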

Let’s look for drop operations in the transaction log. This can be achieved using the following query.
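A sketch of that query; it relies on the fact that the transaction generated by a DROP TABLE statement is named DROPOBJ in the log:

SELECT [Current LSN],
       [Transaction ID],
       [Transaction Name],
       [Begin Time],
       AllocUnitName
FROM sys.fn_dblog(NULL, NULL)
WHERE Operation = 'LOP_BEGIN_XACT'
  AND [Transaction Name] = 'DROPOBJ';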

Well, all the demos I’ve seen on the internet which show this operation have only one row returned, meaning only one object has been dropped since the last log backup. The authors then say “OK, we have the LSN, restore to that LSN and everything is going to be all right.” But this is not representative of a real production system, where there can be multiple object drops between two transaction log backups, or where an application uses tables to store some kind of “sequence numbers” and drops them daily (seriously, we have an application here that does exactly that). Plus, in my experience, the DBA is rarely called right after such an error.

Here, we have two rows returned.

Fortunately, in our case we can tell them apart thanks to the [Begin Time] column (we take the last one). If you remember the previous use case, we also filtered on the [AllocUnitName] column, but here this column is null for every row returned. This means that, in some cases, we cannot know for certain which [Current LSN] value should be taken, yet I have not seen this warning in any of the other demos…

I’ve tried different things but, unless you know the object_id previously used by the dropped object, it’s more or less impossible to know for sure which [Current LSN] value should be used.

So, to me, it’s more of a “guess and check” process than a straightforward step-by-step procedure. And when you have a very big database, it becomes practically impossible because of the time needed for each restore attempt.

By the way, here is a sample piece of code, given as is, which extracts information about the object_id of the dropped objects:
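This is one possible approach, given purely as a sketch: for each DROPOBJ transaction, it returns the [Lock Information] entries describing the schema modification (SCH-M) locks taken by that transaction; the object_id of the dropped object appears after ‘OBJECT:’ in that string and still has to be parsed out manually.

;WITH DropTran AS
(
    -- Transactions generated by DROP statements.
    SELECT [Transaction ID], [Begin Time]
    FROM sys.fn_dblog(NULL, NULL)
    WHERE Operation = 'LOP_BEGIN_XACT'
      AND [Transaction Name] = 'DROPOBJ'
),
SchMLocks AS
(
    -- Schema modification locks recorded for those transactions.
    SELECT [Transaction ID], [Lock Information]
    FROM sys.fn_dblog(NULL, NULL)
    WHERE Operation = 'LOP_LOCK_XACT'
      AND [Lock Information] LIKE '%SCH_M OBJECT%'
)
SELECT d.[Transaction ID],
       d.[Begin Time],
       l.[Lock Information]   -- contains 'OBJECT: <db_id>:<object_id>:...'
FROM DropTran AS d
JOIN SchMLocks AS l
  ON l.[Transaction ID] = d.[Transaction ID];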


Demo cleanup

Dropping the databases:
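Under the same naming assumptions:

USE master;
GO
DROP DATABASE RecoveryTestDb;
DROP DATABASE RecoveryTestDb_Copy;
GO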


About Jefferson Elias

Living in Belgium, I obtained a master's degree in Computer Science in 2011 at the University of Liège. I'm one of the rare guys out there who started to work as a DBA immediately after graduation, and I have worked at the university hospital of Liège since 2011. Initially involved in Oracle database administration (which is still under my charge), I had the opportunity to learn and manage SQL Server instances in 2013. Since then, I've learned a lot about SQL Server administration and development. I like the job of DBA because you need general knowledge of every field of IT. That's the reason why I won't stop learning (and sharing) the products of my learning.
