Rajendra Gupta

SQL Server FILESTREAM with Change Data Capture

February 22, 2019 by

Sometimes we require tracking data change activity (Insert, update and deletes) in SQL Server tables. SQL Server 2008 introduced Change Data Capture (CDC) to track these changes in the user-defined tables. SQL Server tracks the defined table with a mirrored table with same column structure; however; it adds additional metadata fields to track these changes. We can use table-valued function to access this changed data.

In the series of articles on SQL Server FILESTREAM (see TOC at bottom), we are continuously exploring the SQL FILESTREAM feature along with its combination with other SQL Server features. Using SQL FILESTREAM, we can store the objects into the file system instead of SQL Server relational tables. We can also perform changes to the SQL FILESTREAM tables using T-SQL. In this article, we will explore the change data capture on the SQL FILESTREAM tables.

Prerequisites

  • A SQL Server instance with FILESTREAM feature enabled
  • A SQL FILESTREAM database
  • A SQL FILESTREAM table and data

In this article, we will use CDC along with SQL Server FILESTREAM.

There are two steps to configure CDC.

Enabling CDC at the database level

We need to enable the CDC at the database level first.

Execute the below query in the FILESTREAM database. It enables the CDC in the FILESTREAM database.

We can use the column ‘is_CDC_enabled’ in the sys.databases to check about the status of CDC on a particular database. In the following query, we checked the CDC on the FILESTREAM database.

In the output, Value ‘1’ shows that we have successfully enabled the CDC on FILESTREAM database.

.

Once we have enabled CDC, it creates a new schema ‘CDC’ in the FILESTREAM database as shown below.


It creates a few system tables as well in the CDC schema.

Enabling CDC at the table level

We need to enable the CDC on the FILESTREAM table for which we want to track the changes. We can check it first using ‘sys.tables.’ whether any existing table is CDC enabled or not.

In the following screenshot, we can see that currently CDC is not enabled for our FILESTREAM table ‘ Tbl_Support_Documents.’

We can enable CDC on the FILESTREAM table using the below query. The SQL Server Agent service should be running before we execute this command. We also need to execute this query with the db_owner permission in the FILESTREAM table. We might have the requirement to track a few columns only from the FILESTREAM table. We can also specify a specific column name in the command with ‘captured_column_list’. In the below example, we want to track each column of the FILESTREAM table.

We observed here that the CDC enable command created the ‘CDC.FileStreamDemoDB_test_cleanup’ job and started it. It should create the below two jobs.

  1. cdc.FileStreamDemoDB_test_capture: This job is responsible for tracking the changes and it runs the stored procedure ‘sys.sp_MScdc_capture_job’
  2. ‘CDC.FileStreamDemoDB_test_cleanup’: this job cleans up the CDC tables by running the stored procedure sys.sp_MScdc_cleanup_job. By default, it clears any data at least 3 days old from the CDC tracked tables

In our case, we do not see the capture job created. We did not receive any error message either. Let’s now try to create this capture job manually using the system procedure sys.sp_cdc_add_job.

Execute the below query in the FILESTREAM database

We get an error message this time. It shows that capture job cannot be used by CDC to extract the changes from the log because transaction replication is enabled on the FILESTREAM database. In the last article ‘ SQL Server FILESTREAM and Replication’, we configured the SQL Server replication on the FILESTREAM database. We cannot use CDC and Replication together because Replication also tracks the changes from the transaction log.

Expand the Replication folder in the SSMS and we can the replication is still configured for the FILESTREAM database. We need to drop it to use the CDC feature on the SQL Server FILESTREAM database.

Right click on the publication and Delete the publication

We need to confirm deleting the publication in the SQL FILESTREAM database. It will also delete the subscriptions configured for the particular publication as well. Click on ‘Yes’.

In the following screenshot, we can notice that it automatically creates the CDC capture job now and it starts the job as well. If the replication is configured on the CDC database, it does not create the CDC capture job.

We can verify that both the jobs are created in the SQL Server instance. The Capture job is scheduled to run continuously to track all the changes in the CDC enabled tables.

Once we have configured the CDC on a particular table, let us query that CDC table and we can see it showing 0 rows as of now. It does not show any record because we did not perform any changes to the SQL FILESTREAM table after we enabled this feature.

We can get the information about the CDC source table, capture instance, schema from the stored procedure ‘sys.sp_cdc_help_change_data_capture’. Execute the below query

In the output, we can get the list of the entire CDC enabled table, schema and captured instance.

If we are not sure of which are the columns being tracked using CDC, we can check it using the sys.sp_cdc_get_captured_columns stored procedure. We need to the pass the @capture_instance parameter value from the ‘sys.sp_cdc_help_change_data_capture’.

In the below screenshot, we get the list of the all FILESTREAM table columns because we did not specify the column level filter initially.

We will now perform DML operations on the SQL Server FILESTREAM table and view the changes using CDC.

Tracking Insert activity on SQL FILESTREAM tables

Let us insert one object into the FILESTREAM table using the following query.

Once the insert is successful, query the CDC system table and we get one row in it. In this record, we can see the document_name and document_type values are similar to the insert statement values.

In this output, we have _$operation value as 2. This column shows the type of activity performed on the SQL FILESTREAM enabled CDC table.

We can see the following values in the column

Tracking Update activity on SQL FILESTREAM tabless

Let’s try to update the record in the FILESTREAM table. In this, we will replace the existing file in the FILESTREAM table using below query.

In the CDC table, you get two rows this time. If you look at the _$operation column values, we have value ‘3’ (old update values) and ‘4’ (update after values). In the above update query, we just updated the SQL FILESTREAM object; therefore, we do not see any changes in the document_name column. We always get two rows for the update statement in the CDC tracking table.

Let’s now update the document name as well in the SQL FILESTREAM table using below query.

In the CDC system table, we can see the old and new value for the document_name column as well.

In the SQL FILESTREAM enabled database, it creates the table-valued functions as well. In the below screenshot, you can see the table-valued function.

  1. Cdc.fn_cdc_get_all_changes_dbo_Tbl_Support_Documents
  2. Cdc.fn_cdc_get_net_changes_dbo_Tbl_Support_Documents

We can use the function Cdc.fn_cdc_get_all_changes_dbo_Tbl_Support_Documents to get events in the CDC tables over a particular period.

In the below query, we used the function sys.fm_cdc_map_time_lsn to get the LSN information for a time period. We can have following values in this function.

  • largest less than
  • largest less than or equal
  • smallest greater than
  • smallest greater than or equal

Execute the query above, and you can see all the operation performed in the CDC enabled table. We can see the old values and new values after the update as well occurred during a specific period.

If we do not want to get the old values before the update, execute the below query with ‘all’ parameter.

This time you get only the updated values after the successful update statement execution on the FILESTREAM table.

Tracking Delete activity on SQL FILESTREAM tables

Let us delete the row from the FILESTREAM table now.

In the CDC output, you can see we can track the deleted object from the CDC tables as well. Document_ID is similar in the CDC table as of the delete statement.

Disabling CDC on SQL FILESTREAM tables

We can disable the CDC for the SQL FILESTREAM table using the system procedure ‘sys.sp_cdc_disable_table’. You need to pass the @capture_instance value using the stored procedure ‘sys.sp_cdc_help_change_data_capture’

Once we execute the command it drops the CDC tables, functions, catalog views associated with the CDC table.

We can disable CDC on the database level as well using the below command. It removes all the data, jobs created during the CDC enable process. It might take a few seconds to complete depending upon the CDC data and objects.

Table of contents

FILESTREAM in SQL Server
Managing data with SQL Server FILESTREAM tables
SQL Server FILESTREAM Database backup overview
Restoring a SQL Server FILESTREAM enabled database
SQL Server FILESTREAM database recovery scenarios
Working with SQL Server FILESTREAM – Adding columns and moving databases
SQL Server FILESTREAM internals overview
Importing SQL Server FILESTREAM data with SSIS packages
SQL Server FILESTREAM queries and Filegroups
Viewing SQL Server FILESTREAM data with SSRS
SQL Server FILESTREAM Database Corruption and Remediation
Export SQL Server FILESTREAM Objects with PowerShell and SSIS
SQL FILESTREAM and SQL Server Full Text search
SQL Server FILESTREAM and Replication
SQL Server FILESTREAM with Change Data Capture
Transaction log backups in a SQL FILESTREAM database
SQL FILESTREAM Compatibility with Database Snapshot, Mirroring, TDE and Log Shipping
SQL Server FILETABLE – the next generation of SQL FILESTREAM
Managing Data in SQL Server FILETABLEs
SQL Server FILETABLE Use Cases
Rajendra Gupta
Latest posts by Rajendra Gupta (see all)
168 Views