Rajendra Gupta

Static Data Masking in SSMS 18

December 6, 2018 by

In this article we are going to talk about Static Data Masking, a new feature for the SQL Databases in SSMS. Static Data Masking feature previously was available only for the Azure SQL DB.

Database administrators are responsible for database security and compliance issues. In a normal workflow, we used to have multiple environments for applications such as Production, Staging, UAT, Dev, and Sandbox. It is a regular task for the DBA to refresh the lower (staging, test, UAT etc.) environments with a backup of the production database backup. Sometimes we need to share the database backup with the external vendors as well. The database may contain critical information such as Personally Identifiable Information (PII data) such as email address, contact information, national id number etc. We might have other sensitive information such as bank account number, credit card number, CVV etc. which we do not want anyone to access. We need to be compliant with GDPR, PCI, and SOX regulatory compliance as well.

If we restore the database to lower environment, all of the data gets transferred to the lower environment which put our critical data at a risk since we do not have security or encryption enabled at that level. It may be a requirement to mask or shuffle the data prior to a database restore.

Consider the below example, where our production database has sensitive credit card information and we have performed a database restore to dev, UAT and sandbox environment.

You can see here that data is transferred to all environments, which makes our data accessible to a large audience and at risk too.

In SQL Server 2016, we come across a new solution ‘Dynamic Data Masking’ to prevent unauthorized users to access the defined sensitive information. We need to define curtails rules to mask the data. When any user request for the data, SQL Server checks his access and if he is not having necessary permission, he gets masked data. In this process, there are no changes in the source data. But this does not satisfy our case described above.

SSMS 18.0 contains the Static Data Masking feature for the SQL Databases. Previously it was available for the Azure SQL DB only. Static data masking creates a copy of the database and applies the data transformation rules on it. We can backup this masked database to lower environments so that sensitive data is not transferred to the other end. We can use this features for the databases since SQL Server 2012.

In the above diagram, we can see that there is no sensitive data information is transferred to the other environments using the static data masking.

We can use the Static Data Masking for the various purpose as below

  • Preparing database environments other than production
  • Database development
  • Database troubleshooting
  • Sharing data with third-party vendors

Worked example

Let us understand static data masking using an example.

First, create a database ‘StaticDemo’ along with the table. Insert sample data into it using the below query. This table contains DOB, Email address and the credit card no which we want to mask using this feature. To create the example, see the script in Appendix A

Right click on the database name -> Tasks-> Mask Database (Preview)

This launches the Static Data Masking (Preview) wizard as shown here.

In this wizard, you can see Step1: Masking Configuration. Under ‘Masking Configuration’ we can either choose to mask all columns in the database. Ideally, we do not want to mask all columns of the database.

Below this, we can filter columns as well. We can see all tables eligible for this under the filtered columns as well…

In the filtered columns, we can do an inline search that means as soon as we write something in the text box, we get tables containing columns with those names. For example, in below image, we type email and it gives the table and that column details.

If we select all columns for this table, we can see the error for the columns which are not suitable for this data masking. We can see here that identity columns are not eligible for the static data masking.

Now, just select the columns on which we want to apply static data masking. We can see that for all the selected columns, it shows the actions as ‘shuffle’ and there is a configure option against each column selected.

There are 5 masking functions available.

  1. Null: it replaces the data for that particular column with NULL values
  2. Single Value masking: in this masking, we specify a single value and that value will be copied to all data rows for a particular column
  3. Shuffle: In this masking function, values are shuffled to new rows
  4. Group Shuffle: in this shuffle, it binds several columns together in a shuffling group
  5. String Composite: we define string format and values will be replaced as per the specified string

By default, it selects for the shuffle masking function. We can change it from the drop-down list. Let us change it as below for our data

For the string composite, we need to define the string format. To do so, click on the configure and it opens up the pop-up screen.

Let us define the string and click on ‘OK’. I will leave an extra ‘\’ here to see how it behaves during the static data masking activity.

In step 2, we can choose the save the backup location for the clone .bak file. By default, it is the default location set up at the instance level, however, we can change the location here as per our requirement.

Next step is to specify the name of the masked database in the text box shown below.

Let us click on to start the static data masking process. We get the below error message

Let me explain the error message.

It asks to remove the below artifacts manually.

  • Partially masked database ‘StaticDemo_Masked’


  • Backup file used for cloning operation location at ‘…’

I generated this error intentionally to give you more understanding. I have performed this static data masking for this database prior as well for testing purpose, so it asks us to remove that masked databases along with the backup file used for cloning operation manually.

  • Long escape character ‘\’ at the end:

This is due to the ‘\’ placed during the string masking function for the DOB column. Therefore, go to configure option against the DOB column and remove the ‘\’ at the end.

After removing the ‘\’ character, we can see sample value as well below the pattern.

Let go back and click on Ok to apply the static data masking. We can see the message ‘Masking complete’.

We also get a message to clean up the backup file used for the cloning operation.

Let us view the table on both the Original database and the masked database.

You can notice the difference, in the data, for the column on which we applied the static data masking functions. We can change and apply the static data masking function as per our requirements. For example, let me change the Email to below function.

And we can see the difference in the masked database.

Configuration options:

We can save the masking configuration to an XML file. This XML file can be used later to apply the masking functions directly without doing configuration again. Click on the ‘Save Config’ and provide a location along with filename to save it. We can use ‘load config’ to load an existing configuration XML file.

Masking log:

‘Static Data Masking’ creates a log file inside the document folder. We can see the log file name for that particular operation at the bottom of the configuration wizard

In the document folder, there is a folder ‘Static Data Masking’ and we can see several log file. However, each log file shows one operation of the data masking activity.

Open the log file and we can see the detailed progress of the operation. In the screenshot, we can see high-level operation below.

  • Begin database clone
  • Backup Original database
  • Create a new database from the database backup.
  • Fetch and adjust Database schema, column lengths
  • Applies masking for the specified table and columns
  • Bring Clone database online

Note:

We cannot revert up to the original state in the masked static database, however, we do not apply changes in the original database.

Conclusion:

Static data masking is a useful feature for the database administrator to protect the sensitive information in the database. We can easily mask databases and use that copy to restore the database on the other database environments.

Appendix A

Rajendra Gupta
Latest posts by Rajendra Gupta (see all)
168 Views