SQL Server DB Migration - Cloning a database to another collation

Database migration is a vital task in any environment, complex or otherwise. Seamless migrations are the goal but the efforts required to ensure it are tremendous.

Backing up and restoring the database is surely a preferred and a robust approach, but does it work well in all situations? How do we plan this when the source and the destination databases need different configurations? How do we make such a migration seamless?

I recently worked on a database migration project, which needed a collation setting of case sensitivity that is “SQL_Latin1_General_CP1_CS_AS”, but the source setting of the database was “SQL_Latin1_General_CP1_CI_AS”.

Anyone would feel that it is fairly straightforward to restore the database and change the collation of the database to the said one. But when I dug into the database, I found there were various references being made on the columns, and that clustered index was created on many columns. Just like you cannot change the collation on a column that is being referenced by a foreign key, you cannot change the collation setting on a column that is set as the clustered index, which is the primary key column of the table.

A global SQL Server Collation Sequence update is the answer to the puzzle, but it is a multi-step operation. The first step is to identify the case sensitivity of the collation assigned to the repository database. If you find that the repository database is not case sensitive, you must create a new empty (blank) case-sensitive database to migrate the data to. You must then import the data from the old case-insensitive repository database to the new empty case-sensitive database, and then establish the new case-sensitive database as the repository database.

Background

The collation setting provides sorting rules, case sensitivity, and accent sensitivity properties of character data types such as char and varchar. They apply not only to data stored in tables, but also to all text used in SQL Server, including variable names, metadata, and temporary tables. If the collation is case sensitive, then the uppercase letters are treated differently than the lowercase letters. Objects can also be set to accent sensitive in which letters with accent characters will sort in a specific order.

Collation is by default applied to the entire SQL Server instance. Also, when we install SQL Server, it picks the collation setting from the Windows system language. If the systems language setting is set to English (US), then the collation for SQL Server would be “SQL_Latin1_General_CP1_CI_AS” by default.

Collation Names

SQL Server maintains a list of pre-defined collations that conform to the following pattern

SQL_<SortRules><[_Pref]>_<CPCodepage>_<CaseSensitivity>_<AccentSensitivity>

Each field within the collation name is separated by an underscore character (_). Values in these fields display characteristics of the collation and its sequence.

The SortRules and CPCodepage fields identify the name of the language/locale or alphabet that the collation was designed to support and the character numbering rule used when sorting terms. The CaseSensitivity and AccentSensitivity fields identify additional sorting rules to use when two letters within that language/locale or alphabet do not share the same case or accent.

To identify the case sensitivity of a selected collation, review the fifth term (CaseSensitivity) in the collation name. This term can contain one of the following values:

CI : Case Insensitive
CS : Case Sensitive

To create a case-sensitive repository database, select the collation that supports the language/locale or alphabet used by your repository database and include the value CS in the fifth term of the collation name.

How to identify collation setting and its various levels

Oftentimes, we’re stuck at a point where we’re unable to find what collation setting is applied to a certain object. Let us look at some of the useful queries that will help you find the collation and its levels.

First of all, you need to know that collation can be set four levels

Server-level collations

Set during the installation of SQL Server
Default collation for all the databases, both system and user-defined databases

If you’re uncertain what collation has been assigned to a certain SQL Server instance, use the SERVERPROPERTY system function to find that out:

SELECT CONVERT (varchar, SERVERPROPERTY('collation')) SQLServerCollation;

Review the fifth term of the collation name returned from the query. If the fifth term is CI, the database collation is case-insensitive

Database-level collations

It can be set during the creation of database using the collate clause of create or alter database SQL statement
Collation property of a newly-created database would be inherited from the instance, if we don’t explicitly mention the collation setting.

SELECT name, collation_name FROM sys.databases;

--OR

SELECT CONVERT (varchar, DATABASEPROPERTYEX('SQLShack','collation'));

In this case the fifth term is CS, which means that the database collation is case-sensitive.

Column-level collation

It can be set during the time of the creation of the table or altered using the collate clause
A column is a child object of the database, so the collation is inherited from the database, unless explicitly changed.

SELECT name, collation_name FROM sys.columns

Expression-level collation

Set when a SQL statement is run using the collate clause

SELECT name, collation_name FROM sys.columns order by name collate Latin1_General_100_CI_AS

The sort order is based on the defined collation setting

To view the all the collations supported in SQL Server, you can use the following query:

--The sys.fn_helpcollations system table function is used to view all collations supported by SQL Server 2017

SELECT name, description FROM sys.fn_helpcollations();

Preparing a Case-Insensitive Repository Database for Migration

The first step in preparing the case-insensitive repository database for transfer to a case-sensitive database is to run the SQL query, DBCC CHECKDB WITH DATA_PURITY. This query identifies any issues you must address before transferring the data.

In the Object Explorer pane, right-click the repository database objects, and then click New Query.
Type the following statement in the SQL Query pane.

1
2
3
4
5

USE WF_REPP
GO
DBCC CHECKDB WITH DATA_PURITY
Click Execute

In case of any errors, troubleshoot and fix the problems and run all of the above steps again. In this case, all of the statistics are set to zero (0), which means that there are no issues.

Create the database backup

The second step is to create a backup copy of the case-insensitive version of the repository database using the SQL Server Database Backup. We can use this backup database to recover the data if in case issues occur during or after the transfer from the case-insensitive database to the case-sensitive database

Creating a Case-sensitive Database

In the Object Explorer pane, right-click the Databases folder, and then click New Database.
Type the name of your new repository database in the Name field of the New Database dialog box.
For example, let’s call the new database, WF_REPP
In the Select a page pane, click Options.
Open the Collation field drop-down list, and click the case-sensitive collation SQL_Latin1_General_CP1_CS_AS
Click OK.
An icon for the new repository database appears in the Databases folder in the Object Explorer pane.

Scripting objects of case in-sensitive repository – Generate and Publish Scripts Wizard

In Object Explorer, expand Databases, right-click a database, point to Tasks, and then click Generate Scripts. Follow the steps in the wizard to script the database objects.
On the Choose Objects page, select entire database and all the database objects.
Click Next
On the Set Scripting Options page, select Save to new query window
To specify advanced scripting options, select the Advanced button and leave the default value of collation and change script to drop and create
On the Summary page, review your selections. Click Previous to change your selections. Click Next to generate a script of the objects you selected. On the Save or Publish Scripts page, monitor the progress of the script generation.
Fix the code and check related objects such UDF and TVFs, Views, computed columns, and constraints.
Execute the script on the case-sensitive repository.

Identify the dependency

By now, we know how to fix all the logical objects of the database. Let’s now take a look at fixing the collation settings on the tables. It is important to understand the foreign key relationships between the objects. Use the following query to list the dependent objects, where the collation setting SQL_Latin1_General_CPI_CI_AS of the column

Select

ROW_NUMBER() OVER (ORDER BY object_name(rkeyid), c1.name) as ID,

object_name(rkeyid) Parent_Table,

object_name(fkeyid) Child_Table,

object_name(constid) FKey_Name,

c1.name FKey_Col,

c2.name Ref_KeyCol

From

sys.sysforeignkeys s

INNER JOIN sys.syscolumns c1

on ( s.fkeyid = c1.id AND s.fkey = c1.colid )

INNER JOIN syscolumns c2

on ( s.rkeyid = c2.id AND s.rkey = c2.colid )

WHERE

c1.collation='SQL_Latin1_General_CP1_CI_AS'

Order by

Parent_Table,Child_Table

Identify Primary Key column dependency

SELECT si.name AS PrimaryKey,

OBJECT_NAME(ic.object_id) TableName,

COL_NAME(ic.object_id,ic.column_id) ColumnName,

sc.collation_name Collation

FROM sys.indexes AS si INNER JOIN

sys.index_columns AS ic ON si.object_id = ic.object_id AND

si.index_id = ic.index_id INNER JOIN

sys.columns sc ON ic.object_id=sc.object_id AND

sc.column_id=ic.column_id

WHERE si.is_primary_key=1

AND sc.collation_name like 'SQL_Latin1_General_CP1_CI_AS'

The above information clearly shows that using ALTER COLUMN to modify the collation will not work until the table is rebuilt. Let’s see a demo to change the collation of Primary Key column

--Step 1: Create the table with CS (Case-Sensitive) collation

CREATE TABLE [dbo].[SQLShackDemo](

[customer_id] [varchar](20) COLLATE SQL_Latin1_General_CP1_CS_AS NOT NULL,

[customer_name] [varchar](20) NOT NULL,

[phone_number] [int] NULL,

[sales_total] [int] NULL,

CONSTRAINT [PK_CustomerID] PRIMARY KEY CLUSTERED

(

[customer_id] ASC

))

--Step 2: Insert the Dummy data

Insert into SQLShackDemo values(1,'Prashanth',168082,1000),(2,'Jayaram',68082,1500),(3,'thanVitha',68083,2500)

SELECT * from [SQLShackDemo]

--Step 3: Drop the PK constraint

ALTER TABLE [SQLShackDemo] DROP CONSTRAINT [PK_CustomerID]

--Step 4: Alter the column with the new collation setting

ALTER TABLE [<Table>] ALTER COLUMN [<Column>] <DataType(Size)> COLLATE <NewCollationType>

ALTER TABLE [SQLShackDemo] ALTER COLUMN [customer_id] [varchar](20) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL

--Step 5: re-add the Primary Key constraint

ALTER TABLE [SQLShackDemo] ADD CONSTRAINT [PK_CustomerID] PRIMARY KEY (customer_id)

SELECT * from [SQLShackDemo]

Here’s what we did in the last steps:

Script drop and re-create a foreign key constraint
Drop all the all foreign key constraints
Migrate data using SQL Server Import and Export Wizard
Modify the target database by re-enabling the foreign key constraints
Rebuild the indexes

Please note that the above steps are an option only when working on a very few tables. Doing this on a huge table would be a daunting task, not to mention the significant amount of resources that would be consumed if the data is too large.

Summary

Many times, as administrators, we test various methodologies of migration. This article is an effort to demonstrate a simple technique to move the data to a different (new) database with a different configuration while ensuring to maintain data integrity.

SQL Server supports storing objects that have different collations within a database—we saw how we can set collation at multiple levels.

The execution plan may vary depending on the type of collation used in the transact SQL statement. The behavior of SQL execution may be different results with different collation settings.

If you want to use a collation other than the default when installing SQL Server, be sure to change the collation on the Collation tab of the wizard’s Server Configuration screen.

You should try your best to get the server collation right when you install SQL Server because changing it after the installation is no small feat. You must take steps such as backing up the data, rebuilding the master database, recreating the user databases and all the objects within and importing the data into the newly created tables.

Fortunately, instead of changing the server collation, you can assign a different collation to your user databases, and you can assign a specific collation to even a character column. But as usual, better to keep in mind the inheritance, so as to keep things simple and manageable.

Author
Recent Posts

Prashanth Jayaram

I’m a Database technologist having 11+ years of rich, hands-on experience on Database technologies. I am Microsoft Certified Professional and backed with a Degree in Master of Computer Application.

My specialty lies in designing & implementing High availability solutions and cross-platform DB Migration. The technologies currently working on are SQL Server, PowerShell, Oracle and MongoDB.

View all posts by Prashanth Jayaram

Latest posts by Prashanth Jayaram (see all)

Stairway to SQL essentials - April 7, 2021
A quick overview of database audit in SQL - January 28, 2021
How to set up Azure Data Sync between Azure SQL databases and on-premises SQL Server - January 20, 2021