Prashanth Jayaram

SQL Server DB Migration – Cloning a database to another collation

September 19, 2017 by

Database migration is a vital task in any environment, complex or otherwise. Seamless migrations are the goal but the efforts required to ensure it are tremendous.

Backing up and restoring the database is surely a preferred and a robust approach, but does it work well in all situations? How do we plan this when the source and the destination databases need different configurations? How do we make such a migration seamless?

I recently worked on a database migration project, which needed a collation setting of case sensitivity that is “SQL_Latin1_General_CP1_CS_AS”, but the source setting of the database was “SQL_Latin1_General_CP1_CI_AS”.

Anyone would feel that it is fairly straightforward to restore the database and change the collation of the database to the said one. But when I dug into the database, I found there were various references being made on the columns, and that clustered index was created on many columns. Just like you cannot change the collation on a column that is being referenced by a foreign key, you cannot change the collation setting on a column that is set as the clustered index, which is the primary key column of the table.

A global SQL Server Collation Sequence update is the answer to the puzzle, but it is a multi-step operation. The first step is to identify the case sensitivity of the collation assigned to the repository database. If you find that the repository database is not case sensitive, you must create a new empty (blank) case-sensitive database to migrate the data to. You must then import the data from the old case-insensitive repository database to the new empty case-sensitive database, and then establish the new case-sensitive database as the repository database.

Background

The collation setting provides sorting rules, case sensitivity, and accent sensitivity properties of character data types such as char and varchar. They apply not only to data stored in tables, but also to all text used in SQL Server, including variable names, metadata, and temporary tables. If the collation is case sensitive, then the uppercase letters are treated differently than the lowercase letters. Objects can also be set to accent sensitive in which letters with accent characters will sort in a specific order.

Collation is by default applied to the entire SQL Server instance. Also, when we install SQL Server, it picks the collation setting from the Windows system language. If the systems language setting is set to English (US), then the collation for SQL Server would be “SQL_Latin1_General_CP1_CI_AS” by default.

Collation Names

SQL Server maintains a list of pre-defined collations that conform to the following pattern

SQL_<SortRules><[_Pref]>_<CPCodepage>_<CaseSensitivity>_<AccentSensitivity>

Each field within the collation name is separated by an underscore character (_). Values in these fields display characteristics of the collation and its sequence.

The SortRules and CPCodepage fields identify the name of the language/locale or alphabet that the collation was designed to support and the character numbering rule used when sorting terms. The CaseSensitivity and AccentSensitivity fields identify additional sorting rules to use when two letters within that language/locale or alphabet do not share the same case or accent.

To identify the case sensitivity of a selected collation, review the fifth term (CaseSensitivity) in the collation name. This term can contain one of the following values:

  • CI : Case Insensitive
  • CS : Case Sensitive

To create a case-sensitive repository database, select the collation that supports the language/locale or alphabet used by your repository database and include the value CS in the fifth term of the collation name.

How to identify collation setting and its various levels

Oftentimes, we’re stuck at a point where we’re unable to find what collation setting is applied to a certain object. Let us look at some of the useful queries that will help you find the collation and its levels.

First of all, you need to know that collation can be set four levels

Server-level collations

  • Set during the installation of SQL Server
  • Default collation for all the databases, both system and user-defined databases

If you’re uncertain what collation has been assigned to a certain SQL Server instance, use the SERVERPROPERTY system function to find that out:


Review the fifth term of the collation name returned from the query. If the fifth term is CI, the database collation is case-insensitive

Database-level collations

  • It can be set during the creation of database using the collate clause of create or alter database SQL statement
  • Collation property of a newly-created database would be inherited from the instance, if we don’t explicitly mention the collation setting.

In this case the fifth term is CS, which means that the database collation is case-sensitive.

Column-level collation

  • It can be set during the time of the creation of the table or altered using the collate clause
  • A column is a child object of the database, so the collation is inherited from the database, unless explicitly changed.

Expression-level collation

  • Set when a SQL statement is run using the collate clause

The sort order is based on the defined collation setting

To view the all the collations supported in SQL Server, you can use the following query:


Preparing a Case-Insensitive Repository Database for Migration

The first step in preparing the case-insensitive repository database for transfer to a case-sensitive database is to run the SQL query, DBCC CHECKDB WITH DATA_PURITY. This query identifies any issues you must address before transferring the data.

  • In the Object Explorer pane, right-click the repository database objects, and then click New Query.
  • Type the following statement in the SQL Query pane.
  • Click Execute

In case of any errors, troubleshoot and fix the problems and run all of the above steps again. In this case, all of the statistics are set to zero (0), which means that there are no issues.

Create the database backup

The second step is to create a backup copy of the case-insensitive version of the repository database using the SQL Server Database Backup. We can use this backup database to recover the data if in case issues occur during or after the transfer from the case-insensitive database to the case-sensitive database

Creating a Case-sensitive Database

  • In the Object Explorer pane, right-click the Databases folder, and then click New Database.
  • Type the name of your new repository database in the Name field of the New Database dialog box.
    For example, let’s call the new database, WF_REPP
  • In the Select a page pane, click Options.
  • Open the Collation field drop-down list, and click the case-sensitive collation SQL_Latin1_General_CP1_CS_AS
  • Click OK.
  • An icon for the new repository database appears in the Databases folder in the Object Explorer pane.

Scripting objects of case in-sensitive repository – Generate and Publish Scripts Wizard

  • In Object Explorer, expand Databases, right-click a database, point to Tasks, and then click Generate Scripts. Follow the steps in the wizard to script the database objects.
  • On the Choose Objects page, select entire database and all the database objects.

  • Click Next
  • On the Set Scripting Options page, select Save to new query window
  • To specify advanced scripting options, select the Advanced button and leave the default value of collation and change script to drop and create

  • On the Summary page, review your selections. Click Previous to change your selections. Click Next to generate a script of the objects you selected. On the Save or Publish Scripts page, monitor the progress of the script generation.

  • Fix the code and check related objects such UDF and TVFs, Views, computed columns, and constraints.
  • Execute the script on the case-sensitive repository.

Identify the dependency

By now, we know how to fix all the logical objects of the database. Let’s now take a look at fixing the collation settings on the tables. It is important to understand the foreign key relationships between the objects. Use the following query to list the dependent objects, where the collation setting SQL_Latin1_General_CPI_CI_AS of the column


  • Identify Primary Key column dependency


  • The above information clearly shows that using ALTER COLUMN to modify the collation will not work until the table is rebuilt. Let’s see a demo to change the collation of Primary Key column

Here’s what we did in the last steps:

  • Script drop and re-create a foreign key constraint
  • Drop all the all foreign key constraints
  • Migrate data using SQL Server Import and Export Wizard
  • Modify the target database by re-enabling the foreign key constraints
  • Rebuild the indexes

Please note that the above steps are an option only when working on a very few tables. Doing this on a huge table would be a daunting task, not to mention the significant amount of resources that would be consumed if the data is too large.

Summary

Many times, as administrators, we test various methodologies of migration. This article is an effort to demonstrate a simple technique to move the data to a different (new) database with a different configuration while ensuring to maintain data integrity.

SQL Server supports storing objects that have different collations within a database—we saw how we can set collation at multiple levels.

The execution plan may vary depending on the type of collation used in the transact SQL statement. The behavior of SQL execution may be different results with different collation settings.

If you want to use a collation other than the default when installing SQL Server, be sure to change the collation on the Collation tab of the wizard’s Server Configuration screen.

You should try your best to get the server collation right when you install SQL Server because changing it after the installation is no small feat. You must take steps such as backing up the data, rebuilding the master database, recreating the user databases and all the objects within and importing the data into the newly created tables.

Fortunately, instead of changing the server collation, you can assign a different collation to your user databases, and you can assign a specific collation to even a character column. But as usual, better to keep in mind the inheritance, so as to keep things simple and manageable.


Prashanth Jayaram
Migration

About Prashanth Jayaram

I’m a Database technologist having 11+ years of rich, hands-on experience on Database technologies. I am Microsoft Certified Professional and backed with a Degree in Master of Computer Application. My specialty lies in designing & implementing High availability solutions and cross-platform DB Migration. The technologies currently working on are SQL Server, PowerShell, Oracle and MongoDB. View all posts by Prashanth Jayaram

168 Views