SQL varchar data type deep dive

May 29, 2019 by Gauri Mahajan

In this article we’ll review the SQL varchar data type, including a basic definition and overview, how varchar(max) differs from varchar(n), UTF-8 support, collation, performance considerations and more.

Data plays a crucial part in any organization, and the attribute by which a piece of data is defined is called its data type. In simple words, a data type states what kind of data an object, variable or expression can store. As SQL developers, while creating a SQL table, we have to understand and decide what type of data each column in the table will contain. Like any other programming language, SQL supports a gamut of data types that can hold integer data, date and time data, character data etc., and it allows you to define data types of your own as well. SQL varchar is one of the best-known and most-used data types among the lot. In this article, we will walk through different facets of the varchar data type in SQL Server.

Below is the outline that we will cover in this article.

  1. Introduction to the SQL Server varchar data type in SQL Server
  2. Use of varchar for large blocks of text
  3. What is new in SQL Server 2019 preview for varchar datatype?
  4. Influence of collation on varchar SQL in SQL Server
  5. UTF-8 support with varchar in SQL Server 2019 CTP
  6. SQL Server varchar for data conversions and data display
  7. Storage and performance considerations using SQL Server varchar
  8. Impact on string length of SQL varchar with CAST and CONVERT functions

Let’s move ahead and see the aforementioned in action.

So what is varchar in SQL?

As the name suggests, varchar means character data that varies in length. Also known as Variable Character, it is a variable-length string data type that can hold numbers, letters and special characters. In Microsoft SQL Server 2008 and above, varchar(n) can store up to 8000 bytes, which corresponds to 8000 characters for single-byte character sets. SQL varchar usually takes 1 byte per character plus 2 more bytes for the length information. It is recommended to use varchar as the data type when column values vary in length and the actual data is usually much shorter than the declared capacity. Let’s switch to SSMS and see how varchar works.

The following example creates three variables (name, gender and age) with varchar as the data type and assigns a different value to each. As the result sets show, when no length is specified in a variable declaration, the default length of SQL varchar is 1, so each variable keeps only its first character and the rest of the string is truncated. The LEN() function is used to determine the number of characters stored in each varchar variable.

Default values and length of SQL varchar variables in SSMS.
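A minimal sketch of the kind of script described above. The variable values are hypothetical placeholders, not the original data from the screenshot.

```sql
-- Hypothetical values; varchar is declared without a length on purpose.
DECLARE @name   varchar = 'Alexander';
DECLARE @gender varchar = 'Male';
DECLARE @age    varchar = '42';

-- Each variable silently keeps only its first character,
-- because the default length in a declaration is 1.
SELECT @name   AS Name,   LEN(@name)   AS NameLength,
       @gender AS Gender, LEN(@gender) AS GenderLength,
       @age    AS Age,    LEN(@age)    AS AgeLength;
```

Under these assumptions, the query returns 'A', 'M' and '4', each with a length of 1.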

How is SQL varchar(max) different from varchar(n)?

There are times when SQL developers (including myself) define a varchar column without a length and subsequently fail to insert string records into the SQL table. This happens because SQL Server allocates 1 character of space by default to a varchar column that is defined without a length. In practical scenarios, varchar(n) is used to store variable-length string values; here ‘n’ denotes the string length in bytes, and it can go up to 8000. Now, let’s proceed further and see how we can store SQL varchar data with a defined string length in the column of a SQL table. The script below creates the table Demovarchar with some data in it, and the result screen shows the records of 7 employees with their departments, ages etc.

Demo table created to contain SQL Server varchar data in SSMS.
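A sketch of what such a table might look like. Only the table name Demovarchar and the LastName varchar(10) column come from the article; the other columns and all rows are hypothetical, and fewer rows are used than in the original screenshot to keep the example short.

```sql
-- Hypothetical schema: only LastName varchar(10) is taken from the article.
CREATE TABLE Demovarchar
(
    Id         int IDENTITY(1,1) PRIMARY KEY,
    FirstName  varchar(10),
    LastName   varchar(10),
    Department varchar(20),
    Age        int
);

-- Hypothetical sample rows.
INSERT INTO Demovarchar (FirstName, LastName, Department, Age)
VALUES ('Ava',  'Smith',  'Finance', 34),
       ('Noah', 'Patel',  'IT',      41),
       ('Mia',  'Garcia', 'HR',      29);

SELECT Id, FirstName, LastName, Department, Age
FROM Demovarchar;
```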

Suppose a new employee joins the organization and we, as SQL data developers, have to insert this new record into the above table using the INSERT statement. One such example is shown below.

Error encountered while inserting a new record with string length greater than the assigned length of varchar column.
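The failure can be reproduced with a sketch like the following. It uses a stand-in temp table with a hypothetical schema so that it runs on its own, and it assumes the default ANSI_WARNINGS ON session setting; the TRY...CATCH wrapper is only there to display the error text.

```sql
-- Stand-in temp table so the sketch is self-contained (hypothetical schema).
CREATE TABLE #DemovarcharSketch
(
    FirstName varchar(10),
    LastName  varchar(10)
);

BEGIN TRY
    -- 'Newton Hamilton' is 15 characters, so it does not fit in varchar(10).
    INSERT INTO #DemovarcharSketch (FirstName, LastName)
    VALUES ('Isaac', 'Newton Hamilton');
END TRY
BEGIN CATCH
    -- Shows the "String or binary data would be truncated" error.
    SELECT ERROR_NUMBER() AS ErrorNumber, ERROR_MESSAGE() AS ErrorMessage;
END CATCH;

DROP TABLE #DemovarcharSketch;
```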

Oops, SQL Server encountered an error and terminated the statement, saying that string or binary data would be truncated. This occurred because the column LastName varchar(10) can hold up to 10 characters, and here we are attempting to insert a new record with a string (‘Newton Hamilton’) that is clearly longer than 10 characters. As a quick fix, we can alter the table and increase the length of the SQL varchar column, say to varchar(50), to insert the new row. Execute the script below to ALTER the table and INSERT the new record. Additionally, you can use the LEN() and DATALENGTH() functions to determine, respectively, the number of characters and the storage size in bytes of the value stored in the varchar column.

Successfully inserted new record by changing varchar(10) data type to varchar(50) data type.
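A sketch of the fix, again on a stand-in temp table with a hypothetical schema. GO is the batch separator understood by SSMS and sqlcmd, not a T-SQL statement.

```sql
-- Stand-in temp table that starts with the too-short column.
CREATE TABLE #DemovarcharSketch
(
    FirstName varchar(10),
    LastName  varchar(10)
);
GO

-- Widen the column from varchar(10) to varchar(50).
ALTER TABLE #DemovarcharSketch ALTER COLUMN LastName varchar(50);
GO

-- The 15-character value now fits.
INSERT INTO #DemovarcharSketch (FirstName, LastName)
VALUES ('Isaac', 'Newton Hamilton');

-- LEN counts characters, DATALENGTH counts bytes; both are 15 here.
SELECT LastName,
       LEN(LastName)        AS CharCount,
       DATALENGTH(LastName) AS ByteCount
FROM #DemovarcharSketch;

DROP TABLE #DemovarcharSketch;
```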

We observed above how we can set or alter the string length of a SQL varchar column to meet business needs. However, consider a scenario where we are unsure of the size of the data that is going to be loaded into our SQL tables. In such circumstances, inspecting and altering the data type size for each and every column is not a viable choice. One option is to set a generous string length on the SQL Server varchar column, provided you have a rough estimate of how long the strings in that column are likely to be.

An important point to keep in mind is that we can declare a string length only up to varchar(8000), as this is the maximum that the SQL varchar(n) data type can hold. So when there is a chance that the string length of the varchar column might exceed 8000 bytes, using varchar(8001) or anything higher will result in an error. One short example demonstrating this fact is shown below.

Displays error when exceeding the limit of varchar(8000) datatype to anything more than 8000.
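A minimal sketch of this limit. The variable names are hypothetical, and the two declarations sit in separate batches so that the first one still runs.

```sql
-- Works: 8000 is the largest explicit length allowed for varchar(n).
DECLARE @withinLimit varchar(8000) = 'fits';
SELECT DATALENGTH(@withinLimit) AS ByteCount;
GO

-- Fails: SQL Server rejects the declaration because the size 8001
-- exceeds the maximum allowed for the type (8000).
DECLARE @overLimit varchar(8001);
```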

SQL Server 2005 got around this limitation of 8KB storage size and provided a workaround with varchar(max). It is a non-Unicode large variable-length character data type and can store a maximum of 2^31-1 bytes (2 GB) of non-Unicode characters.

When I was first introduced to the concepts of varchar(n) and varchar(max), the common question I had, like any other beginner, was: why can’t we simply declare a column as varchar(8500) or higher, given that varchar(max) takes care of storage up to 2 GB, and why are we supposed to use either varchar(<=8000) or varchar(max)? A little research gave me the answer: SQL Server stores data in pages, and each page is 8 KB in size, part of which is taken up by the page header and row offsets. If the data to be stored is less than or equal to 8000 bytes, varchar(n) or varchar(max) stores it in-row. However, if the data exceeds 8000 bytes, it is treated as a Large Object (LOB) and is stored not in-row but in separate LOB pages (LOB_DATA). The row in such a case holds only a pointer to the LOB data pages where the actual data is present, and SQL Server manages this overflow automatically. In a nutshell, if you know the data might exceed 8000 bytes, varchar(max) is the better choice of data type.

We can refer to the DMV sys.dm_db_index_physical_stats to see what kind of page allocation (IN_ROW_DATA, LOB_DATA or ROW_OVERFLOW_DATA) is performed. You can also check out this link in case you want a detailed explanation of how SQL Server applies row and page limits with both varchar(n) and varchar(max) data types.

Let’s quickly jump over to SSMS and see how we can use varchar(max). Execute the following script to insert one record whose StringCol column value is 15,000 ‘B’ characters (i.e. 15,000 bytes).
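A sketch of such a check, assuming the Demovarchar table lives in the dbo schema of the current database and that the caller has permission to query the DMV:

```sql
-- One row per index level and allocation unit type for the table.
SELECT index_type_desc,
       alloc_unit_type_desc,   -- IN_ROW_DATA, LOB_DATA or ROW_OVERFLOW_DATA
       page_count,
       record_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(),                          -- current database
         OBJECT_ID(N'dbo.Demovarchar'),    -- assumed schema and table name
         NULL,                             -- all indexes
         NULL,                             -- all partitions
         'DETAILED');                      -- scan mode that fills record_count
```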

Using SQL Server varchar feature of varchar(max)  to insert a column with 15,000 bytes value.
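A sketch of such a script on a hypothetical temp table; only the column name StringCol comes from the article. Note that REPLICATE truncates its result at 8000 bytes unless its input is cast to varchar(max) first.

```sql
-- Hypothetical temp table; only the StringCol name is from the article.
CREATE TABLE #DemovarcharMax
(
    Id        int IDENTITY(1,1),
    StringCol varchar(max)
);

-- Cast the input to varchar(max) so REPLICATE can return more than 8000 bytes.
INSERT INTO #DemovarcharMax (StringCol)
VALUES (REPLICATE(CAST('B' AS varchar(max)), 15000));

-- Both values are 15000 for this single-byte data.
SELECT Id,
       LEN(StringCol)        AS CharCount,
       DATALENGTH(StringCol) AS ByteCount
FROM #DemovarcharMax;

DROP TABLE #DemovarcharMax;
```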

One limitation of using varchar(max) is that we cannot create an index that has a varchar(max) column as a key column; instead, it is advisable to create a full-text index on that column.

A quick note to make – From here to the last leg of this article, we will mention varchar in place of varchar(n). Do NOT consider it as the varchar with default value = 1.

To learn some more interesting differences between varchar(n) and varchar(max) in SQL Server, consider going through this article, Comparing VARCHAR(max) vs VARCHAR(n) data types in SQL Server.

UTF-8 support with SQL Server 2019 CTP

Before we dig into what the SQL Server 2019 preview has to offer for SQL varchar, let’s quickly look at one more interesting data type first: nvarchar. Like SQL Server varchar [(n|max)], we have SQL nvarchar [(n|max)]; the prefix n in nvarchar denotes Unicode (national character) data, i.e. it can store both Unicode and non-Unicode characters. The key difference between varchar and nvarchar is the way they are stored: varchar is stored as regular 8-bit data (1 byte per character), whereas nvarchar is stored as UTF-16, at 2 bytes per character for most characters. For this reason, nvarchar(n) can hold up to 4000 characters and takes twice the space of SQL varchar for the same text. You can go through this link to learn more about nvarchar in SQL Server.

With the public preview of SQL Server 2019, Microsoft has announced support for UTF-8 character encoding in the existing data types (varchar and char). For those who are not aware of UTF-8, it stands for Unicode Transformation Format and is a Unicode-based encoding that supports many languages. The 8 in UTF-8 means it works with 8-bit (1-byte) code units: a character takes between 1 and 4 bytes, with the ASCII range taking a single byte. Likewise, UTF-16 works with 16-bit code units, so a character takes 2 or 4 bytes. We will limit the scope of this new SQL Server 2019 CTP enhancement to ‘SQL varchar’ only in this article.

This enhancement has the following impact in SQL Server:

  1. Improves Data compatibility

    Until SQL Server 2019 CTP, SQL varchar data type had the capacity to store only Non-Unicode data and with this preview, we can now create a varchar column to store Unicode data under UTF-8 enabled collations (_UTF8). UTF-8 is allowed in the varchar datatypes and is enabled when creating or changing an object’s collation to a collation with the UTF8 suffix. This helps in minimizing character conversion issues.

  2. Reduction in storage and performance improvements

    UTF-8 support for the varchar data type can provide substantial storage savings, depending on the character set in use. For example, with a UTF-8 enabled collation, changing a column’s data type from nvarchar(20) to varchar(20) can halve the storage requirement for text in the ASCII range, since nvarchar(20) needs up to 40 bytes and varchar(20) needs up to 20 bytes for the same string. Characters outside the ASCII range take 2 to 4 bytes each in UTF-8, so the savings shrink or disappear for such data.
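The storage difference can be checked with a sketch like the one below. It assumes SQL Server 2019 or later, where the Latin1_General_100_CI_AS_SC_UTF8 collation is available; the table variable, column names and sample strings are hypothetical.

```sql
-- Same text stored as UTF-16 (nvarchar) and as UTF-8 (varchar with a _UTF8 collation).
DECLARE @encodingDemo TABLE
(
    Utf16Col nvarchar(20),
    Utf8Col  varchar(20) COLLATE Latin1_General_100_CI_AS_SC_UTF8
);

INSERT INTO @encodingDemo (Utf16Col, Utf8Col)
VALUES (N'SQLShack', N'SQLShack'),   -- ASCII only: 16 bytes vs 8 bytes
       (N'日本語',    N'日本語');      -- CJK: 6 bytes vs 9 bytes

SELECT Utf16Col,
       DATALENGTH(Utf16Col) AS Utf16Bytes,
       DATALENGTH(Utf8Col)  AS Utf8Bytes
FROM @encodingDemo;
```

The second row is there to show that UTF-8 is not always the smaller encoding: East Asian characters take 3 bytes each in UTF-8 but only 2 in UTF-16.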

Important side note – Since this enhancement is still in preview, we can expect more progressions on this front in the near future. However, existing Unicode (UTF-16) data types (nchar, nvarchar and ntext) remain unchanged in SQL Server 2019 preview.

Collation with SQL varchar in SQL Server 2019 CTP

Collation in SQL Server defines the rules that govern case sensitivity, accent sensitivity, sorting, character types, width etc. Understanding all these properties and how they work with your data is very important. Collation can be set at the server, database, expression or column level. UTF-8 is supported through database-level or column-level collation in SQL Server 2019 CTP and is enabled when you create or change a database or column collation to a collation with the UTF8 suffix.

If you execute the below query against SQL Server 2019 CTP, you will be able to see all the UTF-8 supported collations on your instance of SQL Server using function (fn_helpcollations()).

UTF-8 collations enabled for SQL varchar in SQL Server 2019 CTP.
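A minimal sketch of such a query. The list comes back empty on versions without UTF-8 support.

```sql
-- List every collation whose name ends with the UTF8 suffix.
SELECT name, description
FROM sys.fn_helpcollations()
WHERE name LIKE '%UTF8';
```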

With SQL Server 2019 preview version, we can assign Unicode collations (UTF-8 supported) as well for SQL varchar columns using the COLLATE clause while declaring the varchar column. This way, specific collation is applied to the particular column’s data without impacting the rest of the database.

Since we are dealing with the SQL Server varchar data type in this post, let’s see how column collation works with it. Execute the code below to alter the collation of a SQL Server varchar column from its current collation to one with the _UTF8 suffix. You can read more on Database Collation from here.

Changing the Column collation to UTF8 on SQL Server varchar data type.
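A sketch of such a change, assuming SQL Server 2019 or later. The table dbo.DemoCollationSketch, its column and the two collations are hypothetical choices for illustration.

```sql
-- Hypothetical table whose varchar column starts with a non-UTF-8 collation.
CREATE TABLE dbo.DemoCollationSketch
(
    Comments varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS
);
GO

-- Switch the column to a UTF-8 enabled collation.
ALTER TABLE dbo.DemoCollationSketch
ALTER COLUMN Comments varchar(50) COLLATE Latin1_General_100_CI_AS_SC_UTF8;
GO

-- Confirm the new column collation.
SELECT name, collation_name
FROM sys.columns
WHERE object_id = OBJECT_ID(N'dbo.DemoCollationSketch');

DROP TABLE dbo.DemoCollationSketch;
```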

Role of SQL varchar in data conversions and data display

SQL Server varchar is widely used for displaying data in the desired format with the CONVERT and CAST functions. Real data comes as a mix of data types, and values have to be compatible with each other (i.e. belong to the same data type) before we can compare them. SQL Server supports both implicit and explicit conversions.

With an incessant need of formatting and displaying data in the required output, SQL varchar comes really handy. As a SQL developer myself, I find it extremely straightforward to use convert/cast with varchar data type to make assignments or transformations on data, especially for the date fields.

I am using the table FactInternetSales from the sample database AdventureWorksDW2017 to show how this feature works. You can use any table with some datetime and money/float fields for practice. The following script converts two datetime columns to SQL varchar with styles 102 and 107 to display the data in the formats yyyy.mm.dd and Mon dd, yyyy respectively. Also, the SalesAmount column, which has money as its data type, is converted to varchar with style 1, the documented money style that displays the amount with commas, as shown in the screenshot below. Additionally, say we want to see data only for the orders placed in the year 2010: the CAST function converts the datetime column to varchar, and a string comparison is performed in the WHERE clause. You can also go over SQL convert date to find more information on date conversion formats and styles.

Displaying data using SQL varchar and CAST & Convert functions.
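A sketch of such a query. The choice of OrderDate and ShipDate as the two datetime columns, the column aliases and the TOP (10) limit are assumptions; style 1 is used for the money column because it is the documented money style that adds commas.

```sql
USE AdventureWorksDW2017;
GO

SELECT TOP (10)
       SalesOrderNumber,
       CONVERT(varchar(10), OrderDate, 102)  AS OrderDateAnsi,    -- yyyy.mm.dd
       CONVERT(varchar(12), ShipDate, 107)   AS ShipDateText,     -- Mon dd, yyyy
       CONVERT(varchar(20), SalesAmount, 1)  AS SalesAmountText   -- commas every three digits
FROM dbo.FactInternetSales
-- CAST without a style yields text such as 'Dec 29 2010 12:00AM',
-- so a string comparison can pick out the year.
WHERE CAST(OrderDate AS varchar(30)) LIKE '%2010%';
```

A string comparison like this is handy for display-oriented queries, but it prevents an index on OrderDate from being used; filtering on a date range is usually the better choice for large tables.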

Impact on string length of SQL varchar with CAST and CONVERT functions

SQL Server stores long string data in the commonly used varchar data type, and it is helpful to know the expected and maximum lengths of the strings when displaying results in the UI. Copy and execute the code below, where we pass a long string to a varchar variable of unspecified length (@demovarchar) and also to another variable with a defined varchar length (@demovarcharwithcast). SQL Server uses 30 as the default length for SQL varchar when no length is specified in the CAST and CONVERT functions. In our case, even though the length of the string was 52, it returned 30 as the length, as shown in the last result output.

One important point to note here is that when an unspecified length varchar field is created, the default length of such field is 1 (shown in red color below). When varchar length is unspecified and is used with CAST or CONVERT functions, the CAST or CONVERT returns n=30 as the default string length of this conversion (marked in blue color below).

Understanding Length of strings concepts in SQL Server varchar when it is used with CAST and CONVERT functions.
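A sketch of this behaviour. The variable names follow the article, while the 52-character sample string and the varchar(60) length are hypothetical.

```sql
-- No length in the declaration: the variable holds only 1 character.
DECLARE @demovarchar varchar =
    'This is a long string for testing SQL varchar types.';

-- Explicit length: the full 52-character string is kept.
DECLARE @demovarcharwithcast varchar(60) =
    'This is a long string for testing SQL varchar types.';

SELECT LEN(@demovarchar)                            AS DeclaredWithoutLength, -- 1
       LEN(@demovarcharwithcast)                    AS DeclaredWithLength,    -- 52
       LEN(CAST(@demovarcharwithcast AS varchar))   AS CastWithoutLength,     -- 30
       LEN(CONVERT(varchar, @demovarcharwithcast))  AS ConvertWithoutLength;  -- 30
```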

Storage and performance considerations using SQL varchar

Data types like varchar, char and nvarchar are all used to store string data in SQL Server. SQL varchar stores variable string length whereas SQL char stores fixed string length. This means SQL Server varchar holds only the characters we assign to it and char holds the maximum column space regardless of the string it holds.

Because char fields have a fixed length, SQL Server can read them without consulting any variable-length offset information, so lookups against char columns can be somewhat faster than those against varchar columns. The trade-off is that char pads every value to the full declared length, which wastes disk and memory space when the data is not actually fixed-length. So char is a reasonable choice when the values are genuinely fixed in length or when disk size is not an issue.

In simple words, say we have a column with varchar(150) = ‘SQLShack’: this takes 8 bytes (SQLShack) + 2 bytes for the length information = 10 bytes in total. A column with char(150) = ‘SQLShack’ consumes the whole 150 bytes on disk, regardless of what we pass as a string. The example below shows how char uses the maximum allotted space (150) to fit in the string passed and how the varchar column uses only the space it needs.

Varchar SQL vs char SQL in SQL Server.
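A minimal sketch of this comparison, using a hypothetical table variable. DATALENGTH reports the bytes of the value itself, so the 2 bytes of length information for varchar are not included in its result.

```sql
-- Same string in a fixed-length and a variable-length column.
DECLARE @storageDemo TABLE
(
    FixedCol    char(150),
    VariableCol varchar(150)
);

INSERT INTO @storageDemo (FixedCol, VariableCol)
VALUES ('SQLShack', 'SQLShack');

SELECT DATALENGTH(FixedCol)    AS FixedBytes,     -- 150: padded with trailing spaces
       DATALENGTH(VariableCol) AS VariableBytes,  -- 8: only the characters stored
       LEN(FixedCol)           AS FixedChars,     -- 8: LEN ignores trailing spaces
       LEN(VariableCol)        AS VariableChars   -- 8
FROM @storageDemo;
```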

Bottom line is to use the data type that fits our need. You can use SQL varchar when the sizes of the column vary considerably, use varchar(max) when there are chances that string length might exceed 8000 bytes, use char when the sizes of the column are fixed and use nvarchar if there is a requirement to store Unicode or multilingual data.

Conclusion

Data types play a fundamental role in database design, but they are often overlooked. A good understanding and accurate use of data types ensures that data of the correct nature and length is populated in the tables. The intention of this article is to help you gain an understanding of the basic characteristics and features of SQL Server varchar along with its performance and storage aspects. We also covered recent advancements in SQL varchar in the SQL Server 2019 preview.

About Gauri Mahajan

Gauri is a SQL Server professional with 6+ years of experience working with global multinational consulting and technology organizations. She is very passionate about working on SQL Server topics like Azure SQL Database, SQL Server Reporting Services, R, Python, Power BI, Database engine, etc. She has years of experience in technical documentation and is fond of technology authoring. She has deep experience in designing data and analytics solutions and ensuring their stability, reliability, and performance. She is also certified in SQL Server and has passed certifications like 70-463: Implementing Data Warehouses with Microsoft SQL Server.
