Rajendra Gupta
Sample table

Comparing VARCHAR(max) vs VARCHAR(n) data types in SQL Server

July 26, 2019 by

I have seen that SQL developers use varchar(max) data while designing the tables or temporary tables. We might not be sure about the data length, or we want to eliminate the string or binary truncation error.

Is it a good practice to use varchar(max) for each usage?

We can define a specific range for the varchar (n) data type, and it is the recommended way to do so. In order to gain understanding about this data type, read SQL varchar(n) article.

We will discuss the use of varchar max and its implications, comparison with the varchar (n) data type in this article.

Overview of the VARCHAR(max) SQL Server Data Type

The SQL Server 2005 introduced this varchar(max) data type. It replaces the large blob object Text, NText and Image data types. All these data types can store data up to 2 GB. As you might be aware that the basic unit of storage in SQL Server is a page. The page size is 8 KB (8192 byes) in SQL Server, and it is fixed. On a page, SQL Server uses 96 bytes for the page header. We can store 8096 bytes ( 8192-96 bytes) for data in SQL Server. Apart from this, page also contains row overhead and row offset and leaves 8000 bytes to use for data storage. Due to this, we can store up to 8000 bytes of data using varchar (8000) data type.

You might think of using the varchar(max) data type to store 2 GB data to resolve the string truncation issues.

Let’s create a few sample tables with different size in varchar data type. We will also create a table with a varchar(max) data type.

Let’s insert records into these sample tables using the following queries.

We can verify the data length in these tables using the following queries.

In the following screenshot, we can verify the data length is similar to existing table column length.

Sample table

Now, we can check the object statistics like page count, row count, and allocation unit using the DMV sys.dm_db_index_physical_stats.

We can see that all tables contains the allocation unit IN_ROW_Data. SQL Server stores all data in the IN_ROW_Data allocation unit.

Allocation Unit for the varchar(max) data type

We cannot insert more than 8000 bytes data in the varchar(n) data type. If we try to do so , we get the following error message.

It inserts the data successfully but truncates the values to 8000 characters. Similar truncation occurs for the Employee_varchar_max table containing the varchar(max) data type.

Column data length

We need to cast the value to varchar(max) and insert for the length above 8000 characters. We get the error message while trying to insert records in Employee_varchar_8000 table.

It successfully inserts records in the Employee_varchar_max table.

String or binary data tuncation error

Rerun the query to check the allocation unit. We get the LOB_Data allocation unit to store the data more than 8000 bytes in the Employee_Varchar_Max table. We have a pointer to this data in the IN_Row_DATA allocation unit.

Allocation Unit for the varchar(max) data type

We can get the following conclusion from this.

  1. SQL Server uses the IN_ROW_DATA page for the varchar(max) data type if the data is less than or equal to 8000 bytes.
  2. If the data grows beyond the 8000 bytes, SQL Server uses LOB_DATA page for the varchar(max) data type

Performance comparison between varchar(max) and varchar(n) data type

Let’s insert 10,000 records into each of the tables we created earlier. We want to check the data insertion time. You can use the ApexSQL Generate tool to insert the data without writing the t-SQL code for it.

In the following screenshot, you can note the following.

  • Employee_varchar_2000 insertion time 0.08 Seconds
  • Employee_varchar_4500 insertion time 0.19 Seconds
  • Employee_varchar_8000 insertion time 0.31 Seconds
  • Employee_varchar_Max insertion time 2.72 Seconds

Performance comparison

Indexes on VARCHAR(N) and VARCHAR(MAX) columns

As a DBA, you might not design the table. However, it is required to create an Index on the tables to improve the performance of the query.

We can create an index on the key column of the table holding varchar(n) data type.

If we try to do the same for the varchar(max) data type, it gives the following error message.

Msg 1919, Level 16, State 1, Line 23 Column ‘col1’ in table ‘dbo.Employee_varchar_max’ is of a type that is invalid for use as a key column in an index.

We can use the varchar(max) column as an included column in the index, but you cannot perform the index seek on this column. It will also require additional storage. Therefore, you should avoid creating an index with the varchar(max) data type.

Execution plan comparison

Let’s compare the execution plan of two select statements.

In the first query, we want to retrieve data from the Employee_Varchar_2000 table and get the actual execution plan.

In the actual execution plan, we can see a non-clustered index seek operator.

Execution plan comparisons

If we run the same query with the varchar(max) data type, it uses a clustered index scan operator, and it can be a resource-intensive operator depending upon the number of rows in the table.

select col1 from Employee_varchar_max where col1 like ‘xxxx%’

Execution plan for varchar(max) data type

Let’s compare the execution plan using the Compare Showplan option of SSMS. To compare two execution plans, save one execution plan by right click on the plan and Save Execution Plan as and provide the location to save the plan.

In another query execution plan, right-click and choose Compare Showplan. It opens a window, and you can specify the path of the earlier saved execution plan.

In the following screenshot, you can see the comparison between both execution plans.

  • The estimated CPU cost is higher in the varchar(max) data type for a similar activity as of varchar(2000)
  • For the varchar(max) it uses clustered index scan operator and scans all records. You can see this the estimated number of rows is 10000 while in the varchar(2000) data type it uses index seek operator and estimated number of rows is 1.96078
  • Estimated row size 4035 B is greater than in varchar(max) compare to the 1011 B for the varchar(2000) data type

Execution plan comparison of varchar(max) and varchar(n)

Difference between the varchar(max) and varchar(n) data type

varchar(max)

varchar(n)

We can store up to 2 GB of data in this data type

We can store up to 8000 bytes data in this data type

It uses the allocation unit IN_ROW_Data up to 8000 bytes of data. If data is more than 8000 bytes, it uses the LOB_Data page and stores its pointer in the IN_ROW_Data page

It stores data in the standard data page

We cannot create an index on the key column of the varchar(max) data type

We can create an index on this data type

We cannot compress the LOB data

We can compress data for this data type

Data retrieval and updation on the LOB data is relatively slow

We do not face such issue in the varchar(n) data type

Conclusion

In this article, we demonstrated varchar(max) data type and also explored several differences between the varchar(max) and varchar(n) data types. You should use an appropriate data type. We should consider the database design, performance, compression, indexes in mind. You should review the data types in your database and change it if required with proper testing.

Rajendra Gupta
Latest posts by Rajendra Gupta (see all)
168 Views