Comparing VARCHAR(max) vs VARCHAR(n) data types in SQL Server

I have seen that SQL developers use varchar(max) data while designing the tables or temporary tables. We might not be sure about the data length, or we want to eliminate the string or binary truncation error.

Is it a good practice to use varchar(max) for each usage?

We can define a specific range for the varchar (n) data type, and it is the recommended way to do so. In order to gain understanding about this data type, read SQL varchar(n) article.

We will discuss the use of varchar max and its implications, comparison with the varchar (n) data type in this article.

Overview of the VARCHAR(max) SQL Server Data Type

The SQL Server 2005 introduced this varchar(max) data type. It replaces the large blob object Text, NText and Image data types. All these data types can store data up to 2 GB. As you might be aware that the basic unit of storage in SQL Server is a page. The page size is 8 KB (8192 byes) in SQL Server, and it is fixed. On a page, SQL Server uses 96 bytes for the page header. We can store 8096 bytes ( 8192-96 bytes) for data in SQL Server. Apart from this, page also contains row overhead and row offset and leaves 8000 bytes to use for data storage. Due to this, we can store up to 8000 bytes of data using varchar (8000) data type.

You might think of using the varchar(max) data type to store 2 GB data to resolve the string truncation issues.

Let’s create a few sample tables with different size in varchar data type. We will also create a table with a varchar(max) data type.

CREATE TABLE dbo.Employee_varchar_2000

(id INT IDENTITY PRIMARY KEY,

Col1 VARCHAR(2000)

);

CREATE TABLE dbo.Employee_Varchar_4500

(id INT IDENTITY PRIMARY KEY,

Col1 VARCHAR(4500)

);

CREATE TABLE dbo.Employee_Varchar_8000

(id INT IDENTITY PRIMARY KEY,

Col1 VARCHAR(8000)

);

CREATE TABLE dbo.Employee_Varchar_Max

(id INT IDENTITY PRIMARY KEY,

Col1 VARCHAR(MAX)

);

Let’s insert records into these sample tables using the following queries.

INSERT INTO Employee_varchar_2000 (Col1)

SELECT REPLICATE('A', 2000);

INSERT INTO Employee_varchar_4500 (Col1)

SELECT REPLICATE('A', 4500);

INSERT INTO Employee_varchar_8000 (Col1)

SELECT REPLICATE('A', 8000);

INSERT INTO Employee_varchar_max (Col1)

SELECT REPLICATE('A', 8000);

We can verify the data length in these tables using the following queries.

Use SQLShackDemo

SELECT LEN(col1) AS columnlength

FROM Employee_varchar_2000;

SELECT LEN(col1) AS columnlength

FROM Employee_varchar_4500;

SELECT LEN(col1) AS columnlength

FROM Employee_varchar_8000;

SELECT LEN(col1) AS columnlength

FROM Employee_varchar_max;

In the following screenshot, we can verify the data length is similar to existing table column length.

Now, we can check the object statistics like page count, row count, and allocation unit using the DMV sys.dm_db_index_physical_stats.

SELECT OBJECT_NAME([object_id]) AS TableName,

alloc_unit_type_desc,

record_count,

page_count,

round(avg_page_space_used_in_percent,0) as avg_page_space_used_in_percent ,

min_record_size_in_bytes,

max_record_size_in_bytes

FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'DETAILED')

WHERE OBJECT_NAME([object_id]) LIKE 'Employee_varchar%';

We can see that all tables contains the allocation unit IN_ROW_Data. SQL Server stores all data in the IN_ROW_Data allocation unit.

We cannot insert more than 8000 bytes data in the varchar(n) data type. If we try to do so , we get the following error message.

INSERT INTO Employee_varchar_8000 (Col1)

SELECT REPLICATE('A', 8001);

INSERT INTO Employee_varchar_8000 (Col1)

SELECT REPLICATE('A', 10000);

It inserts the data successfully but truncates the values to 8000 characters. Similar truncation occurs for the Employee_varchar_max table containing the varchar(max) data type.

We need to cast the value to varchar(max) and insert for the length above 8000 characters. We get the error message while trying to insert records in Employee_varchar_8000 table.

1 2	INSERT INTO Employee_varchar_8000 (Col1) SELECT REPLICATE(CONVERT(VARCHAR(max), 'x'), 8001);

It successfully inserts records in the Employee_varchar_max table.

Rerun the query to check the allocation unit. We get the LOB_Data allocation unit to store the data more than 8000 bytes in the Employee_Varchar_Max table. We have a pointer to this data in the IN_Row_DATA allocation unit.

We can get the following conclusion from this.

SQL Server uses the IN_ROW_DATA page for the varchar(max) data type if the data is less than or equal to 8000 bytes.
If the data grows beyond the 8000 bytes, SQL Server uses LOB_DATA page for the varchar(max) data type

Performance comparison between varchar(max) and varchar(n) data type

Let’s insert 10,000 records into each of the tables we created earlier. We want to check the data insertion time. You can use the ApexSQL Generate tool to insert the data without writing the t-SQL code for it.

In the following screenshot, you can note the following.

Employee_varchar_2000 insertion time 0.08 Seconds
Employee_varchar_4500 insertion time 0.19 Seconds
Employee_varchar_8000 insertion time 0.31 Seconds
Employee_varchar_Max insertion time 2.72 Seconds

Indexes on VARCHAR(N) and VARCHAR(MAX) columns

As a DBA, you might not design the table. However, it is required to create an Index on the tables to improve the performance of the query.

We can create an index on the key column of the table holding varchar(n) data type.

CREATE INDEX IX_Employee_varchar_2000_1

ON dbo.Employee_varchar_2000(col1)

If we try to do the same for the varchar(max) data type, it gives the following error message.

CREATE INDEX IX_Employee_varchar_max

ON dbo.Employee_varchar_max(col1)

Msg 1919, Level 16, State 1, Line 23 Column ‘col1’ in table ‘dbo.Employee_varchar_max’ is of a type that is invalid for use as a key column in an index.

We can use the varchar(max) column as an included column in the index, but you cannot perform the index seek on this column. It will also require additional storage. Therefore, you should avoid creating an index with the varchar(max) data type.

Execution plan comparison

Let’s compare the execution plan of two select statements.

In the first query, we want to retrieve data from the Employee_Varchar_2000 table and get the actual execution plan.

In the actual execution plan, we can see a non-clustered index seek operator.

If we run the same query with the varchar(max) data type, it uses a clustered index scan operator, and it can be a resource-intensive operator depending upon the number of rows in the table.

select col1 from Employee_varchar_max where col1 like ‘xxxx%’

Let’s compare the execution plan using the Compare Showplan option of SSMS. To compare two execution plans, save one execution plan by right click on the plan and Save Execution Plan as and provide the location to save the plan.

In another query execution plan, right-click and choose Compare Showplan. It opens a window, and you can specify the path of the earlier saved execution plan.

In the following screenshot, you can see the comparison between both execution plans.

The estimated CPU cost is higher in the varchar(max) data type for a similar activity as of varchar(2000)
For the varchar(max) it uses clustered index scan operator and scans all records. You can see this the estimated number of rows is 10000 while in the varchar(2000) data type it uses index seek operator and estimated number of rows is 1.96078
Estimated row size 4035 B is greater than in varchar(max) compare to the 1011 B for the varchar(2000) data type

Difference between the varchar(max) and varchar(n) data type

varchar(max)	varchar(n)
We can store up to 2 GB of data in this data type	We can store up to 8000 bytes data in this data type
It uses the allocation unit IN_ROW_Data up to 8000 bytes of data. If data is more than 8000 bytes, it uses the LOB_Data page and stores its pointer in the IN_ROW_Data page	It stores data in the standard data page
We cannot create an index on the key column of the varchar(max) data type	We can create an index on this data type
We cannot compress the LOB data	We can compress data for this data type
Data retrieval and updation on the LOB data is relatively slow	We do not face such issue in the varchar(n) data type

Conclusion

In this article, we demonstrated varchar(max) data type and also explored several differences between the varchar(max) and varchar(n) data types. You should use an appropriate data type. We should consider the database design, performance, compression, indexes in mind. You should review the data types in your database and change it if required with proper testing.

Author
Recent Posts

Rajendra Gupta

Hi! I am Rajendra Gupta, Database Specialist and Architect, helping organizations implement Microsoft SQL Server, Azure, Couchbase, AWS solutions fast and efficiently, fix related issues, and Performance Tuning with over 14 years of experience.

I am the author of the book "DP-300 Administering Relational Database on Microsoft Azure". I published more than 650 technical articles on MSSQLTips, SQLShack, Quest, CodingSight, and SeveralNines.

I am the creator of one of the biggest free online collections of articles on a single topic, with his 50-part series on SQL Server Always On Availability Groups.

Based on my contribution to the SQL Server community, I have been recognized as the prestigious Best Author of the Year continuously in 2019, 2020, and 2021 (2nd Rank) at SQLShack and the MSSQLTIPS champions award in 2020.

Personal Blog: https://www.dbblogger.com
I am always interested in new challenges so if you need consulting help, reach me at rajendra.gupta16@gmail.com

View all posts by Rajendra Gupta

Latest posts by Rajendra Gupta (see all)

Understanding PostgreSQL SUBSTRING function - September 21, 2024
How to install PostgreSQL on Ubuntu - July 13, 2023
How to use the CROSSTAB function in PostgreSQL - February 17, 2023

Overview of the VARCHAR(max) SQL Server Data Type

Performance comparison between varchar(max) and varchar(n) data type

Indexes on VARCHAR(N) and VARCHAR(MAX) columns

Execution plan comparison

Difference between the varchar(max) and varchar(n) data type

Conclusion

Related posts: