The SQL Server CXPACKET wait type is one of the most misinterpreted wait stats. The CXPACKET term came from Class Exchange Packet, and in its essence, this can be described as data rows exchanged among two parallel threads that are the part of a single process. One thread is the “producer thread” and another thread is the “consumer thread”. This wait type is directly related to parallelism and it occurs in SQL Server whenever SQL Server executes a query using parallel plan.
Generally speaking, the CXPACKET wait type is normal for SQL Server and it is an indicator that SQL Server is using a parallel plan in executing a query, which is generally faster comparing to a query executed in a serialized process. When the parallel plan is used, the query is executed in multiple threads and the query can continue only when all parallel threads are completed. This mean that query will be as fast as the slowest thread.
The below diagram will be used for better understanding of the SQL Server CXPACKET wait type and will help in its interpretation.
From this diagram we can see that whenever a parallel query execution can provide benefit to SQL Server, it will create multiple threads for that statement allowing for each parallel process to produce its own subset of data. Each thread can be processed by the separate physical or logical CPU. Communication between the producer and consumer thread is performed via the producer-consumer queue, which is actually a buffer. The query operator in charge of implementing this queue is called the Exchange operator.
One or more producer threads will produce packets and send them to a buffer. That data will then be read from the buffer by the consumer threads. During this process, three different scenarios that can cause the excessive CXPACKET waits can be encountered:
The Consumer cannot read the packets because the buffer (queue) is empty – meaning that the Producer threads do not supply or supply slowly data into buffer. This mean that some Producer threads are working slowly due to waiting for a resource such as CPU, memory grants, I/O etc., or some Producer threads are simply blocked
The producer threads cannot store the packets into a buffer as buffer is full. This mean that Consumer threads cannot process the data fast enough, causing a situation where the Producer threads must wait to store the data in buffer, once the buffer gets full
Excessive parallelism for small queries, where creating the parallel plan and parallel execution could be costlier and slower than serialized plan
Uneven balance of packets across the parallel threads, could cause that some threads complete work faster than the others, and then they are waiting for other packets to complete their works
So let’s get inside of the SQL Server CXPACKET wait type to understand this process in more details. Let’s consider the ideal scenario for executing a query when a parallel plan has been used.
What we have in the above image is the example of a properly distributed load balance on each parallel thread, which is an ideal situation, as they will be executed in parallel without waiting on each other. But even in the ideal scenario, the parallel plan always has a “control thread” and it is in charge of registering CXPACKET waits. In case of the control thread the CXPACKET wait will represent the time needed for a parallel plan to be executed. It is now clear that the SQL Server CXPACKET wait type is always present in parallel execution even under an ideal scenario and that this is rather an indication of parallelism in query execution than an indication that something went wrong. As long as the CXPACKET is less than 50% of total waits, it shouldn’t be considered as a problem but rather as an indicator.
When high CXPACKET values are encountered, a possible issue, even in case when parallelism is evenly distributed, is when the cost of creating the parallel plan is higher than the cost of the serialized thread. This is often something that is overlooked and by the rule of thumb of reaching for altering of the Max Degree of Parallelism (MAXDOP), by setting it to 1 (each and every query will be processed by the single CPU core). Configuring MAXDOP settings to 1 should be the last resource used in troubleshooting excessive CXPACKET wait times.
It is important to know that SQL Server’s query optimizer is using the Cost Threshold for Parallelism (CTFP) to determine when the query should be parallelized, or in other words, when the serialized query plan cost exceeds the cost threshold for parallelism it will create a parallel query plan. The CTFP is set by default to 5, which mean that even not so expensive query plan could initiate the parallel plan to be created.
The Cost Threshold for Parallelism value is in seconds and it means that for every query for which SQL Server estimates that running time will be longer than 5 seconds, a parallel plan will be created. This default value has been set back in nineties when single core computers, slow hard drives and memory were used and for modern computers it is definitely not optimal. What was in that era executed in 5 seconds, on modern machines will be executed for a fragment of second. In general, estimating query execution in seconds is not a good approach as the query cost actually depends on CPU, memory, I/O etc., and SQL Server don’t know the speed of the CPU and how many cores/CPUs are available or speed of HDD/SSD used.
To prevent unwanted parallelism, the CTFP number could be increased and by the aforementioned rule of thumb, a minimum value of 25. Recent analysis indicates that 50 should be the optimal minimal number for modern computers. Although, finding the proper CTFP number and fine tuning it for maximum performance is something that has to be done by analyzing the query plans and available resources is the way to determine what CTFP configuration would work best for specific system. A great place to find how to properly determine the CTFP value is Tuning ‘cost threshold for parallelism’ from the Plan Cache article.
So only in situations when above mentioned resources are exhausted and CXPACKET wait time is still large, should playing with the Maximum Degree of Parallelism be considered. The MAXDOP number represents the number of CPU cores SQL Server will use for parallel query execution. The default setting for MAXDOP is 0 and it means that all CPU cores should be used for processing. With modern machines featuring 8, 12, 32, 64 or even more cores, it is not advisable to allow that single query take over all the cores.
When a high CXPACKET value is accompanied with a LATCH_XX and with PAGEIOLATCH_XX or SOS_SCHEDULER_YIELD, it is an indicator that slow/inefficient parallelism itself is the actual root cause of the performance issues. And in such a scenario if the LATCH_XX waits is ACCESS_METHODS_DATASET_PARENT or ACCESS_METHODS_SCAN_RANGE_GENERATOR class, then it is highly possible that the parallelism level is the bottleneck and the actual root cause of the query performance issue. This is a typical example when MAXDOP should be reduced.
For those who are interested in more details on how to set up MAXDOP properly for Intel, AMD and/or virtual machines there is a good article here Recommendations and guidelines for the “max degree of parallelism” configuration option in SQL Server.
All the above described have one aim, and this is to allow large queries to be executed in parallel, as they can benefit from that significantly, and ensuring small queries are run serialized as it is the more efficient approach for small queries.
Another scenario where high values of SQL Server CXPACKET wait type can occur is due to uneven distribution of data across the threads. This is a typical scenario where CXPACKET is not the problem, but that the CXPACKET value is an indicator that a problem exists. In such cases, troubleshooting should be focused on other potential problems to understand better how this scenario can create the high CXPACKET value. The following graphics will be used to illustrate the scenario.
In this particular scenario, it can be seen that Thread 1 and Thread 2 were executed and completed their processing, so now they are waiting for other threads to complete their execution. As it is shown in this particular case, thread 3 and 5 are still running. This type of thread wait is called CXPACKET wait. Due to uneven distribution of data that each thread has to process, the CXPACKET wait type can have significantly higher values sometimes. Most of the burden could be on one or two threads instead on all five, like we have in our example, so the time needed for completing will be higher. In such cases, the CXPACKET wait is again an indicator that there is something wrong, although not with parallelism itself, but rather with external resources that are causing the uneven distribution of data per thread. The source of the issue should be investigated focusing on improper indexing for example or obsolete statistics among other reasons.
It is also possible that thread has to wait for some external resources, the most common of which are:
when the thread has to share I/O resource with another database or application, which causes slower processing and requires more time to complete work
a large parallelized query that is executing for a long time where different threads have to access different databases that are stored on a different physical or logical storage of a different speed
the resources needed by some parallelized threads are blocked by ad-hoc queries executed at the same time
This is also an example where the CXPACKET wait type is just indicator that something is wrong. In such situations it is recommended to look at the associated wait types LCK_M_XX or PAGEIOLATCH_XX as well as IO_COMPLETION and ASYNC_IO_COMPLETION waits that are often accompanying previously mentioned two. Diagnosing and troubleshooting those wait types, rather than CXPACKET is something that will solve the root cause of the parallelism issues that were red-flagged via the high CXPACKET wait type value.
So to sum the things up, these are the steps that are recommended in diagnosing the cause of high CXPACKET wait stats values (before making any knee-jerk reaction and changing something on SQL Server):
Do not set MAXDOP to 1, as this is never the solution
Investigate the query and CXPACKET history to understand and determine whether it is something that occurred just once or twice, as it could be just the exception in the system that is normally working correctly
Check the indexes and statistics on tables used by the query and make sure they are up to date
Check the Cost Threshold for Parallelism (CTFP) and make sure that the value used is appropriate for your system
Check whether the CXPACKET is accompanied with a LATCH_XX (possibly with PAGEIOLATCH_XX or SOS_SCHEDULER_YIELD as well). If this is the case than the MAXDOP value should be lowered to fit your hardware
Check whether the CXPACKET is accompanied with a LCK_M_XX (usually accompanied with IO_COMPLETION and ASYNC_IO_COMPLETION). If this is the case, then parallelism is not the bottleneck. Troubleshoot those wait stats to find the root cause of the problem and solution
- SQL Server trace flags guide; from -1 to 840 - March 4, 2019
- How to handle the SQL Server WRITELOG wait type - June 13, 2018
- SQL Server performance counters (Batch Requests/sec or Transactions/sec): what to monitor and why - June 5, 2018