Nikola Dimitrijevic

How to handle excessive SOS_SCHEDULER_YIELD wait type values in SQL Server

June 8, 2016 by

The SQL Server SOS_SCHEDULER_YIELD is a fairly common wait type and it could indicate one of two things:

  • SQL Server CPU scheduler is utilized properly and is working efficiently

  • There is a pressure on CPU

The first thing that has to be properly understood is that the SOS_SCHEDULER_YIELD wait type, even so named, is not actually an actual wait type at all, at least not comparing to other wait types. So it is often misinterpreted despite its prevalence as a wait type in SQL Server. The first thing that has to be explained is what the SOS_SCHEDULER_YIELD wait type is, for a starting point the Microsoft online book definition will be used:

“Occurs when a task voluntarily yields the scheduler for other tasks to execute. During this wait the task is waiting for its quantum to be renewed.”

To properly understand what this means, when and why the SOS_SCHEDULER_YIELD wait type occurs, a better knowledge of SQL scheduler and its functionality is needed. So let’s take a look, in detail, at the SQL OS scheduler design and how it works

SQL Server OS (via the storage engine) is exclusively in charge for thread scheduling (as it doesn’t rely on Windows OS) and it will assign a scheduler to every CPU core in order to manage threads. So all requests made by the user will be scheduled for execution via SQL OS. SQL OS will use schedulers for that and it will create one scheduler for each CPU core. If SQL Server is using a machine with two octa core processors, SQL OS will create 16 schedulers for that instance

The SQL OS scheduler (or SOS Scheduler) consists of three components:

Processor – this is the physical or logical CPU/CPU core and it is in charge of processing the thread at rate of one at a time

Waiter list – the waiter list stores the threads that are in suspended state as they have to wait for a resource to became available. The waiter list does not impose any time limitations to contained threads and there are no parameters that can be set to limit the time a thread can spend in the waiter list. Though the timeout specified in the query execution session is takes precedence, the thread could still be canceled as a consequence of the execution timeout

Runnable queue – The runnable queue uses a strict First-In-First-Out (FIFO) order of threads. The thread that is transitioning from the waiter list or processor into a runnable queue will be moved to the last position of the queue. The only situation when FIFO order can be overridden is when the resource governor is enabled. When enabled, some different resource pool priorities can be assigned to a workload groups; High, Medium and Low priority. The thread with the higher priority assigned can override the lower priority thread in the runnable queue. Please note that the use of the resource governor can be considered a special case and it is not often seen in practice

The thread in the scheduler can have one of the three different states:

RUNNING – a thread is running on processor core. Only one thread can be active at the time for running on each core

SUSPENDED – thread calls for the resource that are not available are moved in the waiter list. It will stay in the waiter list until the resource became available

RUNNABLE – a thread that has available resources, but is moved to the runnable queue where it will wait until the CPU became available

We have two different scenarios that describe how the scheduler is working and both will be explained here

  1. Thread running through three states until completing the user request

    The running thread that is trying to access a required resource, which is not immediately available to be acquired, will became suspended and will be moved to the wait list. It will stay in this suspended state until the requested resource becomes available. Note that there is no time limit for a thread that is in the waiter list, nor limit on the number of threads. The moment when the resource becomes available for the thread, it will be moved to a runnable queue where it will wait for the processor to became available. When the processor becomes available it will complete the request upon which it will enter the sleep state again (unless it has to wait for the resource again, in which case the sequence will be repeated)

    What can be seen in image is:

    1. The SPID 53 requires a resource that is not immediately available and it will be moved to a waiter list with the appropriate wait type

    2. The SPID 72 resource is available and it will be moved from a waiter list into a runnable queue at the last position

    3. SPID 51 enters the running state as it is its turn and CPU resource is available

  2. Thread running through two states until completing the user request

    Each thread has a fixed quantum (time given to threads to use CPU free of interruptions) of 4 milliseconds assigned to it by SQL OS and it cannot be changed. Each thread is accountable to estimate when that quantum assigned to it is exhausted (via SQL OS helper routine) and, if so, will yield voluntarily to allow the next thread to get in to use its CPU time (quantum). In this case, the yielded thread will not be moved in the waiter list as it’s resource is actually available and there isn’t anything remaining to wait for, so it will be moved directly at the bottom of the runnable queue of its scheduler.

    This second scenario is exactly where the SQL OS will register thread movement from a running state into a runnable queue (runnable state) as the SOS_SCHEDULER_YIELD wait type. Since there is nothing to wait (the thread doesn’t wait for any resource, and it just voluntarily yielding to a runnable queue), SOS_SCHEDULER_YIELD will have 0 milliseconds of resource wait time. This will create the signal wait time that will show how long the thread has waited in the runnable queue to get back in the running state again

    In the example above:

    1. SPID 53 has exhausted its scheduled quantum (4 milliseconds) and is in transition to a runnable queue

    2. SPID 51 is going to a running state as processor has become available after SPID 53 voluntarily moved to a runnable queue

High SOS_SCHEDULER_YIELD causes

As it can be seen from the previous example, the SOS_SCHEDULER_YIELD wait type is the common wait type and it is the actual indicator that the thread has exhausted its quantum. If the SOS_SCHEDULER_YIELD is the prevalent wait type, it might indicate that CPU pressure is the problem, but this doesn’t necessarily mean that CPU is not powerful enough to process the user requests

Less experienced administrators are often reaching to set MAXDOP to 1 in order to troubleshoot excessive CXPACKET waits or set the CTFP number too high, and in that way create high SOS_SCHEDULER_YIELD wait type values; a direct consequence of this inappropriate troubleshooting. A common, but incorrect, perception to high SOS_SCHEDULER_YIELD values is that CPU is the bottleneck and more CPU power is needed. High SOS_SCHEDULER_YIELD wait type values indicate that the query that causing high SOS_SCHEDULER_YIELD needs more CPU power, as it could finish faster with such additional resources

Here is an example of how a single threaded query will always use one CPU core. While the core is utilized 100% causing the bottleneck in a query execution resulting in a high SOS_SCHEDULER_YIELD wait type value, the rest of the seven cores are more or less idle. This example clearly indicates that it is not the CPU that is the bottleneck nor root cause of the high SOS_SCHEDULER_YIELD waits, as it is obvious that there is abundance of CPU resources available. Instead, it is a case that the query needs more CPU power but it is limited to use only the single core for execution. So poor optimization if the SQL Server, in this case, caused the query to be executed on a single core, instead of running 6 time faster if executed in parallel on 6 cores.

Spinlock could be often indicated as one of the reasons for excessive SOS_SCHEDULER_YIELD wait type values to be shown, but this is actually wrong. For those who want to get more in depth analysis on why the spinlock is not related to excessive amount of SOS_SCHEDULER_YIELD wait type, please see this article: SOS_SCHEDULER_YIELD waits and the LOCK_HASH spinlock

So the question is when and what to troubleshoot when high SOS_SCHEDULER_YIELD wait type values are present? To answer that question, three different scenarios have to be described since the answer depends on whether the SOS_SCHEDULER_YIELD is frequent and/or is it coupled with high signal wait times

  • SOS_SCHEDULER_YIELD is frequent but with a low signal wait time

    This is a typical scenario where SOS_SCHEDULER_YIELD waits are not indicators of the problem. It just indicates that there are lot of threads that are running on the scheduler, and all threads are executing with the same priority, without a single thread that is governing CPU. In this case, something else should be investigated for the performance issues

  • SOS_SCHEDULER_YIELD is highly frequent, has a high signal wait time

    This is the scenario where CPU pressure is highly possible as it might indicate that a large number of CPU intensive queries are trying to reach the CPU resources and thus the high frequency of SOS_SCHEDULER_YIELD waits. The high wait time in this case shows that the yielded thread that was moved to a runnable queue, needed to wait a long time for the scheduler to bring it back to a running state

    Usually this requires some query tuning as it is the query itself that is causing the problem. Take a look at the execution plan for unplanned CPU intensive operations like intense data conversion, large index and/or table scans or user defined functions. Also check for dropped non clustered indexes

  • SOS_SCHEDULER_YIELD is not frequent but has a high signal wait time

    Such a scenario indicates that there are not many threads that are waiting for CPU to go to a running state, so it is the external (system) that is causing the CPU resources to be monopolized rather than some SQL Server thread. Various examples could be the reason, like a Windows application that is CPU intensive or is running with higher priority and thus doesn’t return CPU control to a SQL thread that is in the runnable queue.

    One of the reasons that is often overlooked, but should be investigated, is the power management feature on the server. When enabled, it causes fluctuation of the CPU frequency by scaling it up and down depending on how the feature will estimate the requirements for the CPU power. The frequency change, albeit quite frequent, is often not fast enough to keep the pace with SQL Server demands and thus causes the creation of the SOS_SCHEDULER_YIELD wait type. Turning off power management and allowing CPU to run at the highest speed usually is the easiest solution here

Nikola Dimitrijevic
Wait types

About Nikola Dimitrijevic

Nikola is computer freak since 1981 and an SQL enthusiast with intention to became a freak. Specialized in SQL Server auditing, compliance and performance monitoring. Military aviation devotee and hard core scale aircraft modeler. Extreme sports fan; parachutist and bungee jump instructor. Once serious, now just a free time photographer View all posts by Nikola Dimitrijevic

168 Views