David Alcock

Using sp_server_diagnostics

January 19, 2017 by

Troubleshooting SQL Server is all about gathering the right evidence. Ordinarily we utilise a variety of different methods and analyse their output to look for specific areas where we would focus our diagnostic efforts. We could for example, use the results of various DMVs to look at wait statistic and resource information to help us focus our investigation in a particular area of SQL Server.

One available option that I haven’t seen commonly used is a system stored procedure introduced in SQL Server 2012 called sys.sp_server_diagnostics. This stored procedure provides a quick assessment of a SQL instance by capturing and returning health and event related information that is conveniently categorised for us.

To execute, using a regular or admin connection, run the following T-SQL command:

By default the capture process lasts for duration of five seconds before the output is returned to the results grid as displayed in the image below:

The stored procedure can also be executed in a repeated mode where a time interval can be passed and execution will repeat continuously (until cancelled) and the output will be sent to the results pane each stated interval duration:

Each returned row represents a particular category of health-related information:

System

Returns system data on items such as spinlocks, latch warnings, page faults, CPU utilisation and other performance conditions.

Resource

The resource category focuses on memory related objects including, amongst other items, available physical and virtual memory, free page information and memory low flags.

Query_Processing

Query Processing returns cumulative wait statistic information including the top preemptive and non-preemptive waits ordered by both count and wait duration.

IO_Subsystem

Returns IO information such as IO related timeouts and pending IO requests.

Events

The events category returns comprehensive event information retrieved during the execution duration.

A sixth category is present when running the stored procedure on an instance where Always On Availability Groups are implemented (component name is cleared in this example):

By using the state and state_desc columns we can very quickly see if the process has detected any potential errors in any of the categories that warrant further investigation. Whilst the health scoring calculations are a relative unknown the possible values for state are the following:

0: Unknown

1: Clean

2: Warning

3: Error

Please note that the event category has a default state value of 0.

When running the stored procedure, the data column returned uses a varchar(MAX) data type by default however we can insert the output into a table converting the column into a more readable XML type without too much overhead using code like the following:

However, there is a pretty big restriction when using this method, when using INSERT INTO the stored procedure cannot be executed in a repeated mode and the following error message is displayed:

Msg 17073, Level 16, State 1, Procedure sp_server_diagnostics, Line 1 Executing ‘sp_server_diagnostics’ stored procedure in repeat mode with ‘insert into’ clause is not allowed.

The advantage of using an XML data type though is that we can view data much easier within Management Studio and by looking into the following results obtained from one of my test instances we can get a great insight into the health of my SQL Server instance very quickly:

System

From this example I can see that my instance has not detected and fixed any bad pages, CPU utilisation is at 3% whilst the utilisation for my SQL instance is at 0%. Amongst other items, no latch warnings or spinlocks have been detected.

Resource

Resource returns memory related information for my instance. Using these results I can see how much memory is available in the machine, I can see page allocation information and it contains the memory high/low flags that you would using system objects such as the sys.dm_os_sys_memory DMV.

Query Processing

As you can see the query processing category is centred on task counts and those important wait statistics. Waits are supplied in top ten lists and are broken down into pre-emptive and non-pre-emptive types. Pre-emptive waits, also referred to as non-cooperative waits are caused outside of SQL by an interruption of a task by the Operating System. Non-pre-emptive or co-operative waits simply mean that SQL Server is managing the thread scheduling.

Each category of wait has two lists; waits ordered by count and waits ordered by wait duration. There are two very important points when looking at this type of information; first, these are cumulative values. If you are running sp_server_diagnostics in repeated mode the values are not cleared down at each specified interval and will continue to add up. This actually applies in non-repeated mode as well, the values represent the total wait counts and duration since they were last cleared either manually or by an instance restart.

Secondly there is no way to filter out what we might consider benign waits. The top two non-pre-emptive waits that I have in my instance are HADR_FILESTREAM_IOMGR_IOCOMPLETION and QDS_SHUTDOWN_QUEUE which relate to the filestream manager and the query store respectively; both of which both can safely be ignored in this instance.

IO Subsystem

Here we see information on latch timeouts and long I/O requests. For this particular category the information is surprisingly rather limited and therefore not that useful to us and we would expect to use other methods to investigate further. That said, by using the state and state description columns we can very make quickly make a decision if I/O is an area we want to look into in more detail.

Events

This is a very small subset of the information returned by the session. The part of the process is incredibly in-depth by design and it records and retrieves a lot of information. The downside of this of course is the effort involved in going through it all to find something useful.

Conclusion

Overall sp_server_diagnostics is a very useful process that provides a quick assessment of a SQL instance. Repeat mode enables us to view periodic results which can aid us when we want to investigate the on-going conditions within a SQL Server.

The information though can be returned by using other methods and often in more detail. It does have limitations on important areas like I/O and the wait statistics do not allow any filtering of waits we might want to ignore so in this case we would certainly use alternative options like DMV queries.

The key point is that it is native functionality to SQL Server and it does return useful information very quickly and therefore is a practical addition to a troubleshooting toolkit.

David Alcock
Maintenance, Performance, Stored procedures

About David Alcock

David is a SQL Server professional based in the UK and is the Director and Principal Consultant of DTA I.T. Consultancy Ltd. He specialises in the design, administration, maintenance and optimisation of SQL Server solutions as well as delivering bespoke training courses to organizations. He has over 10 years experience working with SQL Server in different roles such as DBA, Database Developer and BI specialist. He has worked as Technical Lead on numerous mission critical projects in various sectors such as local government, finance, charities and retail. David is extremely passionate about SQL Server and keeps his own blog at http://sqlclarity.blogspot.com/ where he shares his views on the Data Platform. In his spare time he loves spending time with his family, watching movies and cooking Asian cuisine! View all posts by David Alcock

168 Views