Troubleshooting SQL Server CPU

–by Ginger Keys

Recently, a client of mine contacted me in a panic because CPU was running at 99-100% over an extended period of time, which was affecting everything on the machine.

We’ve all seen it at one time or another. And there are several reasons this could be happening. It could be SQL consuming most of the CPU, or it could be something outside of SQL. As a DBA, you don’t want to spend a lot of time troubleshooting high CPU usage if SQL isn’t the culprit. If the issue is caused by processes outside of SQL, you can take a high level look around, but should probably defer that to your Sys Admin or your IT team. If it is SQL that is causing the issues though, we need to dig in deeper.

This article discusses basic CPU troubleshooting. We will take a very ‘high level’ approach to a topic that can potentially get very in-depth and complex. The goal is not only to provide immediate relief but also to help prevent future stress to your CPU caused by SQL.

Is it SQL or System Processes causing high CPU Usage

The first step is to determine if it is SQL or not causing CPU pressure. There are several ways you can make this determination.

1. Task Manager/Resource Monitor

The quick way is to open Task Manager and/or Resource Monitor, and look to see if sqlservr.exe is the culprit. Be cautious when looking at task manager to troubleshoot, as this tool records data once per second. So task manager will record spikes in performance (which is normal), instead of averaging it out over a longer time span (which is a better indicator of stress).

Blog_20151001_1

Blog_20151001_2

2. Perfmon Counters  

You can also determine if SQL is the culprit for high CPU usage by using Performance Monitor (Perfmon). Unlike task manager which records data once/second, you can adjust Perfmon to poll the data every 3-5 seconds to get a better indication of CPU stress. There are four counters you will want to look at:

    • Processor: % Privileged Time (_Total) – measures the amount of time your computer is executing I/O operations and other System services (aka Kernel mode).  If this metric is high (>20%) it’s possible that something like system backups are running, your driver is bad, or large files are being copied into or out of the system.   Either way, it’s time to get your Sys Admin involved.
    • Processor: % User Time (_Total) – measures amount of time your computer is executing user applications, including SQL.  If user time is consuming most of the time, we need to look at what is running on SQL to see what is causing the high CPU usage.
    • Process: % Privileged Time (sqlservr.exe instance) – measures the percentage of elapsed time that the process threads spent executing code in privileged (Kernel/System) mode
    • Process: % User Time (sqlservr.exe instance) – measures the percentage of elapsed time that the process threads spent executing code in user (application/SQL) mode.

Blog_20151001_3

3. Query DMVs

Another method to find out what is causing high CPU usage is to execute the following DMV:

–Get CPU Utilization % from SQL, NonSQL (System), and Idle

DECLARE @ms_ticks_now BIGINT

SELECT @ms_ticks_now = ms_ticks

FROM sys.dm_os_sys_info;

SELECT TOP 10 record_id

,dateadd(ms, – 1 * (@ms_ticks_now – [timestamp]), GetDate()) AS EventTime

,SQL_CPU_Use

,100 – SystemIdle – SQL_CPU_Use AS NonSQL_CPU_Use

,SystemIdle

FROM

(

SELECT record.value(‘(./Record/@id)[1]’, ‘int’) AS record_id

,record.value(‘(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]’, ‘int’) AS SystemIdle

,record.value(‘(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]’, ‘int’) AS SQL_CPU_Use

,TIMESTAMP

FROM

(

SELECT TIMESTAMP

,convert(XML, record) AS Record

FROM sys.dm_os_ring_buffers

WHERE ring_buffer_type = N’RING_BUFFER_SCHEDULER_MONITOR’

AND record LIKE ‘%<SystemHealth>%’

) AS x

) AS y

ORDER BY record_id DESC

(from: http://sqlblog.com/blogs/ben_nevarez/archive/2009/07/26/getting-cpu-utilization-data-from-sql-server.aspx)

The query results break down CPU usage as SQL usage, NonSQL (or System) usage, and Idle. If the SQL_CPU_Use has predominantly higher percentages than NonSQL or Idle, then we know that SQL is causing the pressure on CPU.

Blog_20151001_4

                                                                                                                                                                                

System Process causing high CPU usage

If after your preliminary investigating, you discover that SQL is not causing CPU pressure, but rather it is a system issue, the problem could be any number of things. It’s good to have a general understanding of these other issues, but as a SQL DBA, you may not be able to spend a lot of time troubleshooting these. Some system, or non-SQL issues that could cause CPU stress are:

1. Architecture

2. Virtual Environment

  • Host CPU settings
  • Host processor usage
  • vCPUs allocated among guests
  • # guests
  • # other machines on host, and how busy are they
  • Resource governor settings

3. CPU Power Settings

  • Should be set to high performance on SQL server.  However, this has had inconsistent results, and may or may not make a significant difference.

4. Hardware & Software Interrupts – these should be looked at over a period of time, (at least 5 seconds per sample over 30 minutes or more), and should also be interpreted in relation to a baseline of what’s normal for your system

  • Processor: % Privileged Time (Kernel Time + DPC + Interrupts) over 15-20% consistently is bad.  High Kernel time could indicate windows backups are happening or result from a bad driver
  • Processor: % Interrupt Time – over 5-10% consistently is a problem.  This shows the percentage of time processor spent servicing hardware interrupts.  A high rate could indicate disk or network adapter issues.
  • Processor: % DPC Time – over 5-10% consistently is a problem.  This shows percentage of time processor handles deferred procedure calls.  These are interrupts with a lower priority, and have been deferred to some kind of queue.  Could be software or hardware causing this, and might be necessary to redistribute interrupts or upgrade to faster processor.

5. Antivirus Scans – temporarily disable anti-virus software to rule this out, or locate the antivirus process in task manager/resource monitor.

6. SvcHost.exe – this is a generic process name, and includes individual services that Windows uses to perform various tasks.  There can be multiple occurrences of this process running on your computer at the same time.  

                                                                                                                                                                                

SQL causing high CPU usage

If you determine that it is definitely SQL causing the CPU stress, you first need to find the problem and correct or stop it. After you provide some immediate relief to your system, you then need to troubleshoot the issue so that you can make necessary adjustments and prevent it from happening in the future.

1. sys.dm_exec_requests

First you want to find what in SQL is monopolizing the CPU.  To do this, look at the currently executing queries along with the CPU usage.  Run this script to identify the queries using the most CPU time:

SELECT sessions.session_id

,requests.status

,requests.blocking_session_id

,requests.wait_type

,requests.wait_time / (1000.0) ‘wait_time(sec)’

,requests.cpu_time

,requests.total_elapsed_time / (1000.0) ‘total_elapsed_time(Sec)’

,Substring(sqltext.TEXT, (requests.statement_start_offset / 2) + 1,

((CASE requests.statement_end_offset

WHEN – 1 THEN Datalength(sqltext.TEXT)

ELSE requests.statement_end_offset

END – requests.statement_start_offset  ) / 2

) + 1) AS statement_text

,requests.command

,sessions.login_name

,sessions.host_name

,sessions.program_name

FROM sys.dm_exec_sessions AS sessions

INNER JOIN sys.dm_exec_requests AS requests ON requests.session_id = sessions.session_id

CROSS APPLY sys.dm_exec_sql_text(requests.sql_handle) AS sqltext

WHERE requests.session_id != @@SPID

ORDER BY requests.cpu_time DESC

 

When you identify the query consuming the most CPU time, you should be able to observe the TSQL command that SQL is processing, the user issuing the command, and determine whether it can be safely terminated or not.

2. sys.dm_os_wait_stats

This provides historical (from the time that the SQL instance started), cumulative data and records waits related to the workload of your server.  We can use this DMV to investigate the most prevalent wait types over a period of time, and specifically see CPU waits (signal time) which might indicate CPU pressure.  Since this data is cumulative, current waits might not be evident and adjacent samples should be differenced to determine actual resource and signal wait times.  It is best to track these statistics over time and establish a baseline on which to compare and look for anomalies.

3. Activity Monitor

Using Activity Monitor in SSMS displays both DMVs above in an easy to read report, but can create a performance hit to run it…maybe not a good idea if your CPU is already under extreme stress.

Blog_20151001_5

4. Stored Proc / Query

    After you have taken care of the query that was causing the CPU pressure, you can run the following to get recent CPU utilization by query.  This will show the most expensive queries that have already completed and their execution plans, so that you can check for ways to prevent them from beating up your CPU in the future.

— Get CPU utilization by query

SELECT TOP 10 st.text

,st.dbid

,st.objectid

,qs.total_worker_time

,qs.last_worker_time

,qp.query_plan

FROM sys.dm_exec_query_stats qs

CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st

CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp

ORDER BY qs.total_worker_time DESC

(from: http://sqlskills.com/blogs/glenn)

Some of the steps you might need to take in order to improve performance for these queries are:

  • Check execution plan and tune query.  For a great explanation click here https://blog.sqlrx.com/2015/03/16/introduction-to-sql-server-execution-plans/
  • Update statistics on tables called – statistics are what SQL uses to decide how to execute your query in the most efficient way.
  • Identify high I/O queries
  • Create/rebuild indexes – be sure not to create indexes without thoroughly testing first!

5. Database

    To find out if one or more of your databases is predominantly causing CPU pressure, the following query will help determine which database(s) has used the most CPU resources on your instance:

— Get CPU utilization by database

WITH DB_CPU_Stats

AS

(SELECT DatabaseID, DB_Name(DatabaseID) AS [Database Name], SUM(total_worker_time) AS [CPU_Time_Ms]

FROM sys.dm_exec_query_stats AS qs

CROSS APPLY (SELECT CONVERT(int, value) AS [DatabaseID]

FROM sys.dm_exec_plan_attributes(qs.plan_handle)

WHERE attribute = N’dbid’) AS F_DB

GROUP BY DatabaseID)

SELECT ROW_NUMBER() OVER(ORDER BY [CPU_Time_Ms] DESC) AS [CPU Rank],

[Database Name], [CPU_Time_Ms] AS [CPU Time (ms)],

CAST([CPU_Time_Ms] * 1.0 / SUM([CPU_Time_Ms]) OVER() * 100.0 AS DECIMAL(5, 2)) AS [CPU Percent]

FROM DB_CPU_Stats

WHERE DatabaseID <> 32767 — ResourceDB

ORDER BY [CPU Rank] OPTION (RECOMPILE);

(from: http://sqlskills.com/blogs/glenn)

This information should be gathered over a period of time, to determine if it is a regular occurrence. If you see that a database is consistently using most of the CPU resources, you can

  • Update statistics – to ensure SQL is executing queries in the most efficient way
  • Create/rebuild indexes – to help eliminate fragmentation and forwarded records, as these cause  SQL to work harder to find records

6. Excessive compilations/recompilations

When a query is submitted to SQL, it has to create or reuse a query plan. Creating a query plan can be very CPU intensive, so reusing them is easier on the CPU. Query plans may have to be recompiled if the object schema changes, or if the statistics change on the underlying table or database.

You can view these metrics through Perfmon using these counters:

  • SQLServer:SQL Statistics: SQL Compilations/sec
  • SQLServer:SQL Statistics: SQL Re-Compilations/sec

The ideal value for Re-compilations per second is zero.  And the ideal value for Compilations is less than 25% of SQLServer:SQL Statistics: Batch Requests/sec.

Blog_20151001_6

If your compilations or recompilations are at a higher than optimal number, consider the following:

  • Optimize for ad-hoc
  • Enable parameterization – Query plan reuse is critical for optimal performance on highly transactional systems and it is most often achieved by parameterizing your queries or stored procedures.
  • Avoid statistics based recompilations – by using KEEP PLAN hint or KEEPFIXED PLAN hint

7. SQL Server Configuration

    Making sure your SQL server is configured for optimal performance and best practices can make a huge difference in performance.  Some of the things you can look at are:
    • Max Degree of Parallelism – if you are seeing a lot of CXPACKET wait types, you may need to adjust your MAXDOP.
    • Affinity Mask – this may or may not make a big difference
    • Priority Boost (don’t enable) –this too, may or may not make a big difference
    • Max worker threads – adjusting this could help, but is beyond the scope of this article.
    • File organization on underlying disks – choice of RAID array, and the organization and placement of file types make a big difference in the performance of your server

                                                                                                                                                                                

Conclusion

Troubleshooting CPU issues can be a very painstaking process as there are multiple combinations of issues that can be causing stress. The methods outlined above provide a basic approach to troubleshooting CPU pressure. It could be hardware, software, system, SQL, web, network, and a host of other problems, as well as a complicated combination of these! As a DBA, your main concern is to find and correct any SQL related issues causing this pressure on the CPU.

If you would like assistance configuring, maintaining, or monitoring your SQL Server and instances please contact us at SQLRx@sqlrx.com. We would be happy to answer any question you have, and would be interested to hear about your SQL experiences as well!

                                                                                                                                                                                

For more information about blog posts, concepts and definitions, further explanations, or questions you may have…please contact us at SQLRx@sqlrx.com. We will be happy to help! Leave a comment and feel free to track back to us. We love to talk tech with anyone in our SQL family!

Both comments and trackbacks are currently closed.
%d bloggers like this: