Monthly Archives: January 2016

SQL Server 2012/2014 Extended Events for Developers – Part 1

— By Jeffry Schwartz

Microsoft introduced Extended Events, a.k.a. XEvents, in SQL Server 2008 to aid in the analysis of performance problems. Both the capabilities and the syntax changed significantly in SQL Server 2012, and new (and very important) features were added in SQL Server 2014. Therefore, this article focuses on XEvent usage with SQL Server 2012 and 2014. Specifically, it concentrates on how developers can use XEvents to evaluate the performance of their queries and stored procedures regardless of how they are invoked, e.g., from a user interface.

Creating and Starting an XEvent Session

The first action is to define and create an XEvent session. One can think of a session as a definition of the events that will be captured along with any filtering that may be appropriate for each event. Once defined, an analyst can simply start and stop the session without specifying anything else. This makes it ideal for developers in a shared environment because they can establish a tailored definition once and then use it repeatedly for arbitrary periods. The code below deletes a pre-existing session named DevPerformanceMonitoring, and establishes a new session definition.

USE master
GO
IF EXISTS(SELECT * FROM sys.server_event_sessions WHERE name = 'DevPerformanceMonitoring')
    DROP EVENT SESSION DevPerformanceMonitoring ON SERVER
GO
CREATE EVENT SESSION DevPerformanceMonitoring ON SERVER
ADD EVENT sqlserver.rpc_completed(
    SET collect_statement=(1)
    ACTION(sqlserver.database_id, sqlserver.database_name, sqlserver.sql_text, sqlserver.tsql_frame, sqlserver.tsql_stack, sqlserver.server_principal_name, sqlserver.query_hash, sqlserver.query_plan_hash)
    WHERE ([duration]>(10000) AND database_id = 9) -- SQL won't tolerate a variable in here...ugh!
),
ADD EVENT sqlserver.sp_statement_completed(
    SET collect_statement=(1), collect_object_name=(1)
    ACTION(sqlserver.database_id, sqlserver.database_name, sqlserver.sql_text, sqlserver.tsql_frame, sqlserver.tsql_stack, sqlserver.server_principal_name, sqlserver.query_hash, sqlserver.query_plan_hash)
    WHERE ([duration]>(10000) AND database_id = 9)
),
ADD EVENT sqlserver.sql_batch_completed(
    ACTION(sqlserver.database_id, sqlserver.database_name, sqlserver.sql_text, sqlserver.tsql_frame, sqlserver.tsql_stack, sqlserver.server_principal_name, sqlserver.query_hash, sqlserver.query_plan_hash)
    WHERE ([duration]>(10000) AND database_id = 9)
),
ADD EVENT sqlserver.sql_statement_completed(
    SET collect_statement=(1)
    ACTION(sqlserver.database_id, sqlserver.database_name, sqlserver.sql_text, sqlserver.tsql_frame, sqlserver.tsql_stack, sqlserver.server_principal_name, sqlserver.query_hash, sqlserver.query_plan_hash)
    WHERE ([duration]>(10000) AND database_id = 9)
)
ADD TARGET package0.event_file(SET FILENAME = N'D:\SQLRx\SQLScriptOutput\DevPerf.xel')
WITH (MAX_MEMORY=4096 KB, EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS, MAX_DISPATCH_LATENCY=30 SECONDS, MAX_EVENT_SIZE=0 KB, MEMORY_PARTITION_MODE=NONE, TRACK_CAUSALITY=OFF, STARTUP_STATE=OFF)
GO

 

This session monitors rpc_completed, sp_statement_completed, sql_batch_completed, and sql_statement_completed events whose duration exceeds ten milliseconds (10,000 microseconds) and whose database ID is 9. It stores the captured events in files in the D:\SQLRx\SQLScriptOutput directory with a DevPerf prefix and an .xel extension. Note: this directory MUST exist before the session is started or the session will fail. Note: unfortunately, as noted in the comment on the rpc_completed event (and still true as of SQL Server 2014 SP1), variables cannot be used in the WHERE clauses, which forces the repetitive filtering code shown above. To filter on a particular user login, add a predicate such as server_principal_name = '<instance>\<login>' to each WHERE clause; a minimal sketch of such a session appears below.
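For example, a single-event variant of the session with a login filter added might look like the following. This is a minimal sketch only: the session name, database ID, login, and file path are placeholders rather than values taken from the session above.

CREATE EVENT SESSION DevPerfByLogin ON SERVER
ADD EVENT sqlserver.rpc_completed(
    SET collect_statement=(1)
    ACTION(sqlserver.sql_text, sqlserver.server_principal_name)
    WHERE ([duration]>(10000) AND [sqlserver].[database_id]=(9)
           AND [sqlserver].[server_principal_name]=N'MYDOMAIN\DevLogin')  -- placeholder login
)
ADD TARGET package0.event_file(SET FILENAME=N'D:\SQLRx\SQLScriptOutput\DevPerfByLogin.xel')

Starting and stopping such a session works exactly as shown next for DevPerformanceMonitoring.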

To start this session, the user must issue the following command: ALTER EVENT SESSION DevPerformanceMonitoring ON SERVER STATE = START

To stop the session, the user must issue the following command: ALTER EVENT SESSION DevPerformanceMonitoring ON SERVER STATE = STOP
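Because a session definition persists after it is stopped, it can be handy to confirm whether the session is currently running. A quick check (a sketch; sys.server_event_sessions lists defined sessions and sys.dm_xe_sessions lists only the running ones) is:

SELECT s.name,
       CASE WHEN r.name IS NULL THEN 'Stopped' ELSE 'Running' END AS session_state
FROM sys.server_event_sessions AS s
LEFT JOIN sys.dm_xe_sessions AS r ON r.name = s.name
WHERE s.name = 'DevPerformanceMonitoring'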

The actual file names will be DevPerf_0_<large integer value>.xel, where <large integer value> is derived from the file creation time expressed as a Windows file time, i.e., the number of 100-nanosecond intervals since January 1, 1601, as shown below:

Blog_20160128_1

Obtaining Performance Data Directly from an XEvent XEL File

SSMS Method

Once the events have been captured to a file by starting and then stopping a session, the next step is to query the data. One can use two methods for this: open the XEL file directly in SSMS or read it with a T-SQL query. The first method is somewhat limited because the developer cannot easily search or further filter the data; its output is shown below. Note: times are in microseconds.

Blog_20160128_2

T-SQL Method

The second method is a bit daunting at first, but once a developer has an example from which to work, the task is reasonable and, with practice, relatively easy. The following script illustrates how to read the XEL file directly using the sys.fn_xe_file_target_read_file function and begin extracting the information, which is returned in XML form. The first query lists the event names along with their associated timestamps and XML. It uses the CROSS APPLY operator to gain access to the event_data XML column so the event's name and timestamp can be extracted. This query illustrates how someone new to the process can see what is contained within the event_data XML, and it also demonstrates how to extract single data elements from the XML.

The handling of the @timestamp node merits further discussion. Event_data timestamps are expressed in UTC, so one needs to adjust them to understand exactly when things happened. The code in the script is time zone agnostic: it automatically determines the adjustment from UTC that is correct for the current time zone.
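As a standalone illustration, both forms of the adjustment appear below (a sketch; the variable is merely a stand-in for the value extracted from event_data). The DATEPART(tz, ...) form works in minutes, so it also handles time zones with half-hour offsets:

DECLARE @UTCTimestamp datetime2 = GETUTCDATE()  -- stand-in for an event_data @timestamp value
SELECT DATEADD(hh, DATEDIFF(hh, GETUTCDATE(), CURRENT_TIMESTAMP), @UTCTimestamp) AS LocalTimeHourGranularity,
       DATEADD(mi, DATEPART(tz, SYSDATETIMEOFFSET()), @UTCTimestamp) AS LocalTimeMinuteGranularity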

The second query uses the same mechanism to summarize the number and names of all the events contained within a particular file. The output for both select statements immediately follows the script. The developer can click on the xml_event_data within SSMS and view the entire XML data set.

DECLARE @TraceDir varchar(512)
DECLARE @XeventsTraceName varchar(512) = 'DevPerf_0_130966819185620000.xel'
SET @TraceDir = 'D:\SQLRx\SQLScriptOutput\'

-- dump entire XEL file so we can see what nodes are present
SELECT event_xml.value('(./@name)', 'varchar(128)') AS EventName,
       DATEADD(hh,
               DATEDIFF(hh, GETUTCDATE(), CURRENT_TIMESTAMP),
               event_xml.value('(./@timestamp)[1]', 'datetime2')
              ) AS PostedTimestamp,
       xml_event_data
FROM
    (SELECT CAST(event_data AS XML) AS xml_event_data
     FROM sys.fn_xe_file_target_read_file(@TraceDir + @XeventsTraceName, null, null, null)
    ) AS event_table
CROSS APPLY xml_event_data.nodes('//event') EventDataNodes (event_xml);

-- dump entire XEL file summarizing by event name so we can be sure we have reported all the ones that have been captured
WITH EventDetails AS (
    SELECT event_xml.value('(./@name)', 'varchar(128)') AS EventName
    FROM
        (SELECT CAST(event_data AS XML) AS xml_event_data
         FROM sys.fn_xe_file_target_read_file(@TraceDir + @XeventsTraceName, null, null, null)
        ) AS event_table
    CROSS APPLY xml_event_data.nodes('//event') EventDataNodes (event_xml)
)
SELECT EventName, COUNT(*) AS [Count]
FROM EventDetails
GROUP BY EventName
ORDER BY EventName

 

Blog_20160128_3

Obtaining XEvent Record Details

The next script uses the same method as the previous one. The primary differences are that it extracts more data from the XML and uses a common table expression to simplify the inclusion of the events of interest and the exclusion of SQL statements that are not. The filtering is accomplished by the final WHERE clause: EventName IN (@EventNameToBeSearchedFor1, @EventNameToBeSearchedFor2, @EventNameToBeSearchedFor3, @EventNameToBeSearchedFor4) combined with SQLStatement NOT LIKE '%DefaultFileStreamFileGroup%' and SQLStatement NOT LIKE '%CollationVersion%'. The output appears below the script and lists the events along with their duration, CPU time, last row count, physical reads, logical reads, writes, and the text of the SQL statement.

The reader should note the action and data references used in the code below. Action items are generally included with every XEvent record (they are sometimes referred to as global columns), whereas data items contain information specific to an event, e.g., sp_statement_completed. To augment the script below, one must understand the difference between actions and data items.

Not all events use the same keywords, so the CASE WHEN code below handles the differences and extracts the SQL statement from the appropriate node. Another reason for using the common table expression is to make the NOT LIKE filtering simple. When the NOT LIKE operator was applied directly to the XML in a WHERE clause, the code was clumsy and it was difficult to achieve the desired results; placing the XML parsing in a common table expression and the WHERE clause outside it greatly simplified the code and enabled it to operate correctly.

DECLARE @EventNameToBeSearchedFor1 VARCHAR(40)
DECLARE @EventNameToBeSearchedFor2 VARCHAR(40)
DECLARE @EventNameToBeSearchedFor3 VARCHAR(40)
DECLARE @EventNameToBeSearchedFor4 VARCHAR(40)
DECLARE @TraceDir VARCHAR(512)
DECLARE @NumMinutesUTCDiff INT = DATEPART(tz, SYSDATETIMEOFFSET())
DECLARE @XeventsTraceName VARCHAR(512) = 'DevPerf_0_130966819185620000.xel'

SET @TraceDir = 'D:\SQLRx\SQLScriptOutput'
SET @TraceDir = @TraceDir + '\'
SET @EventNameToBeSearchedFor1 = 'sp_statement_completed'
SET @EventNameToBeSearchedFor2 = 'sql_statement_completed'
SET @EventNameToBeSearchedFor3 = 'sql_batch_completed'
SET @EventNameToBeSearchedFor4 = 'rpc_completed'

-- Parse XML from the file to show column-wise output
-- use CTE to simplify the where clause like conditions
;
WITH ParsedXMLData AS (
SELECT
    event_xml.value('(./@name)', 'varchar(128)') AS EventName,
    DATEADD(mi, @NumMinutesUTCDiff, event_xml.value('(./@timestamp)', 'datetime')) AS PostedTimestamp, -- @NumMinutesUTCDiff is expressed in minutes
    event_xml.value('(./action[@name="database_name"]/value)[1]', 'varchar(128)') AS DatabaseName,
    OBJECT_NAME(CAST(event_xml.value('(./data[@name="object_id"]/value)[1]', 'int') AS INT), CAST(event_xml.value('(./data[@name="source_database_id"]/value)[1]', 'int') AS INT)) AS ObjectName,
    event_xml.value('(./data[@name="object_type"]/text)[1]', 'varchar(25)') AS ObjectType,
    CAST(event_xml.value('(./data[@name="duration"]/value)[1]', 'bigint') AS BIGINT)/1000000.0 AS DurationSecs,
    CAST(event_xml.value('(./data[@name="cpu_time"]/value)[1]', 'bigint') AS BIGINT)/1000000.0 AS CPUTimeSecs,
    event_xml.value('(./data[@name="row_count"]/value)[1]', 'int') AS [RowCount],
    event_xml.value('(./data[@name="last_row_count"]/value)[1]', 'int') AS LastRowCount,
    event_xml.value('(./data[@name="physical_reads"]/value)[1]', 'bigint') AS PhysicalReads,
    event_xml.value('(./data[@name="logical_reads"]/value)[1]', 'bigint') AS LogicalReads,
    event_xml.value('(./data[@name="writes"]/value)[1]', 'bigint') AS Writes,
    event_xml.value('(./data[@name="nest_level"]/value)[1]', 'int') AS NestLevel,
    CASE WHEN event_xml.value('(./data[@name="statement"]/value)[1]', 'varchar(max)') IS NULL THEN
         event_xml.value('(./data[@name="batch_text"]/value)[1]', 'varchar(max)') ELSE
         event_xml.value('(./data[@name="statement"]/value)[1]', 'varchar(max)') END AS SQLStatement,
    event_xml.value('(./action[@name="sql_text"]/value)[1]', 'varchar(max)') AS SQLText,
    event_xml.value('(./data[@name="source_database_id"]/value)[1]', 'int') AS source_database_id,
    event_xml.value('(./data[@name="object_id"]/value)[1]', 'int') AS object_id,
    event_xml.value('(./data[@name="object_type"]/value)[1]', 'varchar(25)') AS object_type
FROM
    (SELECT CAST(event_data AS XML) AS xml_event_data
     FROM sys.fn_xe_file_target_read_file(@TraceDir + @XeventsTraceName, null, null, null)
    ) AS event_table
CROSS APPLY xml_event_data.nodes('//event') EventDataNodes (event_xml)
)
SELECT *
FROM ParsedXMLData
WHERE EventName IN (@EventNameToBeSearchedFor1, @EventNameToBeSearchedFor2, @EventNameToBeSearchedFor3, @EventNameToBeSearchedFor4)
AND SQLStatement NOT LIKE '%DefaultFileStreamFileGroup%'
AND SQLStatement NOT LIKE '%CollationVersion%'

 

Blog_20160128_4

Summary

This article covered how to create, start, and stop XEvent sessions using T-SQL and how to report the output using either SSMS or T-SQL. SSMS provides a convenient method for viewing the data quickly, but it is difficult to use for searches or additional filtering. T-SQL appears difficult at first because of the XQuery syntax, but once one becomes accustomed to the XML layout, obtaining specific pieces of data is reasonably easy. More advanced subjects, such as collecting query_post_execution_showplan events and connecting those plans to other completed events, will be covered in subsequent articles.

Jan 2016 Tip of the Month: sys.dm_server_memory_dumps

January 2016 Tip of the Month:  Using sys.dm_server_memory_dumps to find SQL memory dumps.

When SQL Server experiences a serious issue, it will often dump the contents of its memory to disk. This is known as a memory dump, and it generates files that are usually located in the Log folder. I often find that many SQL DBAs are unaware that SQL Server may be experiencing issues until they start seeing some scary-looking messages in their error logs. It is possible for SQL Server to have a serious issue that is not bad enough to cause it to shut down. I have also found that the memory dump files can be quite large and may need to be cleaned up periodically. If you query the dynamic management view sys.dm_server_memory_dumps, you will get a list of memory dumps along with the location of the files.

SELECT [filename], creation_time, size_in_bytes

FROM sys.dm_server_memory_dumps

Blog_20160122_1
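A slight variation of the query (a sketch) lists the newest dumps first and reports the size in megabytes, which makes periodic cleanup decisions easier:

SELECT [filename],
       creation_time,
       size_in_bytes / 1048576.0 AS size_in_mb
FROM sys.dm_server_memory_dumps
ORDER BY creation_time DESC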

If you find that SQL Server is producing memory dumps, you should be alarmed and may need to send the files to Microsoft for evaluation. You can find more on sys.dm_server_memory_dumps on MSDN at: https://msdn.microsoft.com/en-us/library/hh204543.aspx

 

Repair Suspect Database – SQL Server

— By Ginger Keys

Occasionally mistakes happen, things get deleted, or hardware fails. Your end users may be complaining that the application is not working or pulling up any data… or you may open SQL Management Studio and see that the database is simply unavailable. What happened!?

Blog_20160121_1

I have encountered a situation where an inexperienced IT employee saw that a drive was running out of space and decided to delete the database transaction log file, thinking it was only an activity log that wasn't important. On the contrary, transaction log files are what keep the data in a database transactionally consistent and durable. Unfortunately, hardware failures can also cause database files to become corrupt. When the transaction log file (.ldf) of a database is deleted or corrupted, the database becomes unusable.

In this article we will…

  1. Make a database suspect, in a test environment
  2. Diagnose the problem
  3. Resolve the problem

Create an Unusable Database

It's always a good idea to plan for and practice dealing with IT problems before they actually occur. For demonstration and practice purposes, I will show how to put a test database into a SUSPECT, or unusable, condition. For testing and practice in your environment, make sure you are using a nonessential test database on an instance that is not used for production.

1. The first step is to take a backup of the database.

2. Next, make changes to your data. You can manually insert, update, or delete records, or use one of the many scripts available to insert random data into a table. Here is one link for inserting random data:

https://blog.sqlrx.com/2015/05/06/generate-random-column-values-to-populate-a-table-for-testing/

3. After making changes to your data, stop the SQL Server service. You can do this by right-clicking on your instance in SSMS and selecting 'Stop'.

4. After the service has stopped, delete the log (.ldf) file. In Windows Explorer, go to the location where the .ldf files reside, right-click on your test database's _log.ldf file, and select Delete.

Blog_20160121_2

5. After deleting the .ldf file, restart the SQL Server service. You will now see that your database is still in the list, but it is not available.

Blog_20160121_3

What’s the Problem?

The first step is to figure out what is wrong with the database. Why is it unavailable? The SQL Server error log is a good starting point for finding out what happened to the database. Looking at the log in this example, we find that the database log file is missing:

Blog_20160121_4

To find out what state the database is in, run the following query:

SELECT name, state_desc

FROM sys.databases

WHERE state_desc <> 'online'

 The database will either show as SUSPECT, IN_RECOVERY, or RECOVERY_PENDING.

Blog_20160121_5

How to Fix it!

The best solution is to restore your database from the most recent backup. However, that is not always possible: your backup files may be corrupt or inaccessible, or you may simply not have a backup plan in place (gasp!).
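When a usable backup does exist, a restore along these lines is the preferred fix (a minimal sketch; the backup path and file name are placeholders for your own):

RESTORE DATABASE MyTestDB
FROM DISK = N'R:\Backups\MyTestDB.bak'  -- placeholder path
WITH REPLACE, RECOVERY
GO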

If your backup files are lost or not available, your last resort option is to force SQL Server to rebuild the lost transaction log file. But this means that any changes to data since the last backup will be lost.

1. To rebuild the log file, first put the database into EMERGENCY mode:

USE master

GO

ALTER DATABASE MyTestDB

SET EMERGENCY

GO

Blog_20160121_6

2. Next, put the database into SINGLE USER mode:

ALTER DATABASE MyTestDB

SET SINGLE_USER WITH ROLLBACK IMMEDIATE

GO

3. Now force a transaction log to be rebuilt for the database. This can be done with either the ALTER DATABASE REBUILD command or DBCC CHECKDB with the REPAIR_ALLOW_DATA_LOSS option.

To use the ALTER DATABASE REBUILD command, get the name of the log file and its location:

SELECT d.name AS dbname, m.name AS logicalname, m.physical_name

FROM sys.master_files m

JOIN sys.databases d ON (d.database_id = m.database_id)

WHERE m.type = 1 -- Log

ORDER BY 1, 2

 Blog_20160121_7

 Then execute the ALTER DATABASE REBUILD command with the correct file name and location:

ALTER DATABASE [MyTestDB] REBUILD LOG ON
(NAME = MyTestDB_log, FILENAME = 'R:\Program Files\Microsoft SQL Server\MSSQL11.SQL2012\MSSQL\DATA\MyTestDB_log.ldf') -- substitute the location of your log file here

GO

If you prefer not to use the ALTER DATABASE command, DBCC CHECKDB with the REPAIR_ALLOW_DATA_LOSS option will also rebuild your log file (with the same result: data loss).

 DBCC CHECKDB (MyTestDB, REPAIR_ALLOW_DATA_LOSS)

GO

 4. Finally, run this command to set the database back to MULTI USER mode:

 ALTER DATABASE MyTestDB

SET MULTI_USER

GO

 Conclusion

Once you have rebuilt your log file using either the ALTER DATABASE REBUILD option or the DBCC CHECKDB REPAIR_ALLOW_DATA_LOSS option, there is no way to go back. All of the changes made to the data since the last backup were lost when the log file was deleted or corrupted, and there is no way to recover those transactions. Because the database was not shut down cleanly, transactions that existed only in the log file were never written to disk, and without the log file SQL Server cannot run restart recovery. Other reasons a database might not shut down cleanly include power failure, hardware failure, corruption, insufficient disk space, or insufficient memory.

It is extremely important to not only have a reliable (and tested) backup/recovery plan in place, but also to know what amount of data loss is acceptable for your organization. This will help you determine your recovery mode and tlog backup frequency for each of your databases.
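As a starting point for that review, a simple query such as the following (a sketch) lists the recovery model of each database on the instance:

SELECT name, recovery_model_desc
FROM sys.databases
ORDER BY name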

For more information about blog posts, concepts and definitions, further explanations, or questions you may have…please contact us at SQLRx@sqlrx.com. We will be happy to help! Leave a comment and feel free to track back to us. Visit us at www.sqlrx.com!

SQL Server Audit Feature

— By Ginger Keys

Once upon a time there was an organization that depended on very accurate data entry rules.  For various reasons, some of the employees who were responsible for inputting and updating records were not very diligent about following rules to keep the data clean and uniform, reasoning that IT would go behind them and fix their mistakes.  True story.  Unsurprisingly, nobody claimed responsibility and the messy data continued to plague the databases.   Oh, if only the IT staff had known about and understood SQL Auditing…  There are also situations in which organizations are required by government regulations to audit changes to data.

So fast-forward to SQL 2008 and beyond, and a nice feature called SQL Server Audit is available.  If you need to know who has made changes to your database, your server, or objects, this tool will allow you to store and view the changes made.  SQL Audit is based on the extended events framework and events are recorded as soon as the permission-check for that event occurs.   Server level audits record events at the server level, such as creation or deletion of databases, logins, or linked servers, etc.  Database level audits record events at the database level, such as inserts, updates, stored procedure executions, dropping tables, etc.  Database level auditing is only available with Enterprise, Evaluation, and Developer editions.

SQL Server Audit is different from Change Data Capture (CDC) in multiple ways, including the fact that CDC does not record the user who made the changes.  CDC is not intended as an auditing tool; it provides historical information for changed data in a table, primarily to support ETL processes, and I will cover its features in another article.

The following steps demonstrate how to create a SQL Server Audit:

 1. Create the Audit

To create an audit for a database, connect to your instance through SSMS (SQL Server Management Studio) and expand the Security folder.  Right click on Audits and select New Audit…

Blog_20160114_1

There are several settings to configure for your audit.  Give your audit a Name and select an Audit destination.  I have named the audit, selected a file location, set the max rollover files to 15, and set the max file size to 10 MB, as shown below.  (You should select your max rollover files and max file size based on your environment, available disk space, etc.)

Blog_20160114_2

The output of the audit (called the target) can be saved as event logs, or as a binary file.

Blog_20160114_3

If you choose to output your audit information to a Security Log or Application Log, these targets can be retrieved from either Windows Event Viewer (in Administrative tools on your server), or in View Audit Logs in SSMS (by right clicking on your audit located in the Security > Audits folder).  Special permission is required to write to the Windows Security Log…for more information, see: https://technet.microsoft.com/en-us/library/cc645889%28v=sql.105%29.aspx

Blog_20160114_4

If you choose a File location, you must specify the location of your file.  Depending on the events you decide to audit, the target files can potentially grow very large.  So be sure to review and archive them on a regular basis, and also be sure to limit access to these files only to privileged users.
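For reference, a roughly equivalent T-SQL definition of such an audit might look like the following. This is a sketch only; the audit name and file path are placeholders, not the values used in the screenshots.

USE master
GO
CREATE SERVER AUDIT [SQLRx_Audit]
TO FILE (FILEPATH = N'R:\Testing\', MAXSIZE = 10 MB, MAX_ROLLOVER_FILES = 15)
WITH (ON_FAILURE = CONTINUE)
GO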

 2. Create the Database Audit

After creating the audit in the Security folder of your instance, go to the Database you want to audit, open the Security folder, right click on Database Audit Specifications and select New Database Audit Specification…

Blog_20160114_5

To create audit specifications for your database, create a Name, select the Audit you previously created from the drop down box, and select the Audit Action Types:

The following link provides a description of the database-level audit action groups you can choose from to specify what you want to audit in your database:  https://msdn.microsoft.com/en-us/library/cc280663.aspx

Blog_20160114_6
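The same specification can also be scripted in T-SQL (a sketch only; the specification name, audit name, and audited tables are placeholders):

USE ABCompany
GO
CREATE DATABASE AUDIT SPECIFICATION [ABCompany_AuditSpec]
FOR SERVER AUDIT [SQLRx_Audit]
ADD (UPDATE, DELETE ON OBJECT::dbo.Customers BY public),
ADD (UPDATE, DELETE ON OBJECT::dbo.Orders BY public)
WITH (STATE = OFF)
GO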

 3. Enable the Audit

After creating your Audit (under the Security folder) and Audit Specifications (under the Databases folder), you must enable both by right clicking on each, and selecting Enable

Blog_20160114_7
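The equivalent enable step in T-SQL looks like this (a sketch using the placeholder names from the earlier sketches):

USE master
GO
ALTER SERVER AUDIT [SQLRx_Audit] WITH (STATE = ON)
GO
USE ABCompany
GO
ALTER DATABASE AUDIT SPECIFICATION [ABCompany_AuditSpec] WITH (STATE = ON)
GO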

4. Verify the Audit

To test the audit, I executed both an UPDATE and DELETE statement on two different tables in my ABCompany database:

 

 USE ABCompany

GO

 

UPDATE Customers

SET contacttitle = 'Marketing Mgr'
WHERE contacttitle = 'Marketing Manager'

GO

 

DELETE FROM Orders

WHERE orderid = 10618

GO

To check the audit file to see who made what changes, we can run the following statement:

SELECT

LEFT(event_time, len(event_time) - 11) as date_time,

action_id,

succeeded,

session_id,

class_type,

session_server_principal_name,

server_principal_name,

server_instance_name,

database_name,

object_name,

statement

FROM sys.fn_get_audit_file

(

'R:\Testing\*.sqlaudit', default, default  -- substitute your file location here

)

GO

The results show the date/time, the login name, database name, object name, the statement executed, along with other data:

Blog_20160114_8

For a complete description of all the columns in the sys.fn_get_audit_file function, click here https://msdn.microsoft.com/en-us/library/cc280765.aspx

Conclusion

SQL Auditing is a valuable feature that helps DBAs enhance security and compliance by allowing organizations to log and track changes to data made by individual users and roles.  This article discussed a basic database-level audit.  For more in-depth information and to learn about server-level audits, go here: https://msdn.microsoft.com/en-us/library/cc280386.aspx

If you would like assistance configuring, maintaining, or monitoring your SQL Server and instances please contact us at SQLRx@sqlrx.com. We would be happy to answer any question you have, and would be interested to hear about your SQL experiences as well!  Visit us at www.sqlrx.com.

Disk Metrics – Why PerfMon’s, SQL Server’s Filestats, and Disk Vendors’ Values Sometimes Appear to Disagree

— By Jeffry Schwartz

Whenever multiple data sources are used, actual or perceived discrepancies can arise. Sometimes this is due to the location of the data collector, e.g., Windows, SQL Server, or the disk array. Other times, it is due to the collection rate and whether the information is a simple recording of continuously incremented data or it is an instantaneous sample. True sampling presents obvious issues because its accuracy is related directly to the sample rate. Clearly, the more often data is sampled, the more precisely it will represent actual behavior. However, this improved accuracy comes with the price of adding pressure to the disk, thereby distorting the picture of what the disk LUN is actually doing.

Disk Vendors

By default, some disk array vendors sample disk performance data infrequently unless a more detailed view is required. This is important to know because this kind of sampling can miss spikes in activity completely, e.g., SQL Server checkpoints, which can be extremely write-intensive. As a result, a disk array that performs poorly during database checkpoints, but otherwise is effectively idle, will appear to be functioning well unless one of the samples happens to coincide with a checkpoint. Although the primary difference involves the sampling rate, the fact that the disk array measurements do not include the Windows I/O stack sometimes causes disk vendors to be suspicious of any Windows I/O metrics, and they often suggest that either Windows' I/O handling or its instrumentation is faulty. The author has not observed either to be the case.

Perfmon

Blog_20160107_1

PerfMon and similar programs that rely upon Windows performance counters retrieve the averages for I/O metrics from the disk drivers themselves, i.e., from the code inside Windows that actually handles the I/Os. The "sampling" rate simply determines how often the averages are recorded (and reset for the next interval). Therefore, these are not true samples because the disk driver constantly updates its statistics and the collector periodically retrieves the averages. As a result, the disk driver will detect intermittent activities such as checkpoint behavior. The primary difficulty with these measurements is that any summarization of the data inevitably involves averaging averages. For example, assume that the collector retrieves the I/O data from the driver every 30 seconds and one needs to summarize disk behavior over a 15-minute period. In this scenario, the period average will be the average of 30 averages. To see why this can mislead, consider one 30-second interval containing a single 100-millisecond I/O and another containing 100 I/Os that average 10 milliseconds: the average of the two interval averages is 55 milliseconds, whereas the true average across all 101 I/Os is approximately 10.9 milliseconds. The statistical literature is replete with examples of situations when the averaging of averages is sufficiently accurate, as well as those when it is not. Regardless, it is easy to understand why it is preferable to the alternative of highly frequent true sampling.

SQL Server FileStats

Blog_20160107_2

SQL Server captures I/O performance data differently. It records the times at which an I/O request is initiated and completed. It is important to note that in order to complete an I/O, the thread handling the I/O must pass through the processors’ ready queue and become active on a processor so the I/O completion code can be executed. If the processors are sufficiently busy, a delay will occur while the thread waits for a processor. Although SQL Server measures these kinds of delays for an entire instance, it does not expose them via the sys.dm_io_virtual_file_stats DMV. Therefore, differences between PerfMon’s and SQL Server’s measurements can be due to processor delays or the comprehensive nature of the SQL Server data, or the averaging of averages for PerfMon data. Unfortunately, there is no way to be certain without obtaining extremely detailed data, which, as cited previously, will distort any measurements. Note: if the SQL Server I/O metrics are captured and averaged every 30 to 60 seconds, the difficulties with averaging averages will be present in this data as well. Therefore, the only way to avoid this issue is to subtract the first snapshot’s values from those of the second and compute the differences. This method was used to compute the SQL Server metrics in the table below.
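The sketch below illustrates that snapshot-difference approach against sys.dm_io_virtual_file_stats; the one-minute delay and the choice of columns are arbitrary and are not the exact method used to build the table that follows:

-- Capture a first snapshot of the cumulative file statistics
SELECT database_id, file_id, num_of_reads, num_of_writes, io_stall_read_ms, io_stall_write_ms
INTO #Snap1
FROM sys.dm_io_virtual_file_stats(NULL, NULL)

WAITFOR DELAY '00:01:00'  -- measurement interval (arbitrary)

-- Subtract the first snapshot from the second to obtain true interval averages
SELECT s2.database_id, s2.file_id,
       s2.num_of_reads  - s1.num_of_reads  AS interval_reads,
       s2.num_of_writes - s1.num_of_writes AS interval_writes,
       CASE WHEN s2.num_of_reads  > s1.num_of_reads
            THEN (s2.io_stall_read_ms  - s1.io_stall_read_ms)  * 1.0 / (s2.num_of_reads  - s1.num_of_reads)  END AS avg_read_ms,
       CASE WHEN s2.num_of_writes > s1.num_of_writes
            THEN (s2.io_stall_write_ms - s1.io_stall_write_ms) * 1.0 / (s2.num_of_writes - s1.num_of_writes) END AS avg_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS s2
JOIN #Snap1 AS s1 ON s1.database_id = s2.database_id AND s1.file_id = s2.file_id

DROP TABLE #Snap1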

Differences

A simple example, whose data appears in the table below, helps illustrate the differences and similarities between PerfMon and SQL Server. The rightmost two columns illustrate the percentage and absolute differences for each pair of metrics. The Pct Diff SQL Server vs. PerfMon column highlights the percentage differences, whereas the last column lists the absolute differences. The read, write, and I/O times are in milliseconds, so clearly, the two disk LUNs shown below were under extreme duress during the data collection period, and reads and writes performed very poorly. Good read and write times should be 20 milliseconds or less, but the best time shown in the table below is 548 milliseconds. The similarities between the two data sources are notable because all of the rate values are close, although there is more variation in the response times, in particular for writes on LUN 10 O. Closer inspection reveals that LUN 09 N’s response metrics are very close and that the 10 O LUN accounts for most of the differences. However, despite the cited differences, both data sources support the conclusion that the disk LUNs performed very poorly.

Blog_20160107_3

In summary, despite the significant differences in the origins of the PerfMon and SQL Server I/O metrics, both sets of metrics agree sufficiently and they also correctly detect I/O performance issues. Differences can result from the averaging of averages issues for PerfMon data, or processor-related delays and the comprehensive nature of the SQL Server I/O metrics. Regardless, either of these usually provides a more accurate representation of disk behavior than the infrequent sampling methods used by some disk vendors.

For more information about blog posts, concepts and definitions, further explanations, or questions you may have…please contact us at SQLRX@sqlrx.com. We will be happy to help! Leave a comment and feel free to track back to us. We love to talk tech with anyone in our SQL family!