
Query Tuning and Missing Index Recommendations

— By Jeffry Schwartz

Overview

Most analysts are familiar with missing index recommendations provided by SQL Server when query plans are displayed within SSMS or reported by various missing index DMVs. Several questions arise concerning these recommendations:

  1. What determines whether a column becomes a key column or an included column in the new index?
  2. Where does the ordering of the included columns come from? Does column ordering in queries affect the recommendations, e.g., will query orderings of A, B, C and C, B, A in the selection column list result in one recommendation or two?
  3. How sophisticated are the missing index recommendations, i.e., does SQL Server compare recommendations and perform any kind of recommendation consolidation or optimization?

This article provides a reproducible example using six different queries that deliberately caused SQL Server to generate missing index recommendations, and it answers these questions. The example is used to study missing index recommendations in detail, including how they relate to the underlying table and how query columns affect these recommendations. The article also illustrates how a single consolidated index can address the performance needs of all six queries.

Test Table Creation & Load

To determine missing index recommendation behavior, a generic table was constructed and filled with 20 million records. Each record contained an identity column, an ID column, a text column, and 47 metric columns whose values ranged between 1 and 10,000,000. The large number of table columns was used to ensure SQL Server would choose an index option when appropriate. Six queries that incorporated various column combinations were executed (some of which differed only in column ordering). To minimize duplication of column values and skewing of query plans, the ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000 formula was used to generate values that were as random as possible. Two indices were created: a clustered index that used the identity column as its only key and a second nonclustered index that used DupID as its only column. The scripts for the creation, loading, and initial indexing of the table are shown below.

-- ##############################################################

-- Create test table

-- ##############################################################

drop table FewDuplicates;

CREATE TABLE FewDuplicates (

IDCol bigint identity (20000000,1),

DupID bigint,

MyText varchar(10),

Metric01 bigint, Metric02 bigint, Metric03 bigint, Metric04 bigint, Metric05 bigint,

Metric06 bigint, Metric07 bigint, Metric08 bigint, Metric09 bigint, Metric10 bigint,

Metric11 bigint, Metric12 bigint, Metric13 bigint, Metric14 bigint, Metric15 bigint,

Metric16 bigint, Metric17 bigint, Metric18 bigint, Metric19 bigint, Metric20 bigint,

Metric21 bigint, Metric22 bigint, Metric23 bigint, Metric24 bigint, Metric25 bigint,

Metric26 bigint, Metric27 bigint, Metric28 bigint, Metric29 bigint, Metric30 bigint,

Metric31 bigint, Metric32 bigint, Metric33 bigint, Metric34 bigint, Metric35 bigint,

Metric36 bigint, Metric37 bigint, Metric38 bigint, Metric39 bigint, Metric40 bigint,

Metric41 bigint, Metric42 bigint, Metric43 bigint, Metric44 bigint, Metric45 bigint,

Metric46 bigint, Metric47 bigint

)

 

-- ##############################################################

-- Load original table

-- ##############################################################

declare @DupID bigint = 1

declare @NumRecs bigint = 20000000

 

truncate table FewDuplicates

set nocount on

while (@DupID <= @NumRecs)

begin

insert into [dbo].[FewDuplicates] (

[DupID], [MyText],

[Metric01], [Metric02], [Metric03], [Metric04], [Metric05], [Metric06], [Metric07],

[Metric08], [Metric09], [Metric10], [Metric11], [Metric12], [Metric13], [Metric14],

[Metric15], [Metric16], [Metric17], [Metric18], [Metric19], [Metric20], [Metric21],

[Metric22], [Metric23], [Metric24], [Metric25], [Metric26], [Metric27], [Metric28],

[Metric29], [Metric30], [Metric31], [Metric32], [Metric33], [Metric34], [Metric35],

[Metric36], [Metric37], [Metric38], [Metric39], [Metric40], [Metric41], [Metric42],

[Metric43], [Metric44], [Metric45], [Metric46], [Metric47]

)

VALUES (

@DupID, 'my text',

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000

)

set @DupID += 1

end -- group option loop

set nocount off

 

-- ##############################################################

-- Create indices on the test table

-- ##############################################################

CREATE UNIQUE CLUSTERED INDEX [ci_RecID] ON [dbo].[FewDuplicates]

(

[IDCol] ASC

)

WITH (fillfactor = 100, PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

ON [PRIMARY]

 

CREATE NONCLUSTERED INDEX [ix_DupID] ON [dbo].[FewDuplicates]

(

DupID ASC

)

WITH (fillfactor = 100, PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

ON [PRIMARY]

 

Queries & Execution Plans with Missing Index Recommendations

The six queries all performed a range scan based upon DupID and Metric01. Clearly, the ix_DupID index could be used for the first portion of the where clause, but none of the existing indices could assist in the resolution of the second portion of the where clause. Note: All queries used identical where clauses to ensure that the only query differences involved the columns that were requested in the select clause. These variations employed different combinations of the first six metric columns and included a variety of column orderings. All of the queries returned the same 809 rows. Note: Due to the random nature of the data, the reader will not obtain identical results, but they will be functionally similar. Each query was run separately after the following commands had been executed:

dbcc dropcleanbuffers with no_infomsgs

dbcc freeproccache with no_infomsgs

dbcc freesystemcache('TokenAndPermUserStore') with no_infomsgs

These commands clear all the pertinent caches to ensure reproducibility and prevent memory-resident portions of the database from skewing the results. As shown in Table 1, each query required approximately 90 seconds to execute and approximately one million logical and physical reads to complete. The SSMS option for returning the actual execution plan was set prior to execution, and the six queries and execution plans are shown in the individual query sections below. The first five queries resulted in different recommendations, but the sixth query’s plan and recommended missing index were identical to those of the fifth because the only difference between Queries #5 and #6 is the ordering of the columns (part of the answer to question #2). The differences among all the queries are summarized in Table 2 below.

Blog_20170601_T1

Table 1: Summary of Initial Query Executions & Timings

Blog_20170601_T2

Table 2: Summary of Query Selection Columns

Query #1

select [DupID], [MyText], [Metric01]

from FewDuplicates

where DupID between 100000 and 500000 and Metric01 between 10000 and 30000

Blog_20170601_1

Query #2

select [DupID], [MyText], [Metric02]

from FewDuplicates

where DupID between 100000 and 500000 and Metric01 between 10000 and 30000

Blog_20170601_2

Query #3

select [DupID], [MyText], [Metric03], [Metric06], [Metric04]

from FewDuplicates

where DupID between 100000 and 500000 and Metric01 between 10000 and 30000

Blog_20170601_3

Query #4

select [DupID], [MyText], [Metric04], [Metric06], [Metric02], [Metric01]

from FewDuplicates

where DupID between 100000 and 500000 and Metric01 between 10000 and 30000

Blog_20170601_4

Query #5

select [DupID], [MyText], [Metric01], [Metric03], [Metric05], [Metric02]

from FewDuplicates

where DupID between 100000 and 500000 and Metric01 between 10000 and 30000

Blog_20170601_5

Query #6

select [DupID], [MyText], [Metric05], [Metric02], [Metric01], [Metric03]

from FewDuplicates

where DupID between 100000 and 500000 and Metric01 between 10000 and 30000

Blog_20170601_6

Missing Index Recommendations Displayed via DMVs

The following query was used to provide some of the information enumerated in Table 3 below:

-- ##############################################################

-- Missing Index DMV Query

-- https://blogs.msdn.microsoft.com/bartd/2007/07/19/are-you-using-sqls-missing-index-dmvs/

-- ##############################################################

SELECT migs.avg_total_user_cost * (migs.avg_user_impact / 100.0) * (migs.user_seeks + migs.user_scans) AS improvement_measure,

'CREATE INDEX [missing_index_' + CONVERT (varchar, mig.index_group_handle) + '_' + CONVERT (varchar, mid.index_handle)

+ '_' + LEFT (PARSENAME(mid.statement, 1), 32) + ']'

+ ' ON ' + mid.statement

+ ' (' + ISNULL (mid.equality_columns, '')

+ CASE WHEN mid.equality_columns IS NOT NULL AND mid.inequality_columns IS NOT NULL THEN ',' ELSE '' END

+ ISNULL (mid.inequality_columns, '')

+ ')'

+ ISNULL (' INCLUDE (' + mid.included_columns + ')', '') AS create_index_statement,

migs.*, mid.database_id, mid.[object_id]

FROM sys.dm_db_missing_index_groups mig

INNER JOIN sys.dm_db_missing_index_group_stats migs ON migs.group_handle = mig.index_group_handle

INNER JOIN sys.dm_db_missing_index_details mid ON mig.index_handle = mid.index_handle

WHERE migs.avg_total_user_cost * (migs.avg_user_impact / 100.0) * (migs.user_seeks + migs.user_scans) > 10

ORDER BY migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans) DESC

The following table lists the missing index recommendations as displayed by the DMVs. It also summarizes all of the missing index recommendations to make it easier to determine what a consolidated index might require. It is important to note that the missing index recommendations for Queries #5 and #6 are the same, just as they were in the query plan listings above. Although discussion of the calculation and interpretation of the Improvement Measure column is beyond the scope of this article, it should be noted that the total improvement measure for that shared recommendation was approximately twice that of any other index recommendation, and further investigation revealed that this index would satisfy both Query #5 and Query #6.

Blog_20170601_T3

Table 3: Summary of All Missing Index Recommendations

Composite Index Recommendation for All Six Queries

Either Table 2 or Table 3 could be used to determine a composite index. In-depth investigation reveals that since DupID and Metric01 are specified in the where clauses as inequalities, these should be key columns in any index, and review of Table 3 confirms this for all index recommendations. Since MyText is used in all of the queries and Metric01 is already a key column, the only remaining columns are Metric02, Metric03, Metric04, Metric05, and Metric06. An index that employs these columns, along with MyText, as included columns can “cover” all the queries shown in this article. For further information about covering indices, please refer to the following article: https://docs.microsoft.com/en-us/sql/relational-databases/indexes/create-indexes-with-included-columns. The composite (and covering) index is shown below:

CREATE NONCLUSTERED INDEX ix_ResolveMultipleQueriesNeeds ON [dbo].[FewDuplicates] (

[DupID],

[Metric01]

)

INCLUDE (

[MyText],

[Metric02],

[Metric03],

[Metric04],

[Metric05],

[Metric06]

)

WITH (fillfactor = 100, PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

ON [PRIMARY]

Query Plans for Queries #1 – #6 with Composite Index Added

Once the combined index was implemented, each query was run as before: caches cleared and the SSMS actual execution plan option on. No query ran longer than one second even when the caches had been purged. Although the queries were run individually for the purposes of obtaining accurate timings, they were also run together to show the similarities of the execution plans with the new index in place. Table 4 lists the execution results, and performance clearly improved greatly for all queries: durations decreased from approximately 90 seconds to about one second, and reads dropped from approximately one million to four thousand. Examination of the query plans shows that all six queries used the new index, which accounts for the extremely fast performance.

Blog_20170601_T4

Table 4: Summary of Query Executions & Timings After Composite Index Added

Blog_20170601_7

Summary

Most readers probably already knew the answer to the first question: key columns are used to filter query results. The simplest example of this involves a where clause. Columns referenced by equalities are placed first, followed by columns that are used with inequalities. Some of the missing index DMVs actually differentiate between the types of relationships involved. The example queries show that table column ordering dictates included column order, thereby answering question #2. Since table column ordering is used to specify included column ordering, SQL Server WILL NOT duplicate missing index recommendations when the columns are identical, regardless of their ordering in the queries. The answer to the last question is that SQL Server does not appear to perform any consolidations or optimizations other than those cited in the answer to question #2. This knowledge is invaluable, particularly when evaluating missing index recommendations on production systems, because unless the queries that caused SQL Server to generate the missing index recommendations have been executed recently, there is little information to associate the actual queries with the recommendations. Understanding the factors that affect SQL Server’s recommendations can simplify identification of the queries that caused them and give an analyst more certainty that the correct associations have been made.

For more information about blog posts, concepts and definitions, further explanations, or questions you may have…please contact us at SQLRx@sqlrx.com. We will be happy to help! Leave a comment and feel free to track back to us. Visit us at www.sqlrx.com!

Table-Valued Parameter Performance Using C# Data Tables

The following testing and resulting data is a direct result of a recent issue that was discovered at one of our client sites.  After we set up SQL monitoring that uses a SQL trace to capture long-running queries, certain processes suddenly began to lock up and were unable to complete.  Initially we thought it was everything except the SQL trace causing the issue, until we finally turned off all trace captures and the problem went away.  This was very concerning to us since either SQL trace or XEvents is what we generally use to capture query performance. Without a fix, our performance tuning attempts would be severely hamstrung and slowed down. This problem had to be fixed yesterday!

Another side effect that would happen when things locked up was that, despite having the max server memory set, SQL would end up using substantially more memory than was allocated to it.  At this client, developers were importing data from an application using C# and table-valued parameters, which should not have been an issue. After a lot of searching, we finally came across some comments by another SQL expert who mostly described our issue and recommended that any C# TVPs should have their columns defined, or SQL could lock up if either trace or XEvents completed events are captured.  When we checked the C# code, we found that the string columns being imported were not defined, in an effort to make the import code flexible.  We were able to definitively fix this very obscure problem with just one line of C# code that is in the below post.  The end result is a FASTER import of data that can be captured in trace or XEvents.  Nice!!

Since we had to put together this fix for a very undocumented issue, we sincerely hope that any DBA or developer who runs into this problem can correct it much faster than we did.  A link to download a zip file of all code displayed in this post is included at the end.  Enjoy!

********************************************

Table-Valued Parameter Performance Using C# Data Tables

— By Jeff Schwartz

Overview

Much has been written about the use of table-valued parameters (TVP) and their performance. These narratives appear to be contradictory because some state that TVPs should be used only for small datasets, whereas others assert that TVPs scale very well. Adding to the confusion is the fact that most of the performance-related discussions focused on TVP use within T-SQL. Many of these examples employ either SQL trace or Extended Events (XEvents) to evaluate performance, but almost none of them has discussed what can occur when C# data tables and SQL trace or XEvents are used simultaneously. This paper details the analysis and resolution of a situation that occurred at a customer site when an existing C# application suddenly began to run extremely slowly after a very lightweight SQL trace, which utilized a several-minute duration threshold and captured no frequent events, was activated.

Initial Research

Preliminary research indicated that under very specific conditions the combination of TVPs and SQL trace could result in unintended performance consequences, e.g., excessive SQL Server memory consumption to the point where SQL Server exhausted all memory on the server regardless of the maximum memory setting value. The research also suggested that string usage and lengths in unspecified locations might have an impact on this issue. The question of whether TVPs might have some record-count performance threshold above which performance would degrade was also examined. Although bulk insert was not used by the application, the research team also decided to compare bulk and TVP insertion speeds because that is another source of conflicting information.

Since very little literature exists that discusses the combination of TVPs and SQL trace, the only recourse involved creating test frameworks, performing actual data loads, and using internal application timers in conjunction with either SQL trace or XEvents to monitor performance.

Test Frameworks

1. Two frameworks were constructed: T-SQL-only and C# calling a T-SQL stored procedure.

a. The T-SQL-only test loaded data into a TVP, which then called a stored procedure to perform the insertion into the final table.

b. The development team provided C# code that illustrated how the user application read a flat file into a C# data table, which was then passed as a TVP to a T-SQL stored procedure. This code was followed as closely as possible in the test framework.

2. A SQL Server table containing data captured from the sys.dm_db_index_operational_stats Dynamic Management View (DMV) supplied the data for testing because

a. The data table contained approximately 3.2 million records, thereby being large enough to stress the application code and SQL Server adequately. The largest data file provided by the application team was approximately 1.6 million records, so the testing could push well beyond current record levels.

b. Each data record contained 47 columns, which was wider in terms of data columns than any application table being loaded.

c. No string value columns existed in the data. This ensured that user string data could not affect any interaction between the C# application and SQL Server, and that any string-related behavior was strictly due to the manner in which the application loaded the data table or the TVP, or the way the TVP parameter-passing was recorded in the SQL trace. All data columns were smallint, int, bigint, or date. This also ensured exact record lengths so that any record count-related behavior was independent of record size. Some of the data files being loaded by the development team contained only fixed-width data types, whereas others contained at least a few variable length string columns.

3. The frameworks were designed to load various numbers of records using the same starting point to determine whether a performance threshold existed for TVPs, regardless of whether SQL trace (or XEvents) was active, as well as to determine whether even small numbers of records could be processed with the SQL trace (or XEvents) active.

4. Thirty-one different record levels, shown in Table 1, were used for testing to ensure any record-count gaps were small. A separate CSV file of each length was created from the data cited in #2, always beginning with the first record.

Blog_20160901_1

Table 1: Record Counts used in Testing

5. Several methods of loading the flat file CSV data into C# data tables were developed and tested. The C# data table configurations were as follows:

a. Unlimited string length columns (initially used in the production C# application) – herein noted as NoStringMax (String #N/A on graphs)

b. Fixed length string columns of at most 30, 100, 500, or 1000 characters each – herein noted as StringMax (String <#> on graphs). Note: the development team stated that no column would exceed 1,000 characters in length.

c. Exact data type mapping so that the data table column types matched those of the receiving SQL Server table exactly – herein noted as Map

d. A common routine was used by options a, b, and c above to send the data table to SQL Server using a TVP.

e. The section of code that loaded the data table from the flat file was timed separately from the routine cited in d, which also was surrounded by its own timers. This enabled comparison of data load, TVP passing, and combined times.

f. The name of the CSV file is passed in as a parameter along with an adjusted version of the file name so the appropriate test identifiers can be written to the application log file for later analysis.

6. The tests were run with the following SQL Server monitoring options:

a. No SQL Server monitoring at all.

b. SQL trace monitoring including all normally captured events except for completed ones. These were handled as follows:

i. Exclude all completed events

ii. Include ad-hoc and batch completed events with a two-second duration threshold

iii. Include ad-hoc and batch completed events as well as completed statement events with a two-second duration threshold

c. SQL Extended Events (XEvents) including all normally captured events except for completed ones. These were handled as follows:

i. Exclude all completed events

ii. Include ad-hoc and batch completed events with a two-second duration threshold

iii. Include ad-hoc and batch completed events as well as completed statement events with a two-second duration threshold

7. All tests were performed on the same server to eliminate network and hardware variability. This server had a quad-core Intel(R) Core(TM) i7-4700MQ CPU @ 2.40GHz processor, 32 GB of RAM, and a 500 GB SSD HD.

8. Various combinations of cold and warm SQL Server caches, cold and warm Windows caches, and SQL Server maximum memory limit sizes were tested. Some of the earlier attempts will be discussed in the Results section below. The final specifications were as follows:

a. No applications other than SQL Server, SSMS, and the C# application were active on the test server.

b. SQL Server’s maximum memory limit was set to 12 GB. This allowed SQL to go over its limit without distorting overall system behavior by exhausting server memory entirely.

c. Hot Windows file cache, i.e., a full run that loaded ALL of the flat file data into RAM was performed before the timing tests were run. This ensured that the HD was involved minimally.

d. Hot SQL Server buffer cache, achieved by running the next series soon after the last run completed (with no intervening runs), and by running each sequence four times in immediate succession.

C# Code

The NoStringMax routine cited in #5a in the previous section is shown in Figure 1.

Blog_20160901_2

Figure 1: Test C# Data Table Load Routine Using No String Length Maximums

The StringMax routine cited in #5b in the previous section is shown in Figure 2. The ONLY difference between this routine and the previous one is the addition of the MaxLength assignment that is highlighted below. This single line sets the maximum length of each data table column to the value of the iMaxStringLen parameter. As cited in #5b, this value ranged from 30 to 1,000 during testing.

Blog_20160901_3

Figure 2: Test C# Data Table Load Routine Using String Length Maximums
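Since the figures above are screenshots, the two load routines can be paraphrased as the minimal C# sketch below (the class name, method shape, and CSV-parsing details are assumptions for illustration, not the production code). The only functional difference between the NoStringMax and StringMax variants is the single MaxLength assignment on each string column:

using System.Data;
using System.IO;

public static class TvpLoadSketch
{
    // Reads a CSV file into a DataTable whose columns are all strings.
    // Passing maxStringLen <= 0 mimics the NoStringMax variant (unlimited string length);
    // a positive value (30, 100, 500, or 1000 in the tests) mimics the StringMax variant.
    public static DataTable LoadCsv(string csvPath, int maxStringLen)
    {
        var table = new DataTable();
        using (var reader = new StreamReader(csvPath))
        {
            foreach (string colName in reader.ReadLine().Split(','))
            {
                DataColumn col = table.Columns.Add(colName, typeof(string));
                if (maxStringLen > 0)
                    col.MaxLength = maxStringLen; // the one line that distinguishes StringMax from NoStringMax
            }

            string line;
            while ((line = reader.ReadLine()) != null)
                table.Rows.Add(line.Split(',')); // each field is loaded as a string value
        }
        return table;
    }
}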

The Map routine cited in #5c in the previous section is shown in Figure 3 through Figure 6. Logically, this code is identical to that of #5a and #5b, but the individual column mappings make the routine much longer. Note: since no strings are used for the data table columns, the MaxLength parameter is unnecessary.

Blog_20160901_4

Figure 3: Test C# Data Table Load Routine Using Data Type Mapping – Part 1

Blog_20160901_5

Figure 4: Test C# Data Table Load Routine Using Data Type Mapping – Part 2

Blog_20160901_6

Figure 5: Test C# Data Table Load Routine Using Data Type Mapping – Part 3

Blog_20160901_7

Figure 6: Test C# Data Table Load Routine Using Data Type Mapping – Part 4
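The Map variant can be sketched in the same hedged fashion. The column names below are placeholders (the actual capture table contained 47 columns whose names are not reproduced here); the essential idea is that each DataTable column is declared with the CLR type that corresponds exactly to the receiving SQL Server column type, so no string conversion or MaxLength setting is involved:

using System;
using System.Data;

public static class MappedTableSketch
{
    // Builds a DataTable whose column types match the target SQL Server table exactly:
    // bigint -> long, int -> int, smallint -> short, date -> DateTime.
    // Column names are illustrative placeholders only.
    public static DataTable BuildMappedTable()
    {
        var table = new DataTable();
        table.Columns.Add("CaptureDate", typeof(DateTime)); // date
        table.Columns.Add("DatabaseId", typeof(short));     // smallint
        table.Columns.Add("ObjectId", typeof(int));         // int
        table.Columns.Add("RangeScanCount", typeof(long));  // bigint
        // ...one Columns.Add call per remaining column, following the same pattern
        return table;
    }
}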

The routine cited in #5d in the previous section is shown in Figure 7.

Blog_20160901_8

Figure 7: C# Test Routine that Sends Data Table to SQL Server Using a TVP
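The common send routine of Figure 7 can likewise be approximated by the sketch below. The stored procedure name, parameter name, and user-defined table type name are hypothetical; the essential elements are the Structured parameter type and the TypeName property that identifies the table type on the server:

using System.Data;
using System.Data.SqlClient;

public static class TvpSendSketch
{
    // Passes a loaded DataTable to a stored procedure as a table-valued parameter.
    public static void SendTableAsTvp(DataTable table, string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("dbo.usp_InsertViaTvp", conn)) // hypothetical procedure name
        {
            cmd.CommandType = CommandType.StoredProcedure;
            SqlParameter p = cmd.Parameters.AddWithValue("@TvpData", table); // hypothetical parameter name
            p.SqlDbType = SqlDbType.Structured;
            p.TypeName = "dbo.IndexStatsTableType"; // hypothetical user-defined table type
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}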

Results

1. The T-SQL code alone, cited in Test Frameworks #1a, did not recreate the problem observed at the customer site when either SQL trace or Extended Events (XEvents) was used, so additional testing was unnecessary.

2. No problems occurred when the C# application was used in conjunction with SQL trace or XEvents as long as no completed events were captured. Therefore, capturing only specific trace events created the problems.

3. Further research showed that when the C# code was running, adding ad-hoc or batch completed events to SQL traces or XEvent sessions caused issues. Adding completed statement events did not change things appreciably.

4. Interestingly, the extreme misbehavior was triggered by the combination of using NoStringMax C# code and having SQL trace or XEvents capture ad-hoc or batch completed events. Although the StringMax and Map routines ran a little more slowly for certain record count levels when SQL trace or XEvents captured ad-hoc or batch completed events, the dramatic memory consumption issues did not occur at all when these methods were used.

5. Initially, testing employed a 28 GB maximum SQL Server memory setting, but as occurred at the customer site, when the problem arose, SQL Server consumed all the memory on the server, which essentially caused the server to stop working. Using this setting and the NoStringMax C# code, CSV files with record counts up to 25,000 could be processed without taking hours, failing, or causing the server to run completely out of memory. However, the application could not reach the 50,000 record level.

6. Since testing needed to process MUCH higher numbers of records, the decision was made to reduce SQL Server memory to 4 GB in an attempt to provide additional memory space for SQL Server memory overflow. Another reason for lowering the SQL Server memory allocation was to ensure that the tests consumed all of SQL Server’s buffer pool and that SQL Server had to operate against memory limits. This consideration, as well as the run-time variations, necessitated the four successive iterations for each combination.

7. Unfortunately, using the 4 GB setting, the lower-end NoStringMax C# code runs while capturing SQL trace ad-hoc or batch completed events caused application failures at approximately the 4,000 or 8,000 record levels. This clearly indicated an increase in SQL Server memory was required, so it was increased to 8 GB. Although the runs processed more records, they still crashed well before the 25,000 record level. Finally, moving the memory limit up to 12 GB enabled everything to run as before, effectively mimicking the 28 GB testing without running the risk of exhausting Windows memory.

8. Figure 8 highlights the NoStringMax C# code runs through the 25,000 record level while SQL trace or XEvents ad-hoc or batch completed events were being captured. It is evident that the run times were unacceptable. When the same test levels are viewed in Figure 9, the problem becomes abundantly clear. With SQL trace or XEvents capturing ad-hoc or batch completed events, the run times for the 10,000 record tests using the NoStringMax C# code ranged between 483 and 584 seconds. When ad-hoc or batch completed events were not captured, the run times were approximately 0.14 SECONDS! The values for all other tests, including NoStringMax C# code with SQL trace and XEvents ad-hoc or batch completed events off, are shown in Figure 9.

Blog_20160901_9

Figure 8: TVP Load Timing when Trace On or XEvents on

9. Although many lines appear in Figure 9, three groupings exist and these are the most important. Group 1, which consists of dotted lines, includes all tests during which ad-hoc or batch completed events were not captured. Group 2, which consists of solid lines, includes all tests during which SQL trace captured ad-hoc or batch completed events. Group 3, which consists of dashed lines, includes all tests during which XEvents captured ad-hoc or batch completed events. Important note: The NoStringMax runs shown in Figure 8 are NOT in Figure 9 because of scaling considerations. Figure 9 highlights several notable issues. Most importantly, once a maximum string length is specified, performance improves even when SQL trace or XEvents ad-hoc or batch completed events are not captured. In addition, the terrible performance problems go away. Interestingly, the behaviors of the various StringMax and Map runs were almost identical through approximately 175,000 records and then again at about 800,000 records. In between, unexplained divergence occurs, but it is only a few seconds and occurred primarily between the 200,000 and 700,000 record levels. The pattern shown in Figure 9 was repeated in every test sequence conducted. It is noteworthy that the StringMax and Map versions of the routine outperformed the NoStringMax version under all comparable trace or XEvent-related conditions, and at higher volumes, even when completed events were captured.

Blog_20160901_10

Figure 9: TVP Load Timing when Trace/XEvents off OR Trace or XEvents on and Mapped Data Table or Limited-Length String Values Used

10. Once the number of records exceeded two million, the load of the C# data table began to crash due to application memory exhaustion. Note: neither Windows nor SQL Server memory was consumed excessively during this phase, so the limitation was strictly application-related.

11. Figure 10 summarizes the overall insertion rates of all previously cited methods at the two million record level, in addition to the bulk insert method. This graph shows clearly that the worst performer was the unlimited string method, and that the best one was the bulk insert method. Note: these rates exclude the reading of the flat file into the C# data table as well as the creation of the file that was suitable for bulk insertion.

Blog_20160901_11

Figure 10: TVP versus Bulk Insert Records per Second

Conclusions

The use of unlimited-length C# strings with data tables not only performs worse even when no SQL trace or XEvents ad-hoc or batch completed events are captured, it performs dreadfully when they are. Their use can cause poor overall server performance if the server’s memory is exhausted. Finally, their use prevents the ability to monitor SQL Server query performance using either SQL trace or XEvents. However, when a maximum string length is specified, regardless of its length, performance without SQL trace or XEvents improved and the problematic interaction with SQL trace or XEvents was mitigated almost completely. Since the Map method is inefficient and limiting from an application development perspective, and its performance was not substantially better than that of limited-length strings, its use does not appear to be beneficial.

No table-valued parameter scalability issues were observed with the possible exception of the 200,000 to 700,000 record range when performance inexplicably, and consistently, dipped. However, from 800,000 to 2 million, performance experienced no such dip. Interestingly, standard bulk insert performance exceeded that of TVPs when the same data was used.

Recommendations

Since the code to implement the maximum-length string involves only one line and the performance monitoring and improvement benefits are substantial, the <data table name>.MaxLength = <n> statement should be included in data table load routines as shown by the highlighted line in Figure 2. <n> should be the maximum expected length of any data column. Limiting string length will allow C# TVP data loads to run more efficiently and also enable customers to conduct normal query performance monitoring.
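Tying the hypothetical sketches above together, a caller that follows this recommendation might look like the following (the 1,000-character cap matches the development team’s stated maximum column length; the names come from the earlier sketches, not the production code):

using System.Data;

public static class ImportSketch
{
    // Loads the CSV with capped string columns and sends it to SQL Server as a TVP.
    public static void Run(string csvPath, string connectionString)
    {
        DataTable table = TvpLoadSketch.LoadCsv(csvPath, maxStringLen: 1000);
        TvpSendSketch.SendTableAsTvp(table, connectionString);
    }
}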

********************************************

The code shown in this post can be downloaded from here….  SQLRX_TVP_CSharp_Code


SQLRx Presenting at SQL Saturday Baton Rouge 2016

SQLRx (@SQLRX) is proud to announce that two of our resident SQL Experts will be presenting sessions at SQL Saturday #515 in Baton Rouge, LA on August 6, 2016. If you have not signed up to attend, you are going to miss out!!

Blog_20160804_4

http://www.sqlsaturday.com/515/eventhome.aspx

Lori Brown (@SQLSupahStah) will give a lesson on how to be an Actively Proactive DBA. Come have fun and learn at her session, Actively Proactive – Straightforward Ways to Manage and Diagnose SQL. Attendees will be taken through the things a proactive DBA does and will also be treated to some quick ways to troubleshoot like a SQL Super Star!

Blog_20160804_1

Scripts will be given out to all who download the presentation materials. Come and get ‘em!!

Blog_20160804_2

Jeff Schwartz will be presenting Rx for Demystifying Index Tuning Decisions. Jeff will cover ways to find poor performing queries and tune them using various methods including execution plans, usage statistics and missing indexes.

Blog_20160804_3

Since Jeff has been tuning things for almost 40 years, he can show you some things that will be extremely useful and will help you tune like a pro.

If you are in Baton Rouge this weekend come by and visit with us! SQLRx is a sponsor of SQL Saturday Baton Rouge 2016 and will have a booth where you can come sit down with Lori or Jeff and talk tech. We will also be handing out goodies as well as giving away a 2 TB external drive at the end of the day. Also, one lucky attendee will win a free Ultra Light Performance Assessment from SQLRx at one of our presenter’s sessions. We will tell that lucky winner about their SQL performance for free….Winner, Winner, Chicken Dinner!!

Hope to see you in Baton Rouge!

Finding overlapping times with the LAG Function

— by Jeffry Schwartz

Some functions in SQL are a little difficult to use and understand. I was recently using the LAG function to shed some light on a performance issue, so I thought I would show how I used it and the results it produced in a way that is easy to understand.  After evaluating SQL trace data at one of our clients, I noticed that the attention records (caused by problems with the application, connections, etc.) not only occurred around some non-yielding scheduler messages that had been appearing along with timeouts, but also overlapped one another, i.e., sometimes one began and then others occurred before the first one ended.  I’ve looked at the attention data from other clients and found that not only do they not overlap like this, they don’t last long enough to overlap, meaning that this is a special case.

To show clearly which ones overlap, I decided to use the lag function to compare the end time of an older one with the start time of the current one.  If the start time of the current record occurs before the end time of a previous record, the flag is set.  This way I don’t have to scrutinize the timestamps and compare them manually.  I partition by ServerName because I want to compare all those from the same instance and I want to order them by StartTime so when I compare adjacent records, they are in time order.  You can partition and order by anything you want.

Here is the query, which is generic enough to be understood easily.  I’ve included the timestamps themselves so you can see how it works.

SELECT StartTime, EndTime,

CASE WHEN StartTime < LAG(EndTime,1,0) OVER (PARTITION BY ServerName ORDER BY StartTime) THEN 1 ELSE 0 END AS [Overlap Lag 1],

CASE WHEN StartTime < LAG(EndTime,2,0) OVER (PARTITION BY ServerName ORDER BY StartTime) THEN 1 ELSE 0 END AS [Overlap Lag 2],

CASE WHEN StartTime < LAG(EndTime,3,0) OVER (PARTITION BY ServerName ORDER BY StartTime) THEN 1 ELSE 0 END AS [Overlap Lag 3],

CASE WHEN StartTime < LAG(EndTime,4,0) OVER (PARTITION BY ServerName ORDER BY StartTime) THEN 1 ELSE 0 END AS [Overlap Lag 4],

CASE WHEN StartTime < LAG(EndTime,5,0) OVER (PARTITION BY ServerName ORDER BY StartTime) THEN 1 ELSE 0 END AS [Overlap Lag 5],

LAG(EndTime,1,0) OVER (PARTITION BY ServerName ORDER BY StartTime) AS [lag EndTime1],

LAG(EndTime,2,0) OVER (PARTITION BY ServerName ORDER BY StartTime) AS [lag EndTime2],

LAG(EndTime,3,0) OVER (PARTITION BY ServerName ORDER BY StartTime) AS [lag EndTime3],

LAG(EndTime,4,0) OVER (PARTITION BY ServerName ORDER BY StartTime) AS [lag EndTime4],

LAG(EndTime,5,0) OVER (PARTITION BY ServerName ORDER BY StartTime) AS [lag EndTime5],

Duration / 1000.0 AS [Duration Secs], Duration / 60000.0 AS [Duration Minutes]

--, SPID, ClientProcessID, RowNumber,

--ServerName AS [Instance], LTRIM(ApplicationName) AS [Application],

--DBName AS [Database], HostName, NTUserName, LoginName AS [Login Name]

FROM dbo.SQLTraceReceiver WITH (NOLOCK)

INNER JOIN SQLStaticDatabaseInfo ON DatabaseID = [DBID]

WHERE SQLTraceReceiver.EventClass = 16 -- Attention

 

Here is the output:

Blog_20160526_1

A timestamp of 12:00:00 AM really means NULL because the lag is invalid.  I highlighted the overlaps to make them easier to see.  Obviously, you may not need to use lag for time evaluations as I do, but this shows you how to compare values of adjacent records and report something about the comparison without having to display the actual lag values.  I left the [LAG EndTime<n>] columns in the query so you could see it in action, but for the final real query, I actually would not include them and would add the descriptive columns that are commented out back in.

LAG takes a little getting used to, but once you play with it a bit as in this example, it makes perfect sense.  These functions came out in SQL Server 2012, so if you are using that version or higher, you are good to go.  Note:  I could have used LEAD to go the other way, i.e., to compare future records with the current record.


Using Ranking Functions to Partition and Order Unfamiliar Data

The Problem

How many times have you had to subdivide, group, rank, and organize data when you knew very little about the data itself except for the meanings of the columns? This dilemma is one that the members of the SQLRx team encounter frequently. When a customer requests help with a performance problem, we often do not know the number of databases, tables, or indices (or how busy any of them are) before we begin our analysis of the performance data. We also do not know beforehand which metrics will provide the most insight into a customer’s problems. The unknown nature of the data necessitates that we make no assumptions and therefore, look at all performance data.

 

To expedite our analyses, we developed Reporting Services routines that enable us to depict the data graphically and dynamically. This is particularly useful for processing data from the SQL Server DMVs, which can produce voluminous amounts of data. For example, a simple 12-hour collection can produce millions of data records that are written periodically and efficiently by our T-SQL scripts into flat files for post-processing off-site. To put the analysis problem into better perspective, it is instructive to consider how one might graph the single-record lookups metric for the 6,035 possible database/table/index combinations contained in the Index Operational DMV data used in this article. If every index were used during the data collection period and each graph contained 14 series, this would result in 432 graphs. Obviously, this is unmanageable. Unless some method is devised to place the most interesting series on the first graph or two, the graphs will be as useless as the number columns themselves. A member of our team, Jeff Schwartz, developed some T-SQL code that grouped the data by metric and any other subdivision entity, e.g., instance, database, file, table, or index without resorting to cursors or dynamic SQL. The series were also ranked, with the largest N (specified by the user via the Reporting Services UI) maximums on the first graph, the next N series on the second graph, etc. The partitioning and ranking mechanism he devised, which employs the DENSE_RANK and ROW_NUMBER T-SQL ranking functions, is applicable to a variety of data subdivision and organization problems and is the basis for this article.

 

Initial Attempts and Experimentation

Although the NTILE function appeared to be the obvious choice, Jeff found that he could not obtain the desired metric partitioning for the graphs by using this function.  NTILE requires a fixed number of partitions. Some metrics had hundreds of unique series, whereas others had one, seven, or nine (see Table 1).  For some combinations, several graphs were created for a given metric, but each graph only had one or two series on it! For those with an intermediate number of combinations, NTILE evenly divided the total number of series into equal portions. This meant that although the user specified a series per graph value such as seven, each graph would only contain five series on it (illustrated in Table 2 by the Leaf Ghosts records), thereby effectively ignoring the user specification. The query below, whose output is shown in Table 1, determines how many unique column combinations existed in the data for each metric.  Clearly, the number per metric varied greatly.  Please also note that the GroupID was assigned automatically based upon the order by clause within the DENSE_RANK function. This will become an integral part of the final partitioning strategy.

 

SELECT MetricName, DENSE_RANK() OVER(ORDER BY MetricName) AS GroupID, COUNT(*) AS [Count]

FROM MeasureBeforeUpdate

GROUP BY MetricName

ORDER BY MetricName

 

Blog_20160307_1

Method Comparisons

The next task was to reestablish the ordering for all the graphs with the worst ones on the first graph, the next worst on the second, etc.  After a fair amount of experimenting, Jeff hit upon the following combination that does the entire grouping, ranking, splitting, etc. all in one pass as shown below (it also uses a CTE to simplify the coding and improve legibility).  The example below shows all the building blocks so you can see how the final query was derived. For the sake of comparison, ALL attempted ranking variations are shown in the query. The ones that proved useful, DR and RNDiv, are highlighted, along with the combined item named MeasureRank. To aid with interpretation, @I_GroupNbr specifies the number of series per graph provided by the user, @MaxCount determines the largest number of combinations (Count in Table 1), and @MaxGroups specifies the maximum number of possible groups for each metric. This value is computed earlier in the stored procedure to simplify the computations greatly and is established using the following formula: SET @MaxGroups = CEILING (@MaxCount * 1.0 / @I_GroupNbr). The DENSE_RANK function provided the metric groupings, and the ROW_NUMBER function partitioned the columns by metric and ordered them by the maximum value in descending order. The integral division enabled maintaining the relative ranking within a group. The query output is shown below.

WITH RankingList AS (

SELECT MsId,

@MaxGroupID + (DENSE_RANK() OVER (ORDER BY MetricName) - 1) * @MaxGroups +

NTILE(@MaxGroups) OVER (PARTITION BY MetricName ORDER BY MetricName)

AS [OldMeasureRank],

DENSE_RANK() OVER (ORDER BY MetricName) AS DR,

NTILE(@MaxGroups) OVER (PARTITION BY MetricName ORDER BY MetricName) AS NT,

ROW_NUMBER() OVER (PARTITION BY MetricName ORDER BY MaxVal DESC) AS RN,

((ROW_NUMBER() OVER (PARTITION BY MetricName ORDER BY MaxVal DESC) - 1) / @I_GroupNbr) + 1 AS RNDiv,

(ROW_NUMBER() OVER (PARTITION BY MetricName ORDER BY MaxVal DESC) - 1) % @I_GroupNbr AS RNMod,

@MaxGroupID + (DENSE_RANK() OVER (ORDER BY MetricName) -1) * @MaxGroups + ((ROW_NUMBER() OVER (PARTITION BY MetricName ORDER BY MaxVal desc) -1) / @I_GroupNbr) + 1 AS [MeasureRank]

FROM MeasureBeforeUpdate

)

SELECT GroupId = RankingList.MeasureRank, RankingList.*, MeasureBeforeUpdate.*

FROM MeasureBeforeUpdate

INNER JOIN RankingList ON RankingList.MsId = MeasureBeforeUpdate.Msid

Blog_20160307_2

Final Code and Sample Graphs

The final code compares the original partitioning code values with those of the new method. The biggest differences involve the replacement of NTILE with ROW_NUMBER and the addition of integral division. This combination automatically numbers the series for a given group and resets this number once it hits the maximum number of items to be shown on a Reporting Services page.  Since this is a parameter specified by the user, it now breaks the output up into N sections depending upon how many series are requested for each graph. Examples of the graphs with the number of series per graph specified to be seven are shown below the query. As highlighted in Table 1, only five Leaf Page Merges combinations existed, whereas there were 105 for Leaf Inserts. As a result, only one graph was needed for the first metric, but 15 were possible for the second. Since each graph set has its own ID (as displayed in the graph titles), it is trivial to terminate the graph generation process early by avoiding the calls for any group greater than a certain value. When returning large numbers of metrics, e.g., 40 for Index Operational statistics, being able to terminate graphing early, e.g., at two, can result in a reduction of more than 93 percent in graph generation duration. With this particular data, the total possible number of graphs was 907, whereas by stopping at only two graphs per metric, the total decreased to 61.

 

WITH RankingList AS (

SELECT MsId,

@MaxGroupID + (DENSE_RANK() OVER (ORDER BY MetricName) - 1) * @MaxGroups + NTILE(@MaxGroups) OVER (PARTITION BY MetricName ORDER BY MetricName) AS [OldMeasureRank],

@MaxGroupID + (DENSE_RANK() OVER (ORDER BY MetricName) -1) * @MaxGroups + ((ROW_NUMBER() OVER (PARTITION BY MetricName ORDER BY MaxVal DESC) -1) / @I_GroupNbr) + 1 AS [MeasureRank]

FROM MeasureBeforeUpdate

)

SELECT GroupId = RankingList.MeasureRank, RankingList.*, MeasureBeforeUpdate.*

FROM MeasureBeforeUpdate

INNER JOIN RankingList ON RankingList.MsId = MeasureBeforeUpdate.Msid

Blog_20160307_3

Blog_20160307_4

Blog_20160307_5

 

 

 

Conclusion

This article illustrated the pros and cons of using various ranking functions, especially in combination, which is seldom shown in ranking code examples. This article provides a different perspective regarding window ranking functions and their use in solving everyday partitioning, ranking, and organization problems.

 


 

SQL Server 2012/2014 Extended Events for Developers – Part 1

— By Jeffry Schwartz

Microsoft introduced Extended Events, a.k.a. XEvents, in SQL Server 2008 to aid in the analysis of performance problems. Both the capabilities and syntax were changed significantly in SQL Server 2012, and new (and very important) features were added in SQL Server 2014. Therefore, the focus of this document will be on XEvent usage with SQL Server 2012 and 2014. Specifically, it will concentrate on how developers can use XEvents to evaluate the performance of their queries and stored procedures regardless of how they are invoked, e.g., from a user interface.

Creating and Starting an XEvent Session

The first action is to define and create an XEvent session. One can think of a session as a definition of the events that will be captured along with any filtering that may be appropriate for each event. Once defined, an analyst can simply start and stop the session without specifying anything else. This makes it ideal for developers in a shared environment because they can establish a tailored definition once and then use it repeatedly for arbitrary periods. The code below deletes a pre-existing session named DevPerformanceMonitoring, and establishes a new session definition.

USE master

GO

IF EXISTS(SELECT * FROM sys.server_event_sessions WHERE name='DevPerformanceMonitoring')

DROP EVENT SESSION DevPerformanceMonitoring ON SERVER

GO

CREATE EVENT SESSION DevPerformanceMonitoring ON SERVER

ADD EVENT sqlserver.rpc_completed(

SET collect_statement=(1)

ACTION(sqlserver.database_id, sqlserver.database_name, sqlserver.sql_text, sqlserver.tsql_frame, sqlserver.tsql_stack, sqlserver.server_principal_name, sqlserver.query_hash, sqlserver.query_plan_hash)

WHERE ([duration]>(10000) AND database_id = 9) -- SQL won't tolerate a variable in here...ugh!

),

ADD EVENT sqlserver.sp_statement_completed(

SET collect_statement=(1), collect_object_name=(1)

ACTION(sqlserver.database_id, sqlserver.database_name, sqlserver.sql_text, sqlserver.tsql_frame, sqlserver.tsql_stack, sqlserver.server_principal_name, sqlserver.query_hash, sqlserver.query_plan_hash)

WHERE ([duration]>(10000) AND database_id = 9)

),

ADD EVENT sqlserver.sql_batch_completed(

ACTION(sqlserver.database_id, sqlserver.database_name, sqlserver.sql_text, sqlserver.tsql_frame, sqlserver.tsql_stack, sqlserver.server_principal_name, sqlserver.query_hash, sqlserver.query_plan_hash)

WHERE ([duration]>(10000) AND database_id = 9)

),

ADD EVENT sqlserver.sql_statement_completed(

SET collect_statement=(1)

ACTION(sqlserver.database_id, sqlserver.database_name, sqlserver.sql_text, sqlserver.tsql_frame, sqlserver.tsql_stack, sqlserver.server_principal_name, sqlserver.query_hash, sqlserver.query_plan_hash)

WHERE ([duration]>(10000) AND database_id = 9)

)

ADD TARGET package0.event_file(SET FILENAME= N'D:\SQLRx\SQLScriptOutput\DevPerf.xel')

WITH (MAX_MEMORY=4096 KB, EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS, MAX_DISPATCH_LATENCY=30 SECONDS, MAX_EVENT_SIZE=0 KB, MEMORY_PARTITION_MODE=NONE, TRACK_CAUSALITY=OFF, STARTUP_STATE=OFF)

 

This session monitors RPC completed, SP statement completed, SQL batch completed, and SQL statement completed events whose duration is longer than ten milliseconds (10,000 microseconds) and whose database ID is 9. It stores the information in a file that will reside in the D:\SQLRx\SQLScriptOutput directory, and has the DevPerf prefix as well as the .XEL extension. Note: This directory MUST exist before starting the session or it will fail. Note: Unfortunately, as noted in the comments for the rpc_completed event (and still true as of SQL Server 2014 SP1), variables cannot be used in WHERE clauses. This causes the repetitive filtering code shown above. Should one desire filtering based upon user login, this should be specified as one of the filters in the where clauses, e.g., server_principal_name = ('<instance>\<login>').

To start this session, the user must issue the following command: ALTER EVENT SESSION DevPerformanceMonitoring ON SERVER STATE = START

To stop the session, the user must issue the following command: ALTER EVENT SESSION DevPerformanceMonitoring ON SERVER STATE = STOP

The actual file names will be DevPerf<large integer value>.xel where <large integer value> is the number of 100-nanosecond intervals between the date and time that the file is created and January 1, 1600 as shown below:

Blog_20160128_1

Obtaining Performance Data Directly from an XEvent XEL File

SSMS Method

Once the events have been captured to a file by starting and then stopping a session, the next step is to query the data. One can use two methods for this: open the XEL file directly from SSMS or use a T-SQL query to read it. The first method is a bit limited because the developer cannot perform searches easily; its output is shown below. Note: times are in microseconds.

Blog_20160128_2

T-SQL Method

The second method is a bit daunting at first, but once a developer has an example from which to work, the task is reasonable, and with practice, relatively easy. The following script illustrates how to navigate the XEL file DIRECTLY using the sys.fn_xe_file_target_read_file function and begin extracting the information, for which XML format is used. The first query lists the event names along with their associated timestamps and XML. It uses the cross apply operator to gain access to the event_data XML column so the event’s name and timestamp can be extracted. This query illustrates the method by which someone new to the process can understand what is contained within the event_data XML, and it also demonstrates how to extract single data elements from the XML.

The handling of the @timestamp node merits further discussion. Event_data timestamps are expressed in UTC time, so one needs to adjust them to understand exactly when things happened. The code in the script makes this adjustment time zone agnostic and automatically determines what the offset from UTC time should be in order to be correct for the current time zone.

The second query uses the same mechanism to summarize the number and names of all the events contained within a particular file. The output for both select statements immediately follows the script. The developer can click on the xml_event_data within SSMS and view the entire XML data set.

DECLARE @TraceDir varchar(512)
DECLARE @XeventsTraceName varchar(512) = 'DevPerf_0_130966819185620000.xel'
SET @TraceDir = 'D:\SQLRx\SQLScriptOutput\'

-- Dump the entire XEL file so we can see what nodes are present
SELECT event_xml.value('(./@name)', 'varchar(128)') AS EventName,
       DATEADD(hh,
               DATEDIFF(hh, GETUTCDATE(), CURRENT_TIMESTAMP),
               event_xml.value('(./@timestamp)[1]', 'datetime2')
              ) AS PostedTimestamp,
       xml_event_data
FROM
    (SELECT CAST(event_data AS XML) AS xml_event_data
     FROM sys.fn_xe_file_target_read_file(@TraceDir + @XeventsTraceName, null, null, null)
    ) AS event_table
CROSS APPLY xml_event_data.nodes('//event') EventDataNodes (event_xml);

-- Dump the entire XEL file summarizing by event name so we can be sure all captured events are reported
WITH EventDetails AS (
    SELECT event_xml.value('(./@name)', 'varchar(128)') AS EventName
    FROM
        (SELECT CAST(event_data AS XML) AS xml_event_data
         FROM sys.fn_xe_file_target_read_file(@TraceDir + @XeventsTraceName, null, null, null)
        ) AS event_table
    CROSS APPLY xml_event_data.nodes('//event') EventDataNodes (event_xml)
)
SELECT EventName, COUNT(*) AS [Count]
FROM EventDetails
GROUP BY EventName
ORDER BY EventName

 

Blog_20160128_3

Obtaining XEvent Record Details

The next script uses the same method as the previous one. The primary differences are that it extracts more data from the XML and uses a common table expression to simplify the inclusion of certain events and the exclusion of certain SQL statements that are not of interest. The WHERE EventName IN (@EventNameToBeSearchedFor1, @EventNameToBeSearchedFor2, @EventNameToBeSearchedFor3, @EventNameToBeSearchedFor4) AND SQLStatement NOT LIKE '%DefaultFileStreamFileGroup%' AND SQLStatement NOT LIKE '%CollationVersion%' portion accomplishes this filtering. The output appears below the script and lists the events along with their duration, CPU time, last row count, physical reads, logical reads, writes, and the text of the SQL statement.

The reader should note the action and data references in the code below. Action items are generally available with every XEvent record (they are sometimes referred to as global columns), whereas data items contain information specific to an event, e.g., sp_statement_completed. In order to augment the script below, one must understand the differences between actions and data items.

Not all events use the same keywords, so the CASE WHEN code below handles the differences in order to extract the SQL statement from the appropriate node. Another reason for using the common table expression was to make the NOT LIKE filtering simple and reliable. When the NOT LIKE operator was used in a WHERE clause directly against the XML, the code was clumsy and it was difficult to achieve the desired results. Placing the XML parsing in a common table expression and applying the WHERE clause to the outer query greatly simplified the code and enabled it to operate correctly.

DECLARE @EventNameToBeSearchedFor1 VARCHAR(40)
DECLARE @EventNameToBeSearchedFor2 VARCHAR(40)
DECLARE @EventNameToBeSearchedFor3 VARCHAR(40)
DECLARE @EventNameToBeSearchedFor4 VARCHAR(40)
DECLARE @TraceDir VARCHAR(512)
DECLARE @NumMinutesUTCDiff INT = DATEPART(tz, SYSDATETIMEOFFSET()) -- UTC offset of the current time zone, in minutes
DECLARE @XeventsTraceName VARCHAR(512) = 'DevPerf_0_130966819185620000.xel'

SET @TraceDir = 'D:\SQLRx\SQLScriptOutput'
SET @TraceDir = @TraceDir + '\'
SET @EventNameToBeSearchedFor1 = 'sp_statement_completed'
SET @EventNameToBeSearchedFor2 = 'sql_statement_completed'
SET @EventNameToBeSearchedFor3 = 'sql_batch_completed'
SET @EventNameToBeSearchedFor4 = 'rpc_completed'

-- Parse XML from the file to show column-wise output
-- Use a CTE to simplify the LIKE conditions in the WHERE clause
;
WITH ParsedXMLData AS (
SELECT
    event_xml.value('(./@name)', 'varchar(128)') AS EventName,
    -- DATEPART(tz, ...) returns the offset in minutes, so the adjustment must be added as minutes (mi), not hours
    DATEADD(mi, @NumMinutesUTCDiff, event_xml.value('(./@timestamp)', 'datetime')) AS PostedTimestamp,
    event_xml.value('(./action[@name="database_name"]/value)[1]', 'varchar(128)') AS DatabaseName,
    OBJECT_NAME(CAST(event_xml.value('(./data[@name="object_id"]/value)[1]', 'int') AS INT), CAST(event_xml.value('(./data[@name="source_database_id"]/value)[1]', 'int') AS INT)) AS ObjectName,
    event_xml.value('(./data[@name="object_type"]/text)[1]', 'varchar(25)') AS ObjectType,
    CAST(event_xml.value('(./data[@name="duration"]/value)[1]', 'bigint') AS BIGINT)/1000000.0 AS DurationSecs,
    CAST(event_xml.value('(./data[@name="cpu_time"]/value)[1]', 'bigint') AS BIGINT)/1000000.0 AS CPUTimeSecs,
    event_xml.value('(./data[@name="row_count"]/value)[1]', 'int') AS [RowCount],
    event_xml.value('(./data[@name="last_row_count"]/value)[1]', 'int') AS LastRowCount,
    event_xml.value('(./data[@name="physical_reads"]/value)[1]', 'bigint') AS PhysicalReads,
    event_xml.value('(./data[@name="logical_reads"]/value)[1]', 'bigint') AS LogicalReads,
    event_xml.value('(./data[@name="writes"]/value)[1]', 'bigint') AS Writes,
    event_xml.value('(./data[@name="nest_level"]/value)[1]', 'int') AS NestLevel,
    -- sql_batch_completed exposes batch_text; the statement-level events expose statement
    CASE WHEN event_xml.value('(./data[@name="statement"]/value)[1]', 'varchar(max)') IS NULL
         THEN event_xml.value('(./data[@name="batch_text"]/value)[1]', 'varchar(max)')
         ELSE event_xml.value('(./data[@name="statement"]/value)[1]', 'varchar(max)') END AS SQLStatement,
    event_xml.value('(./action[@name="sql_text"]/value)[1]', 'varchar(max)') AS SQLText,
    event_xml.value('(./data[@name="source_database_id"]/value)[1]', 'int') AS source_database_id,
    event_xml.value('(./data[@name="object_id"]/value)[1]', 'int') AS object_id,
    event_xml.value('(./data[@name="object_type"]/value)[1]', 'varchar(25)') AS object_type
FROM
    (SELECT CAST(event_data AS XML) AS xml_event_data
     FROM sys.fn_xe_file_target_read_file(@TraceDir + @XeventsTraceName, null, null, null)
    ) AS event_table
CROSS APPLY xml_event_data.nodes('//event') EventDataNodes (event_xml)
)
SELECT *
FROM ParsedXMLData
WHERE EventName IN (@EventNameToBeSearchedFor1, @EventNameToBeSearchedFor2, @EventNameToBeSearchedFor3, @EventNameToBeSearchedFor4)
AND SQLStatement NOT LIKE '%DefaultFileStreamFileGroup%'
AND SQLStatement NOT LIKE '%CollationVersion%'

 

Blog_20160128_4

Summary

This article covered how to create, start, and stop XEvents sessions using T-SQL and report the output using either SSMS or T-SQL. SSMS provides a convenient method for viewing the data quickly, but is difficult to use for searches of any kind or any additional filtering. T-SQL appears to be difficult at first because of the XQuery language, but once one becomes accustomed to the XML layout, obtaining specific pieces of data is reasonably easy. More advanced subjects such as collecting query_post_execution_showplan events and connecting these plans to other completed events will be covered in subsequent articles.

Disk Metrics – Why PerfMon’s, SQL Server’s Filestats, and Disk Vendors’ Values Sometimes Appear to Disagree

— By Jeffry Schwartz

Whenever multiple data sources are used, actual or perceived discrepancies can arise. Sometimes this is due to the location of the data collector, e.g., Windows, SQL Server, or the disk array. Other times, it is due to the collection rate and whether the information is a simple recording of continuously incremented data or it is an instantaneous sample. True sampling presents obvious issues because its accuracy is related directly to the sample rate. Clearly, the more often data is sampled, the more precisely it will represent actual behavior. However, this improved accuracy comes with the price of adding pressure to the disk, thereby distorting the picture of what the disk LUN is actually doing.

Disk Vendors

By default, some disk array vendors sample disk performance data infrequently unless a more detailed view is explicitly requested. This is important to know because this kind of sampling can completely miss spikes in activity, e.g., SQL Server checkpoints, which can be extremely write-intensive. As a result, a disk array that performs poorly during database checkpoints, but is otherwise effectively idle, will appear to be functioning well unless one of the samples happens to coincide with a checkpoint. Although the primary difference involves the sampling rate, the fact that the disk array measurements do not include the Windows I/O stack sometimes causes disk vendors to be suspicious of any Windows I/O metrics, and they often suggest that either Windows' I/O handling or its instrumentation is faulty. The author has not observed such faulty behavior.

Perfmon

Blog_20160107_1

PerfMon and similar programs that rely upon Windows performance counters retrieve the averages for I/O metrics from the disk drivers themselves, i.e., from the code inside Windows that actually handles the I/Os. The “sampling” rate simply determines how often the averages are recorded (and reset for the next interval). Therefore, these are not true samples because the disk driver constantly updates its statistics and the collector periodically retrieves the averages. As a result, the disk driver will detect intermittent activities such as checkpoint behavior. The primary difficulty with these measurements is that any summarization of the data inevitably involves averaging averages. For example, assume that the collector retrieves the I/O data from the driver every 30 seconds and one needs to summarize disk behavior over a 15-minute period. In this scenario, the period average will be the average of 30 averages. The statistical literature is replete with examples of situations when this is sufficiently accurate, as well as those when it is not. Regardless, it is easy to understand why the averaging of averages is preferable to the alternative of highly frequent true sampling.
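A small worked example shows how much the average of averages can differ from a properly weighted average when intervals contain very different I/O volumes; the interval values below are hypothetical:

-- Two hypothetical 30-second intervals with very different I/O volumes
DECLARE @Intervals TABLE (IntervalID int, IOCount bigint, AvgLatencyMs decimal(9,2));
INSERT INTO @Intervals VALUES (1, 100, 5.0), (2, 10000, 50.0);
SELECT AVG(AvgLatencyMs) AS AvgOfAverages,                      -- 27.50 ms: simple average of the per-interval averages
       SUM(IOCount * AvgLatencyMs) / SUM(IOCount) AS WeightedAvg -- approximately 49.55 ms: weighted by I/O count
FROM @Intervals;

Because the busy interval contains one hundred times as many I/Os, the weighted value is far closer to what the disk actually experienced.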

SQL Server FileStats

Blog_20160107_2

SQL Server captures I/O performance data differently. It records the times at which an I/O request is initiated and completed. It is important to note that in order to complete an I/O, the thread handling the I/O must pass through the processors’ ready queue and become active on a processor so the I/O completion code can be executed. If the processors are sufficiently busy, a delay will occur while the thread waits for a processor. Although SQL Server measures these kinds of delays for an entire instance, it does not expose them via the sys.dm_io_virtual_file_stats DMV. Therefore, differences between PerfMon’s and SQL Server’s measurements can be due to processor delays or the comprehensive nature of the SQL Server data, or the averaging of averages for PerfMon data. Unfortunately, there is no way to be certain without obtaining extremely detailed data, which, as cited previously, will distort any measurements. Note: if the SQL Server I/O metrics are captured and averaged every 30 to 60 seconds, the difficulties with averaging averages will be present in this data as well. Therefore, the only way to avoid this issue is to subtract the first snapshot’s values from those of the second and compute the differences. This method was used to compute the SQL Server metrics in the table below.
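A minimal sketch of the snapshot-and-difference approach is shown below. The one-minute delay and the columns selected are illustrative only; any interval and any of the DMV's cumulative columns can be used:

-- First snapshot of the cumulative file statistics
IF OBJECT_ID('tempdb..#Snap1') IS NOT NULL DROP TABLE #Snap1;
SELECT database_id, file_id, num_of_reads, io_stall_read_ms, num_of_writes, io_stall_write_ms
INTO #Snap1
FROM sys.dm_io_virtual_file_stats(NULL, NULL);

WAITFOR DELAY '00:01:00'; -- measurement interval

-- Subtract the first snapshot from the second to obtain metrics for the interval only
SELECT s2.database_id, s2.file_id,
       s2.num_of_reads - s1.num_of_reads AS IntervalReads,
       s2.num_of_writes - s1.num_of_writes AS IntervalWrites,
       CASE WHEN s2.num_of_reads > s1.num_of_reads
            THEN (s2.io_stall_read_ms - s1.io_stall_read_ms) * 1.0 / (s2.num_of_reads - s1.num_of_reads) END AS AvgReadMs,
       CASE WHEN s2.num_of_writes > s1.num_of_writes
            THEN (s2.io_stall_write_ms - s1.io_stall_write_ms) * 1.0 / (s2.num_of_writes - s1.num_of_writes) END AS AvgWriteMs
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS s2
INNER JOIN #Snap1 AS s1 ON s1.database_id = s2.database_id AND s1.file_id = s2.file_id;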

Differences

A simple example, whose data appears in the table below, helps illustrate the differences and similarities between PerfMon and SQL Server. The rightmost two columns illustrate the percentage and absolute differences for each pair of metrics. The Pct Diff SQL Server vs. PerfMon column highlights the percentage differences, whereas the last column lists the absolute differences. The read, write, and I/O times are in milliseconds, so clearly, the two disk LUNs shown below were under extreme duress during the data collection period, and reads and writes performed very poorly. Good read and write times should be 20 milliseconds or less, but the best time shown in the table below is 548 milliseconds. The similarities between the two data sources are notable because all of the rate values are close, although there is more variation in the response times, in particular for writes on LUN 10 O. Closer inspection reveals that LUN 09 N’s response metrics are very close and that the 10 O LUN accounts for most of the differences. However, despite the cited differences, both data sources support the conclusion that the disk LUNs performed very poorly.

Blog_20160107_3

In summary, despite the significant differences in the origins of the PerfMon and SQL Server I/O metrics, both sets of metrics agree sufficiently and they also correctly detect I/O performance issues. Differences can result from the averaging of averages issues for PerfMon data, or processor-related delays and the comprehensive nature of the SQL Server I/O metrics. Regardless, either of these usually provides a more accurate representation of disk behavior than the infrequent sampling methods used by some disk vendors.


Using Common Table Expressions to Improve Query Performance

-by Jeffry Schwartz

Queries with multiple join clauses create particularly difficult issues when they perform poorly. Join ordering is extremely important because if a join creates a huge interim result set at the beginning of the query execution, SQL Server must continue to handle this result set until it can be trimmed down with filtering or additional join clauses. If this does not occur until the end of query execution, the query will usually perform poorly. However, if the number of records from a large table can be limited early in the execution or the join to the large table is added at the end of the query, the query will usually perform better. Therefore, when large tables are involved the ordering and the types of joins can dramatically affect the performance of a query. When multiple large tables are involved, it is often quite difficult to determine which one caused the poor performance.

One way to influence SQL Server’s ordering of joins is to use Common Table Expressions (CTEs), which were introduced in SQL Server 2005. Using CTEs also has the benefit of allowing a developer to break up a very complex query into its component parts, which can simplify the logic as well as assessment of exactly which joins dramatically affect performance. In addition, since the query is broken up into component parts, development, maintenance, and readability can be simplified greatly. A CTE is defined to be a “temporary named result set, derived from a simple query and defined within the execution scope of a SELECT, INSERT, UPDATE, or DELETE statement.” “CTEs can be thought of as alternatives to derived tables (subquery), views, and inline user-defined functions.” Source: http://en.wikipedia.org/wiki/Common_table_expressions. The following query provides an illustration of a CTE in its most basic form:

with Top30000 as ( -- just check the first 30,000 entries
    select top 30000 Duration
    from <table> with (nolock)
    where Duration is not null
)
select min(Duration) as [Min Duration]
from Top30000

Clearly, this query could be written in one statement and the query plan would undoubtedly be the same regardless of whether it used a CTE. One key thing to remember about CTEs is that they only exist within the scope of the query. For example, any subsequent T-SQL statements will not have access to the Top30000 entity, so it is NOT a temporary table. A slightly more sophisticated example is shown below:

use AdventureWorks2008R2;

with ProductReviewID as (
    SELECT p.ProductID, p.Name, pr.ProductReviewID, p.ListPrice
    FROM Production.Product p
    LEFT OUTER JOIN Production.ProductReview pr ON p.ProductID = pr.ProductID
    WHERE p.ProductID = 718
)
SELECT DISTINCT p.ProductID, p.Name, p.ListPrice, sd.UnitPrice AS [Selling Price]
FROM Sales.SalesOrderDetail AS sd
INNER JOIN ProductReviewID AS p ON sd.ProductID = p.ProductID

Again, this query could be constructed without using a CTE. In this particular case, I selected two sample join queries from BOL and combined them into a single query using a CTE. Although the filter is hard-wired in this example, it is illustrative of filtering the result set early in the query to reduce the number of records being used by the last part of the query. Note also that the last portion of the query had to join to the CTE in order for the CTE to be visible. A subquery can also accomplish this.
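For comparison, the same query can be written with a derived table (subquery) in place of the CTE; the sketch below assumes the same AdventureWorks2008R2 objects as the example above:

SELECT DISTINCT p.ProductID, p.Name, p.ListPrice, sd.UnitPrice AS [Selling Price]
FROM Sales.SalesOrderDetail AS sd
INNER JOIN (
    -- The same early filtering as the CTE, expressed as a derived table
    SELECT pp.ProductID, pp.Name, pr.ProductReviewID, pp.ListPrice
    FROM Production.Product pp
    LEFT OUTER JOIN Production.ProductReview pr ON pp.ProductID = pr.ProductID
    WHERE pp.ProductID = 718
) AS p ON sd.ProductID = p.ProductID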

Most CTE examples illustrate recursion or utilize the following layout that is similar to the one that used AdventureWorks above:

CTE1

CTE2 select from CTE1

Select from CTE2 joining other tables

However, this layout is also possible:

CTE1

CTE2

Select from CTE2 joining CTE1 joining other tables

As is this form:

CTE1

CTE2

CTE3, which selects from CTE2 joining CTE1

Select from CTE3 joining other tables
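A skeletal sketch of this chained layout follows; the table and column names are hypothetical and serve only to show the structure:

-- Hypothetical tables: dbo.BigOrders, dbo.BigOrderLines, dbo.Customers
WITH RecentOrders AS (                -- CTE1: trims the largest table early
    SELECT OrderID, CustomerID
    FROM dbo.BigOrders
    WHERE OrderDate >= '2015-01-01'
),
LargeLines AS (                       -- CTE2: trims the second large table
    SELECT OrderID, LineTotal
    FROM dbo.BigOrderLines
    WHERE LineTotal > 1000
),
OrderTotals AS (                      -- CTE3: selects from CTE2 joining CTE1
    SELECT ro.CustomerID, SUM(ll.LineTotal) AS Total
    FROM LargeLines ll
    INNER JOIN RecentOrders ro ON ro.OrderID = ll.OrderID
    GROUP BY ro.CustomerID
)
SELECT c.CustomerName, ot.Total       -- final select from CTE3 joining other tables
FROM OrderTotals ot
INNER JOIN dbo.Customers c ON c.CustomerID = ot.CustomerID;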

One method that I have used extensively to develop efficient queries is to build complex queries gradually using the last layout, i.e., select from tables and gradually introduce joins while maintaining excellent performance. It is critical that representative data be used during this process, i.e., the table sizes used for testing should be representative of what the production database contains. For example, most of the large tables in the AdventureWorks database contain approximately 20,000 records. With table sizes this small, almost any query design works well. However, if the tables contain 200,000 or 2,000,000 records the behavior would be quite different. In these situations, joins can provide substantial filtering or actually expand the result set. The advantage of the gradual addition approach is that the developer can identify immediately those joins that significantly degrade performance and then attempt to utilize other joins until only the most troublesome joins remain. Although a very efficient form of the query can often be devised using only CTEs, sometimes, interim results must be stored in a temporary table, which is then joined with the large troublesome table.

In summary, although using CTEs will not force SQL Server to perform joins in the order specified in the T-SQL code, it can often influence SQL Server to adhere more closely to a desired join ordering sequence, especially if filtering clauses are used early in the CTE sequence. Adding joins against large tables at the end of a query can also achieve a similar result. Regardless, using a gradual approach to building queries with CTEs enables developers to understand exactly what the effect of every join is and, therefore, to create the most efficient query possible.

This is the beginning of a series on CTEs and performance. Stay tuned for more CTE posts to come, including examples of how to take an existing query with joins and convert it to a CTE with improved performance.


The Pros & Cons of Hyperthreading

— By Jeff Schwartz

Hyperthreading is one of those features that most IT personnel have heard about, but many do not understand exactly what it does or why it may improve or degrade the performance of their servers. Although hyperthreaded processors have been available for many years, the confusion surrounding it is as commonplace today as it was when the technology was introduced in 2002. Server, processor, and motherboard manufacturers have contributed to the confusion by activating the feature by default for several years after it was introduced, then deactivating it by default for a few years, and recently by activating it again by default. In addition, for many years, determining whether it was on or off could be accomplished only by reviewing the BIOS settings during a reboot, which is extremely difficult to schedule for most production servers.

What is Hyperthreading?

Hyperthreading involves having two distinct pipelines into a single processor execution core. Research into code sequences showed that code executing on a processor frequently, and naturally, pauses because of things such as main memory fetches into the processor cache and improper code branch prediction. The rationale is to provide the execution portion of an idling core processor with work to do while these kinds of activities are handled. These two pipelines are presented to the outside world as two distinct processors: one physical and one “logical.” Therefore, as far as any operating system is concerned, twice as many processors are available when hyperthreading is active compared to when it is not.

Why not use it all the time?

In theory, utilizing available processor cycles is a great idea, and therefore this mechanism should be used all the time. However, in practice, some problems arise. Two main issues are involved: processor scheduling and the length of time a unit of work stays on a processor without interruption. Since the execution portion of a processor is still a single entity, scheduling appropriately is critically important, i.e., two threads from the same process should NOT be scheduled on the same core. If they are, each thread will take turns stopping the other, which degrades performance. The difficulty, particularly in the SQL Server environment, is that SQL Server schedules its own threads, and depending upon the SQL Server version as well as the Windows version on which it is running, SQL Server may or may not be aware of the physical/logical distinction or know how to handle it properly. Therefore, there is a good chance that it may schedule two dependent threads on the same core. The other issue is that the threads of a large or parallelized SQL Server query usually consume their full time allotment on a processor before they are suspended. Any thread sitting in the other pipe of a hyperthreaded processor will wait the full time before it is allowed to run again. If that thread is from the same query, the two threads will stop each other from running and make the query run substantially longer. Therefore, under normal conditions, most SQL Server customers do not benefit from employing hyperthreading.

One type of workload, high volume small transaction processing, DOES benefit from hyperthreading, but even then it does not benefit as much as one might expect. Benchmarks and actual user experiences have demonstrated that at most, a 30 percent gain can be obtained when transactions are extremely small, e.g., they execute in under 200 milliseconds. Therefore, if a server only handles small high volume transactions regardless of whether they are inquiry, insert, or update, it will benefit from hyperthreading. As soon as any large queries arrive at the server, the benefit is gone and the problems cited in the previous paragraph arise.

How can I tell whether Hyperthreading is active?

Historically, this has been difficult to determine, but in the past few years, Windows and WMI have improved to provide this information without interrogating the BIOS. Older versions did not provide adequate or accurate information, but beginning with Windows 2008, the information is quite useful. This results from the fact that older versions of Windows either did not know or report the physical/logical processor distinction. The following PowerShell (other languages that can access WMI will work also) commands provide the necessary information. The first command returns the number of sockets on the system followed by the number of logical processors, and the second command returns information regarding the processors themselves. The first output was obtained on a quad-core single-socket system with hyperthreading turned off, and the second output was obtained from the same system with hyperthreading turned on. Clearly, the number of logical processors doubles when hyperthreading is turned on.

gwmi win32_computersystem | fl NumberOfProcessors, NumberOfLogicalProcessors

Blog_20150813_1
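When running PowerShell is inconvenient, SQL Server itself exposes related information. The following query is a sketch based on sys.dm_os_sys_info; it reports how many logical processors SQL Server sees and how many are exposed per physical socket, which can be compared against the known core count of the hardware to infer whether hyperthreading is enabled:

SELECT cpu_count,                                   -- logical processors visible to SQL Server
       hyperthread_ratio,                           -- logical processors exposed per physical processor package
       cpu_count / hyperthread_ratio AS SocketCount -- number of physical sockets
FROM sys.dm_os_sys_info;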

Conclusion

Hyperthreading can be useful in a SQL Server environment if the workload is comprised solely of extremely lightweight transactions. Under these conditions, the most one can expect to obtain from hyperthreading is a 30 percent improvement. If the server is running the latest versions of Windows and SQL Server, it may be possible to obtain some improvement for workloads that are highly transactional with a few small reports mixed in. However, in most SQL Server environments, hyperthreading will most likely degrade performance.

Introduction to SQL Server Execution Plans

— By Jeff Schwartz

Recently, a coworker asked me about a query performance problem that had suddenly arisen as they so often do. During testing, this particular query had performed extremely quickly, but when it was shown to a C-level person, it dragged terribly making this impatient person wait 80 seconds for the answer to come back. We all know how valuable their time is, so this had to be solved quickly. He showed me the query, which I had never seen before, along with the execution plan. An execution plan comprises the steps by which SQL Server provides the answer to a query and SSMS provides graphical representations of this mechanism. The graphical plans take two forms: estimated and actual. An estimated plan is what SQL Server anticipates it will do, whereas an actual plan shows what it really did. Database statistics and other factors, e.g., the number of indices, may create significant differences between the two. If possible, the actual plan is the better choice, but sometimes the problematic query runs too long or cannot be executed in the current environment because it changes the database and the locking may create bottlenecks. In these cases, the estimated plan must suffice.

Since the execution plan provides a complete picture of how SQL Server did things (assuming an actual plan), knowing how to interpret these plans is critical for solving performance problems as quickly as possible. In the aforementioned case, the analysis, discussion, and experimentation process only required about 45 minutes to reduce the execution time from 80 seconds to 3 seconds. It also helped everyone understand the data better so these kinds of issues will be less likely to occur in the future.

Execution Plan Basics

Terminology

When a batch or query contains multiple statements, multiple execution plans are drawn. An execution plan, also known as a query plan, is composed of nodes, i.e., operators, each of which specifies an operation that is performed as part of the query or statement. Each node is connected to a parent node by an arrow that indicates the estimated or actual number of rows produced by the operator. As shown in Figure 1 and Figure 2 below, the width of the arrow is proportional to the number of rows returned, e.g., an operator that returns 5.5 million rows will have an extremely wide arrow, whereas one that returns one hundred rows will have a very thin one.

Blog_20150316_1

Figure 1: Arrow Thickness Example – Small Number of Rows

Blog_20150316_2

Figure 2: Arrow Thickness Example – Large Number of Rows

Details concerning a particular operator can be obtained by hovering over the operator as shown in Figure 3. For complete details, shown in Figure 4, select the operator, right click, and select properties.

Blog_20150316_3

Figure 3: Operator Details Popup

Blog_20150316_4

Figure 4: Operator Details Properties Pane

Efficient Execution Plan Example

The plan below illustrates a simple and efficient execution plan. The SQLTraceReceiver table contained 20,815,160 rows and the query returned only 243. Only 979 logical reads were performed, the CPU time was 16 milliseconds, and the duration was 2,194 milliseconds (2.194 seconds). As cited previously, the arrow widths are very thin, indicating that the smallest amount of data possible was returned. This is supported by the fact that a NonClustered Index Seek operator, which looks up rows in the nonclustered index using a key, was used against the ix_AppName index. The Key Lookup operator obtains the remaining column data from the underlying clustered index. In this particular case, the * in the output list of the SELECT statement causes the Key Lookup. To summarize, SQL Server uses the ix_AppName index to navigate to the specific records and then returns all the columns of those records using the Key Lookup operator.

Blog_20150316_5

Figure 5: Efficient Execution Plan
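Although the original query is not reproduced in this article, its general shape resembled the sketch below; the AppName column, the @AppName parameter, and its value are assumed names used only for illustration:

DECLARE @AppName varchar(128) = 'MyApp'; -- hypothetical parameter value

SELECT *
FROM dbo.SQLTraceReceiver
WHERE AppName = @AppName; -- seek on ix_AppName, followed by a Key Lookup for the remaining columns because of SELECT *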

Inefficient Execution Plan Example

The plan below illustrates an inefficient execution plan. In this case, the use of the ISNULL function in the where clause forces SQL Server to rummage through ALL of the 20+ million records in the table to find the desired 243. A Clustered Index Scan does exactly what its name suggests: it scans the entire table, which in this case, is a clustered index. It is important to note the little arrows in the beige circle. This indicates a parallelized operator and means that the problem is so large it must be broken up into MAXDOP pieces to complete in as timely a manner as possible. Whenever you see a plan with scans of large tables and parallel operator indicators, the query is most likely going to perform poorly. Note: small tables will be scanned regardless, so it is important to understand the number of rows in a table before jumping to the conclusion that the scan is necessarily bad. Although this plan looks simpler, it is much worse in terms of performance as indicated by the following statistics:

  • 1,941,682 logical reads
  • 21,531 milliseconds (21.531 seconds) of CPU time
  • 299,446 milliseconds (299.446 seconds) duration

Blog_20150316_6

Figure 6: Inefficient Execution Plan
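Compared with the sargable predicate sketched after Figure 5, wrapping the column in a function prevents an index seek. The statements below are a sketch that reuses the assumed AppName column and @AppName parameter; the second form is a logically equivalent rewrite that gives the optimizer the option of seeking on ix_AppName again:

-- Non-sargable: ISNULL must be evaluated for every row, forcing a scan of the clustered index
SELECT *
FROM dbo.SQLTraceReceiver
WHERE ISNULL(AppName, 'Unknown') = @AppName;

-- Sargable rewrite of the same logic
SELECT *
FROM dbo.SQLTraceReceiver
WHERE AppName = @AppName
   OR (AppName IS NULL AND @AppName = 'Unknown');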

The following table summarizes the differences between the two queries:

Blog_20150316_7

Table 1: Comparison of Efficient and Inefficient Execution Plans

Clearly, the first plan was far more efficient and both returned the same data.

Conclusion

Understanding execution plans is critically important for analysts because it enables them to determine why a query that usually runs quickly suddenly and consistently takes far longer when it is run with different parameters. Although this article only discusses a few operators, they are some of the most important ones to examine when a query performs poorly.