Monthly Archives: November 2017

Rx for Demystifying Index Tuning Decisions – Part 6

— by Jeffry Schwartz

Review

In Parts 1 through 5 of this blog series the author discussed various aspects of SQL Server indices such as:

  • Types of indices and their variations, e.g., clustered, nonclustered, filtered, and covering
  • Physical versus logical index access methods, e.g., seeks, scans, key/RID lookups, singleton lookups, and range scans
  • Why index tuning should begin with queries
  • Determining an Appropriate Strategy for Index Tuning
  • Determining Queries that Need Indices the Most
  • Capturing query performance information using Extended Events
  • Using query plans to determine relationships between queries and indices
  • Query Optimizer
  • Statistics and how they affect index usage and query performance
  • Consequences of too many indices
  • Overlapping Indices

 

Part 6 contains an overview of dynamic management functions as well as a detailed discussion of the sys.dm_db_index_usage_stats DMV (and how to use it).

 

Dynamic Management Views and Functions

Dynamic Management Views and Functions are “Designed to give you a window into what’s going on inside SQL Server.” There are two types: DMV – Pure view, i.e., no parameters required and DMF – Table-valued function, i.e., parameters required. The parameters usually specify database, table, index, partition, etc. They provide a significant amount of information regarding the system, databases, performance, and internal workings of SQL Server. Most DMVs and DMFs are simple to use, but one must reconcile the numeric IDs with static views that contain textual names.

Most values are accumulated from last SQL Server instance restart. To determine interval-specific values one must calculate differences between individual sample records. However, one must be certain to difference records with same database ID, object ID, index ID, and possibly, partition ID. This data is perfect for periodic or intermittent sampling because no data is lost during the sampling process. Note: sys.dm_db_index_operational_stats may be the exception if a table is used intermittently because when tables haven’t been used for a period of time they are removed from the cache and their metrics are cleared. Capture rates can range from every 30 seconds to a few times per day;   the need for granular analysis normally dictates the collection frequency.

Although several methods for collecting data exist, two general usage scenarios are instructive.

Method 1

  1. Capture a snapshot before monitored activities begin and store the results in a SQL table or a spreadsheet
  2. Execute the workload (whether natural production or test)
  3. Capture a snapshot again and compare value differences
  4. Load into spreadsheets for further analysis, if necessary

Method 2

  1. Capture a snapshot every <n> minutes and store results in a flat file
  2. After the workload and capture processes are complete, load data into SQL tables
  3. Use SQL Server to compare incremental value differences and store results
  4. Extract interval data to spreadsheets for further analysis

Several general and static views are required for converting numeric DMV and DMF identifiers into understandable text

sys.databases

Lists all databases and their IDs so proper associations can be made

sys.partitions

Only way to decode HOBT (Heap or Binary Tree) IDs returned by lock-specific information, e.g., blocked process records and sys.dm_os_waiting_tasks

sys.configurations

Provides information regarding OS and SQL Server configurations

Several database-specific views are needed to convert various database-specific IDs into understandable text. Each of these must be interrogated for each database separately.

sys.objects

Lists all database objects such as tables, views, stored procedures, etc.

sys.indexes

Lists all indices and their associated table IDs

Does not provide row counts as sysindexes does

sys.filegroups

Lists all file groups and their IDs

sys.database_files

Lists all physical database files and their IDs

sys.schemas

Lists all database schemas

Index Related Dynamic Management Views and Functions

Two DMV/DMFs are used to obtain index-related performance information and they are BOTH needed to obtain an accurate and comprehensive perspective regarding index usage. The differences between physical and logical index access metrics were discussed earlier in this series. sys.dm_db_index_usage_stats returns information regarding the query code usage of tables and indices, e.g., inserts, updates, deletes, random accesses, and sequential accesses. These metrics most closely match Query Plan operators, and the information is retained much longer than operational (physical) stats, so it is quite useful for long-term evaluation of index usage, i.e., whether indices are used a great deal or not at all. sys.dm_db_index_operational_stats “returns current low-level I/O, locking, latching, and access method activity for each partition of a table or index in the database.” These metrics track actual index and table operations, e.g., singleton lookups and range scans. Information for certain structures may be deallocated when table no longer in cache, so this is not a reliable source of long-term index usage.

sys.dm_db_index_usage_stats DMV

The sys.dm_db_index_usage_stats DMV requires no input parameters and returns one row per index, the number of seeks, scans, lookups, and updates for user and system queries. As discussed earlier in this series, seeks are random or pseudo-random (single record or range of records as long as a key is used). Scans are fully sequential. The time of the last seek, scan, lookup, and update for user queries is also provided by this function. The metrics are divided into two categories, system and user. The System category is comprised of work generated by maintenance activities, e.g., statistics updates. The User category is comprised of insert, update, delete, and select operations. The metrics report user statements, not record counts. Therefore, one insert statement can result in millions of rows being inserted, so the metric will show one, not millions. This DMV helps determine index and table usage patterns and is particularly useful for identifying indices that are seldom, if ever, used by queries, especially if SQL Server instance has been running for a very long time, e.g., months. As cited previously, these metrics are much more trustworthy for long-term evaluations than operational stats because of possible cache removal issues. Table 8 illustrates the type of information that is available from this function. Each of the highlighted indices was usually accessed sequentially, i.e., a full scan. Since one of the tables contained 71 million rows and the other contained almost 34 million rows, these statistics clearly demonstrate tuning opportunities. Reminder: random accesses can be either single record lookups or a filtered range scan. Table 10 shows data from an actual customer’s SQL Server instance that had been up for 271 days, i.e., approximately nine months. This view highlights the indices that are fully scanned and those that are hardly accessed. The highlighted row in Table 9 shows an index that is updated, but never used. It is easy to become distracted by all the update activity and lose sight of the fact that the index had not been used for inquiry purposes in 271 days.

Blog_20171130_1b

Table 8: Index Usage Stats Summary

 

Blog_20171130_2b

Table 9: Index Usage To-Date Summary (271 Days)

 

The following code uses sys.dm_db_index_usage_stats to list rarely-used indices for a specific database with resolved names and its output appears in Table 10.

declare @dbid int = db_id()

select objectname=object_name(inxusage.object_id), inxusage.object_id, indexname=sysinx.name, sysinx.index_id, user_seeks, user_scans, user_lookups, user_updates

from sys.dm_db_index_usage_stats inxusage,

sys.indexes sysinx

where database_id = @dbid and       objectproperty(inxusage.object_id,‘IsUserTable’) = 1 and   sysinx.object_id = inxusage.object_id and

sysinx.index_id = inxusage.index_id

order by (user_seeks + user_scans + user_lookups + user_updates) asc

Blog_20171130_3b

Table 10: Example – Unused Tbl1 Indices

 

Table 11 illustrates the kind of overall summary report that this view when combined with static index information and the missing index DMVs that will be discussed later can generate. This table shows clearly which SQL Server tables have several indices, how many are duplicated or unused, and how many others were suggested by SQL Server. The row counts are often invaluable for this kind of overall analysis, which can direct an index tuning study.

Blog_20171130_4b

Table 11: Index/Table Overall Summary

 

The next article in this blog series will cover the sys.dm_db_index_operational_stats DMF in detail.   Until then…good luck with your tuning efforts and please contact us with any questions.

For more information about blog posts, concepts and definitions, further explanations, or questions you may have…please contact us at SQLRx@sqlrx.com. We will be happy to help! Leave a comment and feel free to track back to us. Visit us at www.sqlrx.com!

Rx for Demystifying Index Tuning Decisions – Part 5

— by Jeffry Schwartz

Review

Due to the holiday week, this part will be shorter than usual. In Parts 1 through 4 of this blog series the author discussed various aspects of SQL Server indices such as:

  • Types of indices and their variations, e.g., clustered, nonclustered, filtered, and covering
  • Physical versus logical index access methods, e.g., seeks, scans, key/RID lookups, singleton lookups, and range scans
  • Why index tuning should begin with queries
  • Determining an Appropriate Strategy for Index Tuning
  • Determining Queries that Need Indices the Most
  • Capturing query performance information using Extended Events
  • Using query plans to determine relationships between queries and indices
  • Query Optimizer
  • Statistics and how they affect index usage and query performance
  • Consequences of too many indices

 

Part 5 covers the following topic:

  • Overlapping Indices

Overlapping Indices

Overlapping indices duplicate the functionality of one another in some way. The easiest way to understand this concept is to see an actual example as shown in Table 6. The simplest example is Inx4, which completely duplicates Inx3. Since the table contained almost 17 million records, this is a very undesirable situation. Table 7 highlights the fact that SQL Server only used Inx3 ONCE, so this provides even more incentive to eliminate Inx3. This data also indicates that Inx1 can be removed because it was never used. Note: the usage data was obtained using two different index DMVs/DMFs, which will be discussed later.

Blog_20171130_1a

Table 6: Overlapping Index Examples

 

Blog_20171130_2a

Table 7: Overlapping Index Usage

 

Examining the index layouts further we can see clearly that Inx1 through Inx4 could be consolidated into one index that would handle all usage. The composite index is shown below. It may be possible to exclude CurrCode and ECID from the composite index, but more research would need to be conducted before that choice could be committed to safely. Specifically, since the number of User Lookups was quite high on the table, some queries do not have all the columns they require and two of these may be CurrCode and ECID. Query plans are invaluable in answering these kinds of questions.

Blog_20171130_3a

At first glance, it appears that Inx7 and Inx8 might be combined as well. However, since the secondary keys are different and both are used for range scanning only (see Table 7), there is no certain way to combine these and also handle the queries that use them properly. Therefore, it is best to leave these alone.

The information above was determined using the following system views: sys.all_columns, sys.databases, sys.dm_db_partition_stats, sys.index_columns, sys.indexes, and sys.tables.

The next article in this blog series will cover dynamic management views.   Until then….happy tuning and please contact us with any questions.

For more information about blog posts, concepts and definitions, further explanations, or questions you may have…please contact us at SQLRx@sqlrx.com. We will be happy to help! Leave a comment and feel free to track back to us. Visit us at www.sqlrx.com!

Rx for Demystifying Index Tuning Decisions – Part 4

— by Jeffry Schwartz

Review

In Parts 1 through 3 of this blog series the author discussed various aspects of SQL Server indices such as:

  • Types of indices and their variations, e.g., clustered, nonclustered, filtered, and covering
  • Physical versus logical index access methods, e.g., seeks, scans, key/RID lookups, singleton lookups, and range scans
  • Why index tuning should begin with queries
  • Determining an Appropriate Strategy for Index Tuning
  • Determining Queries that Need Indices the Most
  • Capturing query performance information using Extended Events
  • Using query plans to determine relationships between queries and indices

https://blog.sqlrx.com/2017/10/26/rx-for-demystifying-index-tuning-decisions-part-1/

https://blog.sqlrx.com/2017/11/02/rx-for-demystifying-index-tuning-decisions-part-2/

https://blog.sqlrx.com/2017/11/09/rx-for-demystifying-index-tuning-decisions-part-3/

Part 4 covers the following topics:

  • Query Optimizer
  • Statistics and how they affect index usage and query performance
  • Consequences of too many indices

Query Optimizer

Overview

The query optimizer determines the appropriate execution plan for a query. “The SQL Server Query Optimizer is a cost-based optimizer.   It analyzes a number of candidate execution plans for a given query, estimates the cost of each of these plans and selects the plan with the lowest cost of the choices considered.   Indeed, given that the Query Optimizer cannot consider every possible plan for every query, it actually has to do a cost-based balancing act, considering both the cost of finding potential plans and the costs of plans themselves.”

“Cost-assessment of each plan – While the Query Optimizer DOES NOT GENERATE EVERY POSSIBLE EXECUTION PLAN, it assesses the resource and time cost of each plan it does generate. The plan that the Query Optimizer deems to have the lowest cost of those it’s assessed is selected, and passed along to the Execution Engine.” Source: http://www.simple-talk.com/sql/sql-training/the-sql-server-query-optimizer. Therefore, the best plan is NOT GUARANTEED and too many choices yield indeterminate results.

Statistics

Statistics contain information about distribution of values in one or more columns of table or indexed view. The Query Optimizer uses statistics to estimate the number of rows that a particular query operator will generate. Outdated statistics can cause a query to use a suboptimal index or perform a full scan, so it is prudent to insure that they are up-to-date as much as possible. Large numbers of insertions or key modifications can cause them to be skewed and inaccurate with respect to the current data because even when statistics are updated automatically they may lag behind the actual data. The AUTO_CREATE_STATISTICS and AUTO_UPDATE_STATISTICS are automatically on by default in Tempdb and often on by default in many other databases. Please see the following article for a complete discussion of statistics and when SQL Server determines that they are sufficiently out-of-date to warrant updating: https://msdn.microsoft.com/en-us/library/dd535534(v=sql.100).aspx.

Out-of-date statistics can be determined with very little overhead by using the stats_date (object_id, index_id) function and the sys.indexes view. Since creating or updating statistics can be expensive depending upon table size, sample size, and frequency, several options are available to keep the overhead at a reasonable level and these are shown below.

  • FULLSCAN (most accurate and expensive to create)
  • SAMPLE <n> PERCENT or ROWS
  • RESAMPLE (uses last settings)
  • ALL or COLUMNS or INDEX

Balancing the need for updated statistics and the costs of creating or updating them can cause customers to turn off all automatic options and then update them manually using the preceding commands. However, sometimes these good intentions can cause surprisingly out-of-date statistics as shown in Table 4 below. This customer did not intend for this situation to occur, but as the reader can see clearly from the example, the statistics of many large tables were not updated for over a year. Obviously, this is an undesirable situation, especially since many thousands of records were inserted into some of these tables daily.

Blog_20171116_1

Table 4: Extremely Outdated Statistics

 

Query Plan Consequences of Too Many Indices

Larger numbers of indices create exponentially more query plan possibilities. When too many choices exist, the Optimizer will give up partway through and just pick the best plan thus far. As more indices are added the problem worsens and compilation times, i.e., processor times, increase to a point. This can be illustrated best by reviewing an actual customer example. In this case, one table had 144 indices attached to it and several others had between 20 and 130 indices. The queries were quite complex with as many as fifteen joins, many of which were outer joins. Query and index tuning were impossible because query performance was often counterintuitive and sometimes nonsensical. Adding an index that addressed a specific query need often made the query run worse one time and better the next. Note: Cached query plan issues, e.g., parameter sniffing or plan reuse were not problems in this case. The only solution was to tear down ENTIRE indexing structure and rebuild it with SQL Server’s guidance and nine days’ worth of production queries. Table 5 summarizes the results of the index restructuring project. The performance of 98 percent of the queries was comparable to or better than it was when the large numbers of indices were present.

Blog_20171116_2

Table 5: Index Optimization Project Results

 

How Did Things Get This Bad?

The original application was built in Microsoft Access and each office used its own version of the database. The company decided to consolidate all of the Access databases into a single corporate-wide SQL Server database and implemented most of the indices that had been used within Access. No formal or cohesive indexing strategy was ever implemented. Desperation combined with bad luck and extremely complex queries created a perfect environment for this kind of problem to arise. The list below summarizes the key points regarding how they got themselves into this mess.

  • 24/7 application because of 3 worldwide time zones and replication
  • Many queries were at least 7,000 characters long with 15 inner and outer joins
  • Performance tanked after more than 100 separate databases were combined into a single SQL Server database
  • Application did not scale and they could not return to separate databases
  • Ran Database Engine Tuning Advisor repeatedly, and blindly added ALL indices it recommended (more later on DTA)
  • Performance worsened, so they added more indices based upon experimentation and hope
  • Database crashed and after restoration from an older database backup (problems occurred during some of the restorations) performance worsened
  • So many indices existed that they did not know which ones still were needed, so they scripted all indices from before and implemented them with new names just to be sure all requisite indices were present in the database
  • They had no idea how to determine unused, unnecessary, or fully or partially duplicated indices

Index Insertion Operation

Every record insertion creates overhead maintenance work because all non-subset indices are updated for every insert. Page splits may be involved if a page has been filled already. Upper index levels may be affected and may split as well. A page split is defined as follows: “When a new row is added to a full index page, the Database Engine moves approximately half the rows to a new page to make room for the new row. This reorganization is known as a page split. A page split makes room for new records, but can take time to perform and is a resource intensive operation. Also, it can cause fragmentation that causes increased I/O operations.” Locks are held while all this happens, and fillfactor and insertion rate affect frequency of this activity. Further research can be conducted via http://msdn.microsoft.com/en-us/library/ms177459.aspx.

Consequences of Too Much Overhead

Increased overhead leads to lower transaction throughput, increased blocking, and increased deadlocking. These worsen if queries do not use the Nolock query hint because Page and Row locks are acquired and potentially held longer. Use of Nolock can greatly reduce these problems for selection queries — do not forget to add the Nolock hint to join clauses. Research transaction level isolation for further information. In high volume record insertion cases, page latch wait time can become a major constraining factor.

The next article in this blog series will cover how to detect indices whose functionality is duplicated by other indices.   Until then….happy tuning and please contact us with any questions.

For more information about blog posts, concepts and definitions, further explanations, or questions you may have…please contact us at SQLRx@sqlrx.com. We will be happy to help! Leave a comment and feel free to track back to us. Visit us at www.sqlrx.com!

Bob Ward to speak at DFW BI & Analytics Community Meeting on 12/6

DFW_BI_Community

Come see Bob Ward from Microsoft speak about “Fast and Furious: SQL 2017” at the DFW BI & Analytics Community Meeting on 12/6, 11:30-1:00 at Noah’s in Irving.  You can register on the “Event” section of the ISI website at www.isi85.com .

The DFW Business Intelligence and Analytics Community welcomes Bob Ward of the Microsoft Database Systems Group. You may already know him from his blog SQL Server According to Bob.   If you were unable to attend 2017 PASS Summit, Bob will be giving the same groundbreaking presentation he delivered there.

Is your database slowing down your business? Are you using Linux as a platform but can’t find the right database that meets your performance needs without busting your budget? Built on the industry recognized “it just runs faster” SQL Server 2016 engine, SQL Server 2017 provides new capabilities for performance, value, and choice. In this session, we show you features and a “behind the scenes look” at what powers SQL Server 2017 for Linux and Windows. And if you are a database expert and want to tune the SQL Server engine to its maximum turbocharged capacity, we show you the tools, diagnostics, and techniques that make SQL Server 2017 the ultimate database platform choice to run your applications at a fast and furious speed.

Rx for Demystifying Index Tuning Decisions – Part 3

— by Jeffry Schwartz

Review

In Parts 1 and 2 of this blog series the author discussed various aspects of SQL Server indices such as:

  • Types of indices and their variations, e.g., clustered, nonclustered, filtered, and covering
  • Physical versus logical index access methods, e.g., seeks, scans, key/RID lookups, singleton lookups, and range scans
  • Why index tuning should begin with queries
  • Determining an Appropriate Strategy for Index Tuning
  • Determining Queries that Need Indices the Most
  • Capturing query performance information using Extended Events

https://blog.sqlrx.com/2017/10/26/rx-for-demystifying-index-tuning-decisions-part-1/

https://blog.sqlrx.com/2017/11/02/rx-for-demystifying-index-tuning-decisions-part-2/

Part 3 covers the following topic:

  • Using query plans to determine relationships between queries and indices

Query or Execution Plans

Query plans have already been mentioned as being useful for query tuning. They can be invaluable for determining which part of a query causes performance issues, which indices are used, and HOW the indices are accessed, e.g., randomly or sequentially. A full discussion of query plans is beyond the scope of this presentation, but graphical plans are used most often. Graphical plans come in two forms: Estimated (predicted behavior) and Actual (actual behavior). XML plans contain additional information, but they are difficult to read even with an XML editor. Querying the XML data can provide invaluable information, but the queries to extract this information can consume a great deal of CPU time, especially when many complex plans must be examined, so it is best to peruse this kind of data on a server that is not used for production work whenever possible.

Interpreting Graphical Query Plans

It is important for any database analyst to have a basic understanding of how to read query plans. These plans are graphical flow diagrams of how the query will be executed and what information will flow from one operator to another. The diagrams flow from bottom right to top left, and quite often the most crucial operators are near the right and sometimes near the bottom as well. Each node in the tree structure is represented by an icon and specifies the logical operator that will be used to execute that portion of the query. A popup appears displaying details about the operator when the mouse pointer is placed over an operator. This information is also available, and often easier to read, via the properties window (F4 in SSMS or right-click the node and click Properties). Each node is related to a parent node by arrows whose width is proportional to the number of rows returned by the operator. Note: The actual number of rows is used when available. Otherwise, the estimated number of rows is used. Sometimes, these are vastly different and these differences often indicate that statistics are not up-to-date or sufficiently comprehensive.

Large operators may run on parallel threads because the single-threaded version would run too long. For parallel queries that involve multiple threads/CPUs the node properties will display information about the operating system threads used. Parallelized nodes have graphical indicators, two arrows on a beige circle, to inform the viewer that the operator will be divided into multiple pieces for execution as shown in Figure 1 and Figure 2.

Blog_20171109_1

Figure 1: Query Plan Parallelism Operator

It is important to know what the most common data access operators are, as well as how to interpret them. An index seek operator indicates the retrieval of one or more rows using keys. Clustered or NonClustered indicate the type of index that is accessed, and a NonClustered Index Seek could possibly indicate the use of a covering index. Key and RID Lookups indicate a lookup of a row using the row ID or clustering key (heap or clustered index) to obtain information from the record that is not contained in the nonclustered index that was used for primary access. Key and RID Lookup operators are most often associated with a nested loop operator and a nonclustered seek operator. The Key and RID Lookup operators usually produce output, but not always. When they are present, the output column list indicates a possible opportunity to add those columns to the nonclustered index to create a covering index for this query. When the output column is missing, these operators are used to effect a join. A simple way to determine whether output columns exist is to view the Output List that is displayed for the operator popup like the list shown in Figure 3. Index Scan (Clustered or NonClustered) operators scan ALL rows of the index even if a where clause is used. Scan operators read ALL rows from a table or clustered index regardless of the where clause. All scans should be avoided a much as possible unless the tables are small. Examples of a clustered index scan operator and its properties are shown in Figure 2 and Figure 3.

 

Blog_20171109_2

Figure 2: Clustered Index Scan Operator Graphic

 

Blog_20171109_3

Figure 3: Clustered Index Scan Operator Properties

 

Other commonly used operators include nested loop, bitmap, and hash match. Nested Loops perform inner join, left outer join, left semi join, and left anti semi join logical operations. These operators are potentially very deceptive because although each iteration of a loop might be very efficient, there may be millions of iterations. Bitmap operators may speed up query execution by eliminating rows with key values that cannot produce any join records before passing rows through another operator such as a Parallelism operator. Hash Match operators build hash tables by computing a hash value for each row from its input. These are usually very bad and frequently indicate that the query and its joins should be rewritten.

The plan for the query below (originally covered in Part 2)….

Blog_20171102_4

XEvents Statement Completed Sample Output

 

…is shown in Figure 4 and the properties for the nonclustered index scan are shown in Figure 5. The highlighted portions illuminate some important aspects of the query plan. The first one highlights the missing index recommendation and illustrates the key and included columns for the suggested index that would make this query run faster. The second highlight emphasizes the hash match, and the last one illustrates why the nonclustered index scan occurred. Specifically, the EventClass column was used by the index scan operator to look up records in the table with values of 166 because no appropriate index existed on the CompletedEventText table. The output columns, RowNumber and QueryChecksum, were returned by the index scan operator. It is important to note that the missing index recommendation specifies the same columns that are shown in the Output List in Figure 5. Generally speaking, index and table scan operators do not produce missing index recommendations, which will be discussed in more detail later in this series. Note: Prior to SQL Server 2014, the author had never observed an index scan operator that produced a missing index recommendation. This appears to be a function of the new Cardinality Estimator that was implemented in SQL Server 2014 and requires that the database compatibility level be set to SQL Server 2014 or later to use it. Otherwise, the old estimator that goes back to SQL Server 7.0 is used. Further information regarding the cardinality estimator can be found at https://blogs.technet.microsoft.com/dataplatform/2017/03/22/sql-server-2016-new-features-to-deal-with-the-new-ce/.

Blog_20171109_5

Figure 4: Query Plan with Hash Match and Index Scan Operators

 

 

Blog_20171109_6

Figure 5: Index Scan Operator Properties from Query Plan with Hash Match and Index Scan

 

The next article in this blog series will cover the query optimizer.   Until then….happy tuning and please contact us with any questions.

For more information about blog posts, concepts and definitions, further explanations, or questions you may have…please contact us at SQLRx@sqlrx.com. We will be happy to help! Leave a comment and feel free to track back to us. Visit us at www.sqlrx.com!

Rx for Demystifying Index Tuning Decisions – Part 2

— by Jeffry Schwartz

Review

In Part 1 of this blog series the author discussed various aspects of SQL Server indices such as:

  • Types of indices and their variations, e.g., clustered, nonclustered, filtered, and covering
  • Physical versus logical index access methods, e.g., seeks, scans, key/RID lookups, singleton lookups, and range scans
  • Why index tuning should begin with queries
  • Determining an Appropriate Strategy for Index Tuning

https://blog.sqlrx.com/2017/10/26/rx-for-demystifying-index-tuning-decisions-part-1/

Part 2 covers the following topics:

  • Determining Queries that Need Indices the Most
  • Capturing query performance information using Extended Events

Determining Queries that Need Indices the Most

It is simple in a test or development environment using a significant subset of production data to obtain an actual (and useful) query plan of a troublesome query. Business transactions often require many queries to complete so determining which ones are the most troublesome can be difficult. The author has observed countless customers who were certain they knew exactly where the performance problems were only to find out the problems were elsewhere. Therefore, collecting the appropriate performance data on a production server is critical in determining the queries that need tuning most. Examples of query performance criteria are listed below:

  • Individually or collectively run too long
  • Individually perform too much work, e.g., reads or CPU
  • Collectively perform too much work, e.g., 200,000 executions that each execute 50,000 reads -> 10 BILLION reads in total

Table 1 shows examples of queries that potentially need tuning based upon the number of executions, total reads, total duration, total CPU time, and average reads per execution. This kind of report immediately focuses attention on the queries that might benefit the most from either index or query tuning. The five queries highlighted in Table 1 underscore these criteria. The ones highlighted in yellow were the worst offenders because their executions collectively performed the most reads with the worst one totaling 3.5 BILLION reads. The ones highlighted in light green and orange accounted for the most CPU time as well as the longest total duration. The one highlighted in slate ran the most times, and the ones highlighted in gray performed the most reads per execution. This information is vital when determining where query and index tuning should be focused. Table 2 provides additional information in that it shows whether the large numbers of reads were consistent regardless of the specified input parameters or if they were parameter-dependent. For example, the 80th percentile of the third query was 453,515, which means that 80 percent of the queries performed 453,515 reads or fewer. The 100th percentile represents the maximum of 1.724 billion reads and tells the analyst that 20 percent of the queries performed at least 453,515 reads. This data informs the analyst that the specific parameters of the execution that performed 1.724 billion reads should be used for tuning. The read behavior of the last two queries, highlighted in light green and light orange, indicates that the choice of parameters does not make any notable difference in execution behavior. The query highlighted in light blue exhibits similar performance. This kind of information is invaluable because the author has observed too many tuning efforts on development or test machines that could not reproduce the performance of the query in the production environment. Sometimes, this results from a dramatic difference in the sizes of the respective databases, but other times it results from an inappropriate choice of parameters.

Blog_20171102_1

Table 1: Summary Query Metrics Example

 

Blog_20171102_2

Table 2: Query Reads Distribution Example

 

Extended Events (XEvents)

Extended Events are very useful for capturing comprehensive workload performance information because they are lightweight and can be tailored to capture only the most important information. They were introduced in SQL Server 2008, but didn’t mature until SQL Server 2012. A full treatment of how to capture important query performance data using XEvents is beyond scope of this blog series. However, further information can be viewed at https://blog.sqlrx.com/2016/01/28/sql-server-20122014-extended-events-for-developers-part-1. XEvents use the concept of a session that must be configured, started, and stopped to capture data. This can be accomplished using SSMS or T-SQL. The basic steps involved with capturing and reporting XEvents data are listed below:

  1. Create Event Session specifying types of events to capture based upon their duration, CPU time consumption, database ID, etc.
  2. Start session
  3. Stop session
  4. Either use SSMS or T-SQL XML query to extract the captured data (SSMS easiest way to get started)

An example of some T-SQL session specification code is shown in Figure 1. Important note: The Duration filter is in MICROSECONDS. Use the following T-SQL command to start the session: ALTER EVENT SESSION WorstQueries ON SERVER STATE = START and this command to stop the session: ALTER EVENT SESSION WorstQueries ON SERVER STATE = STOP. Once the session has been stopped, open WorstQueries.xel using SSMS or use T-SQL to report or load the data into a database table. Note: XEL files can be viewed in real-time, if necessary. Figure 2 displays a completed statement sample viewed via SSMS.

Blog_20171102_3

Figure 1: Code that Creates WorstQueries Session

 

Blog_20171102_4

Figure 2: XEvents Statement Completed Sample Output

 

The next article in this blog series will cover query/execution plans.   The fun stuff is coming up!!  Until then….happy tuning and please let us know if you have any questions.

For more information about blog posts, concepts and definitions, further explanations, or questions you may have…please contact us at SQLRx@sqlrx.com. We will be happy to help! Leave a comment and feel free to track back to us. Visit us at www.sqlrx.com!