
Rx for Demystifying Index Tuning Decisions – Part 8

— by Jeffry Schwartz

Review

Due to the holiday week, this part will be shorter than usual. In Parts 1 through 7 of this blog series the author discussed various aspects of SQL Server indices such as:

  • Types of indices and their variations, e.g., clustered, nonclustered, filtered, and covering
  • Physical versus logical index access methods, e.g., seeks, scans, key/RID lookups, singleton lookups, and range scans
  • Why index tuning should begin with queries
  • Determining an Appropriate Strategy for Index Tuning
  • Determining Queries that Need Indices the Most
  • Capturing query performance information using Extended Events
  • Using query plans to determine relationships between queries and indices
  • Query Optimizer
  • Statistics and how they affect index usage and query performance
  • Consequences of too many indices
  • Overlapping Indices
  • Overview of Dynamic Management Functions and a detailed discussion of the sys.dm_db_index_usage_stats DMV
  • Detailed discussion of the sys.dm_db_index_operational_stats DMF (and how to use it)

 

https://blog.sqlrx.com/2017/10/26/rx-for-demystifying-index-tuning-decisions-part-1/

https://blog.sqlrx.com/2017/11/02/rx-for-demystifying-index-tuning-decisions-part-2/

https://blog.sqlrx.com/2017/11/09/rx-for-demystifying-index-tuning-decisions-part-3/

https://blog.sqlrx.com/2017/11/16/rx-for-demystifying-index-tuning-decisions-part-4/

https://blog.sqlrx.com/2017/11/30/rx-for-demystifying-index-tuning-decisions-part-5/

https://blog.sqlrx.com/2017/11/30/rx-for-demystifying-index-tuning-decisions-part-6/

https://blog.sqlrx.com/2017/12/07/rx-for-demystifying-index-tuning-decisions-part-7/

 

Part 8 discusses how to determine missing indices and their estimated impact (if implemented), as well as how to incorporate missing index recommendations either into existing indices or into as few new indices as possible.

Missing Index Information

SQL Server monitors missing index warnings (the same ones shown in query plans) and, as queries execute, records estimates of the relative cost of not having the recommended index. I/O plays a dominant role in these statistics because the expected I/O amounts affect the estimated cost and improvement factors. Execution counts also affect these numbers. The impact shown in Figure 9 is quite high and indicates that the query could make good use of the recommended index if it were implemented. In this example, the key lookup accounts for all the work performed by the query and is no doubt the reason for the missing index recommendation.

Blog_20171214_1

Missing Index DM Views

Four DMVs/DMFs are used to derive the missing index statistics and recommendations:

  • sys.dm_db_missing_index_columns
  • sys.dm_db_missing_index_details
  • sys.dm_db_missing_index_group_stats
  • sys.dm_db_missing_index_groups

These four DMV/DMFs are dependent upon the missing index warnings that would appear in SSMS query plans. Although Table 13 does not show the actual index recommendations, it does show the numerics involved. Improvement Measure and User Impact are frequently good indicators of potentially useful indices. The number of query executions, referred to as user seeks in the DMV, is also a good indicator of an index that would benefit multiple query executions. This metric can also be useful for locating specific queries that might need this index because the counts can sometimes limit the search candidates. The reader should note that the names of some tables are repeated multiple times, indicating that multiple index recommendations were suggested by SQL Server.
Blog_20171214_2
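The numbers shown in Table 13 can be reproduced directly from these objects. The query below is a minimal sketch rather than the author's exact report: it joins three of the four objects (sys.dm_db_missing_index_columns is a DMF keyed by index_handle and can be added for per-column detail), and the ImprovementMeasure expression is a commonly used heuristic, not an official SQL Server metric.

-- Sketch: rank outstanding missing index recommendations by a derived improvement measure
SELECT db_name(mid.database_id) AS DatabaseName,
       object_name(mid.object_id, mid.database_id) AS TableName,
       migs.user_seeks,
       migs.user_scans,
       migs.avg_total_user_cost,
       migs.avg_user_impact,
       migs.avg_total_user_cost * (migs.avg_user_impact / 100.0)
           * (migs.user_seeks + migs.user_scans) AS ImprovementMeasure,
       mid.equality_columns,
       mid.inequality_columns,
       mid.included_columns
FROM sys.dm_db_missing_index_group_stats AS migs
JOIN sys.dm_db_missing_index_groups AS mig ON mig.index_group_handle = migs.group_handle
JOIN sys.dm_db_missing_index_details AS mid ON mid.index_handle = mig.index_handle
ORDER BY ImprovementMeasure DESC;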

Table 14 summarizes the recommendations by table and shows another way this data can be viewed. The numbers of recommended indices for the jkl_abc.dbo.dly_st_mtc_smy, jkl_abc.dbo.hr_st_mtc_smy, and jkl_def.dbo.nglentUS2017 tables are instructive: 28, 52, and 51, respectively. Clearly, these cannot all be implemented without creating significant amounts of overhead and repeating the excessive indexing problems cited earlier in this blog series. As indicated by the Total Improvement Measure and the Recommended Index Count, the jkl_abc.dbo.dly_st_mtc_smy table needs a great deal of index tuning help. The jkl_def.dbo.KDS_spofsvc table's indexing issues should be simpler to resolve because SQL Server only recommended three indices.

Blog_20171214_3

It is also important to realize that these recommendations are driven by the queries, so SQL Server will often provide many similar recommendations because it does not compare existing indices with recommended ones. The recommendations often make extensive use of included columns, sometimes to the extent that almost all of the columns in the table are recommended for inclusion. SQL Server treats every recommended index as if it were new, even if an existing index could be modified slightly to accommodate the recommendation. The duplication issue is shown clearly in Table 15. All of these recommendations are for the same table, and only the recommendations whose first key is CID are shown. Most of these recommendations are not worth implementing given the low improvement measure, but the 274,170 one might be worth considering. The table also shows that 35 indices already exist on this table, so adding more indices is not advisable unless absolutely necessary. Even then, every opportunity to drop an existing index to make room for a new one should be exercised. Using a cross-tabular format like the one below makes comparing the various recommendations much easier. In addition, the reader should note the operator in parentheses (= or <). The equality ones make things much easier because as long as both CID and CO are present, the order is not as critical. Therefore, an existing index that already had CO as the first key might be useful if the CID key were added. Using the color-coded cross-tab format also makes it much easier to identify the included columns that are common to or different from those of the other missing index recommendations.

Blog_20171214_4

Two other important points should be made regarding these recommendations and the DMVs. The cited DMVs also provide the data for the Database Engine Tuning Advisor, although Microsoft says the tool is more comprehensive and “much better.” It is often tempting to assume that any poorly performing query will generate some sort of warning, especially those that perform full scans. As cited previously, versions of SQL Server prior to SQL Server 2014 never seemed to generate any missing index recommendations for queries that performed full table or index scans. Under certain conditions, the new optimizer seems to generate missing index recommendations for some full scans. However, if the query makes extensive use of numerous nested loops, SQL Server will often still not produce recommendations. For further information regarding the missing index DMVs, please consult the following links:

http://blogs.msdn.com/b/bartd/archive/2007/07/19/are-you-using-sql-s-missing-index-dmvs.aspx and http://www.mssqltips.com/tip.asp?tip=1634.

 

Database Engine Tuning Advisor (DTA)

This tool replays a single, previously captured workload and enables the analyst to determine how this workload might perform on a new server, with a new version of SQL Server, or using a revised index structure. Unfortunately, the analyst does not have a great deal of control over how the workload is replayed; quite often it uses a single T-SQL script or a trace table replay. The analyst can choose to implement the recommendations manually or automatically. Since it uses the missing index DMVs, it often implements many of the indices with the highest factors without regard to existing indices, i.e., no consideration is given to whether functionality would be duplicated or whether any existing index could be adjusted to satisfy the query’s need. This often results in significant duplication of indices, i.e., the overlapping indices that were discussed earlier. Since it often favors covering indices, it frequently implements indices that duplicate large portions of the table in order to make the index a covering one. The new indices use the _DTA prefix to distinguish them from user-generated indices.

Alternatives to Database Engine Tuning Advisor

Several methods exist for replaying a workload. One method is to replay the workload using Profiler and reload the database after each test. Other methods must be used when the databases being tested are huge, therefore requiring many hours to reload. One method is to use a read-only database, but this method does not work as well as it once did. On earlier releases of SQL Server, a query that changed the database would execute up to the point at which the database change was attempted. However, attempting this method on SQL Server 2016 does not work because the query fails immediately. Another method is to run the inquiry-only queries in parallel while controlling thread levels, and run the update queries sequentially utilizing a begin tran/rollback tran for each query that changes the database. This causes the updates to take longer, not only because the queries are run sequentially, but also because work is required to roll the transaction back. Fortunately, the work performed by the query is recorded separately from the rollback portion, so before-and-after query performance comparison is still easy to do. A variation of the last method is to execute only the queries that work with specific tables, which allows the testing to be more surgical.
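The begin tran/rollback tran technique for the update queries can be sketched as follows. The table and the UPDATE statement are hypothetical placeholders; the point is that the statement's own work is measured separately, and the rollback returns the data to its original state for the next test.

-- Sketch: replay one database-changing query without persisting its effects
SET STATISTICS TIME, IO ON;    -- capture the statement's own work separately from the rollback

BEGIN TRANSACTION;

UPDATE dbo.SomeTable           -- hypothetical table and predicate
SET    SomeColumn = SomeColumn + 1
WHERE  SomeDate >= '2017-01-01';

ROLLBACK TRANSACTION;          -- undo the change so each test starts from the same data

SET STATISTICS TIME, IO OFF;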

Conclusion

Index tuning can be difficult, but it definitely can be accomplished. It is essential to understand how queries and indices interact as well as how to read and understand Query Plans, Index Usage and Operational Stats metrics, and Missing Index recommendations and DMVs. Capturing performance data from a SQL Server instance that has been running for as long as possible is critical to ensure that index usage is fully understood. The longer the instance has been running, the more comprehensive these metrics will be. Using some form of SQL trace replay is the ONLY way to perform comprehensive index tuning because it is surprising how many times index usage will be counterintuitive. Therefore, the analyst MUST see it in action! Significant index redesign can often be completed with 2-5 weeks’ worth of effort, and the results can be quite dramatic as shown in this series.

This concludes the Rx for Demystifying Index Tuning Decisions blog series. Hopefully, the reader has found the series enlightening and useful. Please let us know if you have any questions or suggestions for additional articles. Best of luck with all your tuning efforts and please contact us with any questions.

For more information about blog posts, concepts and definitions, further explanations, or questions you may have…please contact us at SQLRx@sqlrx.com. We will be happy to help! Leave a comment and feel free to track back to us. Visit us at www.sqlrx.com!

 

Rx for Demystifying Index Tuning Decisions – Part 7

— by Jeffry Schwartz

Review

In Parts 1 through 6 of this blog series the author discussed various aspects of SQL Server indices such as:

  • Types of indices and their variations, e.g., clustered, nonclustered, filtered, and covering
  • Physical versus logical index access methods, e.g., seeks, scans, key/RID lookups, singleton lookups, and range scans
  • Why index tuning should begin with queries
  • Determining an Appropriate Strategy for Index Tuning
  • Determining Queries that Need Indices the Most
  • Capturing query performance information using Extended Events
  • Using query plans to determine relationships between queries and indices
  • Query Optimizer
  • Statistics and how they affect index usage and query performance
  • Consequences of too many indices
  • Overlapping Indices
  • Overview of Dynamic Management Functions and a detailed discussion of the sys.dm_db_index_usage_stats DMV

https://blog.sqlrx.com/2017/10/26/rx-for-demystifying-index-tuning-decisions-part-1/

https://blog.sqlrx.com/2017/11/02/rx-for-demystifying-index-tuning-decisions-part-2/

https://blog.sqlrx.com/2017/11/09/rx-for-demystifying-index-tuning-decisions-part-3/

https://blog.sqlrx.com/2017/11/16/rx-for-demystifying-index-tuning-decisions-part-4/

https://blog.sqlrx.com/2017/11/30/rx-for-demystifying-index-tuning-decisions-part-5/

https://blog.sqlrx.com/2017/11/30/rx-for-demystifying-index-tuning-decisions-part-6/

 

Part 7 contains a detailed discussion of the sys.dm_db_index_operational_stats DMF (and how to use it).

sys.dm_db_index_operational_stats DMF

The sys.dm_db_index_operational_stats DMF requires four parameters and returns one row per index. It provides additional information regarding how the seeks and scans from the sys.dm_db_index_usage_stats DMV are actually implemented within SQL Server. For example, this DMF records how many range and table scans or single record retrievals were performed against an index or heap. Note: although heaps are not indices, their activity is recorded by this DMF using an index ID of zero. Other important information regarding leaf and non-leaf level page and row lock and latch wait times is also provided. Finally, page split information for both the leaf and non-leaf levels is provided via this DMF as well.

Four parameters are required:

{ database_id | NULL | 0 | DEFAULT } (use db_id() for current db)

{ object_id | NULL | 0 | DEFAULT }

{ index_id | NULL | -1 | DEFAULT }

{ partition_number | NULL | 0 | DEFAULT }

 

Use NULL parameters to obtain information for all available entities for a given level, e.g., databases or tables. The following command will return information for all databases, tables, indices, and partitions that are attached to a particular SQL Server instance: select * from sys.dm_db_index_operational_stats (NULL, NULL, NULL, NULL).

Combining the outputs of sys.dm_db_index_usage_stats and sys.dm_db_index_operational_stats enables the analyst to determine whether the majority of usage seeks actually resulted in index range scans or single record lookups. This combination can also be used to identify the approximate percentage of heap or clustered single record accesses that resulted from RID/Key lookups because these are usually performed in loops of single record accesses. Another excellent source of information involves row, page, page latch, and page I/O latch wait times because these can point to specific tables and indices (and indirectly, keys) that are involved in application delays. Finally, invaluable information involving page splits is also available via this DMF. Note: all of this information is available for both leaf and non-leaf levels. The lists below are provided to aid the reader in addressing specific performance problems.

To analyze common table or index partition access patterns, use:

  • leaf_insert_count
  • leaf_delete_count
  • leaf_update_count
  • leaf_ghost_count
  • range_scan_count
  • singleton_lookup_count

To identify overhead & contention caused by latching and locking

  • row_lock_count and page_lock_count

These values indicate how many times the Database Engine tried to acquire row and page locks (overhead)

  • row_lock_wait_in_ms and page_lock_wait_in_ms

These values indicate whether lock contention exists on the index or heap, and the significance of that contention

  • page_latch_wait_count and page_latch_wait_in_ms

These values indicate whether latch contention exists on the index or heap, and the significance of that contention

To analyze statistics of physical I/Os on an index or heap partition

  • page_io_latch_wait_count and page_io_latch_wait_in_ms

These values indicate how many physical I/Os were issued to bring index or heap pages into memory and how much waiting was involved
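The counters above can be combined into a single report by joining the DMF to sys.indexes for readable names. The query below is a minimal sketch for the current database; the column selection is illustrative and can be extended with the non-leaf counterparts or the wait counts as needed.

-- Sketch: access patterns plus lock/latch waits per index for the current database
SELECT object_name(ios.object_id) AS TableName,
       i.name AS IndexName,
       ios.range_scan_count,
       ios.singleton_lookup_count,
       ios.leaf_insert_count,
       ios.leaf_update_count,
       ios.leaf_delete_count,
       ios.row_lock_wait_in_ms,
       ios.page_lock_wait_in_ms,
       ios.page_latch_wait_in_ms,
       ios.page_io_latch_wait_in_ms
FROM sys.dm_db_index_operational_stats(db_id(), NULL, NULL, NULL) AS ios
JOIN sys.indexes AS i ON i.object_id = ios.object_id AND i.index_id = ios.index_id
ORDER BY ios.range_scan_count + ios.singleton_lookup_count DESC;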

Table 12 demonstrates the kind of report that is available using this data. The highlighted entries in this example show that the ix_ci_RowNumEvent index is used almost exclusively for single record lookups, whereas the ix_RowNumEvent index is only range scanned. ix_CETRowNum and ix_Checksum are used similarly. This kind of report can be extended easily to include various wait types as well.

Blog_20171207_1

Table 12: Index Operational Stats Summary

 

The next and final article in this blog series will cover determining missing indices and their estimated impact if implemented, as well as incorporating missing index recommendations either into existing indices or into as few new indices as possible. Until then…good luck with your tuning efforts and please contact us with any questions.

For more information about blog posts, concepts and definitions, further explanations, or questions you may have…please contact us at SQLRx@sqlrx.com. We will be happy to help! Leave a comment and feel free to track back to us. Visit us at www.sqlrx.com!

Rx for Demystifying Index Tuning Decisions – Part 6

— by Jeffry Schwartz

Review

In Parts 1 through 5 of this blog series the author discussed various aspects of SQL Server indices such as:

  • Types of indices and their variations, e.g., clustered, nonclustered, filtered, and covering
  • Physical versus logical index access methods, e.g., seeks, scans, key/RID lookups, singleton lookups, and range scans
  • Why index tuning should begin with queries
  • Determining an Appropriate Strategy for Index Tuning
  • Determining Queries that Need Indices the Most
  • Capturing query performance information using Extended Events
  • Using query plans to determine relationships between queries and indices
  • Query Optimizer
  • Statistics and how they affect index usage and query performance
  • Consequences of too many indices
  • Overlapping Indices

 

Part 6 contains an overview of dynamic management functions as well as a detailed discussion of the sys.dm_db_index_usage_stats DMV (and how to use it).

 

Dynamic Management Views and Functions

Dynamic Management Views and Functions are “Designed to give you a window into what’s going on inside SQL Server.” There are two types: DMVs, which are pure views that require no parameters, and DMFs, which are table-valued functions that require parameters. The parameters usually specify database, table, index, partition, etc. They provide a significant amount of information regarding the system, databases, performance, and internal workings of SQL Server. Most DMVs and DMFs are simple to use, but one must reconcile the numeric IDs with static views that contain textual names.

Most values are accumulated from the last SQL Server instance restart. To determine interval-specific values, one must calculate differences between individual sample records. However, one must be certain to difference records with the same database ID, object ID, index ID, and possibly, partition ID. This data is perfect for periodic or intermittent sampling because no data is lost during the sampling process. Note: sys.dm_db_index_operational_stats may be the exception if a table is used intermittently because when tables haven’t been used for a period of time, they are removed from the cache and their metrics are cleared. Capture rates can range from every 30 seconds to a few times per day; the need for granular analysis normally dictates the collection frequency.

Although several methods for collecting data exist, two general usage scenarios are instructive.

Method 1

  1. Capture a snapshot before monitored activities begin and store the results in a SQL table or a spreadsheet
  2. Execute the workload (whether natural production or test)
  3. Capture a snapshot again and compare value differences
  4. Load into spreadsheets for further analysis, if necessary

Method 2

  1. Capture a snapshot every <n> minutes and store results in a flat file
  2. After the workload and capture processes are complete, load data into SQL tables
  3. Use SQL Server to compare incremental value differences and store results
  4. Extract interval data to spreadsheets for further analysis
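Step 3 of Method 2 (computing incremental differences) can be sketched with a windowed difference. The snapshot table name and its columns are hypothetical placeholders; a SampleTime column is assumed to have been added when each snapshot was captured. Because the counters reset when the instance restarts, a negative delta should be treated as the start of a new baseline rather than real activity.

-- Sketch: per-interval deltas between successive snapshots of index usage counters
SELECT database_id,
       object_id,
       index_id,
       SampleTime,
       user_seeks - LAG(user_seeks) OVER (PARTITION BY database_id, object_id, index_id
                                          ORDER BY SampleTime) AS seek_delta,
       user_scans - LAG(user_scans) OVER (PARTITION BY database_id, object_id, index_id
                                          ORDER BY SampleTime) AS scan_delta
FROM dbo.IndexUsageSnapshots      -- hypothetical table loaded from the flat-file captures
ORDER BY database_id, object_id, index_id, SampleTime;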

Several general and static views are required for converting numeric DMV and DMF identifiers into understandable text:

  • sys.databases – Lists all databases and their IDs so proper associations can be made
  • sys.partitions – The only way to decode HOBT (Heap or Binary Tree) IDs returned by lock-specific information, e.g., blocked process records and sys.dm_os_waiting_tasks
  • sys.configurations – Provides information regarding OS and SQL Server configurations

Several database-specific views are needed to convert various database-specific IDs into understandable text. Each of these must be interrogated for each database separately.

  • sys.objects – Lists all database objects such as tables, views, stored procedures, etc.
  • sys.indexes – Lists all indices and their associated table IDs (does not provide row counts as sysindexes does)
  • sys.filegroups – Lists all file groups and their IDs
  • sys.database_files – Lists all physical database files and their IDs
  • sys.schemas – Lists all database schemas

Index Related Dynamic Management Views and Functions

Two DMV/DMFs are used to obtain index-related performance information, and BOTH are needed to obtain an accurate and comprehensive perspective regarding index usage. The differences between physical and logical index access metrics were discussed earlier in this series. sys.dm_db_index_usage_stats returns information regarding the query code usage of tables and indices, e.g., inserts, updates, deletes, random accesses, and sequential accesses. These metrics most closely match Query Plan operators, and the information is retained much longer than operational (physical) stats, so it is quite useful for long-term evaluation of index usage, i.e., whether indices are used a great deal or not at all. sys.dm_db_index_operational_stats “returns current low-level I/O, locking, latching, and access method activity for each partition of a table or index in the database.” These metrics track actual index and table operations, e.g., singleton lookups and range scans. Information for certain structures may be deallocated when a table is no longer in the cache, so this DMF is not a reliable source of long-term index usage information.

sys.dm_db_index_usage_stats DMV

The sys.dm_db_index_usage_stats DMV requires no input parameters and returns one row per index, showing the number of seeks, scans, lookups, and updates for user and system queries. As discussed earlier in this series, seeks are random or pseudo-random (a single record or a range of records as long as a key is used). Scans are fully sequential. The time of the last seek, scan, lookup, and update for user queries is also provided by this function. The metrics are divided into two categories, system and user. The System category is comprised of work generated by maintenance activities, e.g., statistics updates. The User category is comprised of insert, update, delete, and select operations. The metrics report user statements, not record counts. Therefore, one insert statement can result in millions of rows being inserted, so the metric will show one, not millions.

This DMV helps determine index and table usage patterns and is particularly useful for identifying indices that are seldom, if ever, used by queries, especially if the SQL Server instance has been running for a very long time, e.g., months. As cited previously, these metrics are much more trustworthy for long-term evaluations than operational stats because of possible cache removal issues. Table 8 illustrates the type of information that is available from this function. Each of the highlighted indices was usually accessed sequentially, i.e., a full scan. Since one of the tables contained 71 million rows and the other contained almost 34 million rows, these statistics clearly demonstrate tuning opportunities. Reminder: random accesses can be either single record lookups or a filtered range scan. Table 9 shows data from an actual customer’s SQL Server instance that had been up for 271 days, i.e., approximately nine months. This view highlights the indices that are fully scanned and those that are hardly accessed. The highlighted row in Table 9 shows an index that is updated, but never used. It is easy to become distracted by all the update activity and lose sight of the fact that the index had not been used for inquiry purposes in 271 days.

Blog_20171130_1b

Table 8: Index Usage Stats Summary

 

Blog_20171130_2b

Table 9: Index Usage To-Date Summary (271 Days)

 

The following code uses sys.dm_db_index_usage_stats to list rarely-used indices for a specific database with resolved names; its output appears in Table 10.

-- List rarely-used indices for the current database, least-used first
declare @dbid int = db_id()

select objectname = object_name(inxusage.object_id),
       inxusage.object_id,
       indexname = sysinx.name,
       sysinx.index_id,
       user_seeks, user_scans, user_lookups, user_updates
from sys.dm_db_index_usage_stats inxusage,
     sys.indexes sysinx
where database_id = @dbid
  and objectproperty(inxusage.object_id, 'IsUserTable') = 1
  and sysinx.object_id = inxusage.object_id
  and sysinx.index_id = inxusage.index_id
order by (user_seeks + user_scans + user_lookups + user_updates) asc

Blog_20171130_3b

Table 10: Example – Unused Tbl1 Indices

 

Table 11 illustrates the kind of overall summary report that this view can generate when combined with static index information and the missing index DMVs that will be discussed later. This table shows clearly which SQL Server tables have several indices, how many are duplicated or unused, and how many others were suggested by SQL Server. The row counts are often invaluable for this kind of overall analysis, which can direct an index tuning study.

Blog_20171130_4b

Table 11: Index/Table Overall Summary

 

The next article in this blog series will cover the sys.dm_db_index_operational_stats DMF in detail.   Until then…good luck with your tuning efforts and please contact us with any questions.

For more information about blog posts, concepts and definitions, further explanations, or questions you may have…please contact us at SQLRx@sqlrx.com. We will be happy to help! Leave a comment and feel free to track back to us. Visit us at www.sqlrx.com!

Rx for Demystifying Index Tuning Decisions – Part 5

— by Jeffry Schwartz

Review

Due to the holiday week, this part will be shorter than usual. In Parts 1 through 4 of this blog series the author discussed various aspects of SQL Server indices such as:

  • Types of indices and their variations, e.g., clustered, nonclustered, filtered, and covering
  • Physical versus logical index access methods, e.g., seeks, scans, key/RID lookups, singleton lookups, and range scans
  • Why index tuning should begin with queries
  • Determining an Appropriate Strategy for Index Tuning
  • Determining Queries that Need Indices the Most
  • Capturing query performance information using Extended Events
  • Using query plans to determine relationships between queries and indices
  • Query Optimizer
  • Statistics and how they affect index usage and query performance
  • Consequences of too many indices

 

Part 5 covers the following topic:

  • Overlapping Indices

Overlapping Indices

Overlapping indices duplicate the functionality of one another in some way. The easiest way to understand this concept is to see an actual example as shown in Table 6. The simplest example is Inx4, which completely duplicates Inx3. Since the table contained almost 17 million records, this is a very undesirable situation. Table 7 highlights the fact that SQL Server only used Inx3 ONCE, so this provides even more incentive to eliminate Inx3. This data also indicates that Inx1 can be removed because it was never used. Note: the usage data was obtained using two different index DMVs/DMFs, which will be discussed later.

Blog_20171130_1a

Table 6: Overlapping Index Examples

 

Blog_20171130_2a

Table 7: Overlapping Index Usage

 

Examining the index layouts further we can see clearly that Inx1 through Inx4 could be consolidated into one index that would handle all usage. The composite index is shown below. It may be possible to exclude CurrCode and ECID from the composite index, but more research would need to be conducted before that choice could be committed to safely. Specifically, since the number of User Lookups was quite high on the table, some queries do not have all the columns they require and two of these may be CurrCode and ECID. Query plans are invaluable in answering these kinds of questions.

Blog_20171130_3a

At first glance, it appears that Inx7 and Inx8 might be combined as well. However, since the secondary keys are different and both are used for range scanning only (see Table 7), there is no certain way to combine these and also handle the queries that use them properly. Therefore, it is best to leave these alone.

The information above was determined using the following system views: sys.all_columns, sys.databases, sys.dm_db_partition_stats, sys.index_columns, sys.indexes, and sys.tables.
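A report like Table 6 can be assembled from those views by listing each index’s key and included columns side by side so overlaps stand out. The query below is only a sketch of that idea, not the author’s exact report; it uses the FOR XML PATH technique to concatenate column names and restricts itself to the first partition for the row counts.

-- Sketch: key and included columns per index, with approximate row counts, to spot overlaps
SELECT t.name AS TableName,
       i.name AS IndexName,
       i.type_desc AS IndexType,
       ps.row_count AS ApproxRows,
       STUFF((SELECT ', ' + c.name
              FROM sys.index_columns ic
              JOIN sys.all_columns c ON c.object_id = ic.object_id AND c.column_id = ic.column_id
              WHERE ic.object_id = i.object_id AND ic.index_id = i.index_id
                    AND ic.is_included_column = 0
              ORDER BY ic.key_ordinal
              FOR XML PATH('')), 1, 2, '') AS KeyColumns,
       STUFF((SELECT ', ' + c.name
              FROM sys.index_columns ic
              JOIN sys.all_columns c ON c.object_id = ic.object_id AND c.column_id = ic.column_id
              WHERE ic.object_id = i.object_id AND ic.index_id = i.index_id
                    AND ic.is_included_column = 1
              FOR XML PATH('')), 1, 2, '') AS IncludedColumns
FROM sys.indexes i
JOIN sys.tables t ON t.object_id = i.object_id
JOIN sys.dm_db_partition_stats ps ON ps.object_id = i.object_id
     AND ps.index_id = i.index_id AND ps.partition_number = 1
ORDER BY t.name, KeyColumns;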

The next article in this blog series will cover dynamic management views.   Until then….happy tuning and please contact us with any questions.

For more information about blog posts, concepts and definitions, further explanations, or questions you may have…please contact us at SQLRx@sqlrx.com. We will be happy to help! Leave a comment and feel free to track back to us. Visit us at www.sqlrx.com!

Rx for Demystifying Index Tuning Decisions – Part 4

— by Jeffry Schwartz

Review

In Parts 1 through 3 of this blog series the author discussed various aspects of SQL Server indices such as:

  • Types of indices and their variations, e.g., clustered, nonclustered, filtered, and covering
  • Physical versus logical index access methods, e.g., seeks, scans, key/RID lookups, singleton lookups, and range scans
  • Why index tuning should begin with queries
  • Determining an Appropriate Strategy for Index Tuning
  • Determining Queries that Need Indices the Most
  • Capturing query performance information using Extended Events
  • Using query plans to determine relationships between queries and indices

https://blog.sqlrx.com/2017/10/26/rx-for-demystifying-index-tuning-decisions-part-1/

https://blog.sqlrx.com/2017/11/02/rx-for-demystifying-index-tuning-decisions-part-2/

https://blog.sqlrx.com/2017/11/09/rx-for-demystifying-index-tuning-decisions-part-3/

Part 4 covers the following topics:

  • Query Optimizer
  • Statistics and how they affect index usage and query performance
  • Consequences of too many indices

Query Optimizer

Overview

The query optimizer determines the appropriate execution plan for a query. “The SQL Server Query Optimizer is a cost-based optimizer.   It analyzes a number of candidate execution plans for a given query, estimates the cost of each of these plans and selects the plan with the lowest cost of the choices considered.   Indeed, given that the Query Optimizer cannot consider every possible plan for every query, it actually has to do a cost-based balancing act, considering both the cost of finding potential plans and the costs of plans themselves.”

“Cost-assessment of each plan – While the Query Optimizer DOES NOT GENERATE EVERY POSSIBLE EXECUTION PLAN, it assesses the resource and time cost of each plan it does generate. The plan that the Query Optimizer deems to have the lowest cost of those it’s assessed is selected, and passed along to the Execution Engine.” Source: http://www.simple-talk.com/sql/sql-training/the-sql-server-query-optimizer. Therefore, the best plan is NOT GUARANTEED and too many choices yield indeterminate results.

Statistics

Statistics contain information about the distribution of values in one or more columns of a table or indexed view. The Query Optimizer uses statistics to estimate the number of rows that a particular query operator will generate. Outdated statistics can cause a query to use a suboptimal index or perform a full scan, so it is prudent to ensure that they are as up-to-date as possible. Large numbers of insertions or key modifications can cause them to be skewed and inaccurate with respect to the current data because even when statistics are updated automatically, they may lag behind the actual data. The AUTO_CREATE_STATISTICS and AUTO_UPDATE_STATISTICS options are on by default in TempDB and often on by default in many other databases. Please see the following article for a complete discussion of statistics and when SQL Server determines that they are sufficiently out-of-date to warrant updating: https://msdn.microsoft.com/en-us/library/dd535534(v=sql.100).aspx.

Out-of-date statistics can be determined with very little overhead by using the stats_date (object_id, index_id) function and the sys.indexes view. Since creating or updating statistics can be expensive depending upon table size, sample size, and frequency, several options are available to keep the overhead at a reasonable level and these are shown below.

  • FULLSCAN (most accurate and expensive to create)
  • SAMPLE <n> PERCENT or ROWS
  • RESAMPLE (uses last settings)
  • ALL or COLUMNS or INDEX
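A minimal sketch of both activities follows: the detection query uses STATS_DATE and sys.indexes as described above, and the UPDATE STATISTICS command illustrates one of the sampling options applied manually to a hypothetical table.

-- Sketch: list statistics age per index in the current database (NULL means never updated)
SELECT object_name(i.object_id) AS TableName,
       i.name AS IndexName,
       stats_date(i.object_id, i.index_id) AS StatsLastUpdated
FROM sys.indexes AS i
WHERE objectproperty(i.object_id, 'IsUserTable') = 1
ORDER BY StatsLastUpdated;     -- oldest (or never updated) first

-- Sketch: refresh statistics manually with a bounded sample; the table name is a placeholder
UPDATE STATISTICS dbo.SomeLargeTable WITH SAMPLE 25 PERCENT;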

Balancing the need for updated statistics and the costs of creating or updating them can cause customers to turn off all automatic options and then update them manually using the preceding commands. However, sometimes these good intentions can cause surprisingly out-of-date statistics as shown in Table 4 below. This customer did not intend for this situation to occur, but as the reader can see clearly from the example, the statistics of many large tables were not updated for over a year. Obviously, this is an undesirable situation, especially since many thousands of records were inserted into some of these tables daily.

Blog_20171116_1

Table 4: Extremely Outdated Statistics

 

Query Plan Consequences of Too Many Indices

Larger numbers of indices create exponentially more query plan possibilities. When too many choices exist, the Optimizer will give up partway through and just pick the best plan thus far. As more indices are added, the problem worsens and compilation times, i.e., processor times, increase to a point. This can be illustrated best by reviewing an actual customer example. In this case, one table had 144 indices attached to it and several others had between 20 and 130 indices. The queries were quite complex with as many as fifteen joins, many of which were outer joins. Query and index tuning were impossible because query performance was often counterintuitive and sometimes nonsensical. Adding an index that addressed a specific query need often made the query run worse one time and better the next. Note: Cached query plan issues, e.g., parameter sniffing or plan reuse, were not problems in this case. The only solution was to tear down the ENTIRE indexing structure and rebuild it with SQL Server’s guidance and nine days’ worth of production queries. Table 5 summarizes the results of the index restructuring project. The performance of 98 percent of the queries was comparable to or better than it was when the large numbers of indices were present.

Blog_20171116_2

Table 5: Index Optimization Project Results

 

How Did Things Get This Bad?

The original application was built in Microsoft Access and each office used its own version of the database. The company decided to consolidate all of the Access databases into a single corporate-wide SQL Server database and implemented most of the indices that had been used within Access. No formal or cohesive indexing strategy was ever implemented. Desperation combined with bad luck and extremely complex queries created a perfect environment for this kind of problem to arise. The list below summarizes the key points regarding how they got themselves into this mess.

  • 24/7 application because of 3 worldwide time zones and replication
  • Many queries were at least 7,000 characters long with 15 inner and outer joins
  • Performance tanked after more than 100 separate databases were combined into a single SQL Server database
  • Application did not scale and they could not return to separate databases
  • Ran Database Engine Tuning Advisor repeatedly, and blindly added ALL indices it recommended (more later on DTA)
  • Performance worsened, so they added more indices based upon experimentation and hope
  • Database crashed and after restoration from an older database backup (problems occurred during some of the restorations) performance worsened
  • So many indices existed that they did not know which ones still were needed, so they scripted all indices from before and implemented them with new names just to be sure all requisite indices were present in the database
  • They had no idea how to determine unused, unnecessary, or fully or partially duplicated indices

Index Insertion Operation

Every record insertion creates overhead maintenance work because all non-subset indices are updated for every insert. Page splits may be involved if a page has been filled already. Upper index levels may be affected and may split as well. A page split is defined as follows: “When a new row is added to a full index page, the Database Engine moves approximately half the rows to a new page to make room for the new row. This reorganization is known as a page split. A page split makes room for new records, but can take time to perform and is a resource intensive operation. Also, it can cause fragmentation that causes increased I/O operations.” Locks are held while all this happens, and fillfactor and insertion rate affect frequency of this activity. Further research can be conducted via http://msdn.microsoft.com/en-us/library/ms177459.aspx.

Consequences of Too Much Overhead

Increased overhead leads to lower transaction throughput, increased blocking, and increased deadlocking. These worsen if queries do not use the Nolock query hint because Page and Row locks are acquired and potentially held longer. Use of Nolock can greatly reduce these problems for selection queries; do not forget to add the Nolock hint to join clauses. Research transaction isolation levels for further information. In high volume record insertion cases, page latch wait time can become a major constraining factor.
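A minimal illustration of the join-clause point appears below; the table names are hypothetical, and the key detail is that the hint must be repeated on every table reference, not just the first one.

-- Sketch: Nolock applied to both the primary table and the joined table
SELECT o.OrderID, o.OrderDate, c.CustomerName
FROM dbo.Orders AS o WITH (NOLOCK)
JOIN dbo.Customers AS c WITH (NOLOCK)     -- easy to forget on joined tables
     ON c.CustomerID = o.CustomerID
WHERE o.OrderDate >= '2017-01-01';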

The next article in this blog series will cover how to detect indices whose functionality is duplicated by other indices.   Until then….happy tuning and please contact us with any questions.

For more information about blog posts, concepts and definitions, further explanations, or questions you may have…please contact us at SQLRx@sqlrx.com. We will be happy to help! Leave a comment and feel free to track back to us. Visit us at www.sqlrx.com!

Rx for Demystifying Index Tuning Decisions – Part 3

— by Jeffry Schwartz

Review

In Parts 1 and 2 of this blog series the author discussed various aspects of SQL Server indices such as:

  • Types of indices and their variations, e.g., clustered, nonclustered, filtered, and covering
  • Physical versus logical index access methods, e.g., seeks, scans, key/RID lookups, singleton lookups, and range scans
  • Why index tuning should begin with queries
  • Determining an Appropriate Strategy for Index Tuning
  • Determining Queries that Need Indices the Most
  • Capturing query performance information using Extended Events

https://blog.sqlrx.com/2017/10/26/rx-for-demystifying-index-tuning-decisions-part-1/

https://blog.sqlrx.com/2017/11/02/rx-for-demystifying-index-tuning-decisions-part-2/

Part 3 covers the following topic:

  • Using query plans to determine relationships between queries and indices

Query or Execution Plans

Query plans have already been mentioned as being useful for query tuning. They can be invaluable for determining which part of a query causes performance issues, which indices are used, and HOW the indices are accessed, e.g., randomly or sequentially. A full discussion of query plans is beyond the scope of this presentation, but graphical plans are used most often. Graphical plans come in two forms: Estimated (predicted behavior) and Actual (actual behavior). XML plans contain additional information, but they are difficult to read even with an XML editor. Querying the XML data can provide invaluable information, but the queries to extract this information can consume a great deal of CPU time, especially when many complex plans must be examined, so it is best to peruse this kind of data on a server that is not used for production work whenever possible.
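As an example of the kind of XML interrogation mentioned above, the following query is a sketch that searches cached plans for a missing index element. It is illustrative only and, as noted, it can be CPU-intensive on a server with many complex cached plans.

-- Sketch: find the most CPU-expensive cached plans that contain a missing index recommendation
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP (20)
       qs.total_worker_time,
       qs.execution_count,
       qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE qp.query_plan.exist('//MissingIndex') = 1
ORDER BY qs.total_worker_time DESC;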

Interpreting Graphical Query Plans

It is important for any database analyst to have a basic understanding of how to read query plans. These plans are graphical flow diagrams of how the query will be executed and what information will flow from one operator to another. The diagrams flow from bottom right to top left, and quite often the most crucial operators are near the right and sometimes near the bottom as well. Each node in the tree structure is represented by an icon and specifies the logical operator that will be used to execute that portion of the query. A popup appears displaying details about the operator when the mouse pointer is placed over an operator. This information is also available, and often easier to read, via the properties window (F4 in SSMS or right-click the node and click Properties). Each node is related to a parent node by arrows whose width is proportional to the number of rows returned by the operator. Note: The actual number of rows is used when available. Otherwise, the estimated number of rows is used. Sometimes, these are vastly different and these differences often indicate that statistics are not up-to-date or sufficiently comprehensive.

Large operators may run on parallel threads because the single-threaded version would run too long. For parallel queries that involve multiple threads/CPUs the node properties will display information about the operating system threads used. Parallelized nodes have graphical indicators, two arrows on a beige circle, to inform the viewer that the operator will be divided into multiple pieces for execution as shown in Figure 1 and Figure 2.

Blog_20171109_1

Figure 1: Query Plan Parallelism Operator

It is important to know what the most common data access operators are, as well as how to interpret them. An index seek operator indicates the retrieval of one or more rows using keys. Clustered or NonClustered indicates the type of index that is accessed, and a NonClustered Index Seek could possibly indicate the use of a covering index. Key and RID Lookups indicate a lookup of a row using the row ID or clustering key (heap or clustered index) to obtain information from the record that is not contained in the nonclustered index that was used for primary access. Key and RID Lookup operators are most often associated with a nested loop operator and a nonclustered seek operator. The Key and RID Lookup operators usually produce output, but not always. When they are present, the output column list indicates a possible opportunity to add those columns to the nonclustered index to create a covering index for this query. When no output columns are present, these operators are used to effect a join. A simple way to determine whether output columns exist is to view the Output List that is displayed in the operator popup, like the list shown in Figure 3. Index Scan (Clustered or NonClustered) and Table Scan operators read ALL rows of the index or table regardless of the where clause. All scans should be avoided as much as possible unless the tables are small. Examples of a clustered index scan operator and its properties are shown in Figure 2 and Figure 3.

 

Blog_20171109_2

Figure 2: Clustered Index Scan Operator Graphic

 

Blog_20171109_3

Figure 3: Clustered Index Scan Operator Properties

 

Other commonly used operators include nested loop, bitmap, and hash match. Nested Loops perform inner join, left outer join, left semi join, and left anti semi join logical operations. These operators are potentially very deceptive because although each iteration of a loop might be very efficient, there may be millions of iterations. Bitmap operators may speed up query execution by eliminating rows with key values that cannot produce any join records before passing rows through another operator such as a Parallelism operator. Hash Match operators build hash tables by computing a hash value for each row from its input. These are usually very bad and frequently indicate that the query and its joins should be rewritten.

The plan for the query below (originally covered in Part 2)….

Blog_20171102_4

XEvents Statement Completed Sample Output

 

…is shown in Figure 4 and the properties for the nonclustered index scan are shown in Figure 5. The highlighted portions illuminate some important aspects of the query plan. The first one highlights the missing index recommendation and illustrates the key and included columns for the suggested index that would make this query run faster. The second highlight emphasizes the hash match, and the last one illustrates why the nonclustered index scan occurred. Specifically, the EventClass column was used by the index scan operator to look up records in the table with values of 166 because no appropriate index existed on the CompletedEventText table. The output columns, RowNumber and QueryChecksum, were returned by the index scan operator. It is important to note that the missing index recommendation specifies the same columns that are shown in the Output List in Figure 5. Generally speaking, index and table scan operators do not produce missing index recommendations, which will be discussed in more detail later in this series. Note: Prior to SQL Server 2014, the author had never observed an index scan operator that produced a missing index recommendation. This appears to be a function of the new Cardinality Estimator that was implemented in SQL Server 2014 and requires that the database compatibility level be set to SQL Server 2014 or later to use it. Otherwise, the old estimator that goes back to SQL Server 7.0 is used. Further information regarding the cardinality estimator can be found at https://blogs.technet.microsoft.com/dataplatform/2017/03/22/sql-server-2016-new-features-to-deal-with-the-new-ce/.

Blog_20171109_5

Figure 4: Query Plan with Hash Match and Index Scan Operators

 

 

Blog_20171109_6

Figure 5: Index Scan Operator Properties from Query Plan with Hash Match and Index Scan

 

The next article in this blog series will cover the query optimizer.   Until then….happy tuning and please contact us with any questions.

For more information about blog posts, concepts and definitions, further explanations, or questions you may have…please contact us at SQLRx@sqlrx.com. We will be happy to help! Leave a comment and feel free to track back to us. Visit us at www.sqlrx.com!

Rx for Demystifying Index Tuning Decisions – Part 2

— by Jeffry Schwartz

Review

In Part 1 of this blog series the author discussed various aspects of SQL Server indices such as:

  • Types of indices and their variations, e.g., clustered, nonclustered, filtered, and covering
  • Physical versus logical index access methods, e.g., seeks, scans, key/RID lookups, singleton lookups, and range scans
  • Why index tuning should begin with queries
  • Determining an Appropriate Strategy for Index Tuning

https://blog.sqlrx.com/2017/10/26/rx-for-demystifying-index-tuning-decisions-part-1/

Part 2 covers the following topics:

  • Determining Queries that Need Indices the Most
  • Capturing query performance information using Extended Events

Determining Queries that Need Indices the Most

In a test or development environment that uses a significant subset of production data, it is simple to obtain an actual (and useful) query plan for a troublesome query. However, business transactions often require many queries to complete, so determining which ones are the most troublesome can be difficult. The author has observed countless customers who were certain they knew exactly where the performance problems were, only to find out the problems were elsewhere. Therefore, collecting the appropriate performance data on a production server is critical in determining the queries that need tuning most. Examples of query performance criteria are listed below:

  • Individually or collectively run too long
  • Individually perform too much work, e.g., reads or CPU
  • Collectively perform too much work, e.g., 200,000 executions that each execute 50,000 reads -> 10 BILLION reads in total

Table 1 shows examples of queries that potentially need tuning based upon the number of executions, total reads, total duration, total CPU time, and average reads per execution. This kind of report immediately focuses attention on the queries that might benefit the most from either index or query tuning. The five queries highlighted in Table 1 underscore these criteria. The ones highlighted in yellow were the worst offenders because their executions collectively performed the most reads with the worst one totaling 3.5 BILLION reads. The ones highlighted in light green and orange accounted for the most CPU time as well as the longest total duration. The one highlighted in slate ran the most times, and the ones highlighted in gray performed the most reads per execution. This information is vital when determining where query and index tuning should be focused. Table 2 provides additional information in that it shows whether the large numbers of reads were consistent regardless of the specified input parameters or if they were parameter-dependent. For example, the 80th percentile of the third query was 453,515, which means that 80 percent of the queries performed 453,515 reads or fewer. The 100th percentile represents the maximum of 1.724 billion reads and tells the analyst that 20 percent of the queries performed at least 453,515 reads. This data informs the analyst that the specific parameters of the execution that performed 1.724 billion reads should be used for tuning. The read behavior of the last two queries, highlighted in light green and light orange, indicates that the choice of parameters does not make any notable difference in execution behavior. The query highlighted in light blue exhibits similar performance. This kind of information is invaluable because the author has observed too many tuning efforts on development or test machines that could not reproduce the performance of the query in the production environment. Sometimes, this results from a dramatic difference in the sizes of the respective databases, but other times it results from an inappropriate choice of parameters.

Blog_20171102_1

Table 1: Summary Query Metrics Example

 

Blog_20171102_2

Table 2: Query Reads Distribution Example
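The percentile columns in Table 2 can be computed directly from the captured statement data once it has been loaded into a table. The sketch below assumes a hypothetical table named QueryStats with one row per captured execution, identified by a QueryHash column and a Reads column; the 80th and 100th percentiles correspond to the discussion above.

-- Sketch: per-query read percentiles similar to Table 2 (table and column names are placeholders)
SELECT DISTINCT
       QueryHash,
       PERCENTILE_CONT(0.80) WITHIN GROUP (ORDER BY Reads)
           OVER (PARTITION BY QueryHash) AS Reads80thPercentile,
       MAX(Reads) OVER (PARTITION BY QueryHash) AS Reads100thPercentile
FROM dbo.QueryStats;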

 

Extended Events (XEvents)

Extended Events are very useful for capturing comprehensive workload performance information because they are lightweight and can be tailored to capture only the most important information. They were introduced in SQL Server 2008, but didn’t mature until SQL Server 2012. A full treatment of how to capture important query performance data using XEvents is beyond the scope of this blog series. However, further information can be viewed at https://blog.sqlrx.com/2016/01/28/sql-server-20122014-extended-events-for-developers-part-1. XEvents use the concept of a session that must be configured, started, and stopped to capture data. This can be accomplished using SSMS or T-SQL. The basic steps involved with capturing and reporting XEvents data are listed below:

  1. Create Event Session specifying types of events to capture based upon their duration, CPU time consumption, database ID, etc.
  2. Start session
  3. Stop session
  4. Either use SSMS or T-SQL XML query to extract the captured data (SSMS easiest way to get started)

An example of some T-SQL session specification code is shown in Figure 1. Important note: The Duration filter is in MICROSECONDS. Use the following T-SQL command to start the session: ALTER EVENT SESSION WorstQueries ON SERVER STATE = START and this command to stop the session: ALTER EVENT SESSION WorstQueries ON SERVER STATE = STOP. Once the session has been stopped, open WorstQueries.xel using SSMS or use T-SQL to report or load the data into a database table. Note: XEL files can be viewed in real-time, if necessary. Figure 2 displays a completed statement sample viewed via SSMS.

Blog_20171102_3

Figure 1: Code that Creates WorstQueries Session
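Because Figure 1 is an image, a comparable text version is sketched below. It is a minimal example rather than a copy of the figure: the specific events, the one-second (1,000,000 microsecond) duration filter, and the output file name are assumptions, although the session name matches the WorstQueries commands shown above.

-- Sketch: a lightweight session capturing completed statements and RPCs that ran longer than 1 second
CREATE EVENT SESSION WorstQueries ON SERVER
ADD EVENT sqlserver.sql_statement_completed
    (ACTION (sqlserver.sql_text, sqlserver.database_id, sqlserver.client_hostname)
     WHERE (duration > 1000000)),          -- duration filter is in MICROSECONDS
ADD EVENT sqlserver.rpc_completed
    (ACTION (sqlserver.sql_text, sqlserver.database_id, sqlserver.client_hostname)
     WHERE (duration > 1000000))
ADD TARGET package0.event_file (SET filename = N'WorstQueries.xel')
WITH (MAX_MEMORY = 4096 KB, EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS);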

 

Blog_20171102_4

Figure 2: XEvents Statement Completed Sample Output

 

The next article in this blog series will cover query/execution plans.   The fun stuff is coming up!!  Until then….happy tuning and please let us know if you have any questions.

For more information about blog posts, concepts and definitions, further explanations, or questions you may have…please contact us at SQLRx@sqlrx.com. We will be happy to help! Leave a comment and feel free to track back to us. Visit us at www.sqlrx.com!

Rx for Demystifying Index Tuning Decisions – Part 1

— by Jeffry Schwartz

Overview

Indices are one of the most troubling and mysterious areas for DBAs and developers. Most understand that adding a badly needed index can yield dramatic query performance improvements, but how does one know what is needed? Many believe more indices are better, regardless of the number. Most fear deleting ANY existing indices, even when the indices partially or fully duplicate functionality because they fear execution times will skyrocket if anything is changed. Others implement the recommendations of the Database Engine Tuning Advisor (DTA) without question while not understanding what the ramifications of those actions are.

Why does so much mystery surround indexing? One reason is that many DBAs possess insufficient knowledge especially regarding the following issues:

  • Their design, construction, and interaction with queries
  • Measurement of their usage both from a query and a SQL Server perspective
  • Determining viable missing index candidates and how to implement any recommendations SQL Server might produce

Index usage and tuning metrics became available on SQL Server 2005 with Dynamic Management Views and Functions, which will be discussed later. However, the meanings and significance of index DMV/DMF metrics are still not well understood by many despite only minor additions over the years. Specifically, the following list contains a synopsis of the topics that the author has observed to be the most salient index-related issues:

  1. Identifying
    • Queries that need an index to function efficiently
    • Which indices, if any, are used by a query (and how they are used, e.g., randomly or sequentially)
    • Tables (and their indices) that merit evaluation and potential adjustment
    • Indices that duplicate functionality of others
  2. Understanding when
    • A new index is truly needed and what improvement can be anticipated
    • An index can be deleted without harming performance
    • An index should be adjusted or added despite the query not producing any missing index warnings
  3. Understanding why having too many indices results in
    • Inserts and updates taking too long and/or creating blocking
    • Suboptimal query plans being generated because there are too many index choices
  4. Knowing Database Engine Tuning Advisor (DTA) pros & cons

This blog series will help the reader with these issues by covering topics such as:

  • Common types of indices and how they function (clustered, nonclustered, & covering)
  • Detecting queries that need rewriting or require indexing help
  • How queries use various types of indices, e.g., seeks, scans, and lookups
  • Using query plans to determine relationships between queries and indices
  • Statistics and how they affect index usage and query performance
  • Query Optimizer
  • Consequences of too many indices
  • Detecting indices whose functionality is duplicated by other indices
  • Monitoring actual index usage from both a logical, i.e., query, and a physical perspective
  • Determining which indices are hardly, if ever, used
  • Determining missing indices and their estimated impact if implemented
  • Incorporating missing index recommendations either into existing indices or as few indices as possible

Indexing Overview

Two basic types of indices exist in SQL Server: clustered and nonclustered. A clustered index (CI) IS the data table with a binary tree attached to it. It contains leaf and non-leaf levels as shown in the Figure 1 diagrams below. The leaf levels contain the actual data and the non-leaf levels point to the leaves in the binary tree. A nonclustered index only contains index-related entries, i.e., keys and included columns. The same levels that exist in a clustered index also exist in NonClustered (NC) indices, whose leaf levels point to the table (heap) or a clustered index.

Keys exist throughout the entire binary tree. Although clustered index keys are contained in all NonClustered indices, they cannot be used directly by a query. Therefore, one should keep clustered index keys as small as possible. If the analyst KNOWS that a LOT of range scans will be performed based on the clustered index key, making the keys incorporate the range scan columns will help keep the scanned entries physically closer, at least within a page, especially immediately after the index is reorganized or rebuilt. Duplicates in clustered indices are not allowed PHYSICALLY, so in non-unique situations SQL Server will automatically add a duplicate resolver (uniqueifier in the Microsoft literature) that adds four bytes to every key entry in the table, including the non-leaf levels. Clustered indices are often preferred by SQL Server for sizable sequential or semi-sequential access except when the NonClustered index is very narrow and the data record is fairly wide. Examples of clustered index and nonclustered index definitions are shown below:

CREATE UNIQUE CLUSTERED INDEX [ci_RecID] ON [dbo].[FewDuplicates]

(

[IDCol] ASC

)

WITH (fillfactor = 100, PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

ON [PRIMARY]

CREATE NONCLUSTERED INDEX [ix_DupID] ON [dbo].[FewDuplicates]

(

DupID ASC

)

WITH (fillfactor = 100, PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

ON [PRIMARY]

Blog_20171026_1

Figure 1: Physical Index Layouts

A covering index is a NonClustered index that contains ALL the data needed by a particular query, so the table or clustered index need not be accessed at all. If columns appear only in the query's select list and are not used in any where or join clauses, they can be "included" in a NonClustered index. Included columns are ONLY allowed in NonClustered indices and ONLY appear in the index leaf level; they do not exist in the upper B-tree levels, of which there can be three or more depending upon the number of records in the table. An example of a covering index that uses included columns to satisfy the query's needs is shown below.

CREATE NONCLUSTERED INDEX ix_CombinedIndex ON [dbo].[FewDuplicates]

(

[Metric43],

[Metric14]

)

INCLUDE (

[Metric01],

[Metric02],

[Metric03],

[Metric04]

)

WITH (fillfactor = 100, PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

ON [PRIMARY]

When SQL Server makes a missing index recommendation that uses included columns, e.g., when a query plan is obtained in SSMS, it will often ignore a NonClustered index that does not have the recommended included columns, especially if the clustered index has some of the keys. For example, a developer with whom the author worked was tuning a query and the missing index recommendation displayed in the query plan recommended a new index that contained three key columns and three included columns. The developer implemented the index without the included columns. When he obtained the query plan again, he was surprised to discover that SQL Server ignored the new index. When he added the three recommended included columns, SQL Server used the index. Missing index recommendations and query plans will be discussed later in this blog series.
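When this happens, the usual remedy is to fold the recommended included columns into the existing index rather than create a near-duplicate. A minimal sketch of that approach is shown below; the index, table, and column names are purely illustrative (they are not from the case described above), and DROP_EXISTING = ON simply rebuilds the index under its existing name with the new definition.

-- Hypothetical example: rebuild an existing nonclustered index in place,
-- adding the included columns that the missing index recommendation requested
CREATE NONCLUSTERED INDEX ix_Orders_CustomerID_OrderDate ON [dbo].[Orders]
(
[CustomerID],
[OrderDate]
)
INCLUDE (
[OrderTotal],
[ShipRegion]
)
WITH (DROP_EXISTING = ON, ONLINE = OFF)
ON [PRIMARY]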

Finally, a “filtered” index only contains specific entries, e.g., all records where the payment date is zero. The simplest way to think of this kind of index is to consider it to be an index with a where clause, which is demonstrated below:

CREATE NONCLUSTERED INDEX ix_FilteredIndex ON [dbo].[FewDuplicates]

(

[Metric43],

[Metric14]

)

INCLUDE (

[Metric01],

[Metric02],

[Metric03],

[Metric04]

)

where [Metric43] > 100000000

WITH (fillfactor = 100, PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

ON [PRIMARY]

Index Access Methods

LOGICAL index access methods can be thought of as the query's perspective, i.e., how a query might "view" the access operation or the operators that appear in a query plan. SQL Server records three types of LOGICAL index access methods: Seek (random-access retrieval of one or more records), Scan (linear search, i.e., reading a data table or clustered index from beginning to end), and Key/RID Lookup (the index is accessed first, either randomly or sequentially, but some requested data is missing from the index, so the table or clustered index must be accessed to obtain the remaining columns needed to effect a join or produce the requested query output). It is important for the reader to understand that a Seek may return a single record or a range of records. Another important issue involves what causes SQL Server to perform a Seek versus a Scan. Obviously, SQL Server will perform a scan if a suitable index does not exist, but less obviously, it will also perform a scan if the table is small enough or if an otherwise appropriate index would return more than approximately 25 percent of the records in the table. The scan is clearly not a bad thing in the first case, and in the second case the optimizer has determined that the most efficient way to accomplish the task is to scan the table. The second case may also occur when the table grows sufficiently large and the query filter has become sufficiently vague for the scan threshold to be crossed. For example, this may occur when a query requests all the orders or parts needed by manufacturing since the beginning of the year. When the company is smaller or the interval spans only a few months, this type of request may be satisfied with a seek operator, but as the table grows or the number of records for the same period increases, the scan threshold is exceeded and SQL Server performs a scan instead of a seek. When this occurs, queries that generally perform well suddenly perform poorly, usually to everyone's consternation.

PHYSICAL index access methods can be thought of as SQL Server’s internal perspective, i.e., how SQL Server actually performs the access operations requested by a query. SQL Server records two types of PHYSICAL index access methods: Singleton Lookup (single record lookup via index, i.e., logical seek) and Range Scan (multiple record lookup via index or table, i.e., logical seek OR logical scan).

Using these two perspectives, an analyst can often determine how a table and its index structure are used by a group of queries.
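For example, the logical counts appear in sys.dm_db_index_usage_stats (user_seeks, user_scans, user_lookups), while the physical counts appear in sys.dm_db_index_operational_stats (singleton_lookup_count, range_scan_count); both are discussed in detail later in this series. The sketch below is a minimal illustration of pulling the two perspectives for the current database.

-- Logical access methods per index (the query perspective)
SELECT OBJECT_NAME(s.object_id) AS TableName, i.name AS IndexName,
       s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
FROM sys.dm_db_index_usage_stats s
JOIN sys.indexes i ON i.object_id = s.object_id AND i.index_id = s.index_id
WHERE s.database_id = DB_ID()

-- Physical access methods per index (SQL Server's internal perspective)
SELECT OBJECT_NAME(os.object_id) AS TableName, i.name AS IndexName,
       os.singleton_lookup_count, os.range_scan_count
FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) os
JOIN sys.indexes i ON i.object_id = os.object_id AND i.index_id = os.index_id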

Why Begin Index Tuning with Queries?

Indices are not used in a vacuum; their existence is driven by business work as represented by database queries. Queries determine which indices are required, so an analyst must determine which queries (and tables) need the greatest indexing help. Index tuning should therefore NOT be performed on a one-off basis: analysts should resist the temptation to add an index to solve ONE problem or to tune a SINGLE query unless that query's performance is both critical and abominable. One must ALWAYS be aware of how other queries use any current or proposed index. Ideally, one index accommodates numerous queries, so index tuning should be performed as a comprehensive solution.

Determining an Appropriate Strategy for Index Tuning

The following list contains the objectives and considerations for any index tuning project:

  1. Must queries perform optimally because volumes are very high?
  2. Will ANY reduction in work improve performance?
  3. Sometimes indices cannot remedy the problem because the queries are written so poorly that the real solution is to rewrite the query, probably using Common Table Expressions (CTEs), and possibly to add or modify some indices.

The author once worked with a customer whose developers questioned why he suggested investigating a particular query for tuning when it ONLY performed 54,000 reads per execution. The answer was that since the query was executed approximately 140,000 times daily, even a reduction of 1,000 reads per execution would translate into roughly 140 million fewer reads of overall database work each day. Several queries fell into this category, and since they were used in a hospital emergency room environment, speed was critical. Other queries performed millions or billions of reads during each execution, so any performance improvement would have been noticeable. These kinds of queries often prove very difficult to tune because they are quite complex; e.g., the author has tuned queries that employed fifteen inner and left joins. Others used many large tables and many joins, some of which were buried within views. Regardless, the number and sizes of the tables have a direct bearing upon how difficult a query is to tune. Sometimes performance improvements can be realized by adding or modifying indices, but many times the query itself has to be rewritten using CTEs, temporary tables, or both. Sometimes very large queries have to be broken into pieces to determine exactly which portions cause the query to perform poorly, and CTEs are ideal for this kind of analysis and tuning.
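A practical way to find such candidates in the plan cache is to rank cached statements by cumulative logical reads, i.e., reads per execution multiplied by execution count. The sketch below is one hedged way to do this; the TOP value is arbitrary, and the usual caveat applies that only currently cached plans are visible.

-- Rank cached statements by cumulative logical reads (per-execution reads x executions)
SELECT TOP (25)
       qs.execution_count,
       qs.total_logical_reads,
       qs.total_logical_reads / qs.execution_count AS avg_logical_reads,
       SUBSTRING(st.text, (qs.statement_start_offset/2) + 1,
           ((CASE qs.statement_end_offset WHEN -1 THEN DATALENGTH(st.text)
             ELSE qs.statement_end_offset END - qs.statement_start_offset)/2) + 1) AS statement_text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
ORDER BY qs.total_logical_reads DESC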

How the queries and indices interact is a major consideration, e.g., is the clustered index used because a more appropriate index does not exist? Does the query perform a series of single-record lookups (even if there are many thousands of them), or does it scan some of the larger tables in their entirety? How many indices exist on each of the tables? This kind of issue is shown in Table 1 below. Are there too many index choices for the optimizer? Are any indices not used at all, or used only very seldom? All of these issues factor into index and query tuning strategies. The ideal scenario is to have as few indices as possible without creating a need that has to be addressed by a full clustered index scan, but finding this balance is often difficult.

Blog_20171026_2

Table 1: Total Existing Index Count by Table and Database
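A count like the one in Table 1 can be produced with a simple query against the catalog views; the sketch below covers the current database only (heaps excluded), so it would need to be run in, or adapted for, each database of interest.

-- Existing index count per user table in the current database
SELECT SCHEMA_NAME(o.schema_id) AS SchemaName,
       o.name AS TableName,
       COUNT(*) AS IndexCount
FROM sys.indexes i
JOIN sys.objects o ON o.object_id = i.object_id
WHERE o.is_ms_shipped = 0
  AND o.type = 'U'        -- user tables only
  AND i.type > 0          -- exclude heaps
GROUP BY SCHEMA_NAME(o.schema_id), o.name
ORDER BY IndexCount DESC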

The next article in this blog series will cover how to determine which queries need indices the most, as well as how to capture them using Extended Events. Until then… enjoy life and let us know if you have any questions.

For more information about blog posts, concepts and definitions, further explanations, or questions you may have…please contact us at SQLRx@sqlrx.com. We will be happy to help! Leave a comment and feel free to track back to us. Visit us at www.sqlrx.com!

Get Index Column Info with Includes for One or Many Tables

— by Lori Brown @SQLSupahStah

I was recently working with one of my clients on some low-hanging-fruit query tuning. We had checked the cache for plans with missing index warnings in them and were trying to see if we could tweak an existing index or add a new index to speed things up. If you ever work with missing indexes, you have surely seen them recommend crazy things: duplicates of existing indexes, or something that could simply be added to an existing index.

The bottom line for missing index recommendations is that you should NEVER, EVER create a missing index unless you know for absolutely sure that the index does not already exist and that it will really help a known performance issue. For instance, you can get missing index warnings on small tables that only return a couple of rows to the query. Those are usually not worth working on since SQL Server can scan a small table very quickly. To this day, I still find many databases that are way over-indexed with indexes that were implemented simply because someone found a missing index warning and did not do their homework, or because they ran the dreaded Database Engine Tuning Advisor, which shoved a bunch of duplicate indexes into tables.

If you are wondering how to get a list of missing indexes, please check out Jeff Schwartz's blog post on how to do this (https://blog.sqlrx.com/2017/06/02/query-tuning-and-missing-index-recommendations/ ). That will give you a place to start. It is better if you know which query is throwing the missing index warning, so it is a good idea to collect those in a trace or with extended events. Jeff builds on his first post in his second post on the subject (https://blog.sqlrx.com/2017/07/20/handling-multiple-missing-index-recommendations-for-the-same-table/ ), which also covers the fun of having multiple missing index recommendations for a single table and how to deal with them.
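If all you want is a quick list straight from the DMVs before digging into the posts above, something along the lines of the sketch below will do. The improvement-measure expression is the commonly used rule-of-thumb combination of cost, impact, and seek/scan counts rather than an official formula, so treat the output as a starting point rather than a to-do list.

-- Missing index recommendations with a rough improvement measure
SELECT mid.statement AS FullTableName,
       mid.equality_columns, mid.inequality_columns, mid.included_columns,
       migs.user_seeks, migs.user_scans, migs.avg_user_impact,
       migs.avg_total_user_cost * (migs.avg_user_impact / 100.0)
           * (migs.user_seeks + migs.user_scans) AS improvement_measure
FROM sys.dm_db_missing_index_details mid
JOIN sys.dm_db_missing_index_groups mig ON mig.index_handle = mid.index_handle
JOIN sys.dm_db_missing_index_group_stats migs ON migs.group_handle = mig.index_group_handle
ORDER BY improvement_measure DESC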

Here’s a handy set of links for some of Jeff’s great index tuning work that you should really check out:

https://blog.sqlrx.com/2016/01/28/sql-server-20122014-extended-events-for-developers-part-1/

https://blog.sqlrx.com/2017/06/02/query-tuning-and-missing-index-recommendations/

https://blog.sqlrx.com/2017/07/20/handling-multiple-missing-index-recommendations-for-the-same-table/

https://blog.sqlrx.com/2017/08/10/how-indexing-affects-deletion-queries/

One of the things that I usually need when performance tuning is to know information about the existing indexes on specific tables. I always want to know what columns are in the indexes along with the included columns so that I can compare the existing indexes to the missing recommendations. This way I can better figure out if a recommendation is something that can be added to an existing index (like an included column) or if I really need to create a brand new index if it does not exist at all.

Like most DBAs, I keep a toolkit with all kinds of handy scripts. However, I did not have one that would give me index included columns. I also wanted the query to be able to return info from one or many tables at the same time, which is useful when dealing with queries that have lots of joins. I know that a few bloggers have posted something similar, but I wanted the ability to filter on one or multiple tables. So, here is what I came up with:

/********************************
Returns index columns with included columns
plus other needed index info for tables
in @tablelist variable
*********************************/

DECLARE @tablelist VARCHAR(1000)
DECLARE @sqlstr NVARCHAR(MAX)

SET @tablelist = 'InvoiceLines,OrderLines,StockItemHoldings' -- comma delimited list of tables, can be one or multiples EX: 'mytable' or 'mytable,nothertable,thirdtable'

-- Query the tables (proceed only if a real table list was supplied)
IF @tablelist <> '' AND @tablelist <> '?,?,?'
BEGIN
    SET @tablelist = REPLACE(QUOTENAME(@tablelist,''''), ',', ''',''') -- Add quotes so the IN clause will work

    SET @sqlstr = 'SELECT SCHEMA_NAME(o.schema_id) AS SchemaName
        ,o.name AS TableName
        ,i.name AS IndexName
        ,i.type_desc AS IndexType
        ,c.name AS ColumnName
        ,ic.is_included_column
        ,ic.index_column_id
        ,t.name
        ,c.max_length
        ,c.precision
        ,c.scale
    FROM sys.indexes i
    JOIN sys.index_columns ic ON i.index_id = ic.index_id
    JOIN sys.columns c ON ic.column_id = c.column_id AND ic.object_id = c.object_id
    JOIN sys.objects o ON o.object_id = i.object_id AND c.object_id = i.object_id
    JOIN sys.types t ON t.user_type_id = c.user_type_id
    WHERE o.name IN ('+@tablelist+')
    ORDER BY SchemaName, TableName, IndexName, index_column_id'

    EXEC sp_executesql @sqlstr
    --PRINT @sqlstr
END

All you have to provide is a comma separated list of the table(s) you are interested in for the @tablelist variable and it will do the rest. The output looks like this:

Blog_20170928_1

I found several bloggers who had written queries that concatenate the columns together, but truthfully I found those hard to read, so I settled for a list with some extra info on the data and index types. Hope this is useful to someone out there.

For more information about blog posts, concepts and definitions, further explanations, or questions you may have…please contact us at SQLRx@sqlrx.com. We will be happy to help! Leave a comment and feel free to track back to us. Visit us at www.sqlrx.com!

 

How Indexing Affects Deletion Queries

— by Jeff Schwartz

The Problem

Many articles concerning SQL Server discuss how record insertion overhead increases with each additional index. They discuss b-tree manipulations and page splits in addition to leaf and non-leaf levels. However, few discuss the fact that deletion overhead increases as well, especially when large numbers of records are deleted by individual queries. Recently, I was working with a client who regularly purged large numbers of records from tables that ranged in size from large to gigantic. For example, one table contained over 6.5 billion records. I added an index (the 4th overall) to one table expressly to expedite the large deletion process, and the deletion ran longer, despite using the new index! To determine how the number of indices and the number of records to be deleted interact, I conducted an experiment that tested several combinations. The specifics of the tests and their corresponding results are summarized below.

Test Table Creation & Load

To determine deletion/index behavior, a generic table was constructed and filled with 20 million records. Each record contained an identity column, an ID column, a text column, and 47 metric columns whose random values ranged between 1 and 1,000,000,000. The large number of table columns was used to ensure that SQL Server would choose an index option when appropriate. To minimize duplication of column values and create a uniform distribution of values, the ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase formula was used to generate values that were as random as possible, where @ModBase was set to one billion. Three indices were created initially: a clustered index that used the identity column as its only key, a nonclustered index that used DupID as its only column, and a nonclustered index that used Metric43 and Metric14 as keys with Metric01, Metric02, Metric03, and Metric04 as included columns. The scripts for the creation, loading, and initial indexing of the table are shown below.

-- ##############################################################
-- Create test table
-- ##############################################################

drop table FewDuplicates;

CREATE TABLE FewDuplicates (

IDCol bigint identity (20000000,1),

DupID bigint,

MyText varchar(10),

Metric01 bigint, Metric02 bigint, Metric03 bigint, Metric04 bigint,

Metric05 bigint, Metric06 bigint, Metric07 bigint, Metric08 bigint,

Metric09 bigint, Metric10 bigint, Metric11 bigint, Metric12 bigint,

Metric13 bigint, Metric14 bigint, Metric15 bigint, Metric16 bigint,

Metric17 bigint, Metric18 bigint, Metric19 bigint, Metric20 bigint,

Metric21 bigint, Metric22 bigint, Metric23 bigint, Metric24 bigint,

Metric25 bigint, Metric26 bigint, Metric27 bigint, Metric28 bigint,

Metric29 bigint, Metric30 bigint, Metric31 bigint, Metric32 bigint,

Metric33 bigint, Metric34 bigint, Metric35 bigint, Metric36 bigint,

Metric37 bigint, Metric38 bigint, Metric39 bigint, Metric40 bigint,

Metric41 bigint, Metric42 bigint, Metric43 bigint, Metric44 bigint,

Metric45 bigint, Metric46 bigint, Metric47 bigint

)

-- ##############################################################
-- Load original table
-- ##############################################################

declare @DupID bigint = 1

declare @NumRecs bigint = 20000000

declare @ModBase bigint = 1000000000

truncate table FewDuplicates

set nocount on

while (@DupID <= @NumRecs)

begin

insert into [dbo].[FewDuplicates] (

[DupID], [MyText]

,[Metric01], [Metric02], [Metric03], [Metric04], [Metric05], [Metric06]

,[Metric07], [Metric08], [Metric09], [Metric10], [Metric11], [Metric12]

,[Metric13], [Metric14], [Metric15], [Metric16], [Metric17], [Metric18]

,[Metric19], [Metric20], [Metric21], [Metric22], [Metric23], [Metric24]

,[Metric25], [Metric26], [Metric27], [Metric28], [Metric29], [Metric30]

,[Metric31], [Metric32], [Metric33], [Metric34], [Metric35], [Metric36]

,[Metric37], [Metric38], [Metric39], [Metric40], [Metric41], [Metric42]

,[Metric43], [Metric44], [Metric45], [Metric46], [Metric47]

)

VALUES (

@DupID, 'my text',

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

 ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase

)

set @DupID += 1

end -- group option loop

set nocount off

-- ##############################################################
-- Create indices on the test table
-- ##############################################################

CREATE UNIQUE CLUSTERED INDEX [ci_RecID] ON [dbo].[FewDuplicates]

(

[IDCol] ASC

)

WITH (fillfactor = 100, PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

ON [PRIMARY]

CREATE NONCLUSTERED INDEX [ix_DupID] ON [dbo].[FewDuplicates]

(

DupID ASC

)

WITH (fillfactor = 100, PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

ON [PRIMARY]

CREATE NONCLUSTERED INDEX ix_CombinedIndex ON [dbo].[FewDuplicates]

(

[Metric43],

[Metric14]

)

INCLUDE (

[Metric01],

[Metric02],

[Metric03],

[Metric04]

)

WITH (fillfactor = 100, PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

ON [PRIMARY]

Queries that Perform Varying Numbers of Record Deletions

To illustrate the issue, six queries were created that deleted varying numbers of records; they are listed in Table 1, and a sketch of their general form follows the table. The deletions were designed to use the ix_CombinedIndex index, thereby emulating the client situation in which the optimal index for deletions was used. The only differences involved the numbers of records deleted, and these are highlighted in the table below for easier comparison. The selected values were chosen so that small, medium, and large numbers of records would be deleted.

Blog_20170810_1

Table 1: All Six Deletion Queries
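Because Table 1 is an image, a sketch of the general form of the deletion queries is shown below. The filter column matches the leading key of ix_CombinedIndex so that the index would be chosen, and the threshold value is purely illustrative of how the number of deleted records was varied from query to query.

-- Illustrative form only; each test query used a different threshold value
-- to control how many records were deleted via the ix_CombinedIndex index
DELETE FROM [dbo].[FewDuplicates]
WHERE [Metric43] <= 100000000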

The plan for Query #4 with three indices is shown below in Figure 1 and it is fairly simple, as one would expect. Things become more complicated with additional indices, as discussed in the next section.

Blog_20170810_2

Figure 1: Query Plan for Deletion Query #4 with Three Indices

Indices that Cause Additional Deletion Overhead

To examine the effects of additional indices on the deletion queries, up to eight were added to the table using different keys. None of the indices was used for direct data access and the index definitions are shown below, followed by the commands to delete them. Only 4 of these indices were used for the 7-index test, whereas all 8 were used for the 11-index test.

-- create indices

CREATE NONCLUSTERED INDEX ix_AddedIndexNumber1 ON [dbo].[FewDuplicates]

(

[Metric01],

[Metric02]

)

INCLUDE (

[Metric03],

[Metric04],

[Metric43],

[Metric14]

)

CREATE NONCLUSTERED INDEX ix_AddedIndexNumber2 ON [dbo].[FewDuplicates]

(

[Metric03],

[Metric04]

)

INCLUDE (

[Metric01],

[Metric02],

[Metric43],

[Metric14]

)

CREATE NONCLUSTERED INDEX ix_AddedIndexNumber3 ON [dbo].[FewDuplicates]

(

[Metric05],

[Metric06]

)

INCLUDE (

[Metric01],

[Metric02],

[Metric43],

[Metric14]

)

CREATE NONCLUSTERED INDEX ix_AddedIndexNumber4 ON [dbo].[FewDuplicates]

(

[Metric07],

[Metric08]

)

INCLUDE (

[Metric01],

[Metric02],

[Metric43],

[Metric14]

)

CREATE NONCLUSTERED INDEX ix_AddedIndexNumber5 ON [dbo].[FewDuplicates]

(

[Metric09],

[Metric10]

)

INCLUDE (

[Metric01],

[Metric02],

[Metric43],

[Metric14]

)

CREATE NONCLUSTERED INDEX ix_AddedIndexNumber6 ON [dbo].[FewDuplicates]

(

[Metric11],

[Metric12]

)

INCLUDE (

[Metric01],

[Metric02],

[Metric43],

[Metric14]

)

CREATE NONCLUSTERED INDEX ix_AddedIndexNumber7 ON [dbo].[FewDuplicates]

(

[Metric13],

[Metric14]

)

INCLUDE (

[Metric01],

[Metric02],

[Metric43],

[Metric44]

)

CREATE NONCLUSTERED INDEX ix_AddedIndexNumber8 ON [dbo].[FewDuplicates]

(

[Metric15],

[Metric16]

)

INCLUDE (

[Metric01],

[Metric02],

[Metric43],

[Metric14]

)

-- drop indices

drop INDEX ix_AddedIndexNumber1 ON [dbo].[FewDuplicates]

drop INDEX ix_AddedIndexNumber2 ON [dbo].[FewDuplicates]

drop INDEX ix_AddedIndexNumber3 ON [dbo].[FewDuplicates]

drop INDEX ix_AddedIndexNumber4 ON [dbo].[FewDuplicates]

drop INDEX ix_AddedIndexNumber5 ON [dbo].[FewDuplicates]

drop INDEX ix_AddedIndexNumber6 ON [dbo].[FewDuplicates]

drop INDEX ix_AddedIndexNumber7 ON [dbo].[FewDuplicates]

drop INDEX ix_AddedIndexNumber8 ON [dbo].[FewDuplicates]

Figure 2 shows the plan for the same query that was shown in Figure 1, but this time with 11 indices on the table instead of 3. Clearly, the plan is MUCH more complex and the effects of the additional indices are obvious. This suggests what the test metrics will ultimately confirm: work grows demonstrably as the number of indices increases.

Blog_20170810_3

Blog_20170810_4

Figure 2: Query Plan for Deletion Query #4 with 11 Indices

Test Results

A total of 18 tests were conducted: three index configurations (3, 7, and 11 indices) with Queries #1 – #6 run against each configuration. As shown in Figure 3 through Figure 6, most of the metrics are comparable until the 100,000-record level is exceeded, at which point they diverge greatly. One of the most interesting findings involves writes, whose curve shape differs completely from those of the other metrics. These figures graphically illustrate why deletions of small-to-moderate numbers of records often go unnoticed even as the number of indices increases. However, they also illustrate clearly how the size of the deletion and the number of indices can combine to negate the improvement provided by a specialty deletion index.

Blog_20170810_5

Figure 3: Deletion Queries – Duration (Seconds)

Blog_20170810_6

Figure 4: Deletion Queries – CPU (Seconds)

Blog_20170810_7

Figure 5: Deletion Queries – Reads

Blog_20170810_8

Figure 6: Deletion Queries – Writes

Conclusion

This article illustrated a situation in which the amount of work performed by deletion queries increased dramatically once the number of records to be deleted exceeded approximately 100,000. Although this particular value is obviously applicable only to the test case, the overall message is clear: as the number of records to be deleted grows well beyond that point, the work performed by the query increases dramatically, and the situation worsens considerably as the number of indices spanning the table increases. Therefore, although the overhead associated with deleting small numbers of records may not be noticed as indices are added, the performance of queries that delete large numbers of records will degrade noticeably as the number of indices increases. In some cases, the overhead may negate any improvement that might be gained by adding an index whose purpose is to expedite the deletion process. Before adding an index to improve deletion performance, therefore, ensure that the batches of deleted records are not too large and that the number of indices on the table is small.
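One common way to keep each batch small, sketched below under those assumptions, is to delete in a loop with TOP so that no single statement has to maintain every index across an enormous range of rows; the batch size and filter are illustrative only.

-- Hypothetical batched-deletion sketch; batch size and filter are illustrative
DECLARE @BatchSize int = 50000
WHILE 1 = 1
BEGIN
    DELETE TOP (@BatchSize) FROM [dbo].[FewDuplicates]
    WHERE [Metric43] <= 100000000
    IF @@ROWCOUNT < @BatchSize BREAK -- the last, partial batch finished the job
END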

For more information about blog posts, concepts and definitions, further explanations, or questions you may have…please contact us at SQLRx@sqlrx.com. We will be happy to help! Leave a comment and feel free to track back to us. Visit us at www.sqlrx.com!