Rx for Demystifying Index Tuning Decisions - Part 1

Oct 26

Overview

Indices are one of the most troubling and mysterious areas for DBAs and developers. Most understand that adding a badly needed index can yield dramatic query performance improvements, but how does one know what is needed? Many believe more indices are better, regardless of the number. Most fear deleting ANY existing indices, even when the indices partially or fully duplicate functionality because they fear execution times will skyrocket if anything is changed. Others implement the recommendations of the Database Engine Tuning Advisor (DTA) without question while not understanding what the ramifications of those actions are.

Why does so much mystery surround indexing? One reason is that many DBAs possess insufficient knowledge especially regarding the following issues:

Their design, construction, and interaction with queries
Measurement of their usage both from a query and a SQL Server perspective
Determining viable missing index candidates and how to implement any recommendations SQL Server might produce

Index usage and tuning metrics became available on SQL Server 2005 with Dynamic Management Views and Functions, which will be discussed later. However, the meanings and significance of index DMV/DMF metrics are still not well understood by many despite only minor additions over the years. Specifically, the following list contains a synopsis of the topics that the author has observed to be the most salient index-related issues:

Identifying
- Queries that need an index to function efficiently
- Which indices, if any, are used by a query (and how they are used, e.g., randomly or sequentially)
- Tables (and their indices) that merit evaluation and potential adjustment
- Indices that duplicate functionality of others
Understanding when
- A new index is truly needed and what improvement can be anticipated
- An index can be deleted without harming performance
- An index should be adjusted or added despite the query not producing any missing index warnings
Understanding why having too many indices results in
- Inserts and updates taking too long and/or creating blocking
- Suboptimal query plans being generated because there are too many index choices
Knowing Database Engine Tuning Advisor (DTA) pros & cons

This blog series will help the reader with these issues by covering topics such as:

Common types of indices and how they function (clustered, nonclustered, & covering)
Detecting queries that need rewriting or require indexing help
How queries use various types of indices, e.g., seeks, scans, and lookups
Using query plans to determine relationships between queries and indices
Statistics and how they affect index usage and query performance
Query Optimizer
Consequences of too many indices
Detecting indices whose functionality is duplicated by other indices
Monitoring actual index usage from both a logical, i.e., query, and a physical perspective
Determining which indices are hardly, if ever, used
Determining missing indices and their estimated impact if implemented
Incorporating missing index recommendations either into existing indices or as few indices as possible

Indexing Overview

Two basic types of indices exist in SQL Server: clustered index and NonClustered index. A clustered index (CI) IS the data table with a binary tree attached to it. It contains leaf and non-leaf levels as shown in the Figure 1 diagrams below. The leaf levels contain the actual data and the non-leaf levels point to the leaves in the binary tree. A nonclustered index only contains index-related entries, i.e., keys and included columns. The same levels that exist in a clustered index also exist in NonClustered (NC) indices whose leaf levels point to the table (heap) or a clustered index. Keys exist throughout the entire binary tree. Although clustered index keys are contained in all NonClustered indices, they cannot be used directly by a query. Therefore, one should keep clustered index keys as small as possible. If the analyst KNOWS that a LOT of range scans will be performed based on the clustered index key, making the keys incorporate the range scan columns will help keep the scanned entries physically closer at least within a page, especially immediately after the index is reorganized or rebuilt. Duplicates in clustered indices are not allowed PHYSICALLY, so in non-unique situations SQL Server will automatically add a duplicate resolver (uniqueifier in the Microsoft literature) that adds four bytes to every key entry in the table, including the non-leaf levels). Clustered indices are often preferred by SQL Server for sizable sequential or semi-sequential access except when the NonClustered index is very narrow and the data record is fairly wide. Examples of clustered index and nonclustered index definitions are shown below:

CREATE UNIQUE CLUSTERED INDEX [ci_RecID] ON [dbo].[FewDuplicates]

(

[IDCol] ASC

)

WITH (fillfactor = 100, PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

ON [PRIMARY]

CREATE NONCLUSTERED INDEX [ix_DupID] ON [dbo].[FewDuplicates]

(

DupID ASC

)

WITH (fillfactor = 100, PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

ON [PRIMARY]

Figure 1: Physical Index Layouts

A covering index is a NonClustered index that contains ALL the data needed by a particular query so the table or clustered index need not be accessed at all. If the values are only present in the query select list and not used in any where or join clauses, the columns can be “included” in a NonClustered index. Included columns are ONLY allowed in NonClustered indices and ONLY appear in index leaf levels. They do not exist in upper binary tree levels, which can be 3 or more levels depending upon the number of records in the table. An example of a covering index is shown below using included columns to satisfy the query’s needs.

CREATE NONCLUSTERED INDEX ix_CombinedIndex ON [dbo].[FewDuplicates]

(

[Metric43],

[Metric14]

)

INCLUDE (

[Metric01],

[Metric02],

[Metric03],

[Metric04]

)

WITH (fillfactor = 100, PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

ON [PRIMARY]

When SQL Server makes a missing index recommendation that uses included columns, e.g., when a query plan is obtained in SSMS, it will often ignore a NonClustered index that does not have the recommended included columns, especially if the clustered index has some of the keys. For example, a developer with whom the author worked was tuning a query and the missing index recommendation displayed in the query plan recommended a new index that contained three key columns and three included columns. The developer implemented the index without the included columns. When he obtained the query plan again, he was surprised to discover that SQL Server ignored the new index. When he added the three recommended included columns, SQL Server used the index. Missing index recommendations and query plans will be discussed later in this blog series.

Finally, a “filtered” index only contains specific entries, e.g., all records where the payment date is zero. The simplest way to think of this kind of index is to consider it to be an index with a where clause, which is demonstrated below:

CREATE NONCLUSTERED INDEX ix_FilteredIndex ON [dbo].[FewDuplicates]

(

[Metric43],

[Metric14]

)

INCLUDE (

[Metric01],

[Metric02],

[Metric03],

[Metric04]

)

where [Metric43] > 100000000

WITH (fillfactor = 100, PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

ON [PRIMARY]

Index Access Methods

LOGICAL index access methods can be thought of as coming from a query perspective, i.e., how a query might “view” the access operation or the operators that appear in a query plan. SQL Server records three types of LOGICAL index access methods: Seek (random access retrieval of one or more records), Scan (linear search, i.e., reading a data table or clustered index from beginning to end), and Key/RID Lookup (Index accessed first either randomly or sequentially, but some requested data is missing from the index, so the table or clustered index must be accessed for remaining columns to effect a join or produce the requested query output). It is important for the reader to understand that a Seek may result in a single record access or the access of a range of records. Another important issue involves what causes SQL Server to perform a Seek or a Scan operation. Obviously, SQL Server will perform a scan if a suitable index does not exist, but not so obviously, it will also perform a scan if the table is small enough or an appropriate index would return more than approximately 25 percent of the records in the table. Clearly, the scan is not a bad thing in the first case and in the second case, the optimizer has determined that the most efficient way to accomplish the task to scan the table. The second case may also occur when the table grows sufficiently large and the query filter specification has become sufficiently vague to cause the scan to occur. For example, this may occur when a query requests all the orders or parts needed by manufacturing from the beginning of the year. When the company is smaller or the interval is only a few months, this type of request may be satisfied with a seek operator, but as the table grows or the number of records for the same period increases, the scan threshold is exceeded and SQL Server will perform a scan instead of a seek. When this occurs, queries that generally perform well suddenly perform poorly, usually to everyone’s consternation.

PHYSICAL index access methods can be thought of as SQL Server’s internal perspective, i.e., how SQL Server actually performs the access operations requested by a query. SQL Server records two types of PHYSICAL index access methods: Singleton Lookup (single record lookup via index, i.e., logical seek) and Range Scan (multiple record lookup via index or table, i.e., logical seek OR logical scan).

Using these two perspectives, an analyst can often determine how a table and its index structure are used by a group of queries.

Why Begin Index Tuning with Queries?

Indices not used in a vacuum; their existence is driven by business work as represented by database queries. Queries determine what indices are required, so an analyst must determine which queries (and tables) need the greatest indexing help. Therefore, index tuning should NOT be performed on a one-off basis. Analysts should resist the temptation to add an index to solve ONE problem. Therefore, it is important for an analyst NOT to work on just ONE index to solve a SINGLE query problem unless query performance is critical and abominable. One must ALWAYS be aware of other queries’ usage of any current or proposed index. Ideally, one index accommodates numerous queries, so index tuning should be performed as a comprehensive solution.

Determining an Appropriate Strategy for Index Tuning

The following contains the list of objectives and considerations for any index tuning project:

Must queries perform optimally because volumes are very high?
Will ANY reduction in work improve performance?
Sometimes indices cannot remedy the problem because the queries are written so poorly that the real solution is to rewrite the query, probably using Common Table Expressions (CTEs) and possibly, adding or modifying some indices.

The author once worked with a customer whose developers questioned why he suggested investigating a particular query for tuning when it ONLY performed 54,000 reads per execution. The answer was that since the query was executed approximately 140,000 times daily, even a reduction of 1,000 reads would translate into a 140,000 read reduction in overall database work. Several queries fell into this category and since they were used in a hospital emergency room environment, speed was critical. Other queries performed millions or billions of reads during each execution. Clearly, any performance improvement would have been noticeable in this situation. These kinds of queries often prove to be very difficult to tune because they are quite complex, e.g., the author has tuned queries that employed fifteen inner and left joins. Others utilized many large tables and used many joins, some of which were contained within views. Regardless, the numbers and sizes of the tables have a direct bearing upon how difficult a query is to tune. Sometimes performance improvements can be realized by adding or modifying indices, but many times the query itself has to be rewritten using either CTEs or temporary tables (or both). Sometimes very large queries have to be broken into pieces to determine exactly what portions cause the query to perform poorly, and CTEs are ideal for this kind of analysis and tuning.

How the queries and indices interact is a major consideration, e.g., is the clustered index used because a more appropriate index does not exist? Does the query perform a series of single record lookups (even if there are many thousands of them) or does it scan some of the larger tables in their entirety? How many indices exist on each of the tables? This kind of issue is shown in Table 1 below. Are there too many index choices for the optimizer? Are any indices no used at all or very seldom? All of these issues factor into index and query tuning strategies. The ideal scenario is to have as few indices as possible without creating a need that has to be addressed by a full table clustered index scan, but finding this balance is often difficult.

Table 1: Total Existing Index Count by Table and Database

The next article in this blog series will cover how to determine queries that need indices the most and extended events. Until then….enjoy life and let us know if you have any questions.

For more information about blog posts, concepts and definitions, further explanations, or questions you may have…please contact us at SQLRx@sqlrx.com. We will be happy to help! Leave a comment and feel free to track back to us. Visit us at www.sqlrx.com!