
How Indexing Affects Deletion Queries

— by Jeff Schwartz

The Problem

Many articles about SQL Server discuss how record insertion overhead increases with each additional index, covering B-tree manipulations and page splits as well as leaf and non-leaf levels. Few, however, note that deletion overhead increases as well, especially when individual queries delete large numbers of records. Recently, I was working with a client who regularly purged large numbers of records from tables that ranged in size from large to gigantic; one table, for example, contained over 6.5 billion records. I added an index (the fourth overall) to one table expressly to expedite the large deletion process, and the deletion run took longer, despite using the new index! To determine how the number of indices and the number of records to be deleted interact, I conducted an experiment that tested several combinations. The specifics of the tests and their corresponding results are summarized below.

Test Table Creation & Load

To determine deletion/index behavior, a generic table was constructed and filled with 20 million records. Each record contained an identity column, an ID column, a text column, and 47 metric columns whose random values ranged between 1 and 1,000,000,000. The large number of table columns was used to ensure SQL Server would choose an index option when appropriate. To minimize duplication of column values and create a uniform distribution of values, the ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase formula was used to generate values that were as random as possible, where @ModBase was set to one billion. Three indices were created initially: a clustered index that used the identity column as its only key, a nonclustered index that used DupID as its only key, and a nonclustered index that used Metric43 and Metric14 as keys with Metric01, Metric02, Metric03, and Metric04 as included columns. The scripts for the creation, loading, and initial indexing of the table are shown below.

-- ##############################################################
-- Create test table
-- ##############################################################

drop table FewDuplicates;

CREATE TABLE FewDuplicates (

IDCol bigint identity (20000000,1),

DupID bigint,

MyText varchar(10),

Metric01 bigint, Metric02 bigint, Metric03 bigint, Metric04 bigint,

Metric05 bigint, Metric06 bigint, Metric07 bigint, Metric08 bigint,

Metric09 bigint, Metric10 bigint, Metric11 bigint, Metric12 bigint,

Metric13 bigint, Metric14 bigint, Metric15 bigint, Metric16 bigint,

Metric17 bigint, Metric18 bigint, Metric19 bigint, Metric20 bigint,

Metric21 bigint, Metric22 bigint, Metric23 bigint, Metric24 bigint,

Metric25 bigint, Metric26 bigint, Metric27 bigint, Metric28 bigint,

Metric29 bigint, Metric30 bigint, Metric31 bigint, Metric32 bigint,

Metric33 bigint, Metric34 bigint, Metric35 bigint, Metric36 bigint,

Metric37 bigint, Metric38 bigint, Metric39 bigint, Metric40 bigint,

Metric41 bigint, Metric42 bigint, Metric43 bigint, Metric44 bigint,

Metric45 bigint, Metric46 bigint, Metric47 bigint

)

-- ##############################################################
-- Load original table
-- ##############################################################

declare @DupID bigint = 1

declare @NumRecs bigint = 20000000

declare @ModBase bigint = 1000000000

truncate table FewDuplicates

set nocount on

while (@DupID <= @NumRecs)

begin

insert into [dbo].[FewDuplicates] (

[DupID], [MyText]

,[Metric01], [Metric02], [Metric03], [Metric04], [Metric05], [Metric06]

,[Metric07], [Metric08], [Metric09], [Metric10], [Metric11], [Metric12]

,[Metric13], [Metric14], [Metric15], [Metric16], [Metric17], [Metric18]

,[Metric19], [Metric20], [Metric21], [Metric22], [Metric23], [Metric24]

,[Metric25], [Metric26], [Metric27], [Metric28], [Metric29], [Metric30]

,[Metric31], [Metric32], [Metric33], [Metric34], [Metric35], [Metric36]

,[Metric37], [Metric38], [Metric39], [Metric40], [Metric41], [Metric42]

,[Metric43], [Metric44], [Metric45], [Metric46], [Metric47]

)

VALUES (

@DupID, 'my text',

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

 ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase

)

set @DupID += 1

end — group option loop

set nocount off

-- ##############################################################
-- Create indices on the test table
-- ##############################################################

CREATE UNIQUE CLUSTERED INDEX [ci_RecID] ON [dbo].[FewDuplicates]

(

[IDCol] ASC

)

WITH (fillfactor = 100, PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

ON [PRIMARY]

CREATE NONCLUSTERED INDEX [ix_DupID] ON [dbo].[FewDuplicates]

(

DupID ASC

)

WITH (fillfactor = 100, PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

ON [PRIMARY]

CREATE NONCLUSTERED INDEX ix_CombinedIndex ON [dbo].[FewDuplicates]

(

[Metric43],

[Metric14]

)

INCLUDE (

[Metric01],

[Metric02],

[Metric03],

[Metric04]

)

WITH (fillfactor = 100, PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

ON [PRIMARY]

Queries that Perform Varying Numbers of Record Deletions

To illustrate the issue, six queries were created that deleted varying numbers of records; they are listed in Table 1. The deletions were designed to use the ix_CombinedIndex index, thereby emulating the client situation in which the optimal index for deletions was used. The only differences among the queries were the numbers of records deleted, and these are highlighted in the table below for easier comparison. The selected values were chosen so that small, medium, and large numbers of records would be deleted.

Blog_20170810_1

Table 1: All Six Deletion Queries
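The queries themselves appear only in the Table 1 image. As a rough sketch of their shape, assuming each one filtered on Metric43, the leading key of ix_CombinedIndex, with progressively wider ranges, a single deletion query looked something like the following; the boundary values are illustrative only, not the author's actual values.

-- Hypothetical shape of one deletion query (the actual predicates are in Table 1).
-- Filtering on Metric43, the leading key of ix_CombinedIndex, lets the optimizer
-- seek that index to locate the rows to delete.
delete from [dbo].[FewDuplicates]
where Metric43 between 1 and 5000000;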

The plan for Query #4 with three indices is shown below in Figure 1 and it is fairly simple, as one would expect. Things become more complicated with additional indices, as discussed in the next section.

Blog_20170810_2

Figure 1: Query Plan for Deletion Query #4 with Three Indices

Indices that Cause Additional Deletion Overhead

To examine the effects of additional indices on the deletion queries, up to eight more were added to the table using different keys. None of these indices was used for direct data access; their definitions are shown below, followed by the commands to drop them. Only four of them were used for the 7-index test, whereas all eight were used for the 11-index test.

-- create indices

CREATE NONCLUSTERED INDEX ix_AddedIndexNumber1 ON [dbo].[FewDuplicates]

(

[Metric01],

[Metric02]

)

INCLUDE (

[Metric03],

[Metric04],

[Metric43],

[Metric14]

)

CREATE NONCLUSTERED INDEX ix_AddedIndexNumber2 ON [dbo].[FewDuplicates]

(

[Metric03],

[Metric04]

)

INCLUDE (

[Metric01],

[Metric02],

[Metric43],

[Metric14]

)

CREATE NONCLUSTERED INDEX ix_AddedIndexNumber3 ON [dbo].[FewDuplicates]

(

[Metric05],

[Metric06]

)

INCLUDE (

[Metric01],

[Metric02],

[Metric43],

[Metric14]

)

CREATE NONCLUSTERED INDEX ix_AddedIndexNumber4 ON [dbo].[FewDuplicates]

(

[Metric07],

[Metric08]

)

INCLUDE (

[Metric01],

[Metric02],

[Metric43],

[Metric14]

)

CREATE NONCLUSTERED INDEX ix_AddedIndexNumber5 ON [dbo].[FewDuplicates]

(

[Metric09],

[Metric10]

)

INCLUDE (

[Metric01],

[Metric02],

[Metric43],

[Metric14]

)

CREATE NONCLUSTERED INDEX ix_AddedIndexNumber6 ON [dbo].[FewDuplicates]

(

[Metric11],

[Metric12]

)

INCLUDE (

[Metric01],

[Metric02],

[Metric43],

[Metric14]

)

CREATE NONCLUSTERED INDEX ix_AddedIndexNumber7 ON [dbo].[FewDuplicates]

(

[Metric13],

[Metric14]

)

INCLUDE (

[Metric01],

[Metric02],

[Metric43],

[Metric44]

)

CREATE NONCLUSTERED INDEX ix_AddedIndexNumber8 ON [dbo].[FewDuplicates]

(

[Metric15],

[Metric16]

)

INCLUDE (

[Metric01],

[Metric02],

[Metric43],

[Metric14]

)

-- drop indices

drop INDEX ix_AddedIndexNumber1 ON [dbo].[FewDuplicates]

drop INDEX ix_AddedIndexNumber2 ON [dbo].[FewDuplicates]

drop INDEX ix_AddedIndexNumber3 ON [dbo].[FewDuplicates]

drop INDEX ix_AddedIndexNumber4 ON [dbo].[FewDuplicates]

drop INDEX ix_AddedIndexNumber5 ON [dbo].[FewDuplicates]

drop INDEX ix_AddedIndexNumber6 ON [dbo].[FewDuplicates]

drop INDEX ix_AddedIndexNumber7 ON [dbo].[FewDuplicates]

drop INDEX ix_AddedIndexNumber8 ON [dbo].[FewDuplicates]

Figure 2 shows the plan for the same query that was shown in Figure 1, but this time with 11 indices on the table instead of 3. Clearly, the plan is MUCH more complex and the effects of the additional indices are obvious. This suggests what the test metrics will ultimately confirm: work grows demonstrably as the number of indices increases.

Blog_20170810_3

Blog_20170810_4

Figure 2: Query Plan for Deletion Query #4 with 11 Indices

Test Results

A total of 18 tests were conducted: three index configurations (3, 7, and 11 indices) were each run against Queries #1 through #6. As shown in Figure 3 through Figure 6, most of the metrics are comparable until the 100,000-record level is exceeded, at which point they diverge greatly. One of the most interesting findings involves writes, whose curve shape differs completely from those of the other metrics. These figures illustrate graphically why deletions of small-to-moderate numbers of records often go unnoticed as indices are added. However, they also show clearly how the size of the deletion and the number of indices can combine to negate the improvement provided by a specialty deletion index.

Blog_20170810_5

Figure 3: Deletion Queries – Duration (Seconds)

Blog_20170810_6

Figure 4: Deletion Queries – CPU (Seconds)

Blog_20170810_7

Figure 5: Deletion Queries – Reads

Blog_20170810_8

Figure 6: Deletion Queries – Writes
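The article does not describe how the duration, CPU, read, and write figures were gathered. As an assumption rather than the author's actual method, a query like the following against sys.dm_exec_query_stats returns comparable per-statement totals shortly after a test run (times are reported in microseconds).

-- Hedged sketch: retrieve elapsed time, CPU, reads, and writes for recently
-- executed DELETE statements from the plan cache.
SELECT TOP (10)
       SUBSTRING(t.text, 1, 100)  AS [Statement Start],
       qs.execution_count         AS [Executions],
       qs.total_elapsed_time      AS [Total Elapsed (us)],
       qs.total_worker_time       AS [Total CPU (us)],
       qs.total_logical_reads     AS [Logical Reads],
       qs.total_logical_writes    AS [Logical Writes]
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) t
WHERE t.text LIKE '%delete%FewDuplicates%'
ORDER BY qs.last_execution_time DESC;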

Conclusion

This article illustrated a situation in which the amount of work performed by deletion queries increased dramatically as the number of records to be deleted grew beyond approximately 100,000. Although this particular value applies only to the test case, the overall message is clear: as the number of records to be deleted increases well beyond 100,000, the work performed by the query increases dramatically. This situation worsens considerably as the number of indices spanning a table increases. Therefore, although the overhead associated with deleting small numbers of records may not be noticed as indices are added, the performance of queries that delete large numbers of records will degrade noticeably as the number of indices increases. In some cases, the overhead may negate any improvement that might be gained by adding an index whose purpose is to expedite the deletion process. Therefore, before adding an index to improve deletion performance, ensure that the batch of deleted records is not too large and that the number of indices on the table is small.
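When a purge is unavoidable on a heavily indexed table, one common mitigation, sketched here as a general technique rather than anything from the client engagement, is to delete in fixed-size batches so that each statement maintains the indices for a bounded number of rows.

-- Hedged sketch: purge rows in batches so that no single DELETE must maintain
-- every index for hundreds of thousands of rows at once.
declare @BatchSize int = 50000;
declare @RowsDeleted int = 1;
while (@RowsDeleted > 0)
begin
    delete top (@BatchSize) from [dbo].[FewDuplicates]
    where Metric43 between 1 and 5000000;  -- hypothetical purge predicate
    set @RowsDeleted = @@ROWCOUNT;
end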

For more information about blog posts, concepts and definitions, further explanations, or questions you may have…please contact us at SQLRx@sqlrx.com. We will be happy to help! Leave a comment and feel free to track back to us. Visit us at www.sqlrx.com!

Handling Multiple Missing Index Recommendations for the Same Table

— By Jeff Schwartz

The Problem

Many articles concerning SQL Server missing index recommendations demonstrate the mechanics of obtaining them and often highlight whether the suggested key columns are used in equality or inequality relationships. Most of these examples emphasize a single index that can be implemented to improve performance. However, real-life situations often involve multiple or many suggested indices. For example, during a recent customer study, I observed 28 recommended indices for one table and 52 for another. Metrics such as improvement measure and user impact can frequently be used to determine the most important recommendations, but sometimes there are either too many similar recommendations, or the improvement measures are nearly identical for several proposed indices, which makes determining an optimal index difficult. A large number of recommendations also often results in index proliferation, i.e., missing index recommendations are implemented piecemeal with no overall strategy, resulting in tables with 14, 17, or 24 indices, as the author observed in a recent customer performance study. The size of the table compounds this problem because it is especially desirable to limit the number of indices on very large tables. For example, the table that had 24 indices on it contained over 30 million records.

The following examples of proposed indices that inspired this article illustrate the multiple-recommendation phenomenon – RecIndex1: Keys (DateVal), Included Columns (Metric, ReptCat, LocationID, Total_Amount) and RecIndex2: Keys (LocationID, DateVal), Included Columns (Metric, ReptCat, Total_Amount). Clearly, without additional information, it is difficult to determine whether these recommendations must remain separate or could be combined into a single index. In this situation, knowing whether the proposed key columns are used in equality or inequality WHERE clauses can be critical. This article discusses how to use knowledge of equality and inequality relationships to determine an appropriate course of action.
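For reference, the two recommendations translate into CREATE INDEX statements like the following; the table name is hypothetical, and the key and included columns are taken directly from the recommendations above.

-- RecIndex1: single-key recommendation (table name is hypothetical)
CREATE NONCLUSTERED INDEX RecIndex1 ON [dbo].[ReportTable] ([DateVal])
INCLUDE ([Metric], [ReptCat], [LocationID], [Total_Amount]);

-- RecIndex2: two-key recommendation that also uses DateVal
CREATE NONCLUSTERED INDEX RecIndex2 ON [dbo].[ReportTable] ([LocationID], [DateVal])
INCLUDE ([Metric], [ReptCat], [Total_Amount]);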

Test Table Creation & Load

To determine missing index recommendation behavior, a generic table was constructed and filled with 20 million records. Each record contained an identity column, an ID column, a text column, and 47 metric columns whose values ranged between 1 and 10,000,000. The large number of table columns was used to ensure SQL Server would choose an index option when appropriate. Six queries that incorporated various column combinations were executed (some of which differed only in column ordering). To minimize duplication of column values and skewing of query plans, the ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000 formula was used to generate values that were as random as possible. Two indices were created: a clustered index that used the identity column as its only key and a nonclustered index that used DupID as its only key. The scripts for the creation, loading, and initial indexing of the table are shown below.

-- ##############################################################
-- Create test table
-- ##############################################################

DROP TABLE FewDuplicates;

CREATE TABLE FewDuplicates (

IDCol bigint identity (20000000,1),

DupID bigint,

MyText varchar(10),

Metric01 bigint, Metric02 bigint, Metric03 bigint, Metric04 bigint,

Metric05 bigint, Metric06 bigint, Metric07 bigint, Metric08 bigint,

Metric09 bigint, Metric10 bigint, Metric11 bigint, Metric12 bigint,

Metric13 bigint, Metric14 bigint, Metric15 bigint, Metric16 bigint,

Metric17 bigint, Metric18 bigint, Metric19 bigint, Metric20 bigint,

Metric21 bigint, Metric22 bigint, Metric23 bigint, Metric24 bigint,

Metric25 bigint, Metric26 bigint, Metric27 bigint, Metric28 bigint,

Metric29 bigint, Metric30 bigint, Metric31 bigint, Metric32 bigint,

Metric33 bigint, Metric34 bigint, Metric35 bigint, Metric36 bigint,

Metric37 bigint, Metric38 bigint, Metric39 bigint, Metric40 bigint,

Metric41 bigint, Metric42 bigint, Metric43 bigint, Metric44 bigint,

Metric45 bigint, Metric46 bigint, Metric47 bigint

)

 

-- ##############################################################
-- Load original table
-- ##############################################################

declare @DupID bigint = 1

declare @NumRecs bigint = 20000000

 

truncate table FewDuplicates

set nocount on

while (@DupID <= @NumRecs)

begin

insert into [dbo].[FewDuplicates] (

[DupID], [MyText],

[Metric01], [Metric02], [Metric03], [Metric04], [Metric05], [Metric06], [Metric07], [Metric08], [Metric09], [Metric10], [Metric11], [Metric12], [Metric13], [Metric14], [Metric15], [Metric16], [Metric17], [Metric18], [Metric19], [Metric20], [Metric21], [Metric22], [Metric23], [Metric24], [Metric25], [Metric26], [Metric27], [Metric28], [Metric29], [Metric30], [Metric31], [Metric32], [Metric33], [Metric34], [Metric35], [Metric36], [Metric37], [Metric38], [Metric39], [Metric40], [Metric41], [Metric42],

[Metric43], [Metric44], [Metric45], [Metric46], [Metric47]

)

VALUES (

@DupID, 'my text',

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000

)

set @DupID += 1

end — group option loop

set nocount off

 

-- ##############################################################
-- Create indices on the test table
-- ##############################################################

CREATE UNIQUE CLUSTERED INDEX [ci_RecID] ON [dbo].[FewDuplicates]

(

[IDCol] ASC

)

WITH (fillfactor = 100, PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

ON [PRIMARY]

 

CREATE NONCLUSTERED INDEX [ix_DupID] ON [dbo].[FewDuplicates]

(

DupID ASC

)

WITH (fillfactor = 100, PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

ON [PRIMARY]

Queries that Create Missing Index Recommendations

To illustrate the issue, two queries were created that were guaranteed to generate missing index recommendations as well as mimic the behavior of the ones cited in The Problem section. They are listed in Table 1 and the differences are highlighted for easier comparison. The query plans for the two queries are displayed in Table 2 and Table 3. Both queries performed full clustered index scans and generated missing index recommendations. The recommendations are shown below in two pieces within each table. The most important points are that Query Plan #1 specifies Metric14 first and Metric43 second, whereas Query Plan #2 specifies Metric43 alone with Metric14 as an included column. At first glance, these appear to be contradictory and potentially incompatible differences.
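The queries themselves appear only in the Table 1 image. As a sketch of their general shape, consistent with the equality/included-column usage discussed below but with illustrative literal values, Query #1 applied equality predicates to both Metric14 and Metric43, whereas Query #2 applied an equality predicate to Metric43 alone and merely selected Metric14.

-- Hypothetical shape of Query #1: equality predicates on Metric14 and Metric43.
select [Metric01], [Metric02], [Metric03], [Metric04]
from [dbo].[FewDuplicates]
where [Metric14] = 1234567 and [Metric43] = 7654321;

-- Hypothetical shape of Query #2: equality predicate on Metric43 only; Metric14
-- appears only in the select list, so it surfaces as an included column.
select [Metric01], [Metric02], [Metric03], [Metric04], [Metric14]
from [dbo].[FewDuplicates]
where [Metric43] = 7654321;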

Blog_20170720_1

Blog_20170720_2

Blog_20170720_3

Table 4 summarizes the metrics, columns, and full index definitions suggested by SQL Server. Given the very large improvement measure values, the key ordering of the proposed indices, and the somewhat different included columns, it may be tempting to implement both indices despite the fact that the table contains 20 million records. However, more detailed analysis using the data in Table 5 shows that a single, properly constructed index can accommodate both recommendations and, therefore, both queries. The most important information is in the column_usage column: one query uses two equality comparisons, whereas the other uses only one. Therefore, if we specify the column used for equality in both queries first and the equality/included column second, both queries will be satisfied. VERY IMPORTANT NOTE: Although the key and included column ordering appear obvious in this example because of the column names used, i.e., suffixes in numerical order, when normal column names like DateVal or LocationID are used, the ordering is much less obvious. As cited in my previous blog entitled Query Tuning and Missing Index Recommendations, when ordering is not crucial, e.g., when only equality operations or included columns are specified, SQL Server uses the ordering of the columns in the table itself rather than the ordering specified in the query.

In most cases SQL Server attempts to create covering indices, which are defined to be indices that contain all the columns of a particular query. Please reference the following web page for further information regarding covering indices: https://docs.microsoft.com/en-us/sql/relational-databases/indexes/create-indexes-with-included-columns. In the author’s experience, implementing the keys of a suggested index without the corresponding included columns often results in SQL Server ignoring the new index. Therefore, the included columns are vital to any missing index strategy. Clearly, a point of diminishing returns exists when the number of included columns approaches the total number of columns in the table (especially for very large tables), but as long as the number of columns is reasonable, included columns should always be considered. [The queries used to obtain the data shown in Table 4 and Table 5 are provided in Table 6.]

Blog_20170720_4

Blog_20170720_5

SELECT avg_total_user_cost * avg_user_impact * (user_seeks + user_scans)
       as [Improvement Measure],
   avg_user_impact as [Avg User Impact], user_seeks as Executions,
   [statement] as TableName,
   equality_columns as [Equality Usage], inequality_columns as [Inequality Usage], included_columns as [Included Columns],
   'Create Nonclustered Index ix_SQLRx_' + PARSENAME([statement],1) + '_' + CONVERT(varchar, group_handle) + '_' + CONVERT(varchar, g.index_handle) + ' ON ' +
   [statement] + ' (' + ISNULL(replace(equality_columns,' ',''),'') +
   CASE WHEN equality_columns IS NOT NULL AND inequality_columns IS NOT NULL THEN ',' ELSE '' END +
   ISNULL(replace(inequality_columns,' ',''),'') + ')' +
   CASE WHEN included_columns IS NOT NULL THEN ' INCLUDE (' + included_columns + ')' ELSE '' END AS [Create Index Statement]
FROM sys.dm_db_missing_index_groups g
INNER JOIN sys.dm_db_missing_index_group_stats s ON s.group_handle = g.index_group_handle
INNER JOIN sys.dm_db_missing_index_details d ON d.index_handle = g.index_handle
ORDER BY avg_total_user_cost * avg_user_impact * (user_seeks + user_scans) DESC;

 

SELECT statement AS [Table], column_id , column_name, column_usage,

migs.user_seeks as Executions, migs.avg_user_impact as [Avg User Impact]

FROM sys.dm_db_missing_index_details AS mid

CROSS APPLY sys.dm_db_missing_index_columns (mid.index_handle)

INNER JOIN sys.dm_db_missing_index_groups AS mig ON mig.index_handle = mid.index_handle

INNER JOIN sys.dm_db_missing_index_group_stats AS migs ON mig.index_group_handle = migs.group_handle

ORDER BY mig.index_group_handle, mig.index_handle, column_id

Blog_20170720_8

Table 7 contains the composite index that satisfies both recommendations and queries. It is important to note that Metric43 is the FIRST key and Metric14 is the second. If this ordering is not followed, Query #1 will perform a full scan. Reviewing both queries demonstrates the following:

  1. The key ordering enables the equality operators to apply in both cases.
  2. Having Metric14 as the second key satisfies both the equality condition and the included column condition.
  3. The other columns specified by the queries are supplied so the table data need never be referenced.

 

CREATE NONCLUSTERED INDEX ix_CombinedIndex ON [dbo].[FewDuplicates]

(

[Metric43],

[Metric14]

)

INCLUDE (

[Metric01],

[Metric02],

[Metric03],

[Metric04]

)

WITH (fillfactor = 100, PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

ON [PRIMARY]

Blog_20170720_9

Table 8 and Table 9 contain the updated query plans, which show that the clustered index scans have been replaced by index seek operators and that the clustered index is not accessed to satisfy either query. Note also that parallelism was present in the query plans shown in Table 2 and Table 3, but is absent from the new query plans displayed in Table 8 and Table 9.

Blog_20170720_6

Blog_20170720_7

Conclusion

This article illustrated a situation in which multiple missing index recommendations at first appeared to necessitate separate indices. Careful examination of the key relationships used in the queries and specified by the missing index recommendations enabled the author to develop a single composite index that satisfied both recommendations and delivered optimal query performance.

For more information about blog posts, concepts and definitions, further explanations, or questions you may have…please contact us at SQLRx@sqlrx.com. We will be happy to help! Leave a comment and feel free to track back to us. Visit us at www.sqlrx.com!