Author Archives: sqlrxadmin

SQLRx (www.sqlrx.com) is a Data Management service of Integrated Services, Inc. of Dallas, Texas (www.isi85.com) that specializes in SQL Server performance tuning and RemoteDBA services.

Rx for Demystifying Index Tuning Decisions – Part 4

— by Jeffry Schwartz

Review

In Parts 1 through 3 of this blog series the author discussed various aspects of SQL Server indices such as:

  • Types of indices and their variations, e.g., clustered, nonclustered, filtered, and covering
  • Physical versus logical index access methods, e.g., seeks, scans, key/RID lookups, singleton lookups, and range scans
  • Why index tuning should begin with queries
  • Determining an Appropriate Strategy for Index Tuning
  • Determining Queries that Need Indices the Most
  • Capturing query performance information using Extended Events
  • Using query plans to determine relationships between queries and indices

https://blog.sqlrx.com/2017/10/26/rx-for-demystifying-index-tuning-decisions-part-1/

https://blog.sqlrx.com/2017/11/02/rx-for-demystifying-index-tuning-decisions-part-2/

https://blog.sqlrx.com/2017/11/09/rx-for-demystifying-index-tuning-decisions-part-3/

Part 4 covers the following topics:

  • Query Optimizer
  • Statistics and how they affect index usage and query performance
  • Consequences of too many indices

Query Optimizer

Overview

The query optimizer determines the appropriate execution plan for a query. “The SQL Server Query Optimizer is a cost-based optimizer.   It analyzes a number of candidate execution plans for a given query, estimates the cost of each of these plans and selects the plan with the lowest cost of the choices considered.   Indeed, given that the Query Optimizer cannot consider every possible plan for every query, it actually has to do a cost-based balancing act, considering both the cost of finding potential plans and the costs of plans themselves.”

“Cost-assessment of each plan – While the Query Optimizer DOES NOT GENERATE EVERY POSSIBLE EXECUTION PLAN, it assesses the resource and time cost of each plan it does generate. The plan that the Query Optimizer deems to have the lowest cost of those it’s assessed is selected, and passed along to the Execution Engine.” Source: http://www.simple-talk.com/sql/sql-training/the-sql-server-query-optimizer. Therefore, the best plan is NOT GUARANTEED and too many choices yield indeterminate results.

Statistics

Statistics contain information about distribution of values in one or more columns of table or indexed view. The Query Optimizer uses statistics to estimate the number of rows that a particular query operator will generate. Outdated statistics can cause a query to use a suboptimal index or perform a full scan, so it is prudent to insure that they are up-to-date as much as possible. Large numbers of insertions or key modifications can cause them to be skewed and inaccurate with respect to the current data because even when statistics are updated automatically they may lag behind the actual data. The AUTO_CREATE_STATISTICS and AUTO_UPDATE_STATISTICS are automatically on by default in Tempdb and often on by default in many other databases. Please see the following article for a complete discussion of statistics and when SQL Server determines that they are sufficiently out-of-date to warrant updating: https://msdn.microsoft.com/en-us/library/dd535534(v=sql.100).aspx.

Out-of-date statistics can be determined with very little overhead by using the stats_date (object_id, index_id) function and the sys.indexes view. Since creating or updating statistics can be expensive depending upon table size, sample size, and frequency, several options are available to keep the overhead at a reasonable level and these are shown below.

  • FULLSCAN (most accurate and expensive to create)
  • SAMPLE <n> PERCENT or ROWS
  • RESAMPLE (uses last settings)
  • ALL or COLUMNS or INDEX

Balancing the need for updated statistics and the costs of creating or updating them can cause customers to turn off all automatic options and then update them manually using the preceding commands. However, sometimes these good intentions can cause surprisingly out-of-date statistics as shown in Table 4 below. This customer did not intend for this situation to occur, but as the reader can see clearly from the example, the statistics of many large tables were not updated for over a year. Obviously, this is an undesirable situation, especially since many thousands of records were inserted into some of these tables daily.

Blog_20171116_1

Table 4: Extremely Outdated Statistics

 

Query Plan Consequences of Too Many Indices

Larger numbers of indices create exponentially more query plan possibilities. When too many choices exist, the Optimizer will give up partway through and just pick the best plan thus far. As more indices are added the problem worsens and compilation times, i.e., processor times, increase to a point. This can be illustrated best by reviewing an actual customer example. In this case, one table had 144 indices attached to it and several others had between 20 and 130 indices. The queries were quite complex with as many as fifteen joins, many of which were outer joins. Query and index tuning were impossible because query performance was often counterintuitive and sometimes nonsensical. Adding an index that addressed a specific query need often made the query run worse one time and better the next. Note: Cached query plan issues, e.g., parameter sniffing or plan reuse were not problems in this case. The only solution was to tear down ENTIRE indexing structure and rebuild it with SQL Server’s guidance and nine days’ worth of production queries. Table 5 summarizes the results of the index restructuring project. The performance of 98 percent of the queries was comparable to or better than it was when the large numbers of indices were present.

Blog_20171116_2

Table 5: Index Optimization Project Results

 

How Did Things Get This Bad?

The original application was built in Microsoft Access and each office used its own version of the database. The company decided to consolidate all of the Access databases into a single corporate-wide SQL Server database and implemented most of the indices that had been used within Access. No formal or cohesive indexing strategy was ever implemented. Desperation combined with bad luck and extremely complex queries created a perfect environment for this kind of problem to arise. The list below summarizes the key points regarding how they got themselves into this mess.

  • 24/7 application because of 3 worldwide time zones and replication
  • Many queries were at least 7,000 characters long with 15 inner and outer joins
  • Performance tanked after more than 100 separate databases were combined into a single SQL Server database
  • Application did not scale and they could not return to separate databases
  • Ran Database Engine Tuning Advisor repeatedly, and blindly added ALL indices it recommended (more later on DTA)
  • Performance worsened, so they added more indices based upon experimentation and hope
  • Database crashed and after restoration from an older database backup (problems occurred during some of the restorations) performance worsened
  • So many indices existed that they did not know which ones still were needed, so they scripted all indices from before and implemented them with new names just to be sure all requisite indices were present in the database
  • They had no idea how to determine unused, unnecessary, or fully or partially duplicated indices

Index Insertion Operation

Every record insertion creates overhead maintenance work because all non-subset indices are updated for every insert. Page splits may be involved if a page has been filled already. Upper index levels may be affected and may split as well. A page split is defined as follows: “When a new row is added to a full index page, the Database Engine moves approximately half the rows to a new page to make room for the new row. This reorganization is known as a page split. A page split makes room for new records, but can take time to perform and is a resource intensive operation. Also, it can cause fragmentation that causes increased I/O operations.” Locks are held while all this happens, and fillfactor and insertion rate affect frequency of this activity. Further research can be conducted via http://msdn.microsoft.com/en-us/library/ms177459.aspx.

Consequences of Too Much Overhead

Increased overhead leads to lower transaction throughput, increased blocking, and increased deadlocking. These worsen if queries do not use the Nolock query hint because Page and Row locks are acquired and potentially held longer. Use of Nolock can greatly reduce these problems for selection queries — do not forget to add the Nolock hint to join clauses. Research transaction level isolation for further information. In high volume record insertion cases, page latch wait time can become a major constraining factor.

The next article in this blog series will cover how to detect indices whose functionality is duplicated by other indices.   Until then….happy tuning and please contact us with any questions.

For more information about blog posts, concepts and definitions, further explanations, or questions you may have…please contact us at SQLRx@sqlrx.com. We will be happy to help! Leave a comment and feel free to track back to us. Visit us at www.sqlrx.com!

Bob Ward to speak at DFW BI & Analytics Community Meeting on 12/6

DFW_BI_Community

Come see Bob Ward from Microsoft speak about “Fast and Furious: SQL 2017” at the DFW BI & Analytics Community Meeting on 12/6, 11:30-1:00 at Noah’s in Irving.  You can register on the “Event” section of the ISI website at www.isi85.com .

The DFW Business Intelligence and Analytics Community welcomes Bob Ward of the Microsoft Database Systems Group. You may already know him from his blog SQL Server According to Bob.   If you were unable to attend 2017 PASS Summit, Bob will be giving the same groundbreaking presentation he delivered there.

Is your database slowing down your business? Are you using Linux as a platform but can’t find the right database that meets your performance needs without busting your budget? Built on the industry recognized “it just runs faster” SQL Server 2016 engine, SQL Server 2017 provides new capabilities for performance, value, and choice. In this session, we show you features and a “behind the scenes look” at what powers SQL Server 2017 for Linux and Windows. And if you are a database expert and want to tune the SQL Server engine to its maximum turbocharged capacity, we show you the tools, diagnostics, and techniques that make SQL Server 2017 the ultimate database platform choice to run your applications at a fast and furious speed.

Rx for Demystifying Index Tuning Decisions – Part 3

— by Jeffry Schwartz

Review

In Parts 1 and 2 of this blog series the author discussed various aspects of SQL Server indices such as:

  • Types of indices and their variations, e.g., clustered, nonclustered, filtered, and covering
  • Physical versus logical index access methods, e.g., seeks, scans, key/RID lookups, singleton lookups, and range scans
  • Why index tuning should begin with queries
  • Determining an Appropriate Strategy for Index Tuning
  • Determining Queries that Need Indices the Most
  • Capturing query performance information using Extended Events

https://blog.sqlrx.com/2017/10/26/rx-for-demystifying-index-tuning-decisions-part-1/

https://blog.sqlrx.com/2017/11/02/rx-for-demystifying-index-tuning-decisions-part-2/

Part 3 covers the following topic:

  • Using query plans to determine relationships between queries and indices

Query or Execution Plans

Query plans have already been mentioned as being useful for query tuning. They can be invaluable for determining which part of a query causes performance issues, which indices are used, and HOW the indices are accessed, e.g., randomly or sequentially. A full discussion of query plans is beyond the scope of this presentation, but graphical plans are used most often. Graphical plans come in two forms: Estimated (predicted behavior) and Actual (actual behavior). XML plans contain additional information, but they are difficult to read even with an XML editor. Querying the XML data can provide invaluable information, but the queries to extract this information can consume a great deal of CPU time, especially when many complex plans must be examined, so it is best to peruse this kind of data on a server that is not used for production work whenever possible.

Interpreting Graphical Query Plans

It is important for any database analyst to have a basic understanding of how to read query plans. These plans are graphical flow diagrams of how the query will be executed and what information will flow from one operator to another. The diagrams flow from bottom right to top left, and quite often the most crucial operators are near the right and sometimes near the bottom as well. Each node in the tree structure is represented by an icon and specifies the logical operator that will be used to execute that portion of the query. A popup appears displaying details about the operator when the mouse pointer is placed over an operator. This information is also available, and often easier to read, via the properties window (F4 in SSMS or right-click the node and click Properties). Each node is related to a parent node by arrows whose width is proportional to the number of rows returned by the operator. Note: The actual number of rows is used when available. Otherwise, the estimated number of rows is used. Sometimes, these are vastly different and these differences often indicate that statistics are not up-to-date or sufficiently comprehensive.

Large operators may run on parallel threads because the single-threaded version would run too long. For parallel queries that involve multiple threads/CPUs the node properties will display information about the operating system threads used. Parallelized nodes have graphical indicators, two arrows on a beige circle, to inform the viewer that the operator will be divided into multiple pieces for execution as shown in Figure 1 and Figure 2.

Blog_20171109_1

Figure 1: Query Plan Parallelism Operator

It is important to know what the most common data access operators are, as well as how to interpret them. An index seek operator indicates the retrieval of one or more rows using keys. Clustered or NonClustered indicate the type of index that is accessed, and a NonClustered Index Seek could possibly indicate the use of a covering index. Key and RID Lookups indicate a lookup of a row using the row ID or clustering key (heap or clustered index) to obtain information from the record that is not contained in the nonclustered index that was used for primary access. Key and RID Lookup operators are most often associated with a nested loop operator and a nonclustered seek operator. The Key and RID Lookup operators usually produce output, but not always. When they are present, the output column list indicates a possible opportunity to add those columns to the nonclustered index to create a covering index for this query. When the output column is missing, these operators are used to effect a join. A simple way to determine whether output columns exist is to view the Output List that is displayed for the operator popup like the list shown in Figure 3. Index Scan (Clustered or NonClustered) operators scan ALL rows of the index even if a where clause is used. Scan operators read ALL rows from a table or clustered index regardless of the where clause. All scans should be avoided a much as possible unless the tables are small. Examples of a clustered index scan operator and its properties are shown in Figure 2 and Figure 3.

 

Blog_20171109_2

Figure 2: Clustered Index Scan Operator Graphic

 

Blog_20171109_3

Figure 3: Clustered Index Scan Operator Properties

 

Other commonly used operators include nested loop, bitmap, and hash match. Nested Loops perform inner join, left outer join, left semi join, and left anti semi join logical operations. These operators are potentially very deceptive because although each iteration of a loop might be very efficient, there may be millions of iterations. Bitmap operators may speed up query execution by eliminating rows with key values that cannot produce any join records before passing rows through another operator such as a Parallelism operator. Hash Match operators build hash tables by computing a hash value for each row from its input. These are usually very bad and frequently indicate that the query and its joins should be rewritten.

The plan for the query below (originally covered in Part 2)….

Blog_20171102_4

XEvents Statement Completed Sample Output

 

…is shown in Figure 4 and the properties for the nonclustered index scan are shown in Figure 5. The highlighted portions illuminate some important aspects of the query plan. The first one highlights the missing index recommendation and illustrates the key and included columns for the suggested index that would make this query run faster. The second highlight emphasizes the hash match, and the last one illustrates why the nonclustered index scan occurred. Specifically, the EventClass column was used by the index scan operator to look up records in the table with values of 166 because no appropriate index existed on the CompletedEventText table. The output columns, RowNumber and QueryChecksum, were returned by the index scan operator. It is important to note that the missing index recommendation specifies the same columns that are shown in the Output List in Figure 5. Generally speaking, index and table scan operators do not produce missing index recommendations, which will be discussed in more detail later in this series. Note: Prior to SQL Server 2014, the author had never observed an index scan operator that produced a missing index recommendation. This appears to be a function of the new Cardinality Estimator that was implemented in SQL Server 2014 and requires that the database compatibility level be set to SQL Server 2014 or later to use it. Otherwise, the old estimator that goes back to SQL Server 7.0 is used. Further information regarding the cardinality estimator can be found at https://blogs.technet.microsoft.com/dataplatform/2017/03/22/sql-server-2016-new-features-to-deal-with-the-new-ce/.

Blog_20171109_5

Figure 4: Query Plan with Hash Match and Index Scan Operators

 

 

Blog_20171109_6

Figure 5: Index Scan Operator Properties from Query Plan with Hash Match and Index Scan

 

The next article in this blog series will cover the query optimizer.   Until then….happy tuning and please contact us with any questions.

For more information about blog posts, concepts and definitions, further explanations, or questions you may have…please contact us at SQLRx@sqlrx.com. We will be happy to help! Leave a comment and feel free to track back to us. Visit us at www.sqlrx.com!

Rx for Demystifying Index Tuning Decisions – Part 2

— by Jeffry Schwartz

Review

In Part 1 of this blog series the author discussed various aspects of SQL Server indices such as:

  • Types of indices and their variations, e.g., clustered, nonclustered, filtered, and covering
  • Physical versus logical index access methods, e.g., seeks, scans, key/RID lookups, singleton lookups, and range scans
  • Why index tuning should begin with queries
  • Determining an Appropriate Strategy for Index Tuning

https://blog.sqlrx.com/2017/10/26/rx-for-demystifying-index-tuning-decisions-part-1/

Part 2 covers the following topics:

  • Determining Queries that Need Indices the Most
  • Capturing query performance information using Extended Events

Determining Queries that Need Indices the Most

It is simple in a test or development environment using a significant subset of production data to obtain an actual (and useful) query plan of a troublesome query. Business transactions often require many queries to complete so determining which ones are the most troublesome can be difficult. The author has observed countless customers who were certain they knew exactly where the performance problems were only to find out the problems were elsewhere. Therefore, collecting the appropriate performance data on a production server is critical in determining the queries that need tuning most. Examples of query performance criteria are listed below:

  • Individually or collectively run too long
  • Individually perform too much work, e.g., reads or CPU
  • Collectively perform too much work, e.g., 200,000 executions that each execute 50,000 reads -> 10 BILLION reads in total

Table 1 shows examples of queries that potentially need tuning based upon the number of executions, total reads, total duration, total CPU time, and average reads per execution. This kind of report immediately focuses attention on the queries that might benefit the most from either index or query tuning. The five queries highlighted in Table 1 underscore these criteria. The ones highlighted in yellow were the worst offenders because their executions collectively performed the most reads with the worst one totaling 3.5 BILLION reads. The ones highlighted in light green and orange accounted for the most CPU time as well as the longest total duration. The one highlighted in slate ran the most times, and the ones highlighted in gray performed the most reads per execution. This information is vital when determining where query and index tuning should be focused. Table 2 provides additional information in that it shows whether the large numbers of reads were consistent regardless of the specified input parameters or if they were parameter-dependent. For example, the 80th percentile of the third query was 453,515, which means that 80 percent of the queries performed 453,515 reads or fewer. The 100th percentile represents the maximum of 1.724 billion reads and tells the analyst that 20 percent of the queries performed at least 453,515 reads. This data informs the analyst that the specific parameters of the execution that performed 1.724 billion reads should be used for tuning. The read behavior of the last two queries, highlighted in light green and light orange, indicates that the choice of parameters does not make any notable difference in execution behavior. The query highlighted in light blue exhibits similar performance. This kind of information is invaluable because the author has observed too many tuning efforts on development or test machines that could not reproduce the performance of the query in the production environment. Sometimes, this results from a dramatic difference in the sizes of the respective databases, but other times it results from an inappropriate choice of parameters.

Blog_20171102_1

Table 1: Summary Query Metrics Example

 

Blog_20171102_2

Table 2: Query Reads Distribution Example

 

Extended Events (XEvents)

Extended Events are very useful for capturing comprehensive workload performance information because they are lightweight and can be tailored to capture only the most important information. They were introduced in SQL Server 2008, but didn’t mature until SQL Server 2012. A full treatment of how to capture important query performance data using XEvents is beyond scope of this blog series. However, further information can be viewed at https://blog.sqlrx.com/2016/01/28/sql-server-20122014-extended-events-for-developers-part-1. XEvents use the concept of a session that must be configured, started, and stopped to capture data. This can be accomplished using SSMS or T-SQL. The basic steps involved with capturing and reporting XEvents data are listed below:

  1. Create Event Session specifying types of events to capture based upon their duration, CPU time consumption, database ID, etc.
  2. Start session
  3. Stop session
  4. Either use SSMS or T-SQL XML query to extract the captured data (SSMS easiest way to get started)

An example of some T-SQL session specification code is shown in Figure 1. Important note: The Duration filter is in MICROSECONDS. Use the following T-SQL command to start the session: ALTER EVENT SESSION WorstQueries ON SERVER STATE = START and this command to stop the session: ALTER EVENT SESSION WorstQueries ON SERVER STATE = STOP. Once the session has been stopped, open WorstQueries.xel using SSMS or use T-SQL to report or load the data into a database table. Note: XEL files can be viewed in real-time, if necessary. Figure 2 displays a completed statement sample viewed via SSMS.

Blog_20171102_3

Figure 1: Code that Creates WorstQueries Session

 

Blog_20171102_4

Figure 2: XEvents Statement Completed Sample Output

 

The next article in this blog series will cover query/execution plans.   The fun stuff is coming up!!  Until then….happy tuning and please let us know if you have any questions.

For more information about blog posts, concepts and definitions, further explanations, or questions you may have…please contact us at SQLRx@sqlrx.com. We will be happy to help! Leave a comment and feel free to track back to us. Visit us at www.sqlrx.com!

Rx for Demystifying Index Tuning Decisions – Part 1

— by Jeffry Schwartz

Overview

Indices are one of the most troubling and mysterious areas for DBAs and developers. Most understand that adding a badly needed index can yield dramatic query performance improvements, but how does one know what is needed? Many believe more indices are better, regardless of the number. Most fear deleting ANY existing indices, even when the indices partially or fully duplicate functionality because they fear execution times will skyrocket if anything is changed. Others implement the recommendations of the Database Engine Tuning Advisor (DTA) without question while not understanding what the ramifications of those actions are.

Why does so much mystery surround indexing? One reason is that many DBAs possess insufficient knowledge especially regarding the following issues:

  • Their design, construction, and interaction with queries
  • Measurement of their usage both from a query and a SQL Server perspective
  • Determining viable missing index candidates and how to implement any recommendations SQL Server might produce

Index usage and tuning metrics became available on SQL Server 2005 with Dynamic Management Views and Functions, which will be discussed later. However, the meanings and significance of index DMV/DMF metrics are still not well understood by many despite only minor additions over the years. Specifically, the following list contains a synopsis of the topics that the author has observed to be the most salient index-related issues:

  1. Identifying
    • Queries that need an index to function efficiently
    • Which indices, if any, are used by a query (and how they are used, e.g., randomly or sequentially)
    • Tables (and their indices) that merit evaluation and potential adjustment
    • Indices that duplicate functionality of others
  2. Understanding when
    • A new index is truly needed and what improvement can be anticipated
    • An index can be deleted without harming performance
    • An index should be adjusted or added despite the query not producing any missing index warnings
  3. Understanding why having too many indices results in
    • Inserts and updates taking too long and/or creating blocking
    • Suboptimal query plans being generated because there are too many index choices
  4. Knowing Database Engine Tuning Advisor (DTA) pros & cons

This blog series will help the reader with these issues by covering topics such as:

  • Common types of indices and how they function (clustered, nonclustered, & covering)
  • Detecting queries that need rewriting or require indexing help
  • How queries use various types of indices, e.g., seeks, scans, and lookups
  • Using query plans to determine relationships between queries and indices
  • Statistics and how they affect index usage and query performance
  • Query Optimizer
  • Consequences of too many indices
  • Detecting indices whose functionality is duplicated by other indices
  • Monitoring actual index usage from both a logical, i.e., query, and a physical perspective
  • Determining which indices are hardly, if ever, used
  • Determining missing indices and their estimated impact if implemented
  • Incorporating missing index recommendations either into existing indices or as few indices as possible

Indexing Overview

Two basic types of indices exist in SQL Server: clustered index and NonClustered index. A clustered index (CI) IS the data table with a binary tree attached to it. It contains leaf and non-leaf levels as shown in the Figure 1 diagrams below. The leaf levels contain the actual data and the non-leaf levels point to the leaves in the binary tree. A nonclustered index only contains index-related entries, i.e., keys and included columns. The same levels that exist in a clustered index also exist in NonClustered (NC) indices whose leaf levels point to the table (heap) or a clustered index. Keys exist throughout the entire binary tree. Although clustered index keys are contained in all NonClustered indices, they cannot be used directly by a query. Therefore, one should keep clustered index keys as small as possible. If the analyst KNOWS that a LOT of range scans will be performed based on the clustered index key, making the keys incorporate the range scan columns will help keep the scanned entries physically closer at least within a page, especially immediately after the index is reorganized or rebuilt. Duplicates in clustered indices are not allowed PHYSICALLY, so in non-unique situations SQL Server will automatically add a duplicate resolver (uniqueifier in the Microsoft literature) that adds four bytes to every key entry in the table, including the non-leaf levels). Clustered indices are often preferred by SQL Server for sizable sequential or semi-sequential access except when the NonClustered index is very narrow and the data record is fairly wide. Examples of clustered index and nonclustered index definitions are shown below:

CREATE UNIQUE CLUSTERED INDEX [ci_RecID] ON [dbo].[FewDuplicates]

(

[IDCol] ASC

)

WITH (fillfactor = 100, PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

ON [PRIMARY]

CREATE NONCLUSTERED INDEX [ix_DupID] ON [dbo].[FewDuplicates]

(

DupID ASC

)

WITH (fillfactor = 100, PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

ON [PRIMARY]

Blog_20171026_1

Figure 1: Physical Index Layouts

A covering index is a NonClustered index that contains ALL the data needed by a particular query so the table or clustered index need not be accessed at all. If the values are only present in the query select list and not used in any where or join clauses, the columns can be “included” in a NonClustered index. Included columns are ONLY allowed in NonClustered indices and ONLY appear in index leaf levels. They do not exist in upper binary tree levels, which can be 3 or more levels depending upon the number of records in the table. An example of a covering index is shown below using included columns to satisfy the query’s needs.

CREATE NONCLUSTERED INDEX ix_CombinedIndex ON [dbo].[FewDuplicates]

(

[Metric43],

[Metric14]

)

INCLUDE (

[Metric01],

[Metric02],

[Metric03],

[Metric04]

)

WITH (fillfactor = 100, PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

ON [PRIMARY]

When SQL Server makes a missing index recommendation that uses included columns, e.g., when a query plan is obtained in SSMS, it will often ignore a NonClustered index that does not have the recommended included columns, especially if the clustered index has some of the keys. For example, a developer with whom the author worked was tuning a query and the missing index recommendation displayed in the query plan recommended a new index that contained three key columns and three included columns. The developer implemented the index without the included columns. When he obtained the query plan again, he was surprised to discover that SQL Server ignored the new index. When he added the three recommended included columns, SQL Server used the index. Missing index recommendations and query plans will be discussed later in this blog series.

Finally, a “filtered” index only contains specific entries, e.g., all records where the payment date is zero. The simplest way to think of this kind of index is to consider it to be an index with a where clause, which is demonstrated below:

CREATE NONCLUSTERED INDEX ix_FilteredIndex ON [dbo].[FewDuplicates]

(

[Metric43],

[Metric14]

)

INCLUDE (

[Metric01],

[Metric02],

[Metric03],

[Metric04]

)

where [Metric43] > 100000000

WITH (fillfactor = 100, PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

ON [PRIMARY]

Index Access Methods

LOGICAL index access methods can be thought of as coming from a query perspective, i.e., how a query might “view” the access operation or the operators that appear in a query plan. SQL Server records three types of LOGICAL index access methods: Seek (random access retrieval of one or more records), Scan (linear search, i.e., reading a data table or clustered index from beginning to end), and Key/RID Lookup (Index accessed first either randomly or sequentially, but some requested data is missing from the index, so the table or clustered index must be accessed for remaining columns to effect a join or produce the requested query output). It is important for the reader to understand that a Seek may result in a single record access or the access of a range of records. Another important issue involves what causes SQL Server to perform a Seek or a Scan operation. Obviously, SQL Server will perform a scan if a suitable index does not exist, but not so obviously, it will also perform a scan if the table is small enough or an appropriate index would return more than approximately 25 percent of the records in the table. Clearly, the scan is not a bad thing in the first case and in the second case, the optimizer has determined that the most efficient way to accomplish the task to scan the table. The second case may also occur when the table grows sufficiently large and the query filter specification has become sufficiently vague to cause the scan to occur. For example, this may occur when a query requests all the orders or parts needed by manufacturing from the beginning of the year. When the company is smaller or the interval is only a few months, this type of request may be satisfied with a seek operator, but as the table grows or the number of records for the same period increases, the scan threshold is exceeded and SQL Server will perform a scan instead of a seek. When this occurs, queries that generally perform well suddenly perform poorly, usually to everyone’s consternation.

PHYSICAL index access methods can be thought of as SQL Server’s internal perspective, i.e., how SQL Server actually performs the access operations requested by a query. SQL Server records two types of PHYSICAL index access methods: Singleton Lookup (single record lookup via index, i.e., logical seek) and Range Scan (multiple record lookup via index or table, i.e., logical seek OR logical scan).

Using these two perspectives, an analyst can often determine how a table and its index structure are used by a group of queries.

Why Begin Index Tuning with Queries?

Indices not used in a vacuum; their existence is driven by business work as represented by database queries. Queries determine what indices are required, so an analyst must determine which queries (and tables) need the greatest indexing help. Therefore, index tuning should NOT be performed on a one-off basis. Analysts should resist the temptation to add an index to solve ONE problem. Therefore, it is important for an analyst NOT to work on just ONE index to solve a SINGLE query problem unless query performance is critical and abominable. One must ALWAYS be aware of other queries’ usage of any current or proposed index. Ideally, one index accommodates numerous queries, so index tuning should be performed as a comprehensive solution.

Determining an Appropriate Strategy for Index Tuning

The following contains the list of objectives and considerations for any index tuning project:

  1. Must queries perform optimally because volumes are very high?
  2. Will ANY reduction in work improve performance?
  3. Sometimes indices cannot remedy the problem because the queries are written so poorly that the real solution is to rewrite the query, probably using Common Table Expressions (CTEs) and possibly, adding or modifying some indices.

The author once worked with a customer whose developers questioned why he suggested investigating a particular query for tuning when it ONLY performed 54,000 reads per execution. The answer was that since the query was executed approximately 140,000 times daily, even a reduction of 1,000 reads would translate into a 140,000 read reduction in overall database work. Several queries fell into this category and since they were used in a hospital emergency room environment, speed was critical. Other queries performed millions or billions of reads during each execution. Clearly, any performance improvement would have been noticeable in this situation. These kinds of queries often prove to be very difficult to tune because they are quite complex, e.g., the author has tuned queries that employed fifteen inner and left joins. Others utilized many large tables and used many joins, some of which were contained within views. Regardless, the numbers and sizes of the tables have a direct bearing upon how difficult a query is to tune. Sometimes performance improvements can be realized by adding or modifying indices, but many times the query itself has to be rewritten using either CTEs or temporary tables (or both). Sometimes very large queries have to be broken into pieces to determine exactly what portions cause the query to perform poorly, and CTEs are ideal for this kind of analysis and tuning.

How the queries and indices interact is a major consideration, e.g., is the clustered index used because a more appropriate index does not exist? Does the query perform a series of single record lookups (even if there are many thousands of them) or does it scan some of the larger tables in their entirety? How many indices exist on each of the tables? This kind of issue is shown in Table 1 below. Are there too many index choices for the optimizer? Are any indices no used at all or very seldom? All of these issues factor into index and query tuning strategies. The ideal scenario is to have as few indices as possible without creating a need that has to be addressed by a full table clustered index scan, but finding this balance is often difficult.

Blog_20171026_2

Table 1: Total Existing Index Count by Table and Database

The next article in this blog series will cover how to determine queries that need indices the most and extended events. Until then….enjoy life and let us know if you have any questions.

For more information about blog posts, concepts and definitions, further explanations, or questions you may have…please contact us at SQLRx@sqlrx.com. We will be happy to help! Leave a comment and feel free to track back to us. Visit us at www.sqlrx.com!

Log Connections to SQL Instance

— by Ginger Keys

If you ever have a need to monitor connections to your SQL server, and any related information about the connections such as database, logins, etc., there are some DMVs that can give you tons of information. Previously you might have used the sys.sysprocesses table to derive much of this information, but this is being deprecated in the most recent versions of SQL server.

Instead, you can collect valuable information from these DMVs:

sys.dm_exec_sessions   https://msdn.microsoft.com/en-us/library/ms176013.aspx

sys.dm_exec_connections   https://msdn.microsoft.com/en-us/library/ms181509.aspx

sys.dm_exec_requests   https://msdn.microsoft.com/en-us/library/ms177648.aspx

In order to capture and retain connection information for my SQL server, I will create a small database and a table to hold some basic information. Of course you can alter the script to include more, less, or different data than what I am demonstrating below, to better fit your specific information needs.

I will create a database and a table, then insert data from two of the DMVs listed above.

Step 1 – Create a table to hold login activity

— Create a database

USE master

GO

CREATE DATABASE [Connections]

ON PRIMARY

( NAME = N’Connections’, FILENAME = N’C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\DATA\Connections.mdf’ ,

SIZE = 1024MB , FILEGROWTH = 512MB )

LOG ON

( NAME = N’Connections_log’, FILENAME = N’C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\DATA\Connections_log.ldf’ ,

SIZE = 1024MB , FILEGROWTH = 512MB)

GO

 

— Create table to hold Login info

USE [Connections]

GO

CREATE TABLE [dbo].[LoginActivity]

(

host_name [nvarchar](128) NULL,

program_name [nvarchar](128) NULL,

login_name [nvarchar](128) NOT NULL,

client_net_address [nvarchar](48) NULL,

DatabaseName [nvarchar](128) NOT NULL,

login_time [datetime] NOT NULL,

status [nvarchar](30) NOT NULL,

date_time[datetime] NOT NULL,

) ON [PRIMARY]

GO

 

Step 2 – Insert Data into Table

If you need to retain or archive this connection information, you can create a database which will hold the information, or export the results to a spreadsheet or other file. Otherwise you can simply select the information from the DMV below if you only need to see current data.

 

USE Connections

GO

INSERT INTO LoginActivity

(host_name,

program_name,

login_name,

client_net_address,

DatabaseName,

login_time,

status,

date_time)

— run the following select statement by itself to see connection info if you don’t want to save the output

SELECT

s.host_name,

s.program_name,

s.login_name,

c.client_net_address,

d.name AS DatabaseName,

s.login_time,

s.status,

GETDATE() AS date_time

FROM sys.dm_exec_sessions s

JOIN sys.dm_exec_connections c ON s.session_id = c.session_id

JOIN sys.databases d ON d.database_id = s.database_id

–where d.name = ‘ABCompany’ –can specify databases if needed

WHERE GETDATE() >= DATEADD(hh,-10, GETDATE()) –date range can be adjusted

 

Step 3 – View/Save Output Results

After inserting the data into my table, I can see the current connections from the last 10 hours (as per my insert statement). On a production server, this list would be far greater.

SELECT * FROM LoginActivity

Blog_20171005_1

From the columns I have included in my table:

Host_name – will give you the name of the workstation connecting – shows NULL for internal sessions.

Program_name – tells you the name of the client program or application connecting.

Client_net_address – provides the host address of each client connecting

Login_name, DatabaseName, and login_time – self-explanatory.

date_time – is the current day and time the query is run

Status – gives the status of the session, which will be running, sleeping, dormant, or preconnect.

This information can also be output to a text or excel file if preferred.

Blog_20171005_2

Conclusion

Being able to see users or applications making connections to your SQL Server can be useful or necessary for many reasons. The steps outlined above provide a general guideline for deriving connection information that can be altered to fit your organizational needs.

For more information about blog posts, concepts and definitions, further explanations, or questions you may have…please contact us at SQLRx@sqlrx.com. We will be happy to help! Leave a comment and feel free to track back to us. Visit us at www.sqlrx.com!

Get Index Column Info with Includes for One or Many Tables

— by Lori Brown @SQLSupahStah

I was recently working with one of my clients on some low hanging fruit type of query tuning. We had checked the cache for plans with missing index warnings in them and were trying to see if we could tweak and existing index or add a new index to speed things up. If you ever work with missing indexes, you surely have seen it recommend crazy things, duplicates or existing indexes or it wants something that can be added to an existing index.

The bottom line for missing index recommendations is that you should NEVER, EVER create a missing index unless you know for absolutely sure that the index does not exist and that it will really help a known performance issue. For instance, you can have missing index warnings on small tables that only return a couple of rows to the query. Those are usually not worth working on since scanning a small table can usually be done very quickly by SQL. To this day, I still find many databases that are way over indexed with indexes that were implemented simply because someone found a missing index warning and did not do their homework or they ran the dreaded Database Tuning Advisor which shoved a bunch of duplicate indexes into tables.

If you are wondering how to get a list of missing indexes, please check out Jeff Schwartz’s blog post on how to do this. (https://blog.sqlrx.com/2017/06/02/query-tuning-and-missing-index-recommendations/ ) This will give you a place to start. It is better if you know what query is throwing the missing index warning so it is a good idea to collect those either in a trace or extended events. Jeff builds on his first post and in his second post on the subject (https://blog.sqlrx.com/2017/07/20/handling-multiple-missing-index-recommendations-for-the-same-table/ ) also goes over the fun of having multiple missing index recommendations for a single table and how to deal with them.

Here’s a handy set of links for some of Jeff’s great index tuning work that you should really check out:

https://blog.sqlrx.com/2016/01/28/sql-server-20122014-extended-events-for-developers-part-1/

https://blog.sqlrx.com/2017/06/02/query-tuning-and-missing-index-recommendations/

https://blog.sqlrx.com/2017/07/20/handling-multiple-missing-index-recommendations-for-the-same-table/

https://blog.sqlrx.com/2017/08/10/how-indexing-affects-deletion-queries/

One of the things that I usually need when performance tuning is to know information about the existing indexes on specific tables. I always want to know what columns are in the indexes along with the included columns so that I can compare the existing indexes to the missing recommendations. This way I can better figure out if a recommendation is something that can be added to an existing index (like an included column) or if I really need to create a brand new index if it does not exist at all.

Like most DBA’s, I keep a toolkit with all kinds of handy scripts. However, I did not have one that would give me index included columns. I also wanted the query to be able to return info from one or many tables at the same time. This would be useful when dealing with things with lots of joins. I know that there are a few bloggers who have posted something similar but I wanted to have the ability to filter on one or multiple tables. So, here is what I came up with:

/********************************

Returns index columns with included columns

plus other needed index info for tables

in @tablelist variable

*********************************/

 

DECLARE @tablelist VARCHAR(1000)

DECLARE @sqlstr NVARCHAR(MAX)

 

SET @tablelist = ‘InvoiceLines,OrderLines,StockItemHoldings’ — comma delimited list of tables, can be one or multiples EX: ‘mytable’ or ‘mytable,nothertable,thirdtable’

 

— Query the tables

IF @tablelist <> OR @tablelist <> ‘?,?,?’

BEGIN

SET @tablelist = REPLACE(QUOTENAME(@tablelist,””), ‘,’, ”’,”’) — Add quotes so the IN clause will work

 

SET @sqlstr = ‘SELECT SCHEMA_NAME(o.schema_id) AS SchemaName

       ,o.name AS TableName

       ,i.name AS IndexName

       ,i.type_desc AS IndexType

       ,c.name AS ColumnName

       ,ic.is_included_column

       ,ic.index_column_id

       ,t.name

       ,c.max_length

       ,c.precision

       ,c.scale

FROM sys.indexes i

JOIN sys.index_columns ic ON i.index_id = ic.index_id

JOIN sys.columns c ON ic.column_id = c.column_id AND ic.object_id = c.object_id

JOIN sys.objects o ON o.object_id = i.object_id AND c.object_id = i.object_id

JOIN sys.types t on t.user_type_id = c.user_type_id

WHERE o.name IN (‘+@tablelist+‘)

ORDER BY SchemaName, TableName, IndexName, index_column_id’

 

EXEC sp_executesql @sqlstr

–PRINT @sqlstr

END

All you have to provide is a comma separated list of the table(s) you are interested in for the @tablelist variable and it will do the rest. The output looks like this:

Blog_20170928_1

I found several bloggers who had made queries that would concatenate the columns together but truthfully I found those hard to read so I settled for a list with some extra info on the data and index types. Hope this is useful to someone out there.

For more information about blog posts, concepts and definitions, further explanations, or questions you may have…please contact us at SQLRx@sqlrx.com. We will be happy to help! Leave a comment and feel free to track back to us. Visit us at www.sqlrx.com!

 

Find Tables That Have Special Features Enabled

Find out if any of tables in your database have special features enabled using the queries below.  These features need to be understood and carefully managed.

— CDC Enabled Tables

select distinct t.name AS CDCTables

from sys.tables t

where t.is_tracked_by_cdc = 1

 

— File Tables — SQL 2012 +

select distinct t.name AS FileTables

from sys.tables t

where t.is_filetable = 1

 

— Temporal Tables — SQL 2016 +

select distinct t.name AS TemporalTables

from sys.tables t

where t.temporal_type > 0

 

— Stretch Enabled Tables — SQL 2016 +

select distinct t.name AS StretchTables

from sys.tables t

where t.is_remote_data_archive_enabled > 0

 

— External Tables — SQL 2016 +

select distinct t.name AS ExternalTables

from sys.tables t

where t.is_external > 0

https://docs.microsoft.com/en-us/sql/relational-databases/track-changes/about-change-data-capture-sql-server

https://docs.microsoft.com/en-us/sql/relational-databases/blob/filetables-sql-server

https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables

https://docs.microsoft.com/en-us/sql/sql-server/stretch-database/enable-stretch-database-for-a-table

https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-table-transact-sql

For more information about blog posts, concepts and definitions, further explanations, or questions you may have…please contact us at SQLRx@sqlrx.com. We will be happy to help! Leave a comment and feel free to track back to us. Visit us at www.sqlrx.com!

Find Out Who Changed the Database Recovery Model

— By Lori Brown @SQLSupahStah

I ran into a situation where we were working on a migration and had been directed to put all databases into FULL recovery model in anticipation of using log shipping to push databases to the new server. Once we are ready to go live on the new server the plan was to ship the last transaction logs and then restore them WITH RECOVERY in an effort to make the final cutover as quick as possible. Of course this means that we had to make sure that all databases were having regular log backups, which we did. Things were going along nicely until we started receiving log backup failure notifications.

Upon checking things, we found that one of the databases had been changed to SIMPLE recovery model. You can find this type of information in the default trace or you can simply scroll through the SQL error logs until you find the entry that you are looking for. If you have a busy instance that has a lot of entries in the error log, this can be a bit time consuming so I came up with a set of queries that will grab the error log entry and attempt to tie it to the info in the default trace so that it was easier to identify WHO was the culprit who made an unauthorized change to the database properties.

 

DECLARE @tracefile VARCHAR(500)

DECLARE @ProcessInfoSPID VARCHAR(20)

 

CREATE TABLE [dbo].[#SQLerrorlog](

[LogDate] DATETIME NULL,

[ProcessInfo] VARCHAR(10) NULL,

[Text] VARCHAR(MAX) NULL

)

 

/*

Valid parameters for sp_readerrorlog

1 – Error log: 0 = current, 1 = Archive #1, 2 = Archive #2, etc…

2 – Log file type: 1 or NULL = error log, 2 = SQL Agent log

3 – Search string 1

4 – Search string 2

 

Change parameters to meet your needs

*/

— Read error log looking for the words RECOVERY

–and either FULL, SIMPLE or BULK_LOGGED indicating a change from prior state

INSERT INTO #SQLerrorlog

EXEC sp_readerrorlog 0, 1, ‘RECOVERY’, ‘FULL’

 

INSERT INTO #SQLerrorlog

EXEC sp_readerrorlog 0, 1, ‘RECOVERY’, ‘SIMPLE’

 

INSERT INTO #SQLerrorlog

EXEC sp_readerrorlog 0, 1, ‘RECOVERY’, ‘BULK_LOGGED’

 

UPDATE #SQLerrorlog

SET ProcessInfo = SUBSTRING(ProcessInfo,5,20)

FROM #SQLerrorlog

WHERE ProcessInfo LIKE ‘spid%’

 

— Get path of default trace file

SELECT @tracefile = CAST(value AS VARCHAR(500))

FROM sys.fn_trace_getinfo(DEFAULT)

WHERE traceid = 1

AND property = 2

 

— Get objects altered from the default trace

SELECT IDENTITY(int, 1, 1) AS RowNumber, *

INTO #temp_trc

FROM sys.fn_trace_gettable(@tracefile, default) g — default = read all trace files

WHERE g.EventClass = 164

 

SELECT t.DatabaseID, t.DatabaseName, t.NTUserName, t.NTDomainName,

t.HostName, t.ApplicationName, t.LoginName, t.SPID, t.StartTime, l.Text

FROM #temp_trc t

JOIN #SQLerrorlog l ON t.SPID = l.ProcessInfo

WHERE t.StartTime > GETDATE()-1 — filter by time within the last 24 hours

ORDER BY t.StartTime DESC

 

DROP TABLE #temp_trc

DROP TABLE #SQLerrorlog

GO

 

You can find more on the following:

sp_readerrorlog is an undocumented procedure that actually uses xp_readerrorlog – https://www.mssqltips.com/sqlservertip/1476/reading-the-sql-server-log-files-using-tsql/

sys.fn_trace_getinfo – https://docs.microsoft.com/en-us/sql/relational-databases/system-functions/sys-fn-trace-getinfo-transact-sql

sys.fn_trace_gettable – https://docs.microsoft.com/en-us/sql/relational-databases/system-functions/sys-fn-trace-gettable-transact-sql

For more information about blog posts, concepts and definitions, further explanations, or questions you may have…please contact us at SQLRxSupport@sqlrx.com. We will be happy to help! Leave a comment and feel free to track back to us. We love to talk tech with anyone in our SQL family!

Links for new DMV’s in SQL 2017

Since we at SQLRX have been super busy this week and still need to do something for a blog post, I thought that doing a link round up of new dynamic management views that are going to be available in SQL 2017 would be a quick and good idea. I am really interested in sys.dm_db_log_info since it give VLF info and can be used for monitoring and alerting.

Enjoy!

Change:

A column on sys.dm_db_file_space_usage has been changed: https://docs.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-db-file-space-usage-transact-sql

Brand new dmv’s:

sys.dm_db_log_stats https://docs.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-db-log-stats-transact-sql

sys.dm_db_log_info https://docs.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-db-log-info-transact-sql

sys.dm_db_stats_histogram https://docs.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-db-stats-histogram-transact-sql

sys.dm_tran_version_store_space_usage https://docs.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-tran-version-store-space-usage

sys.dm_os_host_info https://docs.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-os-host-info-transact-sql

Blog_20170901_1

Yee Haw!!!!

For more information about blog posts, concepts and definitions, further explanations, or questions you may have…please contact us at SQLRx@sqlrx.com. We will be happy to help! Leave a comment and feel free to track back to us. Visit us at www.sqlrx.com!