Category Archives: Expert

Handling Multiple Missing Index Recommendations for the Same Table

— By Jeff Schwartz

The Problem

Many articles concerning SQL Server missing index recommendations demonstrate the mechanics for obtaining them and often highlight whether the suggested key columns are used in equality or inequality relationships. Most of these examples emphasize a single index that can be implemented to improve performance. However, real-life situations often involve multiple or many suggested indices. For example, during a recent customer study, I observed 28 recommended indices for one table and 52 for another. Metrics such as improvement measure and user impact can frequently be used to determine the most important recommendations, but sometimes there are either too many similar recommendations, or the improvement measures are almost the same for several proposed indices, which makes determining an optimal index difficult. A large number of recommendations also often results in index proliferation, i.e., missing index recommendations are implemented piecemeal with no overall strategy, resulting in tables with 14, 17, or 24 indices, as the author observed in a recent customer performance study. Table size compounds the problem because it is especially desirable to limit the number of indices on very large tables. For example, the table that had 24 indices on it contained over 30 million records.

The following examples of proposed indices that inspired this article illustrate the multiple recommendation phenomenon – RecIndex1: Keys (DateVal), Included Columns (Metric, ReptCat, LocationID, Total_Amount) and RecIndex2: Keys (LocationID, DateVal), Included Columns (Metric, ReptCat, Total_Amount). Clearly, without additional information, it is difficult to determine whether these recommendations must remain separate or could be combined into a single index. In this situation, knowing whether the proposed key columns are used in equality or inequality WHERE clauses can be critical. This article discusses how to use knowledge of equality and inequality relationships to determine an appropriate course of action.

Test Table Creation & Load

To determine missing index recommendation behavior, a generic table was constructed and filled with 20 million records. Each record contained an identity column, an ID column, a text column, and 47 metric columns whose values ranged between 1 and 10,000,000. The large number of table columns was used to ensure SQL Server would choose an index option when appropriate. Six queries that incorporated various column combinations were executed (some of which differed only in column ordering). To minimize duplication of column values and skewing of query plans, the ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000 formula was used to generate values that were as random as possible. Two indices were created: a clustered index that used the identity column as its only key and a second nonclustered index that used DupID as its only key. The scripts for the creation, loading, and initial indexing of the table are shown below.

-- ##############################################################

-- Create test table

-- ##############################################################

DROP TABLE FewDuplicates;

CREATE TABLE FewDuplicates (

IDCol bigint identity (20000000,1),

DupID bigint,

MyText varchar(10),

Metric01 bigint, Metric02 bigint, Metric03 bigint, Metric04 bigint,

Metric05 bigint, Metric06 bigint, Metric07 bigint, Metric08 bigint,

Metric09 bigint, Metric10 bigint, Metric11 bigint, Metric12 bigint,

Metric13 bigint, Metric14 bigint, Metric15 bigint, Metric16 bigint,

Metric17 bigint, Metric18 bigint, Metric19 bigint, Metric20 bigint,

Metric21 bigint, Metric22 bigint, Metric23 bigint, Metric24 bigint,

Metric25 bigint, Metric26 bigint, Metric27 bigint, Metric28 bigint,

Metric29 bigint, Metric30 bigint, Metric31 bigint, Metric32 bigint,

Metric33 bigint, Metric34 bigint, Metric35 bigint, Metric36 bigint,

Metric37 bigint, Metric38 bigint, Metric39 bigint, Metric40 bigint,

Metric41 bigint, Metric42 bigint, Metric43 bigint, Metric44 bigint,

Metric45 bigint, Metric46 bigint, Metric47 bigint

)

 

-- ##############################################################

-- Load original table

-- ##############################################################

declare @DupID bigint = 1

declare @NumRecs bigint = 20000000

 

truncate table FewDuplicates

set nocount on

while (@DupID <= @NumRecs)

begin

insert into [dbo].[FewDuplicates] (

[DupID], [MyText],

[Metric01], [Metric02], [Metric03], [Metric04], [Metric05], [Metric06], [Metric07], [Metric08], [Metric09], [Metric10], [Metric11], [Metric12], [Metric13], [Metric14], [Metric15], [Metric16], [Metric17], [Metric18], [Metric19], [Metric20], [Metric21], [Metric22], [Metric23], [Metric24], [Metric25], [Metric26], [Metric27], [Metric28], [Metric29], [Metric30], [Metric31], [Metric32], [Metric33], [Metric34], [Metric35], [Metric36], [Metric37], [Metric38], [Metric39], [Metric40], [Metric41], [Metric42],

[Metric43], [Metric44], [Metric45], [Metric46], [Metric47]

)

VALUES (

@DupID, 'my text',

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000,

ABS(cast(CHECKSUM(NewId()) as bigint)) % 10000000

)

set @DupID += 1

end -- group option loop

set nocount off

 

-- ##############################################################

-- Create indices on the test table

-- ##############################################################

CREATE UNIQUE CLUSTERED INDEX [ci_RecID] ON [dbo].[FewDuplicates]

(

[IDCol] ASC

)

WITH (fillfactor = 100, PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

ON [PRIMARY]

 

CREATE NONCLUSTERED INDEX [ix_DupID] ON [dbo].[FewDuplicates]

(

DupID ASC

)

WITH (fillfactor = 100, PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

ON [PRIMARY]

Queries that Create Missing Index Recommendations

To illustrate the issue, two queries were created that were guaranteed to generate missing index recommendations as well as mimic the behavior of the ones cited in The Problem section. They are listed in Table 1 and the differences are highlighted for easier comparison. The query plans for the two queries are displayed in Table 2 and Table 3. Both queries performed full clustered index scans and generated missing index recommendations. The recommendations are shown below in two pieces within each table. The most important points are that Query Plan #1 specifies Metric14 first and Metric43 second, whereas Query Plan #2 specifies Metric43 alone with Metric14 as an included column. At first glance, these appear to be contradictory and potentially incompatible differences.
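Since the test queries themselves appear only as screenshots in Table 1, a hypothetical reconstruction based on the descriptions above and on the recommendations that follow may be helpful; the literal values are placeholders and only the shape of the queries matters.

-- Hypothetical reconstructions of the two test queries (originals are shown as images in Table 1)
-- Query #1: both Metric14 and Metric43 appear in equality predicates,
-- so the recommendation keys them both
SELECT Metric01, Metric02, Metric03, Metric04
FROM dbo.FewDuplicates
WHERE Metric14 = 1234567
AND Metric43 = 2345678;

-- Query #2: only Metric43 appears in an equality predicate; Metric14 is merely selected,
-- so the recommendation keys Metric43 and lists Metric14 as an included column
SELECT Metric01, Metric02, Metric03, Metric04, Metric14
FROM dbo.FewDuplicates
WHERE Metric43 = 2345678;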

Blog_20170720_1

Blog_20170720_2

Blog_20170720_3

Table 4 summarizes the metrics, columns, and full index definitions suggested by SQL Server. Given the very large improvement measure values, the key ordering of the proposed indices, and the somewhat different included columns, it may be tempting to implement both indices despite the fact that the table contained 20 million records. However, more detailed analysis using the data in Table 5 shows that a single index that is constructed properly can accommodate both recommendations and, therefore, both queries. The most important metrics are displayed in column_usage because one uses two equality comparisons and the other only uses one. Therefore, if we specify the equality column used in both queries first and specify the equality/included column second, both queries will be satisfied. VERY IMPORTANT NOTE: Although the key and included column ordering appear obvious because of the column names used in this example table, i.e., suffixes in numerical order, when normal column names like DateVal or LocationID are used, ordering is much less obvious. As cited in my previous blog entitled Query Tuning and Missing Index Recommendations, when ordering is not crucial, e.g., when only equality operations or included columns are specified, SQL Server uses the ordering of the columns in the table itself rather than the ordering specified in the query.

In most cases SQL Server attempts to create covering indices, which are defined to be indices that contain all the columns of a particular query. Please reference the following web page for further information regarding covering indices: https://docs.microsoft.com/en-us/sql/relational-databases/indexes/create-indexes-with-included-columns. In the author’s experience, implementing the keys of a suggested index without the corresponding included columns often results in SQL Server ignoring the new index. Therefore, the included columns are vital to any missing index strategy. Clearly, a point of diminishing returns exists when the number of included columns approaches the total number of columns in the table (especially for very large tables), but as long as the number of columns is reasonable, included columns should always be considered. [The queries to obtain the data shown in Table 4 and Table 5 are provided in Table 6.]

Blog_20170720_4

Blog_20170720_5

SELECT avg_total_user_cost * avg_user_impact * (user_seeks + user_scans)

as [Improvement Measure],

avg_user_impact as [Avg User Impact], user_seeks as Executions,

[statement] as TableName,

equality_columns as [Equality Usage], inequality_columns as [Inequality Usage], included_columns as [Included Columns],

'Create Nonclustered Index ix_SQLRx_' + PARSENAME([statement],1) + '_' + CONVERT(varchar, group_handle) + '_' + CONVERT(varchar, g.index_handle) + ' ON ' +

[statement] + ' (' + ISNULL(replace(equality_columns, ' ', ''), '') +

CASE WHEN equality_columns IS NOT NULL AND inequality_columns IS NOT NULL THEN ',' ELSE '' END +

ISNULL(replace(inequality_columns, ' ', ''), '') + ')' +

CASE WHEN included_columns IS NOT NULL THEN ' INCLUDE (' + included_columns + ')' ELSE '' END AS [Create Index Statement]

FROM       sys.dm_db_missing_index_groups g

INNER JOIN   sys.dm_db_missing_index_group_stats s ON

s.group_handle = g.index_group_handle

INNER JOIN   sys.dm_db_missing_index_details d ON d.index_handle = g.index_handle

ORDER BY avg_total_user_cost * avg_user_impact * (user_seeks + user_scans) DESC;

 

SELECT statement AS [Table], column_id , column_name, column_usage,

migs.user_seeks as Executions, migs.avg_user_impact as [Avg User Impact]

FROM sys.dm_db_missing_index_details AS mid

CROSS APPLY sys.dm_db_missing_index_columns (mid.index_handle)

INNER JOIN sys.dm_db_missing_index_groups AS mig ON mig.index_handle = mid.index_handle

INNER JOIN sys.dm_db_missing_index_group_stats AS migs ON mig.index_group_handle = migs.group_handle

ORDER BY mig.index_group_handle, mig.index_handle, column_id

Blog_20170720_8

Table 7 contains the composite index that satisfies both recommendations and queries. It is important to note that Metric43 is the FIRST key and Metric14 is the second. If this ordering is not followed, Query #1 will perform a full scan. Reviewing both queries demonstrates the following:

  1. The key ordering enables the equality operators to apply in both cases.
  2. Having Metric14 as the second key satisfies both the equality condition and the included column condition.
  3. The other columns specified by the queries are supplied so the table data need never be referenced.

 

CREATE NONCLUSTERED INDEX ix_CombinedIndex ON [dbo].[FewDuplicates]

(

[Metric43],

[Metric14]

)

INCLUDE (

[Metric01],

[Metric02],

[Metric03],

[Metric04]

)

WITH (fillfactor = 100, PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

ON [PRIMARY]

Blog_20170720_9

Table 8 and Table 9 contain the updated query plans, which illustrate that the clustered index scans have been replaced by index seek operators and that the clustered index is no longer accessed to satisfy either query. Note also that parallelism was present in the query plans shown in Table 2 and Table 3, but is absent in the new query plans displayed in Table 8 and Table 9.

Blog_20170720_6

Blog_20170720_7

Conclusion

This article illustrated a situation in which multiple missing index recommendations initially appeared to necessitate separate indices, yet ultimately resulted in a single index implementation. Careful examination of the key relationships used in the queries and specified by the missing index recommendations enabled the author to develop one composite index that provided optimal performance for both queries.

For more information about blog posts, concepts and definitions, further explanations, or questions you may have…please contact us at SQLRx@sqlrx.com. We will be happy to help! Leave a comment and feel free to track back to us. Visit us at www.sqlrx.com!

 

 

 

Common Criteria Compliance and LCK_M_SCH_M

–By Lori Brown

If you have ever wondered what happens when you mistakenly enable Common Criteria compliance in SQL, well wonder no more!!

Blog_20170427_1

We have a client who has no idea how or when Common Criteria was enabled on their production system. All they know is that performance has been slowly degrading. After collecting performance data, we found high LCK_M_SCH_M waits; this is a schema modification lock that prevents access to a table while a DDL operation occurs. We also found blocked process records involving the LOGIN_STATS table in the master database. This table is used to hold login statistics. When there are a lot of logins and logouts, there can be contention in this table.
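If you want to check for the same symptom on your own server, a quick look at the cumulative wait statistics will show whether this wait type is accumulating (illustrative query only):

-- Check cumulative schema modification lock waits since the last restart
SELECT wait_type, waiting_tasks_count, wait_time_ms, max_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = 'LCK_M_SCH_M';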

When you enable Common Criteria compliance, something called Residual Information Protection (RIP) is enabled. RIP is an additional security measure for memory: it requires that a specific bit pattern be written to a memory allocation before that memory can be reallocated (overwritten) to a new resource or login. So, with lots of logins and logouts, there is a performance hit in memory because the memory allocations must be overwritten.

Keep in mind that if you enable Common Criteria compliance, you can run into slowdowns from locking and memory. Make sure that your server is able to handle this well and that applications are designed to minimize the impact of high logins/logouts.

To disable Common Criteria compliance you can use sp_configure or the GUI.

sp_configure 'common criteria compliance enabled', 0

GO

RECONFIGURE

GO

However, it is not really disabled until you reboot the server (the MSDN documentation actually says to reboot the server). Restarting the instance will not work for this configuration change. I believe that this is because of the Residual Information Protection that secures memory; it stands to reason that without a reboot to flush memory, RIP is still doing its work and causing performance issues.
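If you want to confirm where the setting currently stands, sys.configurations shows both the configured value and the value actually in use (which will not change until after the reboot):

SELECT name, value, value_in_use
FROM sys.configurations
WHERE name = 'common criteria compliance enabled';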

Check out the documentation from MSDN on Common Criteria compliance here:

https://docs.microsoft.com/en-us/sql/database-engine/configure-windows/common-criteria-compliance-enabled-server-configuration-option

The biggest takeaway from this is to make sure to change your SQL configuration with good reason. Always know the side effects that can show up when you set any of the advanced configuration options.

For more information about blog posts, concepts and definitions, further explanations, or questions you may have…please contact us at SQLRx@sqlrx.com. We will be happy to help! Leave a comment and feel free to track back to us. Visit us at www.sqlrx.com!

SQL 2016 Automatic Seeding of Availability Groups

–By Lori Brown

I came across something today that I did not know existed in AGs: Automatic Seeding. This allows SQL to automatically create the secondary copies of all databases in an AG. Pretty cool!! So if you set things up correctly when you create your AGs, you don’t have to worry about backing up databases and tlogs, taking them to the secondary replica, restoring, and then getting the AG fully set up. It is a one stop shop. This is new in SQL 2016 only as far as I can tell.

https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/automatically-initialize-always-on-availability-group

Blog_20170420_1

You have to set up your AG by script but that is not too difficult.

-- Run On Primary
CREATE AVAILABILITY GROUP [<availability_group_name>]
FOR DATABASE db1
REPLICA ON N'<primary_server>'
WITH (ENDPOINT_URL = N'TCP://<primary_server>.<fully_qualified_domain_name>:5022',
FAILOVER_MODE = AUTOMATIC,
AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
BACKUP_PRIORITY = 50,
SECONDARY_ROLE(ALLOW_CONNECTIONS = NO),
SEEDING_MODE = AUTOMATIC),
N'<secondary_server>' WITH (ENDPOINT_URL = N'TCP://<secondary_server>.<fully_qualified_domain_name>:5022',
FAILOVER_MODE = AUTOMATIC,
AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
BACKUP_PRIORITY = 50,
SECONDARY_ROLE(ALLOW_CONNECTIONS = NO),
SEEDING_MODE = AUTOMATIC);
GO

Of course you have to be aware that if you set this on an AG with large databases, it could cause an issue since SQL would be pushing an entire database across the network. There is trace flag 9567, which can help compress the data stream for AGs using Automatic Seeding, but there are some side effects of increased processor load that you need to be aware of.

https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/tune-compression-for-availability-group
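A couple of related notes while we test this out: the documentation linked above indicates that each secondary must grant the AG permission to create the seeded databases, and SQL 2016 exposes DMVs that show seeding progress. The sketch below follows that documented pattern; adjust the names and the join to your environment.

-- On each secondary replica, allow the AG to create the databases it seeds
ALTER AVAILABILITY GROUP [<availability_group_name>] GRANT CREATE ANY DATABASE;

-- On the primary, watch automatic seeding progress (SQL 2016 DMVs)
SELECT ag.name AS ag_name, adc.database_name, s.start_time, s.completion_time,
s.current_state, s.failure_state_desc
FROM sys.dm_hadr_automatic_seeding s
JOIN sys.availability_groups ag ON ag.group_id = s.ag_id
JOIN sys.availability_databases_cluster adc ON adc.group_database_id = s.ag_db_id;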

We are setting up some new VM’s here at SQLRX and will be blogging later in much more depth on how this works.

For more information about blog posts, concepts and definitions, further explanations, or questions you may have…please contact us at SQLRx@sqlrx.com. We will be happy to help! Leave a comment and feel free to track back to us. Visit us at www.sqlrx.com!

SQL 2016 – Partitioning An Existing Table With A Columnstore Index

— by Lori Brown @SQLSupahStah

I recently ran across a data warehouse that did have its largest table partitioned. We were migrating up to SQL 2016 and we wanted to make sure that the partitions were still working and in place. Since I felt I needed to refresh my table partitioning skills, I decided to conduct a small scale test of partitioning an existing table by year and, to make it more fun, I wanted to have a columnstore index present to see how interesting things could be.

I am using the Fact.Purchase table in the new WideWorldImportersDW database. WideWorldImporters replaces AdventureWorks and can be researched and downloaded from here: https://blogs.technet.microsoft.com/dataplatforminsider/2016/06/09/wideworldimporters-the-new-sql-server-sample-database/

I did a little querying in the Fact.Purchase table and found that the Date Key column is great to use to partition it by year. And, I found that the dates in the table are generally from 2013 through 2016.

First you have to add the filegroups that will be used for the table once it is partitioned. I have added 4 filegroups for years 2013 – 2016.

blog_20170224_1

Once that is done, I have to add an .NDF file for each filegroup and map the file to the correct filegroup.

blog_20170224_2
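For reference, the same filegroup and file steps shown in the screenshots above can be scripted instead of using the GUI; the file names, sizes and paths below are just examples.

USE master
GO
ALTER DATABASE WideWorldImportersDW ADD FILEGROUP Purch2013;
ALTER DATABASE WideWorldImportersDW ADD FILEGROUP Purch2014;
ALTER DATABASE WideWorldImportersDW ADD FILEGROUP Purch2015;
ALTER DATABASE WideWorldImportersDW ADD FILEGROUP Purch2016;
GO
ALTER DATABASE WideWorldImportersDW
ADD FILE (NAME = N'Purch2013_Data', FILENAME = N'D:\MSSQL\DATA\Purch2013_Data.ndf', SIZE = 256MB)
TO FILEGROUP Purch2013;
-- ...repeat the ADD FILE statement for Purch2014, Purch2015 and Purch2016
GO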

 

Next I need a partition function and a partition scheme.

USE [WideWorldImportersDW]

GO

CREATE PARTITION FUNCTION [PF_Purch_Year](date) AS RANGE RIGHT FOR VALUES ('2014-01-01', '2015-01-01', '2016-01-01')

GO

CREATE PARTITION SCHEME [PS_Purch_Year] AS PARTITION [PF_Purch_Year] TO (Purch2013, Purch2014, Purch2015, Purch2016)

GO

What I am basically doing is setting boundaries so that objects that use the PF_Purch_Year function place date values older than 01/01/2014 into the Purch2013 filegroup, values from 01/01/2014 up to (but not including) 01/01/2015 into the Purch2014 filegroup, values from 01/01/2015 up to (but not including) 01/01/2016 into the Purch2015 filegroup, and finally everything on or after 01/01/2016 into the Purch2016 filegroup.
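A quick way to sanity-check those boundaries after creating the function is to ask $PARTITION which partition number a sample date would land in (1 = Purch2013 through 4 = Purch2016):

SELECT $PARTITION.PF_Purch_Year('2013-06-15') AS PartitionFor2013,
$PARTITION.PF_Purch_Year('2014-06-15') AS PartitionFor2014,
$PARTITION.PF_Purch_Year('2015-06-15') AS PartitionFor2015,
$PARTITION.PF_Purch_Year('2016-06-15') AS PartitionFor2016;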

blog_20170224_3

Now that I have everything set up, I have to move the data to it. However, I cannot do that with an existing clustered index. I have to drop the clustered index and rebuild it with the partition scheme. This does get a bit confusing if you are trying to move existing things as you will see.

I first dropped all my indexes. Of course I had scripted them all out for rebuild but ran into trouble when I tried to build them again.

USE [WideWorldImportersDW]

GO

DROP INDEX [CCX_Fact_Purchase] ON [Fact].[Purchase]

GO

ALTER TABLE [Fact].[Purchase] DROP CONSTRAINT [PK_Fact_Purchase]

GO

DROP INDEX [FK_Fact_Purchase_Date_Key] ON [Fact].[Purchase]

GO

DROP INDEX [FK_Fact_Purchase_Stock_Item_Key] ON [Fact].[Purchase]

GO

DROP INDEX [FK_Fact_Purchase_Supplier_Key] ON [Fact].[Purchase]

GO

 

Here we go trying to rebuild….this is where it gets interesting…

USE [WideWorldImportersDW]

GO

CREATE CLUSTERED COLUMNSTORE INDEX [CCX_Fact_Purchase] ON [Fact].[Purchase]

WITH (DROP_EXISTING = OFF)

ON PS_Purch_Year ([Date Key])

GO

I received an error:

Msg 35316, Level 16, State 1, Line 3

The statement failed because a columnstore index must be partition-aligned with the base table. Create the columnstore index using the same partition function and same (or equivalent) partition scheme as the base table. If the base table is not partitioned, create a nonpartitioned columnstore index.

 

Uh oh! This meant that the table had already been part of a different partition scheme. I did not think to look for that before I started. Ugh! I thought that it must be easy to change the base table partition alignment, and it is, but it is not super intuitive. Since I had already dropped all my indexes, I figured that I needed to recreate the original clustered index on the old partition scheme since that would be how the base table is associated with anything. I rebuilt it using the old partition scheme. So far so good. When I tried rebuilding it with DROP_EXISTING = ON, I received the same error as before. After thinking about it for a bit, I dropped everything again, but this time created a regular clustered index on the new partition scheme to align the base table. This worked!

ALTER TABLE [Fact].[Purchase] ADD CONSTRAINT [PK_Fact_Purchase] PRIMARY KEY CLUSTERED

(

[Purchase Key] ASC,

[Date Key] ASC

)ON PS_Purch_Year ([Date Key])

GO

 

Now the base table is aligned with the partition scheme that I wanted it on but I need the clustered index to be the columnstore index. We have to drop and create again and everything falls into place.

ALTER TABLE [Fact].[Purchase] DROP CONSTRAINT [PK_Fact_Purchase]

GO

USE [WideWorldImportersDW]

GO

CREATE CLUSTERED COLUMNSTORE INDEX [CCX_Fact_Purchase] ON [Fact].[Purchase]

WITH (DROP_EXISTING = OFF)

ON PS_Purch_Year ([Date Key])

GO

ALTER TABLE [Fact].[Purchase] ADD CONSTRAINT [PK_Fact_Purchase] PRIMARY KEY NONCLUSTERED

(

[Purchase Key] ASC,

[Date Key] ASC

)ON PS_Purch_Year ([Date Key])

GO

CREATE NONCLUSTERED INDEX [FK_Fact_Purchase_Date_Key] ON [Fact].[Purchase]

(

[Date Key] ASC

)ON PS_Purch_Year ([Date Key])

GO

CREATE NONCLUSTERED INDEX [FK_Fact_Purchase_Stock_Item_Key] ON [Fact].[Purchase]

(

[Stock Item Key] ASC

)ON PS_Purch_Year ([Date Key])

GO

CREATE NONCLUSTERED INDEX [FK_Fact_Purchase_Supplier_Key] ON [Fact].[Purchase]

(

[Supplier Key] ASC

)ON PS_Purch_Year ([Date Key])

GO

 

Success!!!! Now I just needed to make sure that things are where I expect them to be. I pilfered the following queries from MSDN…

SELECT *

FROM sys.tables AS t

JOIN sys.indexes AS i

ON t.[object_id] = i.[object_id]

JOIN sys.partition_schemes ps

ON i.data_space_id = ps.data_space_id

WHERE t.name = 'Purchase';

GO

blog_20170224_4

blog_20170224_5

If you scroll across the above results, you can see that all my indexes are in the PS_Purch_Year partition scheme.

SELECT t.name AS TableName, i.name AS IndexName, p.partition_number, p.partition_id, i.data_space_id,

f.function_id, f.type_desc, r.boundary_id, r.value AS BoundaryValue

FROM sys.tables AS t

JOIN sys.indexes AS i

ON t.object_id = i.object_id

JOIN sys.partitions AS p

ON i.object_id = p.object_id AND i.index_id = p.index_id

JOIN sys.partition_schemes AS s

ON i.data_space_id = s.data_space_id

JOIN sys.partition_functions AS f

ON s.function_id = f.function_id

LEFT JOIN sys.partition_range_values AS r

ON f.function_id = r.function_id and r.boundary_id = p.partition_number

WHERE t.name = 'Purchase'

ORDER BY p.partition_number;

 

This query shows the boundaries.

blog_20170224_6

I also found a really handy query created by David Peter Hansen that gives you info on partitions on a table much more concisely. https://davidpeterhansen.com/view-partitions-in-sql-server/  As they say…go read the whole thing.

blog_20170224_7

I hope this helps someone who wants to partition an existing table. I did this on my own system with a small table and not a production system so please be sure to test your process against a test system before you accidentally cause an issue in production.

For more information about blog posts, concepts and definitions, further explanations, or questions you may have…please contact us at SQLRx@sqlrx.com. We will be happy to help! Leave a comment and feel free to track back to us. Visit us at www.sqlrx.com!

 

SSISDB – Package Execution Time in Minutes and Average Execution Time

–By Lori Brown   @SQLSupahStah

I have a client who has a lot of SSIS packages in the SSIS catalog that we are migrating to a new server. Of course there is lots of testing going on. So the question came up about how to more easily see package runtimes and compare them to the overall average runtime, rather than using the Dashboard or the reports that come with Management Studio. After a little searching, I found that the SSISDB.internal.execution_info table contains what I needed.

-- Filter by date and package name
DECLARE @begindate DATE = GETDATE() - 7
DECLARE @pkgname VARCHAR(100) = 'Your Package Name' -- Ex: Patient_Update.dtsx

-- Package execution time
SELECT folder_name, project_name, package_name,
CAST(start_time AS datetime) AS start_time,
DATEDIFF(MINUTE, start_time, end_time) AS 'execution_time[min]'
FROM SSISDB.internal.execution_info
WHERE start_time >= @begindate
AND package_name = @pkgname -- Comment out this line for all packages
ORDER BY start_time

-- Average package execution time
SELECT folder_name, project_name, package_name,
AVG(DATEDIFF(MINUTE, start_time, end_time)) AS 'Avg execution_time[min]'
FROM SSISDB.internal.execution_info
WHERE start_time >= @begindate
AND package_name = @pkgname -- Comment out this line for all packages
GROUP BY folder_name,project_name,package_name

You can also use the catalog.executions view to get the same data but since I saw the internal table, I jumped in. This was tested on SQL 2016 but I think it will still work in earlier versions. Hope this helps someone out!
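If you would rather stay off the internal tables, the equivalent query against the supported catalog.executions view would look something like this (same date filter; column names per the SSISDB catalog views):

-- Same runtime calculation using the supported catalog view
SELECT folder_name, project_name, package_name,
CAST(start_time AS datetime) AS start_time,
DATEDIFF(MINUTE, start_time, end_time) AS [execution_time_min]
FROM SSISDB.catalog.executions
WHERE start_time >= DATEADD(DAY, -7, GETDATE())
ORDER BY start_time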

For more information about blog posts, concepts and definitions, further explanations, or questions you may have…please contact us at SQLRx@sqlrx.com. We will be happy to help! Leave a comment and feel free to track back to us. Visit us at www.sqlrx.com!

Large MSDB From Hidden sysxmitqueue Table

— by Lori Brown @SQLSupahStah

Recently, I was checking on some of the servers that I manage and found that one had an 8GB msdb database. While that is not too bad, it is kind of large, so I figured that there must be something that needed to be cleaned up. I ran the Disk Usage by Top Tables report expecting a huge table to leap out at me. However, this time there was nothing. As a matter of fact, it looked like the largest table only had about 187MB of data in it. So, what in the world was going on?

I did some looking around and found that there is a hidden and undocumented table named sysxmitqueue that can sometimes become huge if you have event notifications set up. If there are errors in how the notification is set up, there will be lots of error messages in the hidden table and the transmission_queue view.

Run this query to find out if you have msdb bloat from the sysxmitqueue table:

USE msdb

GO

-- check for size of sysxmitqueue table

SELECT object_name(i.object_id) AS TableName,

i.[name] AS IndexName,

(sum(a.total_pages)*8)/1024 AS TotalSpaceMB,

(sum(a.used_pages)*8)/1024 AS UsedSpaceMB,

(sum(a.data_pages)*8)/1024 AS DataSpaceMB

FROM sys.indexes i

INNER JOIN sys.partitions p

ON i.object_id = p.object_id

AND i.index_id = p.index_id

INNER JOIN sys.allocation_units a

ON p.partition_id = a.container_id

WHERE object_name(i.object_id) = 'sysxmitqueue'

GROUP BY i.object_id, i.[name]

GO

In my case, I found that this table had almost 8GB of data in it. Bingo!! Gotta clean it up now. Once again after doing some research, I found that lots of people have had problems with this table. Some say to set a new broker like so:

-- DON'T USE THIS IN PRODUCTION!!

ALTER DATABASE msdb SET NEW_BROKER WITH ROLLBACK IMMEDIATE

But that will cause problems if you do it on a production system because the service broker id will be different from the one being used by anything else that has distributed conversations. If you do this, you could leave conversations stranded.

There is also a way to end conversations that takes a little time but will not cause issues if you are careful. I could not query sysxmitqueue at all, but the info from that table also shows up in the sys.transmission_queue (https://msdn.microsoft.com/en-us/library/ms190336.aspx) view, which you can query easily.

-- get list of conversations

SELECT DISTINCT conversation_handle, to_service_name

FROM sys.transmission_queue

I used the above query to get a list of the conversations that needed to be cleaned up. The to_service_name column is the service name you created while setting up event notifications. You can use values from the conversation_handle column to tell an “END CONVERSATION…” statement (https://msdn.microsoft.com/en-us/library/ms177521.aspx) which conversation exactly to end.

-- end open conversations

END CONVERSATION '4E5EF6AE-90F8-E311-B9CF-842B2B514244' WITH CLEANUP

If you have a lot of open conversations, running this manually can take a while but you won’t nuke anyone else’s conversations this way.
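If there are hundreds of stranded conversations, one option is to let SQL generate the cleanup statements for you and then run them in reviewed batches; a minimal sketch:

-- Generate an END CONVERSATION statement for every stranded conversation handle
SELECT DISTINCT 'END CONVERSATION ''' + CAST(conversation_handle AS varchar(36)) + ''' WITH CLEANUP;'
FROM sys.transmission_queue;
-- Copy the result set into a query window, review it, and execute it in manageable chunks.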

The last thing I need to do is figure out why my event notification is not ending conversations correctly. That will be another post later so….

blog_20161013_1

For more information about blog posts, concepts and definitions, further explanations, or questions you may have…please contact us at SQLRx@sqlrx.com. We will be happy to help! Leave a comment and feel free to track back to us. Visit us at www.sqlrx.com!

SQL Server Statistics and Trace Flag 2371

— by Lori Brown @SQLSupahStah

Statistics are pretty important to SQL Server. Statistics are small objects that hold information of the distribution of data in a table. The SQL optimizer uses this information to create efficient ways to get data from the table. So, if statistics are not taken care of SQL may have a hard time running queries. It really needs this info to do its job.

I am a fan of updating statistics at least once a day. I prefer to use sp_updatestats (https://msdn.microsoft.com/en-us/library/ms173804.aspx) since it will by default skip tables that have not had any changes. Sp_updatestats will, however, update stats if there have been ANY changes to a table, so some people think that it can incur too much overhead (CPU, IO), but it has served most databases well. However (here comes the "But, but, but…"), if you don’t have a long enough window to run sp_updatestats, or if the default sample of 25% is not enough to give an accurate distribution of the data, then you need to come up with other solutions…especially for large tables. Large tables are always a problem when trying to set up maintenance.
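For example, a simple nightly pattern might combine sp_updatestats for the bulk of the database with targeted, fuller samples for the handful of very large or skewed tables (the table name below is just a placeholder):

-- Database-wide pass; skips tables with no modifications
EXEC sp_updatestats;
-- Heavier, targeted pass for a known large or skewed table (placeholder name)
UPDATE STATISTICS dbo.MyLargeTable WITH FULLSCAN;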

If you have AUTO_UPDATE_STATISTICS enabled on your database (as you should, unless you have a prevailing reason not to), statistics are updated automatically when enough data has changed and if a table is large enough. Here is what triggers an automatic update of stats:

  • The table size has gone from 0 to >0 rows
  • The number of rows in the table when the statistics were gathered was 500 or less, and the colmodctr (hidden system column) of the leading column of the statistics object has changed by more than 500 since then
  • The table had more than 500 rows when the statistics were gathered, and the colmodctr (hidden system column) of the leading column of the statistics object has changed by more than 500 + 20% of the number of rows in the table when the statistics were gathered

If you think about this a bit, you can see how this is pretty good for small tables, but large tables…those might run into a sticky wicket. When you have a really large table, it can take a significant number of changes before statistics are automatically updated.
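To put numbers on it: under the default rule above, a 20-million-row table needs roughly 500 + (20% x 20,000,000) = 4,000,500 modifications to the leading statistics column before an automatic update fires, and that many changes can take a very long time to accumulate.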

blog_20160921_1

While researching some things in SQL 2016, I found that I have been unaware of trace flag 2371. In SQL 2016, TF 2371 is now on by default. However, as far back as SQL 2008 R2, you can set this trace flag and get the benefits.

So, what is it? Trace flag 2371 is used to lower the threshold that tells SQL to automatically update statistics. This is especially important on larger tables, where statistics updates are not triggered fast enough because of the high number of changes that normally must accumulate before SQL considers statistics to be old. Old stats and SQL do not get along well. Apparently this was a HUGE problem for SAP, and it is the reason that the trace flag exists. With TF 2371, the threshold that triggers an automatic statistics update goes down dynamically, meaning there will be more frequent stats updates for larger tables, which should really help SQL create efficient execution plans. Yay!
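With the trace flag, the commonly cited threshold for large tables is roughly the square root of 1,000 times the table’s row count, so the same 20-million-row table would trigger an update after about SQRT(1,000 x 20,000,000) ≈ 141,000 changes instead of roughly 4 million. On versions prior to SQL 2016, enabling it is a global operation (illustrative commands):

-- Enable trace flag 2371 globally (or add -T2371 as a startup parameter so it persists across restarts)
DBCC TRACEON (2371, -1);
-- Confirm it is active
DBCC TRACESTATUS (2371);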

blog_20160921_2

By now you must be thinking…I am gonna turn on TF 2371 everywhere!!!

blog_20160921_3

But, I really would not recommend it. If you have not been having performance problems without TF 2371 then you probably don’t need it. If it ain’t broke, don’t fix it! Turning on trace flags means that you are telling SQL to not use the regular paths through the SQL engine. When you turn on a trace flag it is a global setting and cannot be disabled for specific databases. And, if you have to open a ticket with Microsoft, they are likely going to be asking questions about why you have it turned on. Okay…so the risk on this one seems pretty low but you really should have a reason to enable it.

Sadly, the only way to know whether statistics are the cause of slow queries seems to be to run into performance issues and then notice that performance improves when statistics are updated. You can take the time to capture execution plans both before and after the stats update so that you can verify, but that is about it.

Don’t forget that in SQL 2016, TF 2371 is enabled by default. That could be something that helps you decide to move to it.

I have to give a trackback to Kendra Little for an excellent post on statistics. You can get more on whys and ways to take care of your statistics from http://www.littlekendra.com/2016/04/18/updating-statistics-in-sql-server-maintenance-answers/

For more information about blog posts, concepts and definitions, further explanations, or questions you may have…please contact us at SQLRx@sqlrx.com. We will be happy to help! Leave a comment and feel free to track back to us. Visit us at www.sqlrx.com!

Setting up Disconnected Log Shipping

— by Lori Brown @SQLSupahStah

I have a few clients who have set up SQL instances to hold copies of databases from either vendors or other companies with whom they are actively sharing data. In these cases, since the source databases are on servers that are part of another company’s domain, the best way to make sure that data is kept up to date is to set up what I call disconnected log shipping. It is disconnected because the primary and secondary SQL instances have no access to each other, so setting up regular log shipping will not work.

The owners of the primary instance, in each of the times I have worked on something like this, have set up regular uploads of transaction log backups to an FTP site. My task is to download the .trn files and apply them to the database on our secondary instance. So far, I have had good luck using WinSCP (https://winscp.net/eng/index.php) and setting up some custom code to create my own log shipping, complete with a monitoring job that will alert if logs have not been applied within a specified time. WinSCP is free (but you really should consider donating) and has a very good knowledge base that will show you how to use it fully.

For the record, I know that I can use SQL’s log shipping stored procedures to set up things too but each time I tried, something has gone wrong forcing me to come up with a workaround. For instance, in my latest attempt, the restore job would constantly fail saying that something was NULL that was expecting a date. I don’t have the error anymore so I am doing that one from memory. No matter how I changed things in the log shipping tables, I could not get past the error. After several days of no luck with the SQL log shipping procedures, I took a big step back and decided to just go around it.

After doing some research to see if anyone had posted a better way of setting up and monitoring disconnected log shipping, I found that almost everyone talks about using SQL’s functionality and there is just not much documented that covers how to set up your own solution. So, here is what I did.

WinSCP

First of all, I need to start with a disclaimer that I am NOT a WinSCP scripting expert. I was fortunate at one project where something like this had already been created so I used that for a template. For assistance on WinSCP, please look through their forums and online help.

Download and install WinSCP. I used all defaults and things are working great. Open WinSCP and use it to connect to the FTP site that contains the tlogs and full backups that need to be restored on your SQL instance.

blog_20160915_1

Make sure to put the host key in the cache because you will need it later. You can download your full database backups then so that you will be ready to restore and get log shipping set up.

You will need some folders set up to hold your .trn files, WinSCP scripts and other things. I set up a FTPImport folder and subfolders to hold scripts, archives, the downloaded tlogs and some WinSCP logs too.

blog_20160915_2

You can see that I have two scripts in the Scripts folder. Call_WinSCP.cmd is set up with the path to the WinSCP executable and provides paths to scripts and logs for troubleshooting. In mine I am only giving the path to the scripts. I tried to provide comments on each command so that you will know what each line is going to do, so read through things carefully before trying the same on your server. I will eventually set up a job that will call the batch file and tell the batch file to use Get_SQLRXDB_TlogBackups.scp.

Call_WinSCP.cmd

@Echo Off

REM Script to call WinSCP from Current Folder

REM Parameter %1 is name of WinSCP script file to run

REM Parameter %2 is uniquifier for log file names (e.g. date string YYYYMMDDHHmm)

 

Echo Running WinSCP with command Script

 

Date /t

Time /t

 

Echo WinSCP Command /script="E:\FTPImport\Scripts\%1.scp".

Echo WinSCP Command /Log="E:\FTPImport\WinSCPLogs\%2.log"

C:\"Program Files (x86)\WinSCP\winscp.com" /script="E:\FTPImport\Scripts\%1.scp"

 

Date /t

Time /t

 

Get_SQLRXDB_TlogBackups.scp

# WINSCP script file to connect to FTP site and download current T Log Backups

 

# Set Script Options for this transfer

Option echo on

Option batch on

Option confirm off

Option transfer binary

 

# Import Folder for T-Log Backups = E:\FTPImport\TLogs

# Source Folder for T-Log Backups = company.ftpsite.com /LogShipping/SQLRXDB

# Account Name = MyFTPAccount

# Password = $tr0ngPwdH3r3

 

#echo Connect to server

Open sftp://MyFTPAccount:$tr0ngPwdH3r3@company.ftpsite.com -hostkey="ssh-rsa 2048 xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx"

 

#echo change remote working directory to folder with the log backups and list contents

cd LogShipping/SQLRXDB

pwd

 

#echo Change Local working directory to folder for log backups and list contents

lcd E:\FTPImport\TLogs\SQLRXDB

lpwd

 

#echo copy and then delete the T-Log files from remote location

get -delete SQLRXDB_backup_*.trn

 

#echo Close Connection and exit

Close

Exit

Set up Log Shipping

Now that I have WinSCP ready, I just need to do some normal set up for Log Shipping. That means that I have to restore my database, set up a job that will download tlog backup files from the FTP site, set up a job that will restore the downloaded tlog backup files and finally, set up a job that will let someone know if log shipping is out of sync.

First, restore the downloaded backup of the database and leave it in norecovery so that the transaction log backups can be applied to it later.

-- Restore database with norecovery
USE [master]
GO
RESTORE DATABASE [SQLRXDB]
FROM DISK = N'E:\FTPImport\FullBackups\SQLRXDB_full_backup.bak' WITH FILE = 1,
MOVE N'SQLRXDB_Data' TO N'E:\MSSQL\DATA\SQLRXDB.mdf',
MOVE N'SQLRXDB_Log' TO N'F:\MSSQL\LOGS\SQLRXDB_log.ldf',
NORECOVERY, NOUNLOAD, REPLACE, STATS = 5
GO

Next, set up a “copy” job that is very similar to the one created by the Log Shipping stored procedures. The copy job calls WinSCP using a CmdExec step and tells it to use the Get_SQLRXDB_TlogBackups.scp script. The command to call WinSCP is pretty straightforward.

Call E:\FTPImport\Scripts\Call_WinSCP.cmd Get_SQLRXDB_TLogBackups Get_TLogs_WinSCPSession > E:\FTPImport\WinSCPLogs\Call_WinSCP_JobStep.log

And, here is the job that I created. At this client, they want .trn files downloaded every hour.

-- Create job to download tlogs from FTP site
USE [msdb]
GO

/****** Object: Job [SQLRXDB_GetTLogs]   ******/
BEGIN TRANSACTION
DECLARE @ReturnCode INT
SELECT @ReturnCode = 0

DECLARE @jobId BINARY(16)
EXEC @ReturnCode = msdb.dbo.sp_add_job @job_name=N'SQLRXDB_GetTLogs',
@enabled=0,
@notify_level_eventlog=0,
@notify_level_email=0,
@notify_level_netsend=0,
@notify_level_page=0,
@delete_level=0,
@description=N'Download SQLRXDB tlogs from FTP site',
@category_name=N'Log Shipping',
@owner_login_name=N'sa', @job_id = @jobId OUTPUT
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
/****** Object: Step [Get SQLRXDB Tlog Files]   ******/
EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'Get SQLRXDB Tlog Files',
@step_id=1,
@cmdexec_success_code=0,
@on_success_action=1,
@on_success_step_id=0,
@on_fail_action=2,
@on_fail_step_id=0,
@retry_attempts=0,
@retry_interval=0,
@os_run_priority=0, @subsystem=N'CmdExec',
@command=N'Call E:\FTPImport\Scripts\Call_WinSCP.cmd Get_SQLRXDB_TLogBackups Get_TLogs_WinSCPSession > E:\FTPImport\WinSCPLogs\Call_WinSCP_JobStep.log',
@output_file_name=N'E:\FTPImport\SQLRXDB_GetTLogs.log',
@flags=32,
@proxy_name=N'RunCmdExec'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=N'SQLRXDB_GetTlogs',
@enabled=1,
@freq_type=4,
@freq_interval=1,
@freq_subday_type=8,
@freq_subday_interval=1,
@freq_relative_interval=0,
@freq_recurrence_factor=0,
@active_start_date=20160813,
@active_end_date=99991231,
@active_start_time=500,
@active_end_time=235959
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
COMMIT TRANSACTION
GOTO EndSave
QuitWithRollback:
   IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION
EndSave:

GO

I have some code that I have been using for a long time to read the contents of a folder and do things with the files it finds. So, in this case I am using the code below to read the files that were downloaded from FTP and make a list so that some dynamic SQL can try to restore those files to the database. I put in some error handling code that will hopefully tell me if a specific file cannot be applied for troubleshooting purposes. In my case, the client wants to use the database for reporting so I am restoring the files and then placing the database in STANDBY. Then as a final thing so that only newer files will be worked on, I move the restored files to an archive folder.

SET NOCOUNT ON

-- Variable declarations
DECLARE @CMD1 VARCHAR(5000)
DECLARE @CMD2 VARCHAR(5000)
DECLARE @CMD3 NVARCHAR(1000)
DECLARE @FilePath VARCHAR(200)
DECLARE @ArchivePath VARCHAR(200)
DECLARE @UndoFilePath VARCHAR(200)
DECLARE @File2Restore VARCHAR(128)
DECLARE @File2Move VARCHAR(128)
DECLARE @RestoreStmt NVARCHAR(1000)
DECLARE @error_msg VARCHAR(500)
DECLARE @ErrMsg VARCHAR(500)

-- Create the #OriginalFileList temporary table to support the un-cleansed file list
CREATE TABLE #OriginalFileList (
Col1 VARCHAR(1000) NULL
)

-- Create the #ParsedFileList temporary table to support the cleansed file list
CREATE TABLE #ParsedFileList (
PFLID INT PRIMARY KEY IDENTITY (1,1) NOT NULL,
DateTimeStamp DATETIME NOT NULL,
FileSize VARCHAR(50) NOT NULL,
FileName1 VARCHAR (255) NOT NULL
)

-- Initialize the variables
SELECT @CMD1 = ''
SELECT @CMD2 = ''
SELECT @CMD3 = ''
SELECT @FilePath = 'E:\FTPImport\TLogs\SQLRXDB\'
SELECT @ArchivePath = 'E:\FTPImport\Archives\SQLRXDB\'
SELECT @UndoFilePath = 'E:\MSSQL\Backup\ROLLBACK_UNDO_SQLRXDB.BAK'

-- Build the string to capture the file names in the restore location
SELECT @CMD1 = 'master.dbo.xp_cmdshell ' + char(39) + 'dir ' + @FilePath + '\*.*' + char(39)

-- Build the string to populate the #OriginalFileList temporary table
SELECT @CMD2 = 'INSERT INTO #OriginalFileList(Col1)' + char(13) + 'EXEC ' + @CMD1

-- Execute the string to populate the #OriginalFileList table
EXEC (@CMD2)

-- Delete unneeded data from the #OriginalFileList
DELETE FROM #OriginalFileList
WHERE COL1 IS NULL

DELETE FROM #OriginalFileList
WHERE COL1 LIKE '%Volume%'

DELETE FROM #OriginalFileList
WHERE COL1 LIKE '%Directory%'

DELETE FROM #OriginalFileList
WHERE COL1 LIKE '%<DIR>%'

DELETE FROM #OriginalFileList
WHERE COL1 LIKE '%bytes%'

-- Populate the #ParsedFileList table with the final data
INSERT INTO #ParsedFileList (DateTimeStamp, FileSize, FileName1)
SELECT LTRIM(SUBSTRING (Col1, 1, 20)) AS 'DateTimeStamp',
LTRIM(SUBSTRING (Col1, 21, 18)) AS 'FileSize',
LTRIM(SUBSTRING (Col1, 40, 1000)) AS 'FileName1'
FROM #OriginalFileList

DECLARE TLogs2Restore CURSOR FOR
SELECT FileName1 FROM #ParsedFileList ORDER BY FileName1 ASC

-- Open the cursor
OPEN TLogs2Restore

FETCH NEXT FROM TLogs2Restore INTO @File2Restore

-- Loop through
WHILE (@@FETCH_STATUS = 0) BEGIN
BEGIN TRY
       SET @RestoreStmt = 'RESTORE LOG [SQLRXDB] FROM DISK = N'''+@FilePath+@File2Restore+''' WITH STANDBY = '''+@UndoFilePath+''''
       --print @RestoreStmt
       EXEC sp_executesql @RestoreStmt
END TRY
BEGIN CATCH
       SET @error_msg = error_message()
       SET @ErrMsg = 'Log file '+@File2Restore+' cannot be restored due to error: '+@error_msg
       PRINT @ErrMsg
END CATCH
FETCH NEXT FROM TLogs2Restore INTO @File2Restore

END

-- Close and deallocate the cursor
CLOSE TLogs2Restore
DEALLOCATE TLogs2Restore

-- Move restored tlogs to archive folder
DECLARE TLogs2Archive CURSOR FOR
SELECT FileName1 FROM #ParsedFileList ORDER BY FileName1 ASC

-- Open the cursor
OPEN TLogs2Archive

-- Loop through
WHILE (1=1)
BEGIN
       FETCH NEXT FROM TLogs2Archive INTO @File2Move
       IF @@FETCH_STATUS < 0 BREAK

       SET @CMD3 = 'EXEC master.dbo.xp_cmdshell ''ROBOCOPY.EXE '+@FilePath+' '+@ArchivePath+' '+@File2Move+' /MOV /XO /NP'''
       --print @CMD3
       EXEC sp_executesql @CMD3

END

-- Close and deallocate the cursor
CLOSE TLogs2Archive
DEALLOCATE TLogs2Archive

-- Drop the temporary tables
DROP TABLE #OriginalFileList
DROP TABLE #ParsedFileList

SET NOCOUNT OFF
GO

And, here is the above code put into a job….

-- create job to restore downloaded tlogs and archive the tlogs when done
USE [msdb]
GO

/****** Object: Job [SQLRXDB_RestoreTlogs]   ******/
BEGIN TRANSACTION
DECLARE @ReturnCode INT
SELECT @ReturnCode = 0

DECLARE @jobId BINARY(16)
EXEC @ReturnCode = msdb.dbo.sp_add_job @job_name=N'SQLRXDB_RestoreTlogs',
@enabled=0,
@notify_level_eventlog=0,
@notify_level_email=0,
@notify_level_netsend=0,
@notify_level_page=0,
@delete_level=0,
@description=N'No description available.',
@category_name=N'Log Shipping',
@owner_login_name=N'sa', @job_id = @jobId OUTPUT
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
/****** Object: Step [Restore]   ******/
EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'Restore',
@step_id=1,
@cmdexec_success_code=0,
@on_success_action=1,
@on_success_step_id=0,
@on_fail_action=2,
@on_fail_step_id=0,
@retry_attempts=0,
@retry_interval=0,
@os_run_priority=0, @subsystem=N'TSQL',
@command=N'SET NOCOUNT ON

-- Variable declarations
DECLARE @CMD1 VARCHAR(5000)
DECLARE @CMD2 VARCHAR(5000)
DECLARE @CMD3 NVARCHAR(1000)
DECLARE @FilePath VARCHAR(200)
DECLARE @ArchivePath VARCHAR(200)
DECLARE @UndoFilePath VARCHAR(200)
DECLARE @File2Restore VARCHAR(128)
DECLARE @File2Move VARCHAR(128)
DECLARE @RestoreStmt NVARCHAR(1000)
DECLARE @error_msg VARCHAR(500)
DECLARE @ErrMsg VARCHAR(500)

-- Create the #OriginalFileList temporary table to support the un-cleansed file list
CREATE TABLE #OriginalFileList (
Col1 VARCHAR(1000) NULL
)

-- Create the #ParsedFileList temporary table to support the cleansed file list
CREATE TABLE #ParsedFileList (
PFLID INT PRIMARY KEY IDENTITY (1,1) NOT NULL,
DateTimeStamp DATETIME NOT NULL,
FileSize VARCHAR(50) NOT NULL,
FileName1 VARCHAR (255) NOT NULL
)

-- Initialize the variables
SELECT @CMD1 = ''''
SELECT @CMD2 = ''''
SELECT @CMD3 = ''''
SELECT @FilePath = ''E:\FTPImport\TLogs\SQLRXDB\''
SELECT @ArchivePath = ''E:\FTPImport\Archives\SQLRXDB\''
SELECT @UndoFilePath = ''E:\MSSQL\Backup\ROLLBACK_UNDO_SQLRXDB.BAK''

-- Build the string to capture the file names in the restore location
SELECT @CMD1 = ''master.dbo.xp_cmdshell '' + char(39) + ''dir '' + @FilePath + ''\*.*'' + char(39)

-- Build the string to populate the #OriginalFileList temporary table
SELECT @CMD2 = ''INSERT INTO #OriginalFileList(Col1)'' + char(13) +
''EXEC '' + @CMD1

-- Execute the string to populate the #OriginalFileList table
EXEC (@CMD2)

-- Delete unneeded data from the #OriginalFileList
DELETE FROM #OriginalFileList
WHERE COL1 IS NULL

DELETE FROM #OriginalFileList
WHERE COL1 LIKE ''%Volume%''

DELETE FROM #OriginalFileList
WHERE COL1 LIKE ''%Directory%''

DELETE FROM #OriginalFileList
WHERE COL1 LIKE ''%<DIR>%''

DELETE FROM #OriginalFileList
WHERE COL1 LIKE ''%bytes%''

-- Populate the #ParsedFileList table with the final data
INSERT INTO #ParsedFileList (DateTimeStamp, FileSize, FileName1)
SELECT LTRIM(SUBSTRING (Col1, 1, 20)) AS ''DateTimeStamp'',
LTRIM(SUBSTRING (Col1, 21, 18)) AS ''FileSize'',
LTRIM(SUBSTRING (Col1, 40, 1000)) AS ''FileName1''
FROM #OriginalFileList

DECLARE TLogs2Restore CURSOR FOR
SELECT FileName1 FROM #ParsedFileList ORDER BY FileName1 ASC

-- Open the cursor
OPEN TLogs2Restore

FETCH NEXT FROM TLogs2Restore INTO @File2Restore

-- Loop through
WHILE (@@FETCH_STATUS = 0) BEGIN
BEGIN TRY
       SET @RestoreStmt = ''RESTORE LOG [SQLRXDB] FROM DISK = N''''''+@FilePath+@File2Restore+'''''' WITH STANDBY = ''''''+@UndoFilePath+''''''''
       --print @RestoreStmt
       EXEC sp_executesql @RestoreStmt
END TRY
BEGIN CATCH
       SET @error_msg = error_message()
       SET @ErrMsg = ''Log file ''+@File2Restore+'' cannot be restored due to error: ''+@error_msg
       PRINT @ErrMsg
END CATCH
FETCH NEXT FROM TLogs2Restore INTO @File2Restore

END

-- Close and deallocate the cursor
CLOSE TLogs2Restore
DEALLOCATE TLogs2Restore

-- Move restored tlogs to archive folder
DECLARE TLogs2Archive CURSOR FOR
SELECT FileName1 FROM #ParsedFileList ORDER BY FileName1 ASC

-- Open the cursor
OPEN TLogs2Archive

-- Loop through
WHILE (1=1)
BEGIN
       FETCH NEXT FROM TLogs2Archive INTO @File2Move
       IF @@FETCH_STATUS < 0 BREAK

       SET @CMD3 = ''EXEC master.dbo.xp_cmdshell ''''ROBOCOPY.EXE ''+@FilePath+'' ''+@ArchivePath+'' ''+@File2Move+'' /MOV /XO /NP''''''
       --print @CMD3
       EXEC sp_executesql @CMD3

END

-- Close and deallocate the cursor
CLOSE TLogs2Archive
DEALLOCATE TLogs2Archive

-- Drop the temporary tables
DROP TABLE #OriginalFileList
DROP TABLE #ParsedFileList

SET NOCOUNT OFF
',
@database_name=N'master',
@flags=0
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=N'RestoreSched',
@enabled=1,
@freq_type=4,
@freq_interval=1,
@freq_subday_type=8,
@freq_subday_interval=1,
@freq_relative_interval=0,
@freq_recurrence_factor=0,
@active_start_date=20160824,
@active_end_date=99991231,
@active_start_time=1500,
@active_end_time=235959
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
COMMIT TRANSACTION
GOTO EndSave
QuitWithRollback:
IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION
EndSave:

GO

The last thing I need is something that will tell me that transaction logs have not been restored. This is supposed to resemble the LS_Alert job that is normally created for Log Shipping. In my client’s case, they wanted to know if data was out of synch by more than 2 hours. Since regular log shipping is not set up and nothing is automatically being logged, I instead use xp_readerrorlog to find messages showing that logs have been restored for the target database within a 2 hour window. If it does not find those messages in the logs, then an e-mail notification is sent.

-- create a homegrown log shipping monitoring job to let you know when it is out of synch

USE [msdb]

GO

 

/****** Object: Job [SQLRXDB LogShipping Monitor]   ******/

BEGIN TRANSACTION

DECLARE @ReturnCode INT

SELECT @ReturnCode = 0

 

DECLARE @jobId BINARY(16)

EXEC @ReturnCode = msdb.dbo.sp_add_job @job_name=N'SQLRXDB LogShipping Monitor',

@enabled=0,

@notify_level_eventlog=0,

@notify_level_email=2,

@notify_level_netsend=0,

@notify_level_page=0,

@delete_level=0,

@description=N'Scrapes the SQL log looking for log restored statements. If none found in the past N hours email alert will be sent.',

@category_name=N'Log Shipping',

@owner_login_name=N'sa',

@job_id = @jobId OUTPUT

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

/****** Object: Step [Monitor Log Shipping]   ******/

EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'Monitor SQLRXDB Log Shipping',

@step_id=1,

@cmdexec_success_code=0,

@on_success_action=1,

@on_success_step_id=0,

@on_fail_action=2,

@on_fail_step_id=0,

@retry_attempts=0,

@retry_interval=0,

@os_run_priority=0, @subsystem=N'TSQL',

@command=N'DECLARE @TimeStart DATETIME;

DECLARE @TimeEnd DATETIME;

DECLARE @CntLogRestoredMsgs INT;

DECLARE @msgstr VARCHAR(500);

SET @TimeStart = DATEADD(hh, -2, GETDATE()); -- 2 hours

SET @TimeEnd = GETDATE();

 

CREATE TABLE #sqlerrlog

       (PK_Col int IDENTITY(1,1) PRIMARY KEY

       , LogDate datetime

       , ProcessInfo varchar(15)

       , [Text] varchar(max) );

-- read the current sql server log

INSERT INTO #sqlerrlog

       EXEC xp_readerrorlog 0, 1, N''Log was restored'', N''SQLRXDB'', @TimeStart, @TimeEnd;

-- read archive #1 sql server log in case of recent restart

INSERT INTO #sqlerrlog

       EXEC xp_readerrorlog 1, 1, N''Log was restored'', N''SQLRXDB'', @TimeStart, @TimeEnd;

 

SELECT @CntLogRestoredMsgs = COUNT(*) FROM #sqlerrlog

 

IF @CntLogRestoredMsgs = 0

BEGIN

       SET @msgstr = ''No transaction logs have been restored for the SQLRXDB database on OurServerName in the past N hours.''

                   + CHAR(10) + ''Investigate as soon as possible!''

 

       EXEC msdb.dbo.sp_send_dbmail

             @profile_name = ''DBMail'',

             @recipients = ''Admins@company.com'',

             @subject = ''OurServerName Log Shipping Alert'',

             @body = @msgstr

 

END

 

-- cleanup

DROP TABLE #sqlerrlog

 

',

@database_name=N'master',

@flags=0

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=N'LSMonitorSchedule',

@enabled=1,

@freq_type=4,

@freq_interval=1,

@freq_subday_type=8,

@freq_subday_interval=1,

@freq_relative_interval=0,

@freq_recurrence_factor=0,

@active_start_date=20160710,

@active_end_date=99991231,

@active_start_time=80000,

@active_end_time=20001

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

COMMIT TRANSACTION

GOTO EndSave

QuitWithRollback:

IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION

EndSave:

 

GO

And…BADABING!! You now have disconnected log shipping set up. Set off the jobs and work out any path or naming issues. Once I had all the pieces going, it continued to work like a charm! I hope that someone finds this helpful, since I could not find anywhere that had all parts of this process documented.

For more information about blog posts, concepts and definitions, further explanations, or questions you may have…please contact us at SQLRx@sqlrx.com. We will be happy to help! Leave a comment and feel free to track back to us. Visit us at www.sqlrx.com!

Table-Valued Parameter Performance Using C# Data Tables

The following testing and resulting data are a direct result of a recent issue that was discovered at one of our client sites.  After setting up SQL monitoring that uses a SQL trace to capture long-running queries, suddenly certain processes would lock up and be unable to complete.  Initially we thought it was everything except the SQL trace causing the issue, until we finally turned off all trace captures and the problem went away.  This was very concerning to us since either SQL trace or XEvents is the method we generally use to capture query performance. Without a fix, our performance tuning attempts would be severely hamstrung and slowed down. This problem had to be fixed yesterday!

Another side effect that would happen when things locked up was that despite having the max server memory set, SQL would end up using substantially more memory than was allocated to it.  At this client, developers were importing data from an application using C# and table-valued parameters, which should not have been an issue. After a lot of searching, we finally came across some comments by another SQL expert who mostly described our issue and recommended that any C# TVPs should have their columns defined or SQL could lock up if either trace or XEvents completed events are captured.  When we checked the C# code we found that the string columns being imported were not defined, in an effort to make the import code flexible.  We were able to definitively fix this very obscure problem with just one line of C# code that is shown in the post below.  The end result is a FASTER import of data that can be captured in trace or XEvents.  Nice!!

Since we had to put together this fix for a very undocumented issue, we sincerely hope that any DBA or developer who runs into this problem can correct it much faster than we did.  A link to download a zip file of all code displayed in this post is included at the end.  Enjoy!

********************************************

Table-Valued Parameter Performance Using C# Data Tables

— By Jeff Schwartz

Overview

Much has been written about the use of table-valued parameters (TVPs) and their performance. These narratives appear to be contradictory because some state that TVPs should be used only for small datasets, whereas others assert that TVPs scale very well. Adding to the confusion is the fact that most of the performance-related discussions focus on TVP use within T-SQL. Many of these examples employ either SQL trace or Extended Events (XEvents) to evaluate performance, but almost none of them discusses what can occur when C# data tables and SQL trace or XEvents are used simultaneously. This paper details the analysis and resolution of a situation that occurred at a customer site when an existing C# application suddenly began to run extremely slowly after a very lightweight SQL trace, which utilized a several-minute duration threshold and captured no frequent events, was activated.

Initial Research

Preliminary research indicated that under very specific conditions the combination of TVPs and SQL trace could result in unintended performance consequences, e.g., excessive SQL Server memory consumption to the point where SQL Server exhausted all memory on the server regardless of the maximum memory setting value. The research also suggested that string usage and lengths in unspecified locations might have an impact on this issue. The question of whether TVPs might have some record-count performance threshold above which performance would degrade was also examined. Although bulk insert was not used by the application, the research team also decided to compare bulk and TVP insertion speeds because that is another source of conflicting information.

Since very little literature exists that discusses the combination of TVPs and SQL trace, the only recourse involved creating test frameworks, performing actual data loads, and using internal application timers in conjunction with either SQL trace or XEvents to monitor performance.

Test Frameworks

1. Two frameworks were constructed: T-SQL-only and C# calling a T-SQL stored procedure.

a. The T-SQL-only test loaded data into a TVP, which then called a stored procedure to perform the insertion into the final table.

b. The development team provided C# code that illustrated how the user application read a flat file into a C# data table, which was then passed as a TVP to a T-SQL stored procedure. This code was followed as closely as possible in the test framework.

2. A SQL Server table containing data captured from the sys.dm_db_index_operational_stats Dynamic Management View (DMV) supplied the data for testing because

a. The data table contained approximately 3.2 million records, making it large enough to stress the application code and SQL Server adequately. The largest data file provided by the application team was approximately 1.6 million records, so the testing could push well beyond current record levels.

b. Each data record contained 47 columns, which was wider in terms of data columns than any application table being loaded.

c. No string value columns existed in the data. This insured that user string data could not affect any interaction between the C# application and SQL Server, and that any string-related behavior was strictly due to the manner in which the application loaded the data table or the TVP, or the way the TVP parameter-passing was recorded in the SQL trace. All data columns were smallint, int, bigint, or date. This also insured exact record lengths so that any record count-related behavior was independent of record size. Some of the data files being loaded by the development team contained only fixed-width data types, whereas others contained at least a few variable-length string columns.

3. The frameworks were designed to load various numbers of records using the same starting point to determine whether a performance threshold existed for TVPs, regardless of whether SQL trace (or XEvents) was active, as well as to determine whether even small numbers of records could be processed with the SQL trace (or XEvents) active.

4. Thirty-one different record levels, shown in Table 1, were used for testing to insure any record-count gaps were small. A separate CSV file of each length was created from the data cited in #2, always beginning with the first record.

Blog_20160901_1

Table 1: Record Counts used in Testing

5. Several methods of loading the flat file CSV data into C# data tables were developed and tested. The C# data table configurations were as follows:

a. Unlimited string length columns (initially used in the production C# application) – herein noted as NoStringMax (String #N/A on graphs)

b. Fixed length string columns of at most 30, 100, 500, or 1000 characters each – herein noted as StringMax (String <#> on graphs). Note: the development team stated that no column would exceed 1,000 characters in length.

c. Exact data type mapping so that the data table column types matched those of the receiving SQL Server table exactly – herein noted as Map

d. A common routine was used by options a, b, and c above to send the data table to SQL Server using a TVP.

e. The section of code that loaded the data table from the flat file was timed separately from the routine cited in d, which also was surrounded by its own timers. This enabled comparison of data load, TVP passing, and combined times.

f. The name of the CSV file is passed in as a parameter along with an adjusted version of the file name so the appropriate test identifiers can be written to the application log file for later analysis.

6. The tests were run with the following SQL Server monitoring options:

a. No SQL Server monitoring at all.

b. SQL trace monitoring including all normally captured events except for completed ones. These were handled as follows:

i. Exclude all completed events

ii. Include ad-hoc and batch completed events with a two-second duration threshold

iii. Include ad-hoc and batch completed events as well as completed statement events with a two-second duration threshold

c. SQL Extended Events (XEvents) including all normally captured events except for completed ones. These were handled as follows:

i. Exclude all completed events

ii. Include ad-hoc and batch completed events with a two-second duration threshold

iii. Include ad-hoc and batch completed events as well as completed statement events with a two-second duration threshold

7. All tests were performed on the same server to eliminate network and hardware variability. This server had a quad-core Intel(R) Core(TM) i7-4700MQ CPU @ 2.40GHz processor, 32 GB of RAM, and a 500 GB SSD HD.

8. Various combinations of cold and warm SQL Server caches, cold and warm Windows caches, and SQL Server maximum memory limit sizes were tested. Some of the earlier attempts will be discussed in the Results section below. The final specifications were as follows:

a. No applications other than SQL Server, SSMS, and the C# application were active on the test server.

b. SQL Server’s maximum memory limit was set to 12 GB. This allowed SQL to go over its limit without distorting overall system behavior by exhausting server memory entirely.

c. Hot Windows file cache, i.e., a full run that loaded ALL of the flat file data into RAM was performed before the timing tests were run. This insured that the HD was involved minimally.

d. Hot SQL Server buffer cache, achieved by running the next series soon after the last run completed (with no intervening runs), and by running each sequence four times in immediate succession.

C# Code

The NoStringMax routine cited in #5a in the previous section is shown in Figure 1.

Blog_20160901_2

Figure 1: Test C# Data Table Load Routine Using No String Length Maximums

The StringMax routine cited in #5b in the previous section is shown in Figure 2. The ONLY difference between this routine and the previous one is the addition of the MaxLength assignment that is highlighted below. This single line sets the maximum length of each data table column to the value of the iMaxStringLen parameter. As cited in #5b, this value ranged from 30 to 1,000 during testing.

Blog_20160901_3

Figure 2: Test C# Data Table Load Routine Using String Length Maximums
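For readers who want to see the shape of such a routine in code, here is a minimal sketch of a data-table load that caps string column lengths. It is not the author’s production routine: the CSV layout and the names LoadCsv, csvPath, and iMaxStringLen are assumptions. The essential detail is the MaxLength assignment on each string column, which is the single line that separates the StringMax and NoStringMax variants.

// Minimal sketch (not the author's production code): load a CSV flat file into a
// DataTable whose string columns are capped at iMaxStringLen characters.
// Removing the MaxLength assignment yields the NoStringMax variant discussed above.
using System;
using System.Data;
using System.IO;

public static class TvpLoadSketch
{
    public static DataTable LoadCsv(string csvPath, int iMaxStringLen)
    {
        var table = new DataTable("FlatFileData");

        using (var reader = new StreamReader(csvPath))
        {
            // Assume the first line of the (hypothetical) CSV contains column headers.
            string headerLine = reader.ReadLine() ?? throw new InvalidDataException("Empty file");
            foreach (string columnName in headerLine.Split(','))
            {
                DataColumn col = table.Columns.Add(columnName.Trim(), typeof(string));
                col.MaxLength = iMaxStringLen;   // the one line that distinguishes StringMax from NoStringMax
            }

            // Naive split; a production loader would handle quoted fields and embedded commas.
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                table.Rows.Add(line.Split(','));
            }
        }

        return table;
    }
}

Calling LoadCsv with an iMaxStringLen of 1000, for example, would cap every string column at 1,000 characters, the upper bound the development team cited in #5b.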

The Map routine cited in #5c in the previous section is shown in Figure 3 through Figure 6. Logically, this code is identical to that of #5a and #5b, but the individual column mappings make the routine much longer. Note: since no strings are used for the data table columns, the MaxLength parameter is unnecessary.

Blog_20160901_4

Figure 3: Test C# Data Table Load Routine Using Data Type Mapping – Part 1

Blog_20160901_5

Figure 4: Test C# Data Table Load Routine Using Data Type Mapping – Part 2

Blog_20160901_6

Figure 5: Test C# Data Table Load Routine Using Data Type Mapping – Part 3

Blog_20160901_7

Figure 6: Test C# Data Table Load Routine Using Data Type Mapping – Part 4
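For comparison, a hedged sketch of the Map approach follows. The column names are invented for illustration and do not correspond to the author’s 47-column table; the point is simply that each DataColumn is created with the .NET type that matches the receiving SQL Server column, so no string columns (and therefore no MaxLength assignments) are involved.

// Minimal sketch of the Map approach: DataTable column types mirror the SQL Server
// table exactly (smallint -> short, int -> int, bigint -> long, date -> DateTime).
// Column names below are placeholders, not the author's actual schema.
using System;
using System.Data;

public static class TvpMapSketch
{
    public static DataTable CreateMappedTable()
    {
        var table = new DataTable("IndexOperationalStats");

        table.Columns.Add("database_id", typeof(short));      // smallint
        table.Columns.Add("object_id", typeof(int));          // int
        table.Columns.Add("range_scan_count", typeof(long));  // bigint
        table.Columns.Add("capture_date", typeof(DateTime));  // date
        // ...one Columns.Add call per remaining column; the author's table had 47 in total

        return table;
    }
}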

The routine cited in #5d in the previous section is shown in Figure 7.

Blog_20160901_8

Figure 7: C# Test Routine that Sends Data Table to SQL Server Using a TVP
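The general ADO.NET pattern that such a send routine relies on is sketched below: the DataTable is attached to a SqlParameter whose SqlDbType is Structured and whose TypeName names a user-defined table type on the SQL Server side. The connection string, table type name (dbo.IndexStatsType), and stored procedure name (dbo.InsertIndexStats) are placeholders rather than the author’s objects.

// Minimal sketch of sending a DataTable to SQL Server as a table-valued parameter.
using System.Data;
using System.Data.SqlClient;

public static class TvpSendSketch
{
    public static void SendTable(DataTable table, string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("dbo.InsertIndexStats", conn))  // hypothetical procedure name
        {
            cmd.CommandType = CommandType.StoredProcedure;

            // SqlDbType.Structured tells ADO.NET to pass the DataTable as a TVP.
            SqlParameter tvp = cmd.Parameters.AddWithValue("@IndexStats", table);
            tvp.SqlDbType = SqlDbType.Structured;
            tvp.TypeName = "dbo.IndexStatsType";  // hypothetical user-defined table type

            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}

This assumes a matching CREATE TYPE dbo.IndexStatsType AS TABLE (…) definition and a stored procedure that accepts a parameter of that type declared READONLY.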

Results

1. The T-SQL code alone, cited in Test Frameworks #1a, did not recreate the problem observed at the customer site when either SQL trace or Extended Events (XEvents) was used, so additional testing was unnecessary.

2. No problems occurred when the C# application was used in conjunction with SQL trace or XEvents as long as no completed events were captured. Therefore, only the capture of specific trace events created the problems.

3. Further research showed that when the C# code was running, adding ad-hoc or batch completed events to SQL traces or XEvent sessions caused issues. Adding completed statement events did not change things appreciably.

4. Interestingly, the extreme misbehavior was triggered by the combination of using the NoStringMax C# code and having SQL trace or XEvents capture ad-hoc or batch completed events. Although the StringMax and Map routines ran a little more slowly at certain record count levels when SQL trace or XEvents captured ad-hoc or batch completed events, the dramatic memory consumption issues did not occur at all when these methods were used.

5. Initially, testing employed a 28 GB maximum SQL Server memory setting, but as occurred at the customer site, when the problem arose, SQL Server consumed all the memory on the server, which caused the server fundamentally to stop working. Using this setting and the NoStringMax C# code, CSV files with record counts up to 25,000 could be processed without taking hours, failing, or causing the server to run completely out of memory. However, the application could not reach the 50,000 record level.

6. Since testing needed to process MUCH higher numbers of records, the decision was made to reduce SQL Server memory to 4 GB in an attempt to provide additional memory space for SQL Server memory overflow. Another reason for lowering the SQL Server memory allocation was to insure that the tests consumed all of SQL Server’s buffer pool and that SQL Server had to operate against memory limits. This consideration, as well as the run-time variations, necessitated the four successive iterations for each combination.

7. Unfortunately, using the 4 GB setting, the lower-end NoStringMax C# code runs while capturing SQL trace ad-hoc or batch completed events caused application failures at approximately the 4,000 or 8,000 record levels. This clearly indicated an increase in SQL Server memory was required, so it was increased to 8 GB. Although the runs processed more records, they still crashed well before the 25,000 record level. Finally, moving the memory limit up to 12 GB enabled everything to run as before, effectively mimicking the 28 GB testing without running the risk of exhausting Windows memory.

8. Figure 8 highlights the NoStringMax C# code runs through the 25,000 record level while SQL trace or XEvents ad-hoc or batch completed events were being captured. It is evident that the run times were unacceptable. When the same test levels are viewed in Figure 9, the problem becomes abundantly clear. With SQL trace or XEvents capturing ad-hoc or batch completed events, the run times for the 10,000 record tests using the NoStringMax C# code ranged between 483 and 584 seconds. When ad-hoc or batch completed events were not captured, the run times were approximately 0.14 SECONDS! The values for all other tests, including NoStringMax C# code with SQL trace and XEvents ad-hoc or batch completed events off, are shown in Figure 9.

Blog_20160901_9

Figure 8: TVP Load Timing when Trace On or XEvents on

9. Although many lines appear in Figure 9, three groupings exist and these are the most important. Group 1, which consists of dotted lines, includes all tests during which ad-hoc or batch completed events were not captured. Group 2, which consists of solid lines, includes all tests during which SQL trace captured ad-hoc or batch completed events. Group 3, which consists of dashed lines, includes all tests during which XEvents captured ad-hoc or batch completed events. Important note: The NoStringMax runs shown in Figure 8 are NOT in Figure 9 because of scaling considerations. Figure 9 highlights several notable issues. Most importantly, once a maximum string length is specified, performance improves even when SQL trace or XEvents ad-hoc or batch completed events are not captured. In addition, the terrible performance problems go away. Interestingly, the behaviors of the various StringMax and Map runs were almost identical through approximately 175,000 records and then again at about 800,000 records. In between, unexplained divergence occurs, but it is only a few seconds and occurred primarily between the 200,000 and 700,000 record levels. The pattern shown in Figure 9 was repeated in every test sequence conducted. It is noteworthy that the StringMax and Map versions of the routine outperformed the NoStringMax under all comparable trace or XEvent-related conditions, and at higher volumes, even when completed events were captured.

Blog_20160901_10

Figure 9: TVP Load Timing when Trace/XEvents off OR Trace or XEvents on and Mapped Data Table or Limited-Length String Values Used

10. Once the number of records exceeded two million, the load of the C# data table began to crash due to application memory exhaustion. Note: neither Windows nor SQL Server memory was consumed excessively during this phase, so the limitation was strictly application-related.

11. Figure 10 summarizes the overall insertion rates of all previously cited methods at the two million record level, in addition to the bulk insert method. This graph shows clearly that the worst performer was the unlimited string method and that the best one was the bulk insert method. Note: these rates exclude the reading of the flat file into the C# data table as well as the creation of the file that was suitable for bulk insertion.

Blog_20160901_11

Figure 10: TVP versus Bulk Insert Records per Second

Conclusions

The use of unlimited-length C# strings with data tables not only performs worse when no SQL trace or XEvents ad-hoc or batch completed events are being captured, it performs dreadfully when they are. Their use can also cause poor overall server performance if the server’s memory is exhausted. Finally, their use prevents monitoring SQL Server query performance with either SQL trace or XEvents. However, when a maximum string length was specified, regardless of its length, performance without SQL trace or XEvents improved and the problematic interaction with SQL trace or XEvents was mitigated almost completely. Since the Map method is inefficient and limiting from an application development perspective, and its performance was not substantially better than that of limited-length strings, its use does not appear to be beneficial.

No table-valued parameter scalability issues were observed with the possible exception of the 200,000 to 700,000 record range when performance inexplicably, and consistently, dipped. However, from 800,000 to 2 million, performance experienced no such dip. Interestingly, standard bulk insert performance exceeded that of TVPs when the same data was used.

Recommendations

Since the code to implement the maximum-length string involves only one line and the performance monitoring and improvement benefits are substantial, the <data table column>.MaxLength = <n> assignment should be included for each string column in data table load routines, as shown by the highlighted line in Figure 2. <n> should be the maximum expected length of any data column. Limiting string length will allow C# TVP data loads to run more efficiently and will also enable customers to conduct normal query performance monitoring.

********************************************

The code shown in this post can be downloaded from here….  SQLRX_TVP_CSharp_Code

For more information about blog posts, concepts and definitions, further explanations, or questions you may have…please contact us at SQLRx@sqlrx.com. We will be happy to help! Leave a comment and feel free to track back to us. Visit us at www.sqlrx.com!

Finding overlapping times with the LAG Function

— by Jeffry Schwartz

Some functions in SQL are a little difficult to use and understand. I was recently using the LAG function to shed some light on a performance issue, so I thought I would show how I used it and the results it produced in a way that is easy to understand.  At one of our clients, after evaluating SQL trace data, I noticed that the attention records (caused by problems with the application, connections, etc.) not only occurred around some non-yielding scheduler messages that had been appearing along with timeouts, but also that the attention records were overlapping, i.e., sometimes one began and then others occurred before the first one ended.  I’ve looked at the attention data from other clients and found that not only do they not overlap like this, they don’t last long enough to overlap, meaning that this is a special case.

To show clearly which ones overlap, I decided to use the LAG function to compare the end time of an older record with the start time of the current one.  If the start time of the current record occurs before the end time of a previous record, the flag is set.  This way I don’t have to scrutinize the timestamps and compare them manually.  I partition by ServerName because I want to compare all records from the same instance, and I order by StartTime so that when I compare adjacent records, they are in time order.  You can partition and order by anything you want.

Here is the query, which is generic enough to be understood easily.  I’ve included the timestamps themselves so you can see how it works.

SELECT StartTime, EndTime,

CASE WHEN StartTime < LAG(EndTime,1,0) OVER (PARTITION BY ServerName ORDER BY StartTime) THEN 1 ELSE 0 END AS [Overlap Lag 1],

CASE WHEN StartTime < LAG(EndTime,2,0) OVER (PARTITION BY ServerName ORDER BY StartTime) THEN 1 ELSE 0 END AS [Overlap Lag 2],

CASE WHEN StartTime < LAG(EndTime,3,0) OVER (PARTITION BY ServerName ORDER BY StartTime) THEN 1 ELSE 0 END AS [Overlap Lag 3],

CASE WHEN StartTime < LAG(EndTime,4,0) OVER (PARTITION BY ServerName ORDER BY StartTime) THEN 1 ELSE 0 END AS [Overlap Lag 4],

CASE WHEN StartTime < LAG(EndTime,5,0) OVER (PARTITION BY ServerName ORDER BY StartTime) THEN 1 ELSE 0 END AS [Overlap Lag 5],

LAG(EndTime,1,0) OVER (PARTITION BY ServerName ORDER BY StartTime) AS [lag EndTime1],

LAG(EndTime,2,0) OVER (PARTITION BY ServerName ORDER BY StartTime) AS [lag EndTime2],

LAG(EndTime,3,0) OVER (PARTITION BY ServerName ORDER BY StartTime) AS [lag EndTime3],

LAG(EndTime,4,0) OVER (PARTITION BY ServerName ORDER BY StartTime) AS [lag EndTime4],

LAG(EndTime,5,0) OVER (PARTITION BY ServerName ORDER BY StartTime) AS [lag EndTime5],

Duration / 1000.0 AS [Duration Secs], Duration / 60000.0 AS [Duration Minutes]

--, SPID, ClientProcessID, RowNumber,

--ServerName AS [Instance], LTRIM(ApplicationName) AS [Application],

--DBName AS [Database], HostName, NTUserName, LoginName AS [Login Name]

FROM dbo.SQLTraceReceiver WITH (NOLOCK)

INNER JOIN SQLStaticDatabaseInfo ON DatabaseID = [DBID]

WHERE SQLTraceReceiver.EventClass = 16 -- Attention

 

Here is the output:

Blog_20160526_1

A timestamp of 12:00:00 AM really means there was no previous record to compare against: the default value of 0 supplied to LAG converts to the base datetime of January 1, 1900, which displays as 12:00:00 AM.  I highlighted the overlaps to make them easier to see.  Obviously, you may not need to use LAG for time evaluations as I do, but this shows how to compare values of adjacent records and report something about the comparison without having to display the actual lag values.  I left the [lag EndTime<n>] columns in the query so you could see it in action, but for the final query I would not include them and would add back in the descriptive columns that are commented out.

LAG takes a little getting used to, but once you play with it a bit as in this example, it makes perfect sense.  These functions came out in SQL Server 2012, so if you are using that version or higher, you are good to go.  Note:  I could have used LEAD to go the other way, i.e., to compare future records with the current record.

For more information about blog posts, concepts and definitions, further explanations, or questions you may have…please contact us at SQLRx@sqlrx.com. We will be happy to help! Leave a comment and feel free to track back to us. Visit us at www.sqlrx.com!