Monthly Archives: August 2015

AlwaysOn – Monitoring and Alerting

–by Ginger Keys

AlwaysOn Availability Groups is a great feature of SQL Server 2012 and 2014. It combines the capabilities of database mirroring and failover clustering to provide a strong level of high availability and disaster recovery; however, there is no straightforward monitoring/alerting process built in to go along with this functionality.

It is important for you to 1) observe the current state and health of your Availability Group, but more importantly 2) get alerted if things go wrong.

You need to know if your AG is healthy, online, synchronizing, and available. And you might need to keep track of issues with failover, and the overall health of the servers and the Availability Group.

There are a few different methods to monitor your system, and also to alert you to problems. I have tried to piece together a broad approach to monitoring/alerting. These methods can be configured and altered to include more, less, or different data, depending upon your organization's and system's specific needs. But this is a good starting point that you can alter as needed.


SSMS Availability Group Dashboard –

The first method you can use to observe the state of your AlwaysOn Availability Group is the AlwaysOn Dashboard.

Below is the AlwaysOn Availability Group I have created:


SQL Management Studio (SSMS) provides a Dashboard tool to monitor the current state and health of your Availability Groups (AG). Simply right click on the Availability Groups folder in Object Explorer, and select “Show Dashboard” to get an overview of the state of your AG.


In this Dashboard, you can see which replicas are primary/secondary, the databases in the AG, the failover mode, and if they are online and connected to the AG. If there are any issues, a link will appear in the Issues column which will help you troubleshoot.

The synchronization state column indicates the method of data replication. If the state is 'synchronized', the data is replicating synchronously to the secondary replica (the primary waits for the secondary to harden the log before moving on to the next task). If the state is 'synchronizing', the data is replicating asynchronously: the primary replica does not wait for the secondary to harden the log, so there is potential for data loss.
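If you find a replica running in the wrong mode, you can change it with ALTER AVAILABILITY GROUP. A minimal sketch, assuming a group named [MyAG] and a replica named SQLSRV2 (both names are hypothetical):

```sql
-- Switch a replica to synchronous-commit mode (group and replica names are hypothetical)
ALTER AVAILABILITY GROUP [MyAG]
MODIFY REPLICA ON N'SQLSRV2'
WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT);
```

Run this on the primary replica; the dashboard's synchronization state column should reflect the change once the secondary catches up.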

Notice in my dashboard above, I have included the columns Estimated Data Loss and Estimated Recovery Time. These two measures are not included in the default view, but you can add them by right clicking on the column heading and choosing additional columns to include. Estimated Data Loss tells us the time difference between the last transaction log record on the primary replica and on the secondary replica. If the primary replica fails, the transaction log records in that time window will be lost. Estimated Recovery Time tells us the time, in seconds, it will take for the secondary replica to harden its logs and catch up with the primary replica. For organizations that are highly transaction sensitive and have no tolerance for data loss, these are important metrics to keep an eye on.
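The dashboard derives these metrics from DMV data. A rough equivalent, assuming you run it from the primary replica, is to look at the send and redo queue sizes in sys.dm_hadr_database_replica_states (this is a sketch, not the dashboard's exact calculation):

```sql
-- Approximate the dashboard's data-loss and recovery-time metrics from the DMVs
SELECT ar.replica_server_name,
       DB_NAME(drs.database_id) AS database_name,
       drs.last_commit_time,       -- time of the last committed transaction on the replica
       drs.log_send_queue_size,    -- KB of log not yet sent; maps to data loss exposure
       drs.redo_queue_size,        -- KB of log not yet redone; maps to recovery time
       drs.redo_rate               -- KB/sec redo throughput on the secondary
FROM sys.dm_hadr_database_replica_states drs
INNER JOIN sys.availability_replicas ar ON drs.replica_id = ar.replica_id
WHERE drs.is_local = 0
ORDER BY ar.replica_server_name;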

The AG Dashboard in SSMS will not notify you of problems…you will only see problems through AG Dashboard if you happen to be looking at the dashboard at the exact time a problem occurs. This tool is only beneficial for seeing the current state and health of your AlwaysOn Group.

The AlwaysOn Dashboard is easy to use and very intuitive. It queries several DMVs and produces a nice, easy-to-read report for you to see the current state of your AlwaysOn Group. However, producing this nice report can create a performance hit on your server.

If you want to gather information about the current health of your Availability Group directly through DMVs instead of opening the Dashboard, you can run the following script to derive roughly the same information:

select cluster_name, quorum_type_desc, quorum_state_desc
from sys.dm_hadr_cluster

select ar.replica_server_name,
ars.role_desc,
ars.operational_state_desc,
ars.synchronization_health_desc,
CASE ars.connected_state
WHEN 0 THEN 'Disconnected'
WHEN 1 THEN 'Connected'
END as ConnectionState
from sys.dm_hadr_availability_replica_states ars
inner join sys.availability_replicas ar on ars.replica_id = ar.replica_id
and ars.group_id = ar.group_id

select distinct rcs.database_name,
ar.replica_server_name,
drs.synchronization_state_desc,
CASE rcs.is_failover_ready
WHEN 0 THEN 'Data Loss'
WHEN 1 THEN 'No Data Loss'
END as FailoverReady
from sys.dm_hadr_database_replica_states drs
inner join sys.availability_replicas ar on drs.replica_id = ar.replica_id
and drs.group_id = ar.group_id
inner join sys.dm_hadr_database_replica_cluster_states rcs on drs.replica_id = rcs.replica_id
order by ar.replica_server_name


Dynamic Management Views –

The second method you can use to observe the current state of your AlwaysOn Availability Groups is querying dynamic management views (DMVs). SQL Server provides several DMVs to monitor the state of your AlwaysOn Availability Group that will give you information about your AG cluster, networks, replicas, databases, and listeners.

















These DMVs are all explained in the SQL Server documentation. You can query any of them to gather information about your AG such as configuration, health status, and the overall condition of your Availability Group.

Just an FYI…AlwaysOn AG catalog views require View Any Definition permission on the server instance. AlwaysOn Availability Groups dynamic management views require View Server State permission on the server.
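Granting those permissions to a monitoring account is a one-liner each. A minimal sketch, where the login name is hypothetical and should be replaced with your monitoring account:

```sql
-- Grant the permissions needed to query the AlwaysOn catalog views and DMVs
-- (the login name below is hypothetical)
USE master;
GRANT VIEW ANY DEFINITION TO [DOMAIN\MonitoringUser];
GRANT VIEW SERVER STATE TO [DOMAIN\MonitoringUser];
```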


SQL Server Agent Alerts

The best method for creating alerts for your AG that will notify you as soon as any problem or event occurs is to use SQL Server Agent alerts. These alerts are a great way to be proactive in monitoring your AG, and there are several alerts specifically related to AlwaysOn Availability Groups. To find which error codes correspond to an AG event, you can run this query:

use master

select message_id as ErrorNumber, text
from sys.messages
where text LIKE ('%availability%')
and language_id = 1033

This will give you a result set with 293 rows. You can peruse the list and determine which errors are important for you, but I have compiled a list of what we feel is the most important information to be alerted on:



In order to create a SQL Server Agent alert, you must have Database Mail configured and enabled, and you must have an operator created. For instructions on how to do this, see the SQL Server documentation on Database Mail and operators.

You can create your alerts using TSQL by executing the following script, and substituting the Name of Alert and Operator that is applicable to your environment:

-- 1480 - AG Role Change (failover)
EXEC msdb.dbo.sp_add_alert
@name = N'[Name of Alert]',
@message_id = 1480,
@severity = 0,
@enabled = 1,
@delay_between_responses = 0,
@include_event_description_in = 1;

EXEC msdb.dbo.sp_add_notification
@alert_name = N'[Name of Alert]',
@operator_name = N'[Operator]',
@notification_method = 1;


Or you can create your alerts using the New Alert Wizard. In SSMS, expand the SQL Server Agent, and right click on Alerts. Select New Alert:


Create a name for your Alert, specify the Error Number, and make sure the alert is Enabled:


On the Responses page of the alert properties, specify the operator(s) you created earlier to receive the alerts:


On the Options page of the alert properties, specify how (or if) you would like to receive the alert error message text (email, pager, net send).


Be sure to create these alerts on each of your replicas.
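After setting up the alerts on every replica, it is worth verifying that each server actually has them wired to an operator. A quick sketch against the msdb system tables:

```sql
-- Verify which alerts exist on this replica and who gets notified
SELECT a.name AS AlertName,
       a.message_id,
       a.enabled,
       o.name AS OperatorName
FROM msdb.dbo.sysalerts a
LEFT JOIN msdb.dbo.sysnotifications n ON a.id = n.alert_id
LEFT JOIN msdb.dbo.sysoperators o ON n.operator_id = o.id
ORDER BY a.message_id;
```

Any alert with a NULL OperatorName has no notification attached and will fire silently.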




It’s certainly important to be able to observe the current state of your AlwaysOn Availability Group, and it’s even more important to be notified when an issue occurs that requires your attention. The methods outlined above provide a comprehensive approach, allowing you to be proactive to ensure your system is healthy.

If you would like assistance monitoring your AlwaysOn Group, or your SQL Servers and databases, please contact us. We would be happy to answer any questions you have, and would be interested to hear about your experiences with AlwaysOn!


For more information about blog posts, concepts and definitions, further explanations, or questions you may have, please contact us. We will be happy to help! Leave a comment and feel free to track back to us. We love to talk tech with anyone in our SQL family!

The Pros & Cons of Hyperthreading

— By Jeff Schwartz

Hyperthreading is one of those features that most IT personnel have heard about, but many do not understand exactly what it does or why it may improve or degrade the performance of their servers. Although hyperthreaded processors have been available for many years, the confusion surrounding it is as commonplace today as it was when the technology was introduced in 2002. Server, processor, and motherboard manufacturers have contributed to the confusion by activating the feature by default for several years after it was introduced, then deactivating it by default for a few years, and recently by activating it again by default. In addition, for many years, determining whether it was on or off could be accomplished only by reviewing the BIOS settings during a reboot, which is extremely difficult to schedule for most production servers.

What is Hyperthreading?

Hyperthreading involves having two distinct pipelines into a single processor execution core. Research into code sequences showed that code executing on a processor frequently, and naturally, pauses because of things such as main memory fetches into the processor cache and improper code branch prediction. The rationale is to provide the execution portion of an idling core processor with work to do while these kinds of activities are handled. These two pipelines are presented to the outside world as two distinct processors: one physical and one “logical.” Therefore, as far as any operating system is concerned, twice as many processors are available when hyperthreading is active compared to when it is not.

Why not use it all the time?

In theory, utilizing available processor cycles is a great idea and therefore, this mechanism should be used all the time. However, in practice, some problems arise. Two main issues are involved: processor schedulers and the length of time a unit of work stays on a processor without interruption. Since the execution portion of a processor is still a single entity, scheduling appropriately is critically important, i.e., two threads from the same process should NOT be scheduled on the same core. If they are, each thread will take turns stopping the other and this degrades performance. The difficulty, particularly in the SQL Server environment, is that SQL Server schedules its own threads and, depending upon the SQL Server version as well as the Windows version on which it is running, SQL Server may or may not be aware of the physical/logical distinction or know how to handle it properly. Therefore, there is a good chance that it may schedule two dependent threads on the same core. The other issue is that the threads of a large or parallelized SQL Server query usually consume their full time allotment on a processor before they are suspended. Any thread sitting in the other pipe of a hyperthreaded processor will wait the full time before it is allowed to run again. If that thread is from the same query, the two threads will stop each other from running and make the query run substantially longer. Therefore, under normal conditions, most SQL Server customers do not benefit from employing hyperthreading.

One type of workload, high volume small transaction processing, DOES benefit from hyperthreading, but even then it does not benefit as much as one might expect. Benchmarks and actual user experiences have demonstrated that at most, a 30 percent gain can be obtained when transactions are extremely small, e.g., they execute in under 200 milliseconds. Therefore, if a server only handles small high volume transactions regardless of whether they are inquiry, insert, or update, it will benefit from hyperthreading. As soon as any large queries arrive at the server, the benefit is gone and the problems cited in the previous paragraph arise.

How can I tell whether Hyperthreading is active?

Historically, this has been difficult to determine, but in the past few years, Windows and WMI have improved to provide this information without interrogating the BIOS. Older versions did not provide adequate or accurate information, but beginning with Windows 2008, the information is quite useful. This results from the fact that older versions of Windows either did not know or report the physical/logical processor distinction. The following PowerShell (other languages that can access WMI will work also) commands provide the necessary information. The first command returns the number of sockets on the system followed by the number of logical processors, and the second command returns information regarding the processors themselves. The first output was obtained on a quad-core single-socket system with hyperthreading turned off, and the second output was obtained from the same system with hyperthreading turned on. Clearly, the number of logical processors doubles when hyperthreading is turned on.

gwmi win32_computersystem | fl NumberOfProcessors, NumberOfLogicalProcessors

gwmi win32_processor | fl NumberOfCores, NumberOfLogicalProcessors
Hyperthreading can be useful in a SQL Server environment if the workload is comprised solely of extremely lightweight transactions. Under these conditions, the most one can expect to obtain from hyperthreading is a 30 percent improvement. If the server is running the latest versions of Windows and SQL Server, it may be possible to obtain some improvement for workloads that are highly transactional with a few small reports mixed in. However, in most SQL Server environments, hyperthreading will most likely degrade performance.
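You can also check from inside SQL Server itself. A sketch using sys.dm_os_sys_info; note that hyperthread_ratio reports logical CPUs per physical socket (cores times threads per core), so on its own it cannot distinguish hyperthreading from a higher core count, and you still need the core count from the OS to be certain:

```sql
-- Logical CPU count and logical CPUs per physical socket as SQL Server sees them
SELECT cpu_count,                                    -- total logical processors
       hyperthread_ratio,                            -- logical processors per socket
       cpu_count / hyperthread_ratio AS socket_count -- number of physical sockets
FROM sys.dm_os_sys_info;
```

Compare hyperthread_ratio against the NumberOfCores value from the WMI query above: if the ratio is double the core count, hyperthreading is active.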

Find Permission Changes In The Default Trace

— By Lori Brown  @SQLSupahStah

I recently had someone contact me who needed to figure out when a user’s database permissions were changed. Apparently, a domain account that had been working for months doing things in a specific database suddenly could not do its normal work. On checking, we found that all of the user’s permissions had been removed from the database. The permissions were reset but management wanted to know when and who changed things. All the people with access to assign permissions were asked about it and everyone swore that no changes had been made. Uh huh…

Fortunately, the default trace in SQL Server does contain auditing events for when objects and logins are changed.


A quick query of the default trace files using fn_trace_getinfo and fn_trace_gettable will give you what you need.

DECLARE @tracefile VARCHAR(500)

-- Get path of default trace file
SELECT @tracefile = CAST(value AS VARCHAR(500))
FROM ::fn_trace_getinfo(DEFAULT)
WHERE traceid = 1
AND property = 2

-- Get security changes from the default trace
SELECT trcdata.StartTime, evt.name AS EventName, trcdata.DatabaseName,
trcdata.ObjectName, trcdata.LoginName, trcdata.TextData
FROM ::fn_trace_gettable(@tracefile, DEFAULT) trcdata  -- DEFAULT means all trace files will be read
INNER JOIN sys.trace_events evt
ON trcdata.EventClass = evt.trace_event_id
WHERE trcdata.EventClass IN (102, 103, 104, 105, 106, 108, 109, 110, 111)
ORDER BY trcdata.StartTime



More info on fn_trace_getinfo:

More info on fn_trace_gettable:


Tip of the Month: Resource Governor for I/O in SQL 2014

Available in SQL Server 2014 Enterprise Edition

The Resource Governor can now restrict the number of I/Os issued to the disk subsystem. In the past, only CPU and memory could be throttled, but now DBAs can also throttle I/O. This gives DBAs the ability to further control those users who insist on running reports or other queries that can bring all other activity on the instance to a grinding halt.

Here is what you can now do with the Resource Governor:

  • Set constraints on physical I/O operations
  • Throttle I/O by controlling the number of I/Os issued to the disk subsystem
  • Govern both reads and writes of any size
  • Limit the physical I/Os issued for user threads in a resource pool

MAX_OUTSTANDING_IO_PER_VOLUME = sets the maximum number of outstanding I/O operations per disk volume. It can act as a safety net even when MAX_IOPS_PER_VOLUME and MIN_IOPS_PER_VOLUME have been set.




MAX_IOPS_PER_VOLUME and MIN_IOPS_PER_VOLUME = the maximum and minimum I/O operations per second per disk volume, respectively
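Putting the settings together, a minimal sketch of throttling a reporting workload might look like this (the pool and workload group names are hypothetical, and you would still need a classifier function to route sessions into the group):

```sql
-- Cap a reporting workload at 200 IOPS per volume (names below are hypothetical)
CREATE RESOURCE POOL ReportPool
WITH (MIN_IOPS_PER_VOLUME = 10, MAX_IOPS_PER_VOLUME = 200);

CREATE WORKLOAD GROUP ReportGroup
USING ReportPool;

-- Apply the new configuration
ALTER RESOURCE GOVERNOR RECONFIGURE;
```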



More info can be found at: