Thursday, April 10, 2014

SQL Sentry v8: Baselines from Every Angle

In my last post I covered the exciting new intelligent alerting features in SQL Sentry v8. Next up in the series is another huge and often requested feature: Baselining

Designing this feature proved to be somewhat of a challenge due to the diversity of opinions on this topic. It seems that everyone has a different idea about baselining – what it is, and how it should be used effectively to improve performance. So, as is often the case, we tried to look beyond the term and assess what people truly need to accomplish with this feature. We knew that what we built had to satisfy a variety of use cases, be simple to use, and integrate seamlessly with the rest of our software.

To pull it off, we've leveraged both preexisting and new SQL Sentry functionality to produce what I believe is the most comprehensive and intuitive baselining for SQL Server and Windows to date. With the new Performance Advisor baselining you can:

  • Compare the current date range against built-in temporal baselines to quickly ascertain patterns in counter behavior
  • Create baselines for performance metrics using source data from any date range
  • Create any number of baselines with friendly names and easily switch between them
  • Send alerts or take various other actions in response to deviations from baseline values

Built-in Baselines

On the Performance Advisor Dashboard you'll see a new Baseline dropdown in the upper right. It is pre-populated with several temporal baselines:

BaselineDropdown

Select one of these baselines and you'll see a moving line or min/max range overlaid on top of the current charts:

Baseline-PreviousDay-Range
(click to enlarge)

In the shot above I've selected Previous Day with a Range style overlay. You can see patterns emerge on several charts, and it turns out that the spikes in waits, CPU, etc., appear to be normal behavior for that time of day. Of course, normal doesn't necessarily mean acceptable ;-) Fortunately, PA lets you ascertain the root cause of most any spike, whether from SQL Server or some other Windows process, so you quickly can make a determination about acceptability.

Note that the built-in baselines listed will change depending on the size of the active date range. For example, if you are in a 3-day range, Previous Day doesn't make sense, so you won't see it. Also, the Range option is currently only available when viewing detail data (range <= 30 minutes).

Custom Baselines

In addition to the built-in baselines, you can create your own custom baselines! To do so, simply click and drag on any chart to highlight a date range, then select the Create Baseline context menu. Here I'll create a baseline called "Business Hours" which represents normal and acceptable performance during that time:

Baseline-Create

Next, a screen appears which shows the Average, Min, Max and Standard Deviation for each dashboard metric over the selected range. From there you can choose which metric is shown by default on multi-metric charts, as well as manually adjust the baseline value by editing the Average column:

Baseline-Create-Metrics
(click to enlarge)

Once saved, the baseline is added to the Baseline dropdown as selected, and a horizontal line appears on each chart showing the baseline value for each metric. The line provides a reference point so you can easily spot unusual activity. Here we see that at around 10:10AM something started impacting disk waits, CPU, PLE, and page reads, causing them to deviate from the normal business hours level:

Baseline-User
(click to enlarge)

From here you can highlight the range and Jump To other areas like Top SQL, Processes or the Event Manager Calendar to get at the root cause.

There are many use cases like this for custom baselines. Here are a few others:

  • Create a baseline prior to making a configuration change on the server, then use it to immediately monitor how the change impacts performance.
  • Create a baseline for intensive maintenance operations like integrity checks, index rebuilds, or backups, and periodically review the operation against its baseline to ensure it is still performing as expected. I've seen many times where a change to the network or the SAN by some other group can have a dramatic impact on maintenance ops, and this is a great way to catch this!
  • Create periodic date-based baselines, such as "Jan 2014", "Feb 2014", etc., and use them to assess how performance is changing over time.

You can create as many custom baselines as you want, and edit them at any time by clicking the build button to the right of the dropdown. For multi-metric charts, you can change the default metric shown without editing the baseline by right-clicking it in the legend.

Baselines and Alerting

These visualization capabilities are great, but the feature wouldn't be complete without the ability to generate alerts using baseline data. That's where the other new SQL Sentry functionality comes in: when a custom baseline is created at the Windows computer or SQL Server level, it becomes available for use in custom conditions at that level.

Using baselines in a custom condition is easy – instead of setting an explicit threshold value on the right side of a comparison, select Performance Counter Baseline, and its value for the metric is automatically substituted. This way if a baseline value ever changes, you won't need to update any conditions.

In the condition below, for demonstration purposes I've added 3 separate baseline tests with some basic math to control how large of a deviation from the baseline value is required to trigger the condition:

Baseline-CustomCondition
(click to enlarge)

The system knows to use the baseline value for the counter on the left side of the comparison. Early versions required full declaration of the counter on the right, but this took too long to configure and made the comparisons far too wide, so we came up with this context-aware shorthand.

You can select a baseline on the left side, but you'll forego the shorthand when doing so. It generally reads easier to have the counter on the left and threshold on the right anyway.

Baselines vNext

As you can see, we're doing quite a bit with baselines out of the gate... but as I said in my last post on v8, we are just getting started! We're already working on some very cool enhancements to make baselines even more useful. Meantime, I hope you take advantage of these new capabilities, and as always, your feedback is appreciated.

Download SQL Sentry v8 here: New Users | Existing Users

Friday, March 14, 2014

SQL Sentry v8: Intelligent Alerting Redefined

It's tempting to insert some clever reference to everyone's favorite vegetable juice medley here, but I will refrain, for fear of diminishing what is perhaps the most significant SQL Sentry release to date! In v8 we've addressed the three most frequent requests from our users – performance alerting, baselining, and cloud access – and we've done so in typical SQL Sentry fashion. In other words, we haven't been content to tick a feature box – we've gone all out to ensure that each feature has been constructed thoughtfully and thoroughly, and integrates seamlessly with the rest of our software, ultimately providing a superior user experience.

Download SQL Sentry v8 here: New Users | Existing Users

I normally review all major new features in a single post, but this one got so long I had to split it up. First up in the series on v8: Performance Alerting (aka, Custom Conditions)

Pre-v8 Performance Alerting

There are only a couple of feature areas where heretofore a few in our space have been able to legitimately claim superiority, the big one being performance alerting. You've been able to do it via our Event Manager product for some time, but it was not the most elegant of systems for general purpose performance alerting, since it was initially designed around performance related to jobs.

SQL Sentry v8 changes all of this – the new Performance Advisor delivers performance alerting in a big way, and much more. Yes, we've leapfrogged the competition once again. Leapfrog may not be the right term though, as that could imply that at some point they may jump back over us. Knowing what went into building this feature, IMO there is little chance of this happening. (Let's just say it's about as likely as them building a better tool for plan analysis ;-)

Enter Custom Conditions

The name custom conditions isn't glamorous... but it does accurately describe the whole feature in a way that advanced performance alerting, enhanced change detection, advisory rule builder, or similar flashier but narrower terms don't. You can do all of these things and more with custom conditions, but ultimately they are just that – a condition that you define on which you want to be alerted and/or take some action. A custom condition can:

  • Detect and alert on threshold excursions for:
    • Windows performance counters (including SQL Server, SSAS, etc., counters)
    • Virtual performance counters (those auto-calc'd by SQL Sentry from DMVs or other sources)
  • Detect and alert on changes to values from:
    • SQL queries against user databases
    • SQL queries against the SQLSentry database
    • DMV/DMF queries
    • WMI queries
  • Compare multiple values from the same or different subsystems, using any combination of ANDs and ORs, and any number of nest levels
  • Apply math (multiply, divide, add, subtract) to values from any subsystem, and use the results in comparison operations
  • Combine multiple conditions together to create compound conditions
  • Detect when any SQL query against a target exceeds a specific duration threshold
  • Test any performance counter against a baseline value, or a percentage of a baseline value
  • Fire any combination of 11 different actions (Send Email, Send SNMP Trap, Execute SQL, etc.)
  • Show event markers on the Performance Advisor Dashboard charts, including enhanced tooltips with supplemental information
  • Write to the new Events Log, where a historical record of the event with a snapshot of all associated metrics is maintained
  • Easily be applied to all monitored servers, groups of servers, or individual servers

Whew! I know this sounds like a lot, but one of the best parts is how easy custom conditions are to configure. Everything is point and click, and even complex conditions can be configured quickly.

To get started, open the top-level Custom Conditions > Conditions List node in the Navigator pane:

ConditionsListNode

When you open this node for the first time, you will be prompted to download a base set of conditions created by SQL Sentry. You can also do this any time from the Tools menu. These will give you a good starting point for monitoring, and can also serve as a reference when creating your own conditions.

First I'll cover a simple example of what you can do with custom conditions using one from the included pack: High Compiles. Here's how it looks in the condition designer – you can specify a name, rich text description, definition, and various other parameters which control when the condition is triggered (evaluates to true):

CC-HighCompiles
(click to enlarge)

This definition contains two comparisons: the first ensures that batches/sec is >100 before even trying the second, which tests to see if compiles/sec is >15% of batches/sec. This is called short-circuiting, and it serves to both eliminate false positives, and minimize the amount of processing the custom condition engine must perform. We can see this in action here:

CC-HighCompiles-ShortCircuit
(click to enlarge)

The above is the view presented in the Evaluation Status grid and the Events Log after a condition has been evaluated against a target, and it embeds the results for each comparison, expression, and operation. This info is invaluable for testing and troubleshooting.

Speaking of, you will typically want to test any new condition you create against monitored servers before assigning actions to it, to ensure you aren't bombarded with alerts due to inappropriate thresholds. To do this, simply click either Evaluate button in the upper right:

CC-HighCompiles-EvalButtons
(click to enlarge)

The condition selected in the Conditions grid on the left will immediately be queued for evaluation by the SQL Sentry Monitoring Service, and results will show up in the Evaluation Status grid on the right within a few seconds... even when testing against hundreds of servers!

How is this possible? Check out SQL Sentry dev lead Brooke Philpott's (b|t) excellent post on how we are leveraging new .NET 4.5 features to pull this off without causing thread starvation or other performance issues.

Next up is a more advanced condition which uses Jonathan Kehayias' adaptable Page Life Expectancy (PLE) formula:

CC-LowPLE
(click to enlarge)

This condition has performance counters on both sides of the comparison, along with some math to dynamically calculate PLE based on the buffer size. It also takes advantage of short-circuiting – the first comparison ensures that the buffer size is greater than 2GB (131,072 is the number of pages per GB) before testing the second comparison. You can see in the example above, it has scaled the standard PLE > 300 seconds rule up to 588 due to the larger buffer on this server.

Another very cool enhancement in use here is the Any instance. When Any is used, all instances for a counter are automatically tested independently – NUMA nodes 000, 001, 002, and 004 here – and they are automatically synchronized inside each comparison. In other words, pages from instance 001 are only compared with PLE from instance 001, and so on. This way you don't have to configure a bunch of identical conditions for each counter instance, one for each NUMA node in this case.

Creating Custom Conditions

To create your own, click the Create Custom Condition button at upper left. All conditions have access to Windows metrics, but you may want to access SQL Server or SSAS metrics as well, so select one of the sub-items as appropriate:

createcustomcondition

Next, select the data type of the first comparison (numeric, string, or date/time), then the value source type:

CreateCC-Type

You have a variety of source types to choose from, and this is one of the aspects that makes this new feature so powerful. For the first time you can easily integrate any combination of values from any source to produce an intelligent "rule".

As you go through the dropdowns, the subsequent elements change based upon your prior selections, guiding you in the right direction:

CreateCC-Cat

In the shot above, the performance categories listed first (without the colon) are virtual counter categories exclusive to SQL Sentry. Just as in our performance reporting, we expose all data we collect. After all, why collect it if we aren't going to let you use it? A great example is wait stats. We eliminate innocuous waits and group them into friendly categories (Disk, CPU, etc.) and classes (Transaction Log, Parallelism, AlwaysOn, etc.), any of which you can now use for alerting purposes:

CreateCC-DiskWaits

So in a few clicks you can have a condition which triggers if total disk waits goes over 500ms for more than 30 seconds... on any server. Further, response rulesets can easily be applied to control exactly when you are alerted.

Compound Conditions

Conditions can easily be embedded within other conditions. Here's an example of one that picks up high CPU, but only when a process other than SQL Server is involved:

CC-HighCPU-NonSQL (click to enlarge)

It leverages the simple High CPU condition to short-circuit if CPU isn't greater than 90%. If it passes, it then it calculates the CPU % not associated with sqlservr.exe, and triggers only if it is greater than 25% of the total.

Alerting

Whenever you save a condition, if no actions have been defined you'll be prompted to add them:

CustomConditionsActionsPrompt

If Yes, you'll see the Actions Selector where you can choose from any of the 11 different actions which can be taken in response to the condition:

CustomConditionsActionsSelector

The new Send to Alerting Channels action (the default) allows you to control how/where the alerts are presented throughout the client app. The different channels can be individually selected from the Condition Settings tab once the action has been added. Just like all other conditions/actions in SQL Sentry, these can be set globally where they will apply to all servers in the environment, and easily overridden at the group or server level as needed.

Once enabled, all custom condition/action combinations show up in the newly redesigned Conditions pane at right, where alert settings, rulesets, and windows can easily be adjusted:

ConditionsPane
(click to enlarge)

The initial alerting channel is for the Performance Advisor Dashboard (see below for how this works), and more are on the way.

Events Log

When Send to Alerting Channels is selected, events will always be logged to the new Events Log:

EventsLog
(click to enlarge)

This is the place to go to see status for all condition events, active or completed. The grid lists all events for the selected date range, but can be easily filtered via the column headers or pre-loaded filters. If you click on any row you will see the stateful condition definition at bottom, and a richly formatted description of the issue at right. Notes can easily be added, users assigned, severity changed, and events can be snoozed or closed from here. If you click an End Time cell you will see the metrics which caused the condition to evaluate to false again, thus ending the event.

Events on the Dashboard

From the Events Log you can quickly jump to the Performance Advisor Dashboard by right-clicking an event to see what was happening on the server at the time. You will see event markers on all associated dashboard charts:

DashboardEventMarkers
(click to enlarge)

In this case, the condition references both waits and disk latency counters, so the system recognizes this and overlays event markers on the appropriate charts, along with a warning glyph which, when clicked, shows the full details for the events in range. These event markers are on by default whenever using the dashboard, but can be toggled off using the toolbar.

Sharing Conditions

To share your conditions, simply right-click any condition and select Export. A .condition file will be created which can be imported into any other SQL Sentry v8 environment. Soon you will have the ability to share your creations in a more integrated fashion via our new cloud portal, with your other SQL Sentry environments as well as the community at large.

We're Just Getting Started

With SQL Sentry v8 we've taken intelligent alerting to an entirely new level, and we've done so in a way that is simple for anyone to understand and use. We're going to continue to enhance this feature, and to release new conditions, so be sure to download them when prompted. I'm excited about the possibilities there, but even more excited to see the kinds of conditions that you are going to dream up!

Coming up I'll be covering the other major features in v8: Baselining and the SQL Sentry Cloud Portal. Stay tuned...