Friday, October 3, 2008

SQL Sentry v4.1 Released!

Well, I've thought for some time now that it would be a good idea to do this (i.e., blog)... but I can honestly say that it wasn't until now, with the v4.1 release behind us, that I've felt like time spent on a blog would be at least as well spent as time spent working on our software. I know, lame excuse, but true nonetheless ;)

So without further ado, I'll get on with what we've been doing for the past few months. The new v4.1 release contains 20 new features and almost 60 fixes over the previous build that's been public since early August (v4.0.0.48). The complete list can always be found here. A huge focus for this release was streamlining and generally improving the initial setup process, including the Quick Start wizard, as well as improving usability for Performance Advisor.

We received a tremendous amount of feedback on v4.0.x builds, and many of the resulting changes have gone into v4.1. We've put out several incremental releases since the initial v4.0 release, but the v4.1 code branch was actually started before many of them because some of the changes were significant enough that we didn't want to run the risk of regressions. That said, v4.1 is undoubtedly the most thoroughly tested build we've ever released. We always test hard, but this time we set a new record for release candidates with 12!

An invaluable part of the feedback process for v4.1 involved onsite visits with many customers and evaluators both inside the U.S. and in Europe. We took the time to actually sit down with folks and go through the install and setup process with them in their own environments. Then we'd go through the various modules of the product to see which elements were causing confusion or usability problems. It's tempting to try and solicit this type of feedback exclusively via web meeting, phone, or email since it's generally "easier", but there is simply no substitute for face-to-face interaction. You see things and obtain feedback that you never would otherwise.

Setup Improvements

I'll start with one of the major setup improvements. In v4.0.x if you were trying to monitor a server with Performance Advisor for the first time and the service user didn't have the necessary rights on the target, you might have seen a cryptic ACCESS_DENIED error along with hex codes and stack traces from WMI. In v4.1 you will now see a friendly message like:

WMI access was denied. Please ensure the SQL Sentry Server user account has Windows Administrator privileges on the target server.

Seems simple enough, but inventorying the myriad error codes that can come from the various subsystems, classifying them, and creating friendly messages for each actually took significant effort.
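To make the idea concrete, here is a minimal Python sketch of that kind of lookup. It is not the SQL Sentry implementation: the table layout, the category names, the function name `friendly_error`, and the connectivity message are all assumptions made for illustration. The three hex values are standard Windows/WMI error codes, and the access-denied text is the message quoted above.

```python
from typing import Dict, Tuple

# Message quoted in the post for the WMI access-denied case.
ACCESS_DENIED_MESSAGE = (
    "WMI access was denied. Please ensure the SQL Sentry Server user account "
    "has Windows Administrator privileges on the target server."
)

# Hypothetical inventory: raw code -> (category, friendly message).
# The categories and the connectivity wording are illustrative assumptions.
FRIENDLY_ERRORS: Dict[int, Tuple[str, str]] = {
    0x80041003: ("security", ACCESS_DENIED_MESSAGE),  # WBEM_E_ACCESS_DENIED
    0x80070005: ("security", ACCESS_DENIED_MESSAGE),  # E_ACCESSDENIED
    0x800706BA: (                                     # RPC server unavailable
        "connectivity",
        "The target server could not be reached. Please check that it is "
        "online and that a firewall is not blocking remote access.",
    ),
}


def friendly_error(code: int) -> Tuple[str, str]:
    """Return (category, message) for a raw error code.

    Unknown codes fall back to a generic message that still includes the
    code, so support can look it up without showing a stack trace.
    """
    try:
        return FRIENDLY_ERRORS[code]
    except KeyError:
        return ("unknown", f"An unexpected error occurred (code 0x{code:08X}).")
```

A caller would catch the handled exception, pull out its error code, and show only the message from `friendly_error(code)`, keeping the hex code and stack trace for the log.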

Stack traces can be a bit scary for users -- even though we were simply relaying handled exceptions back to the user about the security (or other) problem, to the user it might look like a bug in the software itself. We obviously want to avoid that type of confusion whenever possible, and even better, help users to resolve the problem instead of having to contact our support group, or worst case bail out of the install.

Dashboard Improvements

Chart Resolution

We've also made many improvements to the Performance Advisor dashboard. For example, some users expressed confusion about how the apparent granularity of the performance charts would change when zooming out from, say, a 30 minute view to a 30 day view. In a 30 minute view we show the actual raw data, which for many metrics like SQL activity is collected every 10 seconds. Unfortunately we can't show a month's worth of 10 second samples on all of the dashboard charts -- trust me, we tried early on ;)

We gave up pretty quickly though when we realized that not only does it incur a major hit in rendering performance, it's just TMI -- the human brain simply can't effectively synthesize >250,000 points on a single simple line chart, let alone 250K times the number of series on the charts, times the number of charts on the dashboard. This would defeat the whole purpose of what we are trying to accomplish with the dashboard. Which is, by the way, to provide for the first time in history a single screen to which a DBA can look to determine in seconds where performance bottlenecks occur on a server, for any point in time. Much more on this topic in future posts…
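The arithmetic behind that ">250,000" figure is easy to check. The short Python sketch below assumes only the 10 second collection interval mentioned above; the series and chart counts at the end are made-up numbers, used purely to show how quickly the total multiplies.

```python
SAMPLE_INTERVAL_SECONDS = 10  # collection interval for metrics like SQL activity


def raw_points(days: int, interval_seconds: int = SAMPLE_INTERVAL_SECONDS) -> int:
    """Number of raw samples a single series accumulates over `days` days."""
    return days * 24 * 60 * 60 // interval_seconds


points_30_minutes = 30 * 60 // SAMPLE_INTERVAL_SECONDS  # 180 points per series
points_30_days = raw_points(30)                         # 259,200 points per series

# Hypothetical dashboard shape, for illustration only.
series_per_chart = 4
charts_on_dashboard = 10
total_points = points_30_days * series_per_chart * charts_on_dashboard  # 10,368,000
```

So a 30 minute view needs 180 raw points per series, while a 30 day view would need 259,200, before multiplying by the number of series and charts.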

We spent a lot of time developing the Performance Advisor background processes that continuously roll the raw data up into different "break levels", or aggregates. Depending on the size of the active date range, you may be looking at 2 minute, 10 minute, 30 minute, or 4 hour aggregates. Think about how Google Maps seamlessly changes the resolution of the map when you zoom in and out -- it's pretty much the same principle.
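As a rough illustration of the rollup idea, the Python sketch below buckets raw samples into a break level and picks a resolution for a given date range. It is only a sketch under stated assumptions: the post does not say which aggregate is stored or how a level is chosen, so averaging, the `max_points` budget, and the function names `roll_up` and `choose_resolution` are all hypothetical. Only the four break levels and the 10 second raw interval come from the text above.

```python
from collections import defaultdict
from statistics import mean

# Break levels named in the post, in seconds: 2 min, 10 min, 30 min, 4 hours.
BREAK_LEVELS = [120, 600, 1800, 14400]


def roll_up(samples, bucket_seconds):
    """Average (timestamp_seconds, value) samples into fixed-width buckets.

    Returns a list of (bucket_start, mean_value) sorted by bucket start.
    Averaging is an assumption; a real rollup might also keep min and max.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        buckets[timestamp - timestamp % bucket_seconds].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


def choose_resolution(range_seconds, raw_interval=10, max_points=1000):
    """Pick the finest resolution that keeps one series within max_points.

    max_points is a made-up rendering budget, not a documented setting.
    """
    for interval in [raw_interval] + BREAK_LEVELS:
        if range_seconds / interval <= max_points:
            return interval
    return BREAK_LEVELS[-1]
```

With these assumed numbers, a 30 minute range stays on raw 10 second data, a one day range drops to 2 minute aggregates, and a 30 day range lands on 4 hour aggregates (180 points per series).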

Back to the improvement -- we now show you the active break level, so you'll know whether you are looking at detail or aggregated data when zooming:

resolution

Point and Range Selection

In the same area you'll notice fields for Sample Pos(ition), and Range Start/Range End. In the previous release, you could highlight a point or a range on one chart and the point or range would be synchronized across all other charts. This enabled you to easily correlate what was happening across multiple charts and metrics. However, we found that a big usability problem was that the default action after lifting your mouse was always to zoom in. This caused you to lose your place, so then you'd have to zoom back out and start over again.

In v4.1 we now show you exactly which point and/or range is being selected in real time, and instead of auto-zooming, we present several context menus which allow you to jump directly to one of the other Performance Advisor tabs with the selected range to see exactly which heavy SQL, blocks, or deadlocks occurred during that range. You can also jump directly to the Event Manager calendar for the selected range. When you go back to the dashboard, your original master range and range selection are still active and can be changed as often as needed, without losing your place!
 
range_selection_w_menus
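
One common way to keep a selection synchronized across several charts is to store it in a single shared object that every chart subscribes to. The Python sketch below shows that pattern in miniature. It is an illustration only, not how the dashboard is actually built: `SharedSelection`, `Chart`, and their methods are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Range = Tuple[float, float]


@dataclass
class SharedSelection:
    """A single selection shared by every chart on a hypothetical dashboard."""
    selected_range: Optional[Range] = None
    _listeners: List[Callable[[Optional[Range]], None]] = field(default_factory=list)

    def subscribe(self, listener: Callable[[Optional[Range]], None]) -> None:
        self._listeners.append(listener)

    def select(self, start: float, end: float) -> None:
        """Store the range (normalized so start <= end) and notify every chart."""
        self.selected_range = (min(start, end), max(start, end))
        for listener in self._listeners:
            listener(self.selected_range)


class Chart:
    """A chart that only mirrors whatever the shared selection currently holds."""

    def __init__(self, name: str, selection: SharedSelection) -> None:
        self.name = name
        self.highlight: Optional[Range] = None
        selection.subscribe(self._on_selection)

    def _on_selection(self, selected_range: Optional[Range]) -> None:
        self.highlight = selected_range


selection = SharedSelection()
cpu_chart = Chart("CPU", selection)
waits_chart = Chart("Waits", selection)
selection.select(300, 120)  # dragging right-to-left still yields (120, 300)
```

Because the range lives in `selection` rather than in any one chart, it stays put when the view changes, which is the behavior you want when jumping to another tab and coming back.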
 
Another nice thing we are doing is persisting the selected points/ranges while navigating around or zooming in/out.

New "History" Mode!

Well, not really. The dashboard has always had two modes: Sample and History. Sample mode displays bar charts for a single point in time (or sample), while history mode changes over to line-type charts in order to display multiple points across a date range. The default when you open the dashboard directly from the Navigator pane (versus jumping to the dashboard from an Event Manager calendar) is sample mode showing realtime data, where the bars are continuously moving as new data comes in.

One of the amazing things we saw when visiting clients is that many didn't even realize there was a history mode! This is also a perfect (and somewhat frightening) example of something that we might never have seen otherwise. Don’t get me wrong, realtime sample mode is great -- it's very cool to see all of the colorful bars jump around -- but the real power of the dashboard is with history mode, and many users didn't even realize it was there!

The reason for this confusion invariably turned out to be the fact that we had a rather inconspicuous, depressible button on the toolbar for toggling between sample and history. So unless you knew to click it, or to change the date range on the toolbar and click Go, you might never even see history mode… ughh!

In v4.1 we've replaced this button with a colorful toggle button which changes both the graphic and text when you click it, clearly highlighting the mode change.  We've also done the same for the neighboring Refresh and Start/Pause realtime (or auto-refresh) mode buttons.
 
new_buttons
 
Seems rather simple and obvious I know. This one actually was simple to implement -- but it does serve as a good example of how such a small thing can lead to a big usability issue… as well as how we are always listening in an effort to improve the product, in ways both large and small ;)

Block Versioning

Now, moving off of the dashboard onto the Blocking SQL tab… In v4.0 we had block detection of course, but now instead of showing each detected version of a block as a separate entry, we group multiple versions of the same block together, and allow you to flip through the different versions via a new Version column with embedded dropdown control:

block_versions

This can dramatically reduce the noise in many scenarios. Likewise, we now only show one instance of the block on the Event Manager calendar instead of multiple, and only send one email for all versions of the same block. (A more accurate way to put this is that we only fire any actions configured for the Blocking SQL condition once for each unique block.)
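
The grouping logic can be pictured with a small Python sketch. Everything here is hypothetical: the post does not say how two detections are recognized as the same block, so the `block_id` key, the `BlockTracker` class, and the `on_new_block` callback are stand-ins chosen for illustration.

```python
from typing import Callable, Dict, Hashable, List


class BlockTracker:
    """Group detected versions of a block and fire actions once per unique block."""

    def __init__(self, on_new_block: Callable[[Hashable], None]) -> None:
        self._versions: Dict[Hashable, List[str]] = {}
        self._on_new_block = on_new_block

    def record(self, block_id: Hashable, snapshot: str) -> int:
        """Store one detected snapshot and return its 1-based version number.

        The callback (standing in for configured actions such as sending an
        email) runs only for the first version of each block.
        """
        versions = self._versions.setdefault(block_id, [])
        versions.append(snapshot)
        if len(versions) == 1:
            self._on_new_block(block_id)
        return len(versions)

    def versions(self, block_id: Hashable) -> List[str]:
        """All snapshots recorded for a block, oldest first."""
        return list(self._versions.get(block_id, []))


emails_sent = []
tracker = BlockTracker(on_new_block=emails_sent.append)
tracker.record("block-A", "2 sessions waiting")
tracker.record("block-A", "5 sessions waiting")
tracker.record("block-B", "1 session waiting")
```

After these three detections, `emails_sent` holds one entry per unique block, while `tracker.versions("block-A")` still returns both snapshots for the Version dropdown to flip through.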

Actions and Settings Config

Another big improvement in v4.1 is that we now enable easy configuration of all Performance Advisor-specific actions or settings from the associated Performance Advisor tabs, versus only from the SQL Server node in the Navigator pane. For example, you can access all of the settings for Top SQL, like the minimum duration, statement collection, etc., from the Top SQL tab. If you're on Top SQL, simply hover over one of the Settings or Actions tabs on the right-hand side of the screen and it will expand to show you available settings:

pa_actions_settings

You can also access them by selecting the event source nodes in the Navigator pane... assuming you are also watching the server with Event Manager, which is required for these nodes to show up:

nav_actions_settings

That’s all for now. I’ll have more coming on v4.1 improvements and other great info about how to get the most out of SQL Sentry software soon...

1 comment:

  1. Nice one Greg. I read the change log from top to bottom, but your blog is much more interesting. Keep it up!
