Wednesday, October 8, 2008

More Fun with v4.1...

I mentioned in my last post how some users weren't even aware of the Performance Advisor dashboard's History mode because of issues with the toolbar buttons. What I didn't mention was that for some an even more significant "hidden" feature was Performance Advisor itself!

First some background.  Both Event Manager and Performance Advisor share the same code base, so you only need one console, service and database for both products.  There were many reasons for taking this approach, including ease of installation via a unified setup program, ease of configuration, and seamless integration possibilities between the products.  For example, we can show you all performance events (heavy SQL, blocks, & deadlocks) on the Event Manager calendar without any configuration (as long as you have a license for both products).

calendar_jump_to_pa

We also enable you to quickly jump back and forth between Event Manager and Performance Advisor via context menus and toolbar buttons.  These menus are shown in both the Event Manager calendar shot above where a block is selected, and below with the same block selected in Performance Advisor.  (I should mention that the easiest way to get from a block on the calendar to the same block in Performance Advisor is actually to double-click it, or right-click and select Open.)

blocking_jump_to_cal

Note how the two products give you a completely different yet complementary view of the same event.  Performance Advisor shows you all of the great details of all versions of the block chain, while Event Manager shows you in chronological fashion how the block was interacting with all other events around the same timeframe (heavy SQL, SQLAgent jobs, SSIS, maint plans, etc.).  You can stare at a grid for hours and not gain the same level of understanding about a performance event as you get after looking at the calendar for a few seconds.  You'll see relationships between events that you never knew existed... which enables you to be exponentially more effective when troubleshooting and resolving difficult performance problems.

Hopefully by now you can see some of the great benefits of a single SQL Sentry console for both products.  However, a downside of the single console approach is that, well, there is only one console.  Let me explain.  If each product was its own Windows application, the separation between the products would be quite clear.  There would be little chance of confusion regarding how to monitor (or "watch" in SQL Sentry lingo) a server with each product, how to open the primary screen for each product, etc.

But since we leverage the same Navigator pane that we've always had in Event Manager, and it looks pretty much the same as before, some users had difficulty figuring out how to pull up Performance Advisor.  At first we were puzzled by this, since if you use the context menus it's clear that there are now two sets of Watch and Open child menu items on computer (aka Device) and SQL Server nodes, and you are forced to pick one or the other:

open_context_menus

Ah, but there was yet another way to open the Event Manager calendar in previous versions -- via double-click of the left mouse button.  As it turns out, many existing users had become accustomed to using this approach since it's faster/easier.  Therein lay the "problem".  Although in v4.0 we had added a User Preference to control the "default view" (or product) when double-clicking a node, we had set this to "Event Manager" for upgrades to avoid changing behavior for existing users.  So if you were a double-clicker, unless you went into the User Prefs and adjusted this, you'd continue to get the Event Manager calendar and never see Performance Advisor!  Right.  2+ years of effort down the drain :O

After many hours of careful deliberation, to address this problem we decided to introduce a new feature in v4.1 called the "Product Selector" ;)  Now what you will get by default when you double-click a computer or SQL Server in the Navigator is this screen:

product_selector

The screen actually serves a few purposes:  1) to let you know you have multiple products available, 2) to allow you to continue to double-click and let you choose which product to open, and 3) to allow you to adjust the new "default view" User Pref via the "Default" column checkbox.

NOTE: All Event Manager users who upgrade to v4.x get a fully functional 30-day, 5-server trial for Performance Advisor for your efforts, so #1 applies to you as well.

#3 only comes into play if you deselect the "Always prompt me" checkbox, after which point you will no longer see the Product Selector... we'll just open the product you checked as the default.

We've already found that this in itself is causing some confusion though!  Some users expected the "Default" checkbox to control which product is pre-selected the next time you see the form, which it does not.  To avoid this, in v4.2 we've simplified the form slightly; we've eliminated the Default column and just use the currently selected product as the default if you ever deselect the "Always prompt me" option.  In addition, the last product selected will always be pre-selected the next time you open the form.
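For readers who find the rules easier to follow as code, here is a minimal Python sketch of the simplified v4.2 behavior described above. It is only an illustration, not SQL Sentry's actual implementation: `SelectorPrefs`, `on_double_click`, and the `prompt` callback (which stands in for the Product Selector form) are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class SelectorPrefs:
    """Hypothetical per-user preferences behind the Product Selector."""
    always_prompt: bool = True
    default_product: str = "Event Manager"  # opened directly once prompting is off
    last_selected: str = "Event Manager"    # pre-selected the next time the form opens


def on_double_click(prefs: SelectorPrefs,
                    prompt: Callable[[str], Tuple[str, bool]]) -> str:
    """Return the product to open when a node is double-clicked.

    `prompt` receives the pre-selected product and returns the product the
    user chose plus the state of the "Always prompt me" checkbox.
    """
    if not prefs.always_prompt:
        # No form at all: just open the stored default.
        return prefs.default_product

    chosen, keep_prompting = prompt(prefs.last_selected)
    prefs.last_selected = chosen  # remembered as the next pre-selection
    if not keep_prompting:
        # The currently selected product becomes the default; no separate
        # "Default" column is needed.
        prefs.always_prompt = False
        prefs.default_product = chosen
    return chosen
```

The point of the v4.2 change shows up in the last branch: the default is simply whatever was selected when "Always prompt me" was unchecked, so there is only one selection for the user to reason about.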

It's truly amazing to me that I was able to fill up an entire blog post with a relatively insignificant topic... although it is indeed typical of the types of issues I spend much of my days on.  Hopefully someone out there will find it helpful ;)  Please bear with me, this whole blogging thing is still new to me.  I promise to extricate myself from the day-to-day details of UI design & usability issues soon and delve more into how to use SQL Sentry to solve real world performance issues.  Until next time...

Friday, October 3, 2008

SQL Sentry v4.1 Released!

Well, I've thought it would be a good idea to do this for some time now (i.e., blog)... but I can honestly say that it wasn't until now, with the v4.1 release behind us, that I've felt like time spent on a blog would be at least as well spent as time spent working on our software. I know, lame excuse, but true nonetheless ;)

So without further ado, I'll get on with what we've been doing for the past few months. The new v4.1 release contains 20 new features and almost 60 fixes over the previous build that's been public since early August (v4.0.0.48). The complete list can always be found here.  A huge focus for this release was streamlining and generally improving the initial setup process including the Quick Start wizard, as well as usability improvements for Performance Advisor.

We received a tremendous amount of feedback on v4.0.x builds, and many of the resulting changes have gone into v4.1. We've put out several incremental releases since the initial v4.0 release, but the v4.1 code branch was actually started before many of them because some of the changes were significant enough that we didn't want to run the risk of regressions. That said, v4.1 is undoubtedly the most thoroughly tested build we've ever released. We always test hard, but this time we set a new record for release candidates with 12!

An invaluable part of the feedback process for v4.1 involved onsite visits with many customers and evaluators both inside the U.S. and in Europe. We took the time to actually sit down with folks and go through the install and setup process with them in their own environments. Then we'd go through the various modules of the product to see which elements were causing confusion or usability problems. It's tempting to try and solicit this type of feedback exclusively via web meeting, phone, or email since it's generally "easier", but there is simply no substitute for face-to-face interaction. You see things and obtain feedback that you never would otherwise.

Setup Improvements

I'll start with one of the major setup improvements. In v4.0.x if you were trying to monitor a server with Performance Advisor for the first time and the service user didn't have the necessary rights on the target, you might have seen a cryptic ACCESS_DENIED error along with hex codes and stack traces from WMI. In v4.1 you will now see a friendly message like:

WMI access was denied. Please ensure the SQL Sentry Server user account has Windows Administrator privileges on the target server.

Seems simple enough, but inventorying the myriad error codes that can come from the various subsystems, classifying them, and creating friendly messages for each actually took significant effort.
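As a rough illustration of what that classification amounts to, here is a minimal Python sketch of a code-to-message lookup. The two constants are the standard Windows `E_ACCESSDENIED` and WMI `WBEM_E_ACCESS_DENIED` values; the table itself, the `friendly_message` name, and the fallback wording are assumptions for the example and not the product's real catalog.

```python
# Well-known "access denied" codes that can surface from WMI calls.
E_ACCESSDENIED = 0x80070005        # generic Windows HRESULT
WBEM_E_ACCESS_DENIED = 0x80041003  # WMI-specific error

WMI_ACCESS_DENIED = (
    "WMI access was denied. Please ensure the SQL Sentry Server user account "
    "has Windows Administrator privileges on the target server."
)

# Hypothetical classification table: several raw codes can share one message.
FRIENDLY_MESSAGES = {
    E_ACCESSDENIED: WMI_ACCESS_DENIED,
    WBEM_E_ACCESS_DENIED: WMI_ACCESS_DENIED,
}


def friendly_message(code: int, raw_detail: str) -> str:
    """Translate a raw error code into text a user can act on.

    Codes that have not been classified yet fall back to the raw detail,
    so nothing is hidden while the table is still being filled in.
    """
    message = FRIENDLY_MESSAGES.get(code)
    if message is None:
        return f"Unexpected error 0x{code:08X}: {raw_detail}"
    return message
```

The lookup is the easy part. The effort described above goes into deciding which of the many raw codes belong together and what advice each group should give.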

Stack traces can be a bit scary for users -- even though we were simply relaying handled exceptions back to the user about the security (or other) problem, to the user it might look like a bug in the software itself. We obviously want to avoid that type of confusion whenever possible, and even better, help users to resolve the problem instead of having to contact our support group, or worst case bail out of the install.

Dashboard Improvements

Chart Resolution

We've also made many improvements to the Performance Advisor dashboard. For example, some users expressed confusion about how the apparent granularity of the performance charts would change when zooming out from say, a 30 minute view, to a 30 day view. In a 30 minute view we show the actual raw data, which for many metrics like SQL activity is collected every 10 seconds. Unfortunately we can't show a month's worth of 10 second samples on all of the dashboard charts -- trust me, we tried early on ;)

We gave up pretty quickly though when we realized that not only does it incur a major hit in rendering performance, it's just TMI -- the human brain simply can't effectively synthesize >250,000 points on a single simple line chart, let alone 250K times the number of series on the charts, times the number of charts on the dashboard. This would defeat the whole purpose of what we are trying to accomplish with the dashboard. Which is, by the way, to provide for the first time in history a single screen to which a DBA can look to determine in seconds where performance bottlenecks occur on a server, for any point in time. Much more on this topic in future posts…

We spent a lot of time developing the Performance Advisor background processes that continuously roll the raw data up into different "break levels", or aggregates. Depending on the size of the active date range, you may be looking at 2 minute, 10 minute, 30 minute, or 4 hour aggregates. Think about how Google Maps seamlessly changes the resolution of the map when you zoom in and out -- it's pretty much the same principle.
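To make the roll-up idea concrete, here is a small Python sketch. The break levels (2 minute, 10 minute, 30 minute, 4 hour) and the 10 second raw interval come from the text above. Everything else is an assumption for illustration: the date-range thresholds in `BREAK_LEVELS`, the use of a plain average as the aggregate, and the function names.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

RAW_INTERVAL = timedelta(seconds=10)

# (largest date range served, bucket width). The widths are the break levels
# named above; the range thresholds are invented for this example.
BREAK_LEVELS = [
    (timedelta(hours=1), RAW_INTERVAL),
    (timedelta(hours=12), timedelta(minutes=2)),
    (timedelta(days=3), timedelta(minutes=10)),
    (timedelta(days=10), timedelta(minutes=30)),
]
COARSEST_LEVEL = timedelta(hours=4)


def pick_break_level(date_range: timedelta) -> timedelta:
    """Choose the bucket width to display for the active date range."""
    for max_range, level in BREAK_LEVELS:
        if date_range <= max_range:
            return level
    return COARSEST_LEVEL


def roll_up(samples, level: timedelta):
    """Average (timestamp, value) samples into buckets `level` wide."""
    buckets = defaultdict(list)
    for ts, value in samples:
        offset = (ts - datetime.min) % level  # time elapsed since bucket start
        buckets[ts - offset].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


def points_on_chart(date_range: timedelta, level: timedelta) -> int:
    """Number of points a single series needs at a given bucket width."""
    return date_range // level
```

With these assumed thresholds, `points_on_chart(timedelta(days=30), RAW_INTERVAL)` is 259,200 points for one series, while the same 30 days at the 4 hour break level needs only 180.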

Back to the improvement -- we now show you the active break level, so you'll know whether you are looking at detail or aggregated data when zooming:

resolution

Point and Range Selection

In the same area you'll notice fields for Sample Pos(ition), and Range Start/Range End. In the previous release, you could highlight a point or a range on one chart and the point or range would be synchronized across all other charts. This enabled you to easily correlate what was happening across multiple charts and metrics. However, we found that a big usability problem was that the default action after lifting your mouse was always to zoom in. This caused you to lose your place, so then you'd have to zoom back out and start over again.

In v4.1 we now show you exactly which point and/or range is being selected in real time, and instead of auto-zooming, we present several context menus which allow you to jump directly to one of the other Performance Advisor tabs with the selected range to see exactly which heavy SQL, blocks, or deadlocks occurred during that range. You can also jump directly to the Event Manager calendar for the selected range. When you go back to the dashboard, your original master range and range selection are still active and can be changed as often as needed, without losing your place!
 
range_selection_w_menus
 
Another nice thing we are doing is persisting the selected points/ranges while navigating around or zooming in/out.

New "History" Mode!

Well, not really.  The dashboard has always had two modes: Sample and History. Sample mode displays bar charts for a single point in time (or sample), while history mode changes over to line-type charts in order to display multiple points across a date range. The default when you open the dashboard directly from the Navigator pane (versus jumping to the dashboard from an Event Manager calendar) is sample mode showing realtime data, where the bars are continuously moving as new data comes in.

One of the amazing things we saw when visiting clients is that many didn't even realize there was a history mode! This is also a perfect (and somewhat frightening) example of something that we may have never seen otherwise. Don’t get me wrong, realtime sample mode is great -- it's very cool to see all of the colorful bars jump around -- but the real power of the dashboard is with history mode, and many users didn't even realize it was there!

The reason for this confusion invariably turned out to be the fact that we had a rather insignificant, depressible button on the toolbar for toggling between sample and history.  So unless you knew to click it, or to change the date range on the toolbar and click Go, you may not ever even see history mode… ughh!

In v4.1 we've replaced this button with a colorful toggle button which changes both the graphic and text when you click it, clearly highlighting the mode change.  We've also done the same for the neighboring Refresh and Start/Pause realtime (or auto-refresh) mode buttons.
 
new_buttons
 
Seems rather simple and obvious I know. This one actually was simple to implement -- but it does serve as a good example of how such a small thing can lead to a big usability issue… as well as how we are always listening in an effort to improve the product, in ways both large and small ;)

Block Versioning

Now, moving off of the dashboard onto the Blocking SQL tab… In v4.0 we had block detection of course, but now instead of showing each detected version of a block as a separate entry, we group multiple versions of the same block together, and allow you to flip through the different versions via a new Version column with embedded dropdown control:
  block_versions 
This can dramatically reduce the noise in many scenarios. Likewise, we now only show one instance of the block on the Event Manager calendar instead of multiple, and only send one email for all versions of the same block. (A more accurate way to put this is we only fire any actions configured for the Blocking SQL condition once for each unique block).
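
One way to picture the grouping is the Python sketch below. It is purely illustrative: the post does not say how versions are matched to a block, so the identity key used here (head blocker session plus block start time) and all of the names are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class BlockSnapshot:
    """One detected version of a block chain (hypothetical shape)."""
    head_spid: int              # session at the head of the chain
    started: str                # when the head blocker began blocking
    blocked_spids: Tuple[int, ...]  # sessions waiting in this version


@dataclass
class Block:
    """A unique block together with every version seen for it."""
    versions: List[BlockSnapshot] = field(default_factory=list)


def group_versions(snapshots: List[BlockSnapshot],
                   fire_actions: Callable[[BlockSnapshot], None]
                   ) -> Dict[Tuple[int, str], Block]:
    """Group snapshots into unique blocks, firing actions once per block."""
    blocks: Dict[Tuple[int, str], Block] = {}
    for snap in snapshots:
        key = (snap.head_spid, snap.started)  # assumed identity of a block
        if key not in blocks:
            blocks[key] = Block()
            fire_actions(snap)  # e.g. one email per unique block
        blocks[key].versions.append(snap)  # what the Version dropdown flips through
    return blocks
```

Three snapshots, two of which share a head blocker and start time, would yield two grid rows and two notifications rather than three of each.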

Actions and Settings Config

Another big improvement in v4.1 is we now enable easy configuration of all Performance Advisor-specific actions or settings from the associated Performance Advisor tabs, versus only from the SQL Server node in the Navigator pane. For example, you can access all of the settings for Top SQL, like the minimum duration, statement collection, etc., from the Top SQL tab.  If you're on Top SQL, simply hover over one of the Settings or Actions tabs on the right-hand side of the screen and it will expand to show you available settings:   

pa_actions_settings 
You can also access them by selecting the event source nodes in the Navigator pane... assuming you are also watching the server with Event Manager, which is required for these nodes to show up:

nav_actions_settings 
That’s all for now. I’ll have more coming on v4.1 improvements and other great info about how to get the most out of SQL Sentry software soon...