Thursday, May 3, 2012

SQL Sentry v7 goes Gold!

After what has been a relatively smooth beta, we're going gold with v7 today! Read more about v7 features in my last post. I've also got some exciting news regarding new fragmentation-related features included in the base software, and how to get FREE licenses of both Performance Advisor and Fragmentation Manager.

New Base Software Features

Originally, Fragmentation Manager was designed to be either ON or OFF; there was no in between. When it was OFF for a server, no table and index size or fragmentation-related data whatsoever was made available inside either Performance Advisor or Event Manager.

We've changed this, in a big way. A significant portion of the new features previously found only in Fragmentation Manager are now ON by default, whether or not you've purchased a license for it. Here's what you now get out of the box with v7, for both new installs and upgrades:

  • New "Indexes" tab in Performance Advisor, with table/index size and other details
  • Scheduled and manual fragmentation analysis with adjustable scan mode, from the SQL Server instance down to partition level
  • Table/index size and other details on the Performance Advisor Disk Space tab, organized by data file
  • Calendar display of all historical and future fragmentation analysis events, with drag-and-drop support
  • New "Databases" node for each SQL Server instance in the Navigator pane, listing all tables and indexes
  • Partitioning sliding window support for fragmentation analysis (defrag only max partition, or all others)

These are some significant capabilities which should add value for most any environment.

If you purchase and enable Fragmentation Manager, here are the additional features that become available:

  • Automated Defragmentation
    • Scheduled and manual rebuilds & reorgs, from the SQL Server instance down to partition level
    • Support for multiple concurrent defrag operations, which can dramatically reduce overall defrag time
    • Adjustable rebuild/reorg thresholds, scan mode and many other options 
    • Post-defrag analysis capability, so you can instantly see gains from defrag
  • Calendar display of all historical and future defrag events, with drag-and-drop support
  • Alerting for defrag success, failure, and completion
  • Partitioning sliding window support for defrag (defrag only max partition, or all others)
  • Historical trending via multiple charts on the Indexes tab:
    • Total server fragmentation, by fragmented % range
    • Total server disk space used, and wasted by fragmentation
    • Total server buffer space used, and wasted by fragmentation
    • Index fragmentation
    • Index disk/buffer space used/wasted
    • Index activity

In a nutshell, the base v7 software provides dramatically increased visibility into table/index size and fragmentation information, and Fragmentation Manager gets you automated defragmentation and many associated options, as well as historical trending.

FREE Licenses of Performance Advisor and Fragmentation Manager!

If the above included features weren't enough, we've introduced 2 new ways for you to get FREE software licenses:

  1. If you own five (5) licenses of Performance Advisor and five (5) licenses of Event Manager, you will automatically get:
    • One (1) Performance Advisor license, for monitoring the SQL Server instance where your SQLSentry database is located ($1495 value)
    • One (1) Fragmentation Manager license, for defragmenting the SQL Server instance where your SQLSentry database is located ($795 value)
  2. Regardless of the number of licenses you own, if you monitor the SQL Server instance where your SQLSentry database is located with a paid license of Performance Advisor, you will automatically get:
    • One (1) Fragmentation Manager license, for defragmenting this instance only ($795 value)

Some important points:

  • These free licenses won't show up in your license counts; they are just "there" ;-)
  • If you meet the criteria for #1 above and are already monitoring the SQL Server instance holding your SQLSentry database with a paid Performance Advisor license, then you're effectively getting a 6th PA license for free, which you can use to monitor any server.
  • There are no ASM (annual software maintenance) costs for these free licenses, and they are perpetual.

Thanks again to all who participated in the beta, and we hope you like what you see in SQL Sentry v7!

Download SQL Sentry v7 here: New Users | Existing Users

Tuesday, April 10, 2012

SQL Sentry v7 Beta: First Look

It's been a while since my last post. Yes, we're still here (as you well know if you follow us on Facebook or Twitter), we've just been heads down since the PASS Summit getting v7 ready to ship. It's been a long road, but we're releasing the public beta today!

v7 represents the culmination of almost a year of effort, and ideas going back much, much further than that. We've completely redone several aspects of the software such as alerting (condition and action) configuration, and we've added some awesome new features like automated defrag, computer groups, and CMS support to boot. Did I mention SQL Server 2012? ;-)

Terminology Changes

We've made some long overdue changes to the SQL Sentry lexicon in the interest of making things clearer, and since I'll be using these new terms I wanted to get this out of the way first:

  • A Device is now a Computer (pretty sure I just heard a collective HOORAY! – trust me, we had our reasons for devices, but we won't get into that here ;-)
  • The former Global node is now the Shared Groups node
  • The SQL Sentry Console is now the SQL Sentry Client
  • The SQL Sentry Server Service is now the SQL Sentry Monitoring Service

Computer Groups

The first thing existing users will notice when they open the client is the new Shared Groups node at the very top of the Navigator. This node represents your entire SQL Sentry environment organized by Site. It is called "shared" because every SQL Sentry user sees exactly the same view here. The user-specific device registrations and groups (formerly Global) have been moved and renamed to Local Groups to better reflect what they actually are. You can still configure settings at the server level and below here, but not global settings – those are now set at the Shared Groups root node only.

Sites have always been there to enable logical partitioning of servers and monitoring services. For example, if your HQ is in Atlanta, but you have 100 SQL Servers in Miami and 200 SQL Servers in New York, you might install one monitoring service in Miami, and two in New York. You would create a site for each location, and place the monitoring services in the appropriate site so that they only monitor the servers in their location.

In v7, you can now easily apply special alerting rules to the servers in Miami and New York, versus having to touch each server in order to override global alerting settings:

[Screenshot: ComputerGroups]

In addition, you can create unlimited nested child groups in each site, and – you guessed it – apply specialized rules to those groups as well. The inheritance works exactly as it always has in SQL Sentry: you start at the highest level (Shared Groups), then override those global settings as needed at lower levels. Previously, alerting & setting configuration looked like this:

  • Global
    • Computer
      • SQL Server
        • Object (job, report, etc.)

Now it looks like this:

  • Shared Groups
    • Site
      • [Child Group] [,...n] 
        • Computer
          • SQL Server
            • Object

As you can see, the ability to group servers can dramatically reduce the alerting configuration required for many environments.
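The inheritance chain above can be modeled in a few lines of Python. This is only an illustrative sketch, not how SQL Sentry stores or resolves settings: the level order comes from the list above, while `resolve_settings`, the setting keys, and the sample values are all hypothetical.

```python
from typing import Dict, List

# Each level holds only the settings explicitly set at that level.
# Keys and values are made up for illustration.
Level = Dict[str, object]


def resolve_settings(path: List[Level]) -> Level:
    """Merge settings from the highest level down to the lowest.

    Values set at a lower level override values inherited from above.
    """
    effective: Level = {}
    for level in path:
        effective.update(level)
    return effective


shared_groups = {"job_failure_email": "dba-team", "runtime_max_alert": True}
miami_site = {"job_failure_email": "miami-dbas"}   # site-level override
qa_child_group = {"runtime_max_alert": False}      # child-group override
computer: Level = {}                               # nothing set explicitly
sql_server: Level = {}                             # nothing set explicitly

effective = resolve_settings(
    [shared_groups, miami_site, qa_child_group, computer, sql_server]
)
print(effective)
```

A server in the Miami QA group ends up with the Miami email target and the QA group's runtime setting, without anything being touched on the server itself.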

Custom Object Groups

Being able to click on a group node in the Navigator and easily change settings for a bunch of servers at once is great, but it's inherently limited by the fact that a computer node can only exist in one group at a time in the navigator. What if you want to have another set of rules for servers that effectively "cuts across" navigator groups? For example, "All QA Servers" in both Miami and New York?

This is easy to do with custom groups. You simply create a new group by double-clicking the Object Groups node in the navigator, add the QA servers to it, then adjust the settings:

[Screenshot: ObjectGroups]

Similarly, if you want to disable Runtime Threshold alerts for all transaction log backup jobs, you can easily search on the jobs using a name pattern, use Shift + left click to highlight and add several at once, add the Runtime Max condition, then select "Disable".

Automated Defragmentation

Your first thought here may be, "I already have scripts that perform automated defragmentation, why do I need a tool?" Good question! Here are three compelling reasons:

  • Manageability
  • Visibility
  • Defrag Speed

Manageability

There are several great scripts out there that many use to perform automated defrag. They can get the job done, but the main issue is that they are all, well, scripts. Configuring exactly which databases and indexes are defragmented and when can be a challenging and time-consuming task, especially if you are talking about 10s or 100s of SQL Servers. Manual script changes and multiple jobs on each server are typically required.

With our new Fragmentation Manager module, just like everything else in SQL Sentry, you can start at the top and work your way down. For example, if you have 20 SQL Servers, you can set a default global defrag schedule of 2am for all servers at once by enabling it at the Shared Groups level:

[Screenshot: GlobalDefrag]

So in 30 seconds or less, you've configured enterprise-wide defrag!

DISCLAIMER: I am NOT recommending that you do this, as every environment is different, and you'd of course want to disable any existing defrag jobs first. I'm just letting you know what is possible ;-)

Typically you'll want to enable Fragmentation Manager at the SQL Server instance level by right-clicking the instance in the navigator pane, or clicking the Enable button on the new Indexes tab inside Performance Advisor.

Once you've enabled one or more defrag schedules, if you view the "Defragmentation Schedule" sample event view, or the calendars for any of those servers, you'll see defrag instances show up alongside other events:

[Screenshot: DefragSchedule]

You can of course drag-and-drop to move them. But what if you have a 100GB index on one of the servers that really needs to be analyzed and defragged separately? You simply select the index and override the inherited schedule:

[Screenshot: IndexSchedule]

It's that easy. Everything is point-and-click, and since the SQL Sentry monitoring service manages all of the defrag tasks, there are no scripts or jobs required.

Visibility

Once you've enabled the Fragmentation Manager module on a SQL Server, you'll see a new Fragmentation tab appear inside Performance Advisor:

[Screenshot: FragTab]

This tab has tons of good information about your indexes, including 6 charts showing disk and buffer space, used and wasted, both total and at the index level. The purpose of this tab is not only to let you know the state of fragmentation on a server, but also to help you make good decisions about how and when to defrag your indexes, adjust fill factors, or even change index definitions. One of the coolest charts on this tab is Index Space Usage (center bottom) – it shows you exactly how much of an index is on disk and in buffer over time, and how much disk and buffer space is wasted due to non-full pages.

There are also 3 new alerting conditions: Defrag Started, Completed, and Failure, so you can be as informed as you want to be regarding the status of your SQL Sentry defrag operations.

Speed

No, we haven't invented some magical new higher performance technology for analyzing and defragging your indexes... however, we have come up with a unique approach for potentially dramatically speeding up your regular defrag process, thereby reducing the maintenance window required for defrag – by allowing more than one concurrent defrag operation:

[Screenshot: MaxOps]

If your disk system can handle it, why not run multiple analysis or defrag tasks in tandem? Most systems we've tested have no problem running 2 or 3 concurrent defrag ops, especially when indexes are split across multiple data files and disks. An op can be an analysis, reorg, or rebuild. Currently this setting is capped at 5 for safety. I recommend starting with 2 concurrent ops on a test server and seeing how it performs. With the Performance Advisor dashboard and Disk Activity views, you can easily assess the performance impact of increasing the concurrent defrag ops.
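The general idea of bounded concurrency can be sketched with Python's standard thread pool. This is a generic illustration under assumptions, not the monitoring service's actual scheduler: `run_ops`, `make_op`, and the index names are hypothetical, and the placeholder ops only return a string instead of talking to SQL Server. The cap of 5 mirrors the limit mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

MAX_CONCURRENT_OPS_CAP = 5  # the setting is described above as capped at 5


def run_ops(ops: List[Callable[[], str]], max_concurrent: int = 2) -> List[str]:
    """Run ops with at most `max_concurrent` (clamped to 1..5) in flight."""
    workers = max(1, min(max_concurrent, MAX_CONCURRENT_OPS_CAP))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(op) for op in ops]
        # Results come back in submission order, not completion order.
        return [future.result() for future in futures]


def make_op(kind: str, index_name: str) -> Callable[[], str]:
    # Placeholder work; a real op would run an analysis, reorg, or rebuild.
    return lambda: f"{kind} {index_name} done"


results = run_ops(
    [
        make_op("analysis", "IX_Orders"),
        make_op("reorg", "IX_Customers"),
        make_op("rebuild", "IX_Items"),
    ],
    max_concurrent=2,
)
print(results)
```

With `max_concurrent=2`, the third op only starts once one of the first two has finished, which is the behavior you would want to watch on a test server before raising the number.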

Alerting Enhancements

In addition to group-based alerting configuration, many other major improvements have been made in the area of alerting:

  • You can now configure multiple actions of the same type for the same condition! For example, you can have 3 different Send Email actions for the Job Failure condition, each with different alert targets (users or groups), different rulesets, and different alert windows.
  • What's this, "windows"? Yes, that's right, you can now set exactly when contacts should be alerted using configurable ranges of time, for example "Business Hours" or "Weekends". You can even create compound windows which combine multiple windows together.
  • We no longer list all conditions by default, only those that are in effect. This can dramatically reduce the noise when viewing and configuring alerts.
  • Inherited conditions/actions are displayed in one pane, and conditions/actions set at the current level are in another (Explicit).
  • Since there can now be multiple levels of inheritance with groups, we show you exactly where the inherited settings are coming from via the Object column.
  • You can choose to Disable, Override, or Combine with an inherited condition/action. Combine works just as it sounds – you can set the same action again at the current level, but leave the inherited action in effect.

Together, I think you'll find that these changes make for the most flexible and robust alerting system we've ever had.
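To make the idea of alert windows concrete, here is a minimal Python sketch. It is a model under assumptions, not the product's implementation: the `Window` and `CompoundWindow` classes and the specific hours are hypothetical, and a compound window is assumed to be active whenever any of its member windows is active.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Window:
    """A time range on given weekdays (Monday=0 ... Sunday=6)."""
    name: str
    weekdays: FrozenSet[int]
    start: time
    end: time

    def contains(self, moment: datetime) -> bool:
        return (moment.weekday() in self.weekdays
                and self.start <= moment.time() <= self.end)


@dataclass(frozen=True)
class CompoundWindow:
    """Assumed semantics: active when any member window is active."""
    name: str
    members: Tuple[Window, ...]

    def contains(self, moment: datetime) -> bool:
        return any(window.contains(moment) for window in self.members)


# Hypothetical definitions; the real hours are whatever you configure.
business_hours = Window("Business Hours", frozenset(range(5)), time(8, 0), time(17, 0))
weekends = Window("Weekends", frozenset({5, 6}), time.min, time.max)
combined = CompoundWindow("Business Hours + Weekends", (business_hours, weekends))

tuesday_morning = datetime(2012, 4, 10, 9, 30)   # a Tuesday
tuesday_night = datetime(2012, 4, 10, 22, 0)
saturday_early = datetime(2012, 4, 14, 3, 0)     # a Saturday

print(combined.contains(tuesday_morning),
      combined.contains(tuesday_night),
      combined.contains(saturday_early))
```

An alert target tied to the combined window would be contacted on Tuesday morning and early Saturday, but not on Tuesday night.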

Performance Advisor Dashboard Enhancements

Aside from various cosmetic improvements, the two primary new features on the dashboard are NUMA support and mirroring queue monitoring. When monitoring a NUMA system, you'll notice that both the Windows and SQL Server memory charts are now split to show exactly how much memory is allocated to and used by each NUMA node. In addition, page life expectancy history is also shown for each node. When monitoring a system acting as a primary, mirror, or both, the Send and/or Redo Queues are shown on the same chart previously used to show backup/restore activity.

Beta Download

For a full list of all changes in v7 click here. I've really only scratched the surface. Please take the beta for a spin, and let us know what you think – we want your feedback!

New Users
Existing Users

As always, upgrading your existing SQL Sentry environment to the beta, and from the beta to v7 RTM is fully supported. Be sure to take a backup of your current SQL Sentry database first. Rolling back for any reason is easy – uninstall the beta, restore the database backup, then reinstall the previous version and point it to the database. No settings will be lost.

Friday, October 7, 2011

SQL Sentry v6.2: SharePoint Timer Jobs, Oh My!

We recently released SQL Sentry v6.2, and it introduces an exciting new product: Event Manager for SharePoint. v6.2 contains several other new features as well, including SSAS usage metrics, performance dashboard event overlay, enhanced console security, and a streamlined setup process. SharePoint support and SSAS usage in particular have been heavily requested, so this release should make quite a few customers very happy!

SharePoint Timer Job Monitoring

If you already use SharePoint, you may be familiar with the concept of timer jobs. SharePoint 2010 has its own scheduler service that carries out various background tasks on an ongoing basis – over 100 timer jobs out of the box, ranging from health checks to data imports to history cleanup. These jobs can run on any server in the farm, and they can access resources on other servers. Most of the jobs are set to run on predefined recurring schedules, and the net result is a significant amount of timer job activity going on pretty much all day and night. Since much of this activity is touching SQL Server, it's critical for the database administrator to have visibility into exactly when the jobs are running, and to be able to ascertain the impact they are having on performance.

Using the Event Manager for SharePoint calendar, here's a shot of a typical SharePoint timer job schedule over two days:

[Screenshot: sharepoint_cal_2day_hist]

The instances in orange indicate a runtime overlap of at least 20 seconds, so it's apparent that many timer jobs run at the same time – that's right, the dreaded schedule collisions! One of the primary reasons for the existence of the Event Manager calendar is of course to be able to detect and resolve schedule collisions, so even if you've already cleaned up your SQL Agent schedules, you'll be able to put it to good use again for timer jobs ;-)
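The overlap rule itself is simple interval arithmetic. The Python sketch below shows one way to flag pairs of runs that overlap by at least 20 seconds; it is not how the calendar is implemented, and the `Run` type, `find_collisions` function, and job names are hypothetical.

```python
from datetime import datetime, timedelta
from itertools import combinations
from typing import List, NamedTuple, Tuple


class Run(NamedTuple):
    job: str
    start: datetime
    end: datetime


def overlap(a: Run, b: Run) -> timedelta:
    """Length of time during which both runs were executing."""
    return max(min(a.end, b.end) - max(a.start, b.start), timedelta(0))


def find_collisions(
    runs: List[Run], threshold: timedelta = timedelta(seconds=20)
) -> List[Tuple[str, str]]:
    """Return pairs of jobs whose runs overlap by at least `threshold`."""
    return [
        (a.job, b.job)
        for a, b in combinations(runs, 2)
        if overlap(a, b) >= threshold
    ]


midnight = datetime(2011, 10, 7, 0, 0, 0)
runs = [
    Run("Health Check", midnight, midnight + timedelta(seconds=60)),
    Run("History Cleanup", midnight + timedelta(seconds=30),
        midnight + timedelta(seconds=90)),
    Run("Data Import", midnight + timedelta(seconds=50),
        midnight + timedelta(seconds=55)),
]
collisions = find_collisions(runs)
print(collisions)
```

Only the first two jobs are reported: they share 30 seconds, while the short third job overlaps each of the others by just 5 seconds.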

Here's a shot of the 12am slot with timer jobs only:

[Screenshot: sharepoint_cal_tmd_hilite_sm]

We have no fewer than 20 timer jobs running concurrently at midnight every night. The calendar view can be easily expanded to include other events to see if these jobs are colliding with SQL Agent jobs, for example.

Here's a closer view of the 4pm slot, with SQL events from the timer jobs included. You can distinguish the timer jobs from the SQL events at a glance by the small glyphs in the upper left of each instance:

[Screenshot: sharepoint_collision_raw]

I know, this is pretty ugly, so I'll apply a quick filter to show only events with duration >3 seconds:

[Screenshot: sharepoint_collision_3sec]

Again, these are all out of the box jobs and schedules! In multi-server farms, not all jobs may be running on the same server, so the server on which a job ran is shown in its tooltip along with many other details.

Timer job schedules can work quite differently than SQL Agent, Windows Scheduler, or other schedulers you may have seen. They can use the concept of "ranges" – you set possible start and end times, and the job can run any time during that range. Supposedly the randomness will help avoid job collisions... but note the "SharePoint BI Maintenance" job in the middle of this shot with the odd start time of "12:00:40 PM":

[Screenshot: sharepoint_collision]

Yep, you guessed it, it's using a range! If I right-click the job and select Properties, I'm launched into the job schedule in the SharePoint Central Administration web interface:

[Screenshot: job schedule in SharePoint Central Administration]

We can see that this job is set to run hourly, at any time during the hour. So the randomness didn't help much here – it ended up running concurrently with 4 other jobs. Personally, I like to control exactly when my jobs run and not to leave that up to chance, so I changed both range values to 30, which effectively tells the job to run only at 30 minutes past the hour. (NOTE: You need to be set up as a farm administrator to access the schedule properties.)
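Why setting both range values to 30 pins the start time can be seen in a small Python model. This is a sketch under assumptions: how SharePoint actually picks a time within the range is not described here, so a uniform random choice is assumed, and `pick_hourly_start` and its parameters are hypothetical names.

```python
import random
from datetime import datetime, timedelta


def pick_hourly_start(
    hour_start: datetime, begin_minute: int, end_minute: int, rng: random.Random
) -> datetime:
    """Pick a start time between begin_minute and end_minute past the hour.

    Assumes a uniform random choice within the range (an assumption, not
    documented SharePoint behavior).
    """
    if not 0 <= begin_minute <= end_minute <= 59:
        raise ValueError("expected 0 <= begin_minute <= end_minute <= 59")
    range_seconds = (end_minute - begin_minute) * 60
    offset_seconds = rng.randint(0, range_seconds)  # 0 when the range is empty
    return hour_start + timedelta(minutes=begin_minute, seconds=offset_seconds)


rng = random.Random(42)
hour = datetime(2011, 10, 7, 12, 0, 0)

ranged = pick_hourly_start(hour, 0, 59, rng)    # anywhere in the hour
pinned = pick_hourly_start(hour, 30, 30, rng)   # both range values set to 30
print(ranged.time(), pinned.time())
```

When the begin and end values are equal, the range has zero width, so the only possible start time is exactly that minute past the hour.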

Can timer jobs impact SQL Server performance? You'd better believe it! I'll demonstrate using another new feature, Dashboard Event Overlay (more on this below). By right-clicking a longer running timer job and selecting Jump To > Performance Advisor > Dashboard...

[Screenshot: sharepoint_jumpto_dashboard]

...I'm taken right into the Performance Advisor dashboard where the event duration is overlaid on the X-axis of each chart:

[Screenshot: sharepoint_dashboard]

We can see that there is a strong correlation between this job and several metrics, including:

  • Network inbound
  • CPU
  • Windows hard faults
  • Disk latency
  • Transactions and batches
  • Bookmark lookups
  • Disk waits
  • Log flushes

Since this particular job runs every 30 minutes, the real question becomes, how will it impact user activity against SharePoint databases on the same server throughout the day? More on this in a future post.

SSAS Usage Totals

The next major new feature is the SSAS Usage Totals tab, which is part of Performance Advisor for SSAS:

[Screenshot: ssas_usage_totals]

Previously, SSAS usage information was only available at the query level, and you were unable to see the actual attribute members requested. This made it difficult to get a server-wide perspective on data access patterns. You can now see this server-level data aggregated three ways – by attribute, aggregation, or partition – over any date range. If you spot a particular attribute combination that is heavily requested by many queries, you can right-click the row and copy the attribute vector to the clipboard, then paste it into Agg Manager to create the aggregation. (Look for this capability to be fully integrated in a future release.)

Enhanced Console Security

A common need, especially in larger, more segmented environments, is the ability to restrict visibility and access to certain servers to specific users. Previously, in the SQL Sentry Console you could use groups to effectively hide servers from unauthorized users, and we've always leveraged native security to ensure that there is no escalation-of-privileges risk – just because someone could "see" a server didn't mean they could do anything with it. Although this was fine for many environments, in others, unauthorized users aren't even allowed to know that certain servers exist! This was a problem.

Hence the need for this new capability, which allows a SQL Sentry administrator to truly restrict SQL Sentry users to specific SQL Servers via their Windows login:

[Screenshot: security_props]

Once associated, you simply select the servers that the user can access:

[Screenshot: security_rights]

For existing environments, nothing changes automatically when upgrading to v6.2. However, once a user has been associated with at least one SQL Server, from that point forward they can no longer access any other SQL Servers monitored by SQL Sentry in any way. This goes for any new servers that may be added later – the user must be explicitly associated with those servers before they are accessible via the console.
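The access rule described above can be expressed as a short Python check. This is an illustrative sketch only: `can_access`, the login names, and the server names are hypothetical, and the check covers visibility inside SQL Sentry only (native security on each SQL Server still applies).

```python
from typing import Dict, Set


def can_access(user: str, server: str, associations: Dict[str, Set[str]]) -> bool:
    """Return True if the user may access the server in the console.

    A user with no associations is unrestricted, as before the upgrade.
    Once a user has at least one association, only the associated servers
    are accessible, including any servers added later.
    """
    allowed = associations.get(user)
    if not allowed:
        return True
    return server in allowed


# Hypothetical Windows logins and server names.
associations = {r"CORP\alice": {"SQLPROD01", "SQLPROD02"}}

print(can_access(r"CORP\alice", "SQLPROD01", associations))  # associated server
print(can_access(r"CORP\alice", "SQLQA01", associations))    # not associated
print(can_access(r"CORP\bob", "SQLQA01", associations))      # no associations
```

In this model, a new server must be explicitly added to a restricted user's set before the check passes for it.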

Dashboard Event Overlay

A powerful new feature is the ability to overlay events across the Performance Advisor dashboard charts (see screenshots above). Previously, you might know that a SQL statement ran between X time and Y time, and you suspected that it was impacting performance, but when you flipped over to the dashboard there were no visual cues as to exactly when the event ran – you had to perform the correlation in your head. Now, if you use the Jump to Dashboard context menu from either the Event Manager calendar or the Top SQL tab, you'll notice new horizontal indicators on the X-axes representing the selected event(s). When coming from the calendar, any event highlighted using the new enhanced highlighting options will have a corresponding indicator on the dashboard. This means that you can easily see what impact a particular user, application, or even SPID, is having on performance!

Streamlined Setup

Last but not least, we took a hard look at the installation process and initial setup wizard. Several screens were eliminated and/or consolidated, and the process was simplified overall. In addition, we've added a new Start Page which gives you quick access to some of the more commonly used functions, every time you launch the console:

[Screenshot: start_page]

We'll be adding more to this page as time goes on.

Take v6.2 for a spin and let us know what you think: new users | current users. I'm at the PASS Summit this week, so stop by for an in-person demonstration, as well as a sneak peek at some exciting new features coming in v6.5!

Wednesday, May 18, 2011

The 2013 PASS Summit in Charlotte – Good Call!

We woke up today to the great news that the Summit is coming to Charlotte in 2013! This is something that we'd hoped would happen eventually, but with all of the back and forth on this topic over the past couple of years, we honestly weren't holding our collective breath. Although Charlotte appeared to be on pretty much everyone's Top 10 list of possible future Summit locations, for good reason, there just didn't seem to be any kind of critical mass. There is a huge group that would be happy if the Summit never left Seattle... and, like us, pretty much everyone else wants the Summit in their city. ;-)

As a SQL Server ISV based in the Charlotte area, we've advocated bringing the Summit to Charlotte at times, whenever folks would listen... but we aren't the loudest bunch, and we're also not ones to get too wrapped up in the political side of things. We keep a fairly low profile, we'll state our case and leave it at that... we're just not going to try too hard to sway anyone in any particular direction, for better or worse.

That said, our partner liaison, Peter Shire (b|t), has worked tirelessly for several years on the Charlotte PASS Chapter to foster a vibrant SQL Server community here, with the help of our local Microsoft offices. (The #2 SQL Server support operation behind Dallas is located here in Charlotte.) In addition, last year volunteers from SQL Sentry and Microsoft helped to pull off a highly successful SQLSaturday event, which I'd like to think helped to put us on the map. It did appear to leave many with a positive impression of our city and its ability to support something like a Summit.

Ultimately, the folks on the PASS Board performed the necessary diligence to find the best possible location, and looking at the criteria board members like Tom LaRock used, they made the right call. Every city has its pros and cons, and although Charlotte may not be the strongest in all areas, overall it's just a tough city to beat – I'd call it "well-rounded." I've lived here since the early '80s, and I've spent time in most major cities around the U.S. and Europe, and without a doubt, Charlotte holds its own against any of them.

As many will find out, Charlotte is not only a great city, it's truly an optimal location to hold a PASS Summit. Will there be bumps in the road? Of course – it's not going to be perfect. There will be gripes about various things, there always are. But on balance I think it's the right move. Bring the Summit to a well-rounded East coast location with a strong Microsoft presence, and expose many database professionals that won't normally make it to Seattle to this great educational and community event.

For me, 2013 can't come soon enough!

Tuesday, April 26, 2011

Don't Fear the Trace

Not a week goes by that I don't run across either a comparison of server-side (file) and Profiler (rowset) trace performance, with rowset ending up with the short end of the stick; and/or someone who's afraid (or not allowed) to run traces against a production server because it will undoubtedly kill SQL Server performance. The amount of misinformation out there on this topic is mind-boggling – IMO it's one of the most misrepresented aspects of SQL Server. Unfortunately, it seems many have taken the misinformation at face value, and have perpetuated it without performing sufficient diligence.

There are some key points that always should be, but rarely seem to be, considered when discussing the performance impact of tracing:

  • Disk speed matters for server-side traces. File traces are "lossless", so they can and will block SQL Server activity if the disk where the trace file is being written is not keeping up. A nice overview by Adam Machanic (b|t) is here: http://technet.microsoft.com/en-us/library/cc293610.aspx. This means that if you are writing too many events to a busy or otherwise slow disk, it can bring SQL Server to a screeching halt. This is easy to repro – try writing a heavy file trace to a USB thumb drive. This may sound extreme, but it's probably not far off many SQL Server disk systems I see where busy data and log files are mixed on the same set of spindles, causing severe contention.
  • Rowset traces will drop events to avoid stopping SQL Server activity. The longest a rowset trace will wait before dropping events is 20 seconds. This is less than the commonly used application query timeout of 30 seconds. This can be a bit harder to repro with Profiler, but we've done it in our lab by creating a .NET-based rowset trace consumer with artificial delays in code to slow trace event consumption. In this sense, rowset traces can be considered "safer" than file traces, since although they can slow SQL Server, they won't stop it completely.
  • Filters matter, for both trace types. Although the above points are important to understand, they only come into play in extreme scenarios that most of us should never see. You should almost never need to run an unfiltered trace, or use "Starting" or other events that aren't filterable. For example, if you use only RPC:Completed and SQL:BatchCompleted events and restrict the rows returned using selective filter thresholds against integer-based columns like Duration, CPU, etc., and avoid text filters, especially with wildcards, you can minimize the impact to performance of either trace type.
  • Profiler <> Rowset. Profiler uses a rowset trace, and so frequently when Profiler is knocked, rowset is condemned by extension. They are not one and the same. SQL Sentry Performance Advisor (and some other tools) also use a rowset trace. We stream the trace binary data back and decode it, as described at the bottom of this page: http://technet.microsoft.com/en-us/library/cc293613.aspx. While Profiler has a relatively heavy interface for working with trace data, our trace server, a Windows service, does not. It uses a highly optimized multi-threaded scheme for consuming trace rows as quickly as possible, and efficiently storing the trace data for later retrieval. So although the network overhead may be roughly equivalent for the same trace run by Profiler (remotely) or our trace server, that's where any similarity ends. If the trace uses smart filtering, the overhead should be a non-issue anyway – by default we use a conservative filter of Duration>=5000ms, with a floor of 100ms for safety if no other CPU or I/O filters have been applied.
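The difference between blocking and dropping can be illustrated with a bounded queue in Python. This is only an analogy under assumptions, not SQL Server's trace buffering: the `produce` function and the 3-slot buffer are hypothetical, and a 10 ms wait stands in for the 20-second limit mentioned above.

```python
import queue
from typing import Tuple


def produce(events: int, buffer: "queue.Queue[int]", wait_seconds: float) -> Tuple[int, int]:
    """Enqueue events, dropping any that cannot be queued within wait_seconds.

    This mimics the rowset behavior: wait a bounded time, then drop.
    A "lossless" producer would call buffer.put(event_id) with no timeout
    and block for as long as the consumer is not keeping up.
    Returns (queued, dropped).
    """
    queued = dropped = 0
    for event_id in range(events):
        try:
            buffer.put(event_id, timeout=wait_seconds)
            queued += 1
        except queue.Full:
            dropped += 1
    return queued, dropped


# A stalled consumer: nothing ever reads from the 3-slot buffer.
buffer: "queue.Queue[int]" = queue.Queue(maxsize=3)
queued, dropped = produce(5, buffer, wait_seconds=0.01)
print(queued, dropped)  # prints "3 2"
```

The producer is slowed by the waits but always finishes, at the cost of losing events; the lossless variant would never lose an event but would never finish against this stalled consumer.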

A Simple Test

We recently conducted some cursory tests, primarily to confirm the blocking/non-blocking behavior described above. We simulated high activity via a lightweight T-SQL statement in a 1 million cycle loop, and used a trace with only "Completed" events and no filters. The goal was to generate 1 million trace events as quickly as possible, and assess the impact that consuming those events with each trace type had on the total load run time. I want to emphasize that this is an isolated test scenario, and the results should not be over-generalized. We plan on conducting much more comprehensive testing in the near future. Nonetheless, the results are interesting:

  • Both server-side and rowset traces always caused the load to run slower than when there were no traces.
  • A server-side trace writing to a slow local disk caused the load to run much slower than a rowset trace consumed locally.
  • A server-side trace writing to a fast local disk caused the load to run slightly faster than a rowset trace being consumed remotely over a 1000Mbps network.
  • After factoring in the additional overhead required to copy the trace file over the network and load it into Profiler, the overall time for the server-side trace from the last test was greater.

It seems that with remote tracing, you can pay me now with rowset, or pay me later with server-side. Although the results were repeatable, I'm not publishing test details at this point because these tests were not strictly controlled. The point here is that trace impact on performance is always an "it depends" scenario. Unfortunately it rarely seems to be presented as such – blanket statements such as "Profiler is dangerous" and "server-side traces are always faster" are commonplace. Sure, consumer type and location, network speed, etc., can certainly affect the performance impact of rowset traces, but so can writing a server-side trace file to the same spindles as a busy transaction log... and the impact can potentially be much more detrimental to your SQL Server.

The Source

Many thousands of SQL Servers are monitored 24/7 by our rowset-based trace server and we haven't had a single report where its impact on the target was even noticeable, so we're always puzzled by the anti-rowset sentiments. I've tried to assess how the confusion and fear started, and much of it seems to have a common source: the "Standard" Profiler trace:

std_trace_props

It's been with us forever, and I'm not sure who created it as the default trace... but best case it is of little to no value, and worst case it is dangerous to your SQL Server. Presenting this trace as the default to a SQL Server newcomer is analogous to putting a loaded weapon in front of a toddler. There are no filters applied, and it includes Audit Login and SQL:BatchStarting events, neither of which is filterable by any performance metrics (Duration, CPU, Reads, Writes), meaning you will be capturing all of them. It also includes Audit Logout, which is effectively not filterable by Reads or Writes when connection pooling is in use, since those metrics are cumulative. On a busy OLTP system this trace is bad, very bad!

std_trace_filters

So unless you've changed the default trace (easy to do via Edit Template), every time you open Profiler you've got a very high overhead trace as the starting point. As a result, this trace is referenced in books, trace performance benchmarks have been conducted and published using it, and countless users have naively run it as-is against busy servers. I'm convinced that this is a big reason that Profiler is so frequently kicked around, and why so many are now afraid to use it at all.

Bottom line: Don't use this trace! Certainly don't use it alone for benchmarking the relative performance impact of the two trace types.

A true comparative test should consider:

  • Events and columns used
  • Trace consumer location (local or remote)
  • Disk location (local or remote)
  • Disk speed and type (shared or dedicated)
  • Network speed
  • Load type (T-SQL text size, DML mix, etc.)
  • Transactions/Batches per second on the target
  • CPU and memory resources available
  • Filter types
  • Filter thresholds
  • ...

As I mentioned, we have more comprehensive testing coming which covers these factors, and we'll publish all of the details, with hopes of clearing up a lot of confusion and fear around tracing once and for all.

In the meantime, don't fear the trace. If you target the specific events and columns you need, and use conservative integer-based filters to restrict the rows returned, the benefits gained through intelligent analysis of the data should far outweigh any performance impact on your SQL Servers.

Friday, April 8, 2011

SQL Sentry v6.1: Something Old, Several Things New

Since releasing v6 a couple of months ago, we've been hard at work on some cool new features for v6.1. Our x.1 releases usually consist mostly of fixes for bugs not caught during the major release (x.0) beta, and a few new features. This x.1 release is different in that it contains many exciting and high impact new features, and a higher feature:fix ratio than normal. This is likely due to the lengthy v6 beta, as well as some new QA processes we've put in place. Change list is here.

We've just published the v6.1 Release Candidate to the website (new users|current users), and our team is showing it off now at SQLBits 8 in Brighton, UK. Read on for an overview...

SNMP Support

I'm covering this one first, as it's been on the feature list the longest. Believe it or not, we obtained our SNMP "enterprise number" and started on our MIBs in 2004, not long after the original release of Event Manager v1! Our enterprise number is 20707, and the latest enterprise number issued as of this post was... gulp... 37722. (I think I'll filter this feature from our "average turnaround time" calculation ;-) We didn't actually start coding SNMP support until recently though – for a few reasons it kept getting prioritized down: high complexity, relatively low demand, and the fact that users have been able to work around it and get alerts into other monitoring systems via our "execute" actions (Execute Process and Execute SQL). SNMP is not a feature that everyone uses, but those that do really depend on it.

Fortunately, earlier this year, the most knowledgeable and capable SNMP developer on the planet, Eric Eicke (b|t), joined our development team. Eric developed the first .NET-based SNMP component several years ago, and it's used by many large organizations, including Microsoft and HP. Eric was also the one who helped us get started on SNMP back in 2004. There was no fumbling around with RFCs; Eric knew exactly what needed to be done and did it, including v3 support, authentication, and encryption!

MIBs are installed as part of the new setup. Once you've loaded them into your enterprise monitoring system, activating SNMP in SQL Sentry is straightforward from the new SNMP Config tab:

snmp_settings

Once configured, simply tick the Send SNMP Trap action for any condition (along with a ruleset if desired):

snmp_action

Qure Integration

This next feature is not quite as old, but is equally as exciting ;-) Performance Advisor has always had "Top SQL" collection and normalization/aggregation, and this year we became the first to introduce integrated query plan capture and analysis. So we automatically catch your heaviest queries, and give you best-in-class features for making manual query analysis as quick and easy as possible. If your server has 5 or 10 "problem" queries, and you've got some experience dealing with execution plans and indexes, no problem, you can use Performance Advisor to straighten everything out in an afternoon. But what if you're dealing with 20, 50, 100, or more problem queries? Or what if it's not the individual query that's the problem, but rather the cumulative impact?

For many systems, automated workload-level analysis may make more sense. That's where Qure comes in. Developed by DBSophic, Qure is the brainchild of MVP Ami Levin, and is the leading software available for SQL Server workload analysis. It uses a copy of your production database and makes index, schema, and/or query changes, then tests those changes with real workloads to see which work best, before making any recommendations. Of course, we all trust the query optimizer to behave predictably and make good decisions <ahem>, but there's just no substitute for good, old-fashioned brute force validation.

SQL Sentry and Qure provide complementary functionality with very little overlap, so bringing the two together made a lot of sense for both companies. As such, DBSophic has just released a new version of Qure which only works with trace data from the SQL Sentry repository database (v6 or above), Qure for SQL Sentry:

qure_se

Here are my top 3 reasons to try Qure for SQL Sentry:

  • Faster Analysis. With the new special edition of Qure, you don't have to deal with manual workload traces. You simply point Qure to your existing SQL Sentry repository database, select the SQL Server and database to optimize, and hit "Go". Qure automatically pulls in all Top SQL trace and QuickTrace events previously captured by SQL Sentry, and uses them for the analysis.
  • Easier Validation. Using the Performance Advisor dashboard's history mode, the Performance Counter Date Range Comparison report, query runtime graphs, and query plan history, you can easily validate the results of a Qure optimization from several angles in SQL Sentry.
  • Cost-effective. The new combined SKU, Power Suite + Qure Quick Start Pack, gives you 5 perpetual Performance Advisor and Event Manager licenses plus 5 full Qure optimizations, all at a significant discount over purchasing the products separately. If you're already a SQL Sentry customer, you can purchase the new Qure edition separately as well.

You can run an unlimited number of sample analyses with the Qure for SQL Sentry trial software, and it will give you some good, actionable recommendations for FREE.

"Too Much Data" Begone!

Longtime Event Manager users will really appreciate this feature. Prior to this release, if you had too many event instances in a range to render them on the calendar in a meaningful fashion, we would show a block like this for the entire range:

too_much_data

When we introduced Performance Advisor and began showing performance events on the calendar, Top SQL in particular, these blocks became much more prevalent, and could really hamper visibility as well as increase time spent on filters.

Now, instead of a big "too much data" block, you'll see a small hatched rectangle on the far right of the range:

too_much_data_new

When you hover over it the entire range is highlighted, and a tooltip is displayed which shows how many other events exist in the range. The events actually displayed in the range have been prioritized by status and runtime – failures and longer running events are shown first, the rest are filtered from view. Just like before, you can double-click to zoom into a smaller range and see the other events.
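To make the prioritization concrete, here is a minimal Python sketch of "failures first, then longest running". It is not the actual SQL Sentry implementation: the EventInstance class, the status strings, and the max_visible limit are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class EventInstance:
    # Hypothetical shape of a calendar event; not SQL Sentry's real object model.
    name: str
    status: str            # assumed values: "Failed" or "Succeeded"
    runtime_seconds: float


def select_visible(events, max_visible):
    """Return (visible, hidden_count), ranking failures first, then longest runtime."""
    ranked = sorted(
        events,
        # False sorts before True, so failed events come first;
        # negating the runtime puts longer-running events ahead of shorter ones.
        key=lambda e: (e.status != "Failed", -e.runtime_seconds),
    )
    visible = ranked[:max_visible]
    return visible, len(events) - len(visible)


events = [
    EventInstance("Nightly backup", "Succeeded", 1200.0),
    EventInstance("Index rebuild", "Failed", 45.0),
    EventInstance("Top SQL: report query", "Succeeded", 300.0),
    EventInstance("Log shipping", "Succeeded", 5.0),
]

visible, hidden_count = select_visible(events, max_visible=2)
for event in visible:
    print(event.name, event.status, event.runtime_seconds)
print(f"{hidden_count} other events in this range")
```

The hidden count is what a tooltip like the one described above would report, and zooming into a smaller range simply means running the same selection over fewer events.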

Custom Calendar Highlighting

Previously, when you selected an event instance on the calendar, we would highlight other related instances using basic logic. For jobs and tasks we used the server and job name, and for Top SQL we used the "Text Data". So other Top SQL instances would only highlight if the SQL matched exactly. Depending on the scenario, this could be very limiting.

Now, you can pick and choose exactly which attribute or combination of attributes to use for highlighting related instances. Here are some examples:

  • You want to see all SQL, blocks and deadlocks for a particular Application
  • You want to see all SQL, blocks and deadlocks associated with a SQLAgent job
  • You want to see all SQLAgent jobs that are part of the same chain

All of these scenarios and more are possible. You simply right-click a calendar event and select the common attribute(s) using the new Highlight context menus:

calendar_highlighting
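Conceptually, attribute-based highlighting is just a match on whichever attributes you select. The Python sketch below illustrates the idea only; the dictionary keys (type, application, server, text) and the sample events are hypothetical and do not reflect how SQL Sentry stores event data.

```python
def related_instances(selected, instances, attributes):
    """Return the instances that share the selected event's value for every chosen attribute."""
    return [
        inst
        for inst in instances
        if inst is not selected
        and all(inst.get(attr) == selected.get(attr) for attr in attributes)
    ]


# Hypothetical calendar events, for illustration only.
instances = [
    {"type": "Top SQL", "application": "OrderEntry", "server": "SQL01", "text": "EXEC dbo.GetOrders 1"},
    {"type": "Top SQL", "application": "OrderEntry", "server": "SQL01", "text": "EXEC dbo.GetOrders 2"},
    {"type": "Block", "application": "OrderEntry", "server": "SQL01", "text": None},
    {"type": "Top SQL", "application": "Reporting", "server": "SQL01", "text": "EXEC dbo.GetOrders 1"},
]
selected = instances[0]

# Old behavior for Top SQL: only an exact text match highlights.
by_text = related_instances(selected, instances, ["text"])

# New behavior: pick the attribute(s), e.g. everything from the same application.
by_app = related_instances(selected, instances, ["application"])

print(len(by_text), len(by_app))
```

Matching on text alone misses the second call to the same procedure with a different parameter, while matching on application also picks up the block, which is the kind of scenario listed above.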

Tab Undocking / Multi-monitor Support

You now have the ability to drag any SQL Sentry main level tab outside of the Console. This is especially nice for multiple monitor setups. This shot shows a main tab being undocked:

tab_undocking

Fusion-io Drive Support

SSDs are definitely on the upswing in the SQL Server world. Over the past couple of years we've seen much greater acceptance and adoption in mission critical SQL Server environments, and the company that's making the most waves here is Fusion-io. Previously, however, none of their drives would show up in our Disk Activity view, due to their PCI-based architecture and the fact that they're represented differently in the Windows subsystems we use to gather drive data.

We've addressed this in v6.1, so if you have any Fusion-io drives, single or duo, they'll be rendered on Disk Activity just like all other drive types. This gives you a great way to validate their performance – latency on these drives should be low enough that the flow lines are always green. If they are not, there is a problem.

Here's a shot of a high volume OLTP system in the U.K. with lots of database files, running on a Fusion-io ioDrive and ioDrive Duo card with Windows RAID-0, courtesy of Christian Bolton (b|t) of Coeo:

Fusion-DiskActivity

New Plan Explorer Features

The new version of Plan Explorer is 1.2, and the features I'll cover below apply to both the full SQL Sentry v6.1 and Plan Explorer v1.2. The codebase is the same between them, but there are additional capabilities that open up when you use the full software, as covered in my post on SQL Sentry v6 Plan Analysis Features.

Actual Plan Retrieval

Previously you could retrieve the estimated plan from a server, but to view an actual plan you had to run the query in SSMS and copy the plan into Plan Explorer. Now you'll see an Actual Plan toolbar button that allows you to retrieve the actual plan for any query:

actual_plan

Just like SSMS, the query must be executed against the target in order to get the actual plan. You will see a progress bar while the query is running, but you won't see any query results when it completes, only the actual plan info along with actual CPU and read IO metrics.

Note the new Command Text tab above. This is an editable view of the query text, and this is what gets executed. The Text Data tab is no longer editable – that SQL comes directly from the plan, and there were several problems with making it executable that are beyond the scope of this post (and now water under the bridge ;-) I think you'll find the new design much simpler, and more robust.

We've had some lively discussion on whether or not to go down the path of showing results, but where we ended up is that Plan Explorer is a plan analysis tool, not a query tool. More often than not, viewing results isn't required to make good plan optimization decisions. We of course already show the number of estimated and actual rows, and in most cases this is sufficient.

One big advantage of not returning results is that you'll often get the actual plan back much faster than you would otherwise. For example, a query that returns 200,000 wide rows and takes 30 seconds to load in SSMS may take only 2-3 seconds in Plan Explorer.

Expressions Tab

Prior to this release, we've shown expressions only in operator tooltips. The higher you go up the plan tree, the more levels of nesting you can have, and tracing an expression all the way back to its source using tooltips could be daunting.

This is why we've added a new dedicated tab for expressions. The tab only appears if expressions exist in the plan, and it shows standard and expanded views of all expressions, along with the entire references tree for each:

pe_expressions_tab

I should warn you, there is a lot going on behind the scenes with expressions... the optimizer generates many expressions that you'd never normally see, and it can cause overload at first. More on this in a future post.

Join Diagram Tab

This tab only appears if the query has joins. It presents a view similar to the Query Designer diagram in SSMS, although only joined columns are shown for each table. Join information exists on the Plan Diagram and Query Columns tabs, but this is a different look at it which can be especially valuable if your query has views or nested views. The query optimizer flattens all views down to their base tables as part of the plan generation process, but in SSMS you see joins for the views, not the base tables. This can make it difficult to decipher which tables and columns are actually involved in a join. Fortunately, the plan contains this data, so we're able to reconstruct and show the base table joins from the plan XML. This can be very helpful for making good indexing decisions to support those joins.

pe_joins_tab
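For a rough idea of what "reconstructing joins from the plan XML" involves, here is a small Python sketch that pairs the build and probe key columns of hash joins. It is not how Plan Explorer does it. The sample plan is a hand-written, heavily simplified fragment (a real plan nests operators much deeper), the table and column names are made up, and only Hash Match joins are handled; merge and nested loops joins record their join columns differently and are ignored here.

```python
import xml.etree.ElementTree as ET

SHOWPLAN_NS = "http://schemas.microsoft.com/sqlserver/2004/07/showplan"
NS = {"sp": SHOWPLAN_NS}

# Hand-written, simplified fragment for illustration; not a complete showplan.
SAMPLE_PLAN = """
<ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
  <RelOp PhysicalOp="Hash Match" LogicalOp="Inner Join">
    <Hash>
      <HashKeysBuild>
        <ColumnReference Database="[Sales]" Schema="[dbo]" Table="[Customers]" Column="CustomerID" />
      </HashKeysBuild>
      <HashKeysProbe>
        <ColumnReference Database="[Sales]" Schema="[dbo]" Table="[Orders]" Column="CustomerID" />
      </HashKeysProbe>
    </Hash>
  </RelOp>
</ShowPlanXML>
"""


def column_name(ref):
    """Format a ColumnReference element as schema.table.column without brackets."""
    parts = [ref.get("Schema"), ref.get("Table"), ref.get("Column")]
    return ".".join(p.strip("[]") for p in parts if p)


def hash_join_pairs(plan_xml):
    """Return (build_column, probe_column) pairs for every hash join in the plan."""
    root = ET.fromstring(plan_xml.strip())
    pairs = []
    for hash_op in root.iter("{%s}Hash" % SHOWPLAN_NS):
        build = hash_op.findall("sp:HashKeysBuild/sp:ColumnReference", NS)
        probe = hash_op.findall("sp:HashKeysProbe/sp:ColumnReference", NS)
        pairs.extend((column_name(b), column_name(p)) for b, p in zip(build, probe))
    return pairs


join_pairs = hash_join_pairs(SAMPLE_PLAN)
for build_col, probe_col in join_pairs:
    print(f"{build_col} = {probe_col}")
```

Because the plan references base tables rather than views, the pairs that come out are the columns you would actually consider indexing.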

This is our first cut at joins, and getting it to this point has not been trivial. What looks like a simple join on the plan diagram can be oh so much more behind the scenes. We're still finding scenarios that we haven't seen before, but we think it's far enough along that it can provide value. If you run into anything weird, please do let us know about it – email the plan to support at sqlsentry.net.

It's your feedback that enables us to continue to improve and evolve the software, so please keep it coming!

Friday, March 4, 2011

Reducing Plan Explorer Startup Time with NGen

Recently on Twitter I ran across a comment by Oscar Zamora (b|t) about some "slowness" with Plan Explorer. We've gone to great lengths to ensure Plan Explorer is snappy and responsive, even when working with giant plans, so needless to say I was surprised to hear this (and perhaps a tiny bit incredulous ;) I contacted Oscar for more specifics about where exactly he was seeing the lag, and he was happy to help. As it turns out, it wasn't UI responsiveness he was referring to, it was the application startup time. Here are the startup times he was seeing on two different machines:

  • Desktop
    • First start: 22 seconds
    • Warm start: 5 seconds
  • Laptop
    • First start: 40 seconds
    • Warm start: 10 seconds

Yikes! After a quick version check, we determined he was still on beta version .96. After upgrading to v1.1, his load times dropped a bit:

  • Desktop
    • First start: 16 seconds
    • Warm start: 5 seconds
  • Laptop
    • First start: 25 seconds
    • Warm start: 9 seconds

Things were looking better, but still not ideal, especially on his laptop. On my desktop and laptop, both the initial and subsequent loads always run around 5 seconds. Next I took a look at his laptop specs and Windows Experience Index details. His graphics and disk performance were about 1 point lower than my laptop, but processor and RAM were about 2 points lower. This seemed to point directly to it being a jitting performance issue. JIT (just-in-time compilation) is standard behavior for .NET apps, so a startup delay is not unusual. Exactly how much of a delay depends on several factors, including timing of method calls, the amount of code being jitted, as well as CPU, memory, and, to some extent, disk speed.

It's a safe bet that the slower CPU and memory on Oscar's laptop are causing jitting to take longer than normal, but what can we do about it? Since the Plan Explorer code itself is already well-optimized, there really was only one other option: using the Native Image Generator, or NGen.

NGen is a tool that effectively pre-compiles an entire .NET app before runtime, and caches an image of it on disk. We've known about NGen since its first release, but since prior versions suffered from various shortcomings, we had elected not to use it. However, in .NET 3.5 SP1 and in .NET 4.0 many improvements were made, and since we now require .NET 4.0 for both Plan Explorer and SQL Sentry v6, it was time for another look.

Using NGen is straightforward. You can easily create a pre-compiled image of Plan Explorer via these steps:

  1. Open a command prompt as an Administrator
  2. Change to the .NET 4.0 directory:
    x86 systems:  cd %windir%\Microsoft.NET\Framework\v4.0.30319\
    x64 systems:  cd %windir%\Microsoft.NET\Framework64\v4.0.30319\
  3. Run the following command (assuming you've used the default install path – if not, change it accordingly):
    ngen install "C:\Program Files\SQL Sentry\SQL Sentry Plan Explorer\SQL Sentry Plan Explorer.exe"

You should see a bunch of compilation-related messages fly by. If you don't, then something isn't right, perhaps an invalid path. When the messages complete, go ahead and open Plan Explorer.
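If you need to repeat these steps on several machines, the command can be assembled in a script. The Python sketch below only builds the command line from the steps above; the ngen_install_command helper is hypothetical, and C:\Windows is an assumed value for %windir%. Actually running it requires Windows and an elevated prompt, so that part is left as a comment.

```python
import ntpath


def ngen_install_command(windir, is_64bit, exe_path):
    """Build the 'ngen install' command for the .NET 4.0 framework directory."""
    framework_dir = "Framework64" if is_64bit else "Framework"
    ngen = ntpath.join(windir, "Microsoft.NET", framework_dir, "v4.0.30319", "ngen.exe")
    return [ngen, "install", exe_path]


command = ngen_install_command(
    r"C:\Windows",  # assumed value of %windir%
    True,           # x64 system
    r"C:\Program Files\SQL Sentry\SQL Sentry Plan Explorer\SQL Sentry Plan Explorer.exe",
)
print(command)

# On Windows, from an elevated prompt, this could be run with:
#   import subprocess
#   subprocess.run(command, check=True)
```

As with the manual steps, change the executable path if you did not use the default install location.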

Or, if you'd prefer to forgo the manual process above, you can download the new Plan Explorer release with NGen compilation built-in. The compilation is deferred; it doesn't actually happen during setup. Rather, the NGen service performs the compilation behind the scenes after installation, the next time your computer is "idle". For this reason, bear in mind it may not complete before you launch Plan Explorer for the first time. You can tell when it's done by looking at the "Microsoft .NET Framework NGEN v4.0.30319_x64" service status. It only starts when there are items queued for compilation, so if it's stopped that means it has finished its work.

Either way you go, you should be pleased with the new load time. On every machine we've tested, startup is almost instant. If you're opening Plan Explorer regularly, this time savings can add up, leaving you more time for plan-hacking!