Friday, October 7, 2011

SQL Sentry v6.2: SharePoint Timer Jobs, Oh My!

We recently released SQL Sentry v6.2, and it introduces an exciting new product: Event Manager for SharePoint. v6.2 contains several other new features as well, including SSAS usage metrics, performance dashboard event overlay, enhanced console security, and a streamlined setup process. SharePoint support and SSAS usage in particular have been heavily requested, so this release should make quite a few customers very happy!

SharePoint Timer Job Monitoring

If you already use SharePoint, you may be familiar with the concept of timer jobs. SharePoint 2010 has its own scheduler service that carries out various background tasks on an ongoing basis – over 100 timer jobs out of the box, ranging from health checks to data imports to history cleanup. These jobs can run on any server in the farm, and they can access resources on other servers. Most of the jobs are set to run on predefined recurring schedules, and the net result is a significant amount of timer job activity going on pretty much all day and night. Since much of this activity is touching SQL Server, it's critical for the database administrator to have visibility into exactly when the jobs are running, and to be able to ascertain the impact they are having on performance.

Using the Event Manager for SharePoint calendar, here's a shot of a typical SharePoint timer job schedule over two days:

sharepoint_cal_2day_hist

The instances in orange indicate a runtime overlap of at least 20 seconds, so it's apparent that many timer jobs run at the same time – that's right, the dreaded schedule collisions! One of the primary reasons for the existence of the Event Manager calendar is of course to be able to detect and resolve schedule collisions, so even if you've already cleaned up your SQL Agent schedules, you'll be able to put it to good use again for timer jobs ;-)

Here's a shot of the 12am slot with timer jobs only:

sharepoint_cal_tmd_hilite_sm

We have no less than 20 timer jobs running concurrently at midnight every night. The calendar view can be easily expanded to include other events to see if these jobs are colliding with SQL Agent jobs, for example.

Here's a closer view of the 4pm slot, with SQL events from the timer jobs included. You can distinguish the timer jobs from the SQL events at a glance by the small glyphs in the upper left of each instance:

sharepoint_collision_raw

I know, this is pretty ugly, so I'll apply a quick filter to show only events with duration >3 seconds:

sharepoint_collision_3sec

Again, these are all out of the box jobs and schedules! In multi-server farms, not all jobs may be running on the same server, so the server on which a job ran is shown in its tooltip along with many other details.

Timer job schedules can work quite differently from SQL Agent, Windows Scheduler, or other schedulers you may have seen. They can use the concept of "ranges" – you set possible start and end times, and the job can run at any time within that range. Supposedly the randomness helps avoid job collisions... but note the "SharePoint BI Maintenance" job in the middle of this shot with an odd start time of "12:00:40 PM":

sharepoint_collision

Yep, you guessed it, it's using a range! If I right-click the job and select Properties, I'm launched into the job schedule in the SharePoint Central Administration web interface:

image

We can see that this job is set to run hourly, at any time during the hour. So the randomness didn't help much here – it ended up running concurrently with 4 other jobs. Personally, I like to control exactly when my jobs run rather than leave that up to chance, so I changed both range values to 30, which effectively tells the job to run only at 30 minutes past the hour. (NOTE: You need to be set up as a farm administrator to access the schedule properties.)

Can timer jobs impact SQL Server performance? You'd better believe it! I'll demonstrate using another new feature, Dashboard Event Overlay (more on this below). By right-clicking a longer running timer job and selecting Jump To > Performance Advisor > Dashboard...

sharepoint_jumpto_dashboard

...I'm taken right into the Performance Advisor dashboard where the event duration is overlaid on the X-axis of each chart:

sharepoint_dashboard

We can see that there is a strong correlation between this job and several metrics, including:

  • Network inbound
  • CPU
  • Windows hard faults
  • Disk latency
  • Transactions and batches
  • Bookmark lookups
  • Disk waits
  • Log flushes

Since this particular job runs every 30 minutes, the real question becomes, how will it impact user activity against SharePoint databases on the same server throughout the day? More on this in a future post.

SSAS Usage Totals

The next major new feature is the SSAS Usage Totals tab, which is part of Performance Advisor for SSAS:

ssas_usage_totals

Previously, SSAS usage information was only available at the query level, and you were unable to see the actual attribute members requested. This made it difficult to get a server-wide perspective on data access patterns. You can now see this server-level data aggregated three ways – by attribute, aggregation, or partition – over any date range. If you spot a particular attribute combination that is heavily requested by many queries, you can right-click the row and copy the attribute vector to the clipboard, then paste it into Agg Manager to create the aggregation. (Look for this capability to be fully integrated in a future release.)

Enhanced Console Security

A common need, especially in larger, more segmented environments, is the ability to restrict visibility and access to certain servers to specific users. Previously, in the SQL Sentry Console you could use groups to effectively hide servers from unauthorized users, and we've always leveraged native security to ensure that there are no escalation of privileges risks – just because someone could "see" a server didn't mean they could do anything with it. Although this was fine for many environments, in others, unauthorized users aren't even allowed to know that certain servers exist! This was a problem.

Hence the need for this new capability, which allows a SQL Sentry administrator to truly restrict SQL Sentry users to specific SQL Servers via their Windows login:

security_props

Once associated, you simply select the servers that the user can access:

security_rights

For existing environments, nothing changes automatically when upgrading to v6.2. However, once a user has been associated with at least one SQL Server, from that point forward they can no longer access any other SQL Servers monitored by SQL Sentry in any way. This goes for any new servers that may be added later – the user must be explicitly associated with those servers before they are accessible via the console.

Dashboard Event Overlay

A powerful new feature is the ability to overlay events across the Performance Advisor dashboard charts (see screenshots above). Previously, you might know that a SQL statement ran between X time and Y time, and you suspected that it was impacting performance, but when you flipped over to the dashboard there were no visual cues as to exactly when the event ran – you had to perform the correlation in your head. Now, if you use the Jump to Dashboard context menu from either the Event Manager calendar or the Top SQL tab, you'll notice new horizontal indicators on the X-axes representing the selected event(s). When coming from the calendar, any event highlighted using the new enhanced highlighting options will have a corresponding indicator on the dashboard. This means that you can easily see what impact a particular user, application, or even SPID, is having on performance!

Streamlined Setup

Last but not least, we took a hard look at the installation process and initial setup wizard. Several screens were eliminated and/or consolidated, and the process was simplified overall. In addition, we've added a new Start Page which gives you quick access to some of the more commonly used functions, every time you launch the console:

start_page

We'll be adding more to this page as time goes on.

Take v6.2 for a spin and let us know what you think: new users | current users. I'm at the PASS Summit this week, so stop by for an in-person demonstration, as well as a sneak peek at some exciting new features coming in v6.5!

Wednesday, May 18, 2011

The 2013 PASS Summit in Charlotte – Good Call!

We woke up today to the great news that the Summit is coming to Charlotte in 2013! This is something that we'd hoped would happen eventually, but with all of the back and forth on this topic over the past couple of years, we honestly weren't holding our collective breath. Although Charlotte appeared to be on pretty much everyone's Top 10 list of possible future Summit locations, for good reason, there just didn't seem to be any kind of critical mass. There is a huge group that would be happy if the Summit never left Seattle... and, like us, pretty much everyone else wants the Summit in their city. ;-)

As a SQL Server ISV based in the Charlotte area, we've advocated bringing the Summit to Charlotte at times, whenever folks would listen... but we aren't the loudest bunch, and we're also not ones to get too wrapped up in the political side of things. We keep a fairly low profile, we'll state our case and leave it at that... we're just not going to try too hard to sway anyone in any particular direction, for better or worse.

That said, our partner liaison, Peter Shire (b|t), has worked tirelessly for several years on the Charlotte PASS Chapter to foster a vibrant SQL Server community here, with the help of our local Microsoft offices. (The #2 SQL Server support operation behind Dallas is located here in Charlotte.) In addition, last year volunteers from SQL Sentry and Microsoft helped to pull off a highly successful SQLSaturday event, which I'd like to think helped to put us on the map. It did appear to leave many with a positive impression of our city and its ability to support something like a Summit.

Ultimately, the folks on the PASS Board performed the necessary diligence to find the best possible location, and looking at the criteria board members like Tom LaRock used, they made the right call. Every city has its pros and cons, and although Charlotte may not be the strongest in all areas, overall it's just a tough city to beat – I'd call it "well-rounded." I've lived here since the early '80s, and I've spent time in most major cities around the U.S. and Europe, and without a doubt, Charlotte holds its own against any of them.

As many will find out, Charlotte is not only a great city, it's truly an optimal location to hold a PASS Summit. Will there be bumps in the road? Of course – it's not going to be perfect. There will be gripes about various things, there always are. But on balance I think it's the right move. Bring the Summit to a well-rounded East Coast location with a strong Microsoft presence, and expose many database professionals who wouldn't normally make it to Seattle to this great educational and community event.

For me, 2013 can't come soon enough!

Tuesday, April 26, 2011

Don't Fear the Trace

Not a week goes by that I don't run across either a comparison of server-side (file) and Profiler (rowset) trace performance, with rowset getting the short end of the stick, or someone who's afraid (or not allowed) to run traces against a production server because it will undoubtedly kill SQL Server performance. The amount of misinformation out there on this topic is mind-boggling – IMO it's one of the most misrepresented aspects of SQL Server. Unfortunately, it seems many have taken the misinformation at face value and perpetuated it without doing their own due diligence.

There are some key points that should always be, but rarely seem to be, considered when discussing the performance impact of tracing:

  • Disk speed matters for server-side traces. File traces are "lossless", so they can and will block SQL Server activity if the disk where the trace file is being written can't keep up. A nice overview by Adam Machanic (b|t) is here: http://technet.microsoft.com/en-us/library/cc293610.aspx. This means that if you are writing too many events to a busy or otherwise slow disk, it can bring SQL Server to a screeching halt. This is easy to repro – try writing a heavy file trace to a USB thumb drive. That may sound extreme, but it's probably not far off from many SQL Server disk systems I see, where busy data and log files are mixed on the same set of spindles, causing severe contention.
  • Rowset traces will drop events to avoid stopping SQL Server activity. The longest a rowset trace will wait before dropping events is 20 seconds. This is less than the commonly used application query timeout of 30 seconds. This can be a bit harder to repro with Profiler, but we've done it in our lab by creating a .NET-based rowset trace consumer with artificial delays in code to slow trace event consumption. In this sense, rowset traces can be considered "safer" than file traces, since although they can slow SQL Server, they won't stop it completely.
  • Filters matter, for both trace types. Although the points above are important to understand, they only come into play in extreme scenarios that most of us should never see. You should almost never need to run an unfiltered trace, or use "Starting" or other events that aren't filterable. For example, if you use only RPC:Completed and SQL:BatchCompleted events, restrict the rows returned using selective filter thresholds against integer-based columns like Duration, CPU, etc., and avoid text filters, especially with wildcards, you can minimize the performance impact of either trace type (there's a minimal sketch of such a trace just after this list).
  • Profiler <> Rowset. Profiler uses a rowset trace, and so frequently when Profiler is knocked, rowset is condemned by extension. They are not one and the same. SQL Sentry Performance Advisor (and some other tools) also use a rowset trace. We stream the trace binary data back and decode it, as described at the bottom of this page: http://technet.microsoft.com/en-us/library/cc293613.aspx. While Profiler has a relatively heavy interface for working with trace data, our trace server, a Windows service, does not. It uses a highly optimized multi-threaded scheme for consuming trace rows as quickly as possible, and efficiently storing the trace data for later retrieval. So although the network overhead may be roughly equivalent for the same trace run by Profiler (remotely) or our trace server, that's where any similarity ends. If the trace uses smart filtering, the overhead should be a non-issue anyway – by default we use a conservative filter of Duration>=5000ms, with a floor of 100ms for safety if no other CPU or I/O filters have been applied.
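
To make the filtering point concrete, here's a minimal sketch of a filtered file trace using the sp_trace_* procedures – completed events only, with a Duration filter of 5 seconds (remember that Duration is specified in microseconds on SQL Server 2005 and later). The file path and threshold are placeholders, so adjust them for your environment:

    DECLARE @TraceID int, @maxsize bigint, @on bit, @dur bigint;
    SELECT @maxsize = 512, @on = 1, @dur = 5000000;   -- 5 seconds, in microseconds

    -- Create a file trace with rollover (option 2); the path is a placeholder,
    -- and SQL Server appends the .trc extension for you.
    EXEC sp_trace_create @TraceID OUTPUT, 2, N'D:\Traces\filtered_demo', @maxsize;

    -- RPC:Completed (10) and SQL:BatchCompleted (12) with a handful of columns:
    -- TextData(1), SPID(12), Duration(13), Reads(16), Writes(17), CPU(18).
    EXEC sp_trace_setevent @TraceID, 10, 1, @on;
    EXEC sp_trace_setevent @TraceID, 10, 12, @on;
    EXEC sp_trace_setevent @TraceID, 10, 13, @on;
    EXEC sp_trace_setevent @TraceID, 10, 16, @on;
    EXEC sp_trace_setevent @TraceID, 10, 17, @on;
    EXEC sp_trace_setevent @TraceID, 10, 18, @on;
    EXEC sp_trace_setevent @TraceID, 12, 1, @on;
    EXEC sp_trace_setevent @TraceID, 12, 12, @on;
    EXEC sp_trace_setevent @TraceID, 12, 13, @on;
    EXEC sp_trace_setevent @TraceID, 12, 16, @on;
    EXEC sp_trace_setevent @TraceID, 12, 17, @on;
    EXEC sp_trace_setevent @TraceID, 12, 18, @on;

    -- Keep only events with Duration >= 5 seconds (4 = "greater than or equal").
    EXEC sp_trace_setfilter @TraceID, 13, 0, 4, @dur;

    -- Start the trace; use status 0 to stop it and 2 to close/delete the definition.
    EXEC sp_trace_setstatus @TraceID, 1;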

A Simple Test

We recently conducted some cursory tests, primarily to confirm the blocking/non-blocking behavior described above. We simulated high activity via a lightweight T-SQL statement in a 1-million-iteration loop, and used a trace with only "Completed" events and no filters. The goal was to generate 1 million trace events as quickly as possible, and assess the impact that consuming those events with each trace type had on the total load run time. I want to emphasize that this is an isolated test scenario, and the results should not be over-generalized. We plan on conducting much more comprehensive testing in the near future. Nonetheless, the results are interesting:

  • Both server-side and rowset traces always caused the load to run slower than when there were no traces.
  • A server-side trace writing to slow local disk caused the load to run much slower than a rowset trace consumed locally.
  • A server-side trace writing to a fast local disk caused the load to run slightly faster than a rowset trace being consumed remotely over a 1000Mbps network.
  • After factoring in the additional overhead required to copy the trace file over the network and load it into Profiler, the overall time for the server-side trace from the last test was greater.

It seems that with remote tracing, you can pay me now with rowset, or pay me later with server-side. Although the results were repeatable, I'm not publishing test details at this point because these tests were not strictly controlled. The point here is that trace impact on performance is always an "it depends" scenario. Unfortunately it rarely seems to be presented as such – blanket statements such as "Profiler is dangerous" and "server-side traces are always faster" are commonplace. Sure, consumer type and location, network speed, etc., can certainly affect the performance impact of rowset traces, but so can writing a server-side trace file to the same spindles as a busy transaction log... and the impact can potentially be much more detrimental to your SQL Server.

The Source

Many thousands of SQL Servers are monitored 24/7 by our rowset-based trace server and we haven't had a single report where its impact on the target was even noticeable, so we're always puzzled by the anti-rowset sentiments. I've tried to assess how the confusion and fear started, and much of it seems to have a common source: the "Standard" Profiler trace:

std_trace_props

It's been with us forever, and I'm not sure who created it as the default trace... but at best it's of little to no value, and at worst it's dangerous to your SQL Server. Presenting this trace as the default to a SQL Server newcomer is analogous to putting a loaded weapon in front of a toddler. There are no filters applied, and it includes the Audit Login and SQL:BatchStarting events, neither of which is filterable by any performance metric (Duration, CPU, Reads, Writes), meaning you will be capturing all of them. It also includes Audit Logout, which is effectively not filterable by Reads or Writes when connection pooling is in use, since those metrics are cumulative. On a busy OLTP system this trace is bad, very bad!

std_trace_filters

So unless you've changed the default trace (easy to do via Edit Template), every time you open Profiler you've got a very high overhead trace as the starting point. As a result, this trace is referenced in books, trace performance benchmarks have been conducted and published using it, and countless users have naively run it as-is against busy servers. I'm convinced that this is a big reason that Profiler is so frequently kicked around, and why so many are now afraid to use it at all.

Bottom line: Don't use this trace! Certainly don't use it alone for benchmarking the relative performance impact of the two trace types.
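
As an aside, if you ever want to see exactly which traces – file or rowset – are currently defined on a server, and whether a rowset trace has been dropping events, the sys.traces catalog view will tell you:

    -- is_rowset = 1 for Profiler/rowset consumers; path is populated for file traces;
    -- dropped_event_count shows how many events a rowset trace has had to discard.
    SELECT id, status, path, is_rowset, start_time, event_count, dropped_event_count
    FROM sys.traces;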

A true comparative test should consider:

  • Events and columns used
  • Trace consumer location (local or remote)
  • Disk location (local or remote)
  • Disk speed and type (shared or dedicated)
  • Network speed
  • Load type (T-SQL text size, DML mix, etc.)
  • Transactions/Batches per second on the target
  • CPU and memory resources available
  • Filter types
  • Filter thresholds
  • ...

As I mentioned, we have more comprehensive testing coming which covers these factors, and we'll publish all of the details, with hopes of clearing up a lot of confusion and fear around tracing once and for all.

In the meantime, don't fear the trace. If you target the specific events and columns you need, and use conservative integer-based filters to restrict the rows returned, the benefits gained through intelligent analysis of the data should far outweigh any performance impact on your SQL Servers.

Friday, April 8, 2011

SQL Sentry v6.1: Something Old, Several Things New

Since releasing v6 a couple of months ago, we've been hard at work on some cool new features for v6.1. Our x.1 releases usually consist mostly of fixes for bugs not caught during the major release (x.0) beta, plus a few new features. This x.1 release is different in that it contains many exciting, high-impact new features, and a higher feature:fix ratio than normal. This is likely due to the lengthy v6 beta, as well as some new QA processes we've put in place. The change list is here.

We've just published the v6.1 Release Candidate to the website (new users|current users), and our team is showing it off now at SQLBits 8 in Brighton, UK. Read on for an overview...

SNMP Support

I'm covering this one first, as it's been on the feature list the longest. Believe it or not, we obtained our SNMP "enterprise number" and started on our MIBs in 2004, not long after the original release of Event Manager v1! Our enterprise number is 20707, and the latest enterprise number issued as of this post was... gulp... 37722. (I think I'll filter this feature from our "average turnaround time" calculation ;-) We didn't actually start coding SNMP support until recently though – for a few reasons it kept getting prioritized down: high complexity, relatively low demand, and the fact that users have been able to work around it and get alerts into other monitoring systems via our "execute" actions (Execute Process and Execute SQL). SNMP is not a feature that everyone uses, but those that do really depend on it.

Fortunately, earlier this year, the most knowledgeable and capable SNMP developer on the planet, Eric Eicke (b|t), joined our development team. Eric developed the first .NET-based SNMP component several years ago, and it's used by many large organizations, including Microsoft and HP. Eric was also the one who helped us get started on SNMP back in 2004. There was no fumbling around with RFCs – Eric knew exactly what needed to be done and did it, including v3 support, authentication, and encryption!

MIBs are installed as part of the new setup. Once you've loaded them into your enterprise monitoring system, activating SNMP in SQL Sentry is straightforward from the new SNMP Config tab:

snmp_settings

Once configured, simply tick the Send SNMP Trap action for any condition (along with a ruleset if desired):

snmp_action

Qure Integration

This next feature is not quite as old, but is just as exciting ;-) Performance Advisor has always had "Top SQL" collection and normalization/aggregation, and this year we became the first to introduce integrated query plan capture and analysis. So we automatically catch your heaviest queries, and give you best-in-class features for making manual query analysis as quick and easy as possible. If your server has 5 or 10 "problem" queries, and you've got some experience dealing with execution plans and indexes, no problem – you can use Performance Advisor to straighten everything out in an afternoon. But what if you're dealing with 20, 50, 100, or more problem queries? Or what if it's not the individual query that's the problem, but rather the cumulative impact?

For many systems, automated workload-level analysis may make more sense. That's where Qure comes in. Developed by DBSophic, Qure is the brainchild of MVP Ami Levin, and is the leading software available for SQL Server workload analysis. It uses a copy of your production database and makes index, schema, and/or query changes, then tests those changes with real workloads to see which work best, before making any recommendations. Of course, we all trust the query optimizer to behave predictably and make good decisions <ahem>, but there's just no substitute for good, old-fashioned brute force validation.

SQL Sentry and Qure provide complementary functionality with very little overlap, so bringing the two together made a lot of sense for both companies. To that end, DBSophic has just released Qure for SQL Sentry, a new version of Qure that works exclusively with trace data from the SQL Sentry repository database (v6 or above):

qure_se

Here are my top 3 reasons to try Qure for SQL Sentry:

  • Faster Analysis. With the new special edition of Qure, you don't have to deal with manual workload traces. You simply point Qure to your existing SQL Sentry repository database, select the SQL Server and database to optimize, and hit "Go". Qure automatically pulls in all Top SQL trace and QuickTrace events previously captured by SQL Sentry, and uses it for the analysis.
  • Easier Validation. With the Performance Advisor dashboard's history mode, the Performance Counter Date Range Comparison report, query runtime graphs, and query plan history, with SQL Sentry you can easily validate the results of a Qure optimization from several angles.
  • Cost-effective. The new combined SKU, Power Suite + Qure Quick Start Pack, gives you 5 perpetual Performance Advisor and Event Manager licenses plus 5 full Qure optimizations, all at a significant discount over purchasing the products separately. If you're already a SQL Sentry customer, you can purchase the new Qure edition separately as well.

You can run an unlimited number of sample analyses with the Qure for SQL Sentry trial software, and it will give you some good, actionable recommendations for FREE.

"Too Much Data" Begone!

Longtime Event Manager users will really appreciate this feature. Prior to this release, if you had too many event instances in a range to render them on the calendar in a meaningful fashion, we would show a block like this for the entire range:

too_much_data

When we introduced Performance Advisor and began showing performance events on the calendar, Top SQL in particular, these blocks became much more prevalent, and could really hamper visibility as well as increase time spent on filters.

Now, instead of a big "too much data" block, you'll see a small hatched rectangle on the far right of the range:

too_much_data_new

When you hover over it the entire range is highlighted, and a tooltip is displayed which shows how many other events exist in the range. The events actually displayed in the range have been prioritized by status and runtime – failures and longer running events are shown first, the rest are filtered from view. Just like before, you can double-click to zoom into a smaller range and see the other events.

Custom Calendar Highlighting

Previously, when you selected an event instance on the calendar, we would highlight other related instances using basic logic. For jobs and tasks we used the server and job name, and for Top SQL we used the "Text Data". So other Top SQL instances would only highlight if the SQL matched exactly. Depending on the scenario, this could be very limiting.

Now, you can pick and choose exactly which attribute or combination of attributes to use for highlighting related instances. Here are some examples:

  • You want to see all SQL, blocks and deadlocks for a particular Application
  • You want to see all SQL, blocks and deadlocks associated with a SQLAgent job
  • You want to see all SQLAgent jobs that are part of the same chain

All of these scenarios and more are possible. You simply right-click a calendar event and select the common attribute(s) using the new Highlight context menus:

calendar_highlighting

Tab Undocking / Multi-monitor Support

You now have the ability to drag any SQL Sentry main level tab outside of the Console. This is especially nice for multiple monitor setups. This shot shows a main tab being undocked:

tab_undocking

Fusion-io Drive Support

SSDs are definitely on the upswing in the SQL Server world. Over the past couple of years we've seen much greater acceptance and adoption in mission-critical SQL Server environments, and the company making the most waves here is Fusion-io. Previously, however, none of their drives would show up in our Disk Activity view, due to their PCI-based architecture and the fact that they're represented differently in the Windows subsystems we use to gather drive data.

We've addressed this in v6.1, so if you have any Fusion-io drives, single or duo, they'll be rendered on Disk Activity just like all other drive types. This gives you a great way to validate their performance – latency on these drives should be low enough that the flow lines are always green. If they are not, there is a problem.

Here's a shot of a high volume OLTP system in the U.K. with lots of database files, running on a Fusion-io ioDrive and ioDrive Duo card with Windows RAID-0, courtesy of Christian Bolton (b|t) of Coeo:

Fusion-DiskActivity

New Plan Explorer Features

The new version of Plan Explorer is 1.2, and the features I'll cover below apply to both the full SQL Sentry v6.1 and Plan Explorer v1.2. The codebase is the same between them, but there are additional capabilities that open up when you use the full software, as covered in my post on SQL Sentry v6 Plan Analysis Features.

Actual Plan Retrieval

Previously you could retrieve the estimated plan from a server, but to view an actual plan you had to run the query in SSMS and copy the plan into Plan Explorer. Now you'll see an Actual Plan toolbar button that allows you to retrieve the actual plan for any query:

actual_plan

Just like SSMS, the query must be executed against the target in order to get the actual plan. You will see a progress bar while the query is running, but you won't see any query results when it completes, only the actual plan info along with actual CPU and read IO metrics.
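
As an aside, if you ever want to grab an actual plan yourself from a query window – I'm describing the general T-SQL technique here, not Plan Explorer's internals – you can wrap the query with SET STATISTICS XML, and the actual plan comes back as an extra XML result set:

    -- The query below is just a placeholder; the actual plan arrives as an
    -- additional XML result set alongside the normal results.
    SET STATISTICS XML ON;

    SELECT TOP (100) *
    FROM dbo.SomeBigTable        -- hypothetical table
    ORDER BY SomeColumn;

    SET STATISTICS XML OFF;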

Note the new Command Text tab above. This is an editable view of the query text, and this is what gets executed. The Text Data tab is no longer editable – that SQL comes directly from the plan, and there were several problems with making it executable that are beyond the scope of this post (and now water under the bridge ;-) I think you'll find the new design much simpler, and more robust.

We've had some lively discussion on whether or not to go down the path of showing results, but where we ended up is that Plan Explorer is a plan analysis tool, not a query tool. More often than not, viewing results isn't required to make good plan optimization decisions. We of course already show the number of estimated and actual rows, and in most cases this is sufficient.

One big advantage of not returning results is that you'll often get the actual plan back much faster than you would otherwise. For example, a query that returns 200,000 wide rows and takes 30 seconds to load in SSMS may take only 2-3 seconds in Plan Explorer.

Expressions Tab

Prior to this release, we showed expressions only in operator tooltips. The higher you go up the plan tree, the more levels of nesting you can have, and tracing an expression all the way back to its source using tooltips could be daunting.

This is why we've added a new dedicated tab for expressions. The tab only appears if expressions exist in the plan, and it shows standard and expanded views of all expressions, along with the entire references tree for each:

pe_expressions_tab

I should warn you, there is a lot going on behind the scenes with expressions... the optimizer generates many expressions that you'd never normally see, and it can be overwhelming at first. More on this in a future post.

Join Diagram Tab

This tab only appears if the query has joins. It presents a view similar to the Query Designer diagram in SSMS, although only joined columns are shown for each table. Join information exists on the Plan Diagram and Query Columns tabs, but this is a different look at it which can be especially valuable if your query has views or nested views. The query optimizer flattens all views down to their base tables as part of the plan generation process, but in SSMS you see joins for the views, not the base tables. This can make it difficult to decipher which tables and columns are actually involved in a join. Fortunately, the plan contains this data, so we're able to reconstruct and show the base table joins from the plan XML. This can be very helpful for making good indexing decisions to support those joins.

pe_joins_tab

This is our first cut at joins, and getting it to this point has not been trivial. What looks like a simple join on the plan diagram can be oh so much more behind the scenes. We're still finding scenarios that we haven't seen before, but we think it's far enough along that it can provide value. If you run into anything weird, please do let us know about it – email the plan to support at sqlsentry.net.

It's your feedback that enables us to continue to improve and evolve the software, so please keep it coming!

Friday, March 4, 2011

Reducing Plan Explorer Startup Time with NGen

Recently on Twitter I ran across a comment by Oscar Zamora (b|t) about some "slowness" with Plan Explorer. We've gone to great lengths to ensure Plan Explorer is snappy and responsive, even when working with giant plans, so needless to say I was surprised to hear this (and perhaps a tiny bit incredulous ;) I contacted Oscar for more specifics about where exactly he was seeing the lag, and he was happy to help. As it turns out, it wasn't UI responsiveness he was referring to, it was the application startup time. Here are the startup times he was seeing on two different machines:

  • Desktop
    • First start: 22 seconds
    • Warm start: 5 seconds
  • Laptop
    • First start: 40 seconds
    • Warm start: 10 seconds

Yikes! After a quick version check, we determined he was still on beta version .96. After upgrading to v1.1, his load times dropped a bit:

  • Desktop
    • First start: 16 seconds
    • Warm start: 5 seconds
  • Laptop
    • First start: 25 seconds
    • Warm start: 9 seconds

Things were looking better, but still not ideal, especially on his laptop. On my desktop and laptop, both the initial and subsequent loads always run around 5 seconds. Next I took a look at his laptop specs and Windows Experience Index details. His graphics and disk performance were about 1 point lower than my laptop's, but processor and RAM were about 2 points lower. This seemed to point directly to a jitting performance issue. JIT (just-in-time compilation) is standard behavior for .NET apps, so a startup delay is not unusual. Exactly how much of a delay depends on several factors, including the timing of method calls, the amount of code being jitted, as well as CPU, memory, and, to some extent, disk speed.

It's a safe bet that the slower CPU and memory on Oscar's laptop are causing jitting to take longer than normal, but what can we do about it? Since the Plan Explorer code itself is already well-optimized, there really was only one other option: using the Native Image Generator, or NGen.

NGen is a tool that effectively pre-compiles an entire .NET app before runtime, and caches an image of it on disk. We've known about NGen since its first release, but since prior versions suffered from various shortcomings, we had elected not to use it. However, in .NET 3.5 SP1 and in .NET 4.0 many improvements were made, and since we now require .NET 4.0 for both Plan Explorer and SQL Sentry v6, it was time for another look.

Using NGen is straightforward. You can easily create a pre-compiled image of Plan Explorer via these steps:

  1. Open a command prompt as an Administrator
  2. Change to the .NET 4.0 directory:
    x86 systems:  cd %windir%\Microsoft.NET\Framework\v4.0.30319\
    x64 systems:  cd %windir%\Microsoft.NET\Framework64\v4.0.30319\
  3. Run the following command (assuming you've used the default install path – if not, change it accordingly):
    ngen install "C:\Program Files\SQL Sentry\SQL Sentry Plan Explorer\SQL Sentry Plan Explorer.exe"

You should see a bunch of compilation-related messages fly by. If you don't, then something isn't right, perhaps an invalid path. When the messages complete, go ahead and open Plan Explorer.

Or, if you'd prefer to forego the manual process above, you can download the new Plan Explorer release with NGen compilation built-in. The compilation is deferred; it doesn't actually happen during setup. Rather, the NGen service performs the compilation behind the scenes after installation, the next time your computer is "idle". For this reason, bear in mind it may not complete before you launch Plan Explorer for the first time. You can tell when it's done by looking at the "Microsoft .NET Framework NGEN v4.0.30319_x64" service status. It only starts when there are items queued for compilation, so if it's stopped that means it has finished its work.
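
If you'd rather not wait for the idle-time compilation, you should also be able to force the queued work to run right away from the same elevated .NET 4.0 command prompt:

    ngen executeQueuedItems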

Either way you go, you should be pleased with the new load time. On every machine we've tested, startup is almost instant. If you're opening Plan Explorer regularly, this time savings can add up, leaving you more time for plan-hacking!

Wednesday, February 16, 2011

SQL Sentry v6 Plan Analysis Features

Almost a year ago, we embarked on the journey to SQL Sentry v6. This was not long after we had decided that our release cycles were too long, and I had just boldly proclaimed that there would be no more year-long cycles – we were shifting to smaller, incremental releases! They're easier to test, they get new features in the hands of our users faster, they keep up momentum, morale, etc.

All of this is true, but looking back it probably wasn't the right release to make that decision. ;-) At the time, our vision of plan analysis was hooking into SSMS – like we do in other parts of the app – in order to render the graphical plan diagram for statements captured via our "Top SQL" subsystem (a lightweight trace that runs continuously and catches "big hitter" queries). I'm not sure of the exact point at which we ditched that idea in favor of rolling our own plan analysis capability, but I remember it happened very quickly. It was probably right after our lead developer Brooke Philpott (blog|twitter), never one to shy away from a development challenge, threw together a quick prototype showing that we could indeed parse the plan XML and render a graphical tree, among other things. Next thing you know, what would have taken a few weeks took over 10 months!

That said, I'm glad we made the decision to build our own plan analysis system, and would do it again. We've received tremendous feedback since releasing the free Plan Explorer, and we're thrilled that it's making a real difference for SQL Server users.

FREE Plan Explorer vs. Performance Advisor v6

This is something we've been asked about a lot lately. Notice I didn't say "... vs. FULL Plan Explorer". That's because there is no "full" version of Plan Explorer per se; it's all the same code. Rather there is an "integrated" version that resides within the SQL Sentry Console, and it has additional capabilities that open up when you have a SQL Sentry Server service and database installed. There are also other plan analysis features included in v6 that exist completely outside of Plan Explorer.

You may have noticed that there are some grayed out columns in the free Plan Explorer statements tree, and when you hover over them you get this message:

full_version_tooltip

No, this isn't some cheap attempt by marketing to lure you in and then force you to buy the full software. Rather, this is the development team telling you that what you are trying to do isn't technically possible... unless you have a SQL Sentry service and database. Which, if you did, would mean that Plan Explorer was no longer a tiny download with simple installation and minimal footprint. Sure, we want you to see what you're missing in the full software, but it was easier and less invasive to just gray columns and/or throw up a message box than add new code paths to selectively show/hide features.

When we unbolted Plan Explorer from the full software, anywhere there was something we just couldn't do, we added this note. Another place you'll see this is if you try to create or edit an index, since that would require SSMS integration, as well as the ability to determine the location of the server and database in question (since they aren't stored in a plan), and then authenticate. This would mean additional development and testing, new external dependencies, and opportunities for something to break.

As an example of what we deal with when integrating with SSMS, in SQL Server 2008 R2 some of our code – which worked perfectly in 2008 – suddenly ceased to function, but only if the SSMS Object Explorer was unpinned!? It took us a while to figure that one out. So index management is something technically we could pull off, and we may add to the free Plan Explorer at some point, but it's honestly not at the top of the list.

Active Plan Capture

When you install SQL Sentry v6, you gain what we call active plan capture, and all of the new plan analysis features depend on this. The Server service continuously monitors running queries and completed trace events and, when appropriate, queries the plan cache for plan XML and/or other plan details, then automatically assembles all of the pieces. This enables us to do several things that aren't possible with the free Plan Explorer, or any other available software for that matter:

  • Capture a query's plan at the time of query execution and associate it with the query.
  • Assemble the entire call stack for a query, including any nested stored procedure and/or nested dynamic SQL calls.
  • Show statement level metrics from the execution plan side-by-side with trace and DMV metrics.
  • Keep a history of all plan versions used by a query over time, so you can see where plan changes impact performance.

I'll go into each of these in more detail below. If you want to know more about the graphical plan diagram and other shared Plan Explorer features first, check out this post or Aaron Bertrand's (blog|twitter) Plan Explorer posts.

The Query Plan at Query Execution

With the free Plan Explorer version, it's up to the user to identify problem queries, somehow capture the estimated or actual plans, and then manually load them in Plan Explorer. The full version of Performance Advisor does all of this for you. Heavy queries (batches or RPCs) are continuously picked up by the Top SQL trace as normal, based upon the filter settings you provide ("Duration > 5 seconds" by default), and plan information is immediately captured and associated with the queries. The only thing you need to do to access the plan is click the View button in the new Plan column:

topsql_view_plan

As you can see, this column exists for both completed and running queries... so yes, you can view the plan while a query is running!

It's important to note that this is the estimated plan at time of query execution, not the actual plan including actual rows, executions and data for each plan operation. It is not currently possible to capture the actual plan continuously in a low overhead fashion. (No, not even with Extended Events ;-) We pretty much knew this going in, but we spent a fair amount of time trying some different approaches to pull it off anyway, to no avail. There is some hope that this will be possible in a future version of SQL Server, however. That said, the estimated plan at query execution is certainly the next best thing, and even has some advantages which I'll delve into in a future post.
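
If you're curious about the kind of data involved, here's a simplified DMV sketch – not how Performance Advisor actually collects it – that pulls the cached (estimated) plan and batch text for requests currently executing:

    -- Cached (estimated) plans for currently running requests, excluding this session.
    SELECT r.session_id,
           r.start_time,
           t.text       AS batch_text,
           p.query_plan
    FROM sys.dm_exec_requests AS r
    CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
    CROSS APPLY sys.dm_exec_query_plan(r.plan_handle) AS p
    WHERE r.session_id <> @@SPID;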

I've manually added a couple of columns to the Completed Queries grid above: Has Plan and Has Statements. These are hidden by default, but are handy for quickly identifying batches and RPCs for which we were able to capture a plan and/or statements, which will impact what you see when you click "View." Depending on the type of query, timing, and other issues, it's not always possible to capture the plan. It's also possible to capture a batch or RPC with no statements, if none exceeded the trace filter thresholds (this has always been the case).

A Quick Aside: Other Top SQL Enhancements

Long-time Performance Advisor users will probably notice some other differences in the above Top SQL shot, so I'll touch on them briefly:

  • The Running Queries pane – we now show you heavy queries (again based upon your filter settings) while they are running, not just after they complete.
  • The Filter pane is hidden by default.
  • Statement events (SQL:StmtCompleted or SP:StmtCompleted events) are no longer shown here, only SQL:BatchCompleted and RPC:Completed events. This is because they are now integrated into the Plan Explorer view. More on this later.
  • Text Data (TSQL, MDX and XMLA) is now colorized.

Some additional changes to Top SQL that aren't apparent above:

  • Auto-growth and Auto-shrink events are now captured. Both can be hidden performance killers, bringing activity to a screeching halt while file space is adjusted.
  • New columns have been added:
    • Error - shows the general reason a query terminated prematurely. Rows with errors are also highlighted with a light red background.
    • Information - shows either "rows affected" or "auto-growth size" depending on the event.
  • Blocked spids are shown with a light orange background in the Running Queries pane.

The Query Call Stack

A Profiler trace presents a stream of trace events, and there is little to no sense of how the events are related to one another. Throw in nested procedure calls and dynamic SQL, and all bets are off.

This is why one of the most significant features gained with the full software is the query call stack assembled in real time. This lets you see exactly how captured statements are related and where they fit into the query plan. Most importantly, it shows you which statements did the bulk of the work, and which are candidates for optimization.

Below is the Statements Tree pane for a stored procedure that calls some dynamic SQL via sp_executesql, which calls another stored procedure, which proceeds to call more dynamic SQL:

call_stack
Plan Explorer Statements Tree – Fully Assembled Stack (Performance Advisor v6)

I've highlighted the dynamic SQL calls in yellow, the nested procedure calls in light blue, and the final dynamic SQL generated in pink. Note that the parent procedure itself in this case is effectively glue; it's the dynamic SQL that is doing the bulk of the work. If you were to look at the estimated plan in Plan Explorer, you'd see this:

call_stack_est
Plan Explorer Statements Tree - Estimated Plan (free version)

If you click to enlarge you'll see the initial dynamic SQL call in yellow at the very bottom. That's the same line that makes the three initial dynamic SQL calls in the full stack shot above! Problem is, the compiled estimated plan has no visibility into what that dynamic SQL will actually look like, let alone how it will perform.

A big difference between the two views above is that the call stack shot is in Show Captured Statements Only mode, which is the default whenever we have at least one statement for a batch or RPC. In this mode you see only statements that exceeded the trace filters along with any relevant control structures, in this case an IF block and a WHILE loop. This mode can be toggled off via context menu item, but for complex procedures like this one it's really invaluable, since it filters the "noise" and directs your attention to the heaviest parts of the query.

Correlated Metrics

You probably noticed another difference between the two views – one has Duration and CPU data and the other has those annoying gray columns. Here's another view of a full stack with many captured statements:

combined_metrics

When we have the actuals we're able to fill in those other columns, as well as compare the estimated and actual rows affected by each statement. When there's a significant discrepancy the cells are highlighted in orange, and if you hover over them you'll see a tooltip with more details.

Execution Plan History

On the new Query Plans tab you'll see all plans captured during the specified date range:

queryplans-objectview

There are two modes, toggled using the "Show Totals" (sigma) toolbar button. One shows plans grouped by stored procedure (remember, starting in SQL Server 2005, with statement-level compilation there can be multiple plans per procedure), the other is a flat list of all statement plans. Both show average metrics and operation counts. For example, if you want to see the plan with the most "scan" operations over the last 2 days, simply set your range, sort by one of the "Index Scan" columns, then click the "View" button to open the plan:

queryplans-sort

The rows highlighted in red represent recompiles that occurred within the current date range. The other plans may have been used during the current range, but they must have been compiled earlier.
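
Incidentally, if you want to approximate the "most scan operations" sort above with plain T-SQL against the plan cache – a rough substitute at best, since it only sees whatever happens to still be cached and keeps no history – something along these lines works:

    -- Rough sketch: count scan operators in each cached plan and sort descending.
    -- This shreds plan XML, so it can be expensive on a large plan cache.
    WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
    SELECT TOP (10)
           cp.usecounts,
           qp.query_plan,
           qp.query_plan.value('count(//RelOp[@PhysicalOp = "Index Scan"
               or @PhysicalOp = "Clustered Index Scan"
               or @PhysicalOp = "Table Scan"])', 'int') AS scan_ops
    FROM sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
    WHERE qp.query_plan IS NOT NULL
    ORDER BY scan_ops DESC;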

Here's another really cool use case: Using the new "Jump To->Query Plans" context menu item from the dashboard to find problematic plans. For example, if you see a spike in "Bookmark lookups", you can select it then jump to the plans tab to see only plans used during that range, then sort by "Lookup Operations" to find the plan(s) that may be responsible:

queryplans-jumpto

I should mention that "Bookmark lookups", "Forwarded records", and "Backup/Restore Throughput" are new to the dashboard in v6.

Performance Hit?

All of this looks great, but does the additional load on your servers and network outweigh the benefits of collecting and processing this information? Certainly not. As always, we've gone to extreme lengths to ensure that any hit caused by the new plan analysis features is minimal. Intelligent data collection, compression, and other techniques are in play to ensure the hit will be barely noticeable... not nearly enough to outweigh the performance gains you'll reap from using the new features.

In Closing...

Going in, our goal with v6 was to present the most complete and accurate picture of a query possible – what it did and why. Other than a few aspects that technically aren't achievable today, like continuously capturing the actual plans, I think we succeeded. I'd liken it to a 500-piece puzzle, fully assembled but with 50 pieces missing... which IMO is preferable to 450 pieces in the box, which is what we had before.

I'm really excited about this release and hope you are too! Take it for a spin and let us know what you think. Existing customers get it here.

Tuesday, February 8, 2011

SQL Sentry Sponsors Master Immersion, and 5 Reasons Why You Need to Go!

Unless you've been under a rock lately (or just really, really busy ;-) you're probably aware of the MCM program changes, as well as SQLskills' new training format, Master Immersion, which takes their former Immersion training a step further, making it even more MCM-focused. Three of us attended the Boston Immersion event last year (me, Brooke Philpott (blog | twitter), and Aaron Bertrand (blog | twitter)), and so we know firsthand the high-quality training it provides. I blogged about the experience here. Now what was already great training has become even better, and if you want to apply it to your quest for MCM certification, you can.

While demonstrating our software, particularly Performance Advisor, over the past several years I've made an interesting observation. It's that a person’s depth of knowledge about SQL Server tends to be proportional to their appreciation for the things our software does. This isn’t a knock on junior level SQL Server professionals because everyone starts somewhere. It's merely a statement that if you don't start with an understanding and appreciation of the underlying issues impacting SQL Server performance, then you'll inherently have less appreciation for why we're showing certain information, and the fashion in which it's being shown.

Hey, everyone wants to be understood and appreciated, right? ;-) So this realization is one thing that has led us to increase our support for various "newer" educational events over the past couple of years. From helping to organize a SQLSaturday here in Charlotte, sponsoring other SQLSaturdays, being one of the first U.S.-based companies to support (and attend) SQLBits, to sending 4 people on the inaugural SQLCruise. Now I'm not going to ask you to believe there is no promotional aspect to what we do, of course there is. We have to sell our software to remain viable. But we truly believe in our software – we know it has substantial real world merit, and we know it is quality through and through. We work too hard for it not to be. Just like SQLskills knows their training is the best, because they put everything they are into making it so.

When we heard about the new Master Immersion Events, we couldn't help but get excited. So much so that we approached Paul Randal (blog | twitter) and Kimberly Tripp (blog | twitter) and asked if there was any way we could be involved. One thing led to another, and I'm proud to announce that we are the exclusive sponsor of the first two Master Immersion events! We are providing a full breakfast and lunch for attendees all week, and some really cool swag! We're also sending three more SQL Sentry employees through the training.

As I said in my previous post, it doesn't matter where you lie on the SQL Server skill or experience spectrum, if working with SQL Server represents a significant portion of your job, you will benefit greatly from this training. If this same material was in the hands of lesser instructors this simply would not be the case, but Kimberly and Paul do an incredible job of presenting deep material in an easily digestible, stimulating fashion.

If you can break away for a week in February or March, do it, you won't regret it. In case you need help justifying it with your boss, spouse, kids, whoever, here are 5 reasons you need to go:

 

5. It is the best SQL Server training, from the best instructors in the business. Period.

4. It costs no more than a week of run-of-the-mill SQL Server training at your local IT training center... except that Kimberly Tripp, Paul Randal, and Brent Ozar (blog | twitter) will be teaching the class.

3. It will make you better at your job, which may lead to increased job security, pay, and praise from management and peers.

2. It doesn't last 3 weeks or cost $18,500, and there is no killer pass/fail test at the end of the week.

1. You won't have to pay for breakfast or lunch since SQL Sentry is picking up the tab!

 

Believe it or not there are still a few seats left for the Feb 21st class on Internals and Performance, as well as the March 14 class on Performance Tuning. So if you've been thinking about it, don't wait, get going and register now!

Monday, January 3, 2011

Ensuring Maximum CPU Performance via Group Policy

There has been quite a bit of talk recently about a performance issue I see regularly on Windows Server 2008 and 2008 R2 servers, and that's the use of the default Power Plan setting of "Balanced". Glenn Berry posted about the impact of this setting on CPU performance; Brent Ozar blogged about a real world case where query CPU time actually doubled on new hardware; Paul Randal is conducting a survey to ascertain the extent of the issue (when Paul does a survey like this it's usually a sign that all may not be well in the SQL Server world); and Steve Jones recently covered it in his editorial.

There isn't much more I can add to the discussion of the impact of this setting on CPU and SQL Server performance, however, I will cover how easy it can be to make it a non-issue in your environment. I first ran across this issue and solution last year when deploying new server hardware. Our environment is based around the Intel Modular Server system and Windows Server 2008 R2, and we needed to be able to run hundreds of Hyper-V virtual machines and SQL Servers for QA testing, scalability testing, and production. The hosts themselves, and the majority of the VMs, are Windows Server 2008 (or R2), and so with the default "Balanced" plan we wouldn't be able to take full advantage of the significant CPU horsepower in which we'd invested (six "big blades", each with dual quad-core, HT-capable Intel X5550s).

The solution is simple: Group Policy. Like most other OS settings, the Power Plan can be set to whatever you want via policy, and any existing or new servers that use the policy will inherit this setting. There are a few ways to check and see whether a power plan policy has already been set; which you use will depend on the size of your environment and your level of access to group policy. The most basic approach is simply to inspect the power plan setting for each of your SQL Server machines and see whether the plan is set and/or editable:

power_plan_setting

If you see the "Some settings are managed by your sys admin" message, and the plan is read-only and set to High performance, then policy is already in place and you are good to go.
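
If you'd rather not click through the UI on every box, powercfg will report the active plan from a command prompt, which is easy to script across a list of servers:

    powercfg -getactivescheme

If that comes back as anything other than High performance, you know that server still needs attention.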

Alternatively, if you have access to group policy, open up the Group Policy Management Console in Server Manager, navigate down to Group Policy Objects, then select the Settings tab for each policy. Expand down to Administrative Templates > System/Power Management and see if "High Performance" has been set for Active Power Plan as shown below:

domain_policy_settings

If you have multiple policies and it's not clear which your SQL Server machines may be using, you can use either the Group Policy Results or Group Policy Modeling wizards to view the Resultant Set of Policy for one or more servers. Once the results have been generated, check the Settings tab as above.

If it turns out that no policy has been set, you will need to either edit an existing policy or create a new policy that covers your SQL Server machines and apply the settings shown below:

edit_power_plan_policy

If you need more details on this process, see Edit Administrative Template Policy Settings. If you don't have access to group policy at all, I'm sure you will tell whoever needs to know about this and get it changed... and hopefully they aren't too taken with the modest power savings of "Balanced" to accommodate you. If they are, tell them to save the money and buy less powerful CPUs next time, since that's the net effect ;-)

For the life of me I don't understand why a server OS would ship with this default (well, I do know why, but I certainly don't agree). I don't expect it will be changing anytime soon though, so hopefully a little group policy will help you get the most out of your CPU resources in the meantime.