Wednesday, October 20, 2010

SQL Sentry Plan Explorer Beta

Recently at SQLBits York I demonstrated the new SQL Sentry Plan Explorer (formerly known as the "Plan Viewer") and officially announced its upcoming release.  We are now pleased to make the beta version of the tool available for download!

Background

We had originally envisioned query analysis features only as part of the fully licensed version of SQL Sentry Performance Advisor.  The goal was to build upon what we all have today with the graphical plan view in SSMS, and in a big way.  Like many of you, I've been using this feature for years, going back to Enterprise Manager, and so had lots of ideas going in about what we could do to make plan analysis more efficient.

After a few significant reworks over many months, things really started coming together, and it was doing most everything I'd wanted and much more.  Working with our lead developer, Brooke Philpott (blog | twitter), one idea led to another, which led to another… it was a gradual, iterative process.  I'm thankful that we embrace a style of agile development here, because this is one project that I believe would have been impossible to design to completion in advance.  It was only through using it on a daily basis to solve real-world query issues that the tool was able to evolve into what it is today.

Anyway, somewhere along the way a couple of things became clear to us:

  • These features are so cool and useful (IOHO) that we wanted to share the tool with anyone that wants it, so we are making it available to the SQL Server community for FREE!
  • Much of the code was client-side, so it would be fairly easy to break out into a standalone tool without any dependencies on our collection service or database (one of each is required for every full SQL Sentry installation).

Plan Explorer Features

I'll go ahead and run through some of the key features of the Plan Explorer, and in the process I'll compare and contrast with SSMS.  I am not trying to bash SSMS (promise! ;-); I'm simply trying to illustrate the reasoning behind the design of the tool for longtime users of SSMS/Enterprise Manager.

When you launch the tool for the first time, you'll notice that the UI is broken into 3 major sections:

  • Statements Tree
  • General Query & Plan Info
  • Plan Details

They're all empty until you load a plan.  There are several ways to load a plan:

  • In SSMS, right-click a graphical plan and select “Show Execution Plan XML”, then copy and paste the plan XML into the Plan Explorer.  It doesn’t matter which pane is active; the clipboard handler is global and will auto-detect the plan XML.  (One way to pull plan XML straight from the plan cache is sketched just after this list.)
  • Save an execution plan from SSMS to a .sqlplan file, then open the file using the Open toolbar button, File->Open menu, or Ctrl + O.  Raw plan XML files and .QueryAnalysis files (our own proprietary format) are also supported.
  • Right-click an existing .sqlplan file in Windows Explorer and select “Open with -> SQL Sentry Plan Explorer”.
  • Drag-and-drop a plan file onto the application.
  • Retrieve the estimated plan from TSQL. (See General Query & Plan Info section below).
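If you'd rather pull plan XML straight from the plan cache to paste in or save off, something along these lines works against SQL Server 2005 and later (just a minimal sketch; the TOP/ORDER BY filter is only one way to narrow the list):

    -- Grab the plan XML for the top CPU consumers in the plan cache.
    -- The query_plan column can be copied into Plan Explorer or saved as a .sqlplan file.
    SELECT TOP (10)
        st.text            AS query_text,
        qp.query_plan      AS plan_xml,
        qs.execution_count,
        qs.total_worker_time
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
    ORDER BY qs.total_worker_time DESC;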

Statements Tree Section

Let's start with how SSMS works.  A batch or stored procedure can of course consist of one or more statements.  With stored procedure estimated plans in SSMS, all individual statement plans are joined together at the top level:

big_plan_estimated

When it's an actual plan for a batch or stored proc, or an estimated plan for a batch, each statement is broken out into a separate section, with embedded cost and other information in the section header:

big_plan_actual

With only one or a few statements, SSMS does OK with this approach.  However, if it's a large plan with tens or hundreds of statements and conditional logic, looping, etc., it just doesn't scale.  Some plans are so big and complex that it is practically impossible to find the heaviest statements and operations, and some can't even be loaded by SSMS!

We wanted to change all that, and so the Statements Tree represents one of the biggest design differences between the Plan Explorer and SSMS.  Its function is critical when navigating multi-statement plans.  What it shows is a logical breakdown of the entire plan tree, including all control structures, nested procedure calls, and estimated and/or actual metrics for each statement, as well as operation counts by statement:

statements_tree

If you click on any statement row in the grid, you'll be shown the associated plan for that statement only:

statements_tree_stmt_sel

Likewise, if you click on any control structure (IF or WHILE) or EXEC <procname> higher up the tree, you'll see a filtered view of the plan starting from that point:

statements_tree_while_sel

As you've probably guessed by now, you can of course sort the Statements Tree by clicking on any column header, and voila!, the highest cost trees and statements will immediately bubble up to the top:

statements_tree_sorted

In the shot above I've sorted by Estimated Cost (all numeric columns sort DESC on first click).  I can see that one of the subtrees has a nested DELETE that accounts for 27.2% of the total procedure plan cost, and I've selected that statement row to view its plan.  If you enlarge the image you'll see that a clustered index seek (highlighted in red) accounts for 39.6% of the statement plan cost.  So with a couple of clicks we've quickly ascertained the heaviest cost statement and operation in this rather large plan.

Whether there are 10 or 1,000 statements in the master plan doesn't really matter, it's trivial to find the most expensive areas so you can focus your attention on those.  Big plans that simply weren't usable before can now be managed with ease.

General Query & Plan Info Section

The primary tab you will make use of in this section is "Text Data", which holds the color-coded TSQL batch or procedure definition:

query_info

If you loaded an existing execution plan, the TSQL will be auto-generated.  You can also type TSQL, copy/paste it in, or open a .SQL file, then click the "Retrieve Estimated Plan" button on the toolbar to get the estimated plan.

NOTE: Executing TSQL to retrieve the actual plan is not yet supported, although opening an existing actual plan is fully supported.
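For reference, here is roughly how you can capture either plan type in SSMS yourself (the SELECT is just a placeholder).  SET SHOWPLAN_XML compiles the batch without executing it and returns the estimated plan, while SET STATISTICS XML executes the batch and returns the actual plan, runtime row counts included, which you can then save and open in the Plan Explorer:

    -- Estimated plan: compiled only, nothing is executed.
    SET SHOWPLAN_XML ON;
    GO
    SELECT name FROM sys.objects;   -- placeholder query
    GO
    SET SHOWPLAN_XML OFF;
    GO

    -- Actual plan: the batch runs, and the plan XML comes back as an extra result set.
    SET STATISTICS XML ON;
    GO
    SELECT name FROM sys.objects;   -- placeholder query
    GO
    SET STATISTICS XML OFF;
    GO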

You'll notice that, like plans, the TSQL statements are synchronized with the Statements Tree, meaning that if you select a row on the tree view it will auto-select the associated TSQL statement and its execution plan, and vice versa.  See the shots above for examples.

Plan Details Section

This section contains 4 tabs, each of which serves a different purpose:

  • Plan Diagram
  • Plan Tree
  • Top Operations
  • Query Columns

If you select an operator node or row on one of the tabs, it will be auto-selected on all others, so you can switch back and forth between the different views of the plan without losing your place.

Plan Diagram

This is the view that will certainly be most familiar to everyone… although you should immediately notice some differences.  First, there is color!  That's right, the cost information is color-scaled so you can quickly see which operations are heaviest, and Lookups and Scans are also highlighted (if Rows > 100):

plan_diagram_color_scaling

Pretty cool, eh?  But wait, there's more! ;-)

If you right-click the diagram, context menus provide access to a variety of other functions. (bolded because this can be easy to miss!)

Using the context menus, in addition to scaling by total cost (CPU + I/O), you can also use CPU or I/O separately:

plan_diagram_costs_by_cpu

This can be very helpful if your hardware is more I/O constrained than CPU constrained, or vice versa.

To see which subtrees are most expensive, select the "Cumulative Costs" context item:

plan_diagram_cumulative_costs

I won't go into detail on all of the other plan diagram enhancements in this already lengthy post, but here is a list of some of them:

  • Optimized plan node labels prevent truncation of object names in most cases, with a menu option to disable truncation completely so full object names are always visible.
  • Optimized tooltips prevent wrapping of object names, column names and predicates.
  • Costs are always displayed above nodes for maximum readability.
  • Costs are shown to the first decimal place.
  • Connector line width can be scaled by either Rows or Data Size.
  • Rows and Data Size labels are displayed above connector lines.
  • Connector lines for bookmark (key|rid) lookups, table spools, and function calls show the true estimated rows or data size, and line width is scaled accordingly. (SSMS always shows the "per execution" rows, CPU and I/O costs for these operators in estimated plans, which can appear to dramatically understate the true costs. For example, if estimated rows=10 and estimated executions=1000 for a key lookup, the line is scaled to 10 rows, not 10,000! What makes this more confusing is that for actual plans SSMS does show the true rows and costs. So you really have to pay attention to the type of plan you're looking at in SSMS – if you're conditioned to always look for the operators with the fattest lines, you can completely miss the highest cost operations.)
  • The mouse scroll wheel can be used both for scrolling up/down and for zooming in/out (by holding Ctrl)!

Last but certainly not least, you may notice that some plans seem, well, "different".  This is likely due to the optimized layout algorithm which is able to render many plans using much less real estate than SSMS, meaning you are seeing more of the plan without having to scroll and zoom.  It's easiest to demonstrate this with a couple of pictures.  The shots below are of the same plan:

SSMS version:

plan_tall_ssms

Plan Explorer version:

plan_tall_pe

Plan Tree

This view is just what it sounds like: a tree representation of the plan which shows all operations and associated metrics.  On the surface it looks similar to a showplan_all; however, it differs in several respects (see the sketch after this list):

  • You can expand and collapse sections, and sort within levels.
  • Additional metrics are shown, such as "Estimated Data Size", as well as "Actual Rows" and "Actual Data Size" for actual plans, side-by-side with the estimates.  There are many other plan metrics available by right-clicking any column header and selecting "Column Chooser".
  • Significant differences between estimates and actuals are highlighted.
  • Possibly problematic operations like scans and bookmark lookups are highlighted.
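For comparison, here is the showplan_all output mentioned above; it returns the estimated plan as a single tabular result set, one row per operator, with no expand/collapse, sorting, or estimate-vs-actual columns (the SELECT is just a placeholder):

    SET SHOWPLAN_ALL ON;
    GO
    SELECT name FROM sys.objects;   -- placeholder query; compiled, not executed
    GO
    SET SHOWPLAN_ALL OFF;
    GO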

Top Operations

Another one that is as it sounds, a simple list of all plan operations, sorted DESC by total estimated cost by default.  You can of course sort by any of the other columns, as well as group by operation type, object, etc.  The same columns and highlighting as on the Plan Tree tab apply here.

Query Columns

This is one of the most useful views for many plans.  What it shows is a list of all columns accessed by the query, for whatever reason (sarg, join, output list, sort), along with the table, operation, and index used.  How is it useful?

Take the case of a bookmark lookup, where you have some columns that aren't covered by the index chosen by the optimizer.  The traditional ways of figuring out which columns aren't covered are hovering over multiple nodes on the graphical plan to determine the index used and columns accessed, clicking through the Properties pane for those nodes, or looking through a showplan_all, all of which can be tedious and error-prone.

Instead, when you look at the Query Columns list, you'll see the columns for related operations grouped together (indicated by a thick separator bar), with the index used and the columns in the lookup highlighted in orange:

columns_lookup

From here it's often a simple matter to make the appropriate index modifications to eliminate the lookup.  In the case above, I can see that index [IDX_ClientID] isn't covering columns [LEASE_ID] and [DOCUMENT_SIZE], so I may want to add them as included columns to the existing index, or create a new covering index.
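For example, a change along these lines would make the existing index covering.  Only the index name and the two included columns come from the shot above, so the table and key column names here are hypothetical:

    -- Rebuild the existing index in place, adding the looked-up columns as
    -- included columns so the key lookup is no longer needed.
    CREATE NONCLUSTERED INDEX IDX_ClientID
        ON dbo.LeaseDocument (ClientID)            -- hypothetical table/key column
        INCLUDE (LEASE_ID, DOCUMENT_SIZE)          -- columns that caused the lookup
        WITH (DROP_EXISTING = ON);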

NOTE: In the full version of SQL Sentry v6, you can directly open the index properties from this and all other plan tabs using context menus, or by double-clicking nodes on the graphical plan.  This is something you used to be able to do in Enterprise Manager but was dropped in SSMS, much to my dismay.  We wanted to make this available in the Plan Explorer as well since it can be a huge timesaver, but currently the tool has no integration with SSMS in order to keep the install simple.  Rest assured, if/when this changes we will add this capability to the tool.

Beta Downloads

You MUST HAVE .NET 4.0 PRE-INSTALLED to install and use the Plan Explorer. Download it here.

SQL Sentry Plan Explorer x64

SQL Sentry Plan Explorer x86

SQL Sentry Plan Explorer Documentation

Plan Explorer Change List

General discussion and support for the free tool will be provided through the SQL Sentry Plan Explorer forum.

This beta build has proven very solid, and handles all plans we've seen thus far, regardless of size or complexity.  However, as it is a beta, you may find a quirk or two.  If you run into any problems, or have any ideas, questions, etc., email me at greg at sqlsentry.net.  For questions, please be sure to check the forum and the documentation first.

Special thanks to the first beta testers who really hammered the tool and helped us get it where it is today!  In particular, Jonathan Kehayias (blog | twitter), Mladen Prajdic (blog | twitter), Christian Bolton (blog | twitter), Brent Ozar (blog | twitter), Brian Kelley (blog | twitter), and Aaron Bertrand (blog | twitter) provided some great real-world plans and feedback.

There are many more features and use cases I'll be covering in the coming weeks, so stay tuned...

I hope you enjoy using it as much as we've enjoyed building it!

Monday, October 18, 2010

Aaron Bertrand Joins the SQL Sentry Team

I'm pleased to announce that highly regarded SQL Server MVP, Aaron Bertrand (blog | twitter), will be joining the SQL Sentry team!  His first day on the job will be next Monday, October 25th.  We are all extremely excited to have him on board, for a variety of reasons.  On the surface, it may appear to be an attempt to "keep up with the Joneses" in the sense of us bringing on a high-profile community advocate, but this is actually much more than that.  Aside from Aaron's vast knowledge and experience with SQL Server, having known and worked with him for several years now I can say he is a person who just gets things done.  Sure, he has phenomenal communication skills and will undoubtedly continue blogging and speaking about SQL Server, but for us the move is just as operational in nature.

I mean, who better for a SQL Server ISV like us to have in house than the person who has far and away provided the most feedback to Microsoft on SQL Server itself? (see Connect stats lower right)

Aaron was a long-time user of SQL Sentry software in his previous job, so he already knows our software inside and out and has provided lots of great feedback over the years.  The SQL Server environment at OTOLabs is unique – high-volume OLTP systems with many hundreds of databases – and it definitely stressed our software early on and pushed us to come up with ways of dealing with that kind of scale.

If you know Aaron, you know that he says what he believes and doesn't pull any punches.  He's exactly the kind of person we want here, someone with unquestioned integrity who isn't afraid to tell it like it is, and who will continue to look at our software with a critical eye.  I have no doubt that Aaron will have a significant impact on our company and our software in many different ways, some we can anticipate, but many we probably can't.

I could go on, but from here I'll defer to Aaron's post about the move.

Welcome aboard, Aaron!

Wednesday, June 2, 2010

SQLCruise Contest Winner

As you may have heard, as part of our sponsorship of the inaugural SQLCruise, we decided to run a contest to provide one lucky winner a free ticket for the cruise and training.  When we started the contest we expected that most of the submissions would be in the form of tweets or short blog posts, and we did receive many of those which were quite good.  We were surprised and impressed, however, by the effort, ingenuity and humor that went into four submissions in particular.  So much so that we had a tremendously hard time selecting a winner… how could we possibly choose between:

  • Jorge Segarra (blog | twitter) – Here's a guy who, and I can't think of any other way to put it, was born to go on this cruise.  Not only was his video hilarious, it was replete with gratuitous SQL Sentry references, and was actually submitted before the contest officially started.  Talk about enthusiasm!  It was a shot across the SQLCruise bow – "If you want to go on the cruise, you're gonna have to go through me!!"  It also served to ensure we wouldn't spend any time reviewing half-hearted submissions (thanks man! ;-)  How could we in good faith not send him, and deprive the other SQLCruisers of four days of laughs and learning with the Chicken?
  • Matthew Velic (blog | twitter) – A guy who's new to SQL Server and who desperately wants training, and who would apply that training daily for a very worthy cause, fighting Pediatric AIDS.  He clearly spent a lot of time on his video as well, as the production values were through the roof.  How could we not send Matthew?
  • Rebecca Mitchell (blog | twitter) – A DBA in Colorado who not only has the technical skills required to successfully manage lots of SQL Servers, but the creativity to write a poem about SQL Server (yes, that's right, and probably the longest poem I've ever seen, let alone read), with cleverly infused references to SQL Sentry's deadlock analysis and every DBA’s favorite breakfast meat to boot.  How the same person can have both abilities is frankly beyond my comprehension.  How could we not send this rare talent on the cruise?
  • Erin Stellato (twitter) – Erin lives and works in Cleveland.  I mean, how could we not send her? ;-)  Well, her video also happened to be extremely well done, with Hollywood quality acting, score, humor and special effects.  Seriously, she too needed to go!

So after going round and round for hours, we couldn't decide, which left us with a couple of options: either put it out for a community vote (which we may actually do next time), or just bag the whole idea.  Unfortunately we had a hard deadline from Carnival so there was no time to put it out to vote.  Punting on the contest was out too as we certainly couldn't punish everyone for being "too good"… that's just not how we operate here.

After some more agonizing, another idea came to us.  It was extreme… but clearly it was the best option, and a quick review of the official contest rules confirmed that it was indeed "legal":  Jorge, Matthew, Rebecca, Erin, pack your bags, you are all going on SQLCruise!!! ;-)

We have no doubt you will all make the most of this awesome opportunity.  Have a great time, and be sure to spread the word!

Thanks to everyone who participated, I do wish we could send all of you… maybe someday.  Meantime, I hear there are still a few seats left!

Monday, April 5, 2010

Getting Immersed in SQL Server

I had the pleasure of spending the past week at the SQL Server Immersion event in Boston.  I was joined by our lead developer, Brooke Philpott (blog | twitter).  This was an intensive course on SQL Server internals, and much more, put on by the esteemed Paul Randal (blog | twitter) and Kimberly Tripp (blog | twitter), and hosted by Adam Machanic (blog | twitter).  I knew going in that Paul and Kimberly were authorities in their respective areas – Paul in the storage engine and related areas like backup/recovery, and Kimberly in query tuning and index optimization.  What I didn’t expect was the sheer breadth and depth of their knowledge, and that they also share quite a bit of knowledge in common.  They are of course married, so I expect this leads to a lot of information-sharing at the dinner table that wouldn’t occur otherwise.

The material covered is far too extensive to get into here, but I will say that every day, and just about every hour of every day, was a challenge… and I have worked with SQL Server for almost 15 years.  This was not basic stuff.   A couple of times in the past I’ve made the mistake of paying for training where the outline looks great, but you get in class only to find that it’s far too basic and high level, and therefore not worth the time away to be there.  This was not that class.  Even though GAMS, IAMS, PFS pages, headers, slots and associated bits and bytes could be considered “basic” in the sense that they are the building blocks of SQL Server, they are certainly not simple.  The amazing thing is that this class really costs no more than many other lower quality week-long SQL Server training courses, when in reality it should probably cost twice as much.  It is an absolute bargain.

When it comes to the instructors themselves, Kimberly and Paul simply have no equal.  Not only are they both ultimate authorities, but they have endless enthusiasm about the material, and spot on delivery.  If either ever got tired they never showed it, even after going all day and all week.  We witnessed countless demos over the course of the week, some extremely involved, multi-step processes, and I can’t recall a one that didn’t go the way it was supposed to.  When has that ever happened?  It certainly hasn’t for me.  Seems as if whenever I demonstrate something in public there is invariably some glitch or another.  These are not beginner presenters, and they put an extreme amount of preparation and attention to detail into everything that they do.  Completely, utterly professional.

You might think that with this extreme level of skill comes extreme levels of egotism and lack of patience.  Nothing could be further from the truth.  There were no stupid questions the entire week – well, there were of course, but the response from Paul or Kimberly usually started with “Excellent question…” even when the question wasn’t so excellent.  They simply know how to teach, and are approachable, humble, and patient.

Anyone, from the accidental DBA (there were actually several in our class) to the experienced MVP, will gain significantly from this training.  Not everyone has 3 weeks and $20K to spend on MCM training, the first week of which is Paul and Kimberly on similar material.  Is Immersion the same as the first week of MCM?  According to Kimberly it is not quite as deep, which surprised me… I mean, how much deeper can you go? ;-) There is also no test at the end of the week at Immersion, so probably a lot less pressure as a result.  However, if you are thinking about going after the MCM and want to stick some toes in the water first, I suspect Immersion will give you a good idea of what to expect, as well as serve as a serious primer for much of the first week.

What I think it comes down to is that Paul and Kimberly truly enjoy what they do.  They have a passion for sharing the unique knowledge they have accumulated over the years while working at Microsoft and consulting with some of the largest SQL Server enterprises in the world, and it shows.  The SQL Server community should be thankful that they are so willing to share this knowledge, and that they are able to do it so well.  I count myself privileged to have been able to spend a week with them, and learn many things that will undoubtedly help me and my company, today and years into the future.

This was the only public Immersion event they will hold in the U.S. this year.  Keep a close eye on the SQLskills.com upcoming events page for announcements about future events.  I believe there is one in Dublin later this year, but I'm not seeing it on the site, so it may be sold out.

Tuesday, March 9, 2010

SQLSaturday Charlotte: Lessons Learned

We just wrapped up what have been the busiest, most stress-filled, yet rewarding couple of weeks in recent memory.  Overall the feedback we've received about the SQLSaturday Charlotte event has been amazing, thanks to all for the kind words that have been tweeted and blogged thus far!  I do think for the most part the event went off about as well as it could have despite our lack of experience planning technical conferences.  However, we definitely learned a lot that we'll be able to apply should we ever do it again.  I had many questions when I first started working on this event, so I wanted to get my thoughts down while still fresh with hopes that it might be helpful for future SQLSaturday planners.

Scheduling Concerns

One of my primary tasks was trying to coordinate the flood of speakers and sessions we had for the event, a flood created by none other than CSSUG president and SQL Sentry partner channel manager Peter Shire, whose skills at promoting can only be described as P.T. Barnum-like.  (I thought for sure that P.T.'s first name must have been "Peter", but turns out it was "Phineas")  Peter did a phenomenal job securing the support of an amazing list of 35 speakers, and getting the word out to the SQL Server community to the tune of about 400 registered attendees, 280 of whom actually made it to the event.

The starting point for the schedule planning was the facility.  The Microsoft campus here is a fantastic facility in a great location close to the airport, downtown, etc., however, it wasn't necessarily designed to run an event like this.  The initial room breakdown was:

Building AP1:
     Room 1: 200 seats
     Rooms 2,3,4,5: 20 seats each
     Cafeteria: 220 seats

Building AP2:
     Room 6,7,8: 50 seats each

The two buildings are about a 3-minute walk apart.  Because of the relative size difference between the largest and smallest rooms, my biggest fears were putting a hot session in one of the smaller training rooms, causing massive overflow, and putting a not-so-hot one in one of the big rooms… either of which can exacerbate the other.  The original layout had us using the 4 training rooms in AP1, the same building with the cafeteria and the 200-seat room; however, about a week before the event we got word from Microsoft that we'd been bumped by some internal training and had to use 4 training rooms in AP2 instead.  So that left us with:

Building AP1:
     Room 1: 200 seats
     Cafeteria: 220 seats

Building AP2:
     Rooms 2,3,4,5: 20 seats
     Room 6,7,8: 50 seats

The AP2 training rooms are an exact mirror of those in AP1; however, the change seemed to make for a very lopsided layout, since it left us with a single large room in one building and 8 rooms in the other building.  This caused concern since we'd been counting on a lot of interplay between the 80-100 people in the training rooms and the 200-seat room where I'd put all of the "hottest" sessions.  We didn't understand why the other group couldn't just use the AP2 training rooms, but eventually found out – they were doing lab sessions, and apparently the workstations in AP2 aren't the same class as those in AP1.  We weren't using the workstations so it didn't really matter to us, but apparently it did to them ;)

As it turned out, it probably did end up causing lower attendance for the featured sessions and some overflow in some of the smaller rooms, mainly because some attendees seemed reluctant to leave AP2 when there were so many other good sessions going on right there.  But did it even come close to ruining the conference?  Certainly not.

Another issue with the change was that we didn't get the new room names in time to put them on the signage, so we ended up using numbered rooms that were offset from the tracks by one – Track 2 was in Training 1, Track 3 in Training 2, etc.  This was very confusing, even to me.  At one point I even directed Wayne Snyder into the wrong room for his session.  Sorry Wayne!

The first thing I did was try and categorize the 80+ submitted sessions so I could get a better handle on the actual areas of interest we had to work with.  What I was really looking for at that point was whether or not we could have any focused tracks.  I ended up with 4 major categories and 24 subcategories:

Category – Subcategory Session Count
Admin - Broker 2
Admin - Clustering 2
Admin - DB Design 2
Admin - Hardware 2
Admin - Maintenance 1
Admin - Memory 1
Admin - Performance 11
Admin - PowerShell 5
Admin - Replication 3
Admin - Troubleshooting 2
Admin - Virtualization 2
BI - Dev 5
BI - SSAS 1
BI - SSIS 5
BI - SSRS 4
Dev - DB Design 2
Dev - Performance 4
Dev - Tools 2
Dev - TSQL 2
Dev - XML 1
General - Career 2
General - General 4
General - Social 3
General - Tools 2

It was apparent that focused tracks might work for a couple of categories like Performance and BI, but for others we only had one or two sessions, so trying to do clean, interest-based tracks would be fruitless.  I should mention that although for some sessions the categorization was easy, for others it was not clean at all, so I just took my best stab at it.  Even so, there were a few others that seemed to defy categorization, like Kevin Kline's "Top 10 Mistakes on SQL Server" and Sergey Pustovit's "SQL Server 2008/R2 Overview", and those ended up in "General – General" ;)

I was also hoping that by having these categories, I'd be able to send them out to a few people that had planned other recent SQLSaturdays and ask them to order the list by most to least popular at their event.  I got some extremely valuable feedback from those planners, but since they didn't have accurate attendance numbers for all topics we didn't get any rankings back.  Since this info could have been invaluable to me and helped to avoid some of the session/room mismatches we experienced, I'm posting all of our attendance numbers here.  Bear in mind they are only estimates and not exact.  If you see anything that looks way off, let me know.

Next, to my sessions spreadsheet I added a few columns, and used them to come up with a universal rating measure to help decide where sessions should go.  Here they are:

Hot Topic – Yes/No – Is the topic particularly popular right now, and are there a lot of people talking about it in the media?  For example, I don't think that anyone would argue that PowerPivot is a hot topic right now.  If Yes, 2 points were added to the rating.
Premiere Speaker – Yes/No – Certain speakers have significant name recognition, and will draw attendees regardless of the topic.  If Yes, 3 points were added to the rating.
Appeal – This is a 1-10 point value added to the rating for how broad of an audience would possibly be interested in the topic, or the size of the base population.  For example, PowerPivot may indeed be a "Hot Topic", but if the potential base of users is small the appeal number will be lower.

I used the above to calculate a simple rating value, and the higher the number, the bigger the room.  Certainly not very scientific, but probably good enough for our purposes.  Ultimately there are so many other variables involved that, IMO, if you try to be much more exact you run the risk of skewing things too far one way or another.  For example, if I'd actually tried to estimate the number of "BI" users or "Admin" users that would show in lieu of the general "Appeal" number, and my estimates for those populations were off, it could cause trouble.  Now, keep in mind that this approach is really most applicable to a first event, where you really have no idea who is going to show up.  Next time we'll have a much better idea of the different populations here in our region and will use actual numbers as a basis for predicting session popularity.
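To make the math concrete (my numbers, not a formal formula): a hot topic (+2) delivered by a premiere speaker (+3) with an appeal of 8 rates a 13 and gets one of the big rooms, while a niche topic from a lesser-known speaker with an appeal of 3 rates only a 3 and goes to a 20-seat room.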

Another idea that came up was surveying people in advance to see which sessions or categories they are most interested in.  If we had had more time, and/or if there was a capability to do this built into the sqlsaturday.com website, we likely would have made use of it.  As it was, there was just too much else going on with the planning to even think about attempting something like that.

Below I'll discuss a few of the bigger session surprises in a bit more detail.

Hotter-Than-Expected
TSQL
Specifically Mike Walsh's "You Can Improve Your Own SQL Code" and Geoff Hiten's "Bad SQL" both ended up in a smaller 20-seat room, and both could have easily been in a 50-seat room.

SSRS/SSIS
The 25-seat room we had set aside for these BI topics had 46 people crammed into it at one point.  I didn't put this track into one of the larger rooms mainly because I just wasn't quite convinced it would compete with Performance, Virtualization, PowerShell, and other "hot" topics I had in the 50-person rooms.  I also had a lot of people in the know telling me that BI is always a smaller crowd at these things, so I went with our only "in between" space, the 25-seat conference room that we knew could handle up to 35 with some standing.  Good call for not putting it in a 20-seat room I guess, but bad call for not giving it a 50-seater.  If you were one of those who had to act like a sardine for an hour, my sincerest apologies; we'll give you more room next time.

Not-So-Hot
Virtualization

This one shocked me.  With as much talk (and uncertainty) as there is right now about this topic and SQL Server, I was worried that a 50-seat room would overflow for Aaron Nelson's and Denny Cherry's sessions.  I was way off.  They would have both been fine in one of the 20-seat rooms.

SSDs (Solid State Drives)
Again, I was shocked when Kendal Van Dyke's session wasn't packed.  Great, well-known speaker with what I thought would be a hot topic, but it would have been fine in a small room.

Data Compression
The two sessions we had on this topic were the most lightly attended of all, only 4-5 people in each.  This was in a smaller room already, but I had expected to see more interest since it's new to 2008 and disk space is always a concern, however the Enterprise-only caveat or the other concurrent sessions may have been a big factor. 

Too Much of a Good Thing?

The "good thing" in this case being PowerShell.  I guesstimated that this would be a hot topic as it has only seemed to gather steam with DBAs over the past year or so, and with remoting and other cool features in Windows Server 2008 R2 and PowerShell 2.0 there's a lot of new stuff to learn about as well.  Turns out I was right… kind of.  Aaron Nelson's first two sessions in a 50-seat room were pretty much full, but Allen White's two sessions immediately following were, well, not so much.  Again, it certainly wasn't Allen as a speaker since he's highly regarded and authoritative on this topic.  However, I did hear from more than one person that two PowerShell sessions probably would have been sufficient for this crowd.  I think Allen was a little disappointed, but being a glass is half full kind of guy, he said it was good practice for TechEd.  Thanks Allen!

Premature Raffling

One of our biggest mix-ups of the day was the fact that we started the vendor giveaways right after the last session was supposed to end at 5pm.  We still had a session going in the other building that went a bit long, so those attendees and the speaker missed out on the raffles.  To make matters worse, that particular speaker would have won an iTouch had he been present!  Sincere apologies here.  We felt bad, and he did seem a bit upset -- I should have reminded him that he won a WinMo device at our very first SQL Sentry giveaway several years ago, although I'm not sure it would have helped at that point ;)

Bottom line, ensure all sessions have finished and folks have had time to get to the raffle before you start.  Apologies to anyone else that missed out on this – if you did, send us an email and we'll send you either a SQL Sentry T-shirt, USB 2.0 hub, or iPhone cover.

Where'd the Speaker Go?

We had a particularly well-attended session on IO Performance… only problem is we had no speaker.  I made a quick call over the two-way to registration to see if anyone had seen this particular speaker.  They had not.  I let the attendees know this, and they quickly filed out to look for another session, some with looks of disappointment.  As it turns out, the speaker had cancelled a couple of weeks earlier, but apparently it got lost in the hundreds of other emails that were flying around about the event.  Mistakes like this happen and stuff just gets missed sometimes… which is why there should have been a mechanism in place to catch this.  Had we done a speaker roll check earlier in the day we would have caught it, and probably could have gotten another speaker to fill in, or if not, at least made attendees aware of the cancellation in advance.  We just assumed all speakers were there – not safe with 35 speakers.

Bear in mind that some speakers may show up later if their first session isn't until later in the day, so you may not want to check roll first thing in the a.m., but either way it's generally an easy matter to at least confirm whether or not the speaker was in town.  In this case they were not.

Plan for Cancellations

Jorge Segarra (aka @SQLChicken), one of the organizers for the Tampa SQLSaturdays, told me to expect between 5 and 7 cancellations.  I thought that sounded high – how can that many speakers commit to something like this, then just cancel?!?  Well, he was right.  There were many different reasons we heard, but what can you do.  Fortunately we had enough speakers with enough sessions to make up for it.  I think 5-7 backups is a good number to expect for an event of this size, probably fewer for smaller events.  If you saw a session on the schedule that seemed a bit out of place for the room or other sessions around it, it was probably a backfill from a cancellation.

A Final Word

It's important to note that a lot of these scheduling dilemmas were due to the broad range of room sizes we had here.  If your event has 300 people and 4-5 rooms that hold 75 people each you've got a lot more margin for error, and you're probably not going to need to go to this level of detail.  Anyway, I hope some future SQLSaturday planners will find something here that's useful.  If any of you planners have any questions about what we did or why, I'll be glad to help however I can.  Please post them here or shoot me a DM or email.

A Huge Thanks to All

I wanted to take this opportunity to thank all of those that helped with the planning and organization for this great event.  First, I wanted to thank Bill Walker, head of the SQL Server CSS team here in Charlotte, and his counterpart Lynne Moore for their immediate and unwavering support.  From day one, they threw all of their considerable resources behind the event and without them it just wouldn't have happened.  All of us were truly amazed at the forces they were able to mobilize so quickly to just get it done.  I also wanted to thank Sergey Pustovit, Chris Skorlinski and Evan Basalik from the SQL Server CSS team for putting on some great sessions, and all of the other CSS members who volunteered their time whose names I don't know – they ran the "SQL Clinic" booth and helped out in innumerable other ways.

Next I wanted to thank all of the people here at SQL Sentry who volunteered their time, in no particular order -- Peter Shire, Karen Gonzalez, Nick Harshbarger, Brooke Philpott, Jason Hall, Natalie Wieland, Jason Ackerman, Steve Wright, and also Ken Teeter for the great photography.

Again in no certain order, sincere thanks go out to Jorge Segarra, Aaron Bertrand, Andy Kelly, Grant Fritchey, Tim Ford, John Welch, Rafael Salas, Geoff Hiten and of course Andy Warren for allowing me to pick their brains during the planning process.  Not sure what I would have done without their input and words of wisdom.

And last but certainly not least, thanks to all of the speakers who came from near and far to be here.  We are greatly appreciative that you decided to take the time to provide such a fantastic educational opportunity for our SQL Server community here in the Carolinas.

Wednesday, February 24, 2010

SQL Sentry v5.5 Need to Know

SQL Sentry v5.5 Release Candidate 2 was just published to the website.  Although it's a dot release, don't be fooled, v5.5 stands to be a watershed release for us in more ways than one.  Not only does it represent the culmination of many months of hard work by the team, but it contains what have easily been the most heavily requested enterprise features since the original release of Performance Advisor for SQL Server in 2008:  the Global Server Status View and Performance Reporting.  There are also many very cool secondary features such as Automatic Disk Misalignment Detection, Mount Point Support, as well as a variety of performance optimizations.  Read on for more details…

Global Server Status View

Finally, you can view key performance metrics for more than one server at a time from the SQL Sentry Console!  You simply right-click the Devices root node, or any device group node, and select the "Open Performance Advisor Overview" context item:

open_pa_overview

There are a couple of different ways to look at the data.  The default is Windows metrics at the top level:

overview_windows

From there you drill in to see metrics for each SQL Server or SSAS instance on the server.  Just like on the PA dashboard, we also show how much of the server's Windows resources – network, CPU, and memory – are being consumed by the SQL Server and SSAS processes:

overview_windows_instance

The level below that shows the actual instance metrics, which will of course change depending on whether you’re looking at SQL Server or SSAS.

overview_windows_instance_expanded

If you only want to see SQL Server and/or SSAS metrics without the Windows metrics, simply uncheck Windows in the dropdown at the top.  This gives you a very clean look across all SQL/SSAS instances in your environment:

overview_instances

As shown in the shots above, if a performance metric exceeds commonly accepted preset thresholds (such as Page life expectancy < 300 secs), the cell and associated server name will turn red to highlight it.  And to answer the question you are about to ask… no, there is currently no way to customize this, but this is coming ;-)  For this release the primary focus was getting the alarming framework in place with some basic presets.
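If you want to sanity-check that particular preset yourself outside of the console, a quick query like this (just a sketch) returns the current Page Life Expectancy value in seconds straight from the DMV:

    SELECT object_name, cntr_value AS ple_seconds
    FROM sys.dm_os_performance_counters
    WHERE counter_name = 'Page life expectancy'
      AND object_name LIKE '%Buffer Manager%';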

However, to hold you over until then, you can set custom filters for any column or combination of columns.  For example, if you want to see only servers with "Transactions/sec > 100 OR Blocks > 0 OR Deadlocks > 0", simply click the column header and set a filter:

overview_filter_set

As the view auto-refreshes and metrics change you’ll see server rows pop in and out.  This way you can stay focused on the servers requiring your immediate attention, based on limits you’ve defined.  Here's what a filter looks like once set:

overview_filter_multi

Note there are now only two visible servers, the rest are hidden by the filter.

When you combine this with the ability to adjust the interval displayed, filtering becomes an even more powerful feature:

overview_intervals

Let me elaborate.  By default the grid shows the last raw sample value collected, but since many metrics are sampled every 10 seconds there may be temporary spikes that you want to ignore.  You may not want the filtered server list changing every refresh.  If you simply increase the interval to anything other than "Last Sample" -- say "10 Minutes" -- the metrics displayed will be the last calculated average over that interval.  This can effectively smooth out many spikes, and thus prevent a lot of volatility in the server list.

Even without the use of filters, this is a great feature for getting a quick snapshot of how all servers have been performing over different ranges, up to the past 3 days.

One last item on the global view -- if you don’t care for the default dark skin, you can change to a standard light grid color using a new user preference:

overview_light_skin

Performance Reporting

There has always been an enormous amount of valuable performance data contained in the SQLSentry database, but aside from the PA dashboard there really wasn’t an easy way to get at it.  With the new SSRS-based performance reporting, now there is!  To see all of the new reports, select "Reports –> Performance Advisor" from the menu bar:

pa_reporting

Here are a couple of quick shots of the "Waits Analysis" and "File Utilization by Database" reports:

reports_wait_stats  reports_file_database

Not only do we have a great list of the most commonly requested reports, but we have two ways to generate completely customized reports – the Performance Counter History report, which lets you select any combination of servers, counters and instances:

reporting_params_counter_hist

reporting_custom

…as well as the Performance Counter Date Range Comparison report, which lets you do pretty much the same thing, but using two different date ranges.

One of the coolest aspects of our custom reports is that you’re not limited to only standard Windows perf counters – you can report on any of the performance data we collect, including data from DMVs, WMI, and even internally calculated metrics like Waits by Category!  And just like the PA dashboard, you can select any date range, from 3 minutes to 3 years… and it won't kill your server in the process.  A lot of effort has been put into ensuring that the reports generate quickly and will minimally impact the performance of the SQLSentry database.

You can save any of the custom reports by clicking "Save" on the parameter selection window, as well as deploy the reports to any SSRS instance using the "Deploy Reports" menu item.

Disk Activity View Enhancements

We’ve introduced two significant enhancements to our patent-pending Disk Activity view:  Auto-Misalignment Detection and Mount Point support.  Which you will like best will depend a lot on your environment.

Automatic Disk Misalignment Detection

As you probably know, misalignment is no longer an issue with new Windows Server 2008 installs since it uses a default offset of 1024KB.  However, what we have found is that many servers out there are either still running Windows Server 2003, or were upgraded from Server 2003 to 2008.  As a result this is still a big issue in many environments, and we’ve just made it dramatically easier to detect – you simply watch a server with PA, then look at the Disk Activity screen.  If the logical disk has a red background, it means the offset is suspect:

disk_misalignment

PA shows you exactly what the offset is in the tooltip for the logical disk.  I wish I could say we will then auto-fix it for you, but you probably realize that would not be possible ;-)  However, on this same screen you can clearly see whether or not latency is actually a problem on the misaligned disk, and from there determine whether or not you should actually go through the process of moving database files and rebuilding the partition with a proper offset.  Here's another server where all partitions are misaligned, and where the disks are experiencing pretty severe latency:

disk_misalignment_multi2

BTW, if you haven't seen this screen before, all of those color-coded bars are database files -- data files and transaction logs.  Same color means same database, diagonal hash marks mean transaction log.  Bottom connector pipe for each disk is writes, top pipe is reads.  Wide red fill in the pipes means high latency, thin green means IO activity but acceptable latency.

For more details on the performance implications of disk misalignment and what you can do about it, check out Microsoft's Disk Partition Alignment Best Practices for SQL Server whitepaper and Paul Randal's blog post on this topic.

Mount Point Support

Another common request that we’ve been getting more and more over the past year or so is for mount point support, in large part because of the explosion of VMs and associated LUNs in the environment.  Now when you look at the Disk Activity screen, you’ll see mount points displayed alongside all other disks:

mount_points

PA shows the same latency and volume information for mount points and database files on mount points as it always has for other disk types.

Technically these disk-related changes were no small feat to pull off, but once again the dev team rose to the occasion!

Performance Optimizations

We are always looking for new ways to increase the performance of SQL Sentry and minimize our footprint in the environment… a footprint, I might add, that is already dramatically lower than any competitive product.  (We will actually be proving this very soon… stay tuned.)  However, in this release we’ve made some changes that would be considered significant, even by us.

Redesigned Threading Model

Two changes have been made to the threading model.  Previously, dedicated threads were spawned for each monitored server, and each would consume memory, so the more servers you were monitoring, the more memory would be used by the SQL Sentry Server service process (you need one of these services for every 100 or so monitored servers).  A new thread pooling system is now in place that eliminates most of these dedicated threads, dramatically reducing the process’ memory footprint and increasing scalability at the same time.

Next, for some of the collection subsystems we’ve moved from a fork/join threading model to more of an asynchronous work queue model.  Fork/join threading is similar to parallelism with SQL Server.  There are some benefits to this approach, but a primary drawback is that the process can only be as fast as the slowest thread.  The new model effectively eliminates the possibility of collection delays caused by offline servers, network issues, etc., that we’ve sometimes seen affect other critical SQL Sentry processes like chaining and alerting.

Reduced Plan Cache Footprint

Prior to v5.5, ad hoc SQL was being used in some places by both the SQL Sentry Console and Server, and in certain scenarios over time this could lead to plan cache bloat and some buffer pressure on the server where the SQLSentry database was installed.  In v5.5 we’ve eliminated almost all uses of ad hoc SQL, which dramatically reduces the plan cache footprint of the SQLSentry database.  This will not impact monitored servers, only the server where the SQLSentry database resides (unless of course it is monitored as well ;-)  After you upgrade, I recommend that you run DBCC FREEPROCCACHE on that server so the buffer pool can immediately reclaim any of the space previously used for the ad hoc cache.
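If you're curious how much ad hoc SQL is sitting in your plan cache before and after the upgrade, a rough sketch like this breaks the cache down by object type, and the FREEPROCCACHE at the end is the same cleanup step recommended above (all plans recompile afterward, so a quiet period is best):

    -- How much of the plan cache is ad hoc plans vs. prepared statements and procs?
    SELECT objtype,
           COUNT(*)                                     AS plan_count,
           SUM(CAST(size_in_bytes AS bigint)) / 1048576 AS size_mb
    FROM sys.dm_exec_cached_plans
    GROUP BY objtype
    ORDER BY size_mb DESC;

    -- Clear the plan cache so the buffer pool can reclaim the space.
    DBCC FREEPROCCACHE;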

I really hope you enjoy using v5.5.  New users can get bits here, existing customers please login to your portal account.  As always, we fully support upgrading to this build from any earlier SQL Sentry version, and from this build to the final release of v5.5.  Thanks to everyone who has shared their ideas with us, our great beta testers, and of course to the entire SQL Sentry team for your efforts in finally making these great new features a reality!

Friday, February 5, 2010

Analysis Services and the Windows File System Cache

In my last post I revealed a problem one of our customers encountered when SSAS memory limits were inadvertently set to KB instead of bytes, leading to abnormally low limits and some pretty serious performance issues.  I alluded to a “saving grace” that prevented things from being much worse than they actually were, and that’s what I’ll cover in this post.

A little background first, or rather an expansion of something I touched on in my original post -- unlike the SQL Server Database Engine, much of the SSAS data can be resident in physical memory by virtue of the Windows file system cache.  This is simply because SSAS databases consist of a collection of Windows files of various types, and SSAS does not directly manage memory and disk access to those files the way a complex memory manager like SQL Server does; it relies on the standard Windows APIs.

This means that even if the SSAS internal caches are effectively non-existent, you can still have a significant amount of raw SSAS file data loaded into RAM which will not show up as part of the SSAS process memory, or in any way be governed by the SSAS memory limits.  This was the saving grace here.  If the storage engine actually had to read the files from disk for every query, performance would have been significantly worse than it already was.

I’m going to use some Performance Advisor for Analysis Services dashboard charts to fully illustrate this phenomenon.  Since the customer’s server has 32GB of RAM, I’ve blown up the System Memory chart so it’s a little easier to see what’s going on:

memory_file_cache

The thin grey line shows the physical memory used by the file cache (about 700MB), and the even thinner orange line shows the memory used by the SSAS instance with invalid memory limits (about 450MB).  The large teal-colored area is memory used by a SQL Server instance on the same server.

Assuming there wasn’t much file serving happening on this server, most of the file cache is likely comprised of SSAS database files.  Without file serving activity, usually you will not see the file cache get this large, even though on 64 bit machines like this one it is capable of growing to use up all available memory.  (On 32 bit systems the max file cache size is about 1 GB.)  Read more about the Windows file cache here.

Ok, interesting theory, but how can we confirm that SSAS was able to get its data from the file cache?  Fortunately the SSAS Storage\File Read/Write charts make this pretty easy:

memory_file_read

The File Queries chart on the right shows SSAS is querying database files at a rate of about 20 per sec, and the File Read KB chart shows that even though there is a lot of file read activity on the system (about 40MB per sec), almost none of it is coming off of disk.  This is clear because the dashed orange Total Disk line is flat, and there are also no Windows read faults at that time:

memory_read_faults

Anytime SSAS files are read from disk into physical memory, you should see hard faults.  To better demonstrate this, take a look at a similar chart for a different environment:

memory_file_read_high_disk 

The off-white Total System series shows ALL file read activity flowing through the Windows system (both from memory and disk), the light orange Total SSAS series shows the portion of total read activity directly related to the SSAS process (barely visible here), and the blue series shows the portion of SSAS read activity from data files.  The dashed Total Disk read line is overlaid on top of the stacked area chart so you can quickly see how everything correlates.  It’s clear that a sizable chunk of the SSAS data file reads are coming off disk.  Also note how this time we have significant Windows read faults:

memory_file_read_high_faults

To get a different look at the data, as well as the actual file cache hit ratios, we can click a point on the chart in the middle of the read activity spike and then click the Sample button on the toolbar, which loads up the data for that specific point in time:

memory_file_read_high_disk_sample 

What this shows us is that approximately 81% of the file read activity for this sample came from the file system cache.  This is not bad, but in general you always want Total Disk to be at or near zero, which will be the case whenever SSAS is pulling data from either its internal caches or the file system cache.

How can you tell whether the data is coming from SSAS’s internal caches versus the file cache?  That’s easy – you won’t see any storage engine file queries or associated file read activity at all, and the SSAS cache hit ratios (visible in sample mode, or when using the new Performance Advisor v5.5 reporting) will be high, such as in the shot below where the measure group cache hit ratio is 99%:

memory_high_ssas_cache_hit_ratios 

In addition to the cache hit ratio meters, there is also a visual hit ratio indicator right on the column chart – the “fullness” of the columns reflects the hit ratios.  The more gray fill you see between the column hash lines, the higher the hit ratio:

memory_high_ssas_cache_hit_ratios_low 

Note that the measure group column is only about half full, reflecting the 52% hit ratio.  The calculation and flat cache hit ratios are both zero, so those columns are empty.  Clever, eh? ;-)  Perhaps even cleverer, on the measure group column there are two types of hash lines, horizontal and diagonal -- the horizontal hashes represent direct cache hits, while diagonal hashes represent filtered hits.  These occur when existing cache entries have to be filtered first to obtain the desired data, and are slightly higher overhead than direct hits.

In an ideal world, aside from when the cache is first warming up, you don’t want to see SSAS file queries at all, except perhaps for the occasional aggregation hit, which is also shown on the File Queries chart.  Bottom line, if you see file queries, there were cache misses.  Here’s a quick custom Performance Counter History report using Performance Advisor v5.5 which shows this clearly:

memory_hit_ratio_report 

Even so, the Windows file cache hit ratio may be 100% if all of the referenced files were in the file cache.  But in that scenario, as we saw in the last post, even though performance will be better than if they were read from disk, it will not be nearly as good as if the data had been found in the SSAS cache because of the additional overhead involved when the storage engine retrieves file data from Windows.

If you want to see how your Analysis Services system is performing in this regard, you can download a fully functional 15-day trial of Performance Advisor for Analysis Services here.  Cheers.