When we released SQL Sentry v7.2 with dedicated Windows monitoring last October, we were already hard at work on the AlwaysOn-oriented features that would become v7.5. We wanted to be the first to market with integrated, visual AlwaysOn monitoring and management, and I'm happy to say that we've achieved this goal – v7.5 RTM is now available for download!
I've been continually surprised at how quickly SQL Server's AlwaysOn features, availability groups in particular, have gained acceptance. I don't have any hard numbers, but the rate at which we are seeing them considered for use in production systems seems to be far outpacing adoption of other major new features from past SQL Server releases, including database mirroring in SQL Server 2005. Mirroring established a beachhead for SQL Server in the HA/DR space, and led to significant demand for more, effectively priming the market for AlwaysOn. As you probably already realize, AlwaysOn technologies (as opposed to tools) delivers big on the more.
The tools? Well, that's another story, and that's why I'm here today ;-) If you've spent any time managing more than a couple of availability groups with SSMS, you know that making sense of the environment can be quite a challenge. Some of the questions SSMS may leave you asking:
- How are my Availability Groups (AGs) and Failover Cluster Instances (FCIs) distributed across the Windows Server Failover Cluster (WSFC) nodes?
- Where is the primary replica for this AG?
- To which AG, if any, does my database belong?
- Is the AlwaysOn Dashboard showing me "the truth" about replica health?
- If a replica fails over, how will I know?
You may note a common theme here: a need for better visibility. These questions and many others result from the instance-level and generally static, disconnected view of the world provided by SSMS. SSMS is great at many things, but since it has no active collection or intelligence, it's just not well-suited for this particular task.
Our ultimate objective with this release of SQL Sentry Performance Advisor was to increase visibility and manageability around AlwaysOn in dramatic fashion, thereby enabling it to reach its full potential. This new functionality is so powerful, some of our clients are even using it for resource management across commodity servers, something I'm not sure the AlwaysOn creators ever envisioned. I'll delve into that in a future post.
One of the biggest shortcomings of the native tools for managing AlwaysOn is that, depending on which replica you are viewing, you will see different information about the status and health of an AG. This is because secondary replicas aren't aware of any other replicas – only the primary knows the true status of all replicas in the AG.
So when you are looking at a replica in SSMS, you need to be sure you are on the primary before giving any weight to what you are seeing. If you have a 5-node AG, there can be be a lot of clicking around in order to ascertain accurate status and health info for all replicas.
SQL Sentry actively collects data from the primaries and secondaries as appropriate to present a seamless view of the world. No clicking around, no guessing, just the truth.
If you open AlwaysOn Management from the top level Shared Groups (Global) node in the Navigator Pane, you will see all WSFCs, WSFC nodes, and AG replicas across the entire environment monitored by SQL Sentry:
If you open it from a site or group node, you'll see the nodes in that group, plus any nodes outside of the group that are hosting any related replicas. If you open Performance Advisor and any SQL Server instances on the computer have availability group databases, you will see an integrated AlwaysOn tab that contains a topology for all related replicas.
So, how much of the topology you'll see is dependent on the level at which you are viewing the AlwaysOn interface, making it easy to organize and manage your AlwaysOn assets in a way that works best for you.
Multiple AlwaysOn Views
One thing we learned quickly is that no single view represents all AlwaysOn topologies well. So we ended up with several ;-) All are easily selectable from the Layout Style dropdown at top left.
This shot is the Groups/Replicas layout, zoomed in to a single AG:
The next shot is of the WSFC Members layout, which gives you a high level perspective of the topology, including file share or disk witnesses, as well as aggregated data flow between the WSFC nodes:
Perhaps the most powerful layout is the WSFC Node/Group Matrix, which is the default. It provides a grid-style presentation of the entire WSFC, by node and AG:
Here you are looking at a 5-node WSFC with eight AGs and two FCIs, distributed unevenly across the nodes. You can change sorting instantly by clicking a column (node) or row (AG) header. This allows you to quickly bring into view all AGs on a node, or all nodes hosting an AG's replicas, regardless of the size of the environment.
NOTE: This is a test environment designed to stress the application, and this is is no way a recommended configuration. (If you look closely you can probably detect which of our resident MVPs helped to build it)
There are several other layout styles representing variations of those shown above, each designed to give you a different "angle" on AlwaysOn. I encourage you to try all of them, and let me know which you find most helpful for your environment.
Data Volume and Rate
On all views, the connector pipes and inner flow lines on the diagram provide a visual indicator of both data volume and rate between nodes or replicas. You can get a better look at the connectors by clicking any of the images above. Current Performance Advisor users will immediately note the similarity to our patent-pending Disk Activity screen.
In case you are wondering what it all means, here's a quick overview:
- When in realtime mode (the default) dashed lines move in the direction of flow, and grow wider as the volume increases.
- The lines change color from green to red as the rate of data flow between replicas/nodes slows down.
- Entering a custom date range on the toolbar automatically switches to historical mode – the lines become solid and represent the "average" volume and rate over the range.
This makes it easy to see at a glance where bottlenecks may exist in the environment, over any date range!
The tab to the right of the diagram presents historical data for the range showing data flow to/from replicas as well as send/recovery queue sizes. The charts are context sensitive, so the data shown will change depending on the diagram element or grid row selected. When in realtime mode, they will auto-scroll to show the last 10 minutes of activity.
The bottom area is comprised of several tabs broken out by AlwaysOn element, providing a rich and easily navigable version of the AlwaysOn schema:
Note that as you click elements on the diagram, the tab and row for the specific element will be auto-selected, and vice versa. You can then expand the row for more details – for example, expanding a replica row gives you all of its databases and associated info. Minor features, but ones that can provide huge efficiency gains over time.
Error and State Change Logs
Adjacent to the charts are two tabs which show all error or state changes for the active date range. Just like the charts, these are context-sensitive, so as you click on a different node or replica, the data will be auto-filtered. Unlike the "Health Events" view in SSMS, errors and state changes are listed separately, making it much easier to scan and process the information. In addition, "Message" and other important columns are shown by default, you don't have to manually add them every time.
- Integrated quorum details are shown on the WSFC Node/Group Matrix layout (above):
If a file share or disk witness is configured, it will appear in the upper left on this grid, as well as on the WSFC Members layout.
- A variety of health and configuration details are shown for each replica element on the diagram, so you truly have all of the information you need at a glance:
See our Online User Guide for a full explanation of these and other elements.
- Alerts and other actions are simple to configure for failover and replica health events:
There is more, but I think I've exceeded my daily screenshot quota, so I'll wrap it up here.
"Wow, looks expensive," you may be thinking. "How much does all of this cost?"
Believe it or not, this new functionality is included in the base Performance Advisor for SQL Server license at no additional charge! We did try to come up with a simple and easy way to license it, but were unsuccessful ;-) We also thought, what better way to reward existing customers and incent new customers than by providing for free what we believe will become the new standard for AlwaysOn management and monitoring.
I wanted to thank all of those who gave us valuable feedback along the way, in particular our friends Allan (twitter | blog) and Ben (twitter) over at SQLHA, two of the world's foremost experts on this topic.
Please take it for a test drive, and let me know what you think!