Friday, February 5, 2010

Analysis Services and the Windows File System Cache

In my last post I revealed a problem one of our customers encountered when SSAS memory limits were inadvertently set to KB instead of bytes, leading to abnormally low limits and some pretty serious performance issues.  I alluded to a “saving grace” that prevented things from being much worse than they actually were, and that’s what I’ll cover in this post.

A little background first, or rather an expansion of something I touched on in my original post -- unlike the SQL Server Database Engine, much of the SSAS data can be resident in physical memory by virtue of the Windows file system cache.  This is simply because SSAS databases consist of a collection of Windows files of various types, and SSAS does not directly manage memory/disk access to those files using Windows APIs as complex memory managers like SQL Server do.

This means that even if the SSAS internal caches are effectively non-existent, you can still have a significant amount of raw SSAS file data loaded into RAM which will not show up as part of the SSAS process memory, or in any way be governed by the SSAS memory limits.  This was the saving grace here.  If the storage engine actually had to read the files from disk for every query, performance would have been significantly worse than it already was.

I’m going to use the some Performance Advisor for Analysis Services dashboard charts to fully illustrate this phenomena.  Since the customer’s server has 32GB of RAM, I’ve blown up the System Memory chart so it’s a little easier to see what’s going on: 

memory_file_cache

The thin grey line shows the physical memory used by the file cache (about 700MB), and the even thinner orange line shows the memory used by the SSAS instance with invalid memory limits (about 450MB).  The large teal-colored area is memory used by a SQL Server instance on the same server.

Assuming there wasn’t much file serving happening on this server, most of the file cache is likely comprised of SSAS database files.  Without file serving activity, usually you will not see the file cache get this large, even though on 64 bit machines like this one it is capable of growing to use up all available memory.  (On 32 bit systems the max file cache size is about 1 GB.)  Read more about the Windows file cache here.

Ok, interesting theory, but how can we confirm that SSAS was able to get it’s data from the file cache?  Fortunately the SSAS Storage\File Read/Write charts make this is pretty easy:

memory_file_read

The File Queries chart on the right shows SSAS is querying database files at a rate of about 20 per sec, and the File Read KB chart shows that even though there is a lot of file read activity on the system (about 40MB per sec), almost none of it is coming off of disk.  This is clear because the dashed orange Total Disk line is flat, and there are also no Windows read faults at that time:

memory_read_faults

Anytime SSAS files are read from disk into physical memory, you should see hard faults.  To better demonstrate this, take a look at a similar chart for a different environment:

memory_file_read_high_disk 

The off-white Total System series shows ALL file read activity flowing through the Windows system (both from memory and disk), the light orange Total SSAS series shows the portion of total read activity directly related to the SSAS process (barely visible here), and the blue series shows the portion of SSAS read activity from data files.  The dashed Total Disk read line is overlaid on top of the stacked area chart so you can quickly see how everything correlates.  It’s clear that a sizable chunk of the SSAS data file reads are coming off disk.  Also note how this time we have significant Windows read faults:

memory_file_read_high_faults

To get a different look at the data as well as the actual file cache hit ratios, if we click on a point on the chart in the middle of the read activity spike, then click the Sample button on the toolbar it loads up the data for that specific point in time:

memory_file_read_high_disk_sample 

What this shows us is that approximately 81% of the file read activity for this sample came from the file system cache.  This is not bad, but in general you always want Total Disk to be at or near zero, which will be the case whenever SSAS is pulling data from either its internal caches or the file system cache.

How can you tell whether the data is coming from SSAS’s internal caches versus the file cache?  That’s easy -– you won’t see any storage engine file queries or associated file read activity at all, and the SSAS cache hit ratios (visible in sample mode, or when using the new Performance Advisor v5.5 reporting) will be high, such as in the shot below where the measure group cache hit ratio is 99%:

memory_high_ssas_cache_hit_ratios 

In addition to the cache hit ratio meters, there is also a visual hit ratio indicator right on the column chart -– the “fullness” of the columns reflect the hit ratios.  The more gray fill you see between he column hash lines, the higher the hit ratio:

memory_high_ssas_cache_hit_ratios_low 

Note that the measure group column is only about half full, reflecting the 52% hit ratio.  The calculation and flat cache hit ratios are both zero, so those columns are empty.  Clever, eh? ;-)  Perhaps even cleverer, on the measure group column there are two types of hash lines, horizontal and diagonal -- the horizontal hashes represent direct cache hits, while diagonal hashes represent filtered hits.  These occur when existing cache entries have to be filtered first to obtain the desired data, and are slightly higher overhead than direct hits.

In an ideal world, aside from when the cache is first warming up, you don’t want to see SSAS file queries at all, except perhaps for the occasional aggregation hit, which is also shown on the File Queries chart.  Bottom line, if you see file queries there were cache misses.  Here’s a quick custom Performance Counter History report using Performance Advisor v5.5 which shows this clearly :

memory_hit_ratio_report 

Even so, the Windows file cache hit ratio may be 100% if all of the referenced files were in the file cache.  But in that scenario, as we saw in the last post, even though performance will be better than if they were read from disk, it will not be nearly as good as if the data had been found in the SSAS cache because of the additional overhead involved when the storage engine retrieves file data from Windows.

If you want to see how your Analysis Services system is performing in this regard, you can download a fully functional 15-day trial of Performance Advisor for Analysis Services here.  Cheers.

2 comments:

  1. I was hit with troubleshooting a slow performing SSAS 2005 server and found this detailed article very useful in confirming my understanding of how the performance relates to File Cache usage. Thank you for posting this!

    ReplyDelete
  2. You are quite welcome, glad it was helpful!

    ReplyDelete