Monday, December 5, 2011

SQLIO - Testing, Round 1

For the past couple of weeks I've been doing some analysis on the database environment I'm administering. I figured that was a great place to get my feet wet in my new position. It forces me to really dive in and understand the storage subsystem, which for me is an EMC CX3-20 SAN. The other significant components include the QLogic QLA2460 HBAs and the Brocade 5000 switches. All in all, the environment seems more than sufficient to support a Microsoft Dynamics AX implementation.

My research was twofold: first, I wanted to understand the theoretical limits of each component. Then I used SQLIO - which isn't really a SQL tool at all - to gather some actual metrics. I've found that you can't simply line up your expectations with the results, because of some rather obscure yet significant factors (cache, for example).

I proceeded to document everything I could, often learning and researching along the way:

First, I examined the partition offset and file allocation unit size for each logical disk. Apparently this used to be a little more cumbersome pre-W2K8, but I'm on W2K8 R2, and the partition offset was 65,536 bytes (64KB) across the board. I retrieved this by opening a command prompt and issuing the following:
wmic partition get BlockSize, StartingOffset, Name, Index
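A quick sanity check on that number: a partition is aligned when its StartingOffset divides evenly by the stripe unit size. Assuming the CLARiiON's default 64KB stripe element, 65,536 / 65,536 = 1 with no remainder, so these partitions line up cleanly on stripe boundaries.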
To get the file allocation unit size for each disk, I issued the following command, once per drive:
fsutil fsinfo ntfsinfo <drive:>
Look for the "Bytes Per Cluster" value in the output. In most cases it should be 65,536 bytes (64K) for partitions on which SQL Server data and log files reside.
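If a volume turns out to have a smaller allocation unit, the only remedy is to reformat it. A hypothetical example for a data volume F: (the drive letter and label are placeholders - and note this wipes the volume, so evacuate the data first):
format F: /FS:NTFS /A:64K /V:SQLDATA /Q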

The HBAs have what is called an "execution throttle", which by definition "signifies the maximum number of outstanding commands that can execute on any one HBA port". Each of my nodes has two HBAs - some had an execution throttle of 16 while others were at 256. I standardized them all at the upper limit and will adjust accordingly. What I'm trying to accomplish is to develop a baseline from which to compare future results in the event I reconfigure anything. Also, according to QLogic, these HBAs can handle 150,000 IOPS and roughly 750MB/sec of throughput in full duplex.
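Running some quick arithmetic against those specs: a 4Gbps Fibre Channel link carries roughly 400MB/sec in each direction once 8b/10b encoding overhead is taken out, which is consistent with the ~750MB/sec full-duplex figure. At SQL Server's common 8KB random I/O size, 400MB/sec works out to about 51,200 IOPS per direction, so the 150,000 IOPS rating is only reachable with very small transfers; at realistic block sizes the link bandwidth becomes the ceiling first.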

Next, I determined that all devices were communicating at 4Gbps. The switches have a maximum aggregate bandwidth of 256Gbps end-to-end.
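That figure also checks out arithmetically, if I'm reading the Brocade spec sheet right: the 5000 is a 32-port, 4Gbps switch, and 32 ports x 4Gbps x 2 (full duplex) = 256Gbps.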

It's also important to understand queue depth per port on the SAN. For EMC arrays, the maximum number of outstanding I/Os a given port can handle at one time is 2048. That's the upper bound, however. What we can do is calculate the queue depth for each LUN using the following formula (in my case anyway): (# data disks x 14) + 32. So in a 10-disk RAID 10 array you have 5 data disks, giving a queue depth of (5 x 14) + 32, or 102. Ultimately we want to ensure no single LUN receives more parallel requests than it can handle. The same goes for the port, so once you have calculated the queue depth for each LUN, sum them up to determine the total theoretical queue depth the port would have to support - although it's very unlikely every LUN would be saturated at once.
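This calculation feeds straight into how I plan to parameterize SQLIO. Its -o switch sets outstanding I/Os per thread, so threads times outstanding I/Os should stay at or below the LUN's calculated queue depth. A sketch of what a round-1 run might look like against a hypothetical test file on the F: LUN (the path, duration, and block size are placeholders, not my actual test plan):
sqlio -kR -s300 -frandom -t4 -o25 -b8 -LS -BN F:\sqliotest.dat
Four threads at 25 outstanding I/Os apiece keeps 100 requests in flight, just under the 102 calculated above. -kR issues reads, -frandom randomizes the access pattern, -b8 uses 8KB blocks, -s300 runs the test for five minutes, -LS captures latency, and -BN turns off buffering so host-side caching doesn't skew the results.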

This is going to be a work in progress, and since this is relatively new territory for me as a DBA, I'd welcome any feedback or suggestions. It's quite interesting, though!