So, Customer starts updating all of his VMware ESX hosts and things take a turn for the worse. VMs crawl (ping response times swing from 0 to 1000 ms), console access through the vSphere client doesn’t always work, and the hosts’ CPU usage is unnaturally high. The cause is apparent: path thrashing.
Path thrashing happens when, for some reason, SCSI LUNs are continuously reassigned from one controller (Target) to another. ESX has a hard time “bouncing” I/O back and forth onto the right Fibre Channel path. On Active/Passive SAN arrays a LUN can be “owned” by just one controller at a time. If the LUN owner has to change because of a hardware failure (path, Controller, SFP/GBIC, FC switch, …) or because the Initiator asks for it, the LUN itself has to “trespass” (in EMC parlance), that is, move to the other controller. The “command” to do so can be issued by the Initiator or internally by the storage subsystem.
Back to today’s case: I was dealing with an IBM DS4800 where LUNs flipped like mad between controllers A and B. How to stop it quickly?

  • If anything, the flipping shows that failover works as expected (VMs don’t crash despite the chaos).
  • That said, could I just disconnect a controller? Not really, because the same storage system hosts an Oracle RAC cluster, humming along happily, unaffected by the issue.
  • I need a way to selectively “hide” a controller from one or more hosts. I can do it easily by tweaking the SAN zoning configuration.

A Zone (much like a VLAN) is basically a group of WWNs (or ports). Objects in a Zone can only talk to each other. When creating Zones, it is common practice to “go minimal”: each Zone should contain as few members as possible. I usually name them like this:
HBA Port 1 of HOSTNAME can see Controller A/Port 1 and Controller B/Port 1 of the DS4800.
Thus, going through each ESX server’s Zone, I just remove the Controller that the host shouldn’t see. Path thrashing is temporarily stopped.
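To make the zoning trick concrete, here is a toy model in Python. It is only a sketch of the idea, not switch configuration: the zone names, the aliases (esx01_hba1, ds4800_ctrlA_p1, …) and both helper functions are made up for illustration, and a real fabric is changed through the switch vendor's own tools.

```python
# Toy model of SAN zoning. A zone is a set of member aliases (standing in for WWNs).
# All names below are hypothetical; nothing here talks to a real switch.

def visible_targets(zones, initiator):
    """Return every member that shares at least one zone with the initiator."""
    targets = set()
    for members in zones.values():
        if initiator in members:
            targets |= members - {initiator}
    return targets


def hide_controller(zones, zone_names, controller_tag):
    """Return a new zone set where the named zones lose the given controller's ports."""
    result = {}
    for name, members in zones.items():
        if name in zone_names:
            result[name] = {m for m in members if not m.startswith(controller_tag)}
        else:
            result[name] = set(members)
    return result


# One zone per HBA port: the port itself plus one port on each controller.
zones = {
    "esx01_hba1": {"esx01_hba1", "ds4800_ctrlA_p1", "ds4800_ctrlB_p1"},
    "esx01_hba2": {"esx01_hba2", "ds4800_ctrlA_p2", "ds4800_ctrlB_p2"},
    "rac01_hba1": {"rac01_hba1", "ds4800_ctrlA_p1", "ds4800_ctrlB_p1"},
}

# Hide controller B from the ESX host only; the RAC zone is left alone.
patched = hide_controller(zones, {"esx01_hba1", "esx01_hba2"}, "ds4800_ctrlB")

print(sorted(visible_targets(patched, "esx01_hba1")))
print(sorted(visible_targets(patched, "rac01_hba1")))
```

Because the zones are minimal (one per HBA port), the ESX zones can be trimmed without touching what the Oracle RAC nodes see.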
The above rant serves mainly as a pro-zoning argument against the usual objection: “If every HBA port has to access every Controller’s port, why implement zoning?” As you just read, zoning saved me from serious trouble today.
As for the “real” issue, it was ultimately caused by a thing called “Auto Volume Transfer” (AVT)¹. Let’s say that a LUN is assigned to controller A, but I/O for the LUN is issued to controller B. With AVT switched on, the storage system will automatically transfer the LUN from A to B.
The Customer’s ESX servers are all (correctly) configured to use the “Most Recently Used” (MRU) path to a LUN. It seems that ESX, from a certain version on, issues I/O on the standby path, wreaking havoc if AVT is on. I can’t tell if that’s because it is fooled into thinking that the storage is an Active/Active one or if it just sort of periodically “probes” standby paths.
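The interaction between AVT and stray I/O on the standby path can be sketched with a tiny simulation. This is a deliberately crude model, not the DS4800 firmware logic: the Lun class, the simulate function and the alternating I/O pattern are all assumptions made up for illustration, and "rejected" stands in for whatever the array really returns for I/O sent to a non-owning controller when AVT is off.

```python
# Crude model of AVT-driven LUN ownership flips (illustrative only).

class Lun:
    def __init__(self, owner, avt):
        self.owner = owner      # controller currently owning the LUN: "A" or "B"
        self.avt = avt          # True if Auto Volume Transfer is enabled
        self.transfers = 0      # how many times ownership has moved

    def io(self, controller):
        """Send I/O through a controller; return True if the array serves it."""
        if controller == self.owner:
            return True
        if self.avt:
            # AVT on: the array moves the LUN to the controller that received the I/O.
            self.owner = controller
            self.transfers += 1
            return True
        # AVT off: in this model the non-owning controller simply refuses the I/O.
        return False


def simulate(avt, rounds=10):
    """Alternate I/O between the active path (A) and the standby path (B)."""
    lun = Lun(owner="A", avt=avt)
    for _ in range(rounds):
        lun.io("A")   # regular I/O on the MRU path
        lun.io("B")   # stray I/O on the standby path
    return lun.transfers


print("ownership transfers with AVT on: ", simulate(avt=True))
print("ownership transfers with AVT off:", simulate(avt=False))
```

With AVT on, almost every I/O in this model drags the LUN to the other controller, which is the thrashing described above. With AVT off, ownership stays put.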
How do you switch AVT off? By using the DS “Storage Manager” and changing the ESX Hosts’ type from “Linux” (or whatever) to “LNXCLVMWARE”. This applies to all of the LSI-derived Storage Systems (IBM, SUN StorageTek, Engenio, …). “LNXCLVMWARE” is the right host type to use when hooking an ESX cluster to an IBM DS Storage System, but “Linux” seems to do just fine on not-so-new ESX hosts version 4.1.x … When AVT is off, the Storage will decide to trespass LUNs only in the event of an internal hardware failure; normally, LUN ownership is handled by the multipathing software on the Host.
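For reference, the “Most Recently Used” idea on the host side boils down to: keep using the current path, switch only when it fails, and do not fail back on your own. The sketch below illustrates that policy only; the MruSelector class and its methods are hypothetical and have nothing to do with the actual ESX multipathing code.

```python
# Minimal sketch of a "Most Recently Used" path policy (hypothetical, not ESX code).

class MruSelector:
    def __init__(self, paths):
        self.paths = list(paths)     # candidate paths, in preference order
        self.up = set(self.paths)    # paths currently believed healthy
        self.current = self.paths[0]

    def set_state(self, path, is_up):
        """Record that a path came up or went down."""
        if is_up:
            self.up.add(path)
        else:
            self.up.discard(path)

    def select(self):
        """Stick to the current path; move only if it is down."""
        if self.current not in self.up:
            for candidate in self.paths:
                if candidate in self.up:
                    self.current = candidate
                    break
            else:
                raise RuntimeError("all paths are down")
        return self.current


selector = MruSelector(["ctrlA", "ctrlB"])
print(selector.select())            # stays on the first path
selector.set_state("ctrlA", False)
print(selector.select())            # fails over to the other path
selector.set_state("ctrlA", True)
print(selector.select())            # no automatic failback
```

The point of the policy is that the host, not the array, decides when to change controller, and it does so only on failure.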

More reading on the subject:

[1] Differences between the “Linux” and “LNXCLVMWARE” host types.
[2] How does Auto Volume Transfer (AVT) work? Courtesy of Google’s cache. Lists which SCSI commands trigger AVT.
[3] A really nice blog post about the same issue described here. (Found, of course, while I was writing mine.)

  1. Also known as “Auto Disk Transfer” (ADT).