A tale of NetApp and Wireshark discovery
Posted: February 13, 2015
–==For those interested, Pluralsight has an excellent video training course called Introduction to Wireshark. I highly recommend Pluralsight as the go-to source for IT video training!==–
I was cleaning up a client’s /etc/rc file yesterday while preparing to move some IP addresses to different interfaces and I noticed they had configured the vMotion network as a VLAN interface on both controllers. This isn’t right because the vMotion network only needs to exist between ESXi hosts – the storage array never touches the traffic. Storage vMotion doesn’t use the vMotion network either. It uses the storage network, whether IP- or FC-based.
I wanted to see if the interface was being used at all and fortunately, NetApp has a command for that. The ifstat command shows the count of frames received and transmitted on any or all interfaces, total bytes for each, and the number of multicasts or broadcasts. So in this case, it looked something like:
NETAPP-A> ifstat VIF-A-79
-- interface VIF-A-79 (22 hours, 57 minutes, 50 seconds) --
RECEIVE
 Total frames: 150k | Total bytes: 10924k | Multi/broadcast: 21869
TRANSMIT
 Total frames: 4767k | Total bytes: 7177m | Multi/broadcast: 138
 Queue overflows: 0
DEVICE
 Vlan ID: 79 | Phy Iface: VIF-A
I see about 10 MB received and over 7 GB transmitted. NETAPP-B showed similar but opposite numbers: 6.6 GB received and 11 MB transmitted. It had been several months since the counters had been cleared originally (the output above was taken after the fact), so I cleared them to get a good count on what was happening on the interface.
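As a rough check on those figures, the abbreviated counters that ifstat prints can be expanded with a few lines of Python. This is only a sketch: the helper name `expand_counter` is made up for illustration, and it assumes the `k`, `m`, and `g` suffixes are decimal multipliers, which the output itself does not confirm.

```python
# Hypothetical helper: expand ifstat's abbreviated counters ("10924k", "7177m").
# Assumption: k/m/g are decimal multipliers (10^3, 10^6, 10^9). The ifstat
# output does not say whether the units are decimal or binary.
SUFFIXES = {"k": 10**3, "m": 10**6, "g": 10**9}


def expand_counter(text):
    """Return the counter as a plain integer, expanding a trailing k/m/g."""
    text = text.strip().lower()
    if text and text[-1] in SUFFIXES:
        return int(text[:-1]) * SUFFIXES[text[-1]]
    return int(text)


# Byte counters copied from the NETAPP-A output above.
rx_bytes = expand_counter("10924k")
tx_bytes = expand_counter("7177m")

print(f"received:    {rx_bytes / 10**6:.1f} MB")
print(f"transmitted: {tx_bytes / 10**9:.1f} GB")
```

Under that decimal assumption, the receive side works out to roughly 11 MB and the transmit side to a little over 7 GB, in line with the estimate above.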
NETAPP-A> ifstat -z VIF-A-79
-- interface VIF-A-79 (23 hours, 21 minutes, 37 seconds) --
NETAPP-A>
And then I started watching the counters. 3 frames received, all broadcasts. Nothing transmitted. 20 frames received, all broadcasts. Nothing transmitted. And on and on. The partner’s interface was similar but opposite. Broadcasts being sent, nothing received. Hmm. I needed to see what was going on here.
I used NetApp’s built-in packet capture to look at the frames hitting the interface on each controller. I’m glad I did, and I’m glad I forgot to kill the capture before leaving for the day. (The default maximum log size when dumping to disk is 1 gigabyte – whew! I looked that up as soon as I realized I had left the capture on. Lucky me! See the pktt man page for details.) When I reviewed the 100-200 MB captures, I saw some WINS and NetBIOS traffic that I’ll need to verify is correct (are they really using WINS? Could be…). Scrolling through the trace, though, I saw a lot of TCP traffic, in large bursts, just between the controllers on the vMotion VLAN interface. I was stumped as to what this traffic could be. I used the following filter in Wireshark to verify that no other hosts were communicating with the NetApp on this network:
ip.src != 10.79.10.100 && ip.src != 10.79.10.110
The filter completely emptied the screen, assuring me that packets from no other host were present.
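To spell out what that filter tests, here is a small Python sketch of the same logic: a packet is displayed only if its source address is neither controller. The two controller addresses come from the filter; the third address in the sample list is an invented host, used only to show what a non-empty result would look like. Note that a comparison on ip.src only applies to packets that carry an IP header, so the filter says nothing about non-IP frames.

```python
# Controller addresses taken from the Wireshark filter above.
CONTROLLERS = {"10.79.10.100", "10.79.10.110"}


def passes_filter(src_ip):
    """Mimic: ip.src != 10.79.10.100 && ip.src != 10.79.10.110"""
    return src_ip not in CONTROLLERS


# Hypothetical source addresses; 10.79.10.21 is an invented host.
sources = ["10.79.10.100", "10.79.10.110", "10.79.10.21"]
unexpected = [ip for ip in sources if passes_filter(ip)]

# An empty list here corresponds to the empty packet list in Wireshark.
print(unexpected)
```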
It soon occurred to me that SnapMirror had been set up to move some volumes between controllers some time ago. Looking in the /etc/snapmirror.conf files produced this on the second controller:
NETAPP-B> rdfile /etc/snapmirror.conf
#Regenerated by registry Wed Jan 21 19:27:33 GMT 2015
10.79.10.100:DATABASE_A NETAPP-B:DATABASE_A - 30 * * *
NETAPP-A:VOL_AXLE_DATA NETAPP-B:VOL2_AXLE_DATA - - - - -
Yep – there it is. There’s a SnapMirror relationship configured to use the vMotion VLAN (79), and another set to use the management IP address which, in this case, uses e0M. That’s one of the items I was going to change anyway: move the management IP address to an interface group of 1 Gb interfaces.
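The source field of each entry is what gives the replication path away, so it can help to pull the entries apart. The Python sketch below does that for the file contents shown above. It assumes the usual 7-mode layout of source, destination, arguments, and a four-field schedule (minute, hour, day of month, day of week); `parse_entries` is an illustrative name, not a NetApp tool.

```python
# File contents copied from the rdfile output above.
SNAPMIRROR_CONF = """\
#Regenerated by registry Wed Jan 21 19:27:33 GMT 2015
10.79.10.100:DATABASE_A NETAPP-B:DATABASE_A - 30 * * *
NETAPP-A:VOL_AXLE_DATA NETAPP-B:VOL2_AXLE_DATA - - - - -
"""


def parse_entries(conf_text):
    """Split each non-comment line into source, destination, args, schedule.

    Assumed layout: src_host:src_vol dst_host:dst_vol arguments
    followed by four schedule fields.
    """
    entries = []
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        src_host, src_vol = fields[0].split(":", 1)
        dst_host, dst_vol = fields[1].split(":", 1)
        entries.append({
            "source_host": src_host,
            "source_volume": src_vol,
            "dest_host": dst_host,
            "dest_volume": dst_vol,
            "arguments": fields[2],
            "schedule": " ".join(fields[3:7]),
        })
    return entries


for entry in parse_entries(SNAPMIRROR_CONF):
    print(entry["source_host"], "->", entry["dest_host"],
          "schedule:", entry["schedule"])
```

The first entry names its source by an address on the 10.79.10.x network, while the second names it by hostname.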
There are options available in 7-mode that can prevent data and replication traffic from traversing the 100 Mbps e0M management link. This link is considered a low-bandwidth link in a LAN because there are 1 Gbps links available. From an SSH session, run the following command to see them:
NETAPP-A> options interface
interface.blocked.cifs
interface.blocked.ftpd
interface.blocked.iscsi
interface.blocked.mgmt_data_traffic off (value might be overwritten in takeover)
interface.blocked.ndmp
interface.blocked.nfs
interface.blocked.snapmirror
You simply list the interfaces you don’t want used for each type of traffic. Remember the WINS and NetBIOS broadcasts I captured? They show up because CIFS is not blocked on *any* interface. I’m sure these broadcasts appear on all interfaces. When configuring the protocol filters above, think about which traffic needs to flow over which interfaces, and block each protocol on the interfaces that aren’t supposed to carry it. For example,
options interface.blocked.snapmirror e0a,e0b,e0M
I included these options in a NetApp design I shared last year.
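When several protocols need filtering, it can be easier to write the plan down once and generate the option lines from it. The Python sketch below does only that string formatting. The `BLOCK_PLAN` mapping and the `options_commands` helper are hypothetical, and the plan repeats the SnapMirror example from this post rather than recommending interfaces for any particular system.

```python
# Hypothetical plan: protocol -> interfaces on which it should be blocked.
# Only the snapmirror entry comes from the example in this post.
BLOCK_PLAN = {
    "snapmirror": ["e0a", "e0b", "e0M"],
}


def options_commands(plan):
    """Build one 'options interface.blocked.<protocol>' line per protocol."""
    return [
        f"options interface.blocked.{protocol} {','.join(interfaces)}"
        for protocol, interfaces in sorted(plan.items())
    ]


for command in options_commands(BLOCK_PLAN):
    print(command)
```

The generated lines would still need to be reviewed and entered on each controller by hand.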
All this discovery paid off. By doing the discovery first instead of simply moving an IP address willy-nilly, easy as that would have been, I found several items that could use some attention:
- The Service Processor is not configured
- SnapMirror relationships are not using a dedicated replication network when VLANs and interfaces are available (I still need to check about switchports, though!)
- The vMotion VLAN is configured as a VLAN interface on both controllers
- Protocol filters aren’t configured
- iSCSI is enabled on all interfaces