I ran into a very strange issue the other day, and I wanted to share it. On my biggest VMware cluster, we run 7 x IBM 3850 M2s, each with 2 x 10GbE public, 2 x 10GbE private, and 2 x dual-port 4Gb FC HBAs. (Side note: these sorts of conversations remind me of the “gearheads” from old movies talking car speeds & feeds, but I digress.)
The private 10GbE links are used for vMotion & CIFS and sit on a vDS. One of the hosts couldn’t complete a vMotion, and on closer inspection it couldn’t vmkping any other private vMotion address in the cluster. I assumed a misconfiguration somewhere on the vDS and began verifying the usual suspects: vMotion enabled, vmknics with the correct IP addressing, blocked ports, etc. I then noticed that these NICs weren’t pulling any CDP info from their upstream physical switches, so I checked whether the switch ports had been administratively shut. No go.
Feeling completely frustrated, I decided to reboot the box. If it works for Windows, it’ll work for VMware, right? Right!? 🙂 Well, that didn’t solve it either. After talking the issue over with my fellow vNerds at work, I decided to remove the host’s physical adapters from the vDS uplinks backing the vMotion portgroup, and re-add them. And that fixed it. Moral of the story: sometimes you just need to remove & re-add the physical NICs to fix the issue. 🙂
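For anyone chasing a similar gremlin from the ESXi shell, the steps above map roughly onto the commands below. This is only a sketch: the vmkernel address, vmnic2, DVPort ID 256, and the dvSwitch name are placeholders for whatever your environment actually uses, and exact syntax varies a bit between ESX/ESXi releases.

```shell
# Test vMotion reachability from this host's vmkernel stack
# (10.10.10.12 is a placeholder for a peer host's vMotion address)
vmkping 10.10.10.12

# Check what CDP info an uplink is hearing from the physical switch
# (empty/no output here was the clue in my case)
vim-cmd hostsvc/net/query_networkhint --pnic-name=vmnic2

# List vSwitches/dvSwitches and which physical NICs back them
esxcfg-vswitch -l

# Remove and re-add the physical uplink on the vDS
# (dvSwitch-Private and DVPort 256 are placeholders)
esxcfg-vswitch -Q vmnic2 -V 256 dvSwitch-Private
esxcfg-vswitch -P vmnic2 -V 256 dvSwitch-Private
```

The same remove/re-add can of course be done from the vSphere Client (Manage Physical Adapters on the vDS), which is what I actually did.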