I thought I’d take a moment to write down some frustrations I’ve been having with VMware Host Profiles, and how I resolved them. Before I get into that, let’s take a second to discuss what Host Profiles are supposed to do, and how you can get them in your environment.
Host Profiles is a VMware vCenter feature that is available on hosts licensed with Enterprise Plus. It allows you to capture the configuration of an ESX or ESXi host as a profile and apply that profile to other hosts. It’s a quick and easy way to replicate configurations across the hosts in a cluster, and also to monitor each host’s compliance with the attached host profile.
A good use case for Host Profiles is a change to a standard (non-distributed) virtual switch, or an internal service policy change that needs to be applied to all hosts in a cluster, such as a new NTP source host. Recently, I used Host Profiles for a fresh install/upgrade from ESX 4.0 to ESXi 4.1 on a large cluster that I administer. Host Profiles eliminated the time I would otherwise have spent crafting a PowerCLI configuration script or, even worse (ugh), manually duplicating host configurations across the cluster (which just opens the door to human error).
So now that I’ve covered what it does, let’s talk about the oddities that I consider to be bugs within Host Profiles.
First, I performed a fresh install of ESXi 4.1 over the top of an existing ESX 4.0 installation. After the installation completed, I configured the host to match the rest of the cluster. I then created a new host profile, chose the newly installed host as the reference host, and attached the new profile to that host’s cluster. A quick compliance check showed that every host except the reference host was out of compliance. Applying the host profile to a non-compliant host would complete successfully, but the host would remain non-compliant. You can see in the image below that there are a number of failures against the host profile. No amount of re-applying the profile or rebooting the non-compliant host would fix it.
As you might imagine, this had me pretty frustrated. In addition to the ruleset failures listed above, the Host Profile wasn’t properly disabling Fault Tolerance logging on a VMkernel NIC even though this specific cluster is not FT-capable.
The good news is that there’s a way to fix all of these troubles, although it can be a bit painful, especially if a vDS (vNetwork Distributed Switch) is involved. Each host in the cluster needs to be put into maintenance mode. After that completes, disconnect each of the host’s virtual adapters from the vDS. Navigate to the vDS in the Networking view (CTRL+SHIFT+N), then to the Hosts tab. Right-click the host and choose Remove from vNetwork Distributed Switch. Go back to the Hosts view (CTRL+SHIFT+H), right-click the host, and choose Remove from Inventory. Click Yes to accept the loss of past performance data, host-level user permissions, etc. After the removal completes, add the host back into the cluster. Right-click the host and go to Host Profile > Apply Profile. You will be prompted as needed to provide the IP configuration of any VMkernel adapters, etc. After the profile remediation completes, right-click the host again and go to Host Profile > Check Compliance. The host should now show as compliant.
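Since the order of those steps matters and they have to be repeated for every host, here is a minimal sketch that just prints the procedure above as a per-host checklist. It does not talk to vCenter at all. The function names (remediation_plan, print_runbook), the on_vds flag, and the example host names are my own placeholders for illustration, not anything from VMware.

```python
from typing import List


def remediation_plan(host: str, on_vds: bool = True) -> List[str]:
    """Return the ordered manual steps for one host (no vCenter calls are made)."""
    steps = [f"Put {host} into maintenance mode"]
    if on_vds:
        # Assumption: the vDS steps only apply when the host is a vDS member.
        steps += [
            f"Disconnect each virtual adapter on {host} from the vDS",
            f"Networking view > vDS > Hosts tab: remove {host} from the vDS",
        ]
    steps += [
        f"Hosts view: remove {host} from inventory",
        f"Add {host} back into the cluster",
        f"Host Profile > Apply Profile on {host}",
        f"Host Profile > Check Compliance on {host}",
    ]
    return steps


def print_runbook(hosts: List[str], on_vds: bool = True) -> None:
    """Print a numbered checklist for every host, one host at a time."""
    for host in hosts:
        for number, step in enumerate(remediation_plan(host, on_vds), start=1):
            print(f"{host} step {number}: {step}")


if __name__ == "__main__":
    # Hypothetical host names, purely for illustration.
    print_runbook(["esx01", "esx02"])
```

Working through one host completely before moving on to the next keeps the rest of the cluster available while a host sits in maintenance mode.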
Notes: All of the hosts were running ESXi 4.1, build 320137. vCenter Server and the vSphere Client are both 4.1, build 258902. I did submit an SR to VMware Support, and after discovering that removing and re-adding the host to vCenter resolved the issue, I noted it in the ticket. The tech responded saying that he would attempt to reproduce the issue in a test environment, and write a KB article as needed.