One of the best things about VMware is that, if everything is set up correctly, a cluster can experience a host failure or a host isolation with very little impact to the guests within that cluster. However, as nice as it is to experience a failure and not get paged at 3 AM, it helps to know what failed and when. By default, VMware vCenter 4.1 doesn’t ship with any HA alarms. Let’s take a minute to get one set up.
First, select the parent of the objects that you’d like to monitor. I chose the vCenter object, since I want to monitor all hosts & clusters within vCenter, regardless of datacenter. Then click the Alarms tab in the right pane, and then the Definitions view. In the whitespace to the right of the existing definitions, right-click and add a new alarm definition.
1.) The General tab: Give your new alarm a name, a description (if you’d like), and set the monitor type to clusters.
2.) The Triggers tab: Click Add and select the space on the left of the row to enable the dropdown. Scroll down until you see the 5 events below that will trigger this alarm. Add them all to this alarm, or put each one into its own alarm if you’d like. You don’t have to do anything with the Advanced settings on the triggers.
3.) We’ll skip the Reporting tab and move right to the Actions tab. There are a couple of different options here, such as running a command, sending an SNMP trap, or sending a notification email. I chose the email, and then entered the email address that will receive the alerts. Set the notification condition to Warning -> Error.
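If you'd like to keep a record of these choices outside vCenter (for a runbook, say), here is a minimal Python sketch that writes the three tabs down as plain data and checks that nothing was left blank. It does not talk to vCenter or use any VMware API; the `HaAlarmPlan` class, the `check_plan` function, the alarm name, and the email address are all hypothetical. The trigger names are HA events mentioned in the comments on this post, used only as placeholders, so substitute the events you actually selected.

```python
from dataclasses import dataclass, field


@dataclass
class HaAlarmPlan:
    """Hypothetical record of the choices made in the alarm dialog."""
    name: str                                      # General tab: alarm name
    monitor_type: str                              # General tab: what to monitor
    triggers: list = field(default_factory=list)   # Triggers tab: event names
    action: str = "email"                          # Actions tab: chosen action
    recipient: str = ""                            # Actions tab: address to notify
    transition: tuple = ("Warning", "Error")       # Actions tab: notification condition


def check_plan(plan):
    """Return a list of problems; an empty list means the plan is complete."""
    problems = []
    if not plan.name.strip():
        problems.append("alarm name is empty")
    if plan.monitor_type.lower() != "clusters":
        problems.append("monitor type should be Clusters")
    if not plan.triggers:
        problems.append("no trigger events selected")
    if len(set(plan.triggers)) != len(plan.triggers):
        problems.append("duplicate trigger events")
    if plan.action == "email" and "@" not in plan.recipient:
        problems.append("email action needs a recipient address")
    return problems


# Placeholder values; the event names are assumptions taken from the comments.
plan = HaAlarmPlan(
    name="HA events",
    monitor_type="Clusters",
    triggers=[
        "HA host isolated",
        "All HA hosts isolated",
        "HA host failed",
        "HA agent unavailable",
    ],
    recipient="ops@example.com",
)

for problem in check_plan(plan):
    print("fix before creating the alarm:", problem)
```

A complete plan prints nothing. Leaving out the recipient or the triggers prints one line per missing item, which makes it a quick checklist before you click through the dialog.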
I hope you’ve found this helpful!
Andrea says
Looks like vC 4.0 has a default HA alarm set up with “HA host isolated”, “All HA hosts isolated” and “HA host failed” as the triggers. The “conditions-> Advanced” that’s blank doesn’t make much sense. The 4.1 HA guide is similarly vague as to setting up alarm conditions. http://www.vmware.com/pdf/vsphere4/r41/vsp_41_availability.pdf
What I’m really looking for is an alarm when the host HA agent has failed and I’m not sure if any of these alarms will trigger that condition.
Damian Karlson says
Andrea – Thanks for commenting. There’s an “HA agent unavailable” trigger at the cluster level and above. That should do what you’re looking for. As for the advanced trigger options, those are there to specify vCenter-internal IDs (cluster-resource ID, compute-resource ID, etc.), various virtual infrastructure names (hosts, clusters, vDSes, etc.) or operator usernames, to name a few uses.
There are also a number of additional HA & alarm resources available from the community (just posting a couple):
http://www.simonlong.co.uk/blog/2010/09/24/creating-custom-vcenter-alarms/
http://www.virtuallyghetto.com/2010/10/how-to-ack-reset-vcenter-alarm.html
http://www.yellow-bricks.com/vmware-high-availability-deepdiv/
Hope this helps!