Title: Vsevolod (Seva) Semouchin Technical Account Manager
1Reliable and Effective Networking in ESX / ESXi
- Vsevolod (Seva) Semouchin, Technical Account Manager
- 26 September 2009
2Disclaimer
This session may contain product features that
are currently under development. This
session/overview of the new technology represents
no commitment from VMware to deliver these
features in any generally available
product. Features are subject to change, and
must not be included in contracts, purchase
orders, or sales agreements of any
kind. Technical feasibility and market demand
will affect final delivery. Pricing and
packaging for any new technologies or features
discussed or presented have not been determined.
These features are representative of feature
areas under development. Feature commitments are
subject to change, and must not be included in
contracts, purchase orders, or sales agreements
of any kind. Technical feasibility and market
demand will affect final delivery.
3Agenda
- HA Cluster Architecture
- HA Cluster Positioning
- HA Cluster Architecture Overview
- HA Cluster and vCenter
- Primary and Secondary Nodes
- Isolation Detection and Restart Mechanisms
4HA Cluster is
- A failover cluster
- An N-to-N cluster
- Uses a heartbeat detection mechanism instead of quorum
- May be split in two parts
- Has longer timeouts
- Works together with other VI components
- VMFS against split-brain
- DRS and VMotion to fail over applications from the running node
- Independent from vCenter
5AutoStart Architecture
[Diagram: Sensor, Trigger, Rule (rules), Actuator, Resource Group, Proxy Process, AutoStart-aware Process, AutoStart-unaware Process]
6VMware HA Architecture
[Diagram: VC Server running vpxd; two ESX hosts, each with vpxa, VMap, AMAgent, and several VMs]
7VMware HA and vCenter
[Diagram: ESX host with VMap, and vCenter]
- vCenter manages VMs
- VMware HA manages hosts
VMap (the VM-aware process) is a proxy process that
provides communication between vCenter (VirtualCenter)
and VMware HA. VMap supplies sensor values to HA,
reports state changes to VC, and asks VC to perform
actions on virtual machines, such as stopping and
starting a VM.
8HA and vCenter Data Flow
[Diagram: VC Server (vpxd) and ESX host (vpxa, VMap, AMAgent)]
- Host Policy: Node Resources
- VM Policy: NodePolicy, ChangePolicy, AddVmToNode, DeleteVmFromNode
- Requests: StartVM, StopVM, NodePolicyRequest, NodeResouceRequest
- Notification: NodeFailure, NodeIsolation, SlotRequest, PrimariesChanged
9Conclusions
VMware HA and vCenter are loosely coupled.
- When vCenter fails, HA can still restart VMs
- The only functionality lost: HA cannot place VMs
according to current resource consumption on the
surviving hosts
- We can run vCenter in a VM and let HA restart it,
even though that HA cluster is managed by the same vCenter.
- This provides an even higher level of protection
than vCenter on a standalone physical computer
10Primary Node
- The node that holds the AutoStart database is the
primary node
/opt/vmware/aam/config/agent_env.Linux (Vmkernel for ESXi)
/opt/vmware/aam/bin/Cli (ftcli on earlier versions)
AAM> listnode
Node                     Type          State
-----------------------  ------------  --------------
ha_node_01               Primary       Agent Running
ha_node_02               Primary       Agent Running
ha_node_03               Secondary     Agent Running
ha_node_04               Primary       Agent Running
ha_node_05               Primary       Agent Running
ha_node_06               Secondary     Agent Running
ha_node_07               Primary       Agent Running
ha_node_08               Secondary     Agent Running
AAM>
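The primary/secondary split can be read straight off the listnode output. The following is a minimal Python sketch, assuming the three-column layout (node, type, state) shown on this slide; the function name `count_node_types` is hypothetical and not part of any VMware tool.

```python
def count_node_types(listnode_output: str) -> dict[str, int]:
    """Count Primary and Secondary nodes in captured `listnode` text."""
    counts: dict[str, int] = {}
    for line in listnode_output.splitlines():
        fields = line.split()
        # Data rows are assumed to look like: <node> <Primary|Secondary> Agent Running
        if len(fields) >= 2 and fields[1] in ("Primary", "Secondary"):
            counts[fields[1]] = counts.get(fields[1], 0) + 1
    return counts


# Sample text copied from the slide's listnode output.
SAMPLE = """\
Node         Type        State
-----------  ----------  --------------
ha_node_01   Primary     Agent Running
ha_node_02   Primary     Agent Running
ha_node_03   Secondary   Agent Running
ha_node_04   Primary     Agent Running
ha_node_05   Primary     Agent Running
ha_node_06   Secondary   Agent Running
ha_node_07   Primary     Agent Running
ha_node_08   Secondary   Agent Running
"""

if __name__ == "__main__":
    print(count_node_types(SAMPLE))
```

For the eight-node sample this reports five primary and three secondary nodes.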
11Primary Node
- Up to 5 nodes. The limit is set to minimize traffic.
- Duties are:
- Trigger and execute rules
- Allow new nodes to join the cluster
- When all primary nodes fail, HA functionality is
lost
- A node can be promoted to primary when:
- It is one of the first 5 nodes in the cluster
- One of the primary nodes becomes isolated (that
includes failure!)
- One of the primary nodes was put in maintenance mode
- One of the primary nodes was removed from the cluster
by the administrator
- Promotion is virtually random (based on the Managed
Object Ref.)
12Conclusions
- Watch primary nodes when you
- Want to build a stretched cluster (two sites)
- Want to spread blade servers across two or more
enclosures (chassis)
- Want to spread rack-mount servers across two or
more racks
- Use the following CLI commands
- /opt/vmware/aam/bin/Cli or ftcli
- listnode
- demotenode
- promotenode
- Or just put no more than 4 servers in one
site/chassis/rack
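The risk behind this advice is that all primary nodes end up in one site, chassis, or rack, so a single outage removes every primary. A minimal Python sketch of that check follows; the function name and the node-to-domain mapping are hypothetical, and the inputs would have to be collected by hand (for example from listnode output and an inventory list).

```python
from collections import defaultdict


def domains_holding_all_primaries(placement: dict[str, str],
                                  primaries: set[str]) -> list[str]:
    """Return failure domains (site/chassis/rack) that contain every primary node.

    placement maps node name -> failure domain name.
    """
    per_domain: dict[str, set[str]] = defaultdict(set)
    for node, domain in placement.items():
        if node in primaries:
            per_domain[domain].add(node)
    # A domain is dangerous when losing it would remove all primaries at once.
    return sorted(d for d, nodes in per_domain.items() if nodes == set(primaries))


if __name__ == "__main__":
    placement = {
        "ha_node_01": "chassis-1", "ha_node_02": "chassis-1",
        "ha_node_03": "chassis-2", "ha_node_04": "chassis-1",
    }
    print(domains_holding_all_primaries(
        placement, {"ha_node_01", "ha_node_02", "ha_node_04"}))
```

An empty result means no single site, chassis, or rack holds all primaries.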
13Agenda
- HA Cluster Architecture
- Isolation Detection and Restart Mechanisms
- Isolation Event
- Cluster Split-Brain Situation
- Cluster Networking
- VM Shutdown and Restart
14HA Node Isolation
- HA is a failover cluster.
- Unplanned failover is caused by node failure
- HA calls such an event node isolation
- Why isolation and not failure?
- HA detects both node failure and isolation based
on the lost heartbeat.
- Surviving nodes cannot distinguish between
failure and isolation of the lost node.
- An isolated node can recognize that it is isolated
and trigger special actions based on the isolation
detection event.
- Node failure does not require any special
actions beyond those triggered by the
node isolation event.
15Isolation Detection
16Relevant Advanced Parameters
17VMware HA Split Brain Situation
[Diagram: VMware HA cluster]
- Service Console Network: heartbeats
- Isolation detection address: Service Console Network gateway
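The two viewpoints from the isolation slides can be sketched in a few lines of Python. This is an illustration only, under the assumption that a node declares itself isolated when it receives no heartbeats and also cannot reach the isolation detection address; the function names are hypothetical and do not correspond to any HA agent API.

```python
def local_view(heartbeats_received: int, isolation_address_reachable: bool) -> str:
    """How a node judges its own state (assumed rule, see lead-in)."""
    if heartbeats_received == 0 and not isolation_address_reachable:
        return "isolated"
    return "connected"


def peer_view(heartbeat_lost: bool) -> str:
    """How surviving nodes judge a peer: failure and isolation look the same."""
    return "failed or isolated" if heartbeat_lost else "alive"


if __name__ == "__main__":
    print(local_view(0, False))   # no heartbeats, gateway unreachable
    print(peer_view(True))        # peers cannot tell the two cases apart
```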
18Split Brain protection by VMFS
[Diagram: hosts A and B, each attached to a separate network segment]
1. VMs are running on host B
2. Host A considers host B as failed and tries to
start VMs
3. VMFS Lock prevents VMs from being started twice
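The three steps can be imitated with an exclusive lock file. The Python sketch below is an analogy only: it uses an ordinary file created with `O_EXCL` and says nothing about how VMFS implements its on-disk locks. The function name and lock path are hypothetical.

```python
import os
import tempfile


def try_power_on(lock_path: str, host: str) -> bool:
    """Try to take the VM's lock for `host`; fail if another host holds it."""
    try:
        # O_EXCL makes creation fail when the lock file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as lock_file:
        lock_file.write(host)
    return True


if __name__ == "__main__":
    lock = os.path.join(tempfile.mkdtemp(), "vm.lck")
    print(try_power_on(lock, "B"))  # host B is running the VM
    print(try_power_on(lock, "A"))  # host A's restart attempt is refused
    os.remove(lock)                 # the VM is powered off on B, lock released
    print(try_power_on(lock, "A"))  # now host A can start it
```

The same idea explains why the VM must be powered off or shut down on the isolated host before another host can restart it.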
19Isolated Host Behavior
- Because the isolated host holds locks on all VMs
running on it, it prevents them from being restarted
on other, connected hosts
- Powering off or shutting down VMs on the isolated
host is currently the only way to migrate them from
the isolated host to connected ones.
- Turning off a VM on the isolated host can be treated
as a cold migration.
- We don't need to shut down VMs when we are sure
that the client connection is still working.
- We could VMotion our VMs when the client network
is broken and the VMotion network is still alive.
- Currently only the cold migration option is
implemented.
20Isolation response vs Restart priority
- Start VMs
- Restart priority
- High priority
- Medium priority
- Do not restart
- Applicable to
- Each single VM
- Whole cluster
- Stop VMs
- Isolation response
- Power off
- Shutdown
- Leave power on
- Applicable to
- Each single VM
- Whole cluster
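Both settings can be given for the whole cluster and for a single VM. The Python sketch below models that, under the assumption that a per-VM value overrides the cluster-wide value; the class and function names are hypothetical and are not vCenter API types.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RestartPriority(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    DO_NOT_RESTART = "do not restart"


class IsolationResponse(Enum):
    POWER_OFF = "power off"
    SHUTDOWN = "shutdown"
    LEAVE_POWERED_ON = "leave power on"


@dataclass
class HaSettings:
    # None means "not set at this level".
    restart_priority: Optional[RestartPriority] = None
    isolation_response: Optional[IsolationResponse] = None


def effective_settings(cluster: HaSettings, vm: HaSettings) -> HaSettings:
    """Per-VM values win; anything unset falls back to the cluster value."""
    return HaSettings(
        restart_priority=(vm.restart_priority
                          if vm.restart_priority is not None
                          else cluster.restart_priority),
        isolation_response=(vm.isolation_response
                            if vm.isolation_response is not None
                            else cluster.isolation_response),
    )


if __name__ == "__main__":
    cluster = HaSettings(RestartPriority.MEDIUM, IsolationResponse.POWER_OFF)
    vcenter_vm = HaSettings(restart_priority=RestartPriority.HIGH)
    print(effective_settings(cluster, vcenter_vm))
```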
21VM Restart Behavior
- The HA cluster restarts a VM depending on its
restart priority
- When a restart fails, the HA cluster repeats its
attempts to restart the VM until:
- The VM is shut down or powered off, which frees
its lock
- The formerly isolated host is no longer isolated
and reconnects to the vCenter Server.
- The formerly isolated host subsequently fails
- The VM gets deregistered from the vCenter Server
22Special Restart Case: VM Files are not Available
- When a VM configuration file is not available,
vCenter can change the VM's state to disconnected
- Use case: a cluster stretched over two sites. One
site (both servers and storage arrays) fails.
- This is not an HA feature.
- When the configuration file becomes available
again, HA can restart that VM when it detects
changes in resource consumption on one of the
surviving hosts.
- Use case: after storage failover you may power
on and then off a dummy VM. This forces the HA
cluster to restart the failed-over VMs
23Restart Process
- High priority VMs are restarted first, then Medium
priority VMs
- Assign high priority to your vCenter VM
- The decision about which VM to start and where to
start it is made by the primary node according to its
database.
- No primary nodes, no restart
- Each time it restarts a VM, VMware HA requests
information about node resource consumption from
vCenter.
- HA sends commands to start and stop VMs to the
local vCenter agent, vpxa
- When vpxa fails, the node cannot fulfill HA commands
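The ordering rule (High first, then Medium, nothing for VMs marked "do not restart") can be sketched as a small Python function. The function name and the priority strings are hypothetical; ordering VMs alphabetically within one priority is an assumption made only to keep the output deterministic.

```python
# Lower rank restarts earlier; priorities missing from this map are never restarted.
PRIORITY_RANK = {"high": 0, "medium": 1}


def restart_order(vm_priorities: dict[str, str]) -> list[str]:
    """Return VM names in the order they would be restarted."""
    candidates = [
        (PRIORITY_RANK[priority], name)
        for name, priority in vm_priorities.items()
        if priority in PRIORITY_RANK
    ]
    return [name for _, name in sorted(candidates)]


if __name__ == "__main__":
    print(restart_order({
        "web": "medium",
        "vcenter": "high",
        "test": "do not restart",
        "db": "high",
    }))
```

Giving the vCenter VM high priority, as the slide recommends, places it in the first group to be restarted.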
24Relevant Advanced Parameters
25HA Catastrophic Outage
Needs 1 minute to rebuild the spanning tree
Turned off for maintenance
Service console network designed without single
points of failure
All nodes are isolated
26How to Prevent Such Outage
- Use the PortFast feature on Cisco switches
- http://www.cisco.com/application/pdf/paws/10586/65.pdf
- Increase the isolation detection time
- das.failuredetectiontime
- Use the second console network interface as a
cluster private network.
- Risk: we may have a node that HA still considers
connected when it is disconnected from the VM
network
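The outage on the previous slide comes down to a timing comparison: if the network needs longer to recover than HA waits before declaring isolation, every node isolates itself. A minimal Python sketch of that comparison follows, assuming das.failuredetectiontime is expressed in milliseconds and using the one-minute spanning tree rebuild time from the slide; the function name and the sample values are illustrative.

```python
def outlasts_network_recovery(failure_detection_ms: int,
                              recovery_seconds: float) -> bool:
    """True when the detection window is longer than the network recovery time."""
    return failure_detection_ms > recovery_seconds * 1000


if __name__ == "__main__":
    spanning_tree_rebuild_s = 60  # "Needs 1 minute" from the outage slide
    # 15000 ms is used here as an example of a short detection window (assumed value).
    print(outlasts_network_recovery(15000, spanning_tree_rebuild_s))
    # A window longer than the rebuild time rides out the interruption.
    print(outlasts_network_recovery(90000, spanning_tree_rebuild_s))
```

PortFast attacks the same inequality from the other side by shortening the recovery time on the switch ports.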
27Cluster Networking Best Practices
[Diagram: ESX Server with Service Console (SC), VMkernel, and VMotion networks, and vCenter]
28Advanced Parameter das.allowVmotionNetworks
- Applicable to ESXi only
- Value: true or false
- Allows one NIC to be shared by the VMotion and
Management networks
- ESXi does not have a service console. For vCenter
and HA communication a VMkernel port group should be
used. Such a group is called a Management port
group.
- During initialization, HA skips VMkernel port
groups that have VMotion enabled
- This behavior may be overridden by the parameter
das.allowVmotionNetworks = true
29Advanced Parameter das.allowNetwork<n>
- Applicable to both ESX and ESXi
- Value: character string (port group name)
- In fact this parameter disables networks
- Usage
- Once this parameter is used, only port groups
whose names are declared will be allowed for
cluster communication.
- The parameter is checked when a new host joins
the cluster
30Example
[Diagram: port groups Mgmt 1, Mgmt 2, and Mgmt 3 with VMotion and HA markers, before and after the change]
We want to use port groups Mgmt 1 and Mgmt 2 for HA
communication, but by default HA picks up only
Mgmt 2.
Effect of setting das.allowVmotionNetworks to
true
31Example
[Diagram: port groups Mgmt 1, Mgmt 2, and Mgmt 3 with VMotion and HA markers]
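The combined effect of the two parameters on port group selection can be sketched in Python. This is an illustration of the rules as the slides describe them, not HA's actual code: the `PortGroup` class and `ha_networks` function are hypothetical, and the assumption that Mgmt 1 and Mgmt 3 have VMotion enabled is inferred from the statement that HA picks up only Mgmt 2 by default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortGroup:
    name: str
    vmotion_enabled: bool = False


def ha_networks(port_groups: list[PortGroup],
                allow_vmotion_networks: bool = False,
                allowed_names: tuple[str, ...] = ()) -> list[str]:
    """Return the port group names HA would use for cluster communication."""
    selected = []
    for pg in port_groups:
        # Default behavior: skip port groups with VMotion enabled.
        if pg.vmotion_enabled and not allow_vmotion_networks:
            continue
        # Once any name is declared, undeclared port groups are disabled.
        if allowed_names and pg.name not in allowed_names:
            continue
        selected.append(pg.name)
    return selected


if __name__ == "__main__":
    groups = [PortGroup("Mgmt 1", True), PortGroup("Mgmt 2"), PortGroup("Mgmt 3", True)]
    print(ha_networks(groups))
    print(ha_networks(groups, allow_vmotion_networks=True))
    print(ha_networks(groups, allow_vmotion_networks=True,
                      allowed_names=("Mgmt 1", "Mgmt 2")))
```

Under these assumptions the three calls select Mgmt 2 only, then all three port groups, then Mgmt 1 and Mgmt 2, which is the outcome the example asks for.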
32Network Compatibility
[Diagram: two hosts with VMotion and service console networks; HA addresses 10.0.20.1 and 10.0.30.2 on SrvC 1, and 10.0.10.1 and 10.0.10.2 on SrvC 2]
The network compatibility check was introduced in
ESX/ESXi 3.5 U2. The reason: on incompatible
networks, an IP timeout has to be used instead of
heartbeats to detect node isolation, and this takes
too much time. The network compatibility check may
be overridden by the following advanced
parameter (introduced in ESX/ESXi 3.5 U3):
das.bypassNetCompatCheck = true
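One way to picture the check is as a subnet comparison between two hosts. The Python sketch below assumes that "compatible" means both hosts have their HA interfaces on the same set of subnets, and it assumes /24 netmasks for the addresses on the slide, which the slide does not state. The function name is hypothetical.

```python
import ipaddress


def networks_compatible(host_a_interfaces: list[str],
                        host_b_interfaces: list[str]) -> bool:
    """Compare the subnets behind two hosts' HA interfaces (CIDR notation)."""
    subnets_a = {ipaddress.ip_interface(i).network for i in host_a_interfaces}
    subnets_b = {ipaddress.ip_interface(i).network for i in host_b_interfaces}
    return subnets_a == subnets_b


if __name__ == "__main__":
    # Addresses from the slide; the /24 prefix length is an assumption.
    print(networks_compatible(["10.0.20.1/24"], ["10.0.30.2/24"]))
    print(networks_compatible(["10.0.10.1/24"], ["10.0.10.2/24"]))
```

With these inputs the first pair sits on different subnets and the second pair shares one.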
33DRS Interaction
Anti-affinity
- You can use DRS anti-affinity rules to increase
application availability at the
infrastructure level
- Use case: some critical VMs or a load-balancing farm
- Solution: use anti-affinity rules to force those
VMs onto different hosts
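What an anti-affinity rule guarantees can be expressed as a simple placement check. The Python sketch below only verifies a given VM-to-host mapping; it does not talk to DRS, and the function name and sample VM and host names are hypothetical.

```python
from collections import Counter


def anti_affinity_violations(rule_vms: list[str],
                             vm_to_host: dict[str, str]) -> list[str]:
    """Return the VMs in the rule that share a host with another VM in the rule."""
    hosts_in_use = Counter(vm_to_host[vm] for vm in rule_vms)
    return sorted(vm for vm in rule_vms if hosts_in_use[vm_to_host[vm]] > 1)


if __name__ == "__main__":
    placement = {"web-1": "esx-a", "web-2": "esx-a", "web-3": "esx-b"}
    print(anti_affinity_violations(["web-1", "web-2", "web-3"], placement))
```

An empty result means a single host failure can take down at most one VM of the farm.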
34What is Covered by VMware HA
35Cluster Scalability
- When you need restart your application in
36Conclusion
- VMware HA Cluster is a powerful tool for implementing
high availability at the enterprise level
- Use it to implement high availability for your
vCenter Server
- Design cluster networking properly
- Avoid single points of failure
- Think about network compatibility
- Create a proper restart policy for your VMs
- Leverage DRS to increase application
availability at the infrastructure level
38Thank you for coming. Rate your session and
watch for the highest scores!