Distributed Resource Management for Virtualized System Clusters (VMware R&D)



1
Distributed Resource Management for Virtualized
System Clusters
VMware R&D
2
Outline
  • Distributed Resource Management for Virtualized
    System Clusters
  • Infrastructure
  • Live Migration
  • Architecture for Distributed Communication and
    Control
  • Resource Management Basics
  • Distributed Resource Scheduling
  • Distributed Power Management
  • Distributed High Availability
  • Summary

3
Virtualized Systems Cluster Infrastructure: Live
Migration
  • Example: VMware VMotion
  • Hot migrate VM across hosts
  • Transparent to guest OS, apps
  • Minimal downtime (sub-second)
  • Requirements (current)
  • Globally accessible storage (SAN/NAS)
  • Same subnet (no forwarding proxy)
  • Compatible processors
  • Details
  • Bitmap tracks modified pages
  • Pre-copy iteration sends modified pages
  • Repeatedly pre-copy the diff until it converges
  • Exploit meta-data (shared, swapped)
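
The pre-copy loop in the Details bullets can be sketched as a small simulation. This is a minimal illustration, not VMotion's actual implementation: the function name, the `dirty_rate` model (pages re-dirtied per page sent), and the stop threshold are all assumptions introduced here, and a Python set stands in for the dirty-page bitmap.

```python
import random

def precopy_migrate(num_pages, dirty_rate, max_rounds=30, stop_threshold=8, seed=0):
    """Simulate iterative pre-copy.

    Returns (rounds, pages_sent, stop_copy_pages). All parameters are
    hypothetical knobs for this sketch.
    """
    rng = random.Random(seed)
    dirty = set(range(num_pages))  # first pass: every page counts as modified
    sent = 0
    rounds = 0
    # Keep pre-copying while the remaining diff is too large to send during
    # a brief pause, and give up iterating after max_rounds.
    while len(dirty) > stop_threshold and rounds < max_rounds:
        in_flight = len(dirty)
        sent += in_flight
        rounds += 1
        # Assumption: the guest dirties pages in proportion to the time the
        # previous round took, i.e. to the number of pages just sent.
        n_dirtied = min(num_pages, int(in_flight * dirty_rate))
        dirty = set(rng.sample(range(num_pages), n_dirtied))
    # Final stop-and-copy: the VM is paused and the remaining diff is sent.
    sent += len(dirty)
    return rounds, sent, len(dirty)
```

With a dirty rate below 1 the diff shrinks each round, so the final pause covers only a handful of pages. With a dirty rate of 1 or more the loop does not converge and stops at `max_rounds`, which is why a real system needs some cutoff policy.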

4
Virtualized Systems Cluster Infrastructure:
Example Architecture
5
Outline
  • Distributed Resource Management for Virtualized
    System Clusters
  • Infrastructure
  • Resource Management Basics
  • Goals
  • Controls
  • Distributed Resource Scheduling
  • Distributed Power Management
  • Distributed High Availability
  • Summary

6
Resource Management Goals
  • Performance isolation
  • Prevent virtual machines (VMs) from monopolizing
    resources
  • Guarantee predictable service rates
  • Efficient utilization
  • Exploit undercommitted resources
  • Overcommit with graceful degradation
  • Exploit opportunities to reduce power consumption
  • Easy administration
  • Flexible dynamic partitioning
  • Meet absolute service-level agreements
  • Control relative importance of VMs
  • Respect availability constraints

7
Resource Controls: Overview
  • Useful Features
  • Express absolute service rates, e.g., 512 MHz,
    1 GB
  • Express relative importance, e.g., VM A to get 2x
    CPU of VM B
  • Grouping VMs for isolation, sharing, e.g., VMs
    A, B to share 1 GHz
  • Challenges
  • Simple enough for novices
  • Powerful enough for experts
  • Mapping application-level metrics to physical
    resource consumption
  • E.g., What MHz is needed to guarantee 100
    transactions/second?
  • Scaling from single host to cluster of (say)
    32/64/128 servers

8
Basic Resource Controls
  • Shares
  • Specify relative importance
  • Entitlement directly proportional to shares
  • Abstract relative units, only ratios matter
  • Reservation
  • Minimum guarantee, even when system overcommitted
  • Concrete absolute units (MHz, MB)
  • Admission control: sum of reservations ≤ capacity
  • Limit
  • Upper bound on consumption, even when
    undercommitted
  • Concrete absolute units (MHz, MB)
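
How the three controls combine can be shown with a short sketch. It assumes a "water-filling" rule: find a per-share rate such that each VM's share-proportional amount, clamped between its reservation and its limit, adds up to the capacity. That rule, the `VM` class, and the function names are assumptions made for illustration; the slides do not say how any product computes entitlements. The sketch also assumes each reservation is no larger than its limit.

```python
from dataclasses import dataclass

@dataclass
class VM:
    name: str
    shares: int                   # abstract units, only ratios matter
    reservation: float = 0.0      # guaranteed minimum, MHz
    limit: float = float("inf")   # upper bound, MHz

def admit(vms, capacity):
    """Admission control: sum of reservations must not exceed capacity."""
    return sum(vm.reservation for vm in vms) <= capacity

def entitlements(vms, capacity):
    """Divide capacity by shares, honoring reservations and limits."""
    if not admit(vms, capacity):
        raise ValueError("sum of reservations exceeds capacity")

    def clamp(vm, rate):
        return min(max(rate * vm.shares, vm.reservation), vm.limit)

    def total(rate):
        return sum(clamp(vm, rate) for vm in vms)

    # Grow the upper bound until it covers the capacity (or until limits cap it).
    lo, hi = 0.0, 1.0
    while total(hi) < capacity and hi < 1e12:
        hi *= 2
    # Bisect on the per-share rate; total() is non-decreasing in rate.
    for _ in range(100):
        mid = (lo + hi) / 2
        if total(mid) < capacity:
            lo = mid
        else:
            hi = mid
    return {vm.name: clamp(vm, hi) for vm in vms}
```

For example, with 3000 MHz of capacity, a VM holding 2000 shares and one holding 1000 shares split it 2:1. Giving the second VM a 1500 MHz reservation lifts it to 1500 MHz and leaves 1500 MHz for the first, even though the share ratio is unchanged.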

9
Resource Pools
  • Motivation
  • Allocate aggregate resources for sets of VMs
  • Isolation between pools, sharing within pools
  • Flexible hierarchical organization
  • Access control and delegation
  • What is a resource pool?
  • Named object with permissions
  • Reservation, limit, and shares for each resource
  • Parent pool, child pools, VMs
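
The "isolation between pools, sharing within pools" idea can be sketched as a recursive split over a pool tree. For brevity this sketch uses shares only and ignores reservations and limits; the `Node` class, the `divide` function, and the pool and VM names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A resource pool (has children) or a VM (leaf)."""
    name: str
    shares: int
    children: list = field(default_factory=list)

def divide(node, amount, out=None):
    """Split `amount` down the tree in proportion to sibling shares."""
    out = {} if out is None else out
    if not node.children:
        out[node.name] = amount  # leaf: a VM receives its slice
        return out
    total = sum(child.shares for child in node.children)
    for child in node.children:
        divide(child, amount * child.shares / total, out)
    return out
```

Because each pool's slice is fixed by its own shares relative to its siblings, adding a VM inside one pool changes the split among that pool's VMs only; VMs in sibling pools keep their amounts.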

10
Resource Controls: Exploration Areas
  • Additional controls
  • Real-time latency guarantees
  • Application-level metrics
  • Users think in terms of transaction rates,
    response times
  • Labor-intensive, requires detailed
    domain/app-specific knowledge
  • Automate mapping to physical resource controls

11
Outline
  • Distributed Resource Management for Virtualized
    System Clusters
  • Infrastructure
  • Resource Management Basics
  • Distributed Resource Scheduling
  • Overview
  • Example
  • Exploration Areas
  • Distributed Power Management
  • Distributed High Availability
  • Summary

12
Distributed Resource Scheduling: Overview
  • Useful features
  • Choose initial host for VM power on
  • Dynamic rebalancing by migrating running VMs
    between hosts
  • Configurable automation and migration threshold
    levels
  • Provide host evacuation for flexible host
    downtime
  • Support optional constraints on VM colocation on
    hosts
  • Preserve resources for failover
  • Challenges
  • Placement and migration decisions involve
    multiple resources
  • Resource pools can span multiple hosts
  • Determining appropriate migration threshold
    controls
  • Assorted failure modes (hosts, connectivity,
    etc.)

13
Distributed Resource Scheduling Example: VMware
DRS
  • Cluster-wide resource management
  • Hierarchical organization and delegation
  • Flexible grouping, sharing, and isolation
  • Configurable automation levels, migration
    aggressiveness
  • Configurable VM affinity/anti-affinity rules
  • Preserves unfragmented spare resources for
    failover
  • Automatic virtual machine placement and migration
  • Choose initial host when VM powers on
  • Optimize load balance across hosts
  • Dynamic rebalancing using VMotion
  • React to dynamic load changes
  • Evacuate hosts for maintenance and/or power-off

14
Example: VMware DRS Balancing Details
  • Compute VM entitlements
  • Based on resource pool and VM resource settings
  • VM demand includes usage and unsatisfied demand
  • Don't give VM more than it demands
  • Reallocate extra resources fairly
  • Compute host loads
  • Load ≠ utilization unless all VMs are equally
    important
  • Sum entitlements for VMs on host
  • Normalize by host capacity
  • Consider possible migrations
  • Evaluate cluster balance impact and risk-adjusted
    cost/benefit
  • Incorporate migration cost for involved hosts
  • Recommend best moves (meeting specified threshold)
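
The three steps above (host loads from entitlements, normalization by capacity, evaluation of candidate migrations) can be sketched as follows. This is an assumption-laden illustration, not the DRS algorithm: it uses the standard deviation of normalized host loads as the imbalance metric, a single flat `migration_cost` penalty in place of a risk-adjusted cost/benefit analysis, and only single-VM moves. All function and parameter names are hypothetical.

```python
from statistics import pstdev

def host_loads(placement, entitlement, capacity):
    """Load per host: sum of VM entitlements, normalized by host capacity."""
    return {
        host: sum(entitlement[vm] for vm in vms) / capacity[host]
        for host, vms in placement.items()
    }

def imbalance(placement, entitlement, capacity):
    """Assumed metric: population standard deviation of host loads."""
    return pstdev(list(host_loads(placement, entitlement, capacity).values()))

def best_move(placement, entitlement, capacity, threshold=0.05, migration_cost=0.0):
    """Return (gain, vm, src, dst) for the best single migration, or None."""
    current = imbalance(placement, entitlement, capacity)
    best = None
    for src, vms in placement.items():
        for vm in vms:
            for dst in placement:
                if dst == src:
                    continue
                # What-if placement with vm moved from src to dst.
                trial = {h: [v for v in vs if v != vm] for h, vs in placement.items()}
                trial[dst].append(vm)
                gain = current - imbalance(trial, entitlement, capacity) - migration_cost
                if gain >= threshold and (best is None or gain > best[0]):
                    best = (gain, vm, src, dst)
    return best
```

With invented numbers, two 10000 MHz hosts where host1 carries entitlements of 1000, 3000, and 4000 MHz and host2 carries 2000 MHz, the sketch recommends moving the 3000 MHz VM to host2, which equalizes both loads at 0.5. Raising the threshold or the migration cost suppresses the recommendation.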

15
VMware DRS: Simple Balancing Example (all VMs in
same resource pool with same shares)
Recommendation to improve imbalance: migrate VM2
16
Distributed Resource Scheduling: Exploration Areas
  • I/O resource management
  • Quality of service for networking, storage
  • End-to-end control difficult, complex
    switching/routing fabric
  • Lack of standards, even in non-virtualized
    environments
  • Proactive migrations
  • Detect longer-term trends
  • Move VMs based on predicted load while minimizing
    impact on current load
  • Large-scale WAN/Grid management

17
Outline
  • Distributed Resource Management for Virtualized
    System Clusters
  • Infrastructure
  • Resource Management Basics
  • Distributed Resource Scheduling
  • Distributed Power Management
  • Overview
  • Example
  • Exploration Areas
  • Distributed High Availability
  • Summary

18
Distributed Power Management: Overview
  • Useful features
  • Migrate VMs to allow hosts to be powered-off when
    demand low
  • Power on hosts when demand rises or needed to
    satisfy constraints
  • Work in concert with Distributed Resource
    Scheduling and Distributed High Availability
    goals and constraints
  • Configurable automation and utilization threshold
    levels
  • Challenges
  • Non-homogeneous hosts can result in utilization
    hot or cold spots
  • Sudden unpredicted rise in demand can cause
    performance impact
  • Benefits depend on demand valleys of non-trivial
    duration
  • Variety of host wake methods with usability
    pros/cons

19
Distributed Power Management Example: VMware DPM
Response to Reduced Demand
  • Implements useful features
  • Consolidates virtual machines (VMs) onto fewer
    hosts and powers hosts off when demand is low
  • Powers hosts back on when needed to meet workload
    demand or to satisfy constraints
  • Optional add-on to VMware Distributed Resource
    Scheduler (DRS)

[Figure: DRS cluster with DPM enabled, showing a host power-off]
20
Example: VMware DPM Operation in VC 2.5/ESX 3.5
  • Lightly-used Hosts → Consider Host Power-Off
  • Conservative: considers 40-minute load history
  • All VMs on selected host are migrated to other
    hosts
  • Weighs trade-offs between costs and benefits of
    power-off
  • Host is powered off
  • Heavily-used Hosts → Consider Host Power-On
  • Responsive: considers 5-minute load history
  • Send wake-on-LAN packet or BMC command (2009) to
    host
  • Host boots up
  • DRS load-balancing kicks in and some VMs migrated
    to host
  • Target Host Utilization
  • Range centered around 63% by default
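
The asymmetry between the two history windows (slow to power off, quick to power on) can be sketched as a simple decision function. This is a simplification under stated assumptions: one utilization sample per minute, a plain min/max test over each window, and a range half-width of 0.18 around the 63% center. The half-width, the function name, and the parameter names are not from the slides.

```python
def dpm_recommendation(history, center=0.63, half_width=0.18,
                       off_window=40, on_window=5):
    """Suggest a host power action from per-minute utilization samples.

    `history` lists utilization fractions, newest last. Returns
    "power-on", "power-off", or "none".
    """
    low, high = center - half_width, center + half_width
    recent_on = history[-on_window:]
    recent_off = history[-off_window:]
    # Responsive: a short window above the range is enough to add capacity.
    if len(recent_on) == on_window and min(recent_on) > high:
        return "power-on"
    # Conservative: require a full long window below the range before
    # removing capacity.
    if len(recent_off) == off_window and max(recent_off) < low:
        return "power-off"
    return "none"
```

Checking power-on first reflects the priority of protecting performance over saving power: a recent spike overrides an otherwise quiet 40 minutes.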

21
Example: VMware DRS/DPM Interactions
  • Responsibilities
  • DRS balances load to satisfy service-level
    agreements
  • DPM reduces running cluster capacity to save
    power
  • Both DRS and DPM respect any resources needed for
    failover
  • Interactions
  • DRS rebalances with DPM-recommended power
    actions in what-if simulations
  • DPM evaluates the impact of potential power
    actions based on DRS rebalancing results
  • Recommendations
  • Final host power actions and VMotions
  • Recommended to user, or applied automatically

22
Distributed Power Management: Exploration Areas
  • Use VM demand prediction to drive proactive host
    power-on
  • Incorporate additional metrics in host off/on
    selection
  • Examples: host power efficiency, temperature
  • Operate in cooperation with host-level power
    management
  • Example: choose hosts for off/on based on
    power-management features

23
Outline
  • Distributed Resource Management for Virtualized
    System Clusters
  • Infrastructure
  • Resource Management Basics
  • Distributed Resource Scheduling
  • Distributed Power Management
  • Distributed High Availability
  • Overview
  • Example
  • Exploration Areas
  • Summary

24
Distributed High Availability: Overview
  • Useful features
  • Provide various methods to specify resources to
    reserve to restart VMs upon failures of their
    hosts in a virtualized system cluster
  • Express whether failover resource reservation is
    strict or best-effort
  • Decentralized host failure detection and quick VM
    restart
  • Work in concert with DRS and DPM goals and
    constraints
  • Challenges
  • Preserving unfragmented failover resources across
    hosts
  • Avoiding conservative allocation of spare
    resources when running VMs have widely different
    resource reservations and needs
  • Support robust failover operation in many
    possible failure situations
  • Provide cluster failover status information in a
    user-friendly way

25
Distributed High Availability Example: VMware HA
  • Specify resources to be set aside for failover
  • Number of host failures to tolerate
  • Percentage of cluster capacity (2009)
  • Specific hosts to set aside for failover (2009)
  • Detect failover and respond
  • Cluster hosts send each other heartbeats; when a
    host fails to do so for some period, failover
    response action is launched
  • For failed hosts, their running VMs are restarted
    on other hosts
  • Safe restart supported by locking in ESX server
    storage system
  • At failover, want unfragmented powered-on
    resources
  • CPU and memory resources for each VM to fail over
    must be available on a single host
  • Resources must be on powered-on host
  • DRS/DPM/HA proactively maintain appropriate spare
    resources
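
Two of the ideas above lend themselves to a short sketch: heartbeat-timeout detection, and the requirement that failover resources be unfragmented (each VM must fit whole on one powered-on host). The function names, the data layout, and the first-fit-decreasing placement heuristic are assumptions for illustration, not VMware HA's method.

```python
def failed_hosts(last_heartbeat, now, timeout):
    """Hosts whose most recent heartbeat is older than `timeout` seconds."""
    return sorted(h for h, t in last_heartbeat.items() if now - t > timeout)

def restart_plan(vms, spare):
    """Place each VM to restart on a single powered-on host.

    vms:   VM name -> (cpu_mhz, mem_mb) needed
    spare: powered-on host -> (cpu_mhz, mem_mb) free
    Returns VM -> host, or None if some VM fits on no single host.
    First-fit decreasing is a heuristic, so None can be returned in rare
    cases where another packing would have worked.
    """
    free = dict(spare)
    plan = {}
    # Place the largest VMs first; they are the hardest to fit.
    for name, (cpu, mem) in sorted(vms.items(), key=lambda kv: kv[1], reverse=True):
        for host, (free_cpu, free_mem) in free.items():
            if cpu <= free_cpu and mem <= free_mem:
                free[host] = (free_cpu - cpu, free_mem - mem)
                plan[name] = host
                break
        else:
            return None  # spare capacity is too fragmented for this VM
    return plan
```

The fragmentation point shows up when two hosts each have 1000 MHz free and a failed VM needs 1500 MHz: the cluster has 2000 MHz spare in total, yet no restart is possible because no single host can hold the VM.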

26
Distributed High Availability: Exploration Areas
  • Provide alternative to VM restart on host failure
    via continuing the VM from a secondary copy
    already executing on another cluster host (2009),
    enabled by VM snapshot and record/replay
  • Support DRS/DPM-aided failover if decentralized
    restart fails: use migrations and host power-on
    if needed to get resources
  • Explore detecting/responding to partial host
    failure modes: run on host with diminished
    working capacity

27
Outline
  • Distributed Resource Management for Virtualized
    System Clusters
  • Infrastructure
  • Resource Management Basics
  • Distributed Resource Scheduling
  • Distributed Power Management
  • Distributed High Availability
  • Summary

28
Summary
  • Virtualized system clusters with appropriate
    supporting infrastructure can benefit from
    distributed resource scheduling, distributed
    power management, and distributed high
    availability.
  • Each of these technologies has a number of areas
    open for exploration and innovation.