1
PhD Dissertation Defense: Energy-Efficient
Pro-active Techniques for Safe and Survivable
Cyber-Physical Systems
  • By: Tridib Mukherjee
  • Committee:
  • Prof. Sandeep Gupta
  • Prof. Karamvir Chatha
  • Prof. Partha Dasgupta
  • Prof. Daniel Stanzione

Sponsors
2
Outline
  • Cyber-Physical Systems (CPS)
  • Crisis response planning and preparedness
  • Energy-efficient job management in data centers
  • Ad hoc Networks
  • Conclusions/ Future Research Directions

3
Cyber-Physical Systems (CPS)
From interactive to pro-active systems
Image courtesy: Vanderbilt University, Drexel
University, Idealog Magazine
  • Pro-active systems can anticipate an event and
    act in advance to avoid or minimize the
    consequences of the event.
  • Migration from interactive to pro-active
    computing, for systems intimately connected to
    the world around them, was suggested in 2000 by
    David Tennenhouse, Director of Intel Research.
  • Pro-active CPS can involve actions in both the
    physical and the cyber world.
  • Examples of pro-active operations in the physical
    environment:
  • pre-setting the cooling in a data center to avoid
    equipment redline temperatures.
  • preparedness drills for responding to
    crises/disasters.
  • CPS are dynamic distributed systems that monitor,
    coordinate, control, integrate and facilitate
    physical processes.
  • The physical environment can include human
    inhabitants.
  • Computing entities are autonomic and embedded.
  • Operations in computing entities affect the
    physical environment, and vice versa.
  • Key issues:
  • Physical interactions
  • Critical applications

4
Research Problem and Approach
  • Three major problems of pro-active operations in
    CPS. Pro-active operations:
  • can be difficult to achieve in uncertain CPS
    environments (e.g. crisis response)
  • can lead to a high cost of operation for
    large-scale CPS (e.g. data centers)
  • can be highly energy-inefficient for
    energy-constrained computing entities (e.g.
    ad hoc networks)
  • Research approach: constraint-based optimization
    to balance pro-activity for three applications
    with different objectives and requirements
  • Crisis preparedness: pro-active planning and
    evaluation of crisis response when action
    outcomes are uncertain, while meeting real-time
    constraints for human survivability.
  • Data centers: pro-active job scheduling to
    dynamically reduce cooling demands while meeting
    thermal constraints for equipment safety.
  • Ad hoc networks: pro-active route management to
    meet end-to-end reliability constraints while
    minimizing the energy overhead.

How to balance pro-activity depending on system
requirements?
5
Research Contributions
6
Outline
  • Cyber-Physical Systems (CPS)
  • Crisis response planning and preparedness
  • Energy-efficient job management in data centers
  • Ad hoc Networks
  • Conclusions/ Future Research Directions

7
Importance of Crisis Preparedness
  • In 2004, over $4 billion in Homeland Security
    grants was allocated for assistance to first
    responders.
  • In 2005, $7.4 billion was budgeted for Emergency
    Preparedness and Response (around 20% of the
    total budget).
  • Over $3.5 billion (50%) was budgeted for
    assistance to first responders.
  • Since March 1, 2003, approximately $8 billion has
    been awarded to state, tribal and local
    governments to prevent, prepare for, respond to
    and recover from acts of terrorism and all
    hazards.

8
Critical Application: Fire in Building
[Diagram labels: Critical Event, Additional Critical Events, Detection, Crisis Response, Recovery, Preparedness, Evaluation of Crisis Response, Trapped People, Rescuers]
Detect fire using information from sensors
  • Notify 911
  • Provide information to the first responders

Detect trapped people
HUMAN INTERACTION
  • Analyze the spatial properties:
  • how to reach the source of the fire
  • which exits are closest
  • whether the closest exit is free to get out
  • Determine the required actions:
  • instruct the inhabitants to go to the nearest
    safe place
  • coordinate with the rescuers to evacuate
    (normally using ad hoc networks).
  • Requires pro-active evaluation and planning of
    crisis response

Survivability: effectiveness of the response plan
in avoiding disasters (life/property losses)
How to evaluate and plan actions with uncertain
outcomes?
9
Criticality: Critical Event Management
  • Critical events
  • Cause emergencies/crises.
  • Lead to loss of lives/property.
  • Criticality
  • Effects of critical events on the
    smart-infrastructure.
  • Critical state: state of the system under
    criticality.
  • Window-of-opportunity (W): temporal constraint
    for criticality.
  • Survivability: effectiveness of the criticality
    response actions in minimizing disasters.

[State diagram: a critical event moves the system from the NORMAL STATE to a CRITICAL STATE; a timely criticality response within the window-of-opportunity returns it to the NORMAL STATE; mismanagement of any criticality leads to DISASTER (loss of lives/property)]
10
Related Work
[Figure: related work positioned by level of abstraction (human, cyber, physical, cyber-physical). Labels: preparedness measures, unaware of uncertainties, cumbersome documents, model-based verification, reliability, formal modeling, QoS, preparedness drills, physical lay-out design, real-time, objective evaluation, personnel training, pro-activity, synergistic planning]
  • Different modeling options
  • Hybrid automata can capture continuous-time
    dynamics in the physical world.
  • A special case is timed I/O automata, which can
    capture time variation for the system.
  • Recent work has focused on probabilistic timed
    automata.
  • We use a Markov Decision Process.
  • It enables developing stochastic planning
    policies.

11
Background on Model-based Verification/Analysis
Markov Decision Process based Criticality
Response Model (CRM)
  • Model-based analysis is normally used to verify
    critical systems such as avionics.
  • No need to generate actual scenarios that put
    lives/property at risk.
  • Formal models abstract the system behavior.
  • Expected system properties depend on the
    requirements.
  • Formal models are analyzed through model checking
    to verify the system properties.
  • We use model-based analysis to evaluate the
    effectiveness of crisis response processes.

[Diagram labels: System Behavior, System Requirements, Formal Models, Expected Properties, Model Checking, Property Verification, Requirement Verification, Criticality Response Evaluation Tool (CRET)]
CRM can also be used to develop Criticality
Response Planning (CRP) policies
12
Proposed Markov Decision Process Criticality
Response behavior Model (CRM)
  • State-based stochastic model
  • The system can be in different critical states
  • A state represents the combination of
    criticalities in the system
  • States are organized in a hierarchical manner
  • A level in the hierarchy represents the number of
    criticalities in each state at that level
  • The normal state has level 0 (i.e. there are no
    criticalities in the normal state)
  • Critical events
  • Make state transitions down the hierarchy
  • Associated with criticality characteristics:
  • Window-of-opportunity
  • Probability of the critical event
  • Time to detect the criticality
  • Mitigative Link (ML)
  • Corresponds to response actions
  • Makes state transitions up the state hierarchy.
  • Associated with response action characteristics:
  • Probability of the action's success, considering
    uncertainties due to human involvement.
  • Time to complete the action

[Diagram labels: NORMAL STATE, Mitigative Link (ML), Critical Event]
Survivability: the probability of reaching the
normal state depends on the MLs' success
probabilities, the probabilities of additional
criticalities, and conformity to the
window-of-opportunity.
T. Mukherjee, K. Venkatasubramanian and S. K. S.
Gupta, "Performance Modeling of Critical Event
Management for Ubiquitous Computing Applications",
Proceedings of ACM MSWiM (MSWiM'06),
Torremolinos, Spain, October 2006.
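
The model elements listed on this slide can be sketched as plain data structures. This is only an illustrative sketch: the class names, field names, and all numeric values are assumptions made for the example and are not taken from the CRM publications. A state is represented as the set of active criticalities, so its level in the hierarchy is simply the size of that set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CriticalEvent:
    """Moves the system down the hierarchy by adding one criticality."""
    name: str
    probability: float       # probability of the critical event
    detection_time: float    # time to detect the criticality
    window: float            # window-of-opportunity


@dataclass(frozen=True)
class MitigativeLink:
    """Response action that moves the system up the hierarchy."""
    action: str
    mitigates: frozenset     # criticalities removed if the action succeeds
    success_probability: float
    completion_time: float


NORMAL = frozenset()         # level 0: no criticalities


def level(state):
    """Hierarchy level = number of criticalities in the state."""
    return len(state)


def occur(state, event):
    """State reached when a critical event occurs."""
    return state | {event.name}


def mitigate(state, link):
    """State reached when a response action succeeds."""
    return state - link.mitigates


# Hypothetical values, for illustration only.
fire = CriticalEvent("fire alarm", 0.5, 1.0, 10.0)
trapped = CriticalEvent("assistance required", 0.2, 2.0, 6.0)
rescue = MitigativeLink("rescue", frozenset({"assistance required"}), 0.9, 4.0)

state = occur(occur(NORMAL, fire), trapped)   # level 2
after_rescue = mitigate(state, rescue)        # back to level 1
```

Representing states as sets keeps the hierarchy implicit: critical events only add elements and mitigative links only remove them.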
13
Reachability to the Normal State
  • Reachability to the normal state (sn) from any
    arbitrary critical state s
  • An action's Q-value (qualifiedness) is determined
    by the probability of reaching the normal state
    when the action is performed
  • s': an immediate upstream state reached when
    action a is performed in state s.
  • p(s, a, s'): probability attached to the
    transition from s to s' under action a.

[Equations not recovered from the slide. Surviving labels:]
  • Probability of reaching the normal state from
    state i, combining: the probability of a
    criticality at state i; the probability of
    reaching the normal state if NO additional
    criticality occurs at state i; and the
    probability of reaching the normal state if ANY
    additional criticality occurs at state i.
  • Action's qualifiedness (Q-value), with separate
    cases for s' = sn, for s' ≠ sn with the
    window-of-opportunity (WOOP) met, and for WOOP
    NOT met.
Normal state is stochastically reachable from a
state iff the maximum Q-value from that state is
non-zero.
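
The reachability idea can be illustrated with a small recursive computation. This is a simplified sketch, not the formula from the slide (which did not survive in the transcript): it assumes a Q-value of zero when the window-of-opportunity is missed, p(s, a, s') when the action leads directly to the normal state, and p(s, a, s') times the reachability of s' otherwise. It also ignores additional criticalities and uses a single time budget for the whole response. The toy states and numbers are hypothetical.

```python
from typing import Dict, List, Tuple

NORMAL = "sn"

# state -> list of (success probability p(s, a, s'), completion time, next state s')
# Hypothetical toy model, for illustration only.
ACTIONS: Dict[str, List[Tuple[float, float, str]]] = {
    "s2": [(0.8, 3.0, "s1"), (0.5, 2.0, NORMAL)],
    "s1": [(0.9, 2.0, NORMAL)],
    NORMAL: [],
}


def q_value(p: float, duration: float, next_state: str, time_left: float) -> float:
    """Assumed Q-value of one action given the remaining time budget."""
    if duration > time_left:          # window-of-opportunity NOT met
        return 0.0
    if next_state == NORMAL:          # s' = sn, window met
        return p
    # s' != sn, window met: success here, then reach sn from s'
    return p * reach(next_state, time_left - duration)


def reach(state: str, time_left: float) -> float:
    """Best achievable probability of reaching the normal state."""
    if state == NORMAL:
        return 1.0
    return max(
        (q_value(p, d, nxt, time_left) for p, d, nxt in ACTIONS[state]),
        default=0.0,
    )
```

Under these assumptions, `reach("s2", 5.0)` follows the two-step path through `s1`, while a tighter budget such as `reach("s2", 4.0)` falls back to the direct but less reliable action. A result of zero means the normal state is not stochastically reachable within the window.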
14
CRP strategies
  • Optimal: at each state, select the action with
    the maximum Q-value.
  • Greedy: at each state, select the action with
    the optimum value of an immediate parameter,
  • e.g. Minimum Time (MT), Maximum Probability (MP),
    Maximum number of Mitigated Criticalities (MMC).
  • Markov Decision Planning (MDP): at each state,
    select the action with maximum utility
  • Utility uses the state-based stochastic model
    parameters.

15
MDP-based CRP strategies
  • At a state, an action has a utility based on the
    action's probability and reward
  • An action's reward function can be a combination
    of the associated parameters
  • Locally maximum criticality Mitigation Per unit
    Time (LMPT)
  • No knowledge of subsequent criticalities in the
    reward.
  • Subsequent Criticality Aware locally maximum
    criticality Mitigation Per unit Time (SCAMPT)
  • The LMPT action reward is enhanced with
    knowledge of probable subsequent criticalities.

[Equation not recovered. Surviving labels: the utility combines the action's reward with the expected maximum utility from the next state.]
LMPT reward: number of criticalities mitigated
per unit time.
SCAMPT reward: same as LMPT, except that the
probabilities of subsequent criticalities are
taken into account.
Tridib Mukherjee and Sandeep K. S. Gupta, "CRM:
A Formal Method to Model and Evaluate Crises
Response of Distributed Cyber-Physical Systems",
under review in TPDS.
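
The difference between the greedy strategies and utility-based planning can be sketched on a toy model. The utility form used here, success probability times (reward plus the best utility obtainable from the next state), is an assumption in the spirit of the surviving labels, not the formula from the slide; the reward follows the LMPT description (criticalities mitigated per unit time). States, actions, and numbers are hypothetical.

```python
# state -> list of (name, success probability, time, criticalities mitigated, next state)
# Hypothetical values, for illustration only.
ACTIONS = {
    "s2": [("a", 0.6, 2.0, 1, "s1"), ("b", 0.5, 4.0, 2, "sn")],
    "s1": [("c", 0.9, 3.0, 1, "sn")],
    "sn": [],
}


def lmpt_reward(action):
    """LMPT-style reward: criticalities mitigated per unit time."""
    _, _, time, mitigated, _ = action
    return mitigated / time


def utility(action):
    """Assumed utility: p * (reward + best utility from the next state)."""
    _, p, _, _, next_state = action
    return p * (lmpt_reward(action) + best_utility(next_state))


def best_utility(state):
    return max((utility(a) for a in ACTIONS[state]), default=0.0)


def choose(state, key):
    """Name of the action in `state` that maximizes `key`."""
    return max(ACTIONS[state], key=key)[0]


greedy_mt = choose("s2", key=lambda a: -a[2])   # Minimum Time
greedy_mp = choose("s2", key=lambda a: a[1])    # Maximum Probability
greedy_mmc = choose("s2", key=lambda a: a[3])   # Maximum Mitigated Criticalities
mdp_choice = choose("s2", key=utility)          # utility-based planning
```

In this toy model MMC picks action `b` because it mitigates two criticalities at once, whereas the utility-based choice prefers `a` because it also accounts for what can be gained from the next state.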
16
CRM for fire emergencies in Offshore Oil and Gas
Production Platforms (OGPP)
  • Criticalities
  • c1: Fire alarm.
  • c2: Imminent danger, e.g. health hazards.
  • c3: Assistance required for others, e.g. trapped
    personnel.
  • c4: Evacuation path not tenable.
  • Window-of-opportunity: survival time under
    asphyxiation.

[State-transition diagram. States, grouped by number of criticalities:]
  • Fire Alarm
  • Fire Alarm + Imminent Danger; Fire Alarm +
    Non-tenable Path; Fire Alarm + Assistance
    Required
  • Fire Alarm + Imminent Danger + Assistance
    Required; Fire Alarm + Imminent Danger +
    Non-tenable Path; Fire Alarm + Assistance
    Required + Non-tenable Path
  • Fire Alarm + Imminent Danger + Assistance
    Required + Non-tenable Path
[Edge probabilities as they appear in the transcript; their assignment to edges is not recoverable: 0.5375, 0.0154, 0.0311, 0.1849, 0.4319, 0.5, 0.1977, 0.2011, 0.5562, 0.5827, 0.371, 0.2953, 0.449, 0.0635, 0.3661, 0.4764, 0.5447, 0.4242, 0.5447, 0.4242, 0.3803, 0.4172, 0.0311]
State transition probabilities are derived from the
established probability distribution in [1].
[1] D. G. DiMattia, F. I. Khan, and P. R.
Amyotte, "Determination of human error
probabilities for offshore platform musters",
Journal of Loss Prevention in the Process
Industries, vol. 18, pp. 488-501, 2005.
Tridib Mukherjee and Sandeep K. S. Gupta, "A
Modeling Framework for Evaluating Effectiveness
of Smart-Infrastructure Crises Management
Systems", 2008 IEEE International Conference on
Technologies for Homeland Security (HST'08),
Waltham, MA, USA, April 2008.
Enables objective evaluation of criticality
response in OGPP to improve crisis preparedness
17
Sample Q-value Analysis
  • Preparedness: Q-value based analysis allows
    comparison among plans for
  • Different numbers of criticalities
  • Different detection and action completion times
  • Different states (i.e. different combinations of
    simultaneous criticalities)

Other applications: resource access control to
facilitate the planned actions under emergencies.
18
Criticality Response Evaluation Tool (CRET)
[Architecture diagram labels: AADL-based criticality response system architecture specification, model-based decision, model representation, AADL-based CRP specification, AADL-based CRM specification, model parsing, AADL OSATE analysis plug-ins, XML representation and analysis software using Matlab, Q-value analysis, model processing]
  • Can specify any response planning policy, going
    beyond the proposed CRP strategies
  • Preparedness check: reachability to the normal
    state based on Q-value analysis
Tridib Mukherjee and Sandeep K. S. Gupta, "CRET:
A Crisis Response Evaluation Tool to Improve
Crisis Preparedness", 2009 IEEE International
Conference on Technologies for Homeland Security
(HST'09), Waltham, MA, USA, May 2009.
19
Summary of Contributions
  • Crisis Response Model (CRM)
  • Markov decision process based modeling of crisis
    response
  • Development of Q-value as evaluation criteria for
    reachability to normal state
  • Crisis Response Planning (CRP)
  • Optimal and naïve (greedy) strategies
  • Markov decision planning strategies
  • Crisis Response Evaluation Tool
  • Objective evaluation of crisis response

T. Mukherjee, K. Venkatasubramanian and S. K. S.
Gupta, "Performance Modeling of Critical Event
Management for Ubiquitous Computing Applications",
Proceedings of ACM MSWiM (MSWiM'06),
Torremolinos, Spain, October 2006.
Tridib Mukherjee and Sandeep K. S. Gupta, "CRM:
A Formal Method to Model and Evaluate Crises
Response of Distributed Cyber-Physical Systems",
under review in TPDS.
Tridib Mukherjee and Sandeep K. S. Gupta, "A
Modeling Framework for Evaluating Effectiveness
of Smart-Infrastructure Crises Management
Systems", 2008 IEEE International Conference on
Technologies for Homeland Security (HST'08),
Waltham, MA, USA, April 2008.
Tridib Mukherjee and Sandeep K. S. Gupta, "CRET:
A Crisis Response Evaluation Tool to Improve
Crisis Preparedness", 2009 IEEE International
Conference on Technologies for Homeland Security
(HST'09), Waltham, MA, USA, May 2009.
K. Venkatasubramanian, T. Mukherjee, and S. K. S.
Gupta, "CAAC - An Adaptive and Proactive Access
Control Approach for Emergencies for Smart
Infrastructures", accepted in the Special Issue
on Adaptive Security Systems in ACM Transactions
on Autonomic and Adaptive Systems (TAAS).
S. K. S. Gupta, T. Mukherjee, and K.
Venkatasubramanian, "Criticality Aware Access
Control Model For Pervasive Applications",
Proceedings of 4th IEEE Conf. on Pervasive
Computing (PERCOM), Pisa, Italy, 2006.
20
Outline
  • Cyber-Physical Systems (CPS)
  • Crisis response planning and preparedness
  • Energy-efficient job management in data centers
  • Ad hoc Networks
  • Conclusions/ Future Research Directions

21
Importance of the Problem
  • Cooling is the chief driver of increased data
    center construction cost, costing up to $5000
    per square foot in initial purchase price.
  • Cooling is one of the leading contributors to
    ongoing total cost of ownership, costing one half
    to one watt per watt spent on computation.
  • Eliminating even 25% of total cooling costs can
    translate to a $1-2 million annual cost
    reduction in a single large data center.

22
Related Work
Proactive Approach
Reactive Solutions
23
Heat Interferences in Data Centers
Safety: inlet temperatures should stay within the
red-line temperature to avoid equipment failure.
Problem: cooling has to be pro-actively set very
low to keep all inlet temperatures under redline.
Solution: pro-active spatio-temporal job scheduling
to minimize interference and cooling demands.
24
Typical HPC Job Characteristics
  • Job execution times are usually overestimated
    during submission in HPC data centers.
  • Jobs can be spread over time to reduce peak
    utilization
  • Trade-off with throughput, turn-around time and
    resource utilization.

From job traces at ASU HPC data center
25
Conventional Spatial and Temporal Scheduling
26
Balancing Utilization Over Time
27
Conceptual overview of thermal-aware job
scheduling
  • Balancing utilization over time reduces the peak
    computing resource utilization, leaving room for
    thermal-aware spatial scheduling at all times.
  • Spatial job scheduling (placement) determines the
    temperature distribution at any time, using a
    linear thermal model.
  • The temperature distribution determines the
    equipment peak air inlet temperature.
  • The peak air inlet temperature determines the
    upper bound on the CRAC temperature setting.
  • The CRAC temperature setting determines its
    efficiency (Coefficient of Performance; source:
    HP).
  • The lower the peak inlet temperature, the higher
    the CRAC efficiency.
Q. Tang, T. Mukherjee, S. K. S. Gupta, and P.
Cayton, "Sensor-based Fast Thermal Evaluation
Model for Energy-efficient High-performance
Datacenters", in the International Conf. on
Intelligent Sensing and Info. Proc. (ICISIP 2006),
Dec 2006.
T. Mukherjee, G. Varsamopoulos, S. K. S. Gupta,
and S. Rungta, "Measurement-based Power
Profiling of Datacenter Equipment" (Extended
Abstract), in the Workshop on Green Computing
(with CLUSTER 2007), Austin, USA, Sep 2007.
There is a spatio-temporal job schedule that
minimizes the total energy (cooling + computing)
consumption. Find it!
28
Thermal-aware Job Scheduling Problem
  • PROBLEM: Given a set of incoming jobs, find a job
    schedule (i.e. job start times) and placement
    (i.e. server assignment) that minimize the total
    data center energy consumption, subject to
    meeting the job deadlines (submitted times for
    execution). This requires 3D (job x server x
    time) decision-making.

[Optimization formulation not recovered from the slide. Surviving labels:]
  • Objective terms: cooling energy, computing
    energy, job migration overhead
  • Supply temperature upper bound
  • Capacity constraint: servers assigned less than
    servers available
  • Servers required: the required number of servers
    is assigned to each job
  • Deadline constraint: job finish time less than
    deadline
  • Arrival constraint: job start time later than
    arrival
T. Mukherjee, A. Banerjee, G. Varsamopoulos, and
S. K. S. Gupta, "Spatio-temporal Thermal-Aware
Job Scheduling to Minimize Energy Consumption in
Virtualized Heterogeneous Data Centers", Elsevier
Journal on Computer Networks (ComNet), Special
Issue on Virtualized Data Centers, accepted
(2009).
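
The scheduling constraints named on this slide can be illustrated with a small feasibility check over a candidate schedule. This is a sketch under assumptions, not the formulation from the paper: time is discretized into integer slots, a server runs at most one job per slot, and finishing exactly at the deadline is treated as acceptable. The `Job` class, the `feasible` function, and all values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class Job:
    arrival: int          # earliest slot in which the job may start
    deadline: int         # slot by which the job must have finished
    runtime: int          # number of slots the job runs
    servers_needed: int   # number of servers the job requires


def feasible(jobs: Dict[str, Job], start: Dict[str, int],
             placement: Dict[str, Set[int]], num_servers: int) -> bool:
    """Check a schedule (start times) and placement (server sets)."""
    busy = set()  # (server, slot) pairs already occupied
    for name, job in jobs.items():
        if start[name] < job.arrival:                  # arrival constraint
            return False
        if start[name] + job.runtime > job.deadline:   # deadline constraint
            return False
        if len(placement[name]) != job.servers_needed: # servers required
            return False
        for server in placement[name]:
            if not 0 <= server < num_servers:          # capacity constraint
                return False
            for slot in range(start[name], start[name] + job.runtime):
                if (server, slot) in busy:             # server double-booked
                    return False
                busy.add((server, slot))
    return True


# Hypothetical example: two jobs on a three-server data center.
jobs = {"j1": Job(0, 10, 4, 2), "j2": Job(2, 8, 3, 1)}
ok = feasible(jobs, {"j1": 0, "j2": 2}, {"j1": {0, 1}, "j2": {2}}, 3)
late = feasible(jobs, {"j1": 0, "j2": 6}, {"j1": {0, 1}, "j2": {2}}, 3)
clash = feasible(jobs, {"j1": 0, "j2": 2}, {"j1": {0, 1}, "j2": {1}}, 3)
```

An optimizer would search over `start` and `placement` for the feasible combination with the lowest total (cooling plus computing) energy; the check above only covers the constraint side.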
29
Thermal-aware Job Scheduling Algorithms
  • SCINT algorithm: heuristic solution (genetic
    algorithm)
  • Takes a feasible solution and performs mutations
    for a certain number of iterations.
  • Spreads the jobs over time while meeting the
    deadlines.
  • Offline in nature, requiring the job backlog
    information
  • Takes hours of operation.
  • EDF-LRH algorithm: tries to mimic the behavior of
    SCINT by spreading jobs using the Earliest
    Deadline First (EDF) scheduling approach.
  • Places jobs on the servers contributing the
    Lowest Recirculated Heat (LRH)
  • Online in nature, maintaining EDF job queues as
    and when jobs arrive
  • Takes milliseconds of operation.
  • FCFS algorithm: does not change the conventional
    temporal scheduling approach but uses
    thermal-aware job placement techniques for
    energy savings.
  • Places jobs on the servers contributing the
    Lowest Recirculated Heat (LRH)
  • Online in nature, taking milliseconds of
    operation
30
Total Energy Consumption
  • SCINT saves up to 60% of energy consumption.
  • EDF-LRH mimics the behavior of SCINT, especially
    at low average data center utilization.

31
Summary of Contributions
  • Problem formulation to minimize energy
    consumption in data centers
  • Spatio-temporal thermal-aware job scheduling
    algorithms
  • Offline algorithm: SCINT
  • Online algorithm: EDF-LRH
  • Measurement-based power profiling of data center
    equipment
  • Linear power model
  • Preliminary software architecture
  • Configure MOAB for thermal-aware job placement.

Q. Tang, T. Mukherjee, S. K. S. Gupta, and P.
Cayton, "Sensor-based Fast Thermal Evaluation
Model for Energy-efficient High-performance
Datacenters", in the International Conf. on
Intelligent Sensing and Info. Proc. (ICISIP 2006),
Dec 2006.
T. Mukherjee, G. Varsamopoulos, S. K. S. Gupta,
and S. Rungta, "Measurement-based Power
Profiling of Datacenter Equipment" (Extended
Abstract), in the Workshop on Green Computing
(with CLUSTER 2007), Austin, USA, Sep 2007.
T. Mukherjee, A. Banerjee, G. Varsamopoulos, and
S. K. S. Gupta, "Spatio-temporal Thermal-Aware
Job Scheduling to Minimize Energy Consumption in
Virtualized Heterogeneous Data Centers", Elsevier
Journal on Computer Networks (ComNet), Special
Issue on Virtualized Data Centers, accepted
(2009).
T. Mukherjee, Q. Tang, C. Ziesman, S. K. S.
Gupta, and P. Cayton, "Software Architecture for
Dynamic Thermal Management in Data Centers",
International Conference on Communication Systems
Software (COMSWARE), Bangalore, India, Jan 2007.
T. Mukherjee, Q. Tang, C. Ziesman, and S. K. S.
Gupta, "Dynamic Thermal Control and Management
towards Reducing Utility Cost in Data Centers",
International Workshop on Feedback Control
Implementation and Design in Computing Systems
and Networks (FeBID), 2006.
T. Mukherjee, S. K. S. Gupta, and P. Cayton, "Demo
- Temperature-aware job placement in data centers
using Moab cluster management software",
Research_at_Intel Day, Intel, Santa Clara, June
2006.
32
Outline
  • Cyber-Physical Systems (CPS)
  • Crisis response planning and preparedness
  • Energy-efficient job management in data centers
  • Ad hoc Networks
  • Conclusions/ Future Research Directions

33
Optimum Tuning of Pro-active Route Maintenance in
ad-hoc networks
34
Application-aware Adaptive Optimization Sub-layer
35
Proactive Routing Protocol Classification and
Research Contributions
  • Employs beacons and triggered updates: WRP, OLSR
    etc.
  • Employs only beacons: BFST, SS-SPST etc.
  • Employs beacons and periodic updates: FSR, IARP
    etc.
  • Employs beacons, periodic and triggered updates:
    DSDV, TBRPF etc.
  • Contributions
  • Analytical model for determining the optimum β
    and f for different proactive protocols. [1,2,3]
  • Developing a PPB type of protocol that maintains
    energy-efficient routes.
  • Improves Self-Stabilizing Shortest Path Spanning
    Tree (SS-SPST) for energy-efficiency. [4,5]

[1] T. Mukherjee, S. K. S. Gupta, and G.
Varsamopoulos, "Energy Optimization for
Proactive Unicast Route Maintenance in MANETs
under End-to-End Reliability Requirements", in
Elsevier Journal on Performance Evaluation, Vol.
66, Issue 3-5, Pages 141-157, Mar 2009.
[2] T. Mukherjee, S. K. S. Gupta, and G.
Varsamopoulos, "Analytical Model for Optimizing
Periodic Route Maintenance in Proactive Routing
for MANETs", in the Proc. of ACM MSWiM, Crete
Island, Greece, Oct 2007.
[3] T. Mukherjee, S. K. S. Gupta, and G.
Varsamopoulos, "Application-Aware Adaptive
Tuning of Proactive Routing Protocols for
MANETs", under review in Transactions on
Autonomic and Adaptive Systems (TAAS).
[4] T. Mukherjee, G. Varsamopoulos, and S. K. S.
Gupta, "Self-Managing Energy-Efficient Multicast
Support in MANETs under End-to-End Reliability
Constraints", in Elsevier Journal on Computer
Networks (ComNet), Vol. 53, Issue 10, Pages
1603-1627, July 2009.
[5] T. Mukherjee, G. Sridharan, and S. K. S. Gupta,
"Energy-Aware Self-Stabilization in Mobile Ad
Hoc Networks: A Multicasting Case Study", in the
21st IEEE Int'l Parallel and Distributed
Processing Symposium (IPDPS), Long Beach,
California, 26-30 March 2007.
36
Outline
  • Cyber-Physical Systems (CPS)
  • Crisis response planning and preparedness
  • Energy-efficient job management in data centers
  • Ad hoc Networks
  • Conclusions/ Future Research Directions

37
Conclusions
  • Pro-activity needs to be incorporated in a
    synergistic manner to ensure safety and
    survivability in CPSs.
  • Pro-activity requires handling the uncertain
    outcomes of pro-active actions.
  • Pro-activity leads to high energy consumption.
  • Crisis preparedness and planning for human
    survivability under crisis
  • Abstracting the crisis response behavior as a
    system-as-a-whole can take into account the human
    uncertainties in the physical world.
  • Facilitates stochastic planning and evaluation of
    the crisis response
  • Model-based verification and analysis enables
    objective evaluation of the crisis response.
  • Dynamic determination of the period of route
    maintenance in ad hoc networks can effectively
    balance the energy-reliability trade-off.
  • Data center thermal management for thermal safety
    of the equipment
  • Dynamically reducing cooling demands through
    thermal-aware job scheduling and placement can
    save up to 60% of the energy consumption while
    preserving the users' perception of job
    completion.
38
Future Research Directions
  • Abstract modeling for CPS
  • Physical interference modeling
  • Can be governed by differential equations for
    physical dynamics.
  • Crisis preparedness
  • Considering action cost in the analysis of
    response processes
  • Enhance the action's Q-value with the cost
  • Model dynamics in complex scenarios
  • Dynamic, unpredictable state space instead of a
    static, predictable state space
  • Model composition in distributed and composite
    systems
  • Derive a system-level global stochastic model by
    combining multiple sub-system-level local
    stochastic models (e.g. a fire in a hospital
    requires two sub-systems: i) fire management and
    ii) medical emergency management)
  • Data center
  • Integration of power management with
    thermal-aware job scheduling
  • Integration of cooling control with thermal-aware
    job scheduling to develop a synergistic control
    architecture.

39
Questions ??
Impact Lab (http://impact.asu.edu): Creating
Humane Technologies for Ever-Changing World
40
Background
  • Pro-active systems can anticipate an event and
    act in advance to avoid or minimize the
    consequences of the event.
  • Pro-active CPS is necessary to address the
    following design requirements:
  • Safety: the impact of the physical interactions
    should remain within a desirable limit to avoid
    any damage to the physical and computing
    entities.
  • Survivability: the operations in the physical and
    cyber subsystems should cause no harm to the
    human inhabitants.
  • Migration from interactive to pro-active
    computing, for systems intimately connected to
    the world around them, was suggested in 2000 by
    David Tennenhouse, Director of Intel Research.
  • Pro-active CPS can involve actions in both the
    physical and the cyber world.
  • Examples of pro-active operations in the physical
    environment:
  • Safety: pre-setting the cooling in a data center
    to avoid equipment redline temperatures.
  • Survivability: preparedness drills for responding
    to crises/disasters.
  • Examples of pro-active operations in the cyber
    world:
  • Safety: scheduling jobs in data centers such that
    equipment redline temperatures are avoided.
  • Survivability: pro-active route maintenance in ad
    hoc networks employed for crisis response, to
    ensure low latency for information exchange.

41
Example Cyber-Physical Systems
  • Utilities
  • Advanced Electric Power Grid
  • Water Distribution
  • Pressure Pipes (Gas/Oil)
  • Search and Rescue
  • Crisis Response, etc.
  • Monitoring Systems
  • Pervasive Health Monitoring
  • Monitoring of fire and chemical radiation plumes
  • Wild-life Monitoring
  • Forest Monitoring

42
Design Decisions
Critical applications should be able to
avoid/handle dangerous physical conditions (e.g.
life/property losses).
[Diagram labels: Security, Survivability, Reliability, Real-time, Safety, Quality]
Interactions between physical and cyber
components should not detrimentally impact the
physical conditions.
This dissertation focuses on the safety and
survivability of CPS
43
Physical Interactions (Interference)
44
Reachability Metric
  • Reachability in terms of the Q-value, or
    qualifiedness, of actions
  • Probability of reaching the normal state, based
    on:
  • Probabilities of MLs.
  • Probabilities of CLs at intermediate states.
  • Conformity to timing requirements.

Q-value is a quantitative measure to evaluate
crisis response.
[Diagram labels: NORMAL STATE, Critical Link (CL), Mitigative Link (ML)]
45
Execution Times
46
AADL based criticality response system
architecture specification
47
Criticality Specification
48
State and State Transition Specification
49
State and State Transition Specification
[Mapping diagram labels: criticalities, events in system, critical states, system modes, event-dependent mode transition, state transitions, response actions, windows of opportunity, mode properties, action times; these are mapped to AADL constructs and MCMA components]
50
Sample Schema for Intermediate XML representation
Allows any expressions to specify policies
51
Thermal issues in Data Centers
  • Heat recirculation
  • Hot air from the equipment air outlets is fed
    back to the equipment air inlets
  • Hot spots
  • An effect of heat recirculation
  • Areas in the data center with alarmingly high
    temperature
  • Consequence
  • Cooling has to be set very low to keep all inlet
    temperatures in the safe operating range
  • Solution
  • Place jobs so as to minimize heat recirculation
  • A linear thermal model, developed previously,
    predicts the chassis inlet temperature from
    equipment utilization.

Image courtesy: Intel Labs
52
Instrumentation
[Diagram: on-site set-up and remote power meter reading. Labels: chassis, network, DualCom power meter, SNMP control, power supply (208 V)]
T. Mukherjee, G. Varsamopoulos, S. K. S. Gupta,
and S. Rungta, "Measurement-based Power
Profiling of Datacenter Equipment", in the First
International Workshop on Green Computing (in
conjunction with CLUSTER 2007), Austin, USA,
Sept 2007.
53
Equipment Power Consumption
Power Supply
Blade Server Power
Empty Chassis Power
Memory Power
Hard Disk Power
CPU Power
54
Power Model
  • Power consumption is mainly affected by CPU
    utilization
  • Power consumption is linear in the CPU
    utilization

$P = aU + b$, where P is the power consumption, U
the CPU utilization, and a and b fitted constants.
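
A linear power model of this kind (power as a constant plus a term proportional to CPU utilization) can be fitted from measurements with ordinary least squares. The sketch below is illustrative only: the function name and the sample readings are hypothetical and are not measurements from the cited study.

```python
from typing import List, Tuple


def fit_linear(utilization: List[float], power: List[float]) -> Tuple[float, float]:
    """Least-squares fit of power = a * utilization + b."""
    n = len(utilization)
    mean_u = sum(utilization) / n
    mean_p = sum(power) / n
    covariance = sum((u - mean_u) * (p - mean_p)
                     for u, p in zip(utilization, power))
    variance = sum((u - mean_u) ** 2 for u in utilization)
    a = covariance / variance
    b = mean_p - a * mean_u
    return a, b


# Hypothetical readings: utilization as a fraction, power in watts.
a, b = fit_linear([0.0, 0.5, 1.0], [200.0, 250.0, 300.0])
predicted_at_80_percent = a * 0.8 + b
```

With these made-up readings, `b` is the idle power and `a` is the additional power drawn at full utilization.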
55
Linear Thermal Model
  • Heat Recirculation Coefficients
  • Analytical
  • Matrix-based
  • Properties of model
  • Granularity at air inlets
  • Assumes steadiness of air flow

$P = aU + b$
$T_{in} = T_{sup} + D \times P$
$\max(T_{in}) < T_{red}$
$T_{sup} < T_{red} - \max(D \times P)$
where $D$ is the heat distribution matrix, $P$ the
power vector, $T_{in}$ the inlet temperatures,
$T_{sup}$ the supplied air temperatures, and
$T_{red}$ the redline temperature.
Q. Tang, T. Mukherjee, S. K. S. Gupta, and P.
Cayton, "Sensor-based Fast Thermal Evaluation
Model for Energy-efficient High-performance
Datacenters", in the International Conf. on
Intelligent Sensing and Info. Proc. (ICISIP 2006),
Dec 2006.
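
The role of the linear thermal model can be shown numerically: if the inlet temperatures equal the supply temperature plus the heat distribution matrix applied to the power vector, then the highest admissible supply temperature is the redline temperature minus the largest inlet temperature rise. The sketch below assumes exactly that relation; the matrix entries, power values, and redline temperature are hypothetical.

```python
from typing import List


def inlet_rise(D: List[List[float]], P: List[float]) -> List[float]:
    """Per-inlet temperature rise D x P caused by recirculated heat."""
    return [sum(d * p for d, p in zip(row, P)) for row in D]


def max_supply_temperature(D: List[List[float]], P: List[float],
                           t_red: float) -> float:
    """Upper bound on the supply temperature: T_red - max(D x P)."""
    return t_red - max(inlet_rise(D, P))


# Hypothetical two-chassis example (matrix in degrees C per watt).
D = [[0.010, 0.002],
     [0.004, 0.012]]
T_RED = 25.0

# Same total power, two different placements of the load.
bound_a = max_supply_temperature(D, [300.0, 500.0], T_RED)
bound_b = max_supply_temperature(D, [500.0, 300.0], T_RED)
```

In this made-up example the second placement allows a warmer supply temperature for the same total computing power, which is the effect thermal-aware placement tries to exploit.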
56
Thermal-aware Job Scheduling
  • PROBLEM: Given a set of incoming jobs, find a job
    schedule (i.e. job start times) and placement
    (i.e. server assignment) that minimize the total
    data center energy consumption subject to meeting
    the job deadlines (submitted times for execution).
    This requires 3D (job x server x time)
    decision-making.
  • SCINT Algorithm: heuristic solution (genetic
    algorithm).
  • Takes a feasible solution and performs mutations
    until a certain number of iterations is reached.
  • Spreads the jobs over time while meeting the
    deadlines.
  • Offline in nature, requiring the job backlog
    information.
  • Takes hours to run.
  • EDF-LRH Algorithm: tries to mimic the behavior of
    SCINT by spreading jobs using the Earliest
    Deadline First (EDF) scheduling approach.
  • Places jobs on the servers contributing the Lowest
    Recirculated Heat (LRH).
  • Online in nature, maintaining EDF job queues as
    and when jobs arrive.
  • Takes milliseconds to run.
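
The two rules named in EDF-LRH can be sketched as follows. This is a simplified illustration, not the algorithm from the cited paper: it assumes each job needs exactly one server, it ignores the spreading of start times, and the function name, job ids, server ids and heat scores are all hypothetical.

```python
import heapq


def edf_lrh_dispatch(jobs, idle_servers, recirculated_heat):
    """Assign queued jobs to idle servers.

    jobs: list of (deadline, job_id) tuples.
    idle_servers: list of server ids that can accept a job now.
    recirculated_heat: dict mapping server id to its recirculated-heat contribution.
    Returns a list of (job_id, server_id) pairs; jobs without a server stay queued.
    """
    queue = list(jobs)
    heapq.heapify(queue)  # EDF queue: the smallest deadline is popped first
    # LRH: prefer the servers that contribute the least recirculated heat
    servers = sorted(idle_servers, key=lambda s: recirculated_heat[s])
    placement = []
    for server in servers:
        if not queue:
            break
        _, job_id = heapq.heappop(queue)
        placement.append((job_id, server))
    return placement


# Hypothetical jobs and heat scores, for illustration only.
jobs = [(30, "j1"), (10, "j2"), (20, "j3")]
heat = {"s1": 0.5, "s2": 0.1, "s3": 0.3}
placement = edf_lrh_dispatch(jobs, ["s1", "s2", "s3"], heat)
```

Here the most urgent job lands on the server with the lowest recirculated heat, and the cost per dispatch is a sort plus heap operations, which is consistent with an online scheduler that runs in milliseconds.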

T. Mukherjee, A. Banerjee, G. Varsamopoulos, and
S. K. S. Gupta, "Spatio-temporal Thermal-Aware
Job Scheduling to Minimize Energy Consumption in
Virtualized Heterogeneous Data Centers", Elsevier
Journal on Computer Networks (ComNet), Special
Issue on Virtualized Data Centers, ACCEPTED
(2009).
57
Energy Consumption
  • Total power = computing power + cooling power.
  • Cooling power depends on the computing power and
    the COP.
  • Energy consumption is the total power multiplied
    by the observed period of time.

$P_{tot} = P_{comp} + P_{cool}$
$P_{tot} = P_{comp} + P_{comp}/COP(T_{sup}) = P_{comp} + P_{comp}/COP(T_{red} - \max(D \times P))$
$E = P_{tot} \times time$
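A small worked example of the energy formula, assuming the COP is available as a function of the supply temperature. The curve `example_cop` is a made-up placeholder, not the COP of any real cooling unit, and the function names and numbers are hypothetical.

```python
def total_energy(p_comp, t_sup, cop, duration):
    """Energy over `duration`: (computing power + cooling power) x time."""
    p_cool = p_comp / cop(t_sup)   # cooling power depends on computing power and COP
    p_tot = p_comp + p_cool
    return p_tot * duration


def example_cop(t_sup):
    """Hypothetical COP curve, for illustration only: rises with supply temperature."""
    return 2.0 + t_sup / 10.0


# 1000 W of computing power, 20 degrees C supply air, observed for one hour (3600 s).
energy_joules = total_energy(1000.0, 20.0, example_cop, 3600.0)
```

With a COP that increases with supply temperature, as in this placeholder, a higher admissible supply temperature lowers the cooling power for the same computing load.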
58
Software Architecture
Presentation
Scheduling Control
Access data from the chassis level sensors
Datacenter Servers
59
Modularized Implementation of Thermal Awareness
in Task Scheduling
T. Mukherjee, Q. Tang, C. Ziesman, S. K. S.
Gupta, and P. Cayton, "Software Architecture for
Dynamic Thermal Management in Datacenters", In
the International Conf. on Communication System
Software and Middleware (COMSWARE), Bangalore, India,
Jan 2007.
60
Proactive Route Maintenance Operations in MANETs
  • Overhead
  • Periodic beacon messages for link state
    maintenance.
  • Periodic route update broadcast.
  • Triggered route update broadcast with each link
    change.

$E \times N \times \lceil \log N \rceil / \beta$
$E \times N^3 \times \lceil \log N \rceil / \phi$
$E \times N^3 \times \lceil \log N \rceil$ for each triggered update
High Energy Overhead in Maintenance Operations
Reduces Applicability
Low Scalability
Reduce maintenance operations and find optimum $\beta$ and
$\phi$ to minimize energy overhead.
61
PDR Constraint
  • Derived from the required Packet Delivery Ratio (PDR).
  • $P$: probability of packet loss due to a single link
    failure.
  • $P = \lambda \times$ route-reconstruction delay.
  • Packet Delivery Ratio $= (1 - P)^D$.
  • $(1 - P)^D > \rho$.

$\lambda$: function of link change and application traffic
distribution
$\rho$: application reliability requirement
route-reconstruction delay $< (1 - \rho^{1/D}) / \lambda$
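The delay bound follows from rearranging the PDR condition: the delivery ratio is one minus the per-link loss probability, raised to the power D, and the loss probability grows linearly with the route-reconstruction delay. The sketch below evaluates that bound; the parameter names (`required_pdr`, `loss_factor`) and the numbers are hypothetical, and `D` is used exactly as the exponent from the slide.

```python
def max_reconstruction_delay(required_pdr, D, loss_factor):
    """Upper bound on the route-reconstruction delay.

    required_pdr: application reliability requirement, in (0, 1).
    D: exponent of the per-link delivery probability in the PDR expression.
    loss_factor: loss probability per unit of route-reconstruction delay.
    """
    return (1.0 - required_pdr ** (1.0 / D)) / loss_factor


def packet_delivery_ratio(delay, D, loss_factor):
    """PDR for a given delay: (1 - loss_factor * delay) ** D."""
    return (1.0 - loss_factor * delay) ** D


# Hypothetical numbers, for illustration only.
bound = max_reconstruction_delay(required_pdr=0.81, D=2, loss_factor=0.5)
pdr_at_bound = packet_delivery_ratio(bound, D=2, loss_factor=0.5)
```

At the bound the delivery ratio equals the requirement, so any shorter delay satisfies it strictly.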
62
Optimizations for different Pro-active Protocols
63
Sample Results
64
Sample Results
65
Self-stabilization in Distributed Computing
  • A distributed system is self-stabilizing if it:
  • Guarantees convergence to a valid global state in
    finite time from any invalid state, based on local
    actions in distributed nodes.
  • Ensures closure by keeping the system in a valid
    state unless faults occur.
  • Self-stabilization can adapt to topological
    changes and node failures in MANETs based on
    local actions.

Topological Changes and Node Failures for MANETs.
Fault
Closure
Invalid State
Valid State
Convergence
Local actions in distributed nodes.
Applied to Multicasting in MANETs
66
PPB Type of Multicast Routing using
Self-Stabilization
Multicast source
  • Maintains a source-based multicast tree.
  • Actions based on local information in the nodes
    and neighbors.
  • Pro-active neighbor monitoring through periodic
    beacon messages.
  • Neighbor check at each round (with at least one
    beacon reception from all the neighbors)
  • Execute actions only in case of changes in the
    neighborhood.

Topological Change
Convergence Based on Local actions
Problem: energy efficiency
is not considered
Self-Stabilizing Spanning Tree
67
Energy Aware Self-Stabilizing Protocol (SS-SPST-E)
  • Actions at each node (parent selection)
  • Identify potential parents.
  • Estimate the additional cost of joining each
    potential parent.
  • Select the parent with minimum additional cost.
  • Change distance to root.

[Figure: example tree with nodes A, B, C, D, E, F and X; callouts "Loop Detected" and "Not in tree"]
AdditionalCost(B → X) = T_B + R
Potential Parents of X
AdditionalCost(A → X) = T_A + 2R
  • Action Triggers
  • Parent disconnection.
  • Parent cost not minimum.
  • Change in distance of parent to root.

Select Parent with minimum Additional Cost
Minimum overall cost when parent is locally
selected
Execute action when any action trigger is on
Tree validity: the tree will remain connected
with no loops.
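The parent-selection action and its triggers can be sketched for a single node as below. This is an illustration under stated assumptions, not the protocol's specification: the caller is assumed to have already identified the potential parents (excluding any that would create a loop or are unreachable) and estimated their additional costs, and the distance to root is assumed to be a hop count. The function and parameter names are hypothetical.

```python
import math


def choose_parent(current_parent, additional_cost, distance_to_root,
                  parent_connected=True, parent_distance_changed=False):
    """Run one round of parent selection at a node.

    additional_cost: dict mapping each potential parent to the estimated
        additional cost of joining it.
    distance_to_root: dict mapping each potential parent to its distance to root.
    Returns (parent, new distance to root of this node).
    """
    best = min(additional_cost, key=additional_cost.get)
    current_cost = additional_cost.get(current_parent, math.inf)
    # Action triggers: parent disconnection, parent cost not minimum,
    # or a change in the parent's distance to root.
    triggered = (
        not parent_connected
        or current_cost > additional_cost[best]
        or parent_distance_changed
    )
    parent = best if triggered else current_parent
    return parent, distance_to_root[parent] + 1


# Hypothetical costs and distances for a node with potential parents A and B.
costs = {"A": 5.0, "B": 3.0}
dists = {"A": 1, "B": 2}
switched = choose_parent("A", costs, dists)   # current parent is not the cheapest
stable = choose_parent("B", costs, dists)     # no trigger fires, parent is kept
```

Because the node acts only when a trigger is on, a node whose parent is already the cheapest stays put, which is the closure behavior that self-stabilization requires.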
68
Sample Result