Addressing Complexity in Emerging Cyber-Ecosystems

Transcript and Presenter's Notes

Title: Addressing Complexity in Emerging Cyber-Ecosystems


1
Addressing Complexity in Emerging Cyber-Ecosystems: Experiments with Autonomic Computational Science
  • Manish Parashar
  • Center for Autonomic Computing
  • The Applied Software Systems Laboratory
  • Rutgers, The State University of New Jersey
  • In collaboration with S. Jha and O. Rana

2
Outline of My Presentation
  • Computational Ecosystems
  • Unprecedented opportunities, challenges
  • Autonomic computing: a pragmatic approach for
    addressing complexity!
  • Experiments with autonomics for science and
    engineering
  • Concluding Remarks

3
The Cyberinfrastructure Vision
  • "Cyberinfrastructure integrates hardware for
    computing, data and networks, digitally-enabled
    sensors, observatories and experimental
    facilities, and an interoperable suite of
    software and middleware services and tools"
  • - NSF's Cyberinfrastructure Vision for 21st
    Century Discovery
  • A global phenomenon: several LARGE deployments
  • UK National Grid Service (NGS)/European Grid
    Infrastructure (EGI), TeraGrid, Open Science Grid
    (OSG), EGEE, Cybera, DEISA, etc., etc.
  • New capabilities for computational science and
    engineering
  • seamless access
  • resources, services, data, information,
    expertise,
  • seamless aggregation
  • seamless (opportunistic) interactions/couplings

4
Cyberinfrastructure → Cyber-Ecosystems
  • 21st century Science and Engineering
  • New Paradigms and Practices
  • Fundamentally data-driven/data intensive
  • Fundamentally collaborative

5
Unprecedented opportunities for
Science/Engineering
  • Knowledge-based, information/data-driven,
    context/content-aware, computationally intensive,
    pervasive applications
  • Crisis management, monitor and predict natural
    phenomena, monitor and manage engineered
    systems, optimize business processes
  • Addressing applications in an end-to-end manner!
  • Opportunistically combine computations,
    experiments, observations, data, … to manage,
    control, predict, adapt, optimize, …
  • New paradigms and practices in science and
    engineering?
  • How can it benefit current applications?
  • How can it enable new thinking in science?

6
The Instrumented Oil Field (with UT-CSM, UT-IG,
OSU, UMD, ANL)
Detect and track changes in data during
production. Invert data for reservoir
properties. Detect and track reservoir
changes. Assimilate data and reservoir properties
into the evolving reservoir model. Use
simulation and optimization to guide future
production.
Data Driven
Model Driven
7
Many Application Areas …
  • Hazard prevention, mitigation and response
  • Earthquakes, hurricanes, tornados, wild fires,
    floods, landslides, tsunamis, terrorist attacks
  • Critical infrastructure systems
  • Condition monitoring and prediction of future
    capability
  • Transportation of humans and goods
  • Safe, speedy, and cost effective transportation
    networks and vehicles (air, ground, space)
  • Energy and environment
  • Safe and efficient power grids, safe and
    efficient operation of regional collections of
    buildings
  • Health
  • Reliable and cost effective health care systems
    with improved outcomes
  • Enterprise-wide decision making
  • Coordination of dynamic distributed decisions for
    supply chains under uncertainty
  • Next generation communication systems
  • Reliable wireless networks for homes and
    businesses
  • Report of the Workshop on Dynamic Data Driven
    Applications Systems, F. Darema et al., March
    2006, www.dddas.org

Source: M. Rotea, NSF
8
The Challenge: Managing Complexity, Uncertainty
  • System:
  • Very large scales
  • Disruptive trends
  • many/multi-cores, accelerators, clouds
  • Heterogeneity
  • capability, connectivity, reliability,
    guarantees, QoS
  • Dynamics
  • Ad hoc structures, failure
  • Distributed system!
  • Lack of guarantees, common/complete knowledge, …
  • Emerging concerns
  • Power, resilience, …
  • Data and Information:
  • Scale, heterogeneity
  • Availability, resolution, quality
  • Semantics, metadata, data models, provenance
  • Trust in data, …
  • Application:
  • Compositions
  • Dynamic behaviors
  • Dynamic and complex couplings
  • Software/systems engineering issues
  • Emergent rather than by design

9
The Challenge: Managing Complexity, Uncertainty (I)
  • Increasing application, data/information, system
    complexity
  • Scale, heterogeneity, dynamism, unreliability,
  • New application formulations, practices
  • Data intensive and data driven, coupled, multiple
    physics/scales/resolution, adaptive,
    compositional, workflows, etc.
  • Complexity/uncertainty must be simultaneously
    addressed at multiple levels
  • Algorithms/Application formulations
  • Asynchronous/chaotic, failure tolerant,
  • Abstractions/Programming systems
  • Adaptive, application/system aware, proactive,
  • Infrastructure/Systems
  • Decoupled, self-managing, resilient,

10
The Challenge: Managing Complexity, Uncertainty (II)
  • The ability of scientists to realize the
    potential of computational ecosystems is being
    severely hampered by the increased complexity
    and dynamism of the applications and computing
    environments.
  • To be productive, scientists often have to
    comprehend and manage complex computing
    configurations, software tools and libraries, as
    well as application parameters and behaviors.
  • Autonomics and self-* can help?
  • (with the plumbing, for starters)

11
Outline of My Presentation
  • Computational Ecosystems
  • Unprecedented opportunities, challenges
  • Autonomic computing: a pragmatic approach for
    addressing complexity!
  • Experiments with autonomics for science and
    engineering
  • Concluding Remarks

12
The Autonomic Computing Metaphor
  • Current paradigms, mechanisms, management tools
    are inadequate to handle the scale, complexity,
    dynamism and heterogeneity of emerging systems
    and applications
  • Nature has evolved to cope with scale,
    complexity, heterogeneity, dynamism and
    unpredictability, and lack of guarantees
  • self-configuring, self-adapting, self-optimizing,
    self-healing, self-protecting, highly
    decentralized, heterogeneous architectures that
    work!!!
  • The goal of autonomic computing is to enable
    self-managing systems/applications that address
    these challenges using high-level guidance
  • Unlike AI, duplication of human thought is not the
    ultimate goal!

"Autonomic Computing: An Overview," M. Parashar
and S. Hariri, Hot Topics, Lecture Notes in
Computer Science, Springer Verlag, Vol. 3566, pp.
247-259, 2005.
13
Motivations for Autonomic Computing
Source: http://www.almaden.ibm.com/almaden/talks/Morris_AC_10-02.pdf
2/27/07: Dow fell 546 points. Since the worst plunge
took place after 2:30 pm, trading limits were not
activated
Source: IDC, 2006
8/3/07: (EPA) datacenter energy use by 2011 will
cost $7.4B; 15 power plants, 15 GW peak
8/1/06: UK NHS hit with massive computer outage;
72 primary care and 8 acute hospital trusts
affected.
8/12/07: 20K people and 60 planes held at LAX after a
computer failure prevented customs from screening
arrivals
Key Challenge: Current levels of scale,
complexity and dynamism make it infeasible for
humans to effectively manage and control systems
and applications
14
Autonomic Computing: A Pragmatic Approach
  • Separation + Integration + Automation!
  • Separation of knowledge, policies and mechanisms
    for adaptation
  • The integration of self-configuration, -healing,
    -protection, -optimization, …
  • Self-* behaviors build on automation concepts and
    mechanisms
  • Increased productivity, reduced operational
    costs, timely and effective response
  • System/application self-management is more than
    the sum of the self-management of its individual
    components (a minimal control-loop sketch follows
    below)

M. Parashar and S. Hariri, Autonomic Computing:
Concepts, Infrastructure, and Applications, CRC
Press, Taylor & Francis Group, ISBN
0-8493-9367-1, 2007.
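The separation of knowledge/policies from mechanisms described above is commonly realized as a monitor-analyze-plan-execute control loop. The following is a minimal sketch of such a loop; the manager class, mechanism names, and thresholds are illustrative assumptions, not the Accord implementation.

```python
# Minimal sketch of an autonomic manager: policies (when to adapt) are kept separate
# from mechanisms (how to adapt), and the control loop simply wires them together.
# All names, sensors, and thresholds are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Policy:
    condition: Callable[[Dict[str, float]], bool]  # "when" to adapt (knowledge/policy)
    mechanism: str                                  # "how" to adapt, referenced by name

class AutonomicManager:
    def __init__(self, mechanisms: Dict[str, Callable[[], None]], policies: List[Policy]):
        self.mechanisms = mechanisms   # enabling mechanisms, e.g. repartition, migrate
        self.policies = policies       # high-level guidance, replaceable at runtime

    def control_step(self, observed_state: Dict[str, float]) -> None:
        # Monitor -> Analyze -> Plan -> Execute
        for policy in self.policies:
            if policy.condition(observed_state):
                self.mechanisms[policy.mechanism]()

# Example wiring: the same mechanisms can be orchestrated by different policies.
mechanisms = {
    "repartition": lambda: print("repartitioning workload"),
    "add_resources": lambda: print("requesting additional resources"),
}
policies = [
    Policy(lambda s: s["load_imbalance"] > 0.2, "repartition"),
    Policy(lambda s: s["queue_wait_s"] > 300, "add_resources"),
]
AutonomicManager(mechanisms, policies).control_step(
    {"load_imbalance": 0.35, "queue_wait_s": 120})
```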
15
Autonomic Computing Theory
  • Integrates and advances several fields:
  • Distributed computing: algorithms and architectures
  • Artificial intelligence: models to characterize,
    predict and mine data and behaviors
  • Security and reliability: designs and models of
    robust systems
  • Systems and software architecture: designs and
    models of components at different IT layers
  • Control theory: feedback-based control and
    estimation
  • Systems and signal processing theory: system and
    data models and optimization methods
  • Requires experimental validation
  • (From S. Dobson et al., ACM Transactions on
    Autonomous and Adaptive Systems, Vol. 1, No. 2,
    Dec. 2006.)

16
Some Information Sources
  • "Autonomic Computing: Concepts, Infrastructure
    and Applications," M. Parashar and S. Hariri
    (Eds.), CRC Press, ISBN 0-8493-9367-1 (available
    at http://www.crcpress.com/)
  • NSF Center on Autonomic Computing
  • http://nsfcac.rutgers.edu
  • http://www.nsfcac.org
  • Autonomic Computing Portal
  • http://www.autonomiccomputing.org
  • IEEE International Conference on Autonomic
    Computing
  • http://www.autonomic-conference.org
  • IEEE Task Force on Autonomous and Autonomic
    Systems
  • http://tab.computer.org/aas/

17
Autonomics for Science and Engineering?
  • Autonomic computing aims at developing systems
    and applications that can manage and optimize
    themselves using only high-level guidance or
    intervention from users
  • dynamically adapt to changes in accordance with
    business policies and objectives and take care of
    routine elements of management
  • Separation of management and optimization
    policies from enabling mechanisms
  • allows a repertoire of mechanisms to be
    automatically orchestrated at runtime to respond
    to heterogeneity, dynamics, etc.
  • E.g., develop strategies that are capable of
    identifying and characterizing patterns at design
    time and at runtime and, using relevant (dynamically
    defined) policies, managing and optimizing the
    patterns.
  • Application, Middleware, Infrastructure
  • Manage application/information/system complexity
  • not just hide it!
  • Enabling new thinking, formulations
  • how do I think about/formalize my problem
    differently?

18
A Conceptual Framework for ACS (GMAC '07, with
S. Jha and O. Rana)
  • Hierarchical
  • Within and across levels

19
Cross-layer Autonomics
20
Existing Autonomic Practices in Computational
Science (GMAC 09, SOAR 09, with S. Jha and O.
Rana)
Autonomic tuning of the application
Autonomic tuning by the application
21
Spatial, Temporal and Computational Heterogeneity
and Dynamics in SAMR
Simulation of combustion based on SAMR (H2-air
mixture ignition via 3 hot-spots); figure shows
temperature and OH profiles. Courtesy: Sandia
National Lab
22
Autonomics in SAMR
  • Tuning by the application
  • Application level: when and where to refine
  • Runtime/Middleware level: when, where and how to
    partition and load balance
  • Resource level: allocate/de-allocate resources
  • Tuning of the application, runtime (see the
    sketch below)
  • When/where to refine
  • Latency-aware ghost synchronization
  • Heterogeneity/load-aware partitioning and
    load-balancing
  • Checkpoint frequency
  • Asynchronous formulations
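As an illustration of how such tuning decisions might be encoded, the sketch below selects a partitioner, a ghost-synchronization mode, and an out-of-core setting from simple runtime observations; the policy names and thresholds are hypothetical, not the actual SAMR runtime policies.

```python
# Hypothetical sketch of runtime-level SAMR tuning: choose a partitioning action, a
# ghost-synchronization mode, and whether to go out-of-core, from observed metrics.
# Names and thresholds are illustrative only.
def choose_samr_policy(load_imbalance: float, wan_latency_ms: float,
                       free_mem_fraction: float) -> dict:
    policy = {}
    # When/where/how to partition and load balance.
    policy["partitioner"] = ("dynamic-repartition" if load_imbalance > 0.25
                             else "keep-current")
    # Latency-aware ghost synchronization: overlap communication when latency is high.
    policy["ghost_sync"] = "asynchronous" if wan_latency_ms > 50 else "synchronous"
    # Trade time (performance) for space (resource) when memory is scarce.
    policy["out_of_core"] = free_mem_fraction < 0.1
    return policy

print(choose_samr_policy(load_imbalance=0.4, wan_latency_ms=80, free_mem_fraction=0.05))
```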

23
Outline of My Presentation
  • Computational Ecosystems
  • Unprecedented opportunities, challenges
  • Autonomic computing: a pragmatic approach for
    addressing complexity!
  • Experiments with autonomics for science and
    engineering
  • Concluding Remarks

24
Autonomics for Science and Engineering:
Application-level Examples
  • Autonomics to address complexity in science and
    engineering
  • Autonomics as a paradigm for science and
    engineering
  • Some examples
  • Autonomic runtime management: multiphysics,
    adaptive mesh refinement
  • Autonomic data streaming and in-network data
    processing: coupled simulations
  • Autonomic deployment/scheduling: HPC Grid/Cloud
    integration
  • Autonomic workflows: simulation-based
    optimization
  • (Many system-level examples not presented here …)

25
Adaptive Methods in Science and Engineering
26
Autonomic (Physics/Model/System Driven) Runtime
Management
"Hybrid Runtime Management of Space-Time
Heterogeneity for Dynamic SAMR Applications," X.
Li and M. Parashar, IEEE TPDS 18(8), pp.
1202-1214, August 2007.
27
Cross-layer Adaptations for SAMR
When resources are under-utilized: ALP trades
space (resource) for time (performance). When
resources are scarce: ALOC trades time
(performance) for space (resource).
28
Experimental Results - ALP
Performance gain of up to 40% on 512 processors
Experiment setup: IBM SP4 cluster (DataStar at
the San Diego Supercomputing Center, 1,632
processors total); SP4 (p655) node: 8 processors
(1.5 GHz), 16 GB memory, 6.0 GFlops
29
Effects of Finite Memory - ALOC
Intel Pentium 4 CPU (1.70 GHz), Linux 2.4
kernel; cache size 256 KB, physical memory 512
MB, swap space 1 GB.
30
Experimental Results - ALOC
Beowulf cluster (Frea at Rutgers, 64
processors); Intel Pentium 4 CPU (1.70 GHz),
Linux 2.4 kernel; cache size 256 KB, physical
memory 512 MB, swap space 1 GB.
31
Coupled Fusion Simulations: A Data-Intensive
Workflow
32
Autonomic Data Streaming and In-Transit
Processing for Data-Intensive Workflows
  • Large-scale distributed environments and
    data-intensive workflows
  • Application entities separated in space and time
  • Seamless interactions and couplings across
    entities
  • Distributed application entities need to interact
    at runtime
  • Data processing, interactive data monitoring,
    online data analysis, visualization,
    data/service/vm migration, data archiving,
    collaboration, etc.
  • Large data volumes and rates, heterogeneous data
    types
  • Must be streamed efficiently and effectively
    between distributed application components
  • Application-specific manipulations need to be
    applied in-transit

"A Self-Managing Wide-Area Data
Streaming Service," V. Bhat, M. Parashar, H.
Liu, M. Khandekar, N. Kandasamy, S. Klasky, and
S. Abdelwahed, Cluster Computing: The Journal of
Networks, Software Tools, and Applications,
Volume 10, Issue 7, pp. 365-383, December 2007.
33
Autonomic Data Streaming and In-Transit
Processing for Data-Intensive Workflows
  • Workflow with coupled simulation codes, i.e., the
    edge turbulence particle-in-cell (PIC) code (GTC)
    and the microscopic MHD code (M3D), which run
    simultaneously on separate HPC resources
  • Data streamed and processed en route -- e.g., data
    from the PIC code is filtered through noise
    detection processes before it can be coupled
    with the MHD code
  • Efficient data streaming between live
    simulations -- data must arrive just in time: if it
    arrives too early, time and resources are wasted
    buffering it; if it arrives too late, the
    application wastes resources waiting for the data
    to come in
  • Opportunistic use of in-transit resources

"A Self-Managing Wide-Area Data
Streaming Service," V. Bhat, M. Parashar, H.
Liu, M. Khandekar, N. Kandasamy, S. Klasky, and
S. Abdelwahed, Cluster Computing: The Journal of
Networks, Software Tools, and Applications,
Volume 10, Issue 7, pp. 365-383, December 2007.
34
Autonomic Data Streaming and In-Transit Processing
  • Application level:
  • Proactive QoS management strategies using a
    model-based LLC controller
  • Capture constraints for in-transit processing
    using a slack metric
  • In-transit level:
  • Opportunistic data processing using a dynamic
    in-transit resource overlay
  • Adaptive run-time management at in-transit nodes
    based on the slack metric generated at the
    application level
  • Adaptive buffer management and forwarding
    (see the sketch below)
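A minimal sketch of how a slack metric could drive per-item decisions at an in-transit node follows; the slack definition, thresholds, and function names are assumptions for illustration, not the published controller formulation.

```python
# Illustrative sketch: an in-transit node uses a slack value (the time budget left
# before the data is needed downstream, minus the expected forwarding time) and its
# buffer occupancy to decide how much processing to apply before forwarding.
def handle_data_item(slack_s: float, processing_cost_s: float,
                     buffer_occupancy: float) -> str:
    if buffer_occupancy > 0.9:
        return "forward-unprocessed"     # avoid buffer overflow and data loss
    if slack_s >= processing_cost_s:
        return "process-then-forward"    # enough slack for full in-transit processing
    if slack_s > 0:
        return "partial-process"         # apply only the cheap filters
    return "forward-unprocessed"         # already late: do not add more delay

for item in [(2.0, 0.5, 0.4), (0.3, 0.5, 0.4), (1.0, 0.5, 0.95)]:
    print(handle_data_item(*item))
```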

35
Autonomics for Coupled Fusion Simulation Workflows
36
Autonomic Streaming Implementation/Deployment
  • Simulation Workflow
  • SS: Simulation Service (GTC)
  • ADSS: Autonomic Data Streaming Service
  • CBMS: LLC-controller-based Buffer Management
    Service
  • DTS: Data Transfer Service
  • DAS: Data Analysis Service
  • SLAMS: Slack Manager Service
  • PS: Processing Service
  • BMS: Buffer Management Service
  • ArchS: Archiving Service (archives data at the sink)
  • Simulations execute on leadership-class machines
    at ORNL and NERSC
  • In-transit nodes located at PPPL and Rutgers

37
Adaptive Data Transfer
  • No congestion in intervals 1-9
  • Data transferred over the WAN
  • Congestion at intervals 9-19
  • The controller recognizes this congestion and advises
    the Element Manager, which in turn adapts the DTS to
    transfer data to local storage (LAN).
  • Adaptation continues until the network is no
    longer congested
  • Data sent to the local storage by the DTS falls
    to zero at the 19th controller interval.

38
Adaptation of the Workflow
  • Create multiple instances of the Autonomic Data
    Streaming Service (ADSS) when the effective
    network transfer rate dips below the threshold
    (in our case, around 100 Mb/s); see the sketch
    after the figure note below

Figure: the simulation feeds multiple ADSS
instances (ADSS-0, ADSS-1, ADSS-2), each with its
own buffer and data transfer stage. Available
network throughput is the difference between the
maximum and the current network transfer rate.
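The sketch below combines the two adaptations described on this and the previous slide: redirecting transfers to local storage when the WAN is congested, and requesting an additional ADSS instance when the effective transfer rate drops below the threshold. The class, method names, and exact rules are hypothetical.

```python
# Hypothetical ADSS adaptation logic: watch the effective network transfer rate and
# (a) redirect output to local storage (LAN) under congestion, (b) request an extra
# ADSS instance when throughput drops below the threshold (about 100 Mb/s above).
THRESHOLD_MBPS = 100.0

class AdssManager:
    def __init__(self) -> None:
        self.instances = 1            # ADSS-0 to start
        self.destination = "WAN"

    def adapt(self, effective_rate_mbps: float, congested: bool) -> None:
        self.destination = "LAN-local-storage" if congested else "WAN"
        if effective_rate_mbps < THRESHOLD_MBPS:
            self.instances += 1       # spawn ADSS-1, ADSS-2, ... as needed
        print(f"dest={self.destination}, ADSS instances={self.instances}")

mgr = AdssManager()
for rate, congested in [(140, False), (90, True), (60, True), (150, False)]:
    mgr.adapt(rate, congested)
```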
39
Buffer Occupancy at In-Transit Nodes with and
without Coupling
  • Buffer occupancy at in-transit nodes before
    congestion is around 50%
  • During congestion, the application-level controller
    throttles data items
  • Buffer occupancy at in-transit nodes is reduced
    from 80% without coupling to 60.8% with coupling
  • Higher buffer occupancies at in-transit nodes
    lead to failures and loss of data

40
Reservoir Characterization: EnKF-based History
Matching (with S. Jha)
  • Black Oil Reservoir Simulator
  • simulates the movement of oil and gas in
    subsurface formations
  • Ensemble Kalman Filter
  • computes the Kalman gain matrix and updates the
    model parameters of the ensemble members (see the
    sketch below)
  • Heterogeneous, dynamic workflows
  • Based on Cactus, PETSc
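For reference, the analysis step such an EnKF workflow repeats is the standard ensemble Kalman filter update; below is a compact, generic numpy sketch (not the Cactus/PETSc implementation used here).

```python
# Generic ensemble Kalman filter analysis step (illustrative, not the simulator code):
# update each ensemble member with the Kalman gain computed from the ensemble
# covariance and perturbed observations.
import numpy as np

def enkf_update(X: np.ndarray, H: np.ndarray, d: np.ndarray, R: np.ndarray) -> np.ndarray:
    """X: (n_state, n_ens) forecast ensemble; H: (n_obs, n_state) observation operator;
    d: (n_obs,) observations; R: (n_obs, n_obs) observation-error covariance."""
    n_ens = X.shape[1]
    A = X - X.mean(axis=1, keepdims=True)            # ensemble anomalies
    P = A @ A.T / (n_ens - 1)                        # sample covariance
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)     # Kalman gain
    D = d[:, None] + np.random.multivariate_normal(  # perturbed observations
        np.zeros(len(d)), R, size=n_ens).T
    return X + K @ (D - H @ X)                       # analysis (updated) ensemble

# Tiny example: 3 state/model parameters, 1 observation, 8 ensemble members.
X = np.random.default_rng(0).normal(size=(3, 8))
H = np.array([[1.0, 0.0, 0.0]])
print(enkf_update(X, H, d=np.array([0.5]), R=np.eye(1) * 0.1).shape)  # (3, 8)
```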

41
Experiment Background and Set-Up (2/2)
  • Key metrics
  • Total Time to Completion (TTC)
  • Total Cost of Completion (TCC)
  • Basic assumptions
  • TG gives the best performance but is a relatively
    more restricted resource.
  • EC2 is relatively more freely available but is
    not as capable.
  • Note that the motivation of our experiments is to
    understand each of the usage scenarios and their
    feasibility, behaviors and benefits, and not to
    optimize the performance of any one scenario.

42
Establishing Baseline Performance
Baseline TTC for EC2 and TG for a 1-stage,
128-ensemble-member EnKF run. The first 4 bars
represent the TTC as the number of EC2 VMs
increases; the next 4 bars represent the TTC as
the number of CPUs (nodes) used increases.
43
Autonomic Integration of HPC Grids and Clouds
(with S. Jha)
  • Acceleration: Clouds used as accelerators to
    improve the application time-to-completion
  • alleviate the impact of queue wait times or
    exploit an additional level of parallelism by
    offloading appropriate tasks to Cloud resources
  • Conservation: Clouds used to conserve HPC Grid
    allocations, given appropriate runtime and budget
    constraints
  • Resilience: Clouds used to handle unexpected
    situations
  • handle unanticipated HPC Grid downtime,
    inadequate allocations or unanticipated queue
    delays (a toy scheduling sketch follows below)
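A toy sketch of how an autonomic scheduler might place a task on the HPC Grid (TG) or the cloud (EC2) under these three usage modes follows; the decision rules, names, and numbers are assumptions for illustration, not the actual scheduler.

```python
# Illustrative task placement under three usage modes (acceleration, conservation,
# resilience). All rules, names, and thresholds are assumptions.
def place_task(mode: str, needs_hpc: bool, tg_minutes_left: float,
               tg_queue_wait_min: float, tg_available: bool) -> str:
    if not tg_available or tg_minutes_left <= 0:
        return "EC2"                   # resilience: grid down or allocation exhausted
    if mode == "conservation" and not needs_hpc:
        return "EC2"                   # conserve the fixed TG allocation
    if mode == "acceleration" and tg_queue_wait_min > 10:
        return "EC2"                   # offload to avoid long queue waits
    return "TG"

for mode in ("acceleration", "conservation", "resilience"):
    print(mode, place_task(mode, needs_hpc=False, tg_minutes_left=20,
                           tg_queue_wait_min=15,
                           tg_available=(mode != "resilience")))
```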

44
Objective I: Using Clouds as Accelerators for HPC
Grids (1/2)
  • Explore how Clouds (EC2) can be used as
    accelerators for HPC Grid (TG) workloads
  • 16 TG CPUs (1 node on Ranger)
  • average queuing time for TG was set to 5 and 10
    minutes
  • the number of EC2 nodes was varied from 20 to 100
    in steps of 20
  • VM start-up time was about 160 seconds

45
Objective I: Using Clouds as Accelerators for HPC
Grids (2/2)
The TTC and TCC for Objective I with 16 TG CPUs
and queuing times set to 5 and 10 minutes. As
expected, the more VMs that are made available,
the greater the acceleration, i.e., the lower the
TTC. The reduction in TTC is roughly linear, but
not perfectly so, because of a complex interplay
between the tasks in the workload and resource
availability.
46
Objective II: Using Clouds for Conserving
CPU-Time on the TeraGrid
  • Explore how to conserve a fixed allocation of CPU
    hours by offloading tasks that perhaps don't need
    the specialized capabilities of the HPC Grid

Distribution of tasks across EC2 and TG, TTC and
TCC, as the CPU-minute allocation on the TG is
increased.
47
Objective III: Response to Changing Operating
Conditions (Resilience) (1/4)
  • Explore the situation where resources that were
    initially planned for become unavailable at
    runtime, either in part or entirely
  • How can Cloud services be used to address this
    situation and allow the system/application to
    respond to a dynamic change in the availability
    of resources?
  • Initially, 16 TG CPUs are allocated for 800 minutes.
    After about 50 minutes of execution (i.e., 3
    tasks completed on the TG), the available CPU
    time is changed so that only 20 CPU minutes remain

48
Objective III: Response to Changing Operating
Conditions (Resilience) (2/4)
Allocation of tasks to TG CPUs and EC2 nodes for
usage mode III. As the 16 allocated TG CPUs
become unavailable after only 70 minutes rather
than the planned 800 minutes, the bulk of the
tasks are completed by EC2 nodes.
49
Objective III: Response to Changing Operating
Conditions (Resilience) (3/4)
Number of TG cores and EC2 nodes as a function of
time for usage mode III. Note that the TG CPU
allocation goes to zero after about 70 minutes
causing the autonomic scheduler to increase the
EC2 nodes by 8.
50
Objective III: Response to Changing Operating
Conditions (Resilience) (4/4)
Overheads of resilience on TTC and TCC.
51
Autonomic Formulations/Programming
52
LLC-based Self-Management in Accord
  • Element/Service Managers are augmented with LLC
    Controllers (a hypothetical look-ahead sketch
    follows below)
  • monitor the state/execution context of elements
  • enforce adaptation actions determined by the
    controller
  • augment human-defined rules
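The sketch below illustrates, in hypothetical form, the limited look-ahead idea behind such LLC controllers: evaluate candidate adaptation actions over a short prediction horizon with a simple system model and pick the lowest-cost one. The model, actions, and costs are made up for illustration and are not the Accord code.

```python
# Hypothetical limited look-ahead (LLC) controller step: simulate each candidate
# action over a short horizon with a crude buffer model, then pick the action whose
# predicted end state is closest to the target occupancy (with a small switching cost).
def predict_buffer(occupancy: float, inflow: float, outflow: float, steps: int) -> float:
    for _ in range(steps):
        occupancy = min(1.0, max(0.0, occupancy + inflow - outflow))
    return occupancy

def llc_step(occupancy: float, inflow: float, horizon: int = 3) -> str:
    # candidate actions: (inflow multiplier, outflow rate)
    actions = {
        "keep":            (1.0, 0.10),
        "raise-send-rate": (1.0, 0.30),
        "throttle-source": (0.5, 0.10),
    }
    def cost(name: str) -> float:
        mult, out = actions[name]
        end = predict_buffer(occupancy, inflow * mult, out, horizon)
        return abs(end - 0.5) + (0.0 if name == "keep" else 0.05)  # target ~50% full
    return min(actions, key=cost)

print(llc_step(occupancy=0.8, inflow=0.25))  # expected: "raise-send-rate"
```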

53
The Instrumented Oil Field
  • Production of oil and gas can take advantage of
    installed sensors that monitor the reservoir's
    state as fluids are extracted
  • Knowledge of the reservoir's state during
    production can result in better engineering
    decisions
  • economical evaluation; physical characteristics
    (bypassed oil, high-pressure zones); production
    techniques for safe operating conditions in
    complex and difficult areas

"Application of Grid-Enabled Technologies for
Solving Optimization Problems in Data-Driven
Reservoir Studies," M. Parashar, H. Klie, U.
Catalyurek, T. Kurc, V. Matossian, J. Saltz and M.
Wheeler, FGCS: The International Journal of Grid
Computing: Theory, Methods and Applications,
Elsevier Science Publishers, Vol. 21, Issue 1,
pp. 19-26, 2005.
54
Effective Oil Reservoir Management: Well
Placement/Configuration
  • Why is it important?
  • Better utilization/cost-effectiveness of existing
    reservoirs
  • Minimizing adverse effects to the environment

Figure: bad management leaves much bypassed oil;
better management leaves less bypassed oil.
55
Autonomic Reservoir Management: Closing the
Loop Using Optimization
Dynamic Decision System
Dynamic Data-Driven Assimilation
  • Optimize
  • Economic revenue
  • Environmental hazard
  • Based on the present subsurface knowledge and
    numerical model

Figure: closed-loop workflow (from START) linking
remote sensing data acquisition, data
assimilation, updating knowledge of the model,
subsurface characterization, management
decisions, and planning of optimal data
acquisition (experimental design), supported by
autonomic Grid middleware and Grid data
management and processing middleware.
56
An Autonomic Well Placement/Configuration Workflow
Oil prices, Weather, etc.
57
Autonomic Oil Well Placement/Configuration
Figure: contours of NEval(y,z,500)(10); pressure
contours for 3 wells and a 2D permeability
profile. Exhaustive search requires NYxNZ (450)
evaluations (the minimum is marked); the VFSA
solution walk finds it after 20 (81) evaluations.
A generic VFSA sketch follows below.
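The sketch below gives a generic very fast simulated annealing (VFSA) loop of the kind used for this search; the objective function, bounds, and schedule constants are placeholders rather than the actual reservoir objective and settings.

```python
# Generic VFSA-style optimizer sketch (placeholders, not the reservoir code): candidate
# well coordinates are drawn from a temperature-dependent Cauchy-like distribution and
# the temperature decays as T0 * exp(-c * k**(1/D)).
import math, random

def vfsa_sample(x: float, lo: float, hi: float, T: float) -> float:
    u = random.random()
    y = math.copysign(T * ((1.0 + 1.0 / T) ** abs(2.0 * u - 1.0) - 1.0), u - 0.5)
    return min(hi, max(lo, x + y * (hi - lo)))

def vfsa(objective, bounds, iters=200, T0=1.0, c=1.0):
    D = len(bounds)
    x = [lo + random.random() * (hi - lo) for lo, hi in bounds]
    fx = objective(x)
    best, best_f = list(x), fx
    for k in range(1, iters + 1):
        T = T0 * math.exp(-c * k ** (1.0 / D))
        cand = [vfsa_sample(xi, lo, hi, T) for xi, (lo, hi) in zip(x, bounds)]
        f = objective(cand)
        # Metropolis acceptance on the current point; track the best separately.
        if f < fx or random.random() < math.exp(-(f - fx) / max(T, 1e-12)):
            x, fx = cand, f
        if f < best_f:
            best, best_f = list(cand), f
    return best, best_f

# Toy objective standing in for the reservoir economics: minimize distance to (30, 70).
obj = lambda p: (p[0] - 30) ** 2 + (p[1] - 70) ** 2
print(vfsa(obj, bounds=[(0, 100), (0, 100)]))
```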
58
Autonomic Oil Well Placement/Configuration (VFSA)
"An Autonomic Reservoir Framework for the
Stochastic Optimization of Well Placement," V.
Matossian, M. Parashar, W. Bangerth, H. Klie,
M.F. Wheeler, Cluster Computing: The Journal of
Networks, Software Tools, and Applications,
Kluwer Academic Publishers, Vol. 8, No. 4, pp.
255-269, 2005.
"Autonomic Oil Reservoir Optimization on the
Grid," V. Matossian, V. Bhat, M. Parashar, M.
Peszynska, M. Sen, P. Stoffa and M. F. Wheeler,
Concurrency and Computation: Practice and
Experience, John Wiley and Sons, Volume 17, Issue
1, pp. 1-26, 2005.
59
Summary
  • CI and emerging computational ecosystems
  • Unprecedented opportunity
  • new thinking, practices in science and
    engineering
  • Unprecedented research challenges
  • scale, complexity, heterogeneity, dynamism,
    reliability, uncertainty,
  • Autonomic Computing can address complexity and
    uncertainty
  • Separation + Integration + Automation
  • Experiments with Autonomics for science and
    engineering
  • Autonomic data streaming and in-transit data
    manipulation, Autonomic Workflows, Autonomic
    Runtime Management,
  • However, there are implications
  • Added uncertainty
  • Correctness, predictability, repeatability
  • Validation

60
Thank You!
Email: parashar@rutgers.edu