Component-based Grid Environment for Programming Scientific Applications - PowerPoint PPT Presentation


1
Component-based Grid Environment for Programming
Scientific Applications
  • Maciej Malawski

2
Outline
  • Problem: programming applications on the Grid
  • Programming models and virtualization
  • CCA and H2O
  • Extensions to the environment
  • Applications and tests
  • Summary and future work

3
Experience (CrossGrid): Grid is complex
4
Problem: how to program grid applications
  • Scientific applications
  • Compute intensive
  • May be data-intensive
  • Often custom-made
  • Written in many programming languages (e.g.
    Fortran)
  • Collaborative
  • Current practice on Grid
  • Write a JDL script which submits a shell script
    as a batch job, which uses SSH to launch a
    process on the head node of the cluster to serve
    as a proxy for communication... (from CGW'06
    presentation by ICM)
  • Submit a shell script which queries the LFC
    catalog, retrieves a TAR archive from the SE using
    GridFTP, unpacks the archive, runs another
    computing script, stores the output on the SE and
    registers it in the LFC catalog (a biomedical
    application, CGW'06)
  • Problems with scientific computing (IPDPS'05
    panel discussion)
  • Software
  • Software
  • Software... engineering

5
Two key challenges
  • Programming model
  • Suitable for the distributed environment
  • Allowing management of complex applications
  • Supported by standards
  • Supporting scientific applications
  • Facilitating programming
  • Virtualization
  • Hiding the complexity of heterogeneous
    environment
  • Allowing dynamic creation/acquisition of pools of
    resources on demand

6
Research objectives
  • Concept of programming environment for scientific
    applications on Grid
  • Analysis of programming models for grid
    applications
  • Identification of desired features of programming
    environment
  • Prototype implementation and feasibility study
  • Verification of the model and prototype with
    typical applications
  • Thesis (provisional)
  • An extended component model may be used to create
    a grid environment for programming and running
    complex scientific applications.

7
Many programming models
  • MPI, PVM
  • Custom protocols
  • Tuple spaces, HLA
  • Distributed objects
  • Active objects
  • Components
  • Skeletons
  • Service Oriented Architectures, Web Services

8
Virtualization: state of the art (incomplete)
  • Globus GRAM, Condor, VDT, gLite, Unicore
  • large-scale batch job oriented submission
    systems
  • Virtual Workspaces: using Globus to submit VMware
    (or other type) virtual machines to create a
    Condor pool of resources, which can in turn be
    accessible using Globus Toolkit
  • Cannot call it a lightweight solution!
  • SOA: everything accessible as a Web Service
  • Efforts to support dynamic service deployment
  • Component model: a container provides a
    virtualization layer for hosting components
  • Dynamic deployment directly embedded into a
    programming model (component as the unit of
    deployment)

9
What are components?
  • A unit of software development/deployment/reuse
  • i.e. has interesting functionality
  • Ideally, functionality someone else might be able
    to (re)use
  • Can be developed independently of other
    components
  • Interacts with the outside world only through
    well-defined interfaces
  • Can be composed with other components
  • Plug and play model to build applications
  • Composition based on interfaces
  • Hosted in a framework/container responsible for
    other services (communication, security)
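
The properties listed above (interaction only through well-defined interfaces, composition by connecting interfaces) can be illustrated with a toy model. This is a minimal sketch, not the API of CCA, MOCCA, or any other framework: the names `Component`, `provide`, `use`, `call`, and `connect` are hypothetical and exist only for this example.

```python
class Component:
    """Toy component: offers 'provides' ports and declares 'uses' ports."""

    def __init__(self, name):
        self.name = name
        self.provides = {}  # port name -> callable offered to other components
        self.uses = {}      # port name -> callable obtained from a peer, or None

    def provide(self, port, func):
        self.provides[port] = func

    def use(self, port):
        # Declare a required interface; it stays unbound until connect() is called.
        self.uses[port] = None

    def call(self, port, *args):
        target = self.uses[port]
        if target is None:
            raise RuntimeError(f"uses port '{port}' of {self.name} is not connected")
        return target(*args)


def connect(user, uses_port, provider, provides_port):
    """Compose two components by binding a uses port to a provides port."""
    if uses_port not in user.uses:
        raise KeyError(f"{user.name} declares no uses port '{uses_port}'")
    user.uses[uses_port] = provider.provides[provides_port]


# Two independently written components, composed only through their ports.
server = Component("server")
server.provide("hello", lambda who: f"Server says hello to {who}")

client = Component("client")
client.use("greeting")

connect(client, "greeting", server, "hello")
message = client.call("greeting", "grid")
```

Because the client only knows the port name, the provider can be swapped for another implementation without touching the client code.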

10
Benefits of Component-based Approach
  • Enables composing applications from blocks which
    originally were not designed to be combined
  • Addresses software complexity issues
  • Many frameworks provide language interoperability
  • Enforcement of separation of interface from
    implementation
  • Facilitates managing third party libraries
  • Allows easy swapping of implementation
  • Increases software productivity
  • Mature and successful technology in business and
    desktop applications

11
Components vs. Web Services
  • Component
  • Formal models for component programming (e.g.
    Fractal)
  • May be created on-demand, e.g. more components
    deployed when needed
  • Explicitly declare required interfaces (uses
    ports): can be directly connected, with no need to
    pass invocation data via a central workflow engine
  • May have parallel connections
  • Does not require SOAP as a protocol

12
Proposed approach to building grid environment
  • Use a component model
  • Apply a virtualization layer
  • Design a base component environment with a set of
    desired features
  • Extend the environment features

13
Desired features of Grid components
  • Scalable to different environments (from laptops
    to HPC clusters)
  • lightweight platform
  • dynamic, pluggable, reconfigurable at runtime
  • Facilitated deployment on shared resources
  • Virtualization (creating dynamic workspaces)
  • Dynamic (hot) deployment
  • Communication adjusted to various levels of
    coupling
  • P2P, WANs, LANs, intercluster connections, direct
    binding in one process
  • supporting parallelism
  • Supporting multiple languages
  • allowing easy adaptation of legacy code
  • combining Java flexibility with optimized Fortran
    libraries
  • Facilitating programming
  • composable in space and in time
  • taking advantage of semantic description and
    reasoning
  • Adapted to unreliable Grid environment
  • supporting dynamic and interactive
    reconfiguration of connections, locations,
    bindings
  • providing support for migration and checkpointing
  • Interoperability with grid standards

14
State of the art: examples of solutions
(incomplete)
  • Scalable to different environments (from laptops
    to HPC clusters)
  • HPC: CCAFFEINE, GridCCM
  • Lightweight: XCAT, ProActive, ICENI
  • Facilitated deployment on shared resources
  • ProActive, XCAT (using Globus)
  • Communication adjusted to various levels of
    coupling
  • CCAFFEINE: direct binding, MPI; XCAT: SOAP
  • optimized communication: IBIS, GridCCM
  • Parallel, collective communication: GridCCM,
    IBIS, ProActive
  • Supporting multiple languages
  • legacy code: Babel
  • Interoperability: CORBA, SOAP
  • Facilitating programming
  • composable in space and in time: XCAT, ICENI, GCM
    (hierarchical)
  • Skeleton approach: HOC, ASSIST
  • taking advantage of semantic description and
    reasoning: ICENI, Semantic Web Services
  • Adapted to unreliable Grid environment
  • dynamic and interactive reconfiguration:
    ProActive, XCAT, Web Services model
  • migration and checkpointing: ProActive, XCAT

15
Base for the Solution: CCA and H2O
  • Common Component Architecture (CCA)
  • Component standard for HPC
  • Uses and provides ports described in SIDL
  • Support for scientific data types
  • Existing tightly coupled (CCAFFEINE) and loosely
    coupled, distributed (XCAT) frameworks
  • H2O
  • Java-based distributed resource sharing platform
  • Providers set up an H2O kernel (container)
  • Allowed parties can deploy pluglets (components)
  • Separation of roles: decoupling
  • Providers from deployers
  • Providers from each other
  • RMIX: efficient multiprotocol RMI extension
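
The separation of roles described above (a provider runs the container, and only allowed parties deploy into it) can be sketched as follows. This is a toy model under stated assumptions, not the H2O API: `Kernel`, `deploy`, and `Echo` are hypothetical names, and the access check is reduced to a simple allow-list.

```python
class Kernel:
    """Toy stand-in for a container: the provider decides who may deploy."""

    def __init__(self, provider, allowed_deployers):
        self.provider = provider
        self.allowed = set(allowed_deployers)
        self.pluglets = {}  # pluglet name -> running instance

    def deploy(self, deployer, name, factory):
        # The provider's policy is checked before any code is instantiated.
        if deployer not in self.allowed:
            raise PermissionError(
                f"{deployer} may not deploy on the kernel of {self.provider}"
            )
        self.pluglets[name] = factory()
        return self.pluglets[name]


class Echo:
    """Hypothetical pluglet supplied by the deployer, not by the provider."""

    def ping(self, data):
        return data


kernel = Kernel(provider="site-a", allowed_deployers=["alice"])
echo = kernel.deploy("alice", "echo", Echo)

try:
    kernel.deploy("mallory", "echo2", Echo)
    denied = False
except PermissionError:
    denied = True
```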

16
Example scenarios of H2O
17
Features of the environment
  • Scalable to different environments (from laptops
    to HPC clusters)
  • lightweight platform: use H2O
  • dynamic, pluggable, reconfigurable at runtime:
    dynamic CCA model, H2O kernel facilities
  • Facilitated deployment on shared resources
  • Static virtualization by using H2O kernel as a
    daemon
  • Dynamic virtualization using a pool of transient
    H2O kernels created on-demand
  • Communication adjusted to various levels of
    coupling
  • Offered by RMIX library of H2O
  • Parallel extensions for CCA: multiple ports
  • Facilitating programming
  • Composition in time: low-level Python or Ruby
    scripting, high-level Virolab/GridSpace
    programming environment
  • Semantic description: under development within
    Virolab
  • Supporting multiple languages
  • Integration of RMIX with Babel
  • Integration of MOCCA with Babel: pending
  • Interoperability with grid standards
  • Web Services: future work (technically feasible
    with either RMIX or an embedded XFire server)
  • Grid Component Model (ProActive/Fractal)
    interoperability: recent work
  • Adapted to unreliable Grid environment

18
MOCCA: a basic component framework
  • Each component is a separate pluglet
  • Dynamic remote deployment of components
  • Components packaged as JAR files
  • Security: Java sandboxing, detailed access policy
  • Using RMIX for communication: efficiency,
    multiprotocol interoperability
  • Flexibility and multiple scenarios as in H2O
  • MOCCA_Light: pure Java implementation
  • Java API or Jython and Ruby scripting for
    application assembly
  • http://www.icsr.agh.edu.pl/mambo/mocca

19
Dynamic virtualization
  • A pool of computing resources may be created by
    submitting a number of H2O kernels on many Grid
    sites
  • Application components may be deployed on the
    kernels belonging to the pool
  • Virtual resource pool may be used by a single
    user or shared for collaboration
  • Interaction with cluster nodes in a private
    network: JXTA transport (needs more testing)
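
Deploying application components onto a pool of kernels amounts to a mapping from components to pool members. The sketch below uses plain round-robin assignment, which is an assumption made for illustration and not the placement policy of the environment described here. The kernel and component names are made up.

```python
from itertools import cycle


def place_round_robin(components, kernels):
    """Assign each component name to a kernel, cycling through the pool."""
    if not kernels:
        raise ValueError("the kernel pool is empty")
    pool = cycle(kernels)
    return {component: next(pool) for component in components}


# Hypothetical pool, e.g. kernels submitted to three different Grid sites.
kernels = ["kernel@site-a", "kernel@site-b", "kernel@site-c"]
components = ["starter", "generator", "annealing-1", "annealing-2", "storeroom"]

placement = place_round_robin(components, kernels)
```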

20
Communication extension: RMIX over JXTA
  • Fully operational RMI implementation running over
    JXTA P2P network
  • Methods can be invoked on remote objects located
    behind firewalls or NATs
  • Our implementation of JXTA socket factories
    manages all the JXTA connectivity transparently
    from the user's point of view

21
Parallelism: Extensions of CCA for Multiple Ports
and Connections
  • Multiple users of one provides port (easy part)
  • Single provides port
  • Naming convention for client components (client1,
    client2, ...)
  • Single client of multiple providers
  • Need multiple uses ports on the client side
  • Use ParameterPort of CCA to parametrize the
    number of uses ports
  • Client component creates a required number of
    uses ports
  • Naming convention for server components and uses
    port names
  • Extension of CCA BuilderService: MultiBuilder
  • Creation of multiple components
  • Handling multiple connections
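
The single-client, multiple-provider case above can be sketched in a few lines: the number of uses ports is a parameter, port names follow a numbering convention, and a builder creates and connects all providers in one step. This is a toy model; `Client`, `numbered`, and `multi_build` are hypothetical names and do not reproduce the MultiBuilder interface.

```python
def numbered(base, count):
    """Naming convention: base1, base2, ..., baseN."""
    return [f"{base}{i}" for i in range(1, count + 1)]


class Client:
    def __init__(self, provider_count):
        # One uses port per provider; the count arrives as a component parameter.
        self.uses = {name: None for name in numbered("server", provider_count)}

    def call_all(self, value):
        """Invoke every connected provider and collect results by port name."""
        return {port: func(value) for port, func in self.uses.items()}


def multi_build(client, make_provider):
    """Create one provider per uses port and connect each pair."""
    providers = {}
    for index, port in enumerate(client.uses, start=1):
        providers[port] = make_provider(index)
        client.uses[port] = providers[port]
    return providers


client = Client(provider_count=3)
# Each hypothetical provider multiplies its input by its own index.
multi_build(client, lambda index: (lambda x: x * index))
results = client.call_all(10)
```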

22
Support for composition in space and in time
  • Declarative vs. imperative programming
  • Composition in space
  • Graph of component connections
  • ADL: Application Description Language
  • Supported by MOCCAccino
  • Composition in time
  • Workflow model (script)
  • Centralized execution
  • Currently supported: low-level scripting in Jython
    and JRuby
  • High-level scripting developed within Virolab
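
The two composition modes above differ in what the programmer writes down. In the declarative case it is a graph of connections. In the imperative case it is a script that invokes components in order from one central place. The sketch below contrasts the two with made-up component names and stand-in functions; it does not use any MOCCA or scripting API.

```python
# Composition in space: a static description of which port connects to which.
# Each entry is (source component, source port, target component, target port).
connections = [
    ("generator", "out", "annealing", "in"),
    ("annealing", "out", "storeroom", "in"),
]


def downstream(graph, component):
    """Components that receive data directly from the given component."""
    return [dst for src, _, dst, _ in graph if src == component]


# Composition in time: a script calls the components one after another.
def generate():
    return [3, 1, 2]


def anneal(configuration):
    return sorted(configuration)  # stand-in for a real computation


store = []


def run_workflow():
    configuration = generate()
    result = anneal(configuration)
    store.append(result)
    return result


workflow_result = run_workflow()
```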

23
Composition in space - MOCCAccino
  • ADLM (ADL for MOCCAccino): XML-based language
    for
  • Describing types and number of components and
    their connections
  • Concept of hierarchical component groups
  • Optional information to specify resources
  • Hints for deployment of components (whether they
    are computation intensive or communication
    intensive).
  • Application Manager: responsible for
  • Discovering available kernel pool
  • Planning optimal location of components
  • Deploying components in specified kernels
  • Connecting components
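
The flow from an XML description to a deployment plan can be sketched as below. The XML vocabulary here is invented for this example and is NOT the real ADLM schema. The planner is a simple greedy heuristic (computation-intensive instances go to the fastest kernels first), chosen only to show where deployment hints could enter. The kernel names and speed figures are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical description format, not the actual ADLM schema.
ADL_TEXT = """
<application>
  <component name="generator" count="1" hint="communication"/>
  <component name="annealing" count="2" hint="computation"/>
  <connection from="generator" to="annealing"/>
</application>
"""


def parse(text):
    """Return (instances, links) from the description."""
    root = ET.fromstring(text)
    instances = []
    for comp in root.findall("component"):
        for i in range(int(comp.get("count", "1"))):
            instances.append((f"{comp.get('name')}{i + 1}", comp.get("hint")))
    links = [(c.get("from"), c.get("to")) for c in root.findall("connection")]
    return instances, links


def plan(instances, kernels):
    """Greedy placement: computation-intensive instances get fast kernels first."""
    fastest_first = sorted(kernels, key=kernels.get, reverse=True)
    ordered = sorted(instances, key=lambda item: item[1] != "computation")
    return {
        name: fastest_first[i % len(fastest_first)]
        for i, (name, _) in enumerate(ordered)
    }


instances, links = parse(ADL_TEXT)
# Relative speeds of the discovered kernels (illustrative numbers).
placement = plan(instances, {"laptop": 1.0, "cluster-node": 4.0, "workstation": 2.0})
```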

24
MOCCAccino usage
25
Motivation for multiprotocol and multilanguage
interoperability
  • Grids are heterogeneous
  • Multiple programming languages in single
    application
  • Java for middleware
  • C for system programming
  • FORTRAN for computing
  • Python for scripting
  • Multiple protocols in single application
  • High speed local networks (Myrinet)
  • TCP/SSL/TLS in WAN
  • SOAP for loosely coupled message exchange
  • Overlay P2P networks for traversing private
    network boundaries (NATs)
  • Context: MOCCA component framework

26
Multilanguage Solution - Babel
  • SIDL: Scientific Interface Definition Language
  • Standard for CCA Components
  • Supports arrays and complex types
  • Focus on interfaces
  • Babel
  • SIDL parser
  • Code generator
  • Runtime library
  • Intermediate Object Representation (IOR)
  • Core of Babel object
  • Array of function pointers
  • Generated code in C

SIDL interface:
  package example version 1.2 {
    class Hello { string hello(in string hello); }
  }

Java implementation skeleton:
  // user defined non-static methods
  /** Method: hello */
  public java.lang.String hello_Impl (
    /*in*/ java.lang.String hello )
  {
    // DO-NOT-DELETE splicer.begin(example.Hello.hello)
    // Insert-Code-Here {example.Hello.hello} (hello)
    return "Server says hello";
    // DO-NOT-DELETE splicer.end(example.Hello.hello)
  }

C binding:
  /* Method: hello */
  char* example_Hello_hello(
    /*in*/ example_Hello self, /*in*/ const char* hello);
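
The "array of function pointers" at the core of a Babel object can be pictured with a loose analogy: a method table whose slot order is agreed on by caller and callee, so the caller needs no knowledge of the implementation behind it. The sketch below is only that analogy in Python. It does not reproduce the generated C structures, and `HelloImpl`, `make_ior`, and `HELLO_METHODS` are hypothetical names.

```python
class HelloImpl:
    """Stand-in for an implementation written in some other language."""

    def hello(self, who):
        return f"Server says hello to {who}"


def make_ior(impl, method_names):
    """Build a method table: a fixed-order list of bound callables."""
    return [getattr(impl, name) for name in method_names]


# Slot order is part of the interface contract shared by both sides.
HELLO_METHODS = ["hello"]

ior = make_ior(HelloImpl(), HELLO_METHODS)

# The caller dispatches through the table slot, not through the class.
reply = ior[HELLO_METHODS.index("hello")]("grid")
```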
27
Currently: Babel for Local Applications
  • All Babel objects in one process
  • Implemented in CCAFFEINE framework
  • Existing multilanguage CCA components: see CCA
    tutorial

[Figure labels: Java application, Babel IOR, Babel IOR]
28
Our Solution
  • Babel and RMIX
  • Implementation of Babel RMI extensions
  • generic mechanism of method invocation
    (reflection)
  • Dynamic loading of communication library
  • No need for code generation and compilation
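
A generic, reflection-based invocation mechanism looks a method up by name at run time instead of relying on generated stubs. The sketch below shows the idea with Python's `getattr` and a JSON-encoded request. The wire format, `Solver`, and `handle_request` are assumptions made for this example and have nothing to do with the actual Babel RMI or RMIX protocols.

```python
import json


class Solver:
    """Hypothetical remote object."""

    def scale(self, values, factor):
        return [v * factor for v in values]


def handle_request(target, request_text):
    """Decode a request, look the method up by name, and invoke it."""
    request = json.loads(request_text)
    name = request["method"]
    method = getattr(target, name, None)
    # Refuse private names and anything that is not callable.
    if name.startswith("_") or not callable(method):
        raise AttributeError(f"no such remote method: {name}")
    return json.dumps({"result": method(*request["args"])})


request_text = json.dumps({"method": "scale", "args": [[1.0, 2.0], 3]})
response = handle_request(Solver(), request_text)
```

No per-interface code is generated or compiled here: adding a method to `Solver` makes it callable immediately, which is the property the list above highlights.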

[Figure labels: Java application, Babel IOR, RMIX library, RMIX library, Babel IOR, Fortran native library]
29
Interoperability with Grid Component Model
(CoreGRID)
  • Based on Fractal Model
  • Deployment Functionalities
  • Asynchronous and extensible port semantics
  • Collective Interfaces
  • Autonomicity and adaptivity thanks to
    autonomic and dynamic controllers
  • Support for language neutrality and
    interoperability

30
Motivation for interoperability
  • Framework interoperability is an important issue
    for GCM
  • Existing component models and frameworks for
    Grids
  • CCA, CCM
  • Already existing legacy components
  • ProActive/Fractal and H2O/MOCCA: alternative
    Java-based frameworks for distributed computing.
    Can they interoperate?

31
Fractal vs. CCA
  • Similarities: general for most component models
  • Separation of interface from implementation
  • Composition by connecting interfaces
  • Differences
  • Fractal components are reflective (introspection),
    whereas CCA components take the initiative to
    add/remove ports at runtime
  • BindingController in Fractal vs. BuilderService
    in CCA
  • No ContentController in CCA (and no hierarchy)
  • Factory interface in Fractal vs. BuilderService
    in CCA
  • AttributeController in Fractal vs. ParameterPort
    in CCA
  • No ADL in CCA

32
Approaches to integration
  • Single component integration
  • Wrapping a CCA component into a primitive GCM one
  • Allows using a CCA component in a GCM framework
  • Framework interoperability
  • Ability for two component frameworks to
    interoperate
  • Allows connecting a CCA component assembly
    (running in a CCA framework) to a GCM component
    application

33
Solutions to typing issues
  • Generate the type of a wrapped CCA component at
    runtime (at initialization)
  • Pros: fully automated
  • Cons: restricted to ports which are declared by
    the CCA component during initialization (at the
    setServices() call)
  • Manual description of a CCA component in ADL
    format
  • Pros: generic solution
  • Cons: requires additional work from the developer
  • (Semi)automatic generation of ADL
  • May combine approaches 1 and 2
  • Reuse existing CCA type specifications (SIDL,
    CCAFFEINE scripting, others not standardized)
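
The first approach above, deriving the type of a wrapped component from the ports it declares during initialization, can be sketched as follows. This is a toy model: `Services`, `AnnealingComponent`, and `derive_type` are hypothetical Python stand-ins, and the port and type names are made up. It also shows the stated limitation: a port registered after initialization does not appear in the derived type.

```python
class Services:
    """Records the ports a component registers with the framework."""

    def __init__(self):
        self.provides = {}
        self.uses = {}

    def add_provides_port(self, name, type_name):
        self.provides[name] = type_name

    def register_uses_port(self, name, type_name):
        self.uses[name] = type_name


class AnnealingComponent:
    def set_services(self, services):
        # Ports declared at initialization are visible to the wrapper.
        self.services = services
        services.add_provides_port("control", "ControlPort")
        services.register_uses_port("molecule", "MoleculePort")

    def add_late_port(self):
        # A port added later is missed by a type derived at initialization.
        self.services.add_provides_port("statistics", "StatisticsPort")


def derive_type(component):
    """Snapshot the declared ports right after initialization."""
    services = Services()
    component.set_services(services)
    return {"server": dict(services.provides), "client": dict(services.uses)}


component = AnnealingComponent()
component_type = derive_type(component)
component.add_late_port()
```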

34
Technical approach: CCA controller
  • Creates glue components for all ports (client and
    server)
  • Connects glue to CCA system (using CCA builder)
    and to membrane (using BC)

35
Glue Components
  • Server Glue
  • Deployed as Fractal component
  • Uses MOCCA client code to delegate invocation to
    CCA interface
  • Can be also deployed on H2O kernel
  • Client Glue
  • Deployed as CCA component in H2O kernel
  • Launches ProActive runtime in H2O kernel
  • Creates Fractal component in this runtime
  • Both
  • Can be generated from the interface type (TODO)

36
ProActive and MOCCA
  • MOCCA invocations are synchronous
  • Composite (membrane) should be synchronous to
    avoid deadlocks
  • Or, we may consider generating glue with wrapped
    types (IntWrapper, etc.); this changes the types of
    interfaces
  • Class loading issues
  • The classes generated by ProActive runtime must
    be visible to the code running in H2O kernel
  • The RMI class loading works fine if the codebase
    is set properly on ProActive side

37
Communication Intensive Application Benchmark
  • Simplified scenario
  • 2 components
  • Provides port: receive and send back an array of
    doubles (ping-pong)
  • Tested on local Gigabit Ethernet and on
    transatlantic Internet between Atlanta and Krakow
  • 2.4 GHz Linux machines
  • Comparison with XCAT
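
The structure of such a ping-pong measurement can be sketched as below. The sketch runs entirely in one process, with a local function standing in for the remote provides port, so its timings say nothing about RMIX, XCAT, or any network. `echo` and `ping_pong` are hypothetical names, and the array size and repetition count are arbitrary.

```python
import time
from array import array


def echo(payload):
    """Stand-in for the provides port: return the array it receives."""
    return payload


def ping_pong(port, size, repetitions):
    """Send an array of doubles back and forth; report the mean round trip."""
    payload = array("d", [0.0] * size)
    reply = payload
    start = time.perf_counter()
    for _ in range(repetitions):
        reply = port(payload)
    elapsed = time.perf_counter() - start
    # Each round trip moves the array twice: once out, once back.
    bytes_moved = 2 * repetitions * size * payload.itemsize
    return elapsed / repetitions, bytes_moved, reply


mean_round_trip, bytes_moved, reply = ping_pong(echo, size=1000, repetitions=100)
```

Replacing `echo` with a proxy to a remote component would turn this loop into a real measurement; averaging over many repetitions smooths out one-off effects such as connection setup.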

38
Small Data Packets
  • Factors
  • SOAP header overhead in XCAT
  • Connection pools in RMIX

39
Large Data Packets
  • Encoding (binary vs. base64)
  • CPU saturation on Gigabit LAN (serialization)
  • Variance caused by Java garbage collection
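
The size difference between binary and base64 encoding is easy to quantify: a double takes 8 bytes in binary form, while base64 emits 4 text characters for every 3 input bytes, an inflation of about one third before any envelope overhead. The sketch below only measures payload sizes with the Python standard library and makes no claim about how either framework encodes its messages.

```python
import base64
import struct


def encoded_sizes(values):
    """Return (binary_bytes, base64_bytes) for a list of doubles."""
    # "<Nd" packs N little-endian IEEE 754 doubles, 8 bytes each.
    binary = struct.pack(f"<{len(values)}d", *values)
    text = base64.b64encode(binary)
    return len(binary), len(text)


binary_size, base64_size = encoded_sizes([0.5] * 3000)
overhead = base64_size / binary_size
```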

40
Automatic Flow Composer Example
  • Compose application graph from initial data (e.g.
    initial ports) or incomplete graph
  • First implemented for XCAT framework
  • Easy migration to MOCCA
  • Modification of code required (xcat.Port)
  • Similar performance for XCAT and MOCCA (exchange
    of text documents)

41
Other applications
  • Domain decomposition (some student toy apps)
  • Data mining using Weka (as a Virolab example)

42
Gold Cluster Application
  • Components
  • Starter: a driver component for the
    application, provides a Go port
  • Configuration generator: random initial
    configurations
  • Simulated annealing: compute-intensive
    simulation component
  • Storeroom: used for keeping results and
    statistics
  • Gather: auxiliary component for passing
    molecules
  • Ports
  • Molecule: offers getMolecule() method
  • Control ports for steering the application
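
How the roles above fit together can be sketched with a toy pipeline: a generator produces random initial configurations, an annealing step improves each one, and a storeroom keeps the results. Everything below is an assumption made for illustration. The one-dimensional energy function is not a gold-cluster potential, the linear cooling schedule is arbitrary, and the functions are plain Python rather than components.

```python
import math
import random


def energy(x):
    """Toy energy with a single minimum at x = 2."""
    return (x - 2.0) ** 2


def generate_configuration(rng):
    """Role of the configuration generator: a random starting point."""
    return rng.uniform(-10.0, 10.0)


def anneal(x, rng, steps=2000, start_temperature=5.0):
    """Role of the simulated annealing component (Metropolis acceptance)."""
    best = x
    for step in range(steps):
        temperature = start_temperature * (1.0 - step / steps) + 1e-9
        candidate = x + rng.gauss(0.0, 0.5)
        delta = energy(candidate) - energy(x)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            x = candidate
        if energy(x) < energy(best):
            best = x
    return best


def run(seed, configurations=4):
    """Role of the starter: drive the pipeline and fill the storeroom."""
    rng = random.Random(seed)
    storeroom = []
    for _ in range(configurations):
        start = generate_configuration(rng)
        storeroom.append((start, anneal(start, rng)))
    return storeroom


storeroom = run(seed=1)
```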

43
Resources and Results
  • Using heterogeneous infrastructure available
    ad-hoc
  • Local machine
  • SSH access
  • Cluster in CYFRONET
  • PBS
  • CrossGrid testbed (LCG-based middleware)
  • Clusters in PSNC Poznan and IFCA Santander
  • Java VMs already installed
  • Cluster nodes allow remote point-to-point
    communication (MPICH-enabled: no firewalls!)
  • Problem size grows with number of nodes (weak
    scaling)
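
In a weak-scaling experiment the work per node stays constant, so the ideal run time is flat and efficiency is commonly reported as T(1) / T(N). The sketch below computes that ratio. The timings are made-up numbers for illustration only and are not measurements from this work.

```python
def weak_scaling_efficiency(time_one_node, time_n_nodes):
    """Efficiency = T(1) / T(N) when the work per node is held constant."""
    if time_one_node <= 0 or time_n_nodes <= 0:
        raise ValueError("run times must be positive")
    return time_one_node / time_n_nodes


# Hypothetical run times in seconds, keyed by node count.
timings = {1: 100.0, 2: 104.0, 4: 110.0, 8: 125.0}

efficiency = {
    nodes: weak_scaling_efficiency(timings[1], elapsed)
    for nodes, elapsed in timings.items()
}
```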

44
Future work
  • Optimization algorithms (scheduling) for ADL and
    scripting models
  • Monitoring support (Gemini)
  • Formal model (adapted from GCM)
  • Further integration with Babel
  • More applications

45
Summary
  • Analysis of programming models for Grid,
    selection of component model
  • Design and implementation of CCA framework based
    on H2O platform
  • Extending applicability of H2O for dynamically
    created pools of resources (user-centric or
    ad-hoc created VOs)
  • Extensions for parallel-distributed CCA
    components
  • Support for time and space composition modes by
    high-level scripting and ADL-based application
  • Towards multilanguage interop
  • Supporting interoperability between component
    models

46
Key papers
  • Maciej Malawski, Dawid Kurzyniec, and Vaidy
    Sunderam. MOCCA: towards a distributed CCA
    framework for metacomputing. In Proceedings of
    the 10th International Workshop on High-Level
    Parallel Programming Models and Supportive
    Environments (HIPS2005), 2005. IEEE Computer
    Society.
  • Maciej Malawski, Marian Bubak, Michal Placek,
    Dawid Kurzyniec, and Vaidy Sunderam. Experiments
    with distributed component computing across Grid
    boundaries. In Proceedings of the
    HPC-GECO/CompFrame workshop in conjunction with
    HPDC 2006, 2006.
  • P. Jurczyk, M. Golenia, M. Malawski, D.
    Kurzyniec, M. Bubak, V. S. Sunderam, Enabling
    Remote Method Invocations in Peer-to-Peer
    Environments: RMIX over JXTA, in Roman
    Wyrzykowski, Jack Dongarra, Norbert Meyer, Jerzy
    Wasniewski (Eds.), Parallel Processing and
    Applied Mathematics, 6th International
    Conference, PPAM 2005, Poznan, Poland, September
    11-14, 2005, Revised Selected Papers, Lecture
    Notes in Computer Science, 3911, Springer, 2006,
    pp. 667-674.
  • M. Malawski, D. Harezlak, M. Bubak, Towards
    Multiprotocol and Multilanguage Interoperability:
    Experiments with Babel and RMIX, in M. Bubak, M.
    Turala, K. Wiatr (Eds.), Proceedings of Cracow
    Grid Workshop - CGW'05, November 20-23 2005,
    ACC-Cyfronet UST, 2006, Kraków, pp. 266-278.
  • M. Bubak, M. Malawski, M. Placek, Using MOCCA
    Component Environment for Simulation of Gold
    Clusters, in M. Bubak, M. Turala, K. Wiatr
    (Eds.), Proceedings of Cracow Grid Workshop -
    CGW'05, November 20-23 2005, ACC-Cyfronet UST,
    2006, Kraków, pp. 295-299.

47
Acknowledgements
  • Vaidy Sunderam, Dawid Kurzyniec (Emory
    University, Atlanta)
  • Daniel Harezlak, Michal Placek
  • Tomek Bartynski, Eryk Ciepiela, Joanna Kocot,
    Przemyslaw Pelczar, Iwona Ryszka
  • Pawel Jurczyk, Maciej Golenia
  • Tomasz Gubala, Marek Kasztelnik, Piotr Nowakowski
  • Ludovic Henrio, Matthieu Morel, Francoise Baude,
    Denis Caromel (Sophia-Antipolis, France)
  • Marian Bubak