Distributed Analysis: Motivation

2
Distributed Analysis: Motivation
(the view of an analysis application provider)
  • why do we want distributed data analysis?
  • move the processing close to the data
  • for example, ntuple analysis:
  • the job description is of order kB
  • the data itself is of order MB, GB, TB, ...
  • rather than downloading gigabytes of data, let the
    remote server do the job
  • do it in parallel: faster
  • clusters of cheap PCs

3
Computing Models
  • desktop computing
  • a personal computing resource
  • may lack CPU power, high-speed access to networked
    databases, ...
  • "mainframe" computing
  • a shared supercomputer in a LAN
  • expensive, and may have scalability problems
  • cluster computing
  • a collection of nodes in a LAN
  • complex and harder to manage
  • grid computing
  • a WAN collection of computing elements
  • even more complex

4
Cluster Computing at CERN
  • batch data analysis
  • e.g. lxbatch, currently in production
  • workload management system (e.g. LSF)
  • automatic scheduling and load balancing
  • batch jobs: hours or days to complete
  • interactive data analysis
  • currently on the desktop; will have to be
    distributed for LHC
  • tried in the past for ntuple analysis:
  • PIAF (Parallel Interactive Analysis Facility)
  • running copies of PAW on behalf of the user
  • 8 nodes and tight coupling with the application
    layer (PAW)
  • semi-interactive analysis becomes more important:
    minutes ... hours

6
Topology of I/O-Intensive Applications
  • ntuple analysis is mostly I/O-intensive rather
    than CPU-intensive
  • fast DB access from the cluster
  • slow network from the user to the cluster
  • very small amount of data exchanged between the
    tasks in comparison to the "input" data

7
Parallel Ntuple Analysis
  • data driven
  • all workers perform the same task (similar to SPMD)
  • synchronization is quite simple (independent
    workers)
  • master/worker model (see the sketch below)
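
A minimal sketch of this data-driven master/worker scheme (hypothetical code; the names and the fixed chunk size are assumptions, not DIANE's actual API): the master partitions the ntuple row range into chunks, and each independent worker runs the same analysis on the chunks it receives.

// hypothetical sketch: fixed-size partitioning for master/worker ntuple analysis
#include <algorithm>
#include <cstdio>
#include <vector>

struct Chunk { long firstRow, lastRow; };   // rows [firstRow, lastRow)

// The master splits nRows into chunks and dispatches each as one task.
std::vector<Chunk> partition(long nRows, long chunkSize)
{
  std::vector<Chunk> chunks;
  for (long r = 0; r < nRows; r += chunkSize)
    chunks.push_back({r, std::min(r + chunkSize, nRows)});
  return chunks;
}

int main()
{
  // e.g. a 37k-row ntuple split into 1000-row tasks; every worker would
  // run the same projection code on its chunk (SPMD-like, no coupling)
  for (const Chunk& c : partition(37000, 1000))
    std::printf("task: rows %ld..%ld\n", c.firstRow, c.lastRow - 1);
}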

10
Master/Worker Model
  • applications share the same computation model
  • so they also share a big part of the framework code
  • but they have different non-functional requirements

11
What is DIANE?
  • R&D project in IT/API
  • semi-interactive parallel analysis for LHC
  • middleware technology evaluation and choice
  • CORBA, MPI, Condor, LSF, ...
  • also see how to integrate API products with the GRID
  • prototyping (focus on ntuple analysis)
  • time scale and resources:
  • Jan 2001: start (< 1 FTE)
  • June 2002: running prototype exists
  • sample ntuple analysis with Anaphe
  • event-level parallel Geant4 simulation

12
What is DIANE?
  • framework for parallel cluster computation
  • application-oriented
  • master/worker model common in HEP applications
  • application-independent
  • apps dynamically loaded in a plugin style
  • callbacks to applications via abstract interfaces
  • component-based
  • subsystems and services packaged into component
    libraries
  • core architecture uses CORBA and CCM (CORBA
    Component Model)
  • integration layer between applications and the
    GRID
  • environment and deployment tools

13
What is DIANE not?
  • DIANE is not:
  • a replacement for a GRID and its services
  • a hardwired analysis toolkit

14
DIANE and the GRID
  • DIANE as a GRID computing element
  • ... via a gateway that understands Grid/JDL
  • ... Grid/JDL must be able to describe parallel
    jobs/tasks
  • DIANE as a user of (low-level) Grid services
  • ... authentication, security, load balancing, ...
  • and profit from existing 3rd-party
    implementations
  • the Python environment is a rapid prototyping
    platform
  • and may provide a convenient connection between
    DIANE and the Globus Toolkit via the pyGlobus API

15
Architecture Overview
  • layering: abstract middleware interfaces and
    components
  • plugin-style application loading

16
Client Side DIANE
  • thin client / lightweight XML job-description
    protocol
  • just create a well-formed job description in XML
  • send it and read the results back as XML data
    messages (a hypothetical example follows below)
  • connection scenarios:
  • standalone clients: C++ and Python client apps
  • explicit connection from a shell prompt
  • flexibility and choice of command-line tools
  • clients integrated into an analysis framework, e.g.
    Lizard/Python
  • hidden connection behind the scenes
  • Web access: Java-CORBA binding, SOAP (?)
  • universal and easy access
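
The slides do not show the job-description schema; a hypothetical well-formed message in this spirit (every tag and attribute name here is invented for illustration) might look like:

<!-- hypothetical job description; not DIANE's actual schema -->
<job application="NtupleProjection">
  <input file="sample.hbook" ntuple="30"/>
  <tasks workers="6" rowsPerChunk="1000"/>
  <output histogram="h100" format="xml"/>
</job>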

17
Data Exchange Protocol (1)
  • XDR concept in C++
  • specify the data format:
  • type and order of data fields
  • data messages:
  • sender and receiver agree on the format
  • a message is sent as an opaque object (any)
  • the C++ type may differ on each side
  • interfaces with flexible data types
  • e.g. store a list of identifiers (unknown type)

18
Data Exchange Protocol (2)
class A : public DXPDataObject {
public:
  DXPString name;        // predefined fundamental types
  DXPLong   index;
  DXPSequenceDataObject<DXPplain_Double> ratio;
  B b;                   // nested complex object
  A(DXPDataObject* parent)
    : DXPDataObject(parent), name(this),
      index(this), ratio(this), b(this) {}
};
19
Data Exchange Protocol (3)
  • external streaming supported, e.g.
  • serialize as a CORBA byte sequence
  • serialize to XML (ASCII string)
  • visitor pattern: new formats are easy
  • handles:
  • opaque objects (any)
  • typed objects: safe casts

DXPTypedDataObject<A> a1, a2;  // explicit format
DXPAnyDataObject x = a1;       // opaque object
a2 = x;
if (a2.isValid()) { /* "cast successful" */ }
20
Server Side Architecture
  • CORBA Component Model (CCM)
  • pluggable components and services
  • make a truly component-based system at the core
    architecture level
  • common interface to the service components
  • difficult due to the different nature of the
    service implementations (see the sketch below)
  • example: load-balancing service
  • Condor: process migration
  • LSF: black-box load balancing
  • custom PULL implementation: active load balancing
  • but first results show that it is feasible
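
A hedged sketch of what such a common service interface could look like (interface and class names invented): very different load-balancing strategies hide behind one abstraction.

// hypothetical common LB interface; the concrete services differ widely
#include <cstdio>
#include <string>
#include <vector>

struct Task { std::string description; };

struct ILoadBalancer {                       // common service interface
  virtual void submit(const Task& t) = 0;    // hand over one task
  virtual ~ILoadBalancer() = default;
};

struct LSFBalancer : ILoadBalancer {         // black-box: delegate to LSF queues
  void submit(const Task& t) override
  { std::printf("bsub-style submission: %s\n", t.description.c_str()); }
};

struct PullBalancer : ILoadBalancer {        // active: queue tasks, workers pull
  std::vector<Task> queue;
  void submit(const Task& t) override { queue.push_back(t); }
};

int main()
{
  LSFBalancer lsf;
  ILoadBalancer& lb = lsf;                   // framework sees only the interface
  lb.submit(Task{"project ntuple rows 0..999"});
}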

21
DIANE and CORBA
  • CORBA
  • industry standard (mature and tested)
  • scalable (we need 1000s of nodes and processes)
  • language- and platform-independent (IDL)
  • C, C++, Java, Python, ...
  • many implementations, commercial and open source
  • directly supports OO and abstract interfaces
  • CORBA facilities
  • naming service, trading service, etc.
  • CORBA Component Model
  • supports component programming (an evolution of OO)

22
Component Technology
  • components are not classes!
  • components are deployment units
  • they live in libraries, object files and binaries
  • they interact with the external world only via an
    abstract interface
  • total separation from the underlying implementation
  • classes are source-code organization units
  • they exist on different design levels and support
    different semantics:
  • utility classes (e.g. STL vectors or smart
    pointers)
  • mathematical classes (e.g. HepMatrix)
  • complex domain classes (e.g. FMLFitter)
  • but a class may implement a component (see the
    sketch below)
  • OO fails to deliver reuse; component technology
    might help (hopefully)
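
A minimal sketch of that last point (hypothetical names, standard Unix dlopen mechanics; not DIANE's actual deployment tools): the component is a class inside a shared library, reached only through an abstract interface and a factory symbol.

// hypothetical plugin-style component loading via an abstract interface
#include <dlfcn.h>
#include <cstdio>

struct IAnalysisTask {                 // the only thing the framework sees
  virtual void run() = 0;
  virtual ~IAnalysisTask() = default;
};

int main()
{
  // the implementation class stays hidden inside the component library
  if (void* lib = dlopen("libMyAnalysis.so", RTLD_NOW)) {
    using Factory = IAnalysisTask* (*)();
    if (auto create = (Factory)dlsym(lib, "createTask")) {
      IAnalysisTask* task = create();
      task->run();
      delete task;
    }
    dlclose(lib);
  } else {
    std::puts("component library not found (sketch only)");
  }
}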

24
Server Side DIANE
25
Server Side DIANE
26
CORBA and XML in Practice
  • interoperability (shown in the prototype ntuple
    application)
  • cross-release (many thanks, XML!)
  • client running Lizard/Anaphe 3.6.6
  • server running 4.0.0-pre1
  • cross-language (many thanks, CORBA!)
  • Python CORBA client (30 lines)
  • C++ CORBA server
  • compact XML data messages
  • 500 bytes of XML to the server, 22 kB of XML
    description back from the server
  • roughly a factor of 10^3 less than the original
    data (30 MB ntuple vs ~22.5 kB exchanged)
  • thin client: no need to run Lizard on the client
    side, as an alternative use-case scenario

27
Load Balancing Service
  • black-box (e.g. LSF)
  • limited control -> submit jobs (black box)
  • job queues with CPU limits
  • automatic load balancing and scheduling (task
    creation and dispatch)
  • prototype deployed (10s of workers)
  • explicit PULL LB (see the sketch below)
  • custom daemons
  • more control -> explicit creation of tasks
  • load-balancing callbacks into the specific
    application
  • prototype: custom PULL load balancing (10s of
    workers)
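
A minimal sketch of the PULL model (hypothetical code, using threads in place of cluster daemons): idle workers pull the next task from the master's queue, which balances the load automatically.

// hypothetical pull-based dispatch; real workers would be remote processes
#include <cstdio>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <vector>

std::queue<int> tasks;                    // task ids waiting to be processed
std::mutex m;

std::optional<int> pullNextTask()         // master-side callback
{
  std::lock_guard<std::mutex> lock(m);
  if (tasks.empty()) return std::nullopt;
  int t = tasks.front();
  tasks.pop();
  return t;
}

void worker(int id)
{
  while (auto t = pullNextTask())         // pull until the queue runs dry
    std::printf("worker %d runs task %d\n", id, *t);
}

int main()
{
  for (int i = 0; i < 8; ++i) tasks.push(i);
  std::vector<std::thread> pool;
  for (int w = 0; w < 3; ++w) pool.emplace_back(worker, w);
  for (auto& th : pool) th.join();
}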

28
Dedicated Interactive Cluster (1)
  • Daemons per node
  • Dynamic process allocation

29
Dedicated Interactive Cluster (2)
  • Daemons per user per node
  • Thread pools, per-user policies

30
Error Recovery Service
  • the mechanisms:
  • daemon control layer
  • make sure that the core framework processes are
    alive
  • periodic pings; need to be hierarchical to be
    scalable
  • worker sandbox (see the sketch below)
  • protects against seg-faults in the user
    applications:
  • memory corruption
  • exceptions
  • signals
  • based on standard Unix mechanisms: child
    processes and signals
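
A minimal sketch of the sandbox idea using those standard Unix mechanisms (hypothetical code; the real framework's recovery logic is surely more involved): the user task runs in a child process, so a crash kills only the child and the framework can reschedule the task.

// hypothetical worker sandbox using fork/waitpid and signal inspection
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>

void userAnalysis() { /* user plugin code; may seg-fault or abort */ }

int main()
{
  pid_t pid = fork();
  if (pid == 0) {                     // child: the sandboxed worker task
    userAnalysis();
    _exit(0);
  }
  int status = 0;
  waitpid(pid, &status, 0);           // parent: core framework stays alive
  if (WIFSIGNALED(status))
    std::printf("task died on signal %d; reschedule it\n", WTERMSIG(status));
  else
    std::printf("task finished normally\n");
}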

32
Other Services
  • interactive data analysis
  • connection-oriented vs connectionless
  • monitoring and fault recovery
  • user environment replication
  • do not rely on a common filesystem (e.g. AFS)
  • distribution of application code
  • binary exchange possible for homogeneous clusters
  • distribution of local setup data
  • configuration files, etc.
  • binary dependencies (shared libraries, etc.)
33
Optimization
  • optimizing distributed I/O access to data
  • clustering of the data in the DB on a per-task
    basis
  • depends on the experiment-specific I/O solution
  • load balancing
  • the framework does not directly address low-level
    issues
  • ... but the design must be LB-aware:
  • partition the initial data set and assign data
    chunks to tasks
  • how big should the chunks be? (one possible
    scheme is sketched below)
  • static or adaptive algorithm?
  • push vs pull model for dispatching tasks
  • etc.
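
One possible answer to the chunk-size question, shown purely as an illustration (a guided-self-scheduling style rule; nothing in the slides says DIANE uses it): give each pulling worker a share of the remaining rows, so chunks shrink over time and stragglers get balanced at the end.

// hypothetical adaptive chunking: chunk = remaining / nWorkers, with a floor
#include <algorithm>
#include <cstdio>

int main()
{
  long remaining = 37000;                 // rows not yet assigned
  const long nWorkers = 6, minChunk = 500;
  while (remaining > 0) {
    long chunk = std::max(remaining / nWorkers, minChunk);
    chunk = std::min(chunk, remaining);
    std::printf("dispatch chunk of %ld rows\n", chunk);
    remaining -= chunk;
  }
}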

34
Further Evolution
  • expect full integration and collaboration with
    LCG according to their schedule
  • software evolution and policy:
  • distributed technology (CORBA, RMI, DCOM,
    sockets, ...)
  • persistency technology (LCG RTAGs -> ODBMS,
    RDBMS, RIO)
  • programming/scripting languages (C++, Java,
    Python, ...)
  • evolution of GRID technologies and services
  • Globus
  • LCG, DataGrid, CrossGrid (interactive apps)
  • ...

35
Limitations
  • model limited to Master/Worker
  • more complex synchronization patterns:
  • some CPU-intensive applications require
    fine-grained synchronization between workers;
    this is NOT provided by the framework and must
    be achieved by other means (e.g. MPI)
  • intra-cluster scope, NOT a global metacomputer
  • Grid-enabled gateway to enter the Grid universe
  • otherwise the framework is independent, thanks to
    abstract interfaces

36
Similar Projects in HEP
  • PIAF (historical)
  • using PAW
  • TOP-C
  • G4 examples for parallelism at the event level
  • BlueOx
  • Java
  • using JAS for analysis
  • some room for commonality via AIDA
  • PROOF
  • based on ROOT

37
Summary
  • first prototype ready and working
  • proof of concept for up to 50 workers
  • 1000 workers still needs to be checked
  • initial deployment:
  • integration with the Lizard analysis tool
  • Geant4 simulation
  • active R&D in component architecture
  • relation to LCG to be established

38
That's about it
  • cern.ch/moscicki/work
  • cern.ch/anaphe
  • aida.freehep.org

39
Facade for End-User Analysis
  • 3 groups of user roles:
  • developers of distributed analysis applications
  • brand-new applications, e.g. simulation
  • advanced users with custom ntuple analysis code
  • similar to the Lizard Analyzer
  • execute a custom algorithm in the parallel ntuple
    scan
  • interactive users
  • do the standard projections
  • just specify the histogram and the ntuple to
    project
  • user-friendly means:
  • show only the relevant details
  • hide the complexity of the underlying system

40
Facade for End-User Analysis
41
Ntuple Projection Example
  • example of semi-interactive analysis
  • data: 30 MB HBOOK ntuple / 37k rows / 160 columns
  • time: minutes .. hours
  • timings:
  • desktop (400 MHz, 128 MB RAM): ca. 4 minutes
  • standalone lxplus node (800 MHz, SMP, 512 MB
    RAM): ca. 45 sec
  • 6 lxplus workers: ca. 18 sec
  • why only 18 sec on 6 workers, rather than
    45/6 ≈ 7.5 sec?
  • the job is small, so a big fraction of the time
    is compilation and DLL loading rather than
    computation
  • pre-installing the application would improve the
    speed
  • caveat: example was running on AFS and on public
    machines
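
A rough consistency check of these timings, assuming a fixed per-job overhead t0 (compilation and DLL loading) plus perfectly parallel work: T(N) = t0 + (T1 - t0)/N, with T1 = 45 s and T(6) = 18 s, gives t0 + (45 - t0)/6 = 18, i.e. t0 ≈ 12.6 s. So roughly 13 of the 18 seconds are fixed overhead, consistent with the explanation above.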