Title: Distributed Analysis: Motivation
2. Distributed Analysis Motivation
This is the view of an analysis application provider.
- why do we want distributed data analysis?
- move processing close to the data
- for example, ntuple analysis
- job description: kB
- the data itself: MB, GB, TB ...
- rather than downloading gigabytes of data, let the remote server do the job
- do it in parallel - faster
- clusters of cheap PCs
3. Computing Models
- desktop computing
- personal computing resource
- may lack CPU, high-speed access to networked databases, ...
- "mainframe" computing
- shared supercomputer in a LAN
- expensive and may have scalability problems
- cluster computing
- a collection of nodes in a LAN
- complex and harder to manage
- grid computing
- a WAN collection of computing elements
- even more complex
4. Cluster Computing at CERN
- batch data analysis
- e.g. lxbatch currently in production
- workload management system (e.g. LSF)
- automatic scheduling and load-balancing
- batch jobs take hours or days to complete
- interactive data analysis
- currently desktop; will have to be distributed for LHC
- tried in the past for ntuple analysis
- PIAF (Parallel Interactive Analysis Facility)
- running copies of PAW on behalf of the user.
- 8 nodes and tight coupling with the application layer (PAW)
- semi-interactive analysis becomes more important: minutes ... hours
6. Topology of I/O-Intensive Applications
- ntuple analysis is mostly I/O-intensive rather than CPU-intensive
- fast DB access from the cluster
- slow network from user to cluster
- very small amount of data exchanged between the tasks in comparison to the "input" data
7. Parallel Ntuple Analysis
- data driven
- all workers perform same task (similar to SPMD)
- synchronization is quite simple (independent workers)
- master/worker model (see the sketch below)
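A minimal illustration of this data-driven master/worker pattern, written in Python. It is not DIANE code: all names (project_rows, read_ntuple_rows, the "px" column) are hypothetical. It only shows what the slides describe: every worker runs the same task on a disjoint chunk of ntuple rows, the workers are independent, and the master merges the partial results at the end.

    # Hypothetical sketch of data-driven master/worker ntuple analysis.
    # Each worker runs the same task on a disjoint row range (SPMD-like);
    # workers are independent, so synchronization reduces to a final merge.
    from multiprocessing import Pool

    NBINS, LO, HI = 50, 0.0, 1.0

    def read_ntuple_rows(first, last):
        """Placeholder for the experiment-specific I/O layer."""
        return ({"px": (i % 100) / 100.0} for i in range(first, last))

    def project_rows(row_range):
        """Worker task: fill a partial histogram from one chunk of rows."""
        first, last = row_range
        hist = [0] * NBINS
        for row in read_ntuple_rows(first, last):   # I/O-bound: read close to the data
            x = row["px"]                           # hypothetical column name
            if LO <= x < HI:
                hist[int((x - LO) / (HI - LO) * NBINS)] += 1
        return hist

    if __name__ == "__main__":
        total_rows, n_workers = 37000, 6
        step = total_rows // n_workers
        chunks = [(i, min(i + step, total_rows)) for i in range(0, total_rows, step)]
        with Pool(n_workers) as pool:                  # master dispatches tasks ...
            partials = pool.map(project_rows, chunks)  # ... all workers run the same code
        final = [sum(bins) for bins in zip(*partials)] # master merges partial histograms
        print(sum(final), "entries projected")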
10. Master/Worker Model
- applications share the same computation model
- so also share a big part of the framework code
- but have different non-functional requirements
11. What is DIANE?
- R&D project in IT/API
- semi-interactive parallel analysis for LHC
- middleware technology evaluation and choice
- CORBA, MPI, Condor, LSF...
- also see how to integrate API products with GRID
- prototyping (focus on ntuple analysis)
- time scale and resources
- Jan 2001 start (< 1 FTE)
- June 2002: running prototype exists
- sample Ntuple analysis with Anaphe
- event-level parallel Geant4 simulation
12. What is DIANE?
- framework for parallel cluster computation
- application-oriented
- master-worker model common in HEP applications
- application-independent
- apps dynamically loaded in a plugin style (see the sketch below)
- callbacks to applications via abstract interfaces
- component-based
- subsystems and services packaged into component libraries
- core architecture uses CORBA and CCM (CORBA Component Model)
- integration layer between applications and the GRID
- environment and deployment tools
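A sketch (not the actual DIANE API) of what "plugin-style" application loading through an abstract interface can look like in Python. The interface name IApplication, its methods, and the module/class names are invented for illustration; the point is only that the framework calls back into applications it does not know at compile time.

    # Hypothetical sketch: the framework only knows an abstract application
    # interface; concrete analysis applications are loaded dynamically by name.
    import importlib
    from abc import ABC, abstractmethod

    class IApplication(ABC):
        """Abstract interface the framework calls back into."""
        @abstractmethod
        def split(self, job_spec):   # master side: produce independent tasks
            ...
        @abstractmethod
        def run_task(self, task):    # worker side: execute one task
            ...
        @abstractmethod
        def merge(self, results):    # master side: combine partial results
            ...

    def load_application(module_name, class_name):
        """Plugin-style loading: the application is chosen at run time."""
        module = importlib.import_module(module_name)  # e.g. "ntuple_analysis"
        app_class = getattr(module, class_name)
        if not issubclass(app_class, IApplication):
            raise TypeError(f"{class_name} does not implement IApplication")
        return app_class()

    # Framework-side pseudo-usage (names are illustrative only):
    # app = load_application("ntuple_analysis", "NtupleProjection")
    # tasks = app.split(job_spec)
    # results = [app.run_task(t) for t in tasks]   # in reality dispatched to workers
    # final = app.merge(results)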
13. What DIANE is not
- DIANE is not
- a replacement for a GRID and its services
- a hardwired analysis toolkit
14. DIANE and GRID
- DIANE as a GRID computing element
- ...via a gateway that understands Grid/JDL
- ... Grid/JDL must be able to describe parallel jobs/tasks
- DIANE as a user of (low-level) Grid services
- ...authentication, security, load balancing...
- and profit from existing 3rd-party implementations
- the Python environment is a rapid prototyping platform
- and may provide a convenient connection between DIANE and the Globus Toolkit via the pyGlobus API
15. Architecture Overview
- layering: abstract middleware interfaces and components
- plugin-style application loading
16. Client-Side DIANE
- thin client / lightweight XML job description protocol
- just create a well-formed job description in XML (see the sketch below)
- send it and read the results back as XML data messages
- connection scenarios
- standalone clients: C++, Python client apps
- explicit connection from a shell prompt
- flexibility and choice of command-line tools
- clients integrated into an analysis framework, e.g. Lizard/Python
- hidden connection behind the scenes
- Web access: Java-CORBA binding, SOAP (?)
- universal and easy access
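The slides do not show the actual DIANE XML schema, so the following is a hedged sketch of what a "well-formed job description in XML" and the corresponding result parsing might look like on the thin client, using only the Python standard library. All element and attribute names are invented.

    # Hypothetical sketch of a thin client building an XML job description
    # and parsing the XML result message; tag names are invented, not DIANE's.
    import xml.etree.ElementTree as ET

    def make_job_description(ntuple_file, column, nbins, lo, hi):
        job = ET.Element("job", application="ntuple-projection")
        ET.SubElement(job, "input", ntuple=ntuple_file)
        ET.SubElement(job, "projection", column=column,
                      bins=str(nbins), low=str(lo), high=str(hi))
        return ET.tostring(job, encoding="unicode")  # a few hundred bytes, as on slide 26

    def parse_result(xml_text):
        result = ET.fromstring(xml_text)
        return [int(b.text) for b in result.findall("./histogram/bin")]

    # The description stays small (kB) while the data stays on the server (MB..GB):
    print(make_job_description("paw.ntuple", "px", 50, 0.0, 1.0))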
17. Data Exchange Protocol (1)
- XDR concept in C++
- Specify the data format
- Type and order of data fields
- Data messages
- Sender and receiver agree on the format
- Message is sent as an opaque object (any)
- C++ type may be different on each side
- Interfaces with flexible data types
- E.g. store list of identifiers (unknown type)
18. Data Exchange Protocol (2)

    class A : public DXPDataObject {
    public:
      DXPString name;                                // predefined fundamental types
      DXPLong index;
      DXPSequenceDataObject<DXPplain_Double> ratio;
      B b;                                           // nested complex object
      A(DXPDataObject* parent)
        : DXPDataObject(parent), name(this), index(this), ratio(this), b(this) {}
    };
19. Data Exchange Protocol (3)
- External streaming supported, e.g.
- Serialize as a CORBA byte sequence
- Serialize to XML (ASCII string)
- Visitor pattern: new formats are easy to add
- Handles
- Opaque objects (any)
- Typed objects: safe casts
    DXPTypedDataObject<A> a1, a2;   // explicit format
    DXPAnyDataObject x = a1;        // opaque object
    a2 = x;
    if (a2.isValid()) { /* "cast successful" */ }
20. Server-Side Architecture
- CORBA Component Model (CCM)
- pluggable components and services
- make a truly component-based system at the core architecture level
- common interface to the service components
- difficult due to the different nature of the service implementations
- example: load-balancing service
- Condor - process migration
- LSF - black-box load balancing
- custom PULL implementation - active load balancing
- but first results show that it is feasible
21. DIANE and CORBA
- CORBA
- industry standard (mature and tested)
- scalable (we need 1000s of nodes and processes)
- language and platform independent (IDL)
- C, C++, Java, Python, ...
- many implementations commercial and open source
- directly supports OO, abstract interfaces
- CORBA facilities
- naming service, trading service etc.
- CORBA Component Model
- supports component programming (evolution of OO)
22. Component Technology
- components are not classes!
- components are deployment units
- they live in libraries, object files and binaries
- they interact with the external world only via an abstract interface
- total separation from the underlying implementation
- classes are source code organization units
- they exist on different design levels and support different semantics
- utility classes (e.g. STL vectors or smart pointers)
- mathematical classes (e.g. HepMatrix)
- complex domain classes (e.g. FMLFitter)
- but a class may implement a component
- OO alone often fails to deliver reuse; component technology might help (hopefully)
24. Server-Side DIANE
25. Server-Side DIANE
26. CORBA and XML in Practice
- interoperability (shown in the prototype ntuple application)
- cross-release (thank you, XML!)
- client running Lizard/Anaphe 3.6.6
- server running 4.0.0-pre1
- cross-language (thank you, CORBA!)
- Python CORBA client (30 lines; a sketch follows below)
- C++ CORBA server
- compact XML data messages
- 500 bytes of XML to the server, 22 kB of XML back from the server
- a factor of 10^6 less than the original data (30 MB ntuple)
- thin client: no need to run Lizard on the client side, as an alternative use-case scenario
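For orientation, here is a hedged sketch of what a ~30-line Python CORBA client could look like using omniORBpy. The DIANE IDL module, interface, and operation names (DIANE, JobManager, submit) and the IOR-file convention are assumptions, not the actual DIANE interface.

    # Hypothetical Python CORBA client sketch (omniORBpy).
    # Assumes IDL stubs were generated with: omniidl -bpython diane.idl
    import sys
    from omniORB import CORBA
    import DIANE                      # hypothetical module generated from the IDL

    def main():
        orb = CORBA.ORB_init(sys.argv, CORBA.ORB_ID)
        # Assume the master publishes its object reference as a stringified IOR in a file.
        ior = open("master.ior").read().strip()
        obj = orb.string_to_object(ior)
        master = obj._narrow(DIANE.JobManager)   # hypothetical interface name
        if master is None:
            sys.exit("object is not a DIANE.JobManager")
        # Send the XML job description and read the XML result back (cf. slide 16).
        xml_job = open("job.xml").read()
        xml_result = master.submit(xml_job)      # hypothetical operation
        print(xml_result)
        orb.destroy()

    if __name__ == "__main__":
        main()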
27. Load Balancing Service
- Black-box (e.g. LSF)
- limited control -> submit jobs (black box)
- job queues with CPU limits
- automatic load balancing and scheduling (task creation and dispatch)
- prototype deployed (10s of workers)
- Explicit PULL LB (see the sketch below)
- custom daemons
- more control -> explicit creation of tasks
- load-balancing callbacks into the specific application
- prototype custom PULL load balancing (10s of workers)
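A minimal sketch of the explicit PULL idea, assuming nothing about DIANE's internals: the master keeps a task queue and each worker pulls a new task only when it becomes idle, so faster nodes naturally take more tasks. The helper names are invented.

    # Hypothetical sketch of PULL-style load balancing: idle workers pull the
    # next task from the master's queue, so load balances itself across
    # heterogeneous nodes.
    import multiprocessing as mp

    def do_task(task):
        return sum(range(task * 10000))      # stand-in for a real analysis task

    def worker(task_queue, result_queue):
        while True:
            task = task_queue.get()
            if task is None:                 # sentinel: no more work
                break
            result_queue.put((task, do_task(task)))

    if __name__ == "__main__":
        tasks, n_workers = list(range(20)), 4
        task_q, result_q = mp.Queue(), mp.Queue()
        for t in tasks:
            task_q.put(t)
        for _ in range(n_workers):
            task_q.put(None)                 # one sentinel per worker
        procs = [mp.Process(target=worker, args=(task_q, result_q))
                 for _ in range(n_workers)]
        for p in procs:
            p.start()
        results = dict(result_q.get() for _ in tasks)
        for p in procs:
            p.join()
        print(len(results), "tasks completed")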
28. Dedicated Interactive Cluster (1)
- Daemons per node
- Dynamic process allocation
29. Dedicated Interactive Cluster (2)
- Daemons per user per node
- Thread pools, per-user policies
30. Error Recovery Service
- The mechanisms
- daemon control layer
- make sure that the core framework processes are alive
- periodical ping needs to be hierarchical to be scalable
- worker sandbox (see the sketch below)
- protect from seg-faults in the user applications
- memory corruption
- exceptions
- signals
- based on standard Unix mechanisms: child processes and signals
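A sketch of the Unix child-process idea behind the worker sandbox, with invented function names: the user task runs in a forked child so that a crash (e.g. a segmentation fault) kills only the child, and the parent classifies the outcome from the exit status.

    # Hypothetical sketch of a worker sandbox using standard Unix mechanisms:
    # run the (possibly crashing) user code in a child process and inspect how
    # the child terminated, instead of letting a seg-fault kill the worker daemon.
    import os, signal

    def run_sandboxed(user_task):
        pid = os.fork()
        if pid == 0:                          # child: execute the user application code
            try:
                user_task()
                os._exit(0)
            except Exception:
                os._exit(1)                   # ordinary exceptions -> non-zero exit code
        _, status = os.waitpid(pid, 0)        # parent: wait and classify the outcome
        if os.WIFSIGNALED(status):
            sig = os.WTERMSIG(status)
            return f"task killed by signal {sig} ({signal.Signals(sig).name})"
        return f"task exited with code {os.WEXITSTATUS(status)}"

    def buggy_task():
        os.kill(os.getpid(), signal.SIGSEGV)  # simulate a memory-corruption crash

    if __name__ == "__main__":
        print(run_sandboxed(buggy_task))      # the parent survives and reports the crash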
32. Other Services
- Interactive data analysis
- connection-oriented vs connectionless
- monitoring and fault recovery
- User environment replication
- do not rely on the common filesystem (e.g. AFS)
- distribution of application code
- binary exchange possible for homogeneous clusters
- distribution of local setup data
- configuration files, etc
- binary dependencies (shared libraries etc)
33. Optimization
- Optimizing distributed I/O access to data
- clustering of the data in the DB on a per-task basis
- depends on the experiment-specific I/O solution
- Load balancing
- the framework does not directly address low-level issues
- ...but the design must be LB-aware
- partition the initial data set and assign data chunks to tasks (see the sketch below)
- how big should the chunks be?
- static/adaptive algorithm?
- push vs pull model for dispatching tasks
- etc.
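A small sketch of the partitioning question raised above: splitting the initial row range into chunks of a chosen size before assigning them to tasks. The chunk-size trade-off (dispatch overhead vs. load-balancing granularity) is exactly the open question on the slide; the helper below is only illustrative.

    # Hypothetical sketch: partition an ntuple row range into fixed-size chunks.
    # Small chunks balance load better but add per-task dispatch overhead;
    # large chunks do the opposite - the slide leaves the optimum open.
    def partition(total_rows, chunk_size):
        return [(first, min(first + chunk_size, total_rows))
                for first in range(0, total_rows, chunk_size)]

    # 37K rows (slide 41) split into ~6 chunks of 6200 rows, or 37 chunks of 1000:
    print(partition(37000, 6200))
    print(len(partition(37000, 1000)), "tasks")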
34. Further Evolution
- expect full integration and collaboration with LCG according to their schedule
- software evolution and policy
- distributed technology (CORBA, RMI, DCOM, sockets, ...)
- persistency technology (LCG RTAGs -> ODBMS, RDBMS, RIO)
- programming/scripting languages (C++, Java, Python, ...)
- evolution of GRID technologies and services
- Globus
- LCG, DataGrid, CrossGrid (interactive apps)
- ...
35. Limitations
- Model limited to Master/Worker
- More complex synchronization patterns
- some particular CPU-intensive applications require fine-grained synchronization between workers
- this is NOT provided by the framework and must be achieved by other means (e.g. MPI)
- Intra-cluster scope, NOT a global metacomputer
- Grid-enabled gateway to enter the Grid universe
- otherwise the framework is independent, thanks to abstract interfaces
36. Similar Projects in HEP
- PIAF (history)
- using PAW
- TOP-C
- G4 examples for parallelism at event-level
- BlueOx
- Java
- using JAS for analysis
- some room for commonality via AIDA
- PROOF
- based on ROOT
37. Summary
- first prototype ready and working
- proof of concept for up to 50 workers
- scaling to 1000 workers still needs to be checked
- initial deployment
- integration with Lizard analysis tool
- Geant 4 simulation
- active R&D in component architecture
- relation to LCG to be established
38. That's about it
- cern.ch/moscicki/work
- cern.ch/anaphe
- aida.freehep.org
39. Facade for End-User Analysis
- 3 groups of user roles
- developers of distributed analysis applications
- brand new applications e.g. simulation
- advanced users with custom ntuple analysis code
- similar to Lizard Analyzer
- execute a custom algorithm on the parallel ntuple scan
- interactive users
- do the standard projections
- just specify the histogram and ntuple to project (see the sketch below)
- user-friendly means
- show only the relevant details
- hide the complexity of the underlying system
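A hedged sketch of what the interactive-user facade could look like: one call that names the ntuple, the column and the histogram binning, with everything else hidden behind it. The function and parameter names are invented and the two helpers are placeholders, not the Lizard/DIANE API.

    # Hypothetical facade for the interactive user: "just specify the histogram
    # and the ntuple to project". The distributed machinery (XML job description,
    # submission, merging) hides behind one call; the helpers below are
    # placeholders standing in for the client-side sketch on slide 16.
    def _submit(job_description):
        raise NotImplementedError("placeholder for the XML/CORBA client call")

    def _to_histogram(result_message):
        raise NotImplementedError("placeholder for parsing the XML result message")

    def project(ntuple_path, column, nbins=50, lo=0.0, hi=1.0):
        """Facade: show only the relevant details, hide the underlying system."""
        job = {"ntuple": ntuple_path, "column": column,
               "bins": nbins, "low": lo, "high": hi}
        return _to_histogram(_submit(job))

    # Interactive use (illustrative): hist = project("paw.ntuple", "px", nbins=100)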
40. Facade for End-User Analysis
41. Ntuple Projection Example
- example of semi-interactive analysis
- data: 30 MB HBOOK ntuple / 37K rows / 160 columns
- time: minutes ... hours
- timings
- desktop (400 MHz, 128 MB RAM): ca. 4 minutes
- standalone lxplus (800 MHz, SMP, 512 MB RAM): ca. 45 sec
- 6 lxplus workers: ca. 18 sec
- why do 6 workers take 18 sec rather than 45/6 ≈ 7.5 sec? (see the estimate below)
- the job is small, so a big fraction of the time is compilation and DLL loading rather than computation
- pre-installing the application would improve the speed
- caveat: example run on AFS and public machines
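A back-of-envelope reading of those timings, using a simple fixed-overhead model (the model is an assumption, not something stated on the slide): with a serial time of about 45 s and 6 workers, perfect scaling would give 7.5 s, so the observed 18 s corresponds to roughly 10 s of per-job startup cost (compilation, DLL loading), consistent with the explanation above.

    # Rough fixed-overhead estimate (assumed model: t_parallel ~ overhead + t_serial / n)
    t_serial, t_parallel, n_workers = 45.0, 18.0, 6
    overhead = t_parallel - t_serial / n_workers
    print(f"estimated startup overhead ~ {overhead:.1f} s per job")  # ~ 10.5 s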