BlueOx: A Java Framework for Distributed Data Analysis



1
BlueOx: A Java Framework for Distributed Data Analysis
  • Jeremiah Mans
  • Princeton University
  • CHEP 2003 San Diego, CA

2
Outline
  • Overview of BlueOx and its goals
  • Design and Structure of BlueOx
  • Current Status
  • Future Directions

3
Code to Data
User's Analysis Code
4
Goals of BlueOx
  • Generic code-to-data analysis framework
  • User writes analysis code on his or her desktop
    or uses an analysis code generator program
  • Framework is responsible for distributing and
    executing the analysis
  • Provide support for debugging of code
  • Expandable and adaptable framework
  • Allow addition of new data formats, communication
    protocols, authentication systems

5
Where BlueOx Fits
[Figure: response-time scale with ticks at 1 sec, 100 sec, 10000 sec, and 1000000 sec, spanning Immediately Interactive to Remote Batch Processing. Labels: Histogram Browsing, Database Queries, ROOT, Arbitrary Code, Arbitrary Executables, Monte Carlo Production.]
6
Actors in BlueOx
  • Agent: class which represents the User and
    coordinates the analysis process
  • User: the physicist at a local institute
  • Servers: data analysis servers located locally
    and around the world
7
Job Lifecycle: Discovery
  • The Agent uses the Discovery interface to obtain
    a list of the available datasets, possibly based
    on a query provided by the User.
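
A minimal sketch of how such a Discovery interface could look. All names here (Discovery, DatasetInfo, InMemoryDiscovery, findDatasets) are hypothetical and not taken from the BlueOx API; the User's query is modeled as a simple predicate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical dataset description; not the real BlueOx type.
final class DatasetInfo {
    final String name;
    final long eventCount;

    DatasetInfo(String name, long eventCount) {
        this.name = name;
        this.eventCount = eventCount;
    }
}

// Hypothetical discovery interface: returns the datasets matching a query.
interface Discovery {
    List<DatasetInfo> findDatasets(Predicate<DatasetInfo> query);
}

// Toy implementation backed by an in-memory list. A real implementation
// might contact servers directly or look the datasets up in LDAP.
final class InMemoryDiscovery implements Discovery {
    private final List<DatasetInfo> known = new ArrayList<>();

    void register(DatasetInfo dataset) {
        known.add(dataset);
    }

    @Override
    public List<DatasetInfo> findDatasets(Predicate<DatasetInfo> query) {
        List<DatasetInfo> result = new ArrayList<>();
        for (DatasetInfo dataset : known) {
            if (query.test(dataset)) {
                result.add(dataset);
            }
        }
        return result;
    }
}
```

Because the Agent only sees the interface, a different lookup mechanism could be swapped in without changing the Agent.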

8
Job Lifecycle: Brokering
  • The User supplies a Job class and a list of
    datasets. The Agent uses an instance of the
    Brokerage interface to obtain service contracts
    assigning the analysis of each dataset to a
    Server.
  • In general, the Brokerage will either contact a
    subset of Servers directly, or will contact
    Proxies which handle the job assignments for a
    group of Servers (such as a cluster).
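
A sketch of the brokering step, under stated assumptions: the Brokerage interface shape, the use of plain strings for datasets and Servers, and the round-robin policy are all illustrative and are not the strategy BlueOx itself uses.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical brokerage interface: maps each dataset name to the
// server that agrees to analyze it (a stand-in for a service contract).
interface Brokerage {
    Map<String, String> assign(List<String> datasets, List<String> servers);
}

// Toy direct-contact policy: hand datasets to servers in round-robin order.
final class RoundRobinBrokerage implements Brokerage {
    @Override
    public Map<String, String> assign(List<String> datasets, List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("no servers available");
        }
        Map<String, String> contracts = new LinkedHashMap<>();
        for (int i = 0; i < datasets.size(); i++) {
            contracts.put(datasets.get(i), servers.get(i % servers.size()));
        }
        return contracts;
    }
}
```

A proxy-based Brokerage would implement the same interface but delegate the choice of Server to the proxy for each cluster.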

9
Job Lifecycle: Execution
[Figure labels: Communications Schemes, DataSource]
  • The Agent distributes split copies of the Job to
    the various Servers, and monitors the progress of
    execution.

10
Job Lifecycle: Merging
  • When the split Jobs complete, the Agent gathers
    them from the remote Servers and merges them into
    a single Job, which it returns to the User.
  • If the job crashes on one or more Servers, the
    Agent reports the details of the exception which
    terminated the Job to the User.
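
The gather-and-merge step can be sketched as follows. This is an illustration only: CountingJob, MergeOutcome, and ToyAgent are hypothetical names, and the split copies run one after another in the local JVM rather than on remote Servers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical job: counts events and can absorb the result of another copy.
final class CountingJob {
    long events;

    void merge(CountingJob other) {
        events += other.events;
    }
}

// Result handed back to the user: the merged job plus any failures.
final class MergeOutcome {
    final CountingJob merged = new CountingJob();
    final List<Exception> failures = new ArrayList<>();
}

final class ToyAgent {
    // Runs each split copy in sequence and merges the ones that finish.
    // An exception from one copy is recorded and does not stop the others.
    static MergeOutcome runAndMerge(List<Callable<CountingJob>> splitJobs) {
        MergeOutcome outcome = new MergeOutcome();
        for (Callable<CountingJob> split : splitJobs) {
            try {
                outcome.merged.merge(split.call());
            } catch (Exception e) {
                outcome.failures.add(e);
            }
        }
        return outcome;
    }
}
```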

11
Abstract Interfaces
  • The main actors in BlueOx are the Agent, which
    represents the user, and the Servers.
  • The interaction between the main actors is
    defined by abstract interfaces which support
    multiple implementations.
  • Interfaces in BlueOx:
  • User authentication: password-based,
    certificate-based, ...
  • Dataset discovery: direct server contact, LDAP, ...
  • Data access/storage: ROOT, HBOOK, custom, ...
  • Communications scheme

12
Abstract Interfaces: Communications Scheme
  • Allows communication between the Agent and Server
    during job execution
  • Responsible for transporting analysis objects and
    classes (code)
  • Transports debugging and monitoring information:
  • Arbitrary textual messages from the user's code
  • Exceptions which occur within the user's code
  • Implemented with several different technologies:
  • Packet-based two-way communication
  • Client-connection-only communication (polling)
  • Two-way SOAP
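
To illustrate the client-connection-only (polling) variant, here is a small sketch in which the Server queues messages until the Agent asks for them. ServerMailbox and its methods are hypothetical, and the network transport is left out entirely; only the queue-then-drain pattern is shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical server-side mailbox. The server cannot open a connection
// to the agent, so it queues messages until the agent polls for them.
final class ServerMailbox {
    private final ConcurrentLinkedQueue<String> pending = new ConcurrentLinkedQueue<>();

    // Called by the running job, e.g. for progress or debugging text.
    void post(String message) {
        pending.add(message);
    }

    // Called on behalf of the agent, over a connection the agent opened.
    // Returns everything queued so far, in posting order, and empties the queue.
    List<String> drain() {
        List<String> drained = new ArrayList<>();
        String message;
        while ((message = pending.poll()) != null) {
            drained.add(message);
        }
        return drained;
    }
}
```

The trade-off of polling is latency: messages are only seen when the Agent next asks, but the scheme works even when the Agent sits behind a firewall that blocks incoming connections.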

13
Mergeability
  • For BlueOx to be able to distribute an analysis
    and combine the results afterwards, the analysis
    must support mergeability: the ability to merge
    or add one object of a given class to another of
    the same class.
  • Simple operation for many objects:
  • Counters and histograms add arithmetically
  • Lists concatenate (and possibly re-sort)
  • BlueOx contains several utility classes which
    support Mergeability
  • An object which contains only mergeable,
    transient, or static member variables can be
    merged automatically (Automergeable)
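
The three merge rules above can be sketched in a few lines. The Mergeable interface and the classes below are hypothetical illustrations of the idea; the actual BlueOx utility classes and their signatures may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical interface: merge another object of the same class into this one.
interface Mergeable<T> {
    void merge(T other);
}

// Counters add arithmetically.
final class Counter implements Mergeable<Counter> {
    long count;

    void increment() { count++; }

    @Override
    public void merge(Counter other) { count += other.count; }
}

// Histograms with identical binning add bin by bin.
final class Histogram1D implements Mergeable<Histogram1D> {
    final double[] bins;

    Histogram1D(int nBins) { bins = new double[nBins]; }

    void fill(int bin, double weight) { bins[bin] += weight; }

    @Override
    public void merge(Histogram1D other) {
        if (other.bins.length != bins.length) {
            throw new IllegalArgumentException("binning differs");
        }
        for (int i = 0; i < bins.length; i++) {
            bins[i] += other.bins[i];
        }
    }
}

// Lists concatenate and are then re-sorted.
final class SortedValues implements Mergeable<SortedValues> {
    final List<Double> values = new ArrayList<>();

    void add(double value) {
        values.add(value);
        Collections.sort(values);
    }

    @Override
    public void merge(SortedValues other) {
        values.addAll(other.values);
        Collections.sort(values);
    }
}
```

An object whose fields are all of such types could then be merged field by field, which is the idea behind automatic merging.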

14
BlueOx and AIDA
  • BlueOx supports the use of the Abstract
    Interfaces for Data Analysis (AIDA), particularly
    for histograms and similar objects.
  • BlueOx provides the implementation of Mergeable
    for AIDA and manages the serialization of AIDA
    objects.
  • BlueOx does not provide a full implementation of
    AIDA, but rather provides a wrapper
    implementation, which aims to provide the merge
    and serialization functionality to any AIDA
    implementation.
  • Currently, the use of JAIDA 3.0 is supported in
    BlueOx.

15
Doing Analysis in BlueOx
  • User writes an Analyzer class, which may employ
    various extensions to enable configuration,
    startup and completion tasks, and interfacing
    with a GUI.
  • Experiment-specific information (such as the
    data format used for the experiment's data) must
    be provided separately to the user.
  • Future revisions of BlueOx may support XML
    schemas.
  • The user employs either a command-line or
    graphical interface to select data sets, start
    the job, and follow it to its completion.
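
As an illustration of what a user-written Analyzer with startup and completion tasks might look like, here is a sketch. The base class, its hook names, and the local runner are assumptions made for this example and do not reproduce the BlueOx Analyzer API; events are modeled as plain energies in GeV.

```java
// Hypothetical base class with optional startup and completion hooks.
abstract class Analyzer<E> {
    void startup() {}

    abstract void analyze(E event);

    void completion() {}
}

// Example user analysis: count events whose energy exceeds a threshold.
final class EnergyCutAnalyzer extends Analyzer<Double> {
    private final double thresholdGeV;
    long passed;
    boolean finished;

    EnergyCutAnalyzer(double thresholdGeV) {
        this.thresholdGeV = thresholdGeV;
    }

    @Override
    void startup() {
        passed = 0;
        finished = false;
    }

    @Override
    void analyze(Double energyGeV) {
        if (energyGeV > thresholdGeV) {
            passed++;
        }
    }

    @Override
    void completion() {
        finished = true;
    }
}

// Stand-in for the framework: drives one analyzer over a local event list.
final class LocalRunner {
    static <E> void run(Analyzer<E> analyzer, Iterable<E> events) {
        analyzer.startup();
        for (E event : events) {
            analyzer.analyze(event);
        }
        analyzer.completion();
    }
}
```

Running the same class locally before submitting it is one way the desktop debugging goal stated earlier could be served.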

16
Demonstration Analysis
  • (Quick overview here; come to the demonstration
    for more details!)
  • Data Source: Pythia-generated events for a 500 GeV
    e+e- collider, followed by a smearing and
    reconstruction program
  • Dataset discovery: LDAP database
  • Brokering: direct-contact brokering
  • Authentication: SSH-like key files
  • Analysis focus: e+e- → ZH → ννbb

17
Choosing Datasets

18
Plots from HZ → ννbb Analysis
19
Testing BlueOx
  • 40 Servers, running on four machines
  • 100 Clients, running on one machine
  • 100 Jobs/Client (10,000 total jobs executed)

20
Future Development
  • Increase sophistication of brokering
  • Current implementation is not scalable, but we
    have some ideas on how to improve it.
  • Test analysis framework with more real analyses
  • We are developing a DataSource which would allow
    analysis of DØ data by our undergraduates
  • Enhance user tools
  • Integration with JAS v3.0
  • Schema-based analysis-creation wizard

21
Further Information
  • Web site, including BlueOx Companion and JavaDoc
    documentation: http://flywheel.princeton.edu/BlueOx/