Interactive Data Analysis on the - PowerPoint PPT Presentation

About This Presentation
Slides: 17
Provided by: osgdocdbO
Transcript and Presenter's Notes


1
Interactive Data Analysis on the Grid
Dataset Analysis Grid Service (DAGS)
Tech-X Corporation, Boulder, CO
  • Tech-X/SLAC/PPDG CS-11
  • David Alexander (alexanda_at_txcorp.com)
  • Balamurali Ananthan (bala_at_txcorp.com)
  • Tony Johnson (tony_johnson_at_slac.stanford.edu)
  • Victor Serbo (serbo_at_slac.stanford.edu)
  • Presented at the Open Science Grid Applications
    Meeting, SLAC, June 2005

2
Outline
  • Motivation for Dataset Analysis Grid Service
  • Demo
  • Structure and Architecture
  • Running on the OSG

3
Motivations
  • Enable the well-developed JAS3 to use the Grid by
    developing a set of grid services
  • Aim to produce a complete interactive data analysis
    system/framework
  • Based as much as possible on CS-11 APIs
  • Web Services can use the latest Globus (GT4.0) and
    can utilize science grid compute elements via
    OSG/VDT
  • JAS3 is only one implementation of a client that
    can use the service interface

4
Focus on Interactive Analysis
  • Intermediate Results in Real Time
      • User waits seconds for results
      • Plots update as analysis proceeds
  • High Degree of Control
      • Stop and restart at any time
      • Change cuts/binning etc. and see immediate
        results without restarting
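
One way to picture these requirements is an analysis loop that works through the events in small chunks, lets the client read a histogram snapshot after every chunk, and checks a stop flag between chunks. The sketch below is illustrative only: the class and method names (`InteractiveHistogramSketch`, `processChunk`, `setCut`) are hypothetical and are not taken from JAS3, AIDA, or DAGS, and a changed cut is applied only to events processed afterwards, which is a simplification.

```java
import java.util.Arrays;

/** Illustrative only: a chunked analysis loop with a live histogram. All names are hypothetical. */
class InteractiveHistogramSketch {
    private final double min;
    private final double max;
    private final int[] bins;
    private double cut;               // events below this value are skipped
    private volatile boolean stopped; // would be set from the UI thread in a real client
    private int nextEvent;            // position in the dataset, kept across stop/resume

    InteractiveHistogramSketch(int nBins, double min, double max, double cut) {
        this.bins = new int[nBins];
        this.min = min;
        this.max = max;
        this.cut = cut;
    }

    /** Simplification: the new cut applies only to events processed from now on. */
    void setCut(double newCut) { this.cut = newCut; }

    void stop() { stopped = true; }

    void resume() { stopped = false; }

    /** Processes at most chunkSize events; returns false if stopped or if no events remain. */
    boolean processChunk(double[] events, int chunkSize) {
        if (stopped || nextEvent >= events.length) {
            return false;
        }
        int end = Math.min(nextEvent + chunkSize, events.length);
        for (; nextEvent < end; nextEvent++) {
            double x = events[nextEvent];
            if (x < cut || x < min || x >= max) {
                continue; // fails the cut or falls outside the histogram range
            }
            int bin = (int) ((x - min) / (max - min) * bins.length);
            bins[bin]++;
        }
        return true;
    }

    /** Copy of the current bin contents, safe to hand to a plotting component. */
    int[] snapshot() { return bins.clone(); }

    public static void main(String[] args) {
        double[] events = {0.5, 1.5, 2.5, 3.5, 0.2, 1.1, 2.9, 3.9};
        InteractiveHistogramSketch h = new InteractiveHistogramSketch(4, 0.0, 4.0, 0.0);
        // The "plot" is redrawn after every chunk instead of once at the end.
        while (h.processChunk(events, 3)) {
            System.out.println(Arrays.toString(h.snapshot()));
        }
    }
}
```

Because the position in the dataset is kept between calls, stopping and resuming continues from the last processed event rather than starting over.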

5
Focus: Moving Code to Data
  • Need portable code (Java, scripting, ROOT macros,
    etc.)
  • Need interfaces for analysis (some CS-11,
    AIDA, some of our own)

6
The Most Basic Picture
[Diagram. Components: Client Application, The Grid, Grid
Resources, Main Service Node, Slave Service Nodes (three),
Other Service Nodes, Dataset. Arrow labels: REQUEST FOR DATA,
AVAILABLE DATA IDs, STAGING, CODE, DATA IDs, RESULTS.]
7
Usage Procedure
  • Create Proxy (Grid sign-on)
  • Connect to Service
  • Start Session
  • Choose and Stage Dataset
  • Load Analysis Code
  • Run Analysis and See Results
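
The order of these steps can be sketched with a small in-memory stand-in that refuses to perform a step before its prerequisites are met. This is not the DAGS or JAS3 client API: every name here (`UsageProcedureSketch`, `stageDataset`, the host `portal.example.org`, the dataset and code names) is a hypothetical placeholder, and no Grid call is made.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: an in-memory stand-in that enforces the order of the usage steps. */
class UsageProcedureSketch {
    private final List<String> steps = new ArrayList<String>();
    private boolean hasProxy;
    private boolean connected;
    private boolean sessionStarted;
    private boolean datasetStaged;
    private boolean codeLoaded;

    void createProxy() {
        hasProxy = true; // a real client would create a Grid proxy credential here
        steps.add("proxy created");
    }

    void connect(String serviceHost) {
        require(hasProxy, "create a proxy before connecting");
        connected = true;
        steps.add("connected to " + serviceHost);
    }

    void startSession() {
        require(connected, "connect before starting a session");
        sessionStarted = true;
        steps.add("session started");
    }

    void stageDataset(String datasetId) {
        require(sessionStarted, "start a session before staging a dataset");
        datasetStaged = true;
        steps.add("staged " + datasetId);
    }

    void loadAnalysisCode(String codeName) {
        require(sessionStarted, "start a session before loading code");
        codeLoaded = true;
        steps.add("loaded " + codeName);
    }

    void runAnalysis() {
        require(datasetStaged && codeLoaded, "stage a dataset and load code before running");
        steps.add("analysis running");
    }

    /** Copy of the steps completed so far. */
    List<String> steps() { return new ArrayList<String>(steps); }

    private static void require(boolean ok, String message) {
        if (!ok) {
            throw new IllegalStateException(message);
        }
    }

    public static void main(String[] args) {
        UsageProcedureSketch client = new UsageProcedureSketch();
        client.createProxy();
        client.connect("portal.example.org");     // hypothetical host
        client.startSession();
        client.stageDataset("example-dataset");   // hypothetical dataset ID
        client.loadAnalysisCode("ExampleAnalysis"); // hypothetical code name
        client.runAnalysis();
        for (String step : client.steps()) {
            System.out.println(step);
        }
    }
}
```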

8
Dataset Catalog Service
  • Allows user to browse dataset hierarchy
  • Allows user to search using meta-data
    associated with each dataset
  • Output is the Grid Service Handle (GSH) of the
    Dataset Locator (the service that knows the actual
    location of the Dataset), together with
  • String ID of the Dataset (an opaque string
    interpreted only by the Dataset Locator)
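
As a rough picture of the data involved, a catalog entry can be modelled as a locator handle, an opaque dataset ID, and a set of meta-data used for searching. The sketch below is an assumption-laden illustration, not the actual catalog interface: `DatasetCatalogSketch`, `Entry`, `search`, and the sample handles, IDs, and meta-data keys are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: a meta-data search that returns locator handle plus opaque dataset ID. */
class DatasetCatalogSketch {

    /** One catalog entry. The catalog never interprets datasetId; only the locator does. */
    static final class Entry {
        final String locatorHandle; // stands in for the GSH of the Dataset Locator
        final String datasetId;     // opaque string, meaningful only to that locator
        final Map<String, String> metadata;

        Entry(String locatorHandle, String datasetId, Map<String, String> metadata) {
            this.locatorHandle = locatorHandle;
            this.datasetId = datasetId;
            this.metadata = new HashMap<String, String>(metadata);
        }
    }

    private final List<Entry> entries = new ArrayList<Entry>();

    void register(Entry entry) {
        entries.add(entry);
    }

    /** Returns every entry whose meta-data contains the given key with the given value. */
    List<Entry> search(String key, String value) {
        List<Entry> hits = new ArrayList<Entry>();
        for (Entry entry : entries) {
            if (value.equals(entry.metadata.get(key))) {
                hits.add(entry);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        DatasetCatalogSketch catalog = new DatasetCatalogSketch();

        Map<String, String> first = new HashMap<String, String>();
        first.put("experiment", "demo");   // hypothetical meta-data
        catalog.register(new Entry("locator-handle-A", "opaque-id-001", first));

        Map<String, String> second = new HashMap<String, String>();
        second.put("experiment", "other");
        catalog.register(new Entry("locator-handle-B", "opaque-id-002", second));

        for (Entry hit : catalog.search("experiment", "demo")) {
            // The client passes datasetId to the locator behind locatorHandle unchanged.
            System.out.println(hit.locatorHandle + " " + hit.datasetId);
        }
    }
}
```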

9
DEMO (Bala Ananthan)
10
Structure and Architecture
  • We wanted to use just Globus Web Services
    everywhere, but:
      • OSG/VDT is not using Web Services
      • Performance is not good enough for interactivity
  • Trivial service invocation (Globus Toolkit sample
    service) over a 10 Mbit LAN:
      • RMI, 100 calls: 96 ms
      • Globus (non-secure), 100 calls: 22 seconds
        (excluding first invocation)
      • Globus (secure), 100 calls: 133 seconds
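
Dividing each total by the 100 calls gives roughly 1 ms per call for RMI, 220 ms for non-secure Globus, and 1.3 s for secure Globus, which is about 230 and 1400 times slower than RMI. The short program below only redoes that arithmetic from the figures on this slide; the class and method names are made up for the illustration.

```java
import java.util.Locale;

/** Illustrative only: per-call cost and slowdown computed from the totals quoted on the slide. */
class InvocationOverheadSketch {

    /** Average cost of one call in milliseconds. */
    static double perCallMillis(double totalMillis, int calls) {
        return totalMillis / calls;
    }

    /** How many times slower a measurement is than the baseline. */
    static double slowdown(double totalMillis, double baselineMillis) {
        return totalMillis / baselineMillis;
    }

    public static void main(String[] args) {
        int calls = 100;
        String[] names = {"RMI", "Globus (non-secure)", "Globus (secure)"};
        double[] totalsMillis = {96.0, 22_000.0, 133_000.0}; // totals from the slide
        double baseline = totalsMillis[0];

        for (int i = 0; i < names.length; i++) {
            System.out.println(String.format(Locale.ROOT,
                    "%-20s %8.2f ms/call  %6.0fx RMI",
                    names[i],
                    perCallMillis(totalsMillis[i], calls),
                    slowdown(totalsMillis[i], baseline)));
        }
    }
}
```

At more than a second per secure call, even a handful of round trips per plot update would exceed the "user waits seconds" target, which is consistent with the decision not to use Web Services for the interactive path.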

11
(No Transcript)
12
Key design goals
  • Dataset Analysis Grid Service
  • Only install analysis services on one specific node,
    called the portal (Compute Element software is
    dynamically deployed from the portal node)
  • Portal can use any OSG site that gives it RMI
    access
  • access to the portal, not the client
  • may need a head-node configuration
  • No Grid software on the Client node (needs a Java VM)

13
Running on OSG
  • We know how to use GRAM and GridFTP servers on
    OSG sites.
  • RMI connectivity needs to be investigated.
  • Firewalls between the service node and the compute
    elements will be an issue
  • May need a relay at the OSG site that acts as a
    bridge across the firewall to the compute elements
  • We are looking for test sites where we can
    install the portal services on one node that has
    the RMI port exposed to the internet

14
Appendix
  • Tech-X Corporation Information

15
Tech-X Corporation Origins
  • Founded in 1994 by
  • John R. Cary, CEO and Professor of Physics at the
    University of Colorado
  • Svetlana Shasharina, Ph.D. and Vice-President of
    Distributed Technologies
  • Headquarters in Boulder, Colorado
  • Employee-owned
  • 31 employees (75% holding a Ph.D.) with physics
    and computer science expertise

16
Employee Expertise
  • Scientific computation
  • Particle accelerator modelling
  • Computational electromagnetics
  • Fusion plasma modelling
  • Nano-materials modelling
  • Infrastructure for scientific computation and
    data management
  • Middleware: grid technologies, XML data
    integration, bridges (4th Generation Languages
    and distributed technologies), CORBA
  • High-performance computing: MPI, cluster
    computing
  • Visualization: graphical user interfaces for
    scientific software, remote visualization of
    large datasets generated at remote clusters and
    supercomputers