1
Interactive High Energy Physics Application
Celso Martinez-Rivero, Jesus Marco, Rafael Marco, Oscar Ponce, David Rodriguez
IFCA-CSIC (Spain)
CROSSGRID WP1 MEETING, 18-III-2002, Krakow
2
Distributed Physics Analysis in HEP
  • Final user applications for interactive physics
    analysis in a GRID-aware environment for the LHC
    experiments
  • Access to a large O/R DBMS through a middleware
    server vs. access to a catalogue of distributed
    ROOT data files
  • Distributed data-mining techniques (mainly
    Neural Networks, also self-organizing maps, ...)
  • Integration of user-friendly interactive access
    via portals
  • Partners: CSIC, UAB, FZK, INP, INS

3
HEP interactive analysis
  • USER REQUIREMENTS
  • LHC experiments
  • Other experiments taking data? CDF, BaBar
  • Physics analysis → physics results (TDR,
    publications)
  • HLT trigger optimisation
  • TASKS
  • 1.3.1 Interactive Distributed Data Access
  • 1.3.2 Data Mining Techniques
  • 1.3.3 Integration and Deployment
  • 1.3.4 Application to LHC Physics TDR
  • DEVELOPERS
  • CSIC: D. Rodriguez (O/R DBMS), O. Ponce,
    C. Martinez, J. Marco
  • FZK: M. Kunze, people on the ROOT side
  • CMS: CSIC (Santander), C. Martinez
  • ALICE: FZK, M. Kunze
  • ATLAS: UAB A. Pacheco, M. Bosman; CSIC (Valencia)
    E. Ros, J. Salt; INP P. Malecki
  • LHCb: INP, M. Witek

4
The final product: GUI
(Diagram) Components: Monitoring; Graphic Output/(Input?);
DATASET Dictionary (Classes), Basic Objects, Derived Procedures;
Alphanumeric Output; Analysis Scripts; Work Persistency
5
(Workflow diagram) (XML) info flow between: UI;
Replica/Data Manager, Web Service Finder?; Broker;
Master CE; CACHED data
6
Storage Element as WebService?
  • Current SE in EDG: GridFTP server
  • WebService approach:
  • Passive SE: GridFTP, or /grid, etc.
  • Active SE:
  • SQL QUERY (ResultSet in XML): SELECT ... FROM
  • (Three-tier servlet running, like Spitfire)
    ready! (IBM IDS)
  • ROOT query (does this make sense? A PAW query
    does make sense, implemented...)
  • PROCESSING QUERY (Agent): Stored Procedure or
    XML description (SOAP-like?)
  • SQL QUERY: ok for NN in HEP
  • PROCESSING QUERY (agent-like approach): likely
    needed for SOM

7
Data Mining
  • Basic case 1
  • Neural Network, BFGS
  • Objective: parallel training
  • Balance data load (How? Split AFTER the DB query)
  • MPI works ok for distributed calculation!
  • How to do it in the ROOT environment (PROOF?)
  • First results...
  • Basic case 2
  • SOM (unsupervised learning)
  • Objective: cluster analysis
  • Wait for the Meteo 1.4.b experience
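
The "balance data load" point above can be made concrete: after the DB query returns its result set, the events are divided as evenly as possible among the CEs. A minimal sketch in C (the function name is hypothetical, not from the prototype):

```c
#include <assert.h>

/* Number of events assigned to a given node when n_events are split
 * as evenly as possible across n_nodes after the DB query: the first
 * (n_events % n_nodes) nodes each receive one extra event. */
int events_for_node(int n_events, int n_nodes, int rank)
{
    int base = n_events / n_nodes;
    return base + (rank < n_events % n_nodes ? 1 : 0);
}
```

With latency in mind (slide 8), the per-node share could later be weighted by measured node speed instead of being uniform.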

8
Parallel NN
  • Prototype
  • MLP package in PAW (database: n-tuples)
  • Used for the DELPHI Higgs search (REALISTIC)
  • BFGS method
  • Current setup
  • Scan n-tuples to filter events/variables
  • Result set in XML, split according to the CEs
  • 1 master, n slaves, in a LOCAL cluster, using
    MPICH-G2
  • NN architecture: 16-10-10-1
  • The master sends the initial weights; each node
    returns its gradient and errors to the master,
    which adds them and sends new weights...
  • Scaling is ok in the local cluster; now moving to
    a non-local environment with latency and QoS in
    mind!
  • Obvious solution: adapt the NN load on each node
    to the latency time
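
The master-side step of the scheme above (collect partial gradients from the slaves, combine, update, re-broadcast) can be sketched in C. This is an illustrative sketch only: the prototype uses BFGS, and the arrays would really arrive via MPI_Recv (or an MPI_Reduce with MPI_SUM); here a plain gradient-descent step with a hypothetical step size stands in for the BFGS update.

```c
#include <stddef.h>

/* Sum the partial gradients returned by the nslaves workers, then
 * take a simple descent step on the weights. The updated weights
 * would be broadcast to all nodes for the next epoch. */
void aggregate_and_update(double *weights, size_t num_var,
                          const double *partial_grads, /* nslaves rows of num_var */
                          int nslaves, double step)
{
    for (size_t w = 0; w < num_var; ++w) {
        double g = 0.0;
        for (int s = 0; s < nslaves; ++s)
            g += partial_grads[(size_t)s * num_var + w];  /* total gradient */
        weights[w] -= step * g;
    }
}
```

The summation is exactly what the planned MPI_Reduce(..., MPI_SUM, ...) call on slide 18 would perform on the master.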

9
NN scaling
(Figure: latency curve) 644,577 events, 16 variables,
16-10-10-1 architecture, 1000 epochs for training
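
As a side calculation, the size of the 16-10-10-1 network quoted above can be counted directly: each fully connected layer contributes (inputs × outputs) weights plus one bias per output, giving 16·10+10 + 10·10+10 + 10·1+1 = 291 trainable parameters. A small helper in C:

```c
/* Trainable parameters (weights + biases) of a fully connected MLP
 * described by its layer sizes, e.g. {16, 10, 10, 1}. */
int mlp_parameters(const int *layers, int nlayers)
{
    int total = 0;
    for (int l = 0; l + 1 < nlayers; ++l)
        total += layers[l] * layers[l + 1]  /* weights */
               + layers[l + 1];             /* biases  */
    return total;
}
```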
10
Integration
  • Final User applications should run on simulated
    data from the 4 LHC experiments.
  • TBD

11
Example of Trigger Analysis in CMS
  • HLT Trigger almost equivalent to offline physics
    analysis
  • 3 Trigger levels under study
  • L1 Hardware
  • L2 Software standalone
  • L3 Software full reconstruction

12
Example: single μ stream
13
Fine Tuning of HLT using Crossgrid
  • Some basic topologies need no high-level
    analysis (isolated high-Pt muons)
  • However, other topologies are much more difficult
    to separate from background
  • More sophisticated analysis is mandatory to keep
    the background rate low while maintaining high
    efficiency!
  • CrossGrid will help here!

14
Trigger CPU Time
  • L2: 780 ms average
  • Main process: the Trajectory Builder
  • Important queues!
  • L3: 1689 ms average, with large fluctuations,
    mostly combinatorial.
  • Reduce the muon cone?

15
Trigger CPU Time
  • What to do with events that consume large
    amounts of CPU time?
  • Use the Grid to avoid collapsing the standard
    trigger farms.
  • All events requiring more processing time than a
    certain time limit are automatically sent to
    CrossGrid for triggering
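
The rule above reduces to a one-line predicate; a sketch in C, with hypothetical names (the slide does not specify the limit's value):

```c
/* Divert an event from the standard trigger farm to the Grid when its
 * processing time exceeds the configured limit, so slow events do not
 * block the farm. Illustrative only; names are not from the prototype. */
typedef enum { ROUTE_LOCAL_FARM, ROUTE_CROSSGRID } route_t;

route_t route_event(double processing_ms, double limit_ms)
{
    return processing_ms > limit_ms ? ROUTE_CROSSGRID : ROUTE_LOCAL_FARM;
}
```

With the average L3 time of 1689 ms and its large fluctuations, a limit somewhere above the average would send only the combinatorial tail to the Grid.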

16
What do we need from Tools & Services
  • GRID Application Programming Environment
  • Verification of MPI use: YES
  • Performance prediction: YES
  • Monitoring: YES (action if a node is down)
  • Services & Tools
  • User-friendly portals: YES (we go for all XML)
  • Roaming access: YES (but implicit in the portal)
  • Efficient distributed data access: YES (need
    close contact)
  • Specific resource management: YES (general
    question: how to manage interactive parallel
    jobs)

17
Short Answer to WP2 questions
  • Programming languages
  • C, C++, Java
  • HLA: no
  • CCA: could be, not a priority
  • Component structure
  • Matrix involved in the gradient for the NN
  • Granularity
  • Important if the NN becomes too big, and in SOMs
  • Performance problems
  • Main one: latency
  • Monitoring
  • Check dead nodes; let's think about what to do
  • Node time for processing, latency
  • Storage element performance

18
Short Answer to WP2 questions
  • MPI yes
  • Calls (initial list)

Now:
MPI_Init(&argc, &argv)
MPI_Comm_size(MPI_COMM_WORLD, &nproc)
MPI_Comm_rank(MPI_COMM_WORLD, &rank)
MPI_Get_processor_name(processor_name, &namelen)
MPI_Barrier(MPI_COMM_WORLD)
MPI_Bcast(buffer_int, 1, MPI_INT, 0, MPI_COMM_WORLD)
MPI_Finalize()
MPI_Recv(buffer_dbl, num_var, MPI_DOUBLE, i, 30, MPI_COMM_WORLD, &status)
MPI_Send(buffer_dbl, num_var, MPI_DOUBLE, 0, 30, MPI_COMM_WORLD)

Soon / Next:
MPI_Reduce(buffer_dbl_slave, buffer_dbl_master, num_datos, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD)
19
What do we need from Testbed
  • Definition of interactive resources?
  • Reserved/prioritized?
  • Cached storage(?)
  • Submission or interactive session?
  • Check latency and, if possible, the impact of QoS
    (see map)
  • Need about 3-5 testbed sites, each with 50
    processors, to make a realistic test of the
    potential.

20
CrossGrid WP4 - International Testbed Organisation
Network (Geant) setup