Title: Distributed Data Management and Processing 2.3
1 Distributed Data Management and Processing 2.3

2 Introduction to 2.3, Distributed Data Management and Processing
- 2.3 Distributed Data Management and Processing develops software tools to support the CMS Distributed Computing Model.
[Diagram: the tiered CMS computing model — the Online System feeds Tier 0 at PBytes/sec; 100 MBytes/sec links connect to the Tier 1 regional centers (US Center at Fermilab and the Italy, France, and UK regional centers); below these sit Tier 2 centers, Tier 3 institutes with physics data caches, and Tier 4.]
3 Introduction to 2.3, Distributed Data Management and Processing
- Supporting the CMS Distributed Computing Model is a daunting task.
- 1/3 of the processing capability is at Tier 0, 1/3 at the Tier 1s, and 1/3 at the Tier 2s.
- Centers are spread globally over networks of variable bandwidth.
- Most physicists will be performing analysis at remote centers that locally have only a portion of the raw or reconstructed data.
- Creating tools that will allow efficient access to data at local sites and to resources at remote sites is a complicated task.
- This is a larger dataset, using more computing power, spread over a greater distance, than HEP has previously attempted, and it requires a more advanced set of tools.
4 Introduction to 2.3, Distributed Data Management and Processing
- DDMP attempts to break the project into 5 manageable pieces for efficient development:
  - 2.3.1 Distributed Process Management
  - 2.3.2 Distributed Database Management
  - 2.3.3 Load Balancing
  - 2.3.4 Distributed Production Tools
  - 2.3.5 System Simulation
- The combination of these should allow CMS to take full advantage of the grid of Distributed Computing Resources.
- CMS has immediate needs for simulation for use in completing TDRs (HLT, Physics, ...). Wherever possible the project attempts to develop tools that are useful in production at existing facilities while developing more advanced tools for the future.
5 Introduction to 2.3 Distributed Data Management and Processing
- Distributed Data Management and Processing attempts to integrate software developed elsewhere whenever possible, exploiting a number of tools being developed for grid computing.
6 Introduction to Distributed Process Management 2.3.1
- The goal of Distributed Process Management is to develop tools that enable physicists to make efficient use of computing resources distributed world wide.
- There are a number of tools available with similar goals (LSF, PBS, DQS, Condor, ...), but at the moment none are judged adequate to meet the long-term needs of CMS.
- Important issues:
  - Keeping track of long-running jobs
  - Supporting collaboration among multiple physicists
  - Conserving limited network bandwidth
  - Maintaining high availability
  - Tolerating the partition failures that are common to WANs
7 Distributed Process Management Prototype Development
- The prototype introduces the concept of a session, which is a container for interrelated jobs. This allows submission, monitoring, and termination with a single command. Sessions can be shared. (A rough sketch of the session concept follows this slide's bullets.)
- Processors can be chosen based on data availability, processor type, and load.
- Replicated state is maintained so that computations will not be lost if a server fails.
- The prototype is based on the functional language ML and the Group Communications Toolkit. The Group Communications Toolkit aids writing distributed programs.
8 Distributed Process Management Current Status
- A working prototype exists with the features described on the previous slide.
- The system has been tested with 32 processors performing CMS ORCA production.
- Some scalability issues were encountered and repaired.
- The system has been tested on 65 processors with no scalability problems encountered.
9 Distributed Process Management Prototype Future Plans
- In the next few months Distributed Process Management will move development efforts to the CMS Tier 2 Prototype Computation Center at Caltech/UCSD.
- The unique split nature of the center and the large number of processors make it a nearly ideal place to work on scalability, remote submission, and more complex ORCA scenarios.
- In spring 2001 there are plans to support multiple users and to develop a queuing system for when resources are unavailable.
- The first prototype is expected to be complete in the summer of 2001.
10 Distributed Process Management Fully Functional Development
- Milestones are tied to deliverables to CMS for use in production.
- The program starts with algorithm development for use in process management, including data-aware Self-Organizing Neural Network agents for scheduling.
- The fully functional system should be completed sometime in 2003.
11 Introduction to Distributed Database Management 2.3.2
- Distributed Database Management develops tools, external to the ODBMS, that control replication and synchronization of data over the grid, as well as monitoring and improving the performance of database access.
- As event production becomes less CERN-centric there is an immediate need for tools to replicate data produced at remote sites. There is also a need to evaluate and improve the performance of database access.
- In the future, Distributed Production will need tools to automatically synchronize some databases over all sites analyzing results, and to replicate databases on demand for remote analysis jobs.
- Distributed Database Management attempts to meet both these needs.
12 Distributed Database Management Prototype Development
- To meet the long- and short-term goals two paths were pursued: an investigational prototype written in Perl, and development with the Grid Data Management Pilot (GDMP) of a functional prototype based on Globus middleware.
- Both require high-speed transfers, secure data access, transfer verification, integration of the data upon arrival, and remote catalogue querying and publishing.
13 Investigational Prototype Goals
- One-way bulk replication of read-only (static) datafiles.
- A simple prototype using available software:
  - RFIO from HPSS to disk at CERN
  - SCP from disk at CERN to disk at Fermilab
  - FMSS to archive from disk at Fermilab to tape
  - Objectivity tools (oodumpcatalog, oodumpschema, oonewfd, ooschemaupgrade, ooattachdb)
  - all wrapped up in Perl with HTTP and TCP/IP
- The aim is automation, not performance: transferring one 2 GB file is easy, transferring 1000 is not.
- The objective is to clone (part of) a federation from CERN at Fermilab.
- Automated MSS-to-MSS transfer via (small) disk pools.
- Intended as a possible fallback solution.
- Documentation at http://home.cern.ch/wildish
14 Investigational Prototype Steps
- The basic steps (a schematic sketch of the loop follows the list):
  1. Create an empty federation with the right schema and pagesize. Get the schema directly from the source federation via a web-enabled ooschemadump.
  2. Find out what data is available. Use a web-enabled oodumpcatalog to list the source federation catalogue.
  3. Determine what is new with respect to your local federation. Use a catalogue-diff, based on DB name or ID.
  4. Request the files you want from a server at the source site. The server will stage files from HPSS to a local disk buffer, then send them to you.
  5. Process files as they arrive. Attach them to your federation, archive them to MSS, and purge them when your local disk buffer fills up.
- Repeat steps 2, 3, and 4 as desired, and step 5 as desired.
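
As a schematic illustration of steps 2-5, the Python sketch below diffs the remote catalogue against the local one, fetches the new files, and attaches and archives them. The endpoints, catalogue format, and the fmss_archive command are hypothetical; the real prototype was Perl wrapped around the Objectivity tools and site-specific staging and archiving commands.

    # Schematic sketch of the catalogue-diff replication loop described above.
    import subprocess
    import urllib.request

    SOURCE = "http://source.example.org/catalogue"   # assumed web-enabled oodumpcatalog endpoint


    def remote_catalogue():
        """Step 2: list the source federation catalogue (assumed 'dbid name' per line)."""
        with urllib.request.urlopen(SOURCE) as resp:
            lines = resp.read().decode().splitlines()
        return dict(line.split(maxsplit=1) for line in lines if line.strip())


    def local_catalogue():
        """Parse the local federation catalogue, e.g. from oodumpcatalog output (format assumed)."""
        out = subprocess.run(["oodumpcatalog"], capture_output=True, text=True).stdout
        return dict(line.split(maxsplit=1) for line in out.splitlines() if line.strip())


    def catalogue_diff(remote, local):
        """Step 3: databases present at the source but not yet in the local federation."""
        return {dbid: name for dbid, name in remote.items() if dbid not in local}


    def replicate(new_files):
        """Steps 4-5: fetch each new file, attach it, archive it to MSS."""
        for dbid, name in new_files.items():
            subprocess.run(["scp", f"source.example.org:/buffer/{name}", "/buffer/"], check=True)
            subprocess.run(["ooattachdb", f"/buffer/{name}"], check=True)    # attach to the federation (arguments assumed)
            subprocess.run(["fmss_archive", f"/buffer/{name}"], check=True)  # hypothetical MSS archive command
            # purging of /buffer when it fills up is omitted


    if __name__ == "__main__":
        replicate(catalogue_diff(remote_catalogue(), local_catalogue()))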
15 Exporting Data Prototype Design
[Diagram: an HTTP catalogue-server at CERN serves ooschemadump and oodumpcatalog output to a remote client at Fermilab, which uses ooschemaupgrade and oonewfd to build a new cloned federation alongside the user federation; traffic may pass through firewalls on either side.]
16 Distributed Database Management Investigational Prototype Exporting
[Diagram: at CERN, a catalogue-server (oodumpcatalog over HTTP) and a DBServer stage files from HPSS into a disk pool with rfcp; a catalogue-diff at Fermilab requests files over a TCP socket, the files are copied with scp (secure shell copy) through the firewalls to the Fermilab disk pool, processed as new DBs, attached to the local cloned federation with ooattachdb, and archived to the Fermilab MSS.]
17 Distributed Database Management Investigational Prototype Results
- 600 GB transferred in 9 days, from SHIFT20 (200 GB disk) to CMSUN1 (280 GB disk).
- The federation was built automatically as data arrived, and the data was archived automatically to FMSS.
- A peak rate of 2.7 MB/sec was sustainable for several hours.
- Performance was unaffected by batch jobs running on the Fermilab client or the CERN server.
- Best results were obtained with ~40 simultaneous copies running.
- Transfers were monitored with the production-monitoring system; the Fermilab client could be monitored from a desktop at CERN.
- Investigational Prototype development is frozen, but parts of the code are being reused for Distributed Production Tools, an updated monitoring system, and database comparisons.
18 Distributed Database Management Functional Prototype
- A flexible, layered, and modular architecture designed to support modifications and extensions, using Globus as the basic middleware.
- Data Model (see the sketch after this list):
  - Export Catalog
    - Contains information about the newly produced files which are ready to be accessed by other sites.
    - The export catalog is published to all the subscribed sites.
    - A new export catalog, containing only the newly generated files, is produced every time a site wants to publish its files.
  - Import Catalog
    - Contains information about the files which have been published by other sites but not yet transferred locally.
    - As soon as a file is transferred locally, validated, and attached to the federation, it is removed from the import catalog.
  - Subscription Service
    - All the sites that subscribe to a particular site get notified whenever there is an update in its catalog. Supports both push and pull mechanisms.
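
As a rough illustration of this data model, the following Python sketch (hypothetical names, not GDMP code) shows per-publication export catalogs, an import catalog of files not yet transferred, and push-style notification of subscribers.

    # Rough sketch of the export/import catalog and subscription model.
    from dataclasses import dataclass, field


    @dataclass
    class FileEntry:
        lfn: str          # logical file name
        size: int
        checksum: str


    @dataclass
    class Site:
        name: str
        import_catalog: dict = field(default_factory=dict)   # lfn -> FileEntry, published but not yet local
        subscribers: list = field(default_factory=list)      # sites subscribed to this site's catalog

        def publish(self, new_files):
            """Generate a fresh export catalog containing only the new files
            and push it to every subscribed site."""
            export_catalog = {f.lfn: f for f in new_files}
            for subscriber in self.subscribers:
                subscriber.notify(export_catalog)             # push mechanism
            return export_catalog                             # kept so pull-style clients can fetch it too

        def notify(self, export_catalog):
            """Merge a remote export catalog into the local import catalog."""
            self.import_catalog.update(export_catalog)

        def mark_transferred(self, lfn):
            """Once a file is transferred, validated, and attached to the
            federation, it is removed from the import catalog."""
            self.import_catalog.pop(lfn, None)


    # Example: Fermilab subscribes to CERN; CERN publishes two new files.
    cern, fnal = Site("CERN"), Site("FNAL")
    cern.subscribers.append(fnal)
    cern.publish([FileEntry("run1.db", 2_000_000_000, "abc"), FileEntry("run2.db", 1_500_000_000, "def")])
    fnal.mark_transferred("run1.db")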
19 Database Replicator Functional Prototype Architecture
- Communication
  - Control messages
- Data Mover
  - File transfers
  - Logging incoming and outgoing files
  - Resuming file transfers
  - Progress meters
  - Error checks
- Security
  - Authentication and authorization
- Replica Manager
  - Handling the replica catalogue
  - Replica selection and synchronization
[Diagram: "Layered Architecture for Distributed Data Management" — an Application layer sits above the Request Manager, DB Manager, Information Service, Replica Manager, Security, Control Comm., and Data Mover components, which build on Globus middleware (Globus-threads, Globus-dc, Globus Rep. Manager, gssapi, GIS, Objy API, Globus-ftp, Globus_io).]
20 Database Replicator Functional Prototype Architecture
- Information Service
  - Publishes data and network resources at sites.
- DB Manager
  - Backend to database-specific functions.
- Request Manager
  - Generates requests on the client side and handles requests on the server side.
- Application
  - Multi-threaded server handling clients.
21 Integration into the CMS Environment
[Diagram: integration of GDMP with the CMS production environment at two sites. At Site A, the CMS physics software writes DBs into the production federation and its catalog; a CheckDB script performs a DB completeness check and, through the CMS/GDMP interface, triggers the GDMP system to generate and publish a new export catalog to the subscribers list. At Site B, the GDMP server reads the published catalog, generates an import catalog, and replicates the files over the WAN (optionally staging them); stage/purge scripts copy the files to MSS, the files are transferred and attached to the user federation, the catalog is updated, and the disk copies are purged.]
22 Database Replicator Functional Prototype Current Status
- The decision was made to use the Functional Prototype in the fall ORCA production. This required adding some features and making it more fault tolerant:
  - Parallel transfers to improve performance.
  - Resumption of file transfers from checkpoints to handle network interruptions (sketched below).
  - Catalogue filtering to allow more choices for files to import from and export to remote sites.
  - A user guide.
- It is being used at remote centers for ORCA fall production to handle replication of Objectivity files.
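
The sketch below illustrates the idea of resuming a transfer from a checkpoint; it is not the GDMP implementation, and it assumes an HTTP server (hypothetical URL) that honours Range requests.

    # Illustrative sketch of resuming a file transfer from a checkpoint: the
    # receiver records how many bytes have already arrived and asks the
    # sender to continue from that offset.
    import os
    import urllib.request


    def resume_download(url, dest, chunk=1 << 20):
        """Fetch `url` into `dest`, restarting from the current size of the
        partial file after an interruption."""
        offset = os.path.getsize(dest) if os.path.exists(dest) else 0
        req = urllib.request.Request(url, headers={"Range": f"bytes={offset}-"})
        with urllib.request.urlopen(req) as resp, open(dest, "ab") as out:
            while True:
                data = resp.read(chunk)
                if not data:
                    break
                out.write(data)           # the appended bytes act as the checkpoint


    # Usage: call again after a network interruption and the transfer
    # continues where the previous attempt stopped.
    # resume_download("http://source.example.org/buffer/run1.db", "/buffer/run1.db")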
23 Database Replicator Prototype Future Plans
- When the GDMP tools were written they were tightly coupled to Objectivity applications and were unable to replicate non-Objectivity files. With the addition of the Globus Replica Catalogue, they should be able to perform file-format-independent replication in January of 2001.
- In May 2001 integration and development of Grid Information Services should begin. At the moment the data replicator cannot make an intelligent choice about which copy to access when several are available. This decision should be made based on the current network bandwidth, the latency between the two given nodes, the load on the data servers, etc. A toy scoring sketch follows.
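
As a toy illustration of such a choice, the sketch below scores each available replica by bandwidth, latency, and server load and picks the best; the metrics and weights are invented and are not GDMP code.

    # Toy replica selection: given several sites holding a copy, pick the one
    # with the best combination of bandwidth, latency, and server load.
    from dataclasses import dataclass


    @dataclass
    class Replica:
        site: str
        bandwidth_mbps: float   # measured bandwidth to this site
        latency_ms: float       # round-trip latency to this site
        server_load: float      # load average on the data server


    def score(r: Replica) -> float:
        """Higher is better: reward bandwidth, penalize latency and load.
        The weights are arbitrary placeholders."""
        return r.bandwidth_mbps / (1.0 + 0.01 * r.latency_ms) / (1.0 + r.server_load)


    def choose_replica(replicas):
        return max(replicas, key=score)


    print(choose_replica([
        Replica("CERN", bandwidth_mbps=12.0, latency_ms=120.0, server_load=2.5),
        Replica("FNAL", bandwidth_mbps=8.0, latency_ms=30.0, server_load=0.5),
    ]).site)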
24 Fully Functional Prototype Development
- Development toward a fully functional prototype is foreseen to start after the summer of 2001 and continue until 2003.
- This involves the testing and integration of grid tools currently under development:
  - Mobile agents that float on the network independently, communicate, and make intelligent decisions when triggered.
  - Virtual data: the concept that everything except irreproducible raw experimental data need exist only as specifications for how to derive it.
25 Request Redirection Protocol
- The second goal of Distributed Database Management is to evaluate and improve database access.
- The performance and capabilities of the Objectivity AMS server can be improved by writing plugins that conform to a well-defined interface.
- To improve the availability of the database servers, one such plugin, the Request Redirection Protocol, has been implemented. When the federated database has determined that an AMS has crashed (due to a disk failure, etc.), jobs can be automatically redirected to an alternate server. This has been running on the CERN AMS servers for a month. (A simple failover sketch follows this list.)
- In early 2001, a security protocol plugin will be implemented.
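
The Objectivity AMS plugin interface is not described here, so the following is only an illustrative failover sketch (hypothetical host names): try the primary data server first and redirect to an alternate when the primary appears to be down.

    # Illustrative failover sketch, not the AMS plugin interface.
    import socket

    SERVERS = ["ams1.example.org", "ams2.example.org"]   # hypothetical host names


    def open_data_connection(servers=SERVERS, port=5000, timeout=5.0):
        """Return a socket to the first server that accepts a connection,
        mimicking the redirection of jobs away from a crashed AMS."""
        last_error = None
        for host in servers:
            try:
                return socket.create_connection((host, port), timeout=timeout)
            except OSError as err:              # connection refused, timeout, ...
                last_error = err                # fall through to the alternate server
        raise RuntimeError(f"all data servers unreachable: {last_error}")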
26 Introduction to Load Balancing 2.3.3
- Balancing the use of resources in a distributed computing environment is difficult. It requires the integration and augmentation of elements of Distributed Process Management and Distributed Database Management with intelligent algorithms that determine the most efficient course of action.
- In a distributed computing system, jobs can be submitted to the computing resources where the data is available, or the data can be moved to available computing resources.
- Deciding between these two cases so as to efficiently complete all requests and balance the load over the whole computing grid requires good algorithms and lots of information about network traffic, CPU loads, and data availability. A toy cost comparison is sketched below.
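
As a toy illustration of the trade-off, the sketch below compares an estimated completion time for running the job where the data already sits against moving the data to a less loaded site; the cost formulas and numbers are invented.

    # Toy cost model for "move the job" vs "move the data".
    def time_at_data_site(cpu_seconds, load):
        """Run at the busy site that holds the data: the job shares CPUs."""
        return cpu_seconds * (1.0 + load)


    def time_if_data_moved(cpu_seconds, data_gb, bandwidth_mb_s, remote_load):
        """Transfer the data over the WAN, then run at the less loaded site."""
        transfer = data_gb * 1024.0 / bandwidth_mb_s
        return transfer + cpu_seconds * (1.0 + remote_load)


    def decide(cpu_seconds, data_gb, bandwidth_mb_s, local_load, remote_load):
        stay = time_at_data_site(cpu_seconds, local_load)
        move = time_if_data_moved(cpu_seconds, data_gb, bandwidth_mb_s, remote_load)
        return ("run at data site", stay) if stay <= move else ("move data", move)


    # A 2-hour job over 10 GB of data, 2.5 MB/s WAN link, loaded local farm, idle remote farm.
    print(decide(cpu_seconds=7200, data_gb=10, bandwidth_mb_s=2.5, local_load=3.0, remote_load=0.2))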
27 Load Balancing Current Status
- While there has been considerable work on Distributed Process Management and Distributed Database Management, and some effort on information services and algorithm development, most of the work on Load Balancing is still to come.
- Preliminary work has been done on a prototype of Grid Information Services using Globus middleware (a sketch of the published record follows this list).
  - Publishes outside-domain resources so that they can be accessed inside the domain.
  - Static:
    - CPU power
    - Operating system details
    - Software versions
    - Available memory
  - Dynamic:
    - CPU load
    - Network bandwidth
    - Network latency
    - Updates every few seconds
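
The record format actually published by the prototype is not given, so the sketch below only illustrates the idea: gather static attributes once and refresh the dynamic ones every few seconds (hypothetical schema and placeholder values).

    # Sketch of the kind of record a site might publish to an information service.
    import json
    import os
    import platform
    import time


    def static_info():
        return {
            "cpu_count": os.cpu_count(),
            "os": platform.platform(),
            "software": {"orca": "4.x", "objectivity": "5.x"},   # placeholder versions
        }


    def dynamic_info():
        load1, load5, load15 = os.getloadavg()                   # Unix only
        return {
            "cpu_load": load1,
            "timestamp": time.time(),
            # bandwidth/latency probes to neighbouring sites would go here
        }


    def publish_loop(interval=5.0, cycles=3):
        record = {"static": static_info()}
        for _ in range(cycles):
            record["dynamic"] = dynamic_info()
            print(json.dumps(record))     # stand-in for pushing to the information service
            time.sleep(interval)


    if __name__ == "__main__":
        publish_loop()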
28 Load Balancing Future Plans
- Algorithm development should start in the summer of 2001, using conventional and Self-Organizing Neural Network techniques.
- Integration of Distributed Process Management and Distributed Database Management should begin as those projects enter the fully functional prototype phase.
29 Introduction to Distributed Production Tools 2.3.4
- The goal of Distributed Production Tools is to develop tools for immediate use to aid CMS production at existing computing facilities:
  - Job submission
  - Transferring and archiving results
  - System monitoring
- Until recently, US-CMS had no dedicated production facilities. Production in the US was performed on existing facilities with a wide variety of capabilities, platforms, and configurations.
- CMS has an immediate need for simulated events to complete the Trigger TDR and later the Physics TDR. This project helps to meet the immediate need, while lessons learned help the long-term goals as well.
30 Distributed Production Tools Current Status
- Based on the database replicator investigational prototype, tools have been designed to automatically record and archive the results of production performed at remote sites and to transfer these results to the CERN mass storage system. This has primarily been used for archiving CMSIM production performed at Padua, Moscow, IN2P3, Caltech, Fermilab, Bristol, and Helsinki.
- Tools have been developed to utilize existing facilities in the US. The aging HP X-class Exemplar system has been used for CMSIM production, and the Wisconsin Condor system, a scavenger system using spare cycles of Linux machines, has been used for CMSIM production and will be used for ORCA production this fall.
31 Distributed Production Tools Current Status of System Monitoring
- Tools have been developed to monitor production systems.
  - This helps to evaluate and repair bottlenecks in the production systems.
  - This provides realistic input parameters to the system simulation tools and improves the quality of simulation.
  - This provides information for making intelligent choices about the requirements of future production facilities.
- Monitoring uses Perl/bash scripts running on each node (a sketch of the collection path follows this list).
  - Information is generated in a netlogger-inspired format.
  - UDP datagrams transmit results to collection machines.
  - Numerical quantities are histogrammed every n minutes and put on the web.
- During spring production it was used to monitor 150 nodes with 25 MB of ASCII logging per day.
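
A minimal sketch of the per-node sender side is shown below; the field names, port, and collector host are assumptions, not the actual scripts.

    # Per-node monitoring sender: readings are formatted as key=value records
    # in a netlogger-inspired style and shipped to a collector as UDP datagrams.
    import os
    import socket
    import time

    COLLECTOR = ("monitor.example.org", 9999)   # hypothetical collection machine


    def reading():
        load1, _, _ = os.getloadavg()           # Unix only; one sample metric
        return {
            "DATE": time.strftime("%Y-%m-%dT%H:%M:%S"),
            "HOST": socket.gethostname(),
            "EVENT": "node.status",
            "LOAD1": f"{load1:.2f}",
        }


    def send_forever(interval=60):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        while True:
            record = " ".join(f"{k}={v}" for k, v in reading().items())
            sock.sendto(record.encode(), COLLECTOR)   # fire-and-forget datagram
            time.sleep(interval)


    if __name__ == "__main__":
        send_forever()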
32 Distributed Production Tools Current Status of System Monitoring
- A goal of the project is to understand how best to arrange the data for fast access.
- Monitor standard things on data servers:
  - CPU, network, disk I/O, paging, swap, load average, etc.
- Monitor the AMS:
  - Which files the user reads (including those already on disk).
  - Number of open filehandles (also for the lockserver).
- Monitor the lockserver:
  - Transaction ages, hosts holding locks, etc.
- Monitor the staging system:
  - Names of files staged in.
  - Time it takes for them to arrive.
  - Names of purged files.
33 System Monitoring Results
[Figure: AMS activity on 6 AMS servers, measured by the simple means of counting the number of filehandles that each server had open at a given time.]
34 Distributed Production Tools Future Plans
- Tools are being developed to support generic job submission over diverse existing computing facilities, to improve ease of use. The first of these, which is based on LSF, will be available in the spring of 2001.
- It is a relatively small extension of the system monitoring tools to initiate an action when the monitoring measures certain kinds of problems. A system already exists to send e-mail to the appropriate people. Tools are being developed so that all the jobs in a batch queue can be cleanly stopped or paused if the system monitoring tools determine that a server has crashed or that a disk has filled up. A sketch of such a monitoring-triggered action is given below.
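
The sketch below illustrates such a monitoring-triggered action: when a disk is nearly full, e-mail the appropriate people and pause the batch queue. The threshold, addresses, mail relay, and queue command are placeholders, not the actual tools.

    # Sketch of a monitoring-triggered action.
    import shutil
    import smtplib
    import subprocess
    from email.message import EmailMessage

    ADMINS = ["cms-prod-ops@example.org"]        # hypothetical contact list


    def disk_nearly_full(path="/data", threshold=0.95):
        usage = shutil.disk_usage(path)
        return usage.used / usage.total > threshold


    def notify(subject, body):
        msg = EmailMessage()
        msg["Subject"], msg["To"], msg["From"] = subject, ", ".join(ADMINS), "monitor@example.org"
        msg.set_content(body)
        with smtplib.SMTP("localhost") as smtp:  # assumes a local mail relay
            smtp.send_message(msg)


    def pause_batch_queue(queue="cms_prod"):
        # Placeholder for the real batch-system command (e.g. an LSF queue-stop call).
        subprocess.run(["echo", f"pause queue {queue}"], check=True)


    if __name__ == "__main__":
        if disk_nearly_full():
            notify("Disk nearly full", "Pausing the production queue until space is freed.")
            pause_batch_queue()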
35 Introduction to System Simulation 2.3.5
- Distributed computing systems of the scope and complexity proposed by CMS do not yet exist. The System Simulation project attempts to evaluate distributed computing plans by performing simulations of large-scale computing systems.
- The MONARC simulation toolkit is used. The goals of MONARC are:
  - To provide realistic modeling of distributed computing systems, customized for specific HEP applications.
  - To reliably model the behavior of computing facilities and networks, using specific application software and usage patterns.
  - To offer a dynamic and flexible simulation environment.
  - To provide a design framework to evaluate a range of possible computing systems, as measured by the ability to provide physicists with the requested data within the required time.
  - To narrow down a region of parameter space in which viable models can be chosen.
- The toolkit is Java based, to take advantage of Java's built-in support for multi-threading for concurrent processing. (A toy sketch of this style of modeling follows.)
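
MONARC itself is a Java, process-oriented simulation toolkit; purely as a toy illustration of the style of model involved (and not MONARC code), the sketch below simulates jobs that first read their data over a shared network link and then occupy one of a fixed pool of CPUs, reporting total completion time and CPU utilization. All numbers are invented.

    # Toy discrete-event sketch of a farm serving a stream of jobs.
    import heapq


    def simulate(n_jobs=100, n_cpus=20, cpu_time=3600.0, data_gb=1.0, link_mb_s=12.5):
        transfer = data_gb * 1024.0 / link_mb_s          # transfers serialized on the shared link
        link_free = 0.0                                   # time when the link next becomes free
        cpus = [0.0] * n_cpus                             # time when each CPU next becomes free
        heapq.heapify(cpus)
        busy = 0.0                                        # total CPU-seconds consumed
        finish = 0.0
        for _ in range(n_jobs):
            link_free += transfer                         # job's data arrives at this time
            cpu_free = heapq.heappop(cpus)
            start = max(link_free, cpu_free)              # wait for both the data and a CPU
            end = start + cpu_time
            heapq.heappush(cpus, end)
            busy += cpu_time
            finish = max(finish, end)
        utilization = busy / (n_cpus * finish)
        return finish, utilization


    total, util = simulate()
    print(f"all jobs done after {total / 3600:.1f} h, CPU utilization {util:.0%}")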
36 System Simulation Current Status
- MONARC is currently in its third phase and was recently updated to be able to handle larger-scale simulations.
- The simulation of the spring 2000 ORCA production served as a nice validation of the toolkit. Using inputs from the system monitoring tools, the simulation was able to accurately reproduce the behavior of the computing farm: CPU utilization, network traffic, and the total time to complete jobs.
- As an indication of the maturity of the simulation tools, a simulation of Distributed Process Management using Self-Organizing Neural Networks is being performed. Since full-scale production facilities will not be available for some time, it is nice to get a head start on algorithm development using the simulation.
37 System Simulation Current Status
[Figure slide: plots only; no text recovered.]

38 System Simulation Spring HLT Production
- Below are simulation examples of network traffic and CPU efficiency.
[Figure: side-by-side panels labeled "Measurement" and "Simulation".]
39 System Simulation Future Plans
- There are plans to update the estimated CMS computing needs in December.
- In early 2001 there are plans to update the MONARC package with modules for Distributed Process Management and Distributed Database Management.
40 System Simulation Future Plans
- The upgraded package should allow better simulation of distributed computing systems. Two studies are planned for spring 2001:
  - A study of the role of tapes in Tier 1-Tier 2 interactions, which should help describe the interactions and evaluate storage needs.
  - A complex study of Tier 0-Tier 1-Tier 2 interactions to evaluate a complete CMS data processing scenario, including all the major tasks distributed among the regional centers.
- During the remainder of 2001 the System Simulation project will aid in the development of load balancing schemes.
41 Conclusions
- The CMS Distributed Computing Model is complex, and advanced software is needed to make it work.
  - Tools are needed to submit, monitor, and control groups of jobs at remote and local sites.
  - Data needs to be moved over the computing grid to the processes that need it.
  - An intelligent system needs to exist to determine the most efficient split between moving data and exporting processes.
- CMS has TDRs due which require large numbers of simulated events for analysis, and tools are needed to facilitate production.
- We are trying to deliver both.