Data queuing in SAMGrid - PowerPoint PPT Presentation

1
Data queuing in SAMGrid
  • A. Baranovski

2
  • D0 is a relatively small experiment with substantial computing requirements,
  • in particular w.r.t. data movement and data access.
  • D0 is trying to fit into resources scarcely available at participating institutions.

3
  • A critical mission of the D0-aware middleware deployment (SAMGrid) is to enable access to data
  • and to do it efficiently.
  • Efficiency in organizing access to data is the topic of these slides.

4
  • SAMGrid supports several types of computing workflow (branches):
  • Reprocessing
  • Monte Carlo generation
  • Merging

5
  • Each workflow branch consists of several steps that may either download or upload a set of files.
  • Each step refers to different hardware resources and different types of data:
  • Download the runtime environment (from disk)
  • Download Monte Carlo min-bias/phase datasets (from disk)
  • Download raw data (from tape)
  • Upload reprocessed/merged/Monte Carlo data (to disk/tape)

6
  • This variety suggests several use-case scenarios:
  • optimization of the use of storage resources
  • prioritization of operational tasks

7
Efficiency Use case 1
  • Different steps within the same branch have different priority.
  • The further you move down the chain of steps, the greater the impact a failure or delay is going to have.
  • Each step consumes resources, and we don't want to waste them.
  • Low efficiency = waste of resources.

8
Application Use case 2
  • Branches themselves have different operational priority:
  • Merging takes precedence over data generation
  • Data reconstruction takes precedence over Monte Carlo

9
Efficiency Use case 3
  • Non-homogeneous networks:
  • TCP connection properties vary depending on the destination/source of the data.
  • Data requests that group around a similar source/destination should also be grouped with respect to their data access priority.

10
  • A no-brainer solution is to mix all data access requests into one queue.
  • A queue that would limit resource use (prevent overloading) by dispatching requests on a first-come, first-served basis.
  • A resource can only serve so many requests at a time; hence, a queue.
  • It offers no prioritization, no distinction among tasks or network endpoints.
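The single-queue approach above can be sketched as one shared semaphore in front of a resource: a fixed number of slots, first come first served, with no notion of task or endpoint priority. This is a minimal illustration only; `MAX_CONCURRENT` and `transfer` are invented names, not part of SAMGrid.

```python
import threading

# One global queue: a single semaphore caps how many transfer requests
# the resource serves at once. Waiters are admitted with no regard to
# which branch, step, or endpoint they belong to.
MAX_CONCURRENT = 4  # the resource "can only serve so many requests"
slots = threading.BoundedSemaphore(MAX_CONCURRENT)

def transfer(request):
    slots.acquire()  # first come, first served; no prioritization
    try:
        # the actual data movement would happen here
        return f"transferred {request}"
    finally:
        slots.release()
```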

11
The bad
  • Time to access data is averaged over all processes in the system.
  • No way to push certain tasks ahead of others.
  • Interdependence between steps:
  • High-priority tasks that require little data will have to wait for lower-priority tasks that require more data.

12
Worse
  • Increased granularity in the number of processes that start and end at the same time; reduction of the pipeline effect.
  • The pipeline (the conveyor, per H. Ford):
  • Prepare prerequisite resources (data) before acquiring the critical section (CPU).
  • The more jobs start at the same time, the more CPU resources are unnecessarily wasted because of the inability to properly prepare (stage to the worker) the data.
  • Ideally, one job starts to replace another that exits.
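The pipeline effect described above can be sketched as a stager thread that prepares the next file while the CPU works on the current one, so compute is not idle waiting on data. `stage` and `process` are hypothetical stand-ins for the real staging and reconstruction steps.

```python
import queue
import threading

def stage(name):
    # stand-in for copying a file from tape/disk to the worker
    return f"staged:{name}"

def process(data):
    # stand-in for the CPU-bound step (the "critical section")
    return data.replace("staged", "processed")

def run_pipeline(files):
    staged = queue.Queue(maxsize=1)  # keep one file staged ahead

    def stager():
        for f in files:
            staged.put(stage(f))     # prepare data ahead of the CPU
        staged.put(None)             # sentinel: no more work

    threading.Thread(target=stager, daemon=True).start()
    results = []
    while (item := staged.get()) is not None:
        results.append(process(item))  # CPU busy while next file stages
    return results
```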

13
The worst
  • Effectively, all data resources (grid and non-grid enabled) work in the above modes,
  • i.e. assuming one size fits all.

14
The fix
  • A more flexible way of organizing the data flow:
  • Define independent data access queues per branch, sub-task, and TCP endpoint to fully control the priority and data resource consumption at any time.
  • We came up with this idea battling inefficiencies in reconstruction and merging at Lyon (two years back).
  • It explores similarities between data and job scheduling policies (which have been around for some time).
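The fix above can be sketched as a map from (branch, step, endpoint) to an independent bounded queue, each with its own share of the resource. The queue sizes and key names below are invented for illustration; they are not the real SAMGrid configuration.

```python
import threading

# Per-combination share of the resource: merging gets a larger share
# than Monte Carlo uploads, reflecting its higher operational priority.
# (Sizes here are illustrative, not taken from a real deployment.)
QUEUE_SIZE = {
    ("merging", "upload", "enstore"): 2,
    ("montecarlo", "upload", "enstore"): 1,
}

class QueueMap:
    """One independent bounded queue per (branch, step, endpoint)."""

    def __init__(self):
        self._sems = {}
        self._lock = threading.Lock()

    def slot(self, branch, step, endpoint):
        key = (branch, step, endpoint)
        with self._lock:
            if key not in self._sems:
                size = QUEUE_SIZE.get(key, 1)  # default: one slot
                self._sems[key] = threading.BoundedSemaphore(size)
        # callers acquire()/release() this around the actual transfer
        return self._sems[key]
```

Because each combination has its own semaphore, a flood of low-priority Monte Carlo requests can no longer starve merging transfers: they contend only within their own queue.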

15
Example
  • D0ReprocessingStorageDeploymentMap.ppt

16
Our implementation
  • Processes (jobs) use fcp to reserve a place in the desired data queue.
  • fcp is part of the larger package samcp.
  • fcp is the scheduler of the actual data transfer requests.
  • Data transfers are done via a protocol of choice (GridFTP in our case).
  • The client waits for its turn, as signaled by the fcpd daemon running on the machine whose data the client wants to access.

17
  • Each fcpd instance effectively defines an independent queue that has a name and a size.
  • Size = the share of the resource this queue is allowed to get hold of.
  • We set the configuration and environment of the client such that a particular fcpd control process is picked.
  • The name of the queue is the config qualifier.
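A rough sketch of the client-side selection described above: the queue name acts as a configuration qualifier, and the client's environment decides which fcpd instance it talks to. `FCP_QUEUE`, `FCPD_MAP`, and the host names are invented for illustration; they are not the real fcp configuration variables.

```python
import os

# Hypothetical mapping from queue name (config qualifier) to the
# fcpd control process (host, port) that owns that queue.
FCPD_MAP = {
    "merge-upload": ("fcpd-merge.example.org", 7791),
    "mc-upload": ("fcpd-mc.example.org", 7792),
}

def pick_fcpd(default="mc-upload"):
    # the client's environment selects which fcpd/queue is used
    qualifier = os.environ.get("FCP_QUEUE", default)
    host, port = FCPD_MAP[qualifier]
    return qualifier, host, port
```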

18
Summary
  • Each workflow branch/step/TCP endpoint combination can now resolve into a unique fcpd queue name.
  • Large number of queues!
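The resolution in the summary can be sketched as a one-line function; the dotted naming scheme is illustrative, not the deck's actual convention. With several branches, steps, and endpoints, the product of the three quickly yields the "large number of queues" noted above.

```python
def queue_name(branch, step, endpoint):
    # each (branch, step, endpoint) combination -> a unique queue name
    return f"{branch}.{step}.{endpoint}"
```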

19
Main issues
  • Consistency of the configuration that parameterizes the above mapping is a very important aspect that needs attention.
  • It can be hard to maintain or to respond to changes in the deployment.
  • fcp runs on data nodes that need to be in sync with the central (forwarding) node configuration.
  • Static configuration of queues may leave resources underutilized.

20
Issues summary
  • We've managed to stabilize data configurations for OSG and LCG (it took some time).
  • The usual tradeoff when using an old technology (fcp) in a new way.
  • There is an opportunity to develop an abstract solution by further generalizing our experience:
  • Look into Condor, FTS, etc.

21
  • Detailed reading
  • http://www-d0/computing/grid/doc/Application-ResourceTuning-01Aug05-cut.pdf