Data queuing in SAMGrid - PowerPoint PPT Presentation

1
Data queuing in SAMGrid
  • A. Baranovski

2
  • D0 is a relatively small experiment with substantial computing requirements,
  • in particular w.r.t. data movement and data access.
  • D0 is trying to fit into resources scarcely available at participating institutions.

3
  • A critical mission of the D0-aware middleware deployment (SAMGrid) is to enable access to data
  • and to do it efficiently.
  • Efficiency in organizing access to data is the topic of these slides.

4
  • SAMGrid supports several types of computing workflow (branches):
  • Reprocessing
  • Monte Carlo generation
  • Merging

5
  • Each workflow branch consists of several steps that may either download or upload a set of files.
  • Each step refers to different hardware resources and different types of data:
  • Download the runtime environment (from disk)
  • Download Monte Carlo min-bias/phase datasets (from disk)
  • Download raw data (from tape)
  • Upload reprocessed/merged/Monte Carlo data (to disk/tape)

6
  • This variety suggests several use-case scenarios:
  • optimization of the use of storage resources
  • prioritization of operational tasks

7
Efficiency Use case 1
  • Different steps within the same branch have different priority.
  • The further you move down the chain of steps, the greater the impact a failure or delay is going to have.
  • Each step consumes resources, and we don't want to waste them.
  • Low efficiency = waste of resources.

8
Application Use case 2
  • Branches themselves have different operational priority:
  • Merging takes precedence over data generation
  • Data reconstruction takes precedence over Monte Carlo

9
Efficiency Use case 3
  • Non-homogeneous networks:
  • TCP connection properties vary depending on the destination/source of the data.
  • Data requests that group around a similar source/destination should also be grouped with respect to their data access priority.

10
  • A no-brainer solution is to mix all data access requests into one queue.
  • A queue that would limit resource use (prevent overloading) by dispatching requests on a first-come, first-served basis.
  • A resource can only serve so many requests at a time; hence, a queue.
  • It offers no prioritization, no distinction among tasks or network endpoints.
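The single-queue approach above can be sketched as one shared semaphore in front of a resource: a fixed number of slots, first come first served, with no notion of task or endpoint priority. This is a minimal illustration only; `MAX_CONCURRENT` and `transfer` are invented names, not part of SAMGrid.

```python
import threading

# One global queue: a single semaphore caps how many transfer requests
# the resource serves at once. Waiters are admitted with no regard to
# which branch, step, or endpoint they belong to.
MAX_CONCURRENT = 4  # the resource "can only serve so many requests"
slots = threading.BoundedSemaphore(MAX_CONCURRENT)

def transfer(request):
    slots.acquire()  # first come, first served; no prioritization
    try:
        # the actual data movement would happen here
        return f"transferred {request}"
    finally:
        slots.release()
```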

11
The bad
  • Time to access data is averaged over all processes in the system.
  • No way to push certain tasks ahead of others.
  • Interdependence between steps:
  • High-priority tasks that require little data will have to wait for lower-priority tasks that require more data.

12
Worse
  • Increased granularity in the number of processes that start and end at the same time; reduction of the pipeline effect.
  • The pipeline (the conveyor, per H. Ford):
  • Prepare prerequisite resources (data) before acquiring the critical section (CPU).
  • The more jobs start at the same time, the more CPU resources are unnecessarily wasted because of the inability to properly prepare (stage to the worker) the data.
  • Ideally, one job starts to replace another that exits.
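The pipeline effect described above can be sketched as a stager thread that prepares the next file while the CPU works on the current one, so compute is not idle waiting on data. `stage` and `process` are hypothetical stand-ins for the real staging and reconstruction steps.

```python
import queue
import threading

def stage(name):
    # stand-in for copying a file from tape/disk to the worker
    return f"staged:{name}"

def process(data):
    # stand-in for the CPU-bound step (the "critical section")
    return data.replace("staged", "processed")

def run_pipeline(files):
    staged = queue.Queue(maxsize=1)  # keep one file staged ahead

    def stager():
        for f in files:
            staged.put(stage(f))     # prepare data ahead of the CPU
        staged.put(None)             # sentinel: no more work

    threading.Thread(target=stager, daemon=True).start()
    results = []
    while (item := staged.get()) is not None:
        results.append(process(item))  # CPU busy while next file stages
    return results
```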

13
The worst
  • Effectively, all data resources (grid and non-grid enabled) work in the above modes,
  • i.e. assuming one size fits all.

14
The fix
  • A more flexible way of organizing the data flow:
  • Define independent data access queues per branch, sub-task, and TCP endpoint to fully control the priority and data resource consumption at any time.
  • We came up with this idea battling inefficiencies in reconstruction and merging at Lyon (two years back).
  • It explores similarities between data and job scheduling policies (which have been around for some time).
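The fix above can be sketched as a map from (branch, step, endpoint) to an independent bounded queue, each with its own share of the resource. The queue sizes and key names below are invented for illustration; they are not the real SAMGrid configuration.

```python
import threading

# Per-combination share of the resource: merging gets a larger share
# than Monte Carlo uploads, reflecting its higher operational priority.
# (Sizes here are illustrative, not taken from a real deployment.)
QUEUE_SIZE = {
    ("merging", "upload", "enstore"): 2,
    ("montecarlo", "upload", "enstore"): 1,
}

class QueueMap:
    """One independent bounded queue per (branch, step, endpoint)."""

    def __init__(self):
        self._sems = {}
        self._lock = threading.Lock()

    def slot(self, branch, step, endpoint):
        key = (branch, step, endpoint)
        with self._lock:
            if key not in self._sems:
                size = QUEUE_SIZE.get(key, 1)  # default: one slot
                self._sems[key] = threading.BoundedSemaphore(size)
        # callers acquire()/release() this around the actual transfer
        return self._sems[key]
```

Because each combination has its own semaphore, a flood of low-priority Monte Carlo requests can no longer starve merging transfers: they contend only within their own queue.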

15
Example
  • D0ReprocessingStorageDeploymentMap.ppt

16
Our implementation
  • Processes (jobs) use fcp to reserve a place in the desired data queue.
  • fcp is part of the larger package samcp.
  • fcp is the scheduler of the actual data transfer requests.
  • Data transfers are done via a protocol of choice (GridFTP in our case).
  • The client waits for its turn, as signaled by the fcpd daemon running on the machine whose data the client wants to access.

17
  • Each fcpd instance effectively defines an independent queue that has a name and a size.
  • Size = the share of the resource this queue is allowed to get hold of.
  • We set the configuration and environment of the client such that a particular fcpd control process is picked.
  • The name of the queue is the config qualifier.
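A rough sketch of the client-side selection described above: the queue name acts as a configuration qualifier, and the client's environment decides which fcpd instance it talks to. `FCP_QUEUE`, `FCPD_MAP`, and the host names are invented for illustration; they are not the real fcp configuration variables.

```python
import os

# Hypothetical mapping from queue name (config qualifier) to the
# fcpd control process (host, port) that owns that queue.
FCPD_MAP = {
    "merge-upload": ("fcpd-merge.example.org", 7791),
    "mc-upload": ("fcpd-mc.example.org", 7792),
}

def pick_fcpd(default="mc-upload"):
    # the client's environment selects which fcpd/queue is used
    qualifier = os.environ.get("FCP_QUEUE", default)
    host, port = FCPD_MAP[qualifier]
    return qualifier, host, port
```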

18
Summary
  • Each workflow branch/step/TCP endpoint combination can now resolve into a unique fcpd queue name.
  • Large number of queues!
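The resolution in the summary can be sketched as a one-line function; the dotted naming scheme is illustrative, not the deck's actual convention. With several branches, steps, and endpoints, the product of the three quickly yields the "large number of queues" noted above.

```python
def queue_name(branch, step, endpoint):
    # each (branch, step, endpoint) combination -> a unique queue name
    return f"{branch}.{step}.{endpoint}"
```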

19
Main issues
  • Consistency of the configuration that parameterizes the above mapping is a very important aspect that needs attention.
  • It can be hard to maintain or to respond to changes in the deployment.
  • fcp runs on data nodes that need to be in sync with the central (forwarding) node configuration.
  • Static configuration of queues may leave resources underutilized.

20
Issues summary
  • We've managed to stabilize data configurations for OSG and LCG (it took some time).
  • The usual tradeoff when using an old technology (fcp) in a new way.
  • There is an opportunity to develop an abstract solution by further generalizing our experience:
  • Look into Condor, FTS, etc.

21
  • Detailed reading
  • http://www-d0/computing/grid/doc/Application-ResourceTuning-01Aug05-cut.pdf