Lightweight Data Replicator - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
Lightweight Data Replicator
  • Scott Koranda
  • University of Wisconsin-Milwaukee
  • National Center for Supercomputing Applications
  • Brian Moe
  • University of Wisconsin-Milwaukee

2
LIGO data replication needs
  • Sites at Livingston, LA (LLO) and Hanford, WA
    (LHO)
  • 2 interferometers at LHO, 1 at LLO
  • 1000s of channels recorded at rates of 16 kHz,
    16 Hz, 1 Hz, ...
  • Output is binary frame files, each holding 16
    seconds of data with a GPS timestamp
  • ~100 MB per frame file from LHO
  • ~50 MB per frame file from LLO
  • ~1 TB/day in total (back-of-envelope check after
    this list)
  • S1 run 2 weeks
  • S2 run 8 weeks
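
A back-of-envelope check of the quoted rate, assuming the ~100 MB and ~50 MB
figures above are per 16-second frame file (a rough sketch, not project
bookkeeping):

    # Rough check of the ~1 TB/day figure from the bullets above.
    SECONDS_PER_DAY = 86400
    FRAME_LENGTH_S = 16
    frames_per_day = SECONDS_PER_DAY // FRAME_LENGTH_S  # 5400 frames per day

    lho_mb_per_frame = 100  # both LHO interferometers combined (assumption)
    llo_mb_per_frame = 50

    total_gb = (lho_mb_per_frame + llo_mb_per_frame) * frames_per_day / 1000
    print("raw data rate: %.0f GB/day" % total_gb)  # ~810 GB/day, roughly 1 TB/day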

3
Networking to IFOs Limited
  • LIGO IFOs remote, making bandwidth expensive
  • Couple of T1 lines for email/administration only
  • Ship tapes to Caltech (SAM-QFS)
  • Reduced data sets (RDS) generated and stored on
    disk
  • ~20% the size of the raw data
  • 200 GB/day

GridFedEx protocol
Newsflash! Bandwidth to LHO increases dramatically
for S3!
4
Replication to University Sites
(Map of replication sites: Cardiff, MIT, LHO, AEI,
UWM, PSU, CIT, UTB)
5
Why Bulk Replication to University Sites?
  • Each has compute resources (Linux clusters)
  • Early plan was to provide one or two analysis
    centers
  • Now everyone has a cluster
  • Cheap storage is cheap
  • ~$1/GB for drives
  • 1 TB RAID-5 < $10K
  • Throw more drives into your cluster
  • Analysis applications read a lot of data
  • Different ways to slice some problems, but most
    want access to large sets of data for a
    particular instance of search parameters

6
LIGO Data Replication Challenge
  • Replicate 200 GB/day of data to multiple sites
    securely, efficiently, robustly (no babysitting)
  • Support a number of storage models at sites
  • CIT → SAM-QFS (tape) and large IDE farms
  • UWM → 600 partitions on 300 cluster nodes
  • PSU → multiple 1 TB RAID-5 servers
  • AEI → 150 partitions on 150 nodes with redundancy
  • Coherent mechanism for data discovery by users
    and their codes
  • Know what data we have, where it is, and
    replicate it fast and easy

7
Prototyping Realizations
  • Need to keep pipe full to achieve desired
    transfer rates
  • Mindful of overhead of setting up connections
  • Set up a GridFTP connection with multiple
    channels and tuned TCP windows and I/O buffers,
    then leave it open (see the sketch after this
    list)
  • Sustained 10 MB/s between Caltech and UWM, peaks
    up to 21 MB/s
  • Need cataloging that scales and performs
  • Globus Replica Catalog (LDAP): < 10^5 and not
    acceptable
  • Need solution with relational database backend
    that scales to 10^7 with fast updates/reads
  • Not necessarily need reliable file transfer
    (RFT)
  • Problem with any single transfer? Forget it, come
    back later
  • Need robust mechanism for selecting collections
    of files
  • Users/sites demand flexibility choosing what data
    to replicate
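
As an illustration of the tuned-transfer point above: the stock
globus-url-copy client accepts options for parallel channels and TCP/block
buffer sizes. The URLs and numbers below are placeholders, and LDR drives
GridFTP through its own client rather than this command line; treat it as a
sketch only.

    import subprocess

    # Hypothetical source/destination; the file name follows the LIGO frame
    # naming style but is made up for this example.
    source = "gsiftp://ldas.ligo.caltech.edu/frames/S2/H-R-731488256-16.gwf"
    dest = "file:///data/frames/H-R-731488256-16.gwf"

    subprocess.run([
        "globus-url-copy",
        "-p", "8",             # parallel TCP streams to keep the pipe full
        "-tcp-bs", "2097152",  # 2 MB TCP buffer/window
        "-bs", "1048576",      # 1 MB I/O block size
        "-vb",                 # report transfer performance
        source, dest,
    ], check=True)

Paying connection setup per file like this is exactly the overhead noted
above; the prototype kept one tuned session open across many files to sustain
the 10-21 MB/s quoted.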

8
LIGO, err Lightweight Data Replicator (LDR)
  • What data we have
  • Globus Metadata Catalog Service (MCS)
  • Where data is
  • Globus Replica Location Service (RLS)
  • Replicate it fast
  • Globus GridFTP protocol (the MCS → RLS → GridFTP
    chain is sketched after this list)
  • What client to use? Right now we use our own
  • Replicate it easy
  • Logic we added
  • Is there a better solution?
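
A minimal sketch of how those three pieces fit together. The helper functions
below return canned answers and are stand-ins for the real MCS, RLS, and
GridFTP client calls, not actual Globus APIs; the file name is made up.

    def query_metadata(instrument, gps_start, gps_end):
        """'What data we have': attributes -> logical file names (LFNs)."""
        return ["H-R-731488256-16.gwf"]  # placeholder answer

    def lookup_replicas(lfn):
        """'Where data is': LFN -> physical gsiftp:// URLs (PFNs)."""
        return ["gsiftp://ldas.ligo.caltech.edu/frames/" + lfn]  # placeholder

    def replicate(pfn, local_path):
        """'Replicate it fast': a tuned GridFTP transfer would go here."""
        print("GridFTP %s -> %s" % (pfn, local_path))

    for lfn in query_metadata("H", 731488256, 731488272):
        for pfn in lookup_replicas(lfn):
            replicate(pfn, "/data/frames/" + lfn)
            break  # one good copy is enough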

9
Lightweight Data Replicator
  • Replicated > 20 TB to UWM thus far
  • Less to MIT, PSU
  • Just deployed version 0.5.5 to MIT, PSU, AEI,
    CIT, UWM, LHO, LLO for LIGO/GEO S3 run
  • Deployment in progress at Cardiff
  • LDRdataFindServer running at UWM for S2, soon at
    all sites for S3

10
Lightweight Data Replicator
  • Lightweight because we think it is the minimal
    collection of code needed to get the job done
  • Logic coded in Python
  • Use SWIG to wrap Globus RLS
  • Use pyGlobus from LBL elsewhere
  • Each site is any combination of publisher,
    provider, subscriber
  • Publisher populates metadata catalog
  • Provider populates location catalog (RLS)
  • Subscriber replicates data using information
    provided by publishers and providers
  • Small, independent daemons that each do one thing
    (a minimal sketch of the subscriber side follows
    this list)
  • LDRMaster, LDRMetadata, LDRSchedule,
    LDRTransfer, ...
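
A minimal sketch of the subscriber side of that decomposition. The daemon
names come from the list above; wanted_lfns(), local_lfns(), and transfer()
are placeholders for the real metadata, RLS, and GridFTP calls, and the loop
bodies are illustrative, not LDR's actual logic.

    import queue
    import threading
    import time

    def wanted_lfns():
        return {"H-R-731488256-16.gwf"}  # placeholder: from the metadata catalog

    def local_lfns():
        return set()                     # placeholder: from the local RLS catalog

    def transfer(lfn):
        print("would GridFTP", lfn)      # placeholder for the GridFTP client

    work = queue.Queue()

    def schedule_loop():
        """LDRSchedule-like loop: queue every wanted file not yet held locally."""
        while True:
            for lfn in wanted_lfns() - local_lfns():
                work.put(lfn)
            time.sleep(60)

    def transfer_loop():
        """LDRTransfer-like loop: fetch queued files; on failure just drop the
        file and let a later scheduling pass retry ("forget it, come back later")."""
        while True:
            lfn = work.get()
            try:
                transfer(lfn)
            except Exception:
                pass

    threading.Thread(target=schedule_loop, daemon=True).start()
    threading.Thread(target=transfer_loop, daemon=True).start()
    time.sleep(1)  # let the sketch run briefly before the script exits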

11
Future?
  • Held LDR face-to-face at UWM last summer
  • CIT, MIT, PSU, UWM, AEI, Cardiff all represented
  • LDR Needs
  • Better/easier installation, configuration
  • Dashboard for admins for insights into LDR
    state
  • More robustness, especially with RLS server hangs
  • Fixed with version 2.0.9
  • API and templates for publishing

12
Future?
  • LDR is a tool that works now for LIGO
  • Still, we recognize a number of projects need
    bulk data replication
  • There has to be common ground
  • What middleware can be developed and shared?
  • We are looking for opportunities
  • Code to solve our problems for us
  • Still want to investigate Stork, DiskRouter, ...
  • Do contact me if you do bulk data replication