Lightweight Replication of Heavyweight Data

Transcript and Presenter's Notes

1
Lightweight Replication ofHeavyweight Data
  • Scott Koranda
  • University of Wisconsin-Milwaukee
  • National Center for Supercomputing Applications

2
Heavyweight Data from LIGO
  • Sites at Livingston, LA (LLO) and Hanford, WA
    (LHO)
  • 2 interferometers at LHO, 1 at LLO
  • 1000s of channels recorded at rates of 16 kHz,
    16 Hz, 1 Hz, ...
  • Output is binary frame files, each holding 16
    seconds of data with a GPS timestamp
  • ~100 MB per frame from LHO
  • ~50 MB per frame from LLO
  • ~1 TB/day in total (see the arithmetic sketch
    after this list)
  • S1 run: 2 weeks
  • S2 run: 8 weeks
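The arithmetic behind the ~1 TB/day figure, as a quick Python sketch. It assumes the 100 MB and 50 MB figures above are per 16-second frame file, which is an interpretation rather than something the slide states explicitly.

```python
# Back-of-envelope check of the raw data volume, assuming the quoted
# sizes are per 16-second frame file (an interpretation, not stated
# explicitly on the slide).
SECONDS_PER_DAY = 24 * 60 * 60
FRAME_LENGTH_S = 16                                  # each frame file holds 16 s of data
frames_per_day = SECONDS_PER_DAY / FRAME_LENGTH_S    # 5400 frames per day

lho_mb_per_frame = 100                               # Hanford (two interferometers)
llo_mb_per_frame = 50                                # Livingston (one interferometer)

total_gb = (lho_mb_per_frame + llo_mb_per_frame) * frames_per_day / 1024
print(f"raw output: ~{total_gb:.0f} GB/day")         # ~790 GB/day, i.e. roughly 1 TB/day
```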

4 km LIGO interferometer at Livingston, LA
3
Networking to IFOs Limited
  • LIGO IFOs remote, making bandwidth expensive
  • Couple of T1 lines for email/administration only
  • Ship tapes to Caltech (SAM-QFS)
  • Reduced data sets (RDS) generated and stored on
    disk
  • ~20% the size of the raw data
  • 200 GB/day

GridFedEx protocol
4
Replication to University Sites
Cardiff
MIT
AEI
UWM
PSU
CIT
UTB
5
Why Bulk Replication to University Sites?
  • Each has compute resources (Linux clusters)
  • Early plan was to provide one or two analysis
    centers
  • Now everyone has a cluster
  • Storage is cheap
  • ~$1/GB for drives
  • 1 TB of RAID-5 < $10K
  • Throw more drives into your cluster
  • Analysis applications read a lot of data
  • Different ways to slice some problems, but most
    want access to large sets of data for a
    particular instance of search parameters

6
LIGO Data Replication Challenge
  • Replicate 200 GB/day of data to multiple sites
    securely, efficiently, robustly (no babysitting)
  • Support a number of storage models at sites (see
    the configuration sketch after this list)
  • CIT → SAM-QFS (tape) and large IDE farms
  • UWM → 600 partitions on 300 cluster nodes
  • PSU → multiple 1 TB RAID-5 servers
  • AEI → 150 partitions on 150 nodes with redundancy
  • Coherent mechanism for data discovery by users
    and their codes
  • Know what data we have, where it is, and
    replicate it fast and easy
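To make the variety of storage models concrete, here is a purely illustrative Python sketch of a per-site storage description plus a partition-picking helper. The dictionary layout and the helper are hypothetical, not LDR's actual configuration format or code.

```python
import hashlib

# Illustrative only: a hypothetical description of the per-site storage
# models listed above. LDR's real configuration format is not shown here.
SITE_STORAGE = {
    "CIT": {"backend": "SAM-QFS tape + large IDE farms", "partitions": None},
    "UWM": {"backend": "cluster-node disks", "partitions": 600},
    "PSU": {"backend": "multiple 1 TB RAID-5 servers", "partitions": None},
    "AEI": {"backend": "cluster-node disks, redundant", "partitions": 150},
}

def pick_partition(site: str, lfn: str) -> str:
    """Hypothetical helper: spread logical file names evenly across a
    site's partitions; single-volume sites get one fixed store."""
    npart = SITE_STORAGE[site]["partitions"]
    if not npart:
        return "store-0"
    digest = int(hashlib.md5(lfn.encode()).hexdigest(), 16)
    return f"partition-{digest % npart:04d}"

print(pick_partition("UWM", "H-RDS-730000000-16.gwf"))  # hypothetical file name
```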

7
Prototyping Realizations
  • Need to keep pipe full to achieve desired
    transfer rates
  • Mindful of overhead of setting up connections
  • Set up a GridFTP connection with multiple parallel
    channels and tuned TCP windows and I/O buffers,
    then leave it open (see the transfer sketch after
    this list)
  • Sustained 10 MB/s between Caltech and UWM, peaks
    up to 21 MB/s
  • Need cataloging that scales and performs
  • Globus Replica Catalog (LDAP) scales to < 10^5
    entries and is not acceptable
  • Need a solution with a relational database backend
    that scales to 10^7 entries with fast updates/reads
  • No need for reliable file transfer (RFT)
  • Problem with any single transfer? Forget it, come
    back later
  • Need robust mechanism for selecting collections
    of files
  • Users/sites demand flexibility choosing what data
    to replicate
  • Need to get network people interested
  • Do your homework, then challenge them to make
    your data flow faster
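A minimal sketch of the kind of tuned transfer described above, driving globus-url-copy from Python with parallel streams and enlarged buffers. The endpoints, stream count, and buffer sizes are placeholders, not the values actually used between Caltech and UWM; keeping a connection open across many files needs a GridFTP client library (the slides note LIGO uses its own client) rather than one command per file.

```python
import subprocess

# Sketch of a tuned GridFTP transfer. The endpoints below are made-up
# placeholders, and the stream count and buffer sizes are illustrative,
# not the values the LIGO sites actually used.
SRC = "gsiftp://data.example-cit.edu/rds/H-RDS-730000000-16.gwf"
DST = "gsiftp://storage.example-uwm.edu/rds/H-RDS-730000000-16.gwf"

cmd = [
    "globus-url-copy",
    "-p", "8",             # parallel TCP streams to keep the pipe full
    "-tcp-bs", "2097152",  # 2 MB TCP buffer (window) per stream
    "-bs", "1048576",      # 1 MB I/O block size
    SRC,
    DST,
]
subprocess.run(cmd, check=True)
```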

8
LIGO, err Lightweight Data Replicator (LDR)
  • What data we have
  • Globus Metadata Catalog Service (MCS)
  • Where data is
  • Globus Replica Location Service (RLS)
  • Replicate it fast
  • Globus GridFTP protocol
  • What client to use? Right now we use our own
  • Replicate it easy
  • Logic we added (outlined in the sketch after this
    list)
  • Is there a better solution?
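A hedged outline of the "replicate it easy" glue implied by the components above: ask the metadata catalog which logical files match a site's interests, ask RLS which already have local replicas, fetch the rest over GridFTP, and register the new copies. The helper functions are stubbed stand-ins so the sketch runs, not the real MCS, RLS, or pyGlobus APIs.

```python
# The helpers below are stand-ins for the real MCS, RLS, and GridFTP
# client calls (not actual Globus API names), stubbed so the sketch runs.

def query_mcs(interest):                 # stand-in for a Metadata Catalog Service query
    return ["H-RDS-730000000-16.gwf"]

def rls_lookup(lfn, site):               # stand-in for a Replica Location Service lookup
    return [], ["gsiftp://data.example-cit.edu/rds/" + lfn]

def choose_local_storage(lfn):           # pick a partition/RAID volume at this site
    return "file:///data/partition-0042/" + lfn

def gridftp_copy(src, dst):              # stand-in for the GridFTP transfer client
    print(f"copy {src} -> {dst}")
    return True

def rls_register(lfn, pfn, site):        # advertise the new local replica in RLS
    print(f"register {lfn} at {site}: {pfn}")

def replicate_once(interest, site):
    """One pass of the subscribe/replicate cycle."""
    for lfn in query_mcs(interest):
        local, remote = rls_lookup(lfn, site)
        if local or not remote:
            continue                     # already here, or nowhere to fetch from yet
        dst = choose_local_storage(lfn)
        if gridftp_copy(remote[0], dst):
            rls_register(lfn, dst, site)
        # failed transfer? no retry bookkeeping; the next pass tries again

replicate_once("instrument = H AND type = RDS", "UWM")
```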

9
Lightweight Data Replicator
  • Replicated 20 TB to UWM thus far
  • Just deployed at MIT, PSU, AEI
  • Deployment in progress at Cardiff
  • LDRdataFindServer running at UWM

10
Lightweight Data Replicator
  • Lightweight because we think it is the minimal
    collection of code needed to get the job done
  • Logic coded in Python
  • Use SWIG to wrap Globus RLS
  • Use pyGlobus from LBL elsewhere
  • Each site is any combination of publisher,
    provider, subscriber
  • Publisher populates metadata catalog
  • Provider populates location catalog (RLS)
  • Subscriber replicates data using information
    provided by publishers and providers
  • Take the Condor approach with small, independent
    daemons that each do one thing (sketched after
    this list)
  • LDRMaster, LDRMetadata, LDRSchedule, LDRTransfer, ...
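A minimal sketch of the small-daemon pattern, assuming a simple shared work queue between a scheduling loop and a transfer loop. The daemon names come from the slide, but the loop bodies, intervals, and in-memory queue are illustrative assumptions, not LDR's implementation.

```python
import queue
import threading
import time

# Illustrative only: two of the small, independent daemons as simple
# loops around a shared work queue. The intervals and the in-memory
# queue are assumptions for the sketch, not LDR's actual mechanism.
work: "queue.Queue[str]" = queue.Queue()

def ldr_schedule(stop: threading.Event):
    """Stand-in for LDRSchedule: decide what the site still needs and queue it."""
    while not stop.is_set():
        work.put("H-RDS-730000000-16.gwf")   # would come from comparing MCS and RLS
        stop.wait(60)                        # wake up once a minute

def ldr_transfer(stop: threading.Event):
    """Stand-in for LDRTransfer: drain the queue, one GridFTP transfer at a time."""
    while not stop.is_set():
        try:
            lfn = work.get(timeout=5)
        except queue.Empty:
            continue
        print(f"transferring {lfn}")         # the real daemon calls the GridFTP client

stop = threading.Event()
for daemon in (ldr_schedule, ldr_transfer):
    threading.Thread(target=daemon, args=(stop,), daemon=True).start()
time.sleep(2)
stop.set()
```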

11
Future?
  • LDR is a tool that works now for LIGO
  • Still, we recognize a number of projects need
    bulk data replication
  • There has to be common ground
  • What middleware can be developed and shared?
  • We are looking for opportunities
  • Code for, or solve, our problems for us
  • Want to investigate Stork, DiskRouter, ...
  • Do contact me if you do bulk data replication