Shelter from the Storm - PowerPoint PPT Presentation

Provided by: jennif231
1
Shelter from the Storm
  • Building a Safe Archive in a Hostile World

2
SCOOP Goal
  • SURA-funded Coastal Modeling Project
  • Want to develop the community's cutting-edge
    techniques to make them ready for use in
    tomorrow's production systems.
  • For example, automatic verification of
    storm/surge models against observed data, to help
    improve the models

3
CCT Goals
  • One of CCT's key research outputs is software
  • Want this software to be robust and of good
    quality
  • Want re-use of software across projects
  • Also want software to be picked up by external
    users, as well as collaborators

4
The SCOOP Archive
  • Need to archive lots of files
  • Atmospheric models (MM5, GFDL)
  • Hydrodynamic models (ADCIRC, SWAN, etc.)
  • Observational data (sensor data, buoys)
  • Requirements poorly defined
  • How much data? Don't know
  • How long should we keep it for? Don't know
  • Have to interface with bespoke data transport
    mechanisms (LDM)
  • How to achieve our goals under these conditions?!

5
Basic Archive Operation
  • Upload
  • Client signals they want to do an upload of some
    files (names are given)
  • Archive tells the client where to upload them to
    (transaction handles)
  • Client uploads files (indep. of archive)
  • Client tells archive it's done
  • Archive creates the logical filenames
  • Use upload tool for this
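
The upload steps above can be sketched as a small in-memory model. This is only an illustration under stated assumptions: the class UploadArchive, the "txn-" handle format, the "/stage/" path, and the "lfn://" naming scheme are all hypothetical, and the method signatures are guesses based on the operation names fileUploadBegin and fileUploadEnd.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory model of the upload handshake. The real service
// signatures are not shown in the slides.
struct UploadTransaction {
    std::string stagePath;               // where the client should put files
    std::vector<std::string> fileNames;  // names announced by the client
};

class UploadArchive {
public:
    // Client announces the file names; archive returns a transaction handle.
    std::string fileUploadBegin(const std::vector<std::string>& fileNames) {
        assert(!fileNames.empty());
        std::string handle = "txn-" + std::to_string(nextId_++);
        open_[handle] = UploadTransaction{"/stage/" + handle, fileNames};
        return handle;
    }

    // Upload location for a handle, or "" if the handle is unknown.
    std::string stagePathFor(const std::string& handle) const {
        auto it = open_.find(handle);
        return it == open_.end() ? std::string() : it->second.stagePath;
    }

    // Client says it is done; archive creates the logical filenames.
    // Returns false if the handle is unknown (for example, already closed).
    bool fileUploadEnd(const std::string& handle) {
        auto it = open_.find(handle);
        if (it == open_.end()) return false;
        for (const std::string& name : it->second.fileNames)
            logicalNames_.push_back("lfn://" + name);  // assumed naming scheme
        open_.erase(it);
        return true;
    }

    const std::vector<std::string>& logicalNames() const { return logicalNames_; }

private:
    unsigned long nextId_ = 1;
    std::map<std::string, UploadTransaction> open_;
    std::vector<std::string> logicalNames_;
};
```

The file transfer itself happens between the two calls and does not go through the archive, which is why the model only records names and a staging location.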

6
Basic Archive Operation
  • Download
  • Clients use the catalog service to
    discover/search for logical filenames
  • Clients talk to the RLS server to get physical
    URLs
  • Interact with physical URLs directly
  • Can use getdata CLI tool to encapsulate this
  • Also, there are portal pages...
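
The download path above (catalog search, then RLS lookup, then direct access) can be sketched with two lookup tables. The types Catalog and ReplicaIndex and the functions below are hypothetical stand-ins for the catalog service and the RLS server, not their real client APIs.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins: logical name -> metadata text, and
// logical name -> physical URL (several replicas allowed).
using Catalog = std::map<std::string, std::string>;
using ReplicaIndex = std::multimap<std::string, std::string>;

// Step 1: find logical filenames whose metadata mentions the keyword.
std::vector<std::string> searchCatalog(const Catalog& catalog,
                                       const std::string& keyword) {
    assert(!keyword.empty());
    std::vector<std::string> hits;
    for (const auto& entry : catalog)
        if (entry.second.find(keyword) != std::string::npos)
            hits.push_back(entry.first);
    return hits;
}

// Step 2: map one logical filename to all of its physical URLs.
std::vector<std::string> physicalUrls(const ReplicaIndex& rls,
                                      const std::string& logicalName) {
    std::vector<std::string> urls;
    auto range = rls.equal_range(logicalName);
    for (auto it = range.first; it != range.second; ++it)
        urls.push_back(it->second);
    return urls;
}

// Steps 1 and 2 combined; the caller then fetches the URLs directly.
std::vector<std::string> resolveDownloads(const Catalog& catalog,
                                          const ReplicaIndex& rls,
                                          const std::string& keyword) {
    std::vector<std::string> all;
    for (const std::string& name : searchCatalog(catalog, keyword)) {
        std::vector<std::string> urls = physicalUrls(rls, name);
        all.insert(all.end(), urls.begin(), urls.end());
    }
    return all;
}
```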

7
Operations on Service
  • fileUploadBegin - for starting an upload
  • fileUploadEnd - for saying that an upload is
    completed
  • logicalNameRetry
  • removeDeadTransactions
  • closeArchive
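
The slides do not define removeDeadTransactions, so the following is a guess at one plausible reading: a dead transaction is an upload that was begun but not ended within some time limit. The class TransactionTable, its methods, and the maxAge parameter are assumptions made for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <map>
#include <string>

using Clock = std::chrono::steady_clock;

// Hypothetical table of open upload transactions, keyed by handle.
class TransactionTable {
public:
    // Record when an upload transaction was started.
    void begin(const std::string& handle, Clock::time_point startedAt) {
        started_[handle] = startedAt;
    }

    // Drop every transaction older than maxAge; returns how many were dropped.
    std::size_t removeDeadTransactions(Clock::time_point now,
                                       std::chrono::seconds maxAge) {
        assert(maxAge.count() >= 0);
        std::size_t removed = 0;
        for (auto it = started_.begin(); it != started_.end();) {
            if (now - it->second > maxAge) {
                it = started_.erase(it);  // erase returns the next iterator
                ++removed;
            } else {
                ++it;
            }
        }
        return removed;
    }

    bool isOpen(const std::string& handle) const {
        return started_.count(handle) != 0;
    }

private:
    std::map<std::string, Clock::time_point> started_;
};
```

Passing the current time in as a parameter, rather than reading the clock inside the method, keeps the cleanup logic easy to test.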

8
Distributed Software
  • Some services hosted externally
  • Can't assume our machine or s/w never fails
  • Need to retain state of our service on restart

9
Robust Code
  • Don't assume our service will remain up
  • => Keep all internal state in a database
  • => Reload internal state on a restart
  • Don't assume external services are always up
  • => Design loosely coupled services
  • => Store pending interactions in the database
  • => Retry these periodically
  • Do stress testing on the service during the
    testing/debug cycle
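
The "store pending interactions, retry periodically" idea can be sketched as follows. A std::list stands in for the database table, and the names PendingInteraction, PendingQueue, and retryAll are hypothetical; a real version would persist the rows so they survive a restart.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <string>

// One interaction with an external service that has not yet succeeded.
struct PendingInteraction {
    std::string target;   // which external service to contact
    std::string payload;  // what to tell it
    unsigned attempts;    // how many times we have tried so far
};

// Stand-in for a database table of pending interactions (assumption).
class PendingQueue {
public:
    void add(const std::string& target, const std::string& payload) {
        rows_.push_back(PendingInteraction{target, payload, 0});
    }

    // Try every pending row once. Rows whose send succeeds are removed;
    // rows whose send fails stay queued for the next periodic retry.
    std::size_t retryAll(
        const std::function<bool(const PendingInteraction&)>& send) {
        assert(static_cast<bool>(send));
        std::size_t delivered = 0;
        for (auto it = rows_.begin(); it != rows_.end();) {
            ++it->attempts;
            if (send(*it)) {
                it = rows_.erase(it);
                ++delivered;
            } else {
                ++it;
            }
        }
        return delivered;
    }

    std::size_t size() const { return rows_.size(); }

private:
    std::list<PendingInteraction> rows_;
};
```

Because a failed send leaves the row in place, an outage of one external service delays only the interactions aimed at it.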

10
Keep the Internal APIs Simple
  • int logname_initialize(void)
  • void logname_remove(void)
  • bool logname_create_logfile(std::string logical_name,
    bool name_is_final,
    const std::vector<std::string> urls)
  • bool logname_delete_logfile(std::string logical_name)
  • ulong logname_upload_pending_lognames(ulong max_rows,
    ulong total_found,
    ulong max_rows_used)
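
To show how a small API like this might be used, here is a sketch of three of the functions over an in-memory table. The semantics are assumptions, not taken from the slides: total_found and max_rows_used are treated as output parameters passed by reference, max_rows caps how many pending rows one call processes, and the return value counts rows uploaded.

```cpp
#include <cassert>
#include <string>
#include <vector>

typedef unsigned long ulong;

// In-memory stand-in for the database table of logical names (assumption).
struct LognameRow {
    std::string logical_name;
    bool name_is_final;
    std::vector<std::string> urls;
    bool uploaded;  // false while the row is still pending
};
static std::vector<LognameRow> g_rows;

int logname_initialize(void) {
    g_rows.clear();
    return 0;  // 0 is assumed to mean success
}

bool logname_create_logfile(std::string logical_name, bool name_is_final,
                            const std::vector<std::string>& urls) {
    if (logical_name.empty() || urls.empty()) return false;
    g_rows.push_back(LognameRow{logical_name, name_is_final, urls, false});
    return true;
}

// Assumed semantics: count all pending rows in total_found, process at most
// max_rows of them, report the number processed in max_rows_used, and
// return how many were uploaded.
ulong logname_upload_pending_lognames(ulong max_rows, ulong& total_found,
                                      ulong& max_rows_used) {
    assert(max_rows > 0);
    total_found = 0;
    max_rows_used = 0;
    ulong uploaded = 0;
    for (LognameRow& row : g_rows) {
        if (row.uploaded) continue;
        ++total_found;
        if (max_rows_used == max_rows) continue;  // over the cap: only count
        ++max_rows_used;
        row.uploaded = true;  // a real version would contact the RLS server
        ++uploaded;
    }
    return uploaded;
}
```

A periodic task could call logname_upload_pending_lognames repeatedly until total_found equals max_rows_used, which would mean nothing was left over the cap.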

11
Encouraging Reuse
  • SCOOP Archive has lots of strange rules about
    filenames and metadata
  • During design and implementation, keep thinking
  • Is this for the SCOOP project, or
  • Is this a generic feature
  • Use good O-O design to keep SCOOP code separate
    from archive code

12
Keeping SCOOP to one side...
  • class ArchiveFilingLogic
  • public:
  • // Called by the default moveFiles implementation
  • virtual bool createPhysicalPath(std::string physicalPath)
  • virtual bool moveFiles(std::vector<std::string> fileNames,
    std::vector<std::string> missingFiles,
    std::string stagePath,
    std::string physicalPath)
  • virtual void physicalLocationForFiles(
    const std::vector<std::string> filenames,
    std::map<std::string,std::string> directories,
    std::map<std::string,std::string> errors) = 0
  • virtual std::vector<std::string> logicalNamesForFiles(
    const std::vector<std::string> filenames,
    std::string physicalPath) = 0
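
A sketch of how project-specific rules might sit in a subclass of the interface above. The base class is trimmed to its two pure virtual hooks, and output parameters are assumed to be passed by reference. The class ScoopFilingLogic, the "/archive/" root, and the rule that the model name is the text before the first underscore are all invented for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Trimmed, hypothetical version of the generic archive interface.
class ArchiveFilingLogic {
public:
    virtual ~ArchiveFilingLogic() {}
    virtual void physicalLocationForFiles(
        const std::vector<std::string>& filenames,
        std::map<std::string, std::string>& directories,
        std::map<std::string, std::string>& errors) = 0;
    virtual std::vector<std::string> logicalNamesForFiles(
        const std::vector<std::string>& filenames,
        std::string physicalPath) = 0;
};

// Project-specific filename rules live here, not in the archive code.
class ScoopFilingLogic : public ArchiveFilingLogic {
public:
    // File each name under a directory named after its model prefix;
    // names without a prefix are reported in errors instead.
    void physicalLocationForFiles(
        const std::vector<std::string>& filenames,
        std::map<std::string, std::string>& directories,
        std::map<std::string, std::string>& errors) override {
        for (const std::string& name : filenames) {
            std::string::size_type cut = name.find('_');
            if (cut == std::string::npos || cut == 0)
                errors[name] = "no model prefix";
            else
                directories[name] = "/archive/" + name.substr(0, cut);
        }
    }

    // Logical name is assumed to be the physical path plus the file name.
    std::vector<std::string> logicalNamesForFiles(
        const std::vector<std::string>& filenames,
        std::string physicalPath) override {
        assert(!physicalPath.empty());
        std::vector<std::string> names;
        for (const std::string& name : filenames)
            names.push_back(physicalPath + "/" + name);
        return names;
    }
};
```

Generic archive code would hold only an ArchiveFilingLogic pointer or reference, so another project could supply its own subclass without touching the archive.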

13
New Requirements
  • Handling common compression formats
  • Producing subsets of data (predictively)
  • Tracking data before it is ingested
  • Notifying people when data arrives
  • Transforming data to other formats
  • Generating analytical data on the fly
  • Federating data across multiple locations
  • Good initial design will simplify all this...

14
Highest Priority...
  • Archive machine running out of space
  • People have started to rely on the service
  • So, currently we are uploading copies of all data
    to SDSC DataCenter, using SRB
  • Now need to keep track of URLs on physically
    distributed resources
  • But SRB can help with some of the other
    requirements...

15
Any Questions?