Positioning Dynamic Storage Caches for Transient Data - PowerPoint PPT Presentation

Title: Positioning Dynamic Storage Caches for Transient Data


1
Positioning Dynamic Storage Caches for Transient Data
  • Sudharshan Vazhkudai Oak Ridge National Lab
  • Douglas Thain University of Notre Dame
  • Xiaosong Ma North Carolina State Univ.
  • Vince Freeh North Carolina State Univ.

High Performance I/O Workshop at IEEE Cluster
Computing 2006
2
Problem Space
  • Data Deluge
  • Experimental facilities: SNS, LHC (PBs/yr)
  • Observatories: sky surveys, world-wide telescopes
  • Simulations from NLCF end-stations
  • Internet archives: NIH GenBank (serves 100
    gigabases of sequence data)
  • Typical user access traits on large scientific
    data
  • Download remote datasets using favorite tools:
    FTP, GridFTP, hsi, wget
  • Shared interest among groups of researchers
  • A bioinformatics group collectively analyzes and
    visualizes a sequence database for a few days.
    Locality of interest!
  • Often, original datasets are discarded after
    interest dissipates

3
Existing Storage Models
  • Local Disk
  • High bandwidth local access to small data.
  • Distributed File Systems and NAS
  • Medium bandwidth for dist/shared data.
  • Mass Storage
  • High latency access for disaster recovery.
  • Parallel Storage
  • High bandwidth shared access to large data with
    high reliability and fault tolerance.

4
What's Missing?
[Diagram labels: Computing Clusters (CPUs); Fat Pipes; Parallel Storage; Mass Storage]
5
Needed: Transient Storage
  • High bandwidth
  • Needs to keep up with network and archive.
  • Also needs to keep up with aggressive apps.
    (viz?)
  • Some management control.
  • Capacity, bandwidth, locality are all limited.
  • Need some controls in order to guarantee QoS.
  • Understandable latency.
  • Keep user informed about stage-in latency.
  • Once staged, should have consistent latency.
  • Low cost.
  • Old idea: lots of commodity disks.
  • Can we scavenge space from existing systems?
  • Reliability useful, but not crucial.

6
Transient Storage Use Cases
  • Checkpointing Large Computations
  • Don't need to keep all forever!
  • Impedance Matching for Large Outputs
  • Evacuate CPUs, then trickle data to archive.
  • Caching Large Inputs
  • Share same data among many local users.
  • Out of Core Datasets
  • Large temporary array split across caches.

7
A Real Example: Grid3 (OSG)
  • Robert Gardner, et al. (102 authors)
  • The Grid3 Production Grid: Principles and
    Practice
  • IEEE HPDC 2004
  • The Grid2003 Project has deployed a multi-virtual
    organization, application-driven grid laboratory
    that has sustained for several months the
    production-level services required by
    ATLAS, CMS, SDSS, LIGO

8
Grid2003: The Details
  • The good news
  • 27 sites with 2800 CPUs
  • 40985 CPU-days provided over 6 months
  • 10 applications with 1300 simultaneous jobs
  • The bad news on ATLAS jobs
  • 40-70 percent utilization
  • 30 percent of jobs would fail.
  • 90 percent of failures were site problems
  • Most site failures were due to disk space!
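
The failure figures above compound. As a rough worked calculation, assuming the percentages all apply to the same population of ATLAS jobs and reading "most" as "more than half" (an interpretation, not a number from the slides):

```python
# Fractions quoted on the slide for ATLAS jobs on Grid2003.
job_failure_rate = 0.30        # 30 percent of jobs failed
site_share_of_failures = 0.90  # 90 percent of failures were site problems

# Share of all jobs lost to site problems.
site_failure_rate = job_failure_rate * site_share_of_failures

# Assumption: "most" site failures means more than half of them,
# so this is only a lower bound on jobs lost to disk space.
disk_space_lower_bound = site_failure_rate * 0.5

print(f"jobs lost to site problems: {site_failure_rate:.0%}")
print(f"jobs lost to disk space: more than {disk_space_lower_bound:.1%}")
```

Under those assumptions, roughly a quarter of all jobs were lost to site problems, and disk space alone accounted for a sizable part of that.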

9
Two Transient Storage Projects
  • FreeLoader
  • Oak Ridge Natl Lab and North Carolina State U
  • Scavenge unused desktop storage.
  • Provide a large cache for archival backends.
  • Modify scientific apps slightly for direct
    access.
  • Tactical Storage
  • University of Notre Dame
  • Use comp. cluster storage as flexible substrate.
  • Configure subsets for distinct needs.
  • Filesystem interfaces for existing apps.

10
Desktop Storage Scavenging?
  • FreeLoader
  • Imagine Condor for storage
  • Harness the collective storage potential of
    desktop workstations, much like harnessing idle
    CPU cycles
  • Increased throughput due to striping
  • Split large datasets into pieces, Morsels, and
    stripe them across desktops
  • Scientific data trends
  • Usually write-once-read-many
  • Remote copy held elsewhere
  • Primarily sequential accesses
  • Data trends + LAN-desktop traits + user access
    patterns make collaborative caches using storage
    scavenging a viable alternative!
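
The striping idea above can be sketched as follows. This is a minimal illustration, not FreeLoader's actual placement logic: the 1 MiB morsel size, the round-robin policy, the `stripe` function, and the benefactor names are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

MORSEL_SIZE = 1 << 20  # hypothetical morsel size: 1 MiB


def stripe(dataset_size: int, benefactors: List[str],
           morsel_size: int = MORSEL_SIZE) -> Dict[str, List[Tuple[int, int]]]:
    """Assign (offset, length) morsels to benefactors in round-robin order."""
    if not benefactors:
        raise ValueError("need at least one benefactor")
    placement: Dict[str, List[Tuple[int, int]]] = {name: [] for name in benefactors}
    offset = 0
    index = 0
    while offset < dataset_size:
        # The final morsel may be shorter than morsel_size.
        length = min(morsel_size, dataset_size - offset)
        target = benefactors[index % len(benefactors)]
        placement[target].append((offset, length))
        offset += length
        index += 1
    return placement


# Hypothetical dataset of 5 MiB + 100 bytes spread over three desktops.
layout = stripe(5 * MORSEL_SIZE + 100, ["desk-a", "desk-b", "desk-c"])
for name, morsels in layout.items():
    print(name, morsels)
```

Because consecutive morsels land on different desktops, a sequential read can fetch from several machines at once, which is where the throughput gain from striping would come from.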

11
Properties of Desktop Machines
  • Desktop capabilities are better than ever before
  • The ratio of used to available storage is low
    in academic and industry settings
  • Increasing numbers of workstations online most of
    the time
  • At ORNL-CSMD, 600 machines are estimated to be
    online at any given time
  • At NCSU, > 90% availability of 500 machines
  • Well-connected, secure LAN settings
  • A high-speed LAN connection can stream data
    faster than local disk I/O

12
FreeLoader Environment
13
FreeLoader Architecture
  • Lightweight UDP
  • Scavenger device: metadata bitmaps, morsel
    organization
  • Morsel service layer
  • Monitoring and Impact control
  • Global free space management
  • Metadata management
  • Soft-state registrations
  • Data placement
  • Cache management
  • Profiling
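
Soft-state registration, listed above, generally means that an entry stays valid only while it keeps being refreshed. A minimal sketch of that idea follows; the `SoftStateRegistry` class, its method names, and the 30-second lifetime are hypothetical and do not come from FreeLoader itself.

```python
import time
from typing import Dict, Optional, Tuple


class SoftStateRegistry:
    """Lists benefactors only while they keep re-registering."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        # name -> (advertised free bytes, time of last registration)
        self._entries: Dict[str, Tuple[int, float]] = {}

    def register(self, name: str, free_bytes: int,
                 now: Optional[float] = None) -> None:
        """Record or refresh a benefactor's advertised free space."""
        now = time.monotonic() if now is None else now
        self._entries[name] = (free_bytes, now)

    def live(self, now: Optional[float] = None) -> Dict[str, int]:
        """Drop entries older than the TTL and return the survivors."""
        now = time.monotonic() if now is None else now
        self._entries = {n: (f, t) for n, (f, t) in self._entries.items()
                         if now - t <= self.ttl}
        return {n: f for n, (f, _) in self._entries.items()}

    def total_free(self, now: Optional[float] = None) -> int:
        """Global free space across benefactors that are still live."""
        return sum(self.live(now).values())


# Explicit timestamps keep the example deterministic.
registry = SoftStateRegistry(ttl_seconds=30)
registry.register("desk-a", 40_000, now=0)
registry.register("desk-b", 25_000, now=0)
registry.register("desk-a", 38_000, now=20)  # desk-a refreshes, desk-b does not
print(registry.live(now=40))
```

The appeal of this design is that a desktop that is switched off or reclaimed needs no explicit deregistration: its entry simply ages out.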

14
Comparing FreeLoader with other storage systems
15
Tactical Storage Systems (TSS)
  • A TSS allows any node to serve as a file server
    or as a file system client.
  • All components can be deployed without special
    privileges but with security.
  • Users can build up complex structures.
  • Filesystems, databases, caches, ...
  • Admins need not know/care about larger
    structures.
  • Two Independent Concepts
  • Resources The raw storage to be used.
  • Abstractions The organization of storage.
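
The split between resources and abstractions can be illustrated with a small sketch. Everything here is an in-memory stand-in invented for the example: the `Resource` and `LogicalVolume` classes and the hash-based placement are assumptions, not the TSS implementation or its API.

```python
import zlib
from typing import Dict, List


class Resource:
    """Raw storage: a stand-in for one file server (in-memory here)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.files: Dict[str, bytes] = {}

    def put(self, path: str, data: bytes) -> None:
        self.files[path] = data

    def get(self, path: str) -> bytes:
        return self.files[path]


class LogicalVolume:
    """Abstraction: presents several resources as a single namespace."""

    def __init__(self, resources: List[Resource]) -> None:
        if not resources:
            raise ValueError("need at least one resource")
        self.resources = resources

    def _locate(self, path: str) -> Resource:
        # Assumed placement policy: deterministic hash of the path.
        return self.resources[zlib.crc32(path.encode()) % len(self.resources)]

    def write(self, path: str, data: bytes) -> None:
        self._locate(path).put(path, data)

    def read(self, path: str) -> bytes:
        return self._locate(path).get(path)


servers = [Resource(f"node{i}") for i in range(3)]
volume = LogicalVolume(servers)
for i in range(10):
    volume.write(f"/data/run{i}.dat", f"payload {i}".encode())
print(volume.read("/data/run7.dat"))
print({s.name: len(s.files) for s in servers})
```

The resource owner only runs a plain file server; the user decides how to compose servers into a larger structure, which matches the point that admins need not know about the larger structures.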

16
[Diagram labels: App; Parrot; ???; file systems]
17
Applications: High BW Access to Astrophys Data
[Diagram labels: tcsh, cp, vi, emacs, fortran...; Adapter; 10 TB Logical Volume; Scratch Disk; GBs/Day; General Purpose Computing Cluster (CPUs, Disks); Tape Archive]
18
Applications: High BW Access to Biometric Data
[Diagram labels: Jobs; NFS I/O; Gb Ethernet; Storage Archive; Disks]
19
Applications: High BW Access to Biometric Data
[Diagram labels: Storage Archive; Gb Ethernet; Controlled Replication; General Purpose Computing Cluster (CPUs, Disks)]
21
Open Problems
  • Combining Technologies
  • A filesystem interface for FreeLoader.
  • Making TSS harness FL benefactors.
  • Seamless Data Migration
  • Not easy to move between parallel systems!
  • Can transient storage match impedance?
  • Performance Adaptation
  • Many axes: BW, latency, locality, mgmt.
  • Can we have a system that allows for a more
    continuous tradeoff or reconfiguration?

22
Take-Home Message
Big, fast storage archives are important, but...
Making transient storage usable, accessible, and
high performance is critical to improving the
end-user experience.