OceanStore: An Infrastructure for Global-Scale Persistent Storage

1
OceanStore: An Infrastructure for Global-Scale
Persistent Storage
  • John Kubiatowicz, David Bindel, Yan Chen, Steven
    Czerwinski, Patrick Eaton, Dennis Geels,
    Ramakrishna Gummadi, Sean Rhea, Hakim
    Weatherspoon, Westley Weimer, Chris Wells, Ben
    Zhao

A few slides have been borrowed from the authors'
presentations.
2
Vision
  • What is OceanStore?
  • a utility infrastructure to span the globe and
    provide continuous access to persistent
    information

Source: Berkeley OceanStore website
3
Vision
  • What is OceanStore?
  • a utility infrastructure to span the globe and
    provide continuous access to persistent
    information
  • data
  • all kinds of information
  • desktop, laptop, palmtop
  • cars, cellular phones, other devices
  • futuristic: embedded in environment

4
Vision
  • What is OceanStore?
  • a utility infrastructure to span the globe and
    provide continuous access to persistent
    information
  • persistence
  • devices can be rebooted, lost, replaced
  • reliable, durable data (deep archival will last
    forever)
  • Automatic maintenance

5
Vision
  • What is OceanStore?
  • a utility infrastructure to span the globe and
    provide continuous access to persistent
    information
  • connectivity
  • even to tiniest devices, possibly intermittent
  • variable bandwidth, latency
  • availability
  • uniform access, comparable to LAN-based networked
    storage
  • fault-tolerant, DoS-tolerant

6
Vision
  • What is OceanStore?
  • a utility infrastructure to span the globe and
    provide continuous access to persistent
    information
  • scale
  • geographically distributed
  • 10^10 users
  • 10^14 files / objects

7
Questions about information
  • Where is persistent information stored?
  • 20th-century tie between location and content
    outdated
  • In world-scale system, locality is key
  • How is it protected?
  • Can a disgruntled ISP employee sell your
    secrets?
  • Can't trust anyone (how paranoid are you?)
  • Can we make it indestructible?
  • Want our data to survive the big one!
  • Highly resistant to hackers (denial of service)
  • Wide-scale disaster recovery
  • Is it hard to manage?
  • Worst failures are human-related
  • Want automatic (introspective) diagnosis and
    repair

8
First Observation: Want Utility Infrastructure
  • Mark Weiser from Xerox: "Transparent computing is
    the ultimate goal. Computers should disappear
    into the background"
  • In the context of storage
  • Don't want to worry about backup
  • Don't want to worry about obsolescence
  • Need lots of resources to make data secure and
    highly available, BUT don't want to own them
  • Outsourcing of storage already becoming popular
  • Pay monthly fee and your data is out there

9
Utility-based Infrastructure
[Figure: confederated providers: Canadian OceanStore, Sprint, AT&T, IBM, Pac Bell]
  • Service provided by confederation of companies
  • Monthly fee paid to one service provider
  • Companies buy and sell capacity from each other

10
Target applications
  • Email
  • Group calendar, contacts
  • Distributed design tools
  • Computer Supported Cooperative Work
  • Digital libraries
  • Distributed/shared repositories

11
Assumptions
  • Untrusted infrastructure
  • A small number of servers may crash or leak
    information
  • most of the servers function correctly
  • a financially responsible party among the servers
    ensures integrity
  • but only clients are trusted with cleartext
  • Nomadic data
  • data divorced from location
  • flows freely within the storage infrastructure
  • promiscuous caching anywhere, anytime
  • location important for performance
  • dynamic system tuning through introspection

12
System overview
  • persistent object
  • GUID: 160-bit SHA-1 hash
  • secure identification: globally unique and
    unforgeable
  • 2^80 unique objects before collisions (birthday
    paradox)
  • floating object replicas, independent of location
  • encrypted data
  • read
  • try fast probabilistic replica search (Bloom
    filter)
  • fall back to slower deterministic search
    (Tapestry)
  • write
  • update with predicates, as in Bayou (what is
    Bayou?)
  • creates new version
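
The collision figure above comes from the birthday bound: with a 160-bit hash, a collision only becomes likely after about 2^80 objects. A minimal Python sketch of that arithmetic, using the standard approximation p ≈ n² / 2^(bits+1); the function name is illustrative and not part of OceanStore:

```python
from fractions import Fraction

def collision_probability(num_objects: int, hash_bits: int = 160) -> Fraction:
    """Approximate birthday-bound collision probability, p ~ n^2 / 2^(bits+1).

    The approximation is only meaningful while the result is well below 1.
    """
    return Fraction(num_objects ** 2, 2 ** (hash_bits + 1))

# Example scale: 10^14 objects.
p_target = collision_probability(10 ** 14)
# At about 2^80 objects the approximation reaches 1/2.
p_birthday = collision_probability(2 ** 80)
print(float(p_target), float(p_birthday))
```

Under this approximation, a population of 10^14 objects sits far below the 2^80 threshold.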

13
What is Bayou?
  • The Bayou System (Xerox PARC) is a platform of
    replicated, highly-available, variable-consistency
    databases on which collaborative applications
    can be built. It caters to portable devices
    having intermittent connections.

14
System overview
  • application interface
  • sessions: sequence of reads/writes
  • session guarantees (as in Bayou)
  • consistency levels ranging from loose to ACID
  • active and archival forms
  • active: latest version, with update handle
  • archival: erasure-coded, read-only version
  • dynamic optimization
  • object location
  • degree of replication

15
Tentative Updates: Epidemic Dissemination
16
Committed Updates: Multicast Dissemination
17
naming
  • self-certifying path names (Mazières)
  • object GUID: hash of owner key and readable name
  • create hierarchies using directory objects
  • read restriction
  • through client encryption of data
  • write restriction, access control
  • associate ACLs with object, respected by
    servers
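
The naming rule above (GUID as a hash of the owner's key and a readable name) can be sketched as follows. This is only an illustration: the slides do not give the byte-level encoding, so the length prefix, the function name `object_guid`, and the placeholder key bytes are all assumptions.

```python
import hashlib

def object_guid(owner_public_key: bytes, name: str) -> str:
    """Return a 160-bit GUID as 40 hex digits: SHA-1 over owner key and name.

    The 4-byte length prefix is an assumption of this sketch; it keeps the
    (key, name) pair unambiguous once the two parts are concatenated.
    """
    h = hashlib.sha1()
    h.update(len(owner_public_key).to_bytes(4, "big"))
    h.update(owner_public_key)
    h.update(name.encode("utf-8"))
    return h.hexdigest()

alice_key = b"placeholder-public-key-bytes-for-alice"  # hypothetical key material
guid = object_guid(alice_key, "/home/alice/calendar")
print(guid, len(guid) * 4)  # hex digest and its length in bits
```

Because the owner's key is part of the hashed input, two owners using the same readable name get different GUIDs, and anyone holding the key and name can recompute the GUID to check it.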

18
addressing
  • address an object by its GUID
  • message: GUID, random number, small predicate
  • route to closest GUID replica matching predicate
  • combines data location and routing
  • no central name service to attack
  • save one round-trip for location discovery
  • routing
  • fast, probabilistic search algorithm
  • slow, deterministic search algorithm

19
routing
  • fast, probabilistic search algorithm
  • Bloom filter
  • probabilistic set membership test using bit
    vector
  • n-bit vector generated from n hashes of each set
    element
  • filter is union (OR) of all bit vectors
  • attenuated Bloom filter
  • array of d Bloom filters
  • i-th Bloom filter is union of all <i-hop nodes
  • slow, deterministic algorithm
  • Tapestry
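
The Bloom filter bullets above can be sketched in Python as below. This is a toy version under stated assumptions: the filter size, the number of hashes, and the salted-SHA-1 hashing scheme are illustrative choices, and `nearest_level` is a hypothetical helper. Each level of the attenuated filter is passed in ready-made, so which hops a level covers is left to the caller.

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits: int = 256, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit vector stored as a Python int

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting SHA-1 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha1(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item: str) -> bool:
        # True may be a false positive; False is always correct.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def union(self, other: "BloomFilter") -> "BloomFilter":
        # The union of two sets is the bitwise OR of their bit vectors.
        merged = BloomFilter(self.num_bits, self.num_hashes)
        merged.bits = self.bits | other.bits
        return merged


def nearest_level(attenuated: list, guid: str):
    """Return the first level whose filter may contain guid, else None."""
    for level, bloom in enumerate(attenuated):
        if guid in bloom:
            return level
    return None


local = BloomFilter()
local.add("guid-A")
neighbor1 = BloomFilter()
neighbor1.add("guid-B")
neighbor2 = BloomFilter()
neighbor2.add("guid-C")
attenuated = [local, neighbor1.union(neighbor2)]
print(nearest_level(attenuated, "guid-A"), nearest_level(attenuated, "guid-C"))
```

A miss at every level, or a false positive that leads nowhere, is where the slower deterministic search would take over.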

20
addressing and routing
deterministic
probabilistic
21
Attenuated Bloom Filter
22
updates
  • Updates based on versioning and conflict
    resolution
  • i.e. no locking
  • update actions with predicates
  • commit: apply action of first true predicate
  • abort: no true predicates
  • conflict resolution on encrypted data
  • possible predicates
  • compare-version, compare-size, compare-block,
    search
  • possible actions
  • replace-block, insert-block, delete-block, append
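
The predicate/action update model above can be sketched as follows. This is a simplified illustration, not the OceanStore implementation: it works on plaintext blocks (the slides describe resolution over encrypted data), `compare_size` counts blocks rather than bytes, and all type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass(frozen=True)
class Version:
    number: int
    blocks: Tuple[bytes, ...]

Predicate = Callable[[Version], bool]
Action = Callable[[Tuple[bytes, ...]], Tuple[bytes, ...]]

def compare_version(expected: int) -> Predicate:
    return lambda v: v.number == expected

def compare_size(expected_blocks: int) -> Predicate:
    return lambda v: len(v.blocks) == expected_blocks

def append(block: bytes) -> Action:
    return lambda blocks: blocks + (block,)

def replace_block(index: int, block: bytes) -> Action:
    return lambda blocks: blocks[:index] + (block,) + blocks[index + 1:]

def apply_update(current: Version,
                 update: List[Tuple[Predicate, List[Action]]]):
    """Commit the actions of the first true predicate; abort if none holds."""
    for predicate, actions in update:
        if predicate(current):
            blocks = current.blocks
            for action in actions:
                blocks = action(blocks)
            return True, Version(current.number + 1, blocks)  # commit: new version
    return False, current  # abort: object unchanged

v1 = Version(1, (b"a", b"b"))
update = [
    (compare_version(1), [append(b"c")]),
    (compare_size(2), [replace_block(0, b"z")]),
]
committed, v2 = apply_update(v1, update)
# A second writer still expecting version 1 finds no true predicate and aborts.
stale_ok, v3 = apply_update(v2, [(compare_version(1), [append(b"d")])])
print(committed, v2, stale_ok)
```

No lock is taken at any point: a writer with a stale view simply fails its predicate and can retry against the new version.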

23
archival
  • produced when objects idle
  • use erasure codes (redundant fragmentation)
  • simplest example: parity bit
  • need any (n-1) out of n fragments
  • interleaved Reed-Solomon codes, Tornado codes
  • fragmentation improves reliability
  • deep archival storage
  • sweeper processes ensure replication sustained
    over time
  • fragmentation improves performance
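
The parity example above (any n-1 of n fragments suffice) can be sketched with a single XOR parity fragment. This is the simplest erasure code only; it is not the Reed-Solomon or Tornado coding named on the slide, and the function names are illustrative.

```python
from functools import reduce
from typing import List, Optional

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data_fragments: List[bytes]) -> List[bytes]:
    """Append one parity fragment: the XOR of all equal-length data fragments."""
    parity = reduce(xor_bytes, data_fragments)
    return data_fragments + [parity]

def recover(fragments: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild the data fragments when at most one of the n fragments is None."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one lost fragment")
    if missing:
        present = [f for f in fragments if f is not None]
        fragments = list(fragments)
        # XOR of the surviving fragments equals the lost one.
        fragments[missing[0]] = reduce(xor_bytes, present)
    return fragments[:-1]  # drop the parity fragment

data = [b"ocea", b"nsto", b"re!!"]
stored = encode(data)   # n = 4 fragments, spread over different servers
damaged = list(stored)
damaged[1] = None       # one server is lost
print(recover(damaged) == data)
```

Codes such as Reed-Solomon generalize this idea so that more than one lost fragment can be tolerated.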

24
Erasure Codes
Simple parity bits or generalized Reed-Solomon
codes can be used to implement erasure coding.
25
Floating Replica and Deep Archival Coding
26
dynamic optimization (introspection)
  • observation modules
  • collect and summarize information
  • incrementally update system database
  • optimization modules
  • periodically process the observation database
  • cluster recognition: group related objects
  • replica management: maintain replica number and
    location
  • periodic migration: work-home-work-home
  • maintenance: routing, dissemination,
    availability, durability