OceanStore: In Search of Global-Scale, Persistent Storage

Description:

Rapid growth of bandwidth in the interior of the net. Broadband to the ... Cross-administrative domain. Metric: MOLE OF BYTES (6 × 10^23) OceanStore:4. FDIS 2002 ...

Slides: 42
Provided by: johnk201
Learn more at: http://www.fdis.org

1
OceanStore: In Search of Global-Scale,
Persistent Storage
  • John Kubiatowicz
  • UC Berkeley

2
OceanStore Context: Ubiquitous Computing
  • Computing everywhere
  • Desktop, Laptop, Palmtop
  • Cars, Cellphones
  • Shoes? Clothing? Walls?
  • Connectivity everywhere
  • Rapid growth of bandwidth in the interior of the
    net
  • Broadband to the home and office
  • Wireless technologies such as CDMA, satellite,
    laser

3
Utility-based Infrastructure?
  • Data service provided by federation of companies
  • Cross-administrative domain
  • Metric: MOLE OF BYTES (6 × 10^23)

4
OceanStore Assumptions
  • Untrusted Infrastructure
  • The OceanStore is comprised of untrusted
    components
  • Only ciphertext within the infrastructure
  • Responsible Party
  • Some organization (i.e. service provider)
    guarantees that your data is consistent and
    durable
  • Not trusted with content of data, merely its
    integrity
  • Mostly Well-Connected
  • Data producers and consumers are connected to a
    high-bandwidth network most of the time
  • Exploit multicast for quicker consistency when
    possible
  • Promiscuous Caching
  • Data may be cached anywhere, anytime

5
Key Observation: Want Automatic Maintenance
  • Can't possibly manage billions of servers by
    hand!
  • System should automatically:
  • Adapt to failure
  • Repair itself
  • Incorporate new elements
  • Introspective Computing/Autonomic Computing
  • Can data be accessible for 1000 years?
  • New servers added from time to time
  • Old servers removed from time to time
  • Everything just works

6
Outline
  • Motivation
  • Assumptions of the OceanStore
  • Specific Technologies and approaches
  • Routing and Data Location
  • Naming
  • Conflict resolution on encrypted data
  • Replication and Deep archival storage
  • Introspection for optimization and repair
  • Conclusion

7
Basic Structure: Irregular Mesh of Pools
8
Bringing Order to this Chaos
  • How do you find information?
  • Must be scalable and provide maximum flexibility
  • How do you name information?
  • Must provide global uniqueness
  • How do you ensure consistency?
  • Must scale and handle intermittent connectivity
  • Must prevent unauthorized update of information
  • How do you protect information?
  • Must preserve privacy
  • Must provide deep archival storage (continuous
    repair)
  • How do you tune performance?
  • Locality very important
  • Throughout all of this: how do you maintain it?

9
Location and Routing
10
Locality, Locality, Locality: One of the defining
principles
  • The ability to exploit local resources over
    remote ones whenever possible
  • -Centric approach
  • Client-centric, server-centric, data
    source-centric
  • Requirements
  • Find data quickly, wherever it might reside
  • Locate nearby object without global communication
  • Permit rapid object migration
  • Verifiable: can't be sidetracked
  • Locality yields Performance, Availability,
    Reliability

11
Enabling Technology: DOLR (Decentralized Object
Location and Routing)
Tapestry
12
Stability under Changes
  • Unstable, unreliable, untrusted nodes are the
    common case!
  • Network never fully stabilizes
  • What is half-life of a routing node?
  • Must provide stable routing in these
    circumstances
  • Redundancy and adaptation fundamental
  • Make use of alternative paths when possible
  • Incrementally remove faulty nodes
  • Route around network faults
  • Continuously tune neighbor links

13
The Tapestry DOLR
  • Routing to Objects, not Locations!
  • Replacement for IP?
  • Very powerful abstraction
  • Built as overlay network, but not fundamental
  • Randomized prefix routing + distributed object
    location index
  • Routing nodes have links to nearby neighbors
  • Additional state tracks objects
  • Massive parallel insert (SPAA 2002)
  • Construction of nearest-neighbor mesh links
  • O(log^2 n) message complexity for new node
  • New nodes integrated, faulty ones removed
  • Objects kept available during this process
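
The prefix-routing idea above can be illustrated with a minimal sketch. It assumes fixed-length hexadecimal IDs and a single shared list of known nodes; a real Tapestry node keeps its own routing table of nearby neighbors, which is not modeled here. The names `shared_prefix_len`, `next_hop`, and `route` are hypothetical and chosen only for this illustration.

```python
from typing import List, Optional


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits that two IDs have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def next_hop(current: str, target: str, known_nodes: List[str]) -> Optional[str]:
    """Pick a known node that matches the target in more leading digits."""
    have = shared_prefix_len(current, target)
    better = [n for n in known_nodes if shared_prefix_len(n, target) > have]
    if not better:
        return None  # nothing known is closer to the target ID
    # Assumption: take the smallest improvement, so each hop resolves
    # roughly one more digit of the target ID.
    return min(better, key=lambda n: shared_prefix_len(n, target))


def route(start: str, target: str, known_nodes: List[str]) -> List[str]:
    """Follow next_hop until no closer node is known."""
    path = [start]
    while True:
        hop = next_hop(path[-1], target, known_nodes)
        if hop is None:
            return path
        path.append(hop)
```

For example, routing from node `1234` toward GUID `4227` over the nodes `4000`, `4211`, `4220`, `4227` visits them in that order, matching one more digit of the target per hop.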

14
OceanStore Naming
15
Model of Data
  • Ubiquitous object access from anywhere
  • Undifferentiated Bag of Bits
  • Versioned Objects
  • Every update generates a new version
  • Can always go back in time (Time Travel)
  • Each Version is Read-Only
  • Can have permanent name (SHA-1 Hash)
  • Much easier to repair
  • An Object is a signed mapping between permanent
    name and latest version
  • Write access control/integrity involves managing
    these mappings

16
Secure Hashing
  • Read-only data: GUID is hash over actual
    information
  • Uniqueness and Unforgeability: the data is what
    it is!
  • Verification: check hash over data
  • Changeable data: GUID is combined hash over a
    human-readable name + public key
  • Uniqueness: GUID space selected by public key
  • Unforgeability: public key is indelibly bound to
    GUID
  • Verification: check signatures with public key
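
The two GUID schemes above can be sketched with the standard-library `hashlib` module. This is only an illustration: the function names are hypothetical, the length-prefixed encoding of the name is an assumption (the slides do not say how name and key are combined), and signature checking is left out because it needs a public-key library.

```python
import hashlib


def readonly_guid(data: bytes) -> str:
    """GUID of read-only data: SHA-1 over the content itself."""
    return hashlib.sha1(data).hexdigest()


def verify_readonly(guid: str, data: bytes) -> bool:
    """Verification: recompute the hash over the data and compare."""
    return readonly_guid(data) == guid


def changeable_guid(name: str, public_key: bytes) -> str:
    """GUID of changeable data: hash over name plus owner public key.

    Assumption: the name is length-prefixed so that different
    (name, key) pairs cannot produce the same byte string.
    """
    encoded = name.encode("utf-8")
    h = hashlib.sha1()
    h.update(len(encoded).to_bytes(4, "big"))
    h.update(encoded)
    h.update(public_key)
    return h.hexdigest()
```

Because the public key is an input to the hash, two owners using the same human-readable name still get different GUIDs, which is the uniqueness property the slide describes.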

17
Secure Naming
  • Naming hierarchy
  • Users map from names to GUIDs via hierarchy of
    OceanStore objects (à la SDSI)
  • Requires set of root keys to be acquired by user

18
The Write Path
19
The Path of an OceanStore Update
20
OceanStore Consistency via Conflict Resolution
  • Consistency is form of optimistic concurrency
  • An update packet contains a series of
    predicate-action pairs which operate on encrypted
    data
  • Each predicate tried in turn
  • If none match, the update is aborted
  • Otherwise, action of first true predicate is
    applied
  • Inner Ring must securely:
  • Pick serial order of updates
  • Apply them
  • Sign result (threshold signature)
  • Disseminate results to active users
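
The predicate-action rule above can be sketched as follows. This is a simplified illustration on a plain dictionary; in OceanStore the predicates and actions operate on encrypted data, which is not modeled here. The name `apply_update` and the dictionary layout are hypothetical.

```python
from typing import Callable, List, Tuple

Predicate = Callable[[dict], bool]
Action = Callable[[dict], dict]


def apply_update(state: dict, pairs: List[Tuple[Predicate, Action]]) -> Tuple[dict, bool]:
    """Try each predicate in turn; apply the action of the first true one.

    Returns (new_state, committed). If no predicate matches, the update
    aborts and the state is returned unchanged.
    """
    for predicate, action in pairs:
        if predicate(state):
            # Act on a copy so the previous version stays intact.
            return action(dict(state)), True
    return state, False
```

A typical update would pair a predicate such as "the version is still 3" with an action that writes the new contents, so a client whose view is stale sees its update abort instead of silently overwriting newer data.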

21
Automatic Maintenance
  • Byzantine Commitment for inner ring
  • Tolerates up to 1/3 malicious servers in inner
    ring
  • Continuous refresh of set of inner-ring servers
  • Proactive threshold signatures
  • Use of Tapestry ⇒ membership of inner ring unknown
    to clients
  • Secondary tier self-organized into overlay
    dissemination tree
  • Use of Tapestry routing to suggest placement of
    replicas in the infrastructure
  • Automatic choice between update vs invalidate
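
As a rough guide to what "up to 1/3 malicious servers" means for ring sizing, the sketch below assumes the classical Byzantine agreement bound n ≥ 3f + 1. The slide itself does not state which bound the inner ring uses, and the function names are hypothetical.

```python
def max_faulty(n: int) -> int:
    """Largest number of malicious servers a ring of n can tolerate,
    assuming the classical bound n >= 3f + 1."""
    return (n - 1) // 3


def min_ring_size(f: int) -> int:
    """Smallest ring that tolerates f malicious servers under the same bound."""
    return 3 * f + 1
```

Under this assumption a ring of 4 servers tolerates 1 malicious member, and tolerating 2 requires at least 7.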

22
Self-Organizing Soft-State Replication
  • Simple algorithms for placing replicas on nodes
    in the interior
  • Intuition: locality properties of Tapestry help
    select positions for replicas
  • Tapestry helps associate parents and children to
    build multicast tree
  • Preliminary results show that this is effective

23
Deep Archival Storage
24
Two Types of OceanStore Data
  • Active Data: Floating Replicas
  • Per object virtual server
  • Logging for updates/conflict resolution
  • Interaction with other replicas for
    consistency
  • May appear and disappear like bubbles
  • Archival Data: OceanStore's Stable Store
  • m-of-n coding: Like hologram
  • Data coded into n fragments, any m of which are
    sufficient to reconstruct (e.g. m=16, n=64)
  • Coding overhead is proportional to n/m (e.g. 4)
  • Other parameter, rate, is 1/overhead
  • Fragments are cryptographically self-verifying
  • Most data in the OceanStore is archival!
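
The m-of-n arithmetic above, plus a toy erasure code, can be sketched as follows. The code here is a single XOR parity chunk, which gives only an m-of-(m+1) scheme; it is swapped in for brevity and is not the code OceanStore uses (the implementation slide names interleaved Reed-Solomon codes). All function names are hypothetical.

```python
from functools import reduce
from typing import Dict, List


def overhead(m: int, n: int) -> float:
    """Storage overhead of an m-of-n code: n / m."""
    return n / m


def rate(m: int, n: int) -> float:
    """Rate of the code: 1 / overhead."""
    return m / n


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode_parity(chunks: List[bytes]) -> List[bytes]:
    """Toy m-of-(m+1) code: m equal-length chunks plus one XOR parity chunk."""
    return chunks + [reduce(xor_bytes, chunks)]


def decode_parity(fragments: Dict[int, bytes], m: int) -> List[bytes]:
    """Rebuild the m data chunks from any m of the m+1 fragments.

    Fragment indices 0..m-1 are data chunks; index m is the parity chunk.
    """
    missing = [i for i in range(m + 1) if i not in fragments]
    if len(missing) > 1:
        raise ValueError("need at least m fragments")
    if missing and missing[0] < m:
        # The XOR of all surviving fragments equals the missing data chunk.
        fragments = dict(fragments)
        fragments[missing[0]] = reduce(xor_bytes, fragments.values())
    return [fragments[i] for i in range(m)]
```

With the slide's example parameters, `overhead(16, 64)` is 4.0 and `rate(16, 64)` is 0.25.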

25
Archival Dissemination of Fragments
26
Fraction of Blocks Lost per Year (FBLPY)
  • Exploit law of large numbers for durability!
  • 6 month repair, FBLPY:
  • Replication: 0.03
  • Fragmentation: 10^-35
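
The gap between replication and fragmentation can be explored with a simple binomial model. This sketch assumes each fragment is lost independently with the same probability before repair; the value 0.1 used in the note below is purely illustrative and is not the parameter set behind the figures on the slide. The function name is hypothetical.

```python
from math import comb


def p_block_lost(n: int, m: int, p_fail: float) -> float:
    """Probability that fewer than m of n fragments survive,
    assuming each fragment fails independently with probability p_fail."""
    return sum(
        comb(n, k) * (1 - p_fail) ** k * p_fail ** (n - k)
        for k in range(m)  # k = number of surviving fragments, 0 .. m-1
    )
```

Plain replication with r copies is the special case `p_block_lost(r, 1, p_fail)`, which reduces to `p_fail ** r`. At the same 4x storage overhead, `p_block_lost(64, 16, 0.1)` comes out many orders of magnitude smaller than `p_block_lost(4, 1, 0.1)`, which is the law-of-large-numbers effect the slide points to.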

27
The Dissemination Process: Achieving Failure
Independence
28
Automatic Maintenance
  • Continuous Entropy Suppression, i.e. repair!
  • Erasure coding gives flexibility in timing repair
  • Data continuously transferred from physical
    medium to physical medium
  • No tapes decaying in basement
  • Actual Repair
  • Recombine fragments, then send out copies again
  • DOLR permits efficient heartbeat mechanism
  • Permits infrastructure to notice
  • Servers going away for a while
  • Or, going away forever!
  • Continuous sweep through data
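
One pass of such a sweep might look like the sketch below. It assumes one fragment per server, a set of servers currently answering heartbeats, and a repair threshold somewhere between m and n. These data structures and the name `sweep` are hypothetical; the slides do not describe the actual bookkeeping.

```python
from typing import Dict, List, Set, Tuple


def sweep(
    fragment_holders: Dict[str, Set[str]],
    live_servers: Set[str],
    m: int,
    repair_threshold: int,
) -> Tuple[List[str], List[str]]:
    """Classify blocks by how many of their fragments sit on live servers.

    Returns (to_repair, lost): blocks that are still reconstructible but
    below the threshold, and blocks with fewer than m live fragments.
    """
    to_repair: List[str] = []
    lost: List[str] = []
    for block, holders in fragment_holders.items():
        alive = len(holders & live_servers)
        if alive < m:
            lost.append(block)  # cannot be reconstructed right now
        elif alive < repair_threshold:
            to_repair.append(block)  # recombine and send out fragments again
    return to_repair, lost
```

Setting the threshold above m is what gives the flexibility in timing that erasure coding allows: repair can be deferred while a block still has spare fragments.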

29
Introspective Tuning
30
On the use of Redundancy
  • Question: Can we use Moore's law gains for
    something other than just raw performance?
  • Growth in computational performance
  • Growth in network bandwidth
  • Growth in storage capacity
  • Physical systems are unreliable and untrusted
  • Can we use multiple faulty elements instead of
    one?
  • Can we devote resources to monitoring and
    analysis?
  • Can we devote resources to repairing systems?
  • Complexity of systems growing rapidly
  • Can no longer debug systems entirely
  • How to handle this?

31
The Biological Inspiration
  • Biological Systems are built from (extremely)
    faulty components, yet:
  • They operate with a variety of component failures
    ⇒ Redundancy of function and representation
  • They have stable behavior ⇒ Negative feedback
  • They are self-tuning ⇒ Optimization of common
    case
  • Introspective Computing
  • Components for computing
  • Components for monitoring and model building
  • Components for continuous adaptation

32
The Thermodynamic Analogy
  • System such as OceanStore has a variety of latent
    order
  • Connections between elements
  • Mathematical structure (erasure coding, etc)
  • Distributions peaked about some desired behavior
  • Permits Stability through Statistics
  • Exploit the behavior of aggregates
  • Subject to Entropy
  • Servers fail, attacks happen, system changes
  • Requires continuous repair
  • Apply energy (i.e. through servers) to reduce
    entropy

33
Introspective Optimization
  • Adaptation of routing substrate
  • Optimization of Tapestry Mesh
  • Fault-tolerant routing mechanisms
  • Adaptation of second-tier multicast tree
  • Monitoring of access patterns
  • Clustering algorithms to discover object
    relationships
  • Time-series analysis of user and data motion
  • Observations of system behavior
  • Extraction of failure correlations
  • Continuous testing and repair of information
  • Slow sweep through all information to make sure
    there are sufficient erasure-coded fragments
  • Continuously reevaluate risk and redistribute data

34
PondStore Java
  • Event-driven state-machine model
  • Included Components
  • Initial floating replica design
  • Conflict resolution and Byzantine agreement
  • Routing facility (Tapestry)
  • Bloom Filter location algorithm
  • Plaxton-based locate and route data structures
  • Introspective gathering of tacit info and
    adaptation
  • Language for introspective handler construction
  • Clustering, prefetching, adaptation of network
    routing
  • Initial archival facilities
  • Interleaved Reed-Solomon codes for fragmentation
  • Methods for signing and validating fragments
  • Target Applications
  • Unix file-system interface under Linux (legacy
    apps)
  • Email application, proxy for web caches,
    streaming multimedia applications
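
For readers unfamiliar with the Bloom filters mentioned in the component list, a minimal filter can be sketched as follows. This shows only the basic data structure, not the probabilistic location algorithm built on top of it; the class name, default sizes, and the choice of salted SHA-1 as the hash family are assumptions made for illustration.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Set summary with no false negatives and a tunable false-positive rate."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting SHA-1 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha1(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        """False means definitely absent; True means possibly present."""
        return all((self.bits >> pos) & 1 for pos in self._positions(item))
```

A node could keep such a filter over the GUIDs it knows how to reach and consult it before forwarding a query, accepting occasional false positives in exchange for a compact summary.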

35
We have Things Running!
  • Latest: it is up to 7 MB/sec
  • Still a ways to go, but working

36
Update Latency
  • Cryptography in critical path (not surprising!)
  • New metric: Avoid hashes (like avoid copies)

37
OceanStore Goes Global!
  • OceanStore components running globally
  • Australia, Georgia, Washington, Texas, Boston
  • Able to run the Andrew File-System benchmark with
    inner ring spread throughout US
  • Interface: NFS on OceanStore
  • Word on the street: it was easy to do
  • The components were debugged locally
  • Easily set up remotely
  • I am currently talking with people in:
  • England, Maryland, Minnesota, ...
  • PlanetLab testbed will give us access to much more

38
Reality: Web Caching through OceanStore
39
Other Apps
  • Better file system support
  • NFS (working; reimplementation in progress)
  • Windows Installable file system (soon)
  • Email through OceanStore
  • IMAP and POP proxies
  • Let normal mail clients access mailboxes in OS
  • Palm-pilot synchronization
  • Palm data base as an OceanStore DB

40
OceanStore Conclusions
  • OceanStore: everyone's data, one big utility
  • Global Utility model for persistent data storage
  • OceanStore assumptions
  • Untrusted infrastructure with a responsible party
  • Mostly connected with conflict resolution
  • Continuous on-line optimization
  • OceanStore properties
  • Provides security, privacy, and integrity
  • Provides extreme durability
  • Lower maintenance cost through redundancy,
    continuous adaptation, self-diagnosis and repair
  • Large scale system has good statistical properties

41
For more info
  • OceanStore vision paper for ASPLOS 2000:
  • OceanStore: An Architecture for Global-Scale
    Persistent Storage
  • Tapestry algorithms paper (SPAA 2002):
    Distributed Object Location in a Dynamic
    Network
  • Bloom Filters for Probabilistic Routing (INFOCOM
    2002):
  • Probabilistic Location and Routing
  • OceanStore web site: http://oceanstore.org/