1
Sinfonia A New Paradigm for Building Scalable
Distributed Systems
Marcos K. Aguilera, Arif Merchant, Mehul Shah,
Alistair Veitch, Christos Karamanolis
Presented by Phil Huynh
2
The problem
  • Distributed systems (DS) require fault tolerance,
    scalability, consistency, and reasonable
    performance
  • Building such systems on message passing involves
    complicated protocols for handling distributed
    state (protocols for replication, file management,
    cache consistency, membership, ...)

HARD TO ACHIEVE
3
Sinfonia comes to help!
  • Sinfonia is claimed to be a service that supports
    application data sharing in a fault-tolerant,
    scalable, and consistent manner
  • Provides the ACID abstraction of minitransactions
    instead of low-level message passing
  • Transforms the problem of protocol design into
    one of data structure design, which is much easier

4
Overview
  • Design
  • Minitransactions
  • Protocol
  • Features
  • Sample Applications
  • Evaluation
  • Conclusion

5
Design
  • Assumptions: a data center environment
  • fairly well-connected machines
  • small network latencies
  • trustworthy participants
  • (not valid assumptions for WAN, P2P)
  • Goals
  • Provide a framework for building distributed
    infrastructure applications

6
Design
  • Principles
  • Make components reliable before scaling them
  • Reduce coupling to obtain scalability: no
    assumptions about the structure of the data, just
    memory addresses and raw binary

7
Design
  • Components: application nodes and memory nodes
  • Separate address spaces
  • Items addressed as <mem-node-id, address>

8
Minitransaction
  • Conceptually, a combination of multiple operations
    for handling distributed data
  • Has ACID properties
  • Each minitransaction contains compare items, read
    items, and write items
  • Examples: swap, compare-and-swap, atomically read
    many data items, acquire a lease, acquire multiple
    leases atomically, change data if a lease is held

9
Minitransaction
10
Protocol
11
Protocol
12
Features
  • Fault tolerance
  • Consistent backup
  • Replication
  • Modes of operation

13
Fault tolerance
  • Recovery from coordinator crashes
  • Uses a third-party recovery coordinator
  • The recovery coordinator asks participants to
    abort
  • Participant: if it has not yet voted, it votes
    abort; otherwise, it resends its vote to the
    recovery coordinator
  • The recovery coordinator then acts as if it were
    the minitransaction's coordinator
  • Recovery from participant crashes
  • On restart, a participant replays its redo-log
  • If a minitransaction is not in the decided list,
    it asks the relevant participants for the status
    (committed / aborted)
  • The participant still appears offline until this
    process finishes
  • Recovery from whole-system crashes
  • Participants send the final status of their recent
    minitransactions to all other nodes, then start
    the procedure above
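The coordinator-crash case can be sketched as follows. This is a minimal model under stated assumptions (names hypothetical): a participant that never voted is forced to vote abort so it can no longer commit later, a participant that did vote resends its vote, and the recovery coordinator then decides exactly as the original coordinator would.

```python
class Participant:
    """Toy participant holding per-transaction votes and outcomes."""
    def __init__(self, votes=None):
        self.votes = dict(votes or {})   # tx_id -> "COMMIT" / "ABORT"
        self.outcomes = {}               # tx_id -> final decision

def recover(participants, tx_id):
    """Recovery-coordinator sketch for a crashed coordinator.

    Participants that never voted are forced to vote ABORT (blocking
    any late commit vote); the rest resend their existing votes.  The
    decision rule is the usual one: COMMIT only if everyone voted COMMIT.
    """
    votes = []
    for p in participants:
        vote = p.votes.get(tx_id)
        if vote is None:
            vote = "ABORT"
            p.votes[tx_id] = vote      # recorded: this tx can never commit
        votes.append(vote)
    outcome = "COMMIT" if all(v == "COMMIT" for v in votes) else "ABORT"
    for p in participants:
        p.outcomes[tx_id] = outcome
    return outcome
```

Forcing the abort vote is the key step: it makes the recovery coordinator's decision safe even if the crashed coordinator later comes back.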

14
Consistent backups
  • Lock all addresses of all nodes
    (without blocking, to avoid deadlock)
  • Update each disk image up to the last committed
    minitransaction
  • Copy or snapshot the disk image
  • Release the locks
  • The backup is made from the copy or snapshot
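The steps above can be sketched as one procedure. This is a single-process toy (the `Node` class and its methods are hypothetical stand-ins): locks are acquired with non-blocking try-locks and released on any failure, each disk image is brought up to date from its redo log, and the snapshot is taken while locks are held so all nodes agree on a consistent point.

```python
import copy

class Node:
    """Toy memory node: a disk image plus an in-memory redo log."""
    def __init__(self):
        self.disk = {}
        self.redo = []       # committed (addr, value) pairs not yet on disk
        self.locked = False

    def try_lock_all(self):
        if self.locked:
            return False
        self.locked = True
        return True

    def unlock_all(self):
        self.locked = False

    def flush(self):
        for addr, value in self.redo:
            self.disk[addr] = value
        self.redo.clear()

def consistent_backup(nodes):
    # 1. Try-lock all addresses on all nodes; never block while holding
    #    locks (avoids deadlock) -- back off and retry instead.
    while True:
        acquired = [n for n in nodes if n.try_lock_all()]
        if len(acquired) == len(nodes):
            break
        for n in acquired:
            n.unlock_all()
    # 2. Bring each disk image up to the last committed minitransaction.
    for n in nodes:
        n.flush()
    # 3. Snapshot the images, then release the locks; the actual backup
    #    is made from the snapshots, off the critical path.
    snapshots = [copy.deepcopy(n.disk) for n in nodes]
    for n in nodes:
        n.unlock_all()
    return snapshots
```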

15
Replication
  • Replicate the redo-log, decide-log, and
    forced-abort log
  • The primary copy sends updates on these logs to
    the replica and waits for acks
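A primary-copy scheme like this can be sketched in a few lines (a toy, synchronous model with hypothetical names, not the paper's implementation): every log append on the primary is mirrored to the replica and is treated as stable only once the replica has acknowledged it.

```python
class Replica:
    """Backup copy: applies log updates and acknowledges them."""
    def __init__(self):
        self.logs = {"redo": [], "decide": [], "forced-abort": []}

    def apply(self, log_name, entry):
        self.logs[log_name].append(entry)
        return "ack"

class Primary:
    """Primary copy: mirrors every log append to the replica and
    treats it as stable only once the replica has acknowledged."""
    def __init__(self, replica):
        self.replica = replica
        self.logs = {"redo": [], "decide": [], "forced-abort": []}

    def append(self, log_name, entry):
        self.logs[log_name].append(entry)
        if self.replica.apply(log_name, entry) != "ack":
            raise RuntimeError("no ack from replica")
        return True
```

Waiting for the ack is what lets the replica take over with identical logs if the primary fails.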

16
Modes of operation
17
Review
  • Fault tolerance: a smart strategy
  • Consistent backup: lock all addresses of all
    nodes, bring the disk image at each memory node up
    to the last committed minitransaction, then
    release the locks
  • Replication: nothing special
  • Load balancing: not supported
  • Caching: not supported; applications must handle
    it themselves
  • Modes of operation

18
Sample Applications
  • SinfoniaFS: a distributed file system
  • SinfoniaGCS: a group communication system

19
SinfoniaFS
  • A scalable, fault-tolerant file system
  • Cluster nodes are the application nodes of
    Sinfonia
  • Each Sinfonia memory node holds data and metadata
  • Sinfonia runs in LOG or LOG-REPL mode
  • Data blocks of 16 KB
  • The inode, chaining list, and file content of the
    same file are stored on the same memory node
    (locality)
  • Each cluster (app) node writes to a preferred
    memory node (load balancing)
  • Memory nodes need not even know of each other's
    existence (scalability)
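The locality and load-balancing points can be made concrete with a small sketch. Everything here is hypothetical illustration (the modulo mapping and function names are not from the paper); the point is only that a file's pieces share one memory node, and that each cluster node has its own preferred target.

```python
BLOCK_SIZE = 16 * 1024  # SinfoniaFS uses 16 KB data blocks

def preferred_memory_node(cluster_node_id, num_memory_nodes):
    """Load balancing sketch: each cluster node writes new files to
    its own preferred memory node (hypothetical modulo mapping)."""
    return cluster_node_id % num_memory_nodes

def file_layout(memory_node, inode_addr, block_addrs):
    """Locality sketch: a file's inode, chaining list, and content
    blocks all live on one memory node, as (node, address) pairs."""
    return {
        "inode": (memory_node, inode_addr),
        "chaining_list": (memory_node, inode_addr + BLOCK_SIZE),
        "content": [(memory_node, a) for a in block_addrs],
    }
```

Because all pieces of a file sit on one node, most file operations become single-node minitransactions, and memory nodes never need to talk to each other.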

20
Implementation
  • As easy as implementing a local file system

21
Implementation
22
SinfoniaGCS
  • Broadcast m: the member adds m to its queue, finds
    the end of the global list, and updates the global
    tail to point to m
  • Receive new messages: the member follows the next
    pointers in the global list
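The broadcast step amounts to a compare-and-swap on the shared tail pointer. A minimal single-process sketch (names hypothetical, and the real system would do this CAS as a Sinfonia minitransaction against memory nodes):

```python
class GlobalList:
    """Toy GCS global list: messages linked in order, with a shared
    tail pointer updated by a compare-and-swap-style step."""
    def __init__(self):
        self.messages = []   # position i implicitly points to i + 1
        self.tail = -1       # index of the last message, -1 if empty

    def _cas_tail(self, expected_tail, msg):
        # Stand-in for a minitransaction: compare the tail and, only
        # if it is unchanged, append msg and swing the tail to it.
        if self.tail != expected_tail:
            return False
        self.messages.append(msg)
        self.tail = len(self.messages) - 1
        return True

    def broadcast(self, msg):
        # Find the end of the global list, then atomically update the
        # tail to point at msg; retry if another member got there first.
        while not self._cas_tail(self.tail, msg):
            pass
        return self.tail

    def receive(self, cursor):
        """A member follows next pointers from its last-read position;
        returns the new messages plus the updated cursor."""
        new = self.messages[cursor + 1 : self.tail + 1]
        return new, self.tail
```

Using CAS on the tail is what gives all members the same total order of messages without any central sequencer.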

23
SinfoniaGCS
  • Join: the member acquires a global lock, updates
    the latest view, finds the global tail, broadcasts
    a join message, and releases the lock
  • Leave: same as join

24
Evaluation
  • Sinfonia is compared with Berkeley DB 4.5 on one
    memory node

25
Evaluation
  • Scalability tests

26
Evaluation
  • Scalability tests compared with Berkeley DB 4.5

27
Conclusion
  • Strengths
  • ACID minitransactions
  • Fault-tolerance strategies
  • Concerns
  • Load balancing and memory nodes are not
    transparent to users
  • Backups lock all addresses, which may be
    inefficient on a big system