Title: CS 603 Mid-Semester Review
1 CS 603 Mid-Semester Review
2 One or Two Day Review?
- One day: Skim material and Test Overview
- What to do with Wednesday?
- More on replication
- Start on distributed processes
- Two days: Discuss material to date
- Wednesday
- Finish Review
- Work out sample question
3 Basics
- Why do we want distributed systems?
- Scaling
- Heterogeneity
- Geographic Distribution
- What is a distributed system?
- Transparency vs. Exposing Distribution
- Hardware Basics
- Communication Mechanisms
4 Basic Software Concepts
- Hiding vs. Exposing
- Hide distribution: Distributed OS
- Hide location, but not distribution: Middleware
- Hide nothing: Network OS
- Concurrency Primitives
- Semaphores
- Monitors
- Distributed System Models
- Client-Server
- Multi-Tier
- Peer to Peer
5 Communication Mechanisms
- Shared Memory
- Enforcement of single-system view
- Delayed consistency d-Common Storage
- Message Passing
- Reliability and its limits
- Stream-oriented Communications
- Remote Procedure Call
- Remote Method Invocation
6 RPC Example: DCE
- Language / Platform Independent
- Implementation Issues
- Data Conversion
- Underlying Mechanisms
- Fault Tolerance Approaches
7 Java RMI
- Supports remote invocation of Java objects
- Key: Java Object Serialization
- Stream objects over the wire
- Language specific
- Advantages
- True object-orientation: Objects as arguments and values
- Mobile behavior: Returned objects can execute on caller
- Integrated security
- Built-in concurrency (through Java threads)
- Disadvantage: Java only
- Implementation / Use
- Registry
8 SOAP
- Goal: RPC protocol that works over wide-area networks
- Interoperable
- Language independent
- Problem: Firewalls
- Solution: HTTP/XML
- Client side: Ability to generate HTTP calls and listen for response
- Server
- Listen for HTTP
- Bind to procedure
- Respond with HTTP
- SOAP message format and use mechanisms
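The client/server split above can be sketched with only the Python standard library. This is a minimal illustration under stated assumptions, not a SOAP toolkit: the service namespace `urn:example:calc`, the `Add` procedure, the `PROCEDURES` table and the URL are hypothetical, and the HTTP transport is not exercised (the request object is built but never sent; `handle` stands in for the server code that runs after the HTTP listener receives a POST).

```python
from __future__ import annotations
import urllib.request
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SVC = "urn:example:calc"                                # hypothetical service namespace


def envelope() -> tuple[ET.Element, ET.Element]:
    """Create an empty SOAP Envelope and return it together with its Body."""
    env = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    return env, ET.SubElement(env, f"{{{SOAP_ENV}}}Body")


def build_request(url: str, method: str, **params: str) -> urllib.request.Request:
    """Client side: encode a procedure call as XML and wrap it in an HTTP POST."""
    env, body = envelope()
    call = ET.SubElement(body, f"{{{SVC}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, name).text = value
    return urllib.request.Request(
        url,
        data=ET.tostring(env, encoding="utf-8"),
        method="POST",
        headers={"Content-Type": "text/xml; charset=utf-8",
                 "SOAPAction": f'"{SVC}#{method}"'},
    )


# Hypothetical server-side binding from procedure name to implementation.
PROCEDURES = {"Add": lambda a, b: str(int(a) + int(b))}


def handle(request_body: bytes) -> bytes:
    """Server side: parse the envelope, bind to the named procedure, build the reply."""
    call = ET.fromstring(request_body).find(f"{{{SOAP_ENV}}}Body")[0]
    method = call.tag.split("}", 1)[1]            # drop the "{namespace}" prefix
    result = PROCEDURES[method](*(arg.text for arg in call))
    env, body = envelope()
    response = ET.SubElement(body, f"{{{SVC}}}{method}Response")
    ET.SubElement(response, "result").text = result
    return ET.tostring(env, encoding="utf-8")


req = build_request("http://localhost:8080/calc", "Add", a="2", b="3")
reply = ET.fromstring(handle(req.data))
print(reply.find(f".//{{{SVC}}}AddResponse/result").text)  # 5
```

Because the call travels as an ordinary HTTP POST with an XML body, it passes through firewalls that already admit web traffic, which is the point of the HTTP/XML solution.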
9 Naming Requirements
- Disambiguate only
- Access resource given the name
- Build a name to find a resource
- Do humans need to use name?
- Static/Dynamic Resource
- Performance Requirements
10 Naming Approaches
- Scope
- Global vs. Hierarchical
- Unique ID vs. Non-Unique Description
- Namespaces
- URN, URI, URL
- Registries
11 Registry Example: X.500
- Goal: Global white pages
- Look up anyone, anywhere
- Developed by the telecommunications industry
- ISO standard directory for OSI networks
- Idea: Distributed directory
- Application uses a Directory User Agent to access a Directory Access Point
12 Directory Information Base (X.501)
- Tree structure
- Root is entire directory
- Levels are groups
- Country
- Organization
- Individual
- Entry structure
- Unique name
- Built from tree
- Attributes: Type/value pairs
- Schema enforces type rules
- Alias entries
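A minimal Python sketch of the tree and naming rules above, assuming a toy in-memory directory: an entry's unique (distinguished) name is the sequence of relative names on the path from the root, and an alias entry simply stores the name of the entry it stands for. The entries, the `C=`/`O=`/`CN=` attribute types used for the relative names, and the `lookup` helper are illustrative assumptions; schema enforcement is not modeled.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Entry:
    rdn: str                                                  # relative name, e.g. "C=US"
    attributes: dict[str, str] = field(default_factory=dict)  # type/value pairs
    alias_of: tuple[str, ...] | None = None                   # set only on alias entries
    children: dict[str, Entry] = field(default_factory=dict)

    def add(self, child: Entry) -> Entry:
        self.children[child.rdn] = child
        return child


def lookup(root: Entry, name: tuple[str, ...], max_aliases: int = 8) -> Entry:
    """Follow relative names down from the root, restarting at the target of an alias."""
    for _ in range(max_aliases + 1):
        entry = root
        for i, rdn in enumerate(name):
            entry = entry.children[rdn]          # KeyError if there is no such entry
            if entry.alias_of is not None:
                name = entry.alias_of + name[i + 1:]
                break
        else:
            return entry
    raise LookupError("too many alias dereferences")


def dn_string(name: tuple[str, ...]) -> str:
    # Written most-specific first, as in the LDAP string form.
    return ",".join(reversed(name))


root = Entry("")                                 # the root is the entire directory
us = root.add(Entry("C=US"))                     # country level
purdue = us.add(Entry("O=Purdue University", {"L": "West Lafayette"}))
chris = purdue.add(Entry("CN=Chris Clifton",
                         {"SN": "Clifton", "TITLE": "Associate Professor"}))
purdue.add(Entry("CN=C. Clifton",
                 alias_of=("C=US", "O=Purdue University", "CN=Chris Clifton")))

name = ("C=US", "O=Purdue University", "CN=C. Clifton")
print(dn_string(name), "->", lookup(root, name).attributes["TITLE"])
```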
13 X.500
- Directory Entry
- Organization level: CN=Purdue University, L=West Lafayette
- Person level: CN=Chris Clifton, SN=Clifton, TITLE=Associate Professor
- Directory Operations
- Query, Modify
- Authorization / Access control
- To directory
- Directory as mechanism to implement access control for others
14 X.500 Distributed Directory
- Directory System Agent
- Referrals
- Replication
- Cache vs. Shadow copy
- Access control
- Modifications at Master only
- Consistency
- Each entry must be internally consistent
- DSA giving copy must identify as copy
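The referral mechanism can be sketched as follows, under simplifying assumptions: each hypothetical `DSA` object is master for one naming context (a name prefix) and keeps references to the DSAs that hold other subtrees; a query for a name it does not hold comes back as a referral, and the client-side `resolve` loop (standing in for the Directory User Agent) follows referrals. Replication, caching, shadow copies and access control are not modeled.

```python
class DSA:
    """Directory System Agent that is master for one naming context."""

    def __init__(self, context, entries, knowledge):
        self.context = context        # name prefix of the subtree this DSA holds
        self.entries = entries        # name -> attribute dict, for names in the context
        self.knowledge = knowledge    # name prefix -> DSA that holds that subtree

    def query(self, dn):
        inside = dn[:len(self.context)] == self.context
        if inside and dn in self.entries:
            return "entry", self.entries[dn]
        refs = [p for p in self.knowledge if dn[:len(p)] == p]
        if inside:
            # Within our own subtree only a reference to a subordinate DSA can help.
            refs = [p for p in refs if len(p) > len(self.context)]
        if not refs:
            return "error", "no such entry"
        return "referral", self.knowledge[max(refs, key=len)]  # most specific prefix


def resolve(dsas, start, dn, max_referrals=5):
    """DUA side: query one DSA and chase referrals until an answer comes back."""
    target = start
    for _ in range(max_referrals + 1):
        kind, value = dsas[target].query(dn)
        if kind != "referral":
            return kind, value, target
        target = value
    raise RuntimeError("too many referrals")


US = ("C=US",)
PURDUE = US + ("O=Purdue University",)
CHRIS = PURDUE + ("CN=Chris Clifton",)
dsas = {
    "dsa-us": DSA(US, {US: {"C": "US"}}, {PURDUE: "dsa-purdue"}),
    "dsa-purdue": DSA(PURDUE, {PURDUE: {"L": "West Lafayette"},
                               CHRIS: {"SN": "Clifton"}}, {US: "dsa-us"}),
}
print(resolve(dsas, "dsa-us", CHRIS))   # ('entry', {'SN': 'Clifton'}, 'dsa-purdue')
```

The client asked `dsa-us`, was referred once, and got its answer from `dsa-purdue`; a DSA answering from a replica would additionally have to flag the result as a copy.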
15 X.500 Subsets
- LDAP
- X.500 without OSI
- Intended for use over IP
- Active Directory
- Microsoft's answer to LDAP
- Extensible default naming schema
- Limited replication facilities
16 Clock Synchronization
- Definition: All nodes agree on time
- What do we mean by time?
- What do we mean by agree?
- Lamport definition: Events
- Events partially ordered
- Clock counts the order
17 Event-based Definition (Lamport 78)
- Define partial order of events
- $A \rightarrow B$: A happened before B. Smallest relation such that:
- If A and B in same process and A occurs first, $A \rightarrow B$
- If A is sending a message and B is receipt of that message, $A \rightarrow B$
- If $A \rightarrow B$ and $B \rightarrow C$, then $A \rightarrow C$
- Clock $C(x)$ is the time $x$ occurs
- $C(x) = C_i(x)$ where $x$ runs on node $i$
- Clocks correct if for all $a, b$: if $a \rightarrow b$ then $C(a) < C(b)$
18 Lamport Clock Implementation
- Node $i$ increments $C_i$ between any two successive events
- If event $a$ is the sending of message $m$ from $i$ to $j$:
- $m$ contains timestamp $T_m = C_i(a)$
- Upon receiving $m$, set $C_j$ to a value $\geq$ its current value and $> T_m$
- Can now define a total ordering: $a \Rightarrow b$ iff
- $C_i(a) < C_j(b)$, or
- $C_i(a) = C_j(b)$ and $P_i < P_j$
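The implementation rules and the tie-broken total order can be sketched in Python. This is an in-memory illustration, not a messaging system: the `Process` class and `total_order_key` helper are hypothetical names, and the receive rule uses the common choice `max(C_j, T_m) + 1`, which satisfies both "at least the current C_j" and "greater than T_m" while also counting the receipt itself as an event.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Process:
    """Node i with a Lamport logical clock C_i."""
    pid: int        # P_i, used only to break timestamp ties
    clock: int = 0  # C_i

    def local_event(self) -> int:
        # Increment C_i between any two successive events.
        self.clock += 1
        return self.clock

    def send(self) -> int:
        # Sending is an event; the returned value is the timestamp T_m = C_i(a).
        return self.local_event()

    def receive(self, t_m: int) -> int:
        # Make C_j >= its current value and > T_m; the receipt is itself an event.
        self.clock = max(self.clock, t_m) + 1
        return self.clock


def total_order_key(timestamp: int, pid: int) -> tuple[int, int]:
    # a => b iff C_i(a) < C_j(b), or the timestamps tie and P_i < P_j.
    return (timestamp, pid)


p1, p2 = Process(pid=1), Process(pid=2)
a = p1.local_event()    # C_1(a) = 1
t_m = p1.send()         # C_1(send) = 2, carried in the message
b = p2.local_event()    # C_2(b) = 1; b is concurrent with a
c = p2.receive(t_m)     # C_2(c) = max(1, 2) + 1 = 3
events = [("c", c, 2), ("b", b, 2), ("a", a, 1), ("send", t_m, 1)]
ordered = [name for name, ts, pid in sorted(events, key=lambda e: total_order_key(e[1], e[2]))]
print(ordered)          # ['a', 'b', 'send', 'c']
```

Events a and b carry the same timestamp, so the process index puts a first; the send/receive pair is ordered by the clocks themselves (2 < 3), as the clock condition requires.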
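19 What if we want wall clock time?
- $C_i$ must run at correct rate
- $\exists \kappa \ll 1$ such that $|dC_i(t)/dt - 1| < \kappa$
- Synchronized
- $\exists$ small $\epsilon$ such that $\forall i, j: |C_i(t) - C_j(t)| < \epsilon$
- Assume transmission time between $\mu$ and $\mu + \xi$
- Algorithm: Upon receiving message $m$, set $C_j(t) = \max(C_j(t), T_m + \mu)$
- Theorem: Assume every $\tau$ seconds a message with unpredictable delay less than $\xi$ is sent over every arc. Then $\forall t \gtrsim t_0 + \tau d$, $\epsilon \approx d(2\kappa\tau + \xi)$, where $d$ is the diameter of the process graph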
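The receive rule and the theorem's skew bound can be written out directly. This sketch assumes clocks are plain floats in seconds and that `mu` is the minimum transmission delay; `on_receive` and `skew_bound` are hypothetical helper names, and the parameter values in the example are made up purely to show the arithmetic.

```python
def on_receive(c_j: float, t_m: float, mu: float) -> float:
    # Never move a clock backwards; advance it to at least T_m + mu, because the
    # message needed at least mu to arrive after it was stamped.
    return max(c_j, t_m + mu)


def skew_bound(d: int, kappa: float, tau: float, xi: float) -> float:
    # epsilon ~ d(2*kappa*tau + xi): d = graph diameter, kappa = drift-rate bound,
    # tau = period between messages on each arc, xi = unpredictable-delay bound.
    return d * (2 * kappa * tau + xi)


print(on_receive(c_j=101.0, t_m=100.2, mu=0.01))   # 101.0: local clock already ahead
print(on_receive(c_j=100.0, t_m=100.2, mu=0.01))   # about 100.21: clock jumps forward
# Hypothetical values: diameter 3, drift 1e-6, a message every 10 s, xi = 5 ms.
print(skew_bound(d=3, kappa=1e-6, tau=10.0, xi=0.005))  # about 0.01506 s
```

With these numbers the delay uncertainty term dominates: drift contributes only 2e-5 s per hop, while xi contributes 5e-3 s.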
20 Clock Synchronization Limits
- Best possible: About the delay uncertainty $\epsilon$
- Actually $\epsilon(1 - 1/n)$ for $n$ processors
- Synchronization with Faults
- Faulty clock
- Communication Failure
- Malicious processor
- Worst case: Can only synchronize if fewer than 1/3 of processors are faulty
- Better if clocks can be authenticated
21 Real Example: NTP
- I doubt you need to review this...
22 Process Synchronization
- Problem: Shared resources
- Model as sequential or parallel process
- Assumes global state!
- Alternative: Mutual exclusion when needed
- Coordinator approach
- Token Passing
- Timestamp
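The coordinator approach can be sketched as a single object that serializes access to one resource. This is a hypothetical in-process model: `request` and `release` stand in for REQUEST and RELEASE messages, a `True` return stands in for an immediate GRANT, and failure of the coordinator itself is not handled.

```python
from __future__ import annotations
from collections import deque


class Coordinator:
    """Grants one shared resource to one process at a time, in request order."""

    def __init__(self) -> None:
        self.holder: str | None = None          # process currently in its critical section
        self.waiting: deque[str] = deque()      # deferred requests, FIFO

    def request(self, pid: str) -> bool:
        """REQUEST: grant at once if the resource is free, else queue (requester blocks)."""
        if self.holder is None:
            self.holder = pid
            return True
        self.waiting.append(pid)
        return False

    def release(self, pid: str) -> str | None:
        """RELEASE: hand the resource to the oldest waiter; return who was granted it."""
        if pid != self.holder:
            raise ValueError(f"{pid} does not hold the resource")
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder


coord = Coordinator()
print(coord.request("P1"), coord.request("P2"), coord.request("P3"))  # True False False
print(coord.release("P1"), coord.release("P2"), coord.release("P3"))  # P2 P3 None
```

Each critical-section entry costs three messages (request, grant, release) and the FIFO queue gives fairness; the price is that the coordinator is a single point of failure and a potential bottleneck.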
23 Mutual Exclusion
- Requirements
- Does it guarantee mutual exclusion?
- Does it prevent starvation?
- Is it fair?
- Does it scale?
- Does it handle failures?
25 Mutual Exclusion: Colored Ticket Algorithm
- Goals
- Decentralized
- Fair
- Fault tolerant
- Space Efficient
- Idea: Numbered tickets
- Next number gets resource
- Problem: Unbounded space
- Solution: Reissue blocks
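The basic numbered-ticket idea, before the colored-ticket refinement, can be sketched as below. The `TicketDispenser` is hypothetical and deliberately centralized, shown only to make the space problem concrete: `next_ticket` grows without bound. The decentralization and the block-reissue mechanism from the slide are not modeled.

```python
class TicketDispenser:
    """Take a number; the holder of the lowest unserved number gets the resource."""

    def __init__(self) -> None:
        self.next_ticket = 0   # next number to hand out; grows without bound
        self.now_serving = 0   # number currently allowed to use the resource

    def take(self) -> int:
        ticket = self.next_ticket
        self.next_ticket += 1
        return ticket

    def may_enter(self, ticket: int) -> bool:
        return ticket == self.now_serving

    def done(self) -> None:
        # The current holder leaves; the next number in sequence is served.
        self.now_serving += 1


d = TicketDispenser()
t1, t2 = d.take(), d.take()
print(d.may_enter(t1), d.may_enter(t2))   # True False
d.done()
print(d.may_enter(t2))                    # True
```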
26 Multi-Resource Mutual Exclusion
- New problem: Deadlock
- Processes using all resources
- Each needs additional resource to proceed
- Dining Philosophers Problem
- Coordinated vs. truly distributed solutions
- Problems with deterministic solutions
- Probabilistic solution: Lehmann-Rabin
- Starvation / fairness properties
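The probabilistic approach can be sketched as a small simulation of the basic Lehmann-Rabin rule: flip a fair coin to pick which fork to try first, wait for it, then take the second fork only if it is free, otherwise put the first back and flip again. Assumptions: philosophers are always hungry, a seeded random scheduler stands in for the adversarial scheduler of the analysis, and the paper's refinement for starvation freedom is not modeled, so this illustrates breaking symmetry to avoid deadlock rather than the fairness property.

```python
from __future__ import annotations
import random


def simulate(n: int = 5, steps: int = 20_000, seed: int = 0) -> list[int]:
    """Run n always-hungry philosophers under a random scheduler; return meals each."""
    rng = random.Random(seed)
    forks: list[int | None] = [None] * n   # fork k: left of philosopher k, right of k-1
    phase = ["flip"] * n
    first, second = [0] * n, [0] * n
    meals = [0] * n
    for _ in range(steps):
        i = rng.randrange(n)               # scheduler picks who moves next
        left, right = i, (i + 1) % n
        if phase[i] == "flip":
            # 1. Flip a fair coin to decide which fork to try first.
            first[i], second[i] = (left, right) if rng.random() < 0.5 else (right, left)
            phase[i] = "first"
        elif phase[i] == "first":
            # 2. Wait (retry on a later turn) until the chosen fork is free, then take it.
            if forks[first[i]] is None:
                forks[first[i]] = i
                phase[i] = "second"
        elif phase[i] == "second":
            # 3. Take the other fork if free; otherwise put the first one down and re-flip.
            if forks[second[i]] is None:
                forks[second[i]] = i
                phase[i] = "eat"
            else:
                forks[first[i]] = None
                phase[i] = "flip"
        else:
            # 4. Eat holding both forks, put them down, and get hungry again.
            assert forks[left] == i and forks[right] == i
            meals[i] += 1
            forks[left] = forks[right] = None
            phase[i] = "flip"
    return meals


print(simulate())
```

Step 3 is what removes the deterministic deadlock: nobody holds one fork while waiting indefinitely for the other, and the coin flip makes it unlikely that all philosophers keep grabbing the same side in lockstep.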
27 Distributed Transactions
- ACID properties
- Issues
- Commit Protocols
- Fault Tolerance
- Why is this enough?
- Failure Models and Limitations
- Mechanisms
- Two-phase commit
- Three-phase commit
28 Two-Phase Commit (Lampson 76, Gray 79)
- Central coordinator initiates protocol
- Phase 1
- Coordinator asks if participants can commit
- Participants respond yes/no
- Phase 2
- If all votes yes, coordinator sends Commit; otherwise it sends Abort
- Participants respond when done
- Blocks on failure
- Participants must replace coordinator
- If participant and coordinator fail, wait for recovery
- While blocked, transaction must remain Isolated
- Prevents other transactions from completing
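The failure-free path of the two phases can be sketched as follows. The `Participant` class, its `can_commit` flag, and the single-letter state names (q initial, w voted yes and waiting, a aborted, c committed) are hypothetical; timeouts, crashes and the resulting blocking are deliberately not modeled.

```python
from __future__ import annotations


class Participant:
    def __init__(self, name: str, can_commit: bool) -> None:
        self.name = name
        self.can_commit = can_commit   # hypothetical stand-in for local validation
        self.state = "q"               # q: initial, w: voted yes, a: aborted, c: committed

    def vote(self) -> str:
        """Phase 1: answer the coordinator's 'can you commit?'."""
        if self.can_commit:
            self.state = "w"           # now uncertain: must wait for the decision
            return "yes"
        self.state = "a"               # a no vote lets this site abort unilaterally
        return "no"

    def decide(self, decision: str) -> str:
        """Phase 2: apply the coordinator's decision, then acknowledge."""
        self.state = "c" if decision == "commit" else "a"
        return "done"


def two_phase_commit(participants: list[Participant]) -> str:
    """Coordinator: commit only if every participant votes yes."""
    votes = [p.vote() for p in participants]                     # phase 1
    decision = "commit" if all(v == "yes" for v in votes) else "abort"
    acks = [p.decide(decision) for p in participants]            # phase 2
    assert all(a == "done" for a in acks)   # coordinator waits for every response
    return decision


sites = [Participant("A", True), Participant("B", True)]
print(two_phase_commit(sites), [p.state for p in sites])   # commit ['c', 'c']
sites = [Participant("A", True), Participant("B", False)]
print(two_phase_commit(sites), [p.state for p in sites])   # abort ['a', 'a']
```

A participant that has returned `yes` sits in state `w` until `decide` arrives; if the coordinator failed at that moment, the participant could neither commit nor abort on its own, which is the blocking problem listed above.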
29 Transaction Model
- Transaction Model
- Global Transaction State
- Reachable State Graph
- Local states potentially concurrent if a reachable global state contains both local states
- Concurrency set $C(s)$: all states potentially concurrent with $s$
- Sender set $S(s) = \{t : t \text{ sends } m \text{ and } s \text{ can receive } m\}$
- Failure Model
- Site failure assumed when expected message not received in time
- Independent Recovery
30 Problems with 2-PC
- Blocking on failure
- 3-PC as solution
- Theorems on recovery limits
- Independent recovery: Cannot tolerate two-site failure
- Non-independent recovery
- Anything short of total failure okay
- Recovery protocol for total failure
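31 3PC Assuming Timeout on Receipt of Message
- Transitions labeled received message / sent message; a1, a2 are abort states, c1, c2 commit states
- Coordinator
- q1 → w1: xact request / start xact
- w1 → a1: no / abort
- w1 → p1: yes / pre-commit
- p1 → c1: ack / commit
- Participant
- q2 → a2: start xact / no
- q2 → w2: start xact / yes
- w2 → a2: abort / -
- w2 → p2: pre-commit / ack
- p2 → c2: commit / -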
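The transaction-model definitions can be checked mechanically for a two-site case. The sketch assumes one coordinator (site 0) and one participant (site 1), reliable FIFO channels, and no failure or timeout transitions. It encodes the 2PC and 3PC state machines as transition lists, enumerates reachable global states by breadth-first search, computes each local state's concurrency set C(s), and tests the two conditions of Skeen's nonblocking theorem: no C(s) contains both an abort and a commit state, and no noncommittable state has a commit state in its C(s). Treating only the p and c states as committable follows the usual n-participant reading and is an assumption here; all function names are hypothetical.

```python
from __future__ import annotations
from collections import deque

# (site, from state, message consumed or None, to state, message sent or None)
TWO_PC = [
    (0, "q1", None, "w1", "start"), (0, "w1", "yes", "c1", "commit"),
    (0, "w1", "no", "a1", "abort"),
    (1, "q2", "start", "w2", "yes"), (1, "q2", "start", "a2", "no"),
    (1, "w2", "commit", "c2", None), (1, "w2", "abort", "a2", None),
]
THREE_PC = [
    (0, "q1", None, "w1", "start"), (0, "w1", "yes", "p1", "pre-commit"),
    (0, "w1", "no", "a1", "abort"), (0, "p1", "ack", "c1", "commit"),
    (1, "q2", "start", "w2", "yes"), (1, "q2", "start", "a2", "no"),
    (1, "w2", "pre-commit", "p2", "ack"), (1, "w2", "abort", "a2", None),
    (1, "p2", "commit", "c2", None),
]
COMMITTABLE = {"p1", "p2", "c1", "c2"}


def reachable(fsa):
    """BFS over global states: (local state per site, FIFO inbox per site)."""
    start = (("q1", "q2"), ((), ()))
    seen, todo = {start}, deque([start])
    while todo:
        states, inboxes = todo.popleft()
        for site, src, msg_in, dst, msg_out in fsa:
            inbox = inboxes[site]
            if states[site] != src or (msg_in is not None and inbox[:1] != (msg_in,)):
                continue                              # wrong state, or message not arrived
            new_states, new_inboxes = list(states), list(inboxes)
            new_states[site] = dst
            if msg_in is not None:
                new_inboxes[site] = inbox[1:]         # consume the message
            if msg_out is not None:
                new_inboxes[1 - site] += (msg_out,)   # send to the other site
            g = (tuple(new_states), tuple(new_inboxes))
            if g not in seen:
                seen.add(g)
                todo.append(g)
    return seen


def concurrency_sets(global_states):
    """C(s): local states of the other site that share a reachable global state with s."""
    c = {}
    for (s0, s1), _ in global_states:
        c.setdefault(s0, set()).add(s1)
        c.setdefault(s1, set()).add(s0)
    return c


def nonblocking(fsa) -> bool:
    for s, others in concurrency_sets(reachable(fsa)).items():
        has_commit = any(o.startswith("c") for o in others)
        if has_commit and any(o.startswith("a") for o in others):
            return False      # both outcomes look possible from s
        if has_commit and s not in COMMITTABLE:
            return False      # noncommittable state concurrent with a commit state
    return True


print(sorted(concurrency_sets(reachable(TWO_PC))["w2"]))   # ['c1', 'w1']
print(nonblocking(TWO_PC), nonblocking(THREE_PC))          # False True
```

In 2PC the participant's wait state w2 is concurrent with the coordinator's commit state c1, so a participant timing out in w2 cannot safely decide; the extra pre-commit round in 3PC separates them.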
32 Termination Protocol
- If participant times out in w2 or p2
- Elect new coordinator
- If coordinator alive, would have committed/aborted
- New coordinator requests state of all processes. Termination rules:
- If any aborted, broadcast abort
- If any committed, broadcast commit
- If all w2, broadcast abort
- If any p2, send pre-commit and enter state p1
- Complete failure protocol
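The termination rules can be written as one decision function for the newly elected coordinator. The encoding is hypothetical: each collected local state is reduced to a letter (q, w, p, a, c for initial, wait, pre-commit, aborted, committed). The rules above cover "any aborted", "any committed", "all w2" and "any p2"; treating any other all-noncommittable mix (for example, some sites still in q) as abort is an added assumption, safe in 3PC because no site reaches p until every site has voted yes.

```python
def termination_decision(states: list) -> str:
    """What the new coordinator broadcasts, given the reported local states."""
    if "a" in states:
        return "abort"            # someone already aborted
    if "c" in states:
        return "commit"           # someone already committed
    if "p" in states:
        return "pre-commit"       # move everyone to p, then finish with commit
    return "abort"                # all in w (or earlier), so no site can have committed


print(termination_decision(["w", "w", "w"]))   # abort
print(termination_decision(["w", "p", "w"]))   # pre-commit
print(termination_decision(["p", "c"]))        # commit
```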
33 Test Basics
- Mechanics: Open book/notes
- No electronic aids
- Two questions
- Each multi-part
- Will include scoring suggestions
- Underlying question: Do you understand the material?
- No need to regurgitate the best answer in the literature
- Reasonable self-designed solution fine
- Key: Do you really understand your answer?
- Can you build CORRECT distributed systems?
34 Sample Question: Clock Synchronization
- Develop a synchronization protocol for a four-processor system with fully connected processors
- Linear envelope of real time
- Bounded difference between clocks on correct processors
- Time set to 0 when the protocol begins (but not synchronized)
- Assume
- Clocks don't drift
- Messages take between time 0 and $\epsilon$
- At most one faulty processor
- No authentication
- Discuss the correctness of your algorithm, including the types of faults handled
- Scoring
- Protocol: Up to five points
- Argument for correctness: 2 points
- Requires believable proof sketch for full 2 points
- Faults supported / not supported: 1-3 points
- 3 points requires proof sketch that it handles supported faults and examples showing failure with unsupported fault types