EEC 688/788 Secure and Dependable Computing

EEC 688/788 Secure and Dependable Computing Lecture 7 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University wenbing_at_ieee.org – PowerPoint PPT presentation

Slides: 27

Provided by: Wenb3

Learn more at: https://academic.csuohio.edu

Transcript and Presenter's Notes

1
EEC 688/788 Secure and Dependable Computing

Lecture 7
Wenbing Zhao
Department of Electrical and Computer Engineering
Cleveland State University
wenbing_at_ieee.org

2
Outline

Checkpointing and logging
Checkpoint-based protocols
Uncoordinated checkpointing
Coordinated checkpointing
Logging-based protocols
Pessimistic logging
Optimistic logging
Causal logging

3
Chandy and Lamport Distributed Snapshot Protocol

CL snapshot protocol is a nonblocking protocol
TS checkpointing protocol is blocking
CL protocol is more desirable for applications
that do not wish to suspend normal operation
However, CL protocol is only concerned with how
to obtain a consistent global checkpoint
CL protocol: no coordinator; any node may
initiate a global checkpoint
Data structures
Marker message: equivalent to the CHECKPOINT
message
Marker certificate: keeps track of whether a
marker has been received from every incoming
channel
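
The marker handling described on this slide can be sketched roughly as follows. This is a minimal illustration, not the protocol as specified in the lecture: the class name SnapshotNode, its fields, and the use of an integer as the process state are all assumptions made for the example, and sending markers on outgoing channels is left to the caller.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SnapshotNode:
    """One process in a Chandy-Lamport style snapshot (hypothetical sketch)."""
    name: str
    incoming: list                       # names of the incoming channels
    state: int = 0                       # stand-in for the application state
    checkpoint: Optional[int] = None     # state recorded for the snapshot
    marker_cert: set = field(default_factory=set)      # channels whose marker arrived
    channel_state: dict = field(default_factory=dict)  # channel -> logged msgs

    def take_checkpoint(self):
        """Record the local state and start logging every incoming channel."""
        self.checkpoint = self.state
        self.channel_state = {ch: [] for ch in self.incoming}
        # The caller is expected to send a marker on every outgoing channel.

    def on_marker(self, channel):
        """Handle a marker; return True once a marker came from every channel."""
        if self.checkpoint is None:
            self.take_checkpoint()       # the first marker triggers the checkpoint
        self.marker_cert.add(channel)    # stop logging this channel
        return self.marker_cert == set(self.incoming)

    def on_message(self, channel, value):
        """Normal operation is never suspended; in-flight msgs are also logged."""
        recording = self.checkpoint is not None and channel not in self.marker_cert
        if recording:
            self.channel_state[channel].append(value)
        self.state += value


# P0 initiates, then receives a message from P1 before P1's marker arrives.
p0 = SnapshotNode("P0", incoming=["P1", "P2"])
p0.take_checkpoint()
p0.on_message("P1", 5)          # logged as state of the P1-to-P0 channel
done_after_p1 = p0.on_marker("P1")
done_after_p2 = p0.on_marker("P2")
```

Note that the checkpoint is taken first and the channel state is captured afterwards, while the process keeps executing messages.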

4
CL Distributed Snapshot Protocol
5
Example

P0 channel state: m0 (P1-to-P0 channel)
P1 channel state: m1 (P2-to-P1 channel)
P2 channel state: empty

6
Comparison of TS and CL Protocols

Similarities
Both rely on control msgs to coordinate
checkpointing
Both capture channel state in virtually the same
way
Start logging channel state upon receiving the
1st checkpoint msg from another channel
Stop logging a channel's state after receiving
the checkpoint msg on that incoming channel
Communication overhead is similar

7
Comparison of TS and CL Protocols

Differences: strategies in producing a global
checkpoint
TS protocol suspends normal operation upon 1st
checkpoint msg, while CL does not
TS protocol captures channel state prior to
taking a checkpoint, while CL captures channel
state after taking a checkpoint
TS protocol is more complete and robust than CL
Has a fault handling mechanism

8
Log Based Protocols

Work might be lost upon recovery using
checkpoint-based protocols
By logging messages, we may be able to recover
the system to where it was prior to the failure
System model: the execution of a process is
modeled as a set of consecutive state intervals
Each interval is initiated by a nondeterministic
event or the initial state
We assume the only type of nondeterministic event
is the receiving of a message

9
Log Based Protocols

In practice, logging is always used together with
checkpointing
Limits the recovery time: start with the latest
checkpoint instead of from the initial state
Limits the size of the log: after taking a
checkpoint, previously logged events can be
purged
Logging protocol types
Pessimistic logging: msgs are logged prior to
execution
Optimistic logging: msgs are logged
asynchronously
Causal logging: nondeterministic events that are
not yet logged (to stable storage) are
piggybacked with each msg sent
For optimistic and causal logging, dependency of
processes has to be tracked => more complexity,
longer recovery time

10
Pessimistic Logging

Synchronously log every incoming message to
stable storage prior to execution
Each process periodically checkpoints its state:
no need for coordination
Recovery: a process restores its state using the
last checkpoint and replays all logged incoming
msgs
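
A minimal sketch of this log-then-execute discipline is shown below. The class PessimisticProcess, the JSON log format, and the integer state are assumptions made for illustration. The checkpoint is kept in memory only to keep the example short; a real process would write it to stable storage as well.

```python
import json
import os
import tempfile


class PessimisticProcess:
    """Logs every incoming message to a file before executing it (sketch)."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.state = 0
        self.logged = 0                # number of msgs written to the log
        self.checkpoint = (0, 0)       # (state, number of logged msgs covered)

    def deliver(self, msg):
        # Synchronous logging: the message reaches stable storage first.
        with open(self.log_path, "a") as f:
            f.write(json.dumps(msg) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.logged += 1
        self.execute(msg)

    def execute(self, msg):
        self.state += msg["value"]     # stand-in for application logic

    def take_checkpoint(self):
        # Uncoordinated: no other process is involved.
        self.checkpoint = (self.state, self.logged)

    def recover(self):
        # Restore the last checkpoint, then replay the msgs logged after it.
        self.state, covered = self.checkpoint
        with open(self.log_path) as f:
            msgs = [json.loads(line) for line in f]
        for msg in msgs[covered:]:
            self.execute(msg)
        self.logged = len(msgs)


log_path = os.path.join(tempfile.mkdtemp(), "p.log")
p = PessimisticProcess(log_path)
p.deliver({"value": 1})
p.take_checkpoint()
p.deliver({"value": 2})
p.deliver({"value": 3})
state_before_crash = p.state
p.state = None                         # simulate losing the volatile state
p.recover()
```

Recovery here touches only the process's own checkpoint and log, which is why no coordination with other processes is needed.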

11
Pessimistic Logging Example

Pessimistic logging can cope with concurrent
failures and the recovery of two or more processes

12
Benefits of Pessimistic Logging

Processes do not need to track their dependencies
Logging mechanism is easy to implement and less
error prone
Output commit is automatically ensured
No need to carry out coordinated global
checkpointing
By replaying the logged msgs, a process can
always bring itself to be consistent with other
processes
Recovery can be done completely locally
Only impact on other processes: duplicate msgs
(can be discarded)

13
Pessimistic Logging Discussion

Reconnection
A process must be able to cope with temporary
connection failures and be ready to accept
reconnections from other processes
Application logic should be made independent of
transport-level events: event-based or
document-based computing paradigm
Message duplicate detection
Messages may be replayed during recovery =>
duplicate messages
Transport-level duplicate detection is irrelevant.
Must add a mechanism in application-level
protocols, e.g., WS-ReliableMessaging
Atomic message receiving and logging
A process may fail right after receiving a
message, before it has a chance to log it to
stable storage
Need an application-level reliable messaging
mechanism

14
Application-Level Reliable Messaging

The sender buffers each message sent until it
receives an application-level ack
Benefits of application-level reliable messaging
Atomic message receiving and logging
Facilitates distributed system recovery from
process failures: enables reconnection
Enables optimization: a received message can be
executed immediately and the logging can be
deferred until another message is to be sent
Logging and msg execution can be done
concurrently
If a process sends out a message after receiving
several msgs, logging of those msgs can be batched
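
The buffering and deferred logging described above might look like the following sketch. ReliableSender, BatchingReceiver, and the list standing in for stable storage are hypothetical names introduced for the example; tying the application-level ack to the moment the message is logged is an assumption about how the atomicity is obtained.

```python
class ReliableSender:
    """Keeps every message until an application-level ack arrives (sketch)."""

    def __init__(self):
        self.next_seq = 0
        self.unacked = {}              # seq -> payload, still buffered

    def send(self, payload):
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = payload
        return (seq, payload)

    def on_ack(self, seq):
        self.unacked.pop(seq, None)

    def resend_pending(self):
        # Used after a reconnection: everything not yet acked is sent again.
        return sorted(self.unacked.items())


class BatchingReceiver:
    """Executes immediately; logs (and acks) in a batch before it sends."""

    def __init__(self):
        self.stable_log = []           # stand-in for stable storage
        self.pending = []              # executed but not yet logged
        self.executed = []

    def receive(self, seq, payload):
        self.executed.append(payload)  # execution is not delayed by logging
        self.pending.append((seq, payload))

    def before_send(self):
        # Flush the whole batch, then report which msgs may be acked.
        self.stable_log.extend(self.pending)
        acks = [seq for seq, _ in self.pending]
        self.pending.clear()
        return acks


sender, receiver = ReliableSender(), BatchingReceiver()
for text in ("a", "b"):
    receiver.receive(*sender.send(text))
pending_before_ack = sender.resend_pending()
acks = receiver.before_send()          # the receiver is about to send a msg
for seq in acks:
    sender.on_ack(seq)
```

If the receiver crashes before before_send runs, the sender still holds both messages and can resend them, so a message is never acknowledged without having been logged.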

15
Sender Based Message Logging

Basic idea
Log the message at the sending side in volatile
memory
Should the receiving process fail, it could
obtain the messages logged at the sending
processes for recovery.
To avoid restarting from the initial state after
a failure, a process can periodically checkpoint
its local state and write the message log in
stable storage (as part of the checkpoint)
asynchronously
Tradeoff
Relative ordering of messages must be explicitly
supplied by the receiver to the sender (quite
counter-intuitive!)
The receiver must wait for an explicit ack for
the ordering message before it sends any msgs to
other processes (however, it can execute the
message received immediately without delay)
The mechanism is to prevent the formation of
orphan messages and orphan processes

16
Orphan Message and Orphan Process

An orphan message is one that was sent by a
process prior to a failure, but cannot be
guaranteed to be regenerated upon the recovery of
the process
An orphan process is a process that receives an
orphan message
If a process sends out a message and subsequently
fails before the determinants of the messages it
has received are properly logged, the message
sent becomes an orphan message

17
Sender Based Message Logging Protocol: Data
Structures

A counter, seq_counter, used to assign a sequence
number (using the current value of the counter)
to each outgoing message
Needed for duplicate detection
A table for duplicate detection
Each entry has the form <process_id, max_seq>,
where max_seq is the maximum sequence number that
the current process has received from a process
with an identifier of process_id.
A message is deemed a duplicate if it carries
a sequence number lower than or equal to max_seq
for the corresponding process
Another counter, rsn_counter, used to record the
receiving/execution order of an incoming message
The counter is initialized to 0 and incremented
by one for each message received
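
The two counters and the duplicate-detection table can be sketched as below. The class name ReceiverState and its methods are hypothetical. The sketch also assumes that sequence numbers start at 0, that the current counter value is assigned before the counter is incremented, and that each sender's messages arrive in sequence order.

```python
class ReceiverState:
    """Counters and duplicate-detection table of one process (sketch)."""

    def __init__(self):
        self.seq_counter = 0           # numbers the outgoing messages
        self.rsn_counter = 0           # records the receiving/execution order
        self.max_seq = {}              # process_id -> max seq received so far

    def next_seq(self):
        seq = self.seq_counter
        self.seq_counter += 1
        return seq

    def is_duplicate(self, process_id, seq):
        # Duplicate if seq is lower than or equal to max_seq for that process.
        return process_id in self.max_seq and seq <= self.max_seq[process_id]

    def accept(self, process_id, seq):
        """Return the rsn assigned to a new message, or None for a duplicate."""
        if self.is_duplicate(process_id, seq):
            return None
        self.max_seq[process_id] = seq
        rsn = self.rsn_counter
        self.rsn_counter += 1
        return rsn


state = ReceiverState()
first = state.accept("P1", 0)
second = state.accept("P1", 1)
replayed = state.accept("P1", 1)       # same seq again: a duplicate
other = state.accept("P2", 0)
```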

18
Sender Based Message Logging Protocol: Data
Structures

A message log (in volatile memory) for msgs sent
by the process. In addition to the msg sent, the
following metadata is also recorded
Destination process id, receiver_id
Sending sequence number, seq
Receiving sequence number, rsn.
A history list for the messages received since
the last checkpoint. It is used to find the
receiving order number for a duplicate msg.
Upon receiving a duplicate message, the process
should supply the corresponding (original)
receiving order number so that the sender of the
message can log such ordering information
properly
Each entry in the list has the following
information
Sending process id, sender_id
Sending sequence number, seq
Receiving sequence number, rsn (assigned by the
current process).

19
What Should be Checkpointed?

All the data structures described above except
the history list must be checkpointed together
with the process state
The two counters, one for assigning the message
sequence number and the other for assigning the
message receiving order, are needed so that the
process can continue doing so upon recovery using
the checkpoint
The table for duplicate detection is needed for a
similar reason.
Why must the message log be checkpointed?
The log is needed for the receiving processes to
recover from a failure, and hence, cannot be
garbage collected upon a checkpointing operation
Additional mechanism is necessary to ensure that
the message log does not grow indefinitely

20
Sender Based Message Logging Protocol: Message
Types

REGULAR: It is used for sending regular messages
generated by the application process, and it has
the form <REGULAR, seq, rsn, m>
ORDER: It is used by the receiving process to
notify the sending process of the receiving order
of the message. An order message carries the form
<ORDER, m, rsn>,
where m is the message identifier consisting of a
tuple <sender_id, receiver_id, seq>
ACK: It is used by the sending process (of a
regular message) to acknowledge the receipt of
the order message. It assumes the form <ACK, m>

21
Sender Based Message Logging Protocol: Normal
Operation

The protocol operates in three steps for each
message
A regular message, <REGULAR, seq, rsn, m>, is sent
from one process, e.g., Pi, to another process,
e.g., Pj.
Process Pj determines the receiving/execution
order, rsn, of the regular message and sends
this determinant information to Pi in an order
message <ORDER, m, rsn>.
Process Pj waits until it has received the
corresponding acknowledgment message, <ACK, m>,
before it sends out any regular message.
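
The three steps can be traced with the small simulation below. It is a sketch under simplifying assumptions: the Process class, the outbox used to hold back regular messages, and the tuple layout of the messages are all hypothetical, and the REGULAR tuple carries the message identifier and payload rather than the exact <REGULAR, seq, rsn, m> layout.

```python
from collections import namedtuple

MsgId = namedtuple("MsgId", "sender_id receiver_id seq")


class Process:
    """Normal-operation side of sender-based message logging (sketch)."""

    def __init__(self, pid):
        self.pid = pid
        self.seq_counter = 0
        self.rsn_counter = 0
        self.log = {}                  # MsgId -> {"payload": ..., "rsn": ...}
        self.awaiting_ack = set()      # ORDER msgs sent but not yet acked
        self.outbox = []               # regular msgs held back meanwhile

    def send_regular(self, receiver_id, payload):
        """Step 1: log the msg in volatile memory, then send it if allowed."""
        m = MsgId(self.pid, receiver_id, self.seq_counter)
        self.seq_counter += 1
        self.log[m] = {"payload": payload, "rsn": None}
        msg = ("REGULAR", m, payload)
        if self.awaiting_ack:
            self.outbox.append(msg)    # must wait for the outstanding ACK
            return None
        return msg

    def on_regular(self, m, payload):
        """Step 2: assign the receiving order and report it to the sender."""
        rsn = self.rsn_counter
        self.rsn_counter += 1
        self.awaiting_ack.add(m)
        return ("ORDER", m, rsn)

    def on_order(self, m, rsn):
        """Sender side: record the determinant, then acknowledge it."""
        self.log[m]["rsn"] = rsn
        return ("ACK", m)

    def on_ack(self, m):
        """Step 3: once every ORDER is acked, release the held-back msgs."""
        self.awaiting_ack.discard(m)
        if self.awaiting_ack:
            return []
        released, self.outbox = self.outbox, []
        return released


pi, pj = Process("Pi"), Process("Pj")
_, m, payload = pi.send_regular("Pj", "hello")
order = pj.on_regular(m, payload)
held = pj.send_regular("Pk", "reply")  # blocked: the ACK has not arrived yet
ack = pi.on_order(order[1], order[2])
released = pj.on_ack(ack[1])
```

Pj may execute "hello" as soon as it arrives; only its outgoing regular messages are delayed until Pi has logged the rsn.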

23
Sender Based Message Logging Protocol: Recovery
Mechanism

On recovering from a failure, a process first
restores its state using the latest local
checkpoint, and then it must broadcast a request
to all other processes in the system to
retransmit all their logged messages that were
sent to the process
The recovering process retransmits the regular
messages or the ack messages based on the
following rule
If the entry in the log for a message contains no
rsn value, then a regular message is
retransmitted because the intended receiving
process might not have received this message.
If the entry in the log for a message contains a
valid rsn value, then an ack message is sent so
that the receiving process can resume sending
regular messages
When a process receives a regular message, it
always sends a corresponding order message in
response
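
The retransmission rule above reduces to a single test on each log entry, as in this sketch. The function name retransmissions and the dictionary layout of the log (a message identifier mapped to its payload and rsn, with None meaning no rsn was recorded) are assumptions made for the example.

```python
def retransmissions(message_log):
    """Decide what to resend for each entry of a restored message log.

    message_log maps a message identifier (sender_id, receiver_id, seq)
    to a dict with a "payload" and an "rsn" (None if never recorded).
    """
    out = []
    for m, entry in message_log.items():
        if entry["rsn"] is None:
            # The receiver may never have seen this msg: resend it.
            out.append(("REGULAR", m, entry["payload"]))
        else:
            # The order is already logged: the receiver only needs the ack.
            out.append(("ACK", m))
    return out


restored_log = {
    ("P0", "P1", 0): {"payload": "x", "rsn": 3},
    ("P0", "P1", 1): {"payload": "y", "rsn": None},
}
to_send = retransmissions(restored_log)
```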

24
Actions upon Receiving a Regular Message

A process always sends a corresponding order msg
in response
Three scenarios with recovery
The msg is not a duplicate: the current rsn
counter value is assigned to the msg and the
order msg is sent. The process must wait until it
receives the ack msg before it can send any
regular msg
The msg is a duplicate, and the corresponding rsn
is found in the history list: actions are
identical to the above except that the rsn is not
newly assigned
The msg is a duplicate, and no rsn is found in
the history list: the process must have
checkpointed its state after receiving the msg
and the msg is no longer needed for recovery.
Hence, the order msg includes a special constant
indicating so. The sender can then purge the msg
from its log
The recovering process may receive two types of
retransmitted regular messages
Those with a valid rsn value: the rsn must
already be part of the checkpoint. It executes
the msg according to the order
Those without: it can assign the msg any order
25
Limitations of Sender Based Msg Logging Protocol

Won't work in the presence of 2 or more
concurrent failures
Determinant for some regular msgs (i.e., rsn)
might be lost => orphan processes and cascading
rollbacks

P2 may become an orphan process if P0 and P1 both
crash: it has received mt, which no one has sent
26
Truncating Sender's Message Log

Once a process completes a local checkpoint, it
broadcasts a message containing the highest rsn
value for the messages that it has executed prior
to the checkpoint.
All messages sent by other processes to this
process that were assigned an rsn value that is
smaller than or equal to this rsn value can now
be purged from their message logs (including
those in stable storage as part of a checkpoint)
Alternatively, this highest rsn value can be
piggybacked with each message (regular or control
messages) sent to another process to enable
asynchronous purging of the logged messages that
are no longer needed
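
Purging on the sender side can be sketched as a filter over the message log. The function name purge_log and the log layout (a message identifier (sender_id, receiver_id, seq) mapped to a dict holding the rsn) are assumptions made for the example.

```python
def purge_log(message_log, receiver_id, highest_rsn):
    """Drop entries sent to receiver_id whose rsn is <= highest_rsn.

    Entries without an rsn are kept, since their order was never confirmed.
    """
    kept = {}
    for m, entry in message_log.items():
        _, dest, _ = m
        covered = (
            dest == receiver_id
            and entry["rsn"] is not None
            and entry["rsn"] <= highest_rsn
        )
        if not covered:
            kept[m] = entry
    return kept


log = {
    ("P0", "P1", 0): {"rsn": 4},
    ("P0", "P1", 1): {"rsn": 7},
    ("P0", "P1", 2): {"rsn": None},
    ("P0", "P2", 0): {"rsn": 2},
}
# P1 announces that its checkpoint covers every msg up to rsn 5.
log_after = purge_log(log, "P1", 5)
```

The same function could be called whenever a piggybacked highest rsn arrives, which gives the asynchronous variant described above.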
