Title: Building Undo for Operators: An Undoable Email Store
1. Building Undo for Operators: An Undoable E-mail Store
- Aaron Brown
- ROC Research Group, University of California, Berkeley
- Winter 2003 ROC Research Retreat, 14 January 2003
2. Outline
- Recap of Three-Rs Undo model
- Implementing Undo for e-mail
- Analysis of e-mail undo prototype
- Conclusions and future directions
3. Recap: Undo for Operators
- Dependability is achieved at the interface of the system and its operator
- Undo goal: support the operator
- create a forgiving operations environment
- tolerate human error, allow trial-and-error experiments
- provide last-resort recovery from unknown damage
- retroactive repair of software bugs, broken upgrades, unknown operator changes, external attacks
- be easily comprehensible by operator and developer
- preserve user data while refreshing system state
4. Recap: The Three-Rs Undo Model
- Provide Undo as virtual time travel
- Rewind: roll back all system state, physically
- Repair: make changes to past timeline to avert original disaster
- Replay: roll state forward logically, merging original timeline with effects of repairs
- Key properties of 3Rs Undo
- recovery from problems at any system layer
- recovery from unanticipated problems
- no assumptions about correct application behavior
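
The Rewind, Repair, Replay cycle can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions, not the prototype's code: the class `ThreeRsSketch`, its methods, and the idea of modeling the "software" part of system state as a mail filter are all hypothetical. Rewind restores a physical copy of all state (including the faulty filter), Repair changes the rewound system, and Replay re-applies the logged user-level deliveries against the repaired system.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Toy model of the three Rs; every name here is illustrative. */
class ThreeRsSketch {
    private List<String> mailbox = new ArrayList<>();
    private Predicate<String> filter;                  // stands in for software/config state
    private List<String> snapshotMailbox = new ArrayList<>();
    private Predicate<String> snapshotFilter;
    private final List<String> deliveryLog = new ArrayList<>(); // logical history since snapshot

    ThreeRsSketch(Predicate<String> initialFilter) {
        this.filter = initialFilter;
        this.snapshotFilter = initialFilter;
    }

    /** Physical checkpoint of all state; the logical log restarts here. */
    void takeSnapshot() {
        snapshotMailbox = new ArrayList<>(mailbox);
        snapshotFilter = filter;
        deliveryLog.clear();
    }

    /** Normal operation: log the user-level action, then apply it. */
    void deliver(String message) {
        deliveryLog.add(message);
        apply(message);
    }

    private void apply(String message) {
        if (filter.test(message)) {
            mailbox.add(message);
        }
    }

    /** Rewind: roll back all state physically, including the faulty filter. */
    void rewind() {
        mailbox = new ArrayList<>(snapshotMailbox);
        filter = snapshotFilter;
    }

    /** Repair: change the rewound system to avert the original problem. */
    void repair(Predicate<String> fixedFilter) {
        filter = fixedFilter;
    }

    /** Replay: re-apply the logged actions logically against the repaired system. */
    void replay() {
        for (String message : deliveryLog) {
            apply(message);
        }
    }

    List<String> mailbox() {
        return new ArrayList<>(mailbox);
    }
}
```

In this model, a filter that wrongly drops mail loses messages on the original timeline; after `rewind()`, `repair(m -> true)`, and `replay()`, the dropped messages reappear because the log records what users did rather than what the faulty system stored.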
6. Building an E-mail Undo Prototype
- Target: an undoable e-mail store service
- a leaf node in the Internet e-mail network
- accessed via IMAP and SMTP
- Built around existing e-mail store service
- e.g., sendmail + imapd, iPlanet, Exchange
- Extensible to other apps
- architected around a reusable undo core with e-mail-specific extensions
- Written in Java
7. Undo System Architecture
[Architecture diagram. Components: User, Control UI, E-mail Proxy, Verbs, Undo Manager, Timeline Log, E-mail Service, Time-travel Storage (includes user state, application, OS). Connection labels: IMAP, SMTP on both sides of the proxy; control.]
8. Architecture Design Points
- Proxy-based design targeted at services
- Application-neutral core
- undo manager, timeline log, time-travel storage, UI
- contains all undo-cycle logic
- E-mail-specific semantics encapsulated
- proxy encodes e-mail interactions into verbs
- verbs define e-mail semantics
- model of preserved state
- model of acceptable external (observed) consistency
9. Verbs
- Key construct in undo architecture
- represent end-user state changes, state exposure
- building block of timeline and history
- used to detect and manage inconsistencies
- Essential for reasoning about undo
- verbs define exactly what state is preserved during an undo cycle
- provide framework for defining application's model of acceptable external inconsistency
10. Verbs and the Undo Cycle
11. Comparison to Optimistic Replication
- Insight: undo can be recast as replication, not in space, but in time
- two virtual replicas diverge at the Rewind point
- verb history defines an update log to one replica
- repairs define updates to the other replica
- Replay involves reconciling the two update streams, detecting and handling inconsistencies
- Verb-based design is heavily influenced by replica systems
- verbs map to replica system actions or updates
12. Comparison to Optimistic Replication (2)
- But Undo has key differences
- reconciliation between a logical log (verb history) and a physical state (repaired system)
- so log-merging schemes (e.g., IceCube) don't apply
- not all inconsistencies are bad
- no need to check for inconsistency until externalization
- no complex, error-prone guards on verb execution
- inconsistencies can be handled after-the-fact
- post-conditions on externalized data are sufficient
- fewer assumptions of application correctness
- if necessary, can rewind undesired inconsistency-producing verb
13. E-mail example: original timeline
[Timeline figure. Labels: system boundary, system state, verbs, history log. Axis: time.]
14. E-mail example: replay timeline
[Timeline figure. Labels: system boundary, system state, verbs, history log. Axis: time. One item is marked with an X.]
15. Verbs and External Inconsistency
- Detecting inconsistency
- verbs that externalize state record a signature of externalized data and define a comparison function
- comparison function defines the external consistency model
- Handling inconsistency
- verbs define compensation routines, invoked when inconsistencies are found
- later non-commuting verbs are squashed to prevent data loss
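
The detect-and-handle steps above can be sketched as a replay loop. This is a minimal illustration, not the prototype's undo manager: `LoggedVerb`, `ReplayChecker`, and the rule that two verbs fail to commute exactly when they target the same folder are assumptions made for the example. Each externalizing verb carries the signature recorded on the original timeline plus its own comparison function; on a mismatch the loop records a compensation note and squashes later verbs that do not commute with the inconsistent one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.function.Function;

/** Hypothetical log entry for one verb. */
class LoggedVerb {
    final String name;
    final String folder;
    final String originalSignature;               // null if the verb externalized nothing
    final BiPredicate<String, String> consistent; // the verb's comparison function
    boolean squashed = false;

    LoggedVerb(String name, String folder, String originalSignature,
               BiPredicate<String, String> consistent) {
        this.name = name;
        this.folder = folder;
        this.originalSignature = originalSignature;
        this.consistent = consistent;
    }
}

class ReplayChecker {
    /**
     * Replays the history in order. reexecute stands in for running the verb
     * against the repaired system and returns the newly externalized signature.
     * Returns one compensation note per detected inconsistency.
     */
    static List<String> replay(List<LoggedVerb> history,
                               Function<LoggedVerb, String> reexecute) {
        List<String> compensations = new ArrayList<>();
        for (int i = 0; i < history.size(); i++) {
            LoggedVerb verb = history.get(i);
            if (verb.squashed) {
                continue; // squashed verbs are not re-executed
            }
            String replayed = reexecute.apply(verb);
            if (verb.originalSignature == null) {
                continue; // nothing was externalized, so nothing to compare
            }
            if (!verb.consistent.test(verb.originalSignature, replayed)) {
                compensations.add("inconsistency in " + verb.name + " on " + verb.folder);
                // Assumed commutativity rule: same folder means non-commuting.
                for (int j = i + 1; j < history.size(); j++) {
                    if (history.get(j).folder.equals(verb.folder)) {
                        history.get(j).squashed = true;
                    }
                }
            }
        }
        return compensations;
    }
}
```

Because the check runs only on verbs that externalized data, state that changed but was never shown to a user produces no compensation.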
16. Verbs for E-mail
- SMTP and IMAP protocols mapped into 13 verbs
- SMTP: Deliver
- IMAP: Append, Fetch, Store, Copy, List, Status, Select, Expunge, Close, Create, Rename, Delete
- set could easily be extended with user and subscription management functionality
- Each verb is a Java class implementing a generic, app-neutral Verb interface
17. Example Verb: STORE (set msg flags)
- Tag
- input: target folder, message IDs, flag value
- externalized output: resultant flags of messages
- Sequencing tests
- commutes with / independent of: SMTP verbs, all IMAP verbs except Fetch/Store/Close on same folder
- Consistency check
- new flags must match original flags for all messages in common; no original messages should be missing
- Inconsistency handler
- leave user a message explaining and listing changes
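
The STORE consistency check can be written as a small comparison function. The sketch below assumes the externalized output is represented as a map from a message identifier to its set of flags; that representation and the name `StoreConsistency` are illustrative choices, not taken from the prototype.

```java
import java.util.Map;
import java.util.Set;

class StoreConsistency {
    /**
     * Returns true when the replayed STORE result is acceptable:
     * every message seen originally is still present and has the same flags.
     * Messages that appear only in the replayed result are ignored.
     */
    static boolean isConsistent(Map<String, Set<String>> originalFlags,
                                Map<String, Set<String>> replayedFlags) {
        for (Map.Entry<String, Set<String>> entry : originalFlags.entrySet()) {
            Set<String> now = replayedFlags.get(entry.getKey());
            if (now == null) {
                return false; // an original message is missing on replay
            }
            if (!now.equals(entry.getValue())) {
                return false; // flags differ for a message in common
            }
        }
        return true;
    }
}
```

When the check fails, the inconsistency handler described above would leave the user a message listing the messages whose flags changed or that went missing.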
18. An External Consistency Model for E-mail
- Retrieval (IMAP)
- tracked state includes message bodies, key message headers (to/from/subject/cc), flags, folder lists, and execution status of state-altering commands
- inconsistency if objects are missing or altered on replay or commands fail; order, new objects ignored
- compensation via explanatory messages and creation of lost-and-found containers
- Delivery (SMTP)
- only possible inconsistency is in execution status
- one tricky case: originally-failed delivery succeeds
- delay bounce to provide window for undo-repair
19. Undo System Architecture
[Architecture diagram. Components: User, Control UI, E-mail Proxy, Verbs, Undo Manager, Timeline Log, E-mail Service, Time-travel Storage. Connection labels: IMAP, SMTP on both sides of the proxy; control.]
- Next up
- proxy
- undo manager
- time-travel storage layer
20. Proxying E-mail
- IMAP proxy loop is straightforward
- accept client connection and open server connection
- parse commands and instantiate corresponding verbs
- invoke undo manager to schedule, execute, log verb
- passing a handle to the client and server connections
- SMTP is more complicated
- failed SMTP deliveries must be logged for later retry
- proxy poses as a server that always accepts deliveries
- deliver verb is created after client has been ACK'd
- some finesse to avoid being an open relay
- if verb fails, proxy sends a bounce after a delay
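
The "parse commands and instantiate corresponding verbs" step of the IMAP loop might look like the following sketch. It relies only on the general shape of an IMAP client command line (a tag, then the command name, then arguments, with an optional `UID` prefix on some commands). The names `ImapVerbKind` and `ImapCommandParser` are hypothetical, and treating every other command (for example NOOP or LOGIN) as "no verb" is an assumption of this example rather than a statement about the prototype.

```java
import java.util.Locale;
import java.util.Optional;

/** The twelve IMAP verbs listed in the talk. */
enum ImapVerbKind {
    APPEND, FETCH, STORE, COPY, LIST, STATUS, SELECT, EXPUNGE, CLOSE, CREATE, RENAME, DELETE
}

class ImapCommandParser {
    /**
     * Maps one client command line, such as "a1 STORE 1:3 +FLAGS (\Seen)",
     * to a verb kind. Returns empty for commands with no corresponding verb.
     */
    static Optional<ImapVerbKind> verbFor(String line) {
        String[] parts = line.trim().split("\\s+", 4);
        if (parts.length < 2) {
            return Optional.empty(); // no command name after the tag
        }
        String command = parts[1].toUpperCase(Locale.ROOT);
        if (command.equals("UID") && parts.length >= 3) {
            // "UID FETCH", "UID STORE", "UID COPY": the real command follows the prefix
            command = parts[2].toUpperCase(Locale.ROOT);
        }
        try {
            return Optional.of(ImapVerbKind.valueOf(command));
        } catch (IllegalArgumentException notAVerb) {
            return Optional.empty();
        }
    }
}
```

A real proxy would go on to parse the arguments and construct the verb object before handing it to the undo manager; this sketch stops at classification.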
21. Undo Manager Implementation
- Timeline log
- BerkeleyDB recno database storing serialized verbs
- also stores control records to make undo manager state persistent and recoverable
- Verb scheduling
- all verb execution passes through undo manager
- scoreboard-like structure sequences verbs according to independence, commutativity, ordering properties
- External inconsistency management
- undo manager coordinates verb APIs to ensure that external inconsistencies are detected and handled
22. A Time-Travel Storage Layer
- Base: Network Appliance Filer with snapshots
- Java wrapper for Filer's management CLI
- provides direct control of snapshot create/restore
- periodically takes snapshots, aging out old ones
- hierarchical scheduling allows the 31 snapshots to span one month, with density inversely proportional to age
- Rewind restores the closest prior snapshot, then replays up to the exact rewind point
- Challenge: making snapshot restore undoable
- solution: implement rollback by copying old snapshot forward to present (expensive, but necessary)
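
The "closest prior snapshot, then replay" rule can be sketched with a sorted set of snapshot times. This is an illustration only: `SnapshotIndex` and its methods are hypothetical, times are plain `long` values in arbitrary units, and nothing here touches the Filer's actual management interface.

```java
import java.util.TreeSet;

class SnapshotIndex {
    private final TreeSet<Long> snapshotTimes = new TreeSet<>();

    void recordSnapshot(long time) {
        snapshotTimes.add(time);
    }

    /** Time of the latest snapshot taken at or before the rewind point. */
    long closestPrior(long rewindPoint) {
        Long time = snapshotTimes.floor(rewindPoint);
        if (time == null) {
            throw new IllegalArgumentException("no snapshot at or before " + rewindPoint);
        }
        return time;
    }

    /** Span of logged history that must be replayed after the restore. */
    long replayWindow(long rewindPoint) {
        return rewindPoint - closestPrior(rewindPoint);
    }
}
```

With snapshots at times 100, 200, and 300, a rewind to 250 would restore the snapshot from 200 and replay 50 time units of logged verbs. Sparser snapshots for older history, as in the hierarchical schedule, lengthen that replay window for older rewind points.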
24. Code complexity
- Entire system is about 23K lines of Java
- split evenly between app-neutral and e-mail-specific code
- bulk of e-mail code is in verbs
- Implementing and debugging the e-mail verbs and proxy took roughly two man-months
25. Measuring Overhead for Undo
- Workload
- modified SPECmail2000 e-mail benchmark
- simulates traffic seen by an ISP mail server
- modified to use IMAP instead of POP, all mail local
- configured with 5000 users
- about 56 SMTP connections/minute, 149 IMAP connections/minute
- Setup
- mail server: Linux, sendmail + UW imapd, 5000 users
- undo proxy: Win2k, Sun 1.4.1 JDK
- workload generator: Win2k, Sun 1.4.1 JDK
- storage: all mailboxes and logs on NetApp Filer
- mailboxes accessed via NFS, logs via CIFS
26. Overhead Results: Space and Time
- Space overhead
- 5GB/day for timeline log (1GB/day per 1000 users)
- 325KB per 1000 mail folder name translations
- per 120GB disk
- 7 weeks of timeline for 1000 ISP users
- or 350 million folder name translations
- Time overhead
- cumulative distribution plot of IMAP and SMTP session lengths
28. Next Steps
- Perform more analysis of prototype
- speed of rewind, replay, entire undo cycle
- Measure efficacy of undo with human experiments
- compare recovery time from pre-defined scenarios with and without undo
- based on survey of e-mail administrators
- Examine other applications to understand generality of undo architecture
- auctions, e-commerce, others?
29. Conclusions
- We can build a real Undo implementation
- for a real application
- with tolerable overhead
- with a reusable, application-independent core
- based on a verb model that enables reasoning about state and undo behavior
- Look forward to final analysis at next retreat!
30. Building Undo for Operators: An Undoable E-mail Store
- Aaron Brown
- ROC Research Group, University of California, Berkeley
- abrown_at_cs.berkeley.edu
- http://roc.cs.berkeley.edu/
31. Backup Slides
32. Summary: What's in a Verb?
- Encapsulation of a user-service interaction
- action specification
- procedure to execute verb via proxy
- tag: input parameters, signature of externalized data, verb execution status
- Commutativity/independence test functions
- Consistency management functions
- routine to check two tags for external inconsistency
- inconsistency handler
- squash routine
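
The list above maps naturally onto a Java interface. The talk says each verb implements a generic, app-neutral Verb interface but does not show it, so everything below is a guess at its shape: the method names, the `Tag` fields, the `ServiceHandle` stand-in for the proxy's server connection, and the example `ListFoldersVerb` are all hypothetical. The example verb's consistency check is plain signature equality, which is stricter than the e-mail consistency model described earlier.

```java
import java.util.Objects;

/** Hypothetical tag: what one execution of a verb recorded. */
final class Tag {
    final String inputs;           // input parameters
    final String outputSignature;  // signature of externalized data, or null
    final boolean succeeded;       // execution status

    Tag(String inputs, String outputSignature, boolean succeeded) {
        this.inputs = inputs;
        this.outputSignature = outputSignature;
        this.succeeded = succeeded;
    }
}

/** Hypothetical stand-in for the proxy's connection to the wrapped service. */
interface ServiceHandle {
    String send(String command);
}

/** One possible shape for an app-neutral verb interface. */
interface Verb {
    Tag execute(ServiceHandle service);                      // action specification
    boolean commutesWith(Verb other);                        // sequencing tests
    boolean isIndependentOf(Verb other);
    boolean isConsistent(Tag original, Tag replayed);        // consistency check
    String handleInconsistency(Tag original, Tag replayed);  // compensation
    void squash();                                           // cancel on replay
}

/** Illustrative verb: list folders, compared by exact signature. */
class ListFoldersVerb implements Verb {
    private boolean squashed = false;

    @Override public Tag execute(ServiceHandle service) {
        if (squashed) {
            return new Tag("LIST", null, false);
        }
        String reply = service.send("LIST \"\" \"*\"");
        return new Tag("LIST", reply, reply != null);
    }

    // Conservative assumption: only other read-only listings commute with this verb.
    @Override public boolean commutesWith(Verb other) {
        return other instanceof ListFoldersVerb;
    }

    @Override public boolean isIndependentOf(Verb other) {
        return other instanceof ListFoldersVerb;
    }

    @Override public boolean isConsistent(Tag original, Tag replayed) {
        return original.succeeded == replayed.succeeded
                && Objects.equals(original.outputSignature, replayed.outputSignature);
    }

    @Override public String handleInconsistency(Tag original, Tag replayed) {
        return "Folder list changed during undo: was [" + original.outputSignature
                + "], now [" + replayed.outputSignature + "]";
    }

    @Override public void squash() {
        squashed = true;
    }
}
```

Keeping the comparison and compensation logic inside each verb is what lets the core stay application-neutral: the undo manager only needs to call these methods in the right order.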
33. Verbs and the Undo Timeline
- Recorded history of verbs must be consistent with actual verb execution
- but logging at proxy may not match execution order
- Solution: partial serialization of verbs
- undo manager tracks arriving and executing verbs in scoreboard
- arriving verbs that commute are executed in parallel
- non-commuting verbs stall until dependencies clear
- Guarantees consistent recorded timeline without requiring hooks into application
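
The partial serialization described above can be modeled as a small, single-threaded scoreboard. This is a sketch under assumptions, not the prototype's scheduler: the `Scoreboard` class, its `arrive`/`complete` methods, and the rule that a verb may start only if it commutes with every executing verb and every earlier stalled verb are choices made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

class Scoreboard<V> {
    private final BiPredicate<V, V> commutes;            // supplied commutativity test
    private final List<V> executing = new ArrayList<>();
    private final List<V> stalled = new ArrayList<>();   // in arrival order
    private final List<V> timeline = new ArrayList<>();  // recorded history

    Scoreboard(BiPredicate<V, V> commutes) {
        this.commutes = commutes;
    }

    /** Called when a verb arrives; returns true if it started immediately. */
    boolean arrive(V verb) {
        if (canStart(verb, stalled)) {
            start(verb);
            return true;
        }
        stalled.add(verb);
        return false;
    }

    /** Called when a verb finishes; releases stalled verbs whose dependencies cleared. */
    void complete(V verb) {
        executing.remove(verb);
        List<V> stillStalled = new ArrayList<>();
        for (V waiting : stalled) {
            if (canStart(waiting, stillStalled)) {
                start(waiting);
            } else {
                stillStalled.add(waiting);
            }
        }
        stalled.clear();
        stalled.addAll(stillStalled);
    }

    /** A verb may start only if it commutes with all executing and earlier stalled verbs. */
    private boolean canStart(V verb, List<V> earlierStalled) {
        for (V running : executing) {
            if (!commutes.test(verb, running)) return false;
        }
        for (V earlier : earlierStalled) {
            if (!commutes.test(verb, earlier)) return false;
        }
        return true;
    }

    private void start(V verb) {
        executing.add(verb);
        timeline.add(verb); // logged in start order, so non-commuting verbs keep their order
    }

    List<V> timeline() {
        return new ArrayList<>(timeline);
    }
}
```

Since two non-commuting verbs are never in flight together, the order in which they were logged is the order in which the service saw them, and the service itself needs no instrumentation.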
34. Verb Issue: Naming State
- Verbs need to specify names of target state
- folders, messages, users
- Names must be time-invariant and unaffected by changes to execution timeline
- IMAP protocol names can't guarantee this
- Solution: allocate UndoIDs to objects
- UndoID never changes and is restored on replay
- undo system maintains translations to IMAP names
- messages embed UndoID in header field
- folders track UndoIDs in parallel database
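
The folder half of the UndoID scheme can be sketched as a two-way translation table. This in-memory version is only an illustration: the talk says folders track UndoIDs in a parallel database but gives no schema, so the class `FolderUndoIds`, its methods, and the use of sequential `long` identifiers are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

class FolderUndoIds {
    private final Map<Long, String> nameById = new HashMap<>();
    private final Map<String, Long> idByName = new HashMap<>();
    private long nextId = 1;

    /** Allocates a fresh UndoID for a newly created folder. */
    long create(String imapName) {
        if (idByName.containsKey(imapName)) {
            throw new IllegalArgumentException("folder exists: " + imapName);
        }
        long id = nextId++;
        nameById.put(id, imapName);
        idByName.put(imapName, id);
        return id;
    }

    /** A rename changes the IMAP name but never the UndoID. */
    void rename(String oldName, String newName) {
        if (idByName.containsKey(newName)) {
            throw new IllegalArgumentException("folder exists: " + newName);
        }
        Long id = idByName.remove(oldName);
        if (id == null) {
            throw new IllegalArgumentException("no such folder: " + oldName);
        }
        nameById.put(id, newName);
        idByName.put(newName, id);
    }

    /** Translates a stable UndoID to the folder's current IMAP name. */
    String currentName(long undoId) {
        return nameById.get(undoId);
    }

    /** Translates a current IMAP name to its UndoID, or null if unknown. */
    Long idFor(String imapName) {
        return idByName.get(imapName);
    }
}
```

A verb logged against the UndoID still finds the right folder on replay even if a repair renamed it, and a new folder that reuses an old name gets a different UndoID, so the two are never confused.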