FT 101. Jim Gray, Microsoft Research. http://research.microsoft.com/~gray/Talks/ (80% of the slides are hidden; view with PPT to see them all.)



1
FT 101
Jim Gray, Microsoft Research
http://research.microsoft.com/~gray/Talks/
(80% of slides are not shown (are hidden), so view with PPT to see them all)
Outline
  • Terminology and empirical measures
  • General methods to mask faults.
  • Software-fault tolerance
  • Summary

2
Dependability: The 3 ITIES
  • Reliability / Integrity: does the right thing.
    (Also large MTTF)
  • Availability: does it now. (Also small MTTR.)
    Availability = MTTF / (MTTF + MTTR)
    System Availability: if 90% of terminals up & 99% of DB
    up? (=> 89% of transactions are serviced on time).
  • Holistic vs. Reductionist view

[Diagram: Security, Integrity, Reliability, Availability]
3
High Availability System Classes. Goal: Build Class 6 Systems

  Availability:  90%  99%  99.9%  99.99%  99.999%  99.9999%  99.99999%
  Class:          1    2     3      4       5        6         7

UnAvailability = MTTR / MTBF
can cut it in half by cutting MTTR or MTBF
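The class numbers above are just the count of leading nines. A minimal sketch of the arithmetic (the MTTF/MTTR numbers below are illustrative, not from the talk):

```python
def availability(mttf_hours, mttr_hours):
    """Steady-state availability = MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)

def unavailability_minutes_per_year(avail):
    # Unavailability expressed as expected downtime per year.
    return (1 - avail) * 365 * 24 * 60

a = availability(4000, 4)   # illustrative: 4000-hr MTTF, 4-hr MTTR
print(round(a, 6))          # 0.999001 -> roughly "three nines" (Class 3)
print(round(unavailability_minutes_per_year(a)))  # 525 minutes/year
```

Halving MTTR (or doubling MTBF) halves the downtime, which is the point of the slide.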
4
Demo: looking at some nodes
  • Look at http://uptime.netcraft.com/
  • Internet Node availability: 92% mean, 97% median
    Darrell Long (UCSC) ftp://ftp.cse.ucsc.edu/pub/tr/
  • ucsc-crl-90-46.ps.Z "A Study of the Reliability
    of Internet Sites"
  • ucsc-crl-91-06.ps.Z "Estimating the Reliability
    of Hosts Using the Internet"
  • ucsc-crl-93-40.ps.Z "A Study of the Reliability
    of Hosts on the Internet"
  • ucsc-crl-95-16.ps.Z "A Longitudinal Survey of
    Internet Host Reliability"

5
Sources of Failures
                          MTTF          MTTR
  • Power Failure:        2,000 hr      1 hr
  • Phone Lines:
      Soft                >0.1 hr       0.1 hr
      Hard                4,000 hr      10 hr
  • Hardware Modules:     100,000 hr    10 hr (many are transient)
  • Software:
      1 Bug / 1,000 Lines Of Code (after vendor-user testing)
      => Thousands of bugs in System!
      Most software failures are transient: dump & restart system.
  • Useful fact: 8,760 hrs/year ~ 10k hr/year

6
Case Study - Japan
"Survey on Computer Security", Japan Info Dev Corp., March 1986. (trans: Eiichi Watanabe)

[Pie chart of outage causes: Vendor 42%, Tele Comm lines 12%,
 Operations 11.2%, Environment 25%, Application Software 9.3%]

  MTTF by cause:
  • Vendor (hardware and software)   5 Months
  • Application software             9 Months
  • Communications lines             1.5 Years
  • Operations                       2 Years
  • Environment                      2 Years
  • Overall                          10 Weeks
  • 1,383 institutions reported (6/84 - 7/85)
  • 7,517 outages, MTTF ~ 10 weeks, avg duration ~ 90 MINUTES
  • To Get 10-Year MTTF, Must Attack All These Areas

7
Case Studies - Tandem Trends: Reported MTTF by Component
                  1985   1987   1990
  SOFTWARE           2     53     33   Years
  HARDWARE          29     91    310   Years
  MAINTENANCE       45    162    409   Years
  OPERATIONS        99    171    136   Years
  ENVIRONMENT      142    214    346   Years
  SYSTEM             8     20     21   Years
  Problem: Systematic Under-reporting

8
Many Software Faults are Soft
  • After Design Review
  • Code Inspection
  • Alpha Test
  • Beta Test
  • 10k Hrs Of Gamma Test (Production)
  • Most Software Faults Are Transient:
      MVS Functional Recovery Routines    5:1
      Tandem Spooler                    100:1
      Adams                            >100:1
  • Terminology:
      Heisenbug: Works On Retry
      Bohrbug: Faults Again On Retry
  • Adams: "Optimizing Preventative Service of
    Software Products", IBM J R&D, 28.1, 1984
  • Gray: "Why Do Computers Stop", Tandem TR 85.7, 1985
  • Mourad: "The Reliability of the IBM/XA Operating
    System", 15th ISFTCS, 1985.

9
Summary of FT Studies
  • Current Situation: ~4-year MTTF => Fault Tolerance Works.
  • Hardware is GREAT (maintenance and MTTF).
  • Software masks most hardware faults.
  • Many hidden software outages in operations:
      New Software.
      Utilities.
  • Must make all software ONLINE.
  • Software seems to define a 30-year MTTF ceiling.
  • Reasonable Goal: 100-year MTTF.
    class 4 today => class 6 tomorrow.

10
Fault Tolerance vs Disaster Tolerance
  • Fault-Tolerance: masks local faults
      RAID disks
      Uninterruptible Power Supplies
      Cluster Failover
  • Disaster Tolerance: masks site failures
      Protects against fire, flood, sabotage, ...
      Redundant system and service at remote site.
      Use design diversity

11
Outline
  • Terminology and empirical measures
  • General methods to mask faults.
  • Software-fault tolerance
  • Summary

12
Fault Model
  • Failures are independent.
    So, single-fault tolerance is a big win.
  • Hardware fails fast (blue-screen)
  • Software fails fast (or goes to sleep)
  • Software often repaired by reboot:
      Heisenbugs
  • Operations tasks: major source of outage
      Utility operations
      Software upgrades

13
Fault Tolerance Techniques
  • Fail-fast modules: work or stop.
  • Spare modules: instant repair time.
  • Independent modules fail by design:
    MTTF_pair ~ MTTF² / MTTR (so want tiny MTTR)
  • Message-based OS: Fault Isolation; software has
    no shared memory.
  • Session-oriented comm: Reliable messages detect
    lost/duplicate messages; coordinate messages
    with commit.
  • Process pairs: Mask Hardware & Software Faults.
  • Transactions: give A.C.I.D. (simple fault model).
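The pair-MTTF approximation above can be checked with a one-liner; the numbers below are illustrative, not from the talk:

```python
def mttf_pair(mttf, mttr):
    """The slide's approximation MTTF_pair ~ MTTF^2 / MTTR:
    fast repair makes a pair enormously more reliable than one module."""
    return mttf ** 2 / mttr

# Illustrative: 10,000-hr modules with 10-hr repair.
print(mttf_pair(10_000, 10))  # 10000000.0 hours: 1000x one module's MTTF
```

This is why the slide stresses a tiny MTTR: the pair's MTTF scales inversely with repair time.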

14
Example: the FT Bank
  • Modularity & Repair are KEY.
  • von Neumann needed 20,000x redundancy in
    wires and switches.
  • We use 2x redundancy.
  • Redundant hardware can support peak loads (so
    not redundant).

15
Fail-Fast is Good, Repair is Needed
Lifecycle of a module: fail-fast gives short
fault latency. High Availability is
low UN-Availability: Unavailability ~ MTTR / MTTF
  • Improving either MTTR or MTTF gives benefit.
  • Simple redundancy does not help much.

16
Hardware Reliability/Availability (how to make
it fail fast)
  • Comparator Strategies:
      Duplex Fail-Fast: fail if either fails (e.g.
      duplexed cpus)
      vs Fail-Soft: fail if both fail (e.g. disc,
      atm, ...)
  • Note: in recursive pairs, the parent knows which is
    bad.
  • Triplex Fail-Fast: fail if 2 fail (triplexed cpus)
  • Fail-Soft: fail if 3 fail (triplexed Fail-Fast cpus)

17
Redundant Designs have Worse MTTF!
The Airplane Rule: A two-engine airplane has
twice as many engine problems as a one-engine
plane.
  • THIS IS NOT GOOD: Variance is lower but MTTF is
    worse.
  • Simple redundancy does not improve MTTF
    (sometimes hurts).
  • This is just an example of the airplane rule.
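The airplane rule falls straight out of adding failure rates. A sketch, assuming independent exponential engine failures (illustrative MTTF):

```python
def mttf_first_failure(mttf_single, n_modules):
    """With independent exponential failures, rates add:
    expected time until ANY of n modules fails is MTTF/n."""
    return mttf_single / n_modules

print(mttf_first_failure(2000.0, 1))  # 2000.0 hr with one engine
print(mttf_first_failure(2000.0, 2))  # 1000.0 hr: twice the engine problems
```

Without repair, the redundant design reaches its first fault sooner; repair is what converts redundancy into a win.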

18
Add Repair: Get 10⁴ Improvement
19
When To Repair?
  • Chances Of Tolerating A Fault are 1000:1 (class 3)
  • A 1995 study: Processor & Disc Rated At 10k hr MTTF
      Computed Single Fails   Observed Double Fails   Ratio
      10k Processor Fails     14 Double               1000:1
      40k Disc Fails          26 Double               1000:1
  • Hardware Maintenance:
      On-Line Maintenance "Works" 999 Times Out Of 1000.
      The chance a duplexed disc will fail during
      maintenance? 1:1000
      Risk Is 30x Higher During Maintenance
      => Do It Off Peak Hour
  • Software Maintenance:
      Repair Only Virulent Bugs
      Wait For Next Release To Fix Benign Bugs

20
OK: So Far
  • Hardware fail-fast is easy.
  • Redundancy plus Repair is great (Class 7
    availability).
  • Hardware redundancy & repair is via modules.
  • How can we get instant software repair?
  • We Know How To Get Reliable Storage:
      RAID Or Dumps And Transaction Logs.
  • We Know How To Get Available Storage:
      Fail-Soft Duplexed Discs (RAID 1...N).
  • ? How do we get reliable execution?
  • ? How do we get available execution?

21
Outline
  • Terminology and empirical measures
  • General methods to mask faults.
  • Software-fault tolerance
  • Summary

22
Key Idea
  • Architecture masks Hardware Faults.
  • Software masks Environmental Faults.
  • Distribution masks Maintenance.
  • Software automates / eliminates operators.
  • So, in the limit there are only software design
    faults. Software-fault tolerance is the key to
    dependability.
    INVENT IT!

23
Software Techniques: Learning from Hardware
  • Recall that most outages are not hardware.
  • Most outages in Fault-Tolerant Systems are
    SOFTWARE.
  • Fault Avoidance Techniques: Good & Correct design.
  • After that, Software Fault Tolerance Techniques:
      Modularity (isolation, fault containment)
      Design diversity
      N-Version Programming: N different implementations
      Defensive Programming: Check parameters and data
      Auditors: Check data structures in background
      Transactions: clean up state after a failure
  • Paradox: Need Fail-Fast Software
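Two of the techniques above fit in a few lines. A hypothetical sketch (the function names and account model are mine, not from the talk):

```python
def debit(balance, amount):
    # Defensive programming: check parameters and fail fast,
    # rather than silently corrupting state.
    if amount <= 0:
        raise ValueError("amount must be positive")
    if amount > balance:
        raise ValueError("insufficient funds")
    return balance - amount

def audit(accounts):
    # Auditor: background check of a data-structure invariant
    # (here: no account ever goes negative).
    return all(bal >= 0 for bal in accounts.values())

print(debit(100, 30))            # 70
print(audit({"a": 70, "b": 0}))  # True
```

Note the paradox the slide names: the defensive checks make the module *more* likely to stop, and that fail-fast behavior is exactly what higher-level recovery needs.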

24
Fail-Fast and High-Availability Execution
  • Software N-Plexing: Design Diversity
      N-Version Programming:
      Write the same program N times (N >= 3)
      Compare outputs of all programs and take
      majority vote
  • Process Pairs: Instant restart (repair)
      Use Defensive programming to make a process
      fail-fast
      Have restarted process ready in separate
      environment
      Second process takes over if primary faults
      Transaction mechanism can clean up distributed
      state if takeover in middle of computation.

25
What Is the MTTF of an N-Version Program?
  • First fails after MTTF/N
  • Second fails after MTTF/(N-1), ...
  • so MTTF_total = MTTF x (1/N + 1/(N-1) + ... + 1/2)
  • harmonic series goes to infinity, but VERY slowly
  • for example, 100-version programming gives
    ~4x the MTTF of 1-version programming
  • Reduces variance
  • N-Version Programming Needs REPAIR:
      If a program fails, must reset its state from
      other programs.
      => programs have common data/state
      representation.
  • How does this work for Database Systems?
    Operating Systems? Network Systems?
  • Answer: I don't know.
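The harmonic estimate above is easy to check numerically:

```python
def n_version_mttf_factor(n):
    """Time until all but one of n versions have failed,
    in units of a single version's MTTF: 1/n + ... + 1/2."""
    return sum(1.0 / k for k in range(2, n + 1))

print(round(n_version_mttf_factor(100), 2))  # 4.19: 100 versions buy only ~4x
```

So even a hundred-fold development cost buys only about a fourfold MTTF improvement, which is the slide's point.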

26
Why Process Pairs Mask Faults:
Many Software Faults are Soft
  • After Design Review
  • Code Inspection
  • Alpha Test
  • Beta Test
  • 10k Hrs Of Gamma Test (Production)
  • Most Software Faults Are Transient:
      MVS Functional Recovery Routines    5:1
      Tandem Spooler                    100:1
      Adams                            >100:1
  • Terminology:
      Heisenbug: Works On Retry
      Bohrbug: Faults Again On Retry
  • Adams: "Optimizing Preventative Service of
    Software Products", IBM J R&D, 28.1, 1984
  • Gray: "Why Do Computers Stop", Tandem TR 85.7, 1985
  • Mourad: "The Reliability of the IBM/XA Operating
    System", 15th ISFTCS, 1985.

27
Heisenbugs: A Probabilistic Approach to
Availability
  • There is considerable evidence that (1)
    production systems have about one bug per
    thousand lines of code; (2) these bugs manifest
    themselves stochastically: failures are due
    to a confluence of rare events; (3) system
    mean-time-to-failure has a lower bound of a
    decade or so. To make highly available
    systems, architects must tolerate these failures
    by providing instant repair (un-availability is
    approximated by repair_time/time_to_fail, so
    cutting the repair time in half makes things
    twice as good). Ultimately, one builds a set of
    standby servers which have both design diversity
    and geographic diversity. This minimizes
    common-mode failures.

28
Process Pair Repair Strategy
  • If the software fault (bug) is a Bohrbug, then there
    is no repair:
      wait for the next release, or
      get an emergency bug fix, or
      get a new vendor.
  • If the software fault is a Heisenbug, then repair
    is:
      reboot and retry, or
      switch to backup process (instant restart).
  • PROCESS PAIRS Tolerate Hardware Faults &
    Heisenbugs
  • Repair time is seconds, could be milliseconds if
    time is critical
  • Flavors Of Process Pair:
      Lockstep
      Automatic
      State Checkpointing
      Delta Checkpointing
      Persistent
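Why takeover-and-retry repairs Heisenbugs can be seen in a toy model. This is a sketch, not Tandem's implementation: the "backup" here is just a second attempt at the same request, and the fault probability is invented for illustration:

```python
import random

def flaky_request(fail_prob, rng):
    # A Heisenbug: the fault is transient, triggered by rare timing.
    if rng.random() < fail_prob:
        raise RuntimeError("Heisenbug: transient fault")
    return "ok"

def process_pair(fail_prob, rng):
    try:
        return flaky_request(fail_prob, rng)  # primary attempt
    except RuntimeError:
        return flaky_request(fail_prob, rng)  # backup takes over, retries

rng = random.Random(0)
ok = 0
for _ in range(10_000):
    try:
        process_pair(0.1, rng)
        ok += 1
    except RuntimeError:
        pass  # double fault: even the backup hit the bug

print(ok)  # ~9900: one takeover cuts a 10% fault rate to ~1%
```

A Bohrbug would fail on the retry too, which is why the slide says there is no repair for it.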

29
How Takeover Masks Failures
  • Server Resets At Takeover. But What About:
      Application State?
      Database State?
      Network State?
  • Answer: Use Transactions To Reset State!
  • Abort Transaction If Process Fails.
  • Keeps Network "Up"
  • Keeps System "Up"
  • Reprocesses Some Transactions On Failure

30
PROCESS PAIRS - SUMMARY
  • Transactions Give Reliability
  • Process Pairs Give Availability
  • Process Pairs Are Expensive & Hard To Program
  • Transactions + Persistent Process Pairs
    => Fault-Tolerant Sessions & Execution
  • When Tandem Converted To This Style:
      Saved 3x Messages
      Saved 5x Message Bytes
      Made Programming Easier

31
SYSTEM PAIRS FOR HIGH AVAILABILITY

[Diagram: Primary <-> Backup]

  • Programs, Data, Processes Replicated at two
    sites.
  • Pair looks like a single system.
  • System becomes a logical concept.
  • Like Process Pairs: System Pairs.
  • Backup receives transaction log (spooled if
    backup down).
  • If primary fails or operator switches, backup
    offers service.

32
SYSTEM PAIR CONFIGURATION OPTIONS
  • Mutual Backup:
      each has 1/2 of Database & Application
  • Hub:
      One site acts as backup for many others
  • In general, can be any directed graph
  • Stale replicas: Lazy replication

[Diagrams: mutual-backup pair (Primary/Backup each way);
 hub site backing up several primaries; chain of copies]
33
SYSTEM PAIRS FOR SOFTWARE MAINTENANCE
  • Step 1: Both systems are running V1.
  • Step 2: Backup is cold-loaded as V2.
  • Step 3: SWITCH to Backup.
  • Step 4: The new Backup (old Primary) is cold-loaded as V2.
  • Similar ideas apply to
  • Database Reorganization
  • Hardware modification (e.g. add discs,
    processors,...)
  • Hardware maintenance
  • Environmental changes (rewire, new air
    conditioning)
  • Move primary or backup to new location.

34
SYSTEM PAIR BENEFITS
  • Protects against ENVIRONMENT:
      weather
      utilities
      sabotage
  • Protects against OPERATOR FAILURE:
      two sites, two sets of operators
  • Protects against MAINTENANCE OUTAGES:
      work on backup
      software/hardware install/upgrade/move...
  • Protects against HARDWARE FAILURES:
      backup takes over
  • Protects against TRANSIENT SOFTWARE ERRORS
  • Allows design diversity
    (different sites have different software/hardware)

35
Key Idea
  • Architecture masks Hardware Faults.
  • Software masks Environmental Faults.
  • Distribution masks Maintenance.
  • Software automates / eliminates operators.
  • So, in the limit there are only software design
    faults. Many are Heisenbugs.
    Software-fault tolerance is the key to
    dependability.
    INVENT IT!

36
References
  • Adams, E. (1984). "Optimizing Preventative
    Service of Software Products." IBM Journal of
    Research and Development. 28(1): 2-14.
  • Anderson, T. and B. Randell. (1979). Computing
    Systems Reliability.
  • Garcia-Molina, H. and C. A. Polyzois. (1990).
    "Issues in Disaster Recovery." 35th IEEE Compcon
    90. 573-577.
  • Gray, J. (1986). "Why Do Computers Stop and What
    Can We Do About It." 5th Symposium on Reliability
    in Distributed Software and Database Systems.
    3-12.
  • Gray, J. (1990). "A Census of Tandem System
    Availability between 1985 and 1990." IEEE
    Transactions on Reliability. 39(4): 409-418.
  • Gray, J. N., Reuter, A. (1993). Transaction
    Processing: Concepts and Techniques. San Mateo,
    CA: Morgan Kaufmann.
  • Lampson, B. W. (1981). "Atomic Transactions."
    Distributed Systems -- Architecture and
    Implementation: An Advanced Course. ACM,
    Springer-Verlag.
  • Laprie, J. C. (1985). "Dependable Computing and
    Fault Tolerance: Concepts and Terminology." 15th
    FTCS. 2-11.
  • Long, D. D., J. L. Carroll, and C. J. Park (1991).
    "A study of the reliability of Internet sites."
    Proc 10th Symposium on Reliable Distributed
    Systems, pp. 177-186, Pisa, September 1991.
  • Long, D., A. Muir, and R. Golding (1995). "A
    Longitudinal Study of Internet Host Reliability."
    Proceedings of the Symposium on Reliable
    Distributed Systems, Bad Neuenahr, Germany: IEEE,
    September 1995, pp. 2-9.

38
Scaleable Replicated Databases
  • Jim Gray (Microsoft)
  • Pat Helland (Microsoft)
  • Dennis Shasha (Columbia)
  • Pat O'Neil (U. Mass)

39
Outline
  • Replication strategies:
      Lazy and Eager
      Master and Group
  • How centralized databases scale:
      deadlocks rise non-linearly with
      transaction size and concurrency
  • Replication systems are unstable on scaleup
  • A possible solution

40
Scaleup, Replication, Partition
  • N² more work

41
Why Replicate Databases?
  • Give users a local copy for
  • Performance
  • Availability
  • Mobility (they are disconnected)
  • But... What if they update it?
  • Must propagate updates to other copies

42
Propagation Strategies
  • Eager: Send update right away
      (part of same transaction)
      N times larger transactions
  • Lazy: Send update asynchronously
      separate transaction
      N times more transactions
  • Either way:
      N times more updates per second per node
      N² times more work overall
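The N² claim is simple counting. A back-of-envelope sketch (the numbers are illustrative): with N nodes each running TPS transactions of `Actions` updates, every update must eventually be applied at every node, whether it travels eagerly or lazily:

```python
def total_update_work(tps_per_node, actions, nodes):
    # Updates generated per second across the whole system...
    updates_generated = tps_per_node * actions * nodes
    # ...each of which must be applied at all N nodes.
    return updates_generated * nodes

print(total_update_work(100, 10, 1))  # 1000
print(total_update_work(100, 10, 5))  # 25000: 5x nodes -> 25x total work
```

Eager replication pays this as N-times-larger transactions; lazy pays it as N-times-more transactions.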

43
Update Control Strategies
  • Master:
      Each object has a master node
      All updates start with the master
      Broadcast to the subscribers
  • Group:
      Object can be updated by anyone
      Update broadcast to all others
  • Everyone wants Lazy Group:
      update anywhere, anytime, anyway

44
Quiz Questions: Name One
  • Eager:
      Master: N-Plexed disks
      Group: ?
  • Lazy:
      Master: Bibles, Bank accounts, SQL Server
      Group: Name servers, Oracle, Access...
  • Note: Lazy contradicts Serializable.
    If two lazy updates collide, then ... reconcile:
      discard one transaction (or use some other rule)
      Ask for human advice
  • Meanwhile, nodes disagree =>
    Network & DB state diverges: System Delusion

45
Anecdotal Evidence
  • Update-Anywhere systems are attractive:
      Products offer the feature
      It demos well
  • But when it scales up:
      Reconciliations start to cascade
      Database drifts out of sync (System Delusion)
  • What's going on?

46
Outline
  • Replication strategies
  • Lazy and Eager
  • Master and Group
  • How centralized databases scale
  • deadlocks rise non-linearly
  • Replication is unstable on scaleup
  • A possible solution

47
Simple Model of Waits
  • DB_size records; TPS transactions per second
  • Each transaction:
      Picks Actions records uniformly from the set of
      DB_size records
      Then commits
  • About (Transactions x Actions) / 2 resources locked
  • Chance a request waits is
    (Transactions x Actions) / (2 x DB_size)
  • Action rate is TPS x Actions
  • Active Transactions = TPS x Actions x Action_Time
  • Wait Rate = Action rate x Chance a request waits
    = TPS² x Actions³ x Action_Time / (2 x DB_size)
  • 10x more transactions => 100x more waits
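The quadratic-in-TPS behavior of the wait-rate formula can be checked numerically (the parameter values are illustrative):

```python
def wait_rate(tps, actions, action_time, db_size):
    # Wait Rate = TPS^2 x Actions^3 x Action_Time / (2 x DB_size)
    return tps**2 * actions**3 * action_time / (2 * db_size)

base = wait_rate(100, 10, 0.01, 1_000_000)
print(wait_rate(1000, 10, 0.01, 1_000_000) / base)  # 100.0: 10x TPS -> 100x waits
```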
48
Simple Model of Deadlocks
  • A deadlock is a wait cycle.
  • Cycle of length 2:
    Deadlock rate = Wait rate x Chance Waitee waits for waiter
    = Wait rate x (P(wait) / Transactions)
    = [TPS² x Actions³ x Action_Time / (2 x DB_size)]
      x [Actions² / (2 x DB_size)]
    = TPS² x Actions⁵ x Action_Time / (4 x DB_size²)
  • Cycles of length 3 are ~P(wait)³, so ignored.
  • 10x bigger transactions => 100,000x more deadlocks
49
Summary So Far
  • Even centralized systems are unstable:
  • Waits:
      Square of concurrency
      3rd power of transaction size
  • Deadlock rate:
      Square of concurrency
      5th power of transaction size

[Graph: deadlock rate vs. transaction size and concurrency]
50
Outline
  • Replication strategies
  • How centralized databases scale
  • Replication is unstable on scaleup
  • Eager (master & group)
  • Lazy (master & group & disconnected)
  • A possible solution

51
Eager Transactions are FAT
  • If N nodes, an eager transaction is Nx bigger
  • Takes Nx longer
  • 10x nodes => 1,000x deadlocks
    (derivation in paper)
  • Master is slightly better than group
  • Good news:
      Eager transactions only deadlock;
      No need for reconciliation

52
Lazy Master & Group

[Diagram: lazy-master and lazy-group timelines: Write A,
 Write B, Write C, Commit, with a new timestamp taken and
 the writes propagated to replicas after commit]

  • Use optimistic concurrency control:
      Keep transaction timestamp with record
      Updates carry old + new timestamp
      If record has old timestamp:
        set value to new value
        set timestamp to new timestamp
      If record does not match old timestamp:
        reject lazy transaction
  • Not SNAPSHOT isolation (stale reads)
  • Reconciliation:
      Some nodes are updated
      Some nodes are being reconciled
53
Reconciliation
  • Reconciliation means System Delusion:
      Data inconsistent with itself and reality
  • How frequent is it?
  • Lazy transactions are not fat,
    but there are N times as many
  • Eager waits become Lazy reconciliations
  • Rate is
    TPS² x (Actions x Nodes)³ x Action_Time / (2 x DB_size)
  • Assuming everyone is connected
54
Eager & Lazy: Disconnected
  • Suppose mobile nodes are disconnected for a day
  • When they reconnect:
      get all incoming updates
      send all delayed updates
  • Incoming is Nodes x TPS x Actions x Disconnect_Time
  • Outgoing is TPS x Actions x Disconnect_Time
  • Conflicts are the intersection of these two sets:
    ~ Disconnect_Time x (TPS x Actions x Nodes)² / DB_size
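The disconnect-conflict estimate can be plugged with numbers; the values below are illustrative, not from the talk:

```python
def reconnect_conflicts(disconnect_time, tps, actions, nodes, db_size):
    # Conflicts ~ Disconnect_Time x (TPS x Actions x Nodes)^2 / DB_size
    return disconnect_time * (tps * actions * nodes) ** 2 / db_size

# Illustrative: a 24-hour disconnect, 1 TPS of 10-action transactions,
# 10 nodes, a 1M-record database.
print(reconnect_conflicts(24, 1, 10, 10, 1_000_000))  # 0.24
```

Note the quadratic terms: doubling the node count or the transaction rate quadruples the expected conflicts at reconnect.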
55
Outline
  • Replication strategies (lazy & eager, master & group)
  • How centralized databases scale
  • Replication is unstable on scaleup
  • A possible solution: Two-tier architecture with
    Mobile & Base nodes
      Base nodes master objects
      Tentative transactions at mobile nodes
      Transactions must be commutative
      Re-apply transactions on reconnect
      Transactions may be rejected

56
Safe Approach
  • Each object mastered at a node
  • Update Transactions only read and write master
    items
  • Lazy replication to other nodes
  • Allow reads of stale data (on user request)
  • PROBLEMS:
      doesn't support mobile users
      deadlocks explode with scaleup
      ?? How do banks work ??

57
Two-Tier Replication
  • Two kinds of nodes:
      Base nodes: always connected, always up
      Mobile nodes: occasionally connected
  • Data mastered at base nodes
  • Mobile nodes:
      have stale copies
      make tentative updates

58
Mobile Node Makes Tentative Updates
  • Updates local database while disconnected
  • Saves transactions
  • When Mobile node reconnects: Tentative
    transactions are re-done as Eager-Master (at
    original time??)
  • Some may be rejected
    (this replaces reconciliation)
  • No System Delusion.

59
Tentative Transactions
  • Must be commutative with others:
      Debit $50, rather than Change $150 to $100.
  • Must have acceptance criteria:
      Account balance is positive
      Ship date no later than quoted
      Price is no greater than quoted

[Diagram: mobile node sends Tentative Xacts to the base
 node; the base returns Updates & Rejects; transactions
 from others merge with tentative transactions at the
 local DB]
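The commutative-debit-with-acceptance-criterion idea fits in a few lines. A hypothetical sketch (function name and data shapes are mine, not from the talk):

```python
def reapply_tentative(balance, debits):
    """Re-run tentative debits at reconnect; reject any debit
    that would violate the acceptance criterion (balance must
    stay non-negative). Rejected, not reconciled."""
    accepted, rejected = [], []
    for amount in debits:
        if balance - amount >= 0:   # acceptance criterion
            balance -= amount
            accepted.append(amount)
        else:
            rejected.append(amount)
    return balance, accepted, rejected

print(reapply_tentative(100, [50, 40, 30]))  # (10, [50, 40], [30])
```

Because "Debit 50" commutes with other debits, the base node can interleave tentative transactions from many mobile nodes in any order and still get a consistent, delusion-free state.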
60
Refinement Mobile Node Can Master Some Data
  • Mobile node can master private data
  • Only mobile node updates this data
  • Others only read that data
  • Examples
  • Orders generated by salesman
  • Mail generated by user
  • Documents generated by Notes user.

61
Virtue of the 2-Tier Approach
  • Allows mobile operation
  • No system delusion
  • Rejects detected at reconnect (know right away)
  • If commutativity works:
      No reconciliations
  • Even though work rises as (Mobile + Base)²

62
Outline
  • Replication strategies (lazy & eager, master & group)
  • How centralized databases scale
  • Replication is unstable on scaleup
  • A possible solution (two-tier architecture):
      Tentative transactions at mobile nodes
      Re-apply transactions on reconnect
      Transactions may be rejected & reconciled
  • Avoids system delusion