Title: Database Replication Using Generalized Snapshot Isolation
1 Database Replication Using Generalized Snapshot Isolation
- Sameh Elnikety, EPFL
- Fernando Pedone, USI
- Willy Zwaenepoel, EPFL
2 Snapshot Isolation (SI)
- Snapshot: committed state of database
- On begin
- Snapshot(T) = latest snapshot at start(T)
- On read or write operation
- T reads from and writes to its snapshot
- On commit
- Read-only T commits immediately
- Update T commits if no conflicting writes between its start and commit times
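The three rules above can be sketched as a toy single-site store. This is a minimal illustration, not the implementation of any particular database: the names `SnapshotStore` and `Txn`, the whole-database copy used as a snapshot, and the key-level conflict check are all assumptions made for the example.

```python
class SnapshotStore:
    """Toy single-site store with snapshot isolation (hypothetical sketch)."""

    def __init__(self, initial=None):
        self.data = dict(initial or {})  # latest committed state
        self.version = 0                 # number of committed update transactions
        self.log = []                    # (commit_version, set of written keys)

    def begin(self):
        # On begin: Snapshot(T) = latest committed state at start(T)
        return Txn(self, dict(self.data), self.version)


class Txn:
    def __init__(self, store, snapshot, start_version):
        self.store = store
        self.snapshot = snapshot
        self.start_version = start_version
        self.writeset = {}

    def read(self, key):
        # T sees its own writes first, then its snapshot
        return self.writeset.get(key, self.snapshot.get(key))

    def write(self, key, value):
        self.writeset[key] = value

    def commit(self):
        if not self.writeset:
            return True  # read-only T commits immediately
        for commit_version, keys in self.store.log:
            # conflicting write committed between start(T) and commit(T)
            if commit_version > self.start_version and not keys.isdisjoint(self.writeset):
                return False
        self.store.version += 1
        self.store.data.update(self.writeset)
        self.store.log.append((self.store.version, set(self.writeset)))
        return True


store = SnapshotStore({"x": 1})
t1, t2, reader = store.begin(), store.begin(), store.begin()
t1.write("x", 2)
t2.write("x", 3)
print(t1.commit())       # True: first committer wins
print(t2.commit())       # False: conflicting write on x since t2 started
print(reader.read("x"))  # 1: the reader still sees its own snapshot
```

The reader is never blocked or aborted by the two writers, which is the property the next slide highlights.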
3 Advantages of SI
- Read-only Ts never block or abort
- Read-only Ts never cause update Ts to block or abort
- Compare to 2PL
- No read-locks are used in SI
- Important for read-dominated workloads
4 Drawbacks of SI
- Not serializable
- Permits certain anomalies
- But
- Anomalies are rare in practice
- Conditions on workload can identify and avoid them
- Developers use SI serializably
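A well-known example of such an anomaly is write skew. The sketch below is a hypothetical scenario (the on-call keys and the invariant are invented for illustration): two concurrent transactions read the same snapshot and write different items, so SI's write-write check lets both commit even though no serial order would produce the result.

```python
def run_write_skew():
    # Invariant the application wants: at least one person stays on call.
    db = {"alice_on_call": True, "bob_on_call": True}

    # T1 and T2 start concurrently, so both take the same snapshot.
    snap1, snap2 = dict(db), dict(db)

    # T1: Alice goes off call if Bob is on call in T1's snapshot.
    writes1 = {"alice_on_call": False} if snap1["bob_on_call"] else {}
    # T2: Bob goes off call if Alice is on call in T2's snapshot.
    writes2 = {"bob_on_call": False} if snap2["alice_on_call"] else {}

    # SI only checks write-write conflicts; the write sets are disjoint,
    # so both transactions are allowed to commit.
    if writes1.keys().isdisjoint(writes2.keys()):
        db.update(writes1)
        db.update(writes2)
    return db


print(run_write_skew())  # {'alice_on_call': False, 'bob_on_call': False}
```

In either serial order the second transaction would see the other person already off call and write nothing, so the final state above is not serializable.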
5 Summary of SI
- SI is here to stay
- Used in several databases, e.g.,
- Oracle
- PostgreSQL
- Microsoft SQL Server (2PL and SI)
- Borland InterBase
6 SI Replication
- Replicate SI to scale performance for dynamic content Web servers
- E.g., e-commerce, bulletin boards
- Workload is suitable for SI
- Read-only Ts dominate workload
- Update Ts are short and few
- How to maintain SI properties?
7 SI in Replicated Database
- On begin
- Snapshot(T) = latest snapshot at start(T)
- On read or write operation
- T reads from and writes to its snapshot
- On commit
- Read-only T commits immediately
- Update T commits if no conflicting writes between its start and commit times
8 Strict SI in Replicated Database
- On begin
- Snapshot(T) = latest snapshot at start(T)
- On read or write operation
- T reads from and writes to its snapshot
- On commit
- Read-only T commits immediately
- Update T commits if no conflicting writes between its start and commit times
10 Generalized Snapshot Isolation (GSI)
- On begin
- Snapshot(T) = an older snapshot (not necessarily the latest)
- At replica, use latest local snapshot
- On read or write operation
- T reads from and writes to its snapshot
- On commit
- Read-only T commits immediately
- Update T commits if no conflicting writes between its snapshot and commit times (rather than its start and commit times)
- This commit-time check is the certification for update T
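The only change to the commit rule is where the conflict window starts: at the snapshot's version rather than at the transaction's start. A minimal sketch of that check, assuming a hypothetical log of `(commit_version, written_keys)` pairs and a function name `certify` chosen for this example:

```python
def certify(log, snapshot_version, writeset):
    """Return True if no writeset committed after the snapshot overlaps `writeset`.

    log: list of (commit_version, set_of_keys) for committed update transactions.
    """
    return all(
        keys.isdisjoint(writeset)
        for commit_version, keys in log
        if commit_version > snapshot_version
    )


log = [(1, {"x"}), (2, {"y"})]
print(certify(log, snapshot_version=2, writeset={"x"}))  # True: snapshot already includes the write to x
print(certify(log, snapshot_version=0, writeset={"x"}))  # False: older snapshot misses the write to x
print(certify(log, snapshot_version=0, writeset={"z"}))  # True: older snapshot, but no overlapping write
```

The second call shows why an older snapshot can raise the abort rate for updates: the conflict window grows with the age of the snapshot.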
11 Advantages of GSI
- All Ts' reads and writes are local
- Important for replicated databases
- Read-only Ts never block or abort
- Read-only Ts never cause update Ts to block or abort
- Important for read-dominated workloads
13 A - GSI Serializability
- Not serializable
- Permits certain anomalies as in SI
- But
- Anomalies are rare in practice
- Two serializability conditions (in the paper)
- Static: examine transaction templates
- Dynamic: at run time
- Easy to verify workload is serializable
- Easy to modify workload to be serializable
- Similar to what many Oracle DBAs already do
14 B - GSI Older Snapshots
- GSI uses older snapshots
- But
- Clear definition, always consistent data
- No new anomalies ( same as in SI )
- In replicated database
- Transparent: database appears as running SI
- Efficient: reads are non-blocking
- Staleness can be bounded
- Rule 1, on begin: Snapshot(T) = an older snapshot (not necessarily the latest)
15 C - GSI Abort Rates
- Rule 3, on commit: read-only T commits immediately; update T commits if no conflicting writes between its snapshot and commit times
- Potentially higher abort rate for updates
- But
- Abort rates are small in target workloads
- GSI abort rates can be higher or lower than SI abort rates
- The commit check is the certification for update T
16 GSI in Replicated Databases
- System consists of
- Many SI replicas, full replication
- Centralized certifier (distributed in the paper)
- A client connects to one replica
- Issues read and update transactions
- Algorithm implements an instance of GSI
- Snapshot(T) = latest local snapshot at replica
17 Algorithm at Replica
- On begin
- Provide T with a local Snapshot
- Record T.version = Snapshot.version
- On read or write operation
- Run transaction (reads/writes) locally
- Record T.writeset
- On commit
- IF ( T is read-only ) THEN commit
- ELSE Invoke certification ( T.version, T.writeset ) . . .
18 Algorithm at Certifier
- Check for conflicting writes from committed Ts with larger version number
- IF ( yes ) THEN Reply ( abort )
- ELSE Advance certifier-version
- Record ( writeset, certifier-version ) to log
- Reply ( 1 - commit, 2 - certifier-version, 3 - missing writesets )
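The certifier steps above can be sketched as a small class. This is an illustration under assumptions, not the paper's code: the names `Certifier` and `Reply` are invented here, writesets are plain dicts, conflicts are detected at key granularity, and "missing writesets" is taken to mean every logged writeset with a version larger than T.version.

```python
from collections import namedtuple

# Hypothetical reply shape: (commit?, certifier-version, missing writesets)
Reply = namedtuple("Reply", ["commit", "certifier_version", "missing_writesets"])


class Certifier:
    def __init__(self):
        self.version = 0  # certifier-version
        self.log = []     # list of (version, writeset dict), in commit order

    def certify(self, t_version, writeset):
        # Writesets committed with a larger version number than T's snapshot
        newer = [(v, ws) for v, ws in self.log if v > t_version]
        if any(not ws.keys().isdisjoint(writeset) for _, ws in newer):
            return Reply(False, None, [])               # Reply ( abort )
        self.version += 1                               # advance certifier-version
        self.log.append((self.version, dict(writeset)))  # record to log
        return Reply(True, self.version, newer)


c = Certifier()
print(c.certify(0, {"x": 1}).commit)  # True
print(c.certify(0, {"x": 2}).commit)  # False: x was written after version 0
print(c.certify(0, {"y": 5}))
# Reply(commit=True, certifier_version=2, missing_writesets=[(1, {'x': 1})])
```

Returning the missing writesets in the same reply lets the replica catch up before it commits locally, so no separate propagation step is needed in this sketch.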
19 Algorithm at Replica (cont.)
- On begin
- . . .
- On read or write operation
- . . .
- On commit
- IF ( T is read-only ) THEN commit
- ELSE Invoke certification ( T.version, T.writeset )
- 1 - Apply missing writesets
- 2 - Commit locally
- 3 - Advance local version
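The replica side can be sketched in the same spirit. Everything named here is an assumption for illustration: `Replica`, the dict used as a transaction handle, and `make_certifier`, a compact stand-in for the certifier that returns `(ok, certifier_version, missing_writesets)`.

```python
def make_certifier():
    """Stand-in certifier (hypothetical): returns a certify(version, writeset) callable."""
    log = []  # (version, writeset), in commit order

    def certify(t_version, writeset):
        missing = [(v, ws) for v, ws in log if v > t_version]
        if any(not ws.keys().isdisjoint(writeset) for _, ws in missing):
            return False, None, []
        log.append((len(log) + 1, dict(writeset)))
        return True, len(log), missing

    return certify


class Replica:
    def __init__(self, certify):
        self.certify = certify
        self.data = {}
        self.version = 0

    def begin(self):
        # Provide T with a local snapshot and record T.version = Snapshot.version
        return {"version": self.version, "snapshot": dict(self.data), "writeset": {}}

    def read(self, t, key):
        return t["writeset"].get(key, t["snapshot"].get(key))

    def write(self, t, key, value):
        t["writeset"][key] = value  # record T.writeset

    def commit(self, t):
        if not t["writeset"]:
            return True  # read-only T commits without contacting the certifier
        ok, new_version, missing = self.certify(t["version"], t["writeset"])
        if not ok:
            return False
        for version, writeset in missing:  # 1 - apply missing writesets
            if version > self.version:
                self.data.update(writeset)
                self.version = version
        self.data.update(t["writeset"])    # 2 - commit locally
        self.version = new_version         # 3 - advance local version
        return True


certify = make_certifier()
r1, r2 = Replica(certify), Replica(certify)

t1 = r1.begin(); r1.write(t1, "x", 1); print(r1.commit(t1))  # True
t2 = r2.begin(); r2.write(t2, "y", 2); print(r2.commit(t2))  # True
print(r2.data)                                               # {'x': 1, 'y': 2}

t3 = r1.begin()            # r1 has not seen y yet: an older snapshot
print(r1.read(t3, "y"))    # None
r1.write(t3, "y", 3)
print(r1.commit(t3))       # False: y was written after t3's snapshot version
```

In this sketch a replica only learns of remote writesets when one of its own update transactions is certified, which is what makes its local snapshot an older one in the GSI sense.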
20 Performance Tradeoff: GSI vs. SI
- GSI
- better response time
- SI
- fresher data (latest snapshot in the system)
- lower abort rate for updates (?)
- Analytical performance model
- Model used by Jim Gray
- Replicated database over WAN
21 Analytical Model
- GSI
- Execute T immediately
- Updates are certified remotely (communication)
- SI
- Block T to obtain latest version (communication)
- Updates are certified remotely (communication)
- Objective is to compare GSI and SI
- Response time
- Abort rate
23 Analytical Equations
- Parameters
- x = round trip delay / transaction length
- t = snapshot age / transaction length
- Response time ratio (GSI / SI)
- Read-only, update
- Abort rate ratio (GSI / SI)
- Read-only (never aborted!), update
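The response time ratios can be approximated from the model's description alone. The sketch below is a derivation under assumptions, not the equations from the slides: it assumes each "(communication)" step costs exactly one round trip, measures time in units of transaction length, and ignores queueing and aborts. The function name `response_time_ratio` is invented for the example.

```python
def response_time_ratio(x, update):
    """GSI / SI response time ratio, with x = round trip delay / transaction length.

    Assumed costs, in units of transaction length:
      GSI read-only: 1          (execute immediately on the local snapshot)
      SI  read-only: 1 + x      (block to obtain the latest version)
      GSI update:    1 + x      (remote certification)
      SI  update:    1 + 2 * x  (obtain latest version, then remote certification)
    """
    gsi = 1 + x if update else 1
    si = 1 + 2 * x if update else 1 + x
    return gsi / si


for x in (0, 1, 10, 100):
    print(x, response_time_ratio(x, update=False), response_time_ratio(x, update=True))
```

Under these assumptions the ratios are 1 / (1 + x) for read-only transactions and (1 + x) / (1 + 2x) for updates. At x = 0 both equal 1, and as x grows the read-only ratio tends to 0 while the update ratio tends to ½.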
24 Analytical Results
- Parameters
- x = round trip delay / transaction length
- t = snapshot age / transaction length
- X-axis
- x = round trip delay / transaction length
- x = 0: centralized database
- x is increasing as technology advances
- Y-axis
- Response time ratio (for reads and updates)
- Abort ratio (updates)
25 Response Time Ratio of GSI / SI
[Plot annotation: GSI is better]
27 Abort Ratio of GSI / SI for Updates
[Plot annotations: t decreasing = fresher snapshot; SI better; GSI better; parameter t = snapshot age / transaction length]
28 GSI vs. SI - Summary
- GSI response times are better
- Read-only Ts ratio significantly better
- Update Ts ratio reaches ½
- GSI abort rate
- may be higher or lower
- Cost: observing older data in GSI
- Favorable trade-off
- Distributed environments
- Read-dominated workloads
29 Conclusions
- GSI is appealing for replication
- All Ts' read and write operations are local
- Read-only Ts never block or abort
- GSI can be made serializable
- Algorithm for GSI in replicated databases
- Analytical results are encouraging