Title: 157337 Database Development
1 157337 Database Development
3C13/D6
- Week 9
- Transaction Management
2 Topics
- Transaction Support
- Properties of transactions
- Database architecture
- Concurrency Control
- Needs
- Serializability and Recoverability
- Locking Methods
- Deadlock
- Timestamping methods
- Multiversion timestamp ordering
- Optimistic techniques
- Granularity of data items
- Database Recovery
- Needs
- Transaction and Recovery
- Recovery facilities
- Advanced Transaction Models
- Nested Transaction Model
- Sagas
- Multilevel Transaction Model
- Dynamic Restructuring
- Workflow Models
3 Transaction Support
- Transaction: an action, or series of actions, carried out by a user or application, which accesses or changes the contents of the database.
- A logical unit of work on the database.
- An application program can be thought of as a series of transactions with non-database processing in between.
- Can have one of two outcomes:
- Success: transaction commits and database reaches a new consistent state.
- Failure: transaction aborts, and database must be restored to the consistent state before it started. Such a transaction is rolled back or undone.
4 Transaction Support Example
A transaction must transform the database from one consistent state to another. In the example transaction (b), all elements in the database related to the member of staff who has left must be allocated a new staff number; otherwise the database will be in an inconsistent state (using a staff number that doesn't exist).
5 Transaction Support
- Committed transactions cannot be aborted. If the commit was a mistake, reverse it with a compensating transaction.
- An aborted transaction that is rolled back can be restarted later.
- DBMS has no inherent way of knowing which updates are grouped together to form a logical transaction:
- usually use the keywords BEGIN TRANSACTION, COMMIT, ROLLBACK to delimit transactions,
- or the DBMS treats the entire program as a single transaction.
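In SQL these delimiters appear as BEGIN ... COMMIT/ROLLBACK; a minimal sketch using Python's built-in sqlite3 module (the account table and amounts are invented for illustration):

```python
import sqlite3

# In-memory database; a hypothetical 'account' table stands in for real data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.commit()

# Successful transaction: the update commits on leaving the with-block.
with conn:  # BEGIN ... COMMIT (or ROLLBACK on exception)
    conn.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")

# Failed transaction: the exception triggers ROLLBACK, undoing the update.
try:
    with conn:
        conn.execute("UPDATE account SET balance = balance - 1000 WHERE id = 1")
        raise RuntimeError("business rule violated: balance would go negative")
except RuntimeError:
    pass

balance = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
print(balance)  # 70: the committed update survived, the rolled-back one did not
```

The `with conn:` block is sqlite3's shorthand for delimiting one logical transaction, matching the BEGIN/COMMIT/ROLLBACK keywords above.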
6 Transaction Support
State transition diagram for a transaction.
- FAILED: if the transaction cannot be committed, or if it is aborted while in the ACTIVE state (perhaps the user or the concurrency control protocol aborted it to ensure serializability).
- PARTIALLY COMMITTED: after the final statement is executed, it may be found that the transaction has violated serializability (see later) or integrity constraints, and would need to be aborted.
7 ACID Transaction properties
4.2 Transaction Support
- Atomicity: all or nothing.
- Responsibility of recovery subsystem.
- Consistency: must transform the database between consistent states.
- Responsibility of DBMS and application developers.
- Isolation: transactions execute independently of one another.
- Responsibility of concurrency control subsystem.
- Durability: effects of a committed transaction are not lost in a failure.
- Responsibility of recovery subsystem.
8 Database Architecture
- Transaction manager: coordinates transactions on behalf of application programs. Communicates with the scheduler.
- Scheduler (or lock manager): implements a particular concurrency control strategy. Maximizes concurrency and isolation to ensure integrity and consistency.
- Recovery manager: ensures the database is restored when a failure occurs during a transaction.
- Buffer manager: responsible for the transfer of data between disk storage and main memory.
9 Concurrency control: Needs
- Concurrency control: the process of managing simultaneous operations on the database without having them interfere with one another.
- A main objective of databases is to allow many users to share data concurrently.
- When one user updates a multi-user database, this could cause inconsistencies for others accessing the same data.
- Example: a system that handles input/output (I/O) operations independently, while performing CPU operations.
- Interleaving: executes the 1st transaction's CPU operations until an I/O is reached, then suspends the 1st transaction and executes the 2nd transaction. When the 2nd transaction reaches an I/O operation, control returns to the 1st transaction at the point where it was suspended. Achieves concurrent execution.
- Throughput, the amount of work accomplished in a given time interval, is improved as the CPU executes other transactions instead of waiting in an idle state for I/O operations to complete.
- Interleaving may produce an incorrect result, compromising integrity and consistency.
10 Concurrency control: Needs
- Three potential problems with consistency:
- 1. Lost update problem (T1 overwrites T2).
- Solution: prevent T1 from accessing the data until T2 has updated it.
- 2. Uncommitted dependency problem (T1 sees intermediate results of T2).
- Solution: prevent T1 reading the data until T2 commits or aborts.
- 3. Inconsistent analysis problem: dirty read or unrepeatable read (T2 sees data which T1 has not yet finished updating).
- Solution: prevent T2 reading the data until T1 has finished updating.
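The lost update problem can be sketched with plain variables standing in for database reads and writes (T1, T2 and the balance value are illustrative):

```python
# Sketch of the lost update problem: T1 and T2 both read balx, then both
# write back, and T2's write overwrites (loses) T1's update.
balx = 100

# Interleaved schedule: read(T1), read(T2), write(T1), write(T2)
t1_local = balx          # T1 reads 100
t2_local = balx          # T2 reads 100 (before T1 has written)
balx = t1_local + 100    # T1 adds 100 -> 200
balx = t2_local - 10     # T2 subtracts 10 from its stale copy -> 90

print(balx)  # 90, but any serial execution would give 190: T1's update is lost
```

Preventing T2's read until T1 has written (the solution above) would force the serial result of 190.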
11 Lost update problem
12 Uncommitted dependency problem
13 Inconsistent analysis problem
14 Inconsistent analysis problem contd.
15 Concurrency control: Serializability
- Objective of a concurrency control protocol is to schedule transactions in such a way as to avoid interference. Could run transactions serially, but this limits parallelism; some can execute together consistently.
- Schedule: a sequence of reads/writes by a set of concurrent transactions.
- Serial schedule: operations are executed consecutively without interleaved operations from other transactions. No guarantee that the results of all serial executions of a given set of transactions will be identical.
- Non-serial schedule: operations from a set of concurrent transactions are interleaved.
- Objective of serializability is to find non-serial schedules that allow concurrent execution without interference.
- Serializability identifies executions guaranteed to ensure consistency.
16 Concurrency control: Serializability
- In serializability, the ordering of reads/writes is important:
- (a) If two transactions only read a data item, they do not conflict and order is not important.
- (b) If two transactions either read or write completely separate data items, they do not conflict and order is not important.
- (c) If one transaction writes a data item and another reads or writes the same data item, the order of execution is important.
- Under the constrained write rule (a transaction updates a data item based on its old value, which is first read), use a precedence graph to test for serializability.
- There are two definitions of equivalence (and thus of serializability).
17 Concurrency control: Serializability
- VIEW EQUIVALENCE / SERIALIZABILITY
- View equivalence: if two schedules S1, S2 cause all transactions Ti to read the same values and make the same final writes, then S1 and S2 are view-equivalent.
- View serializability: S is view-equivalent to a serial schedule.
- CONFLICT EQUIVALENCE / SERIALIZABILITY
- Two operations conflict if they are issued by different transactions, operate on the same data item, and one of them is a write operation.
- Conflict equivalence: all conflicting operations have the same order.
- Conflict serializability: S is conflict-equivalent to a serial schedule.
18 Concurrency control: Serializability
- If the precedence graph contains a cycle, the schedule is not conflict serializable.
- In practice the DBMS doesn't test serializability; it uses protocols (to be discussed).
- Recoverability: enforce a schedule where, for each pair of transactions Ti and Tj, if Tj reads a data item previously written by Ti, then the commit operation of Ti precedes the commit operation of Tj.
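The precedence-graph test can be sketched as follows, assuming a schedule given as (transaction, operation, item) triples; the function names are invented for illustration:

```python
from collections import defaultdict

def precedence_graph(schedule):
    """Build edges Ti -> Tj for conflicting operations: same item, different
    transactions, at least one write, and Ti's operation comes first."""
    edges = defaultdict(set)
    for i, (ti, op1, x1) in enumerate(schedule):
        for tj, op2, x2 in schedule[i + 1:]:
            if ti != tj and x1 == x2 and "W" in (op1, op2):
                edges[ti].add(tj)
    return edges

def has_cycle(edges):
    """Depth-first search; a cycle means the schedule is not conflict serializable."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)
    def visit(node):
        colour[node] = GREY
        for nxt in edges[node]:
            if colour[nxt] == GREY or (colour[nxt] == WHITE and visit(nxt)):
                return True
        colour[node] = BLACK
        return False
    return any(colour[n] == WHITE and visit(n) for n in list(edges))

# Lost-update style schedule: T1 and T2 both read then write x -> cycle.
bad = [("T1", "R", "x"), ("T2", "R", "x"), ("T1", "W", "x"), ("T2", "W", "x")]
# Serial schedule: T1 completes before T2 -> no cycle.
good = [("T1", "R", "x"), ("T1", "W", "x"), ("T2", "R", "x"), ("T2", "W", "x")]
print(has_cycle(precedence_graph(bad)), has_cycle(precedence_graph(good)))  # True False
```

The `bad` schedule produces edges T1→T2 and T2→T1, i.e. a cycle, so no equivalent serial order exists.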
19 Locking methods
- Locking: a transaction uses locks to deny access to other transactions and so prevent incorrect updates.
- Most widely used approach to ensure serializability.
- Generally, a transaction must claim a shared (read) or exclusive (write) lock on a data item before a read or write.
- An exclusive lock prevents another transaction from modifying the item, or even reading it.
- Shared lock: if a transaction has a shared lock on an item, it can read the item but not update it.
- Exclusive lock: if a transaction has an exclusive lock on a data item, it can both read and update the item.
20 Locking methods: Two-phase locking (2PL)
- 2PL: a transaction follows the 2PL protocol if all of its locking operations precede its first unlock operation.
- Two phases for a transaction:
- Growing phase: acquires all locks but cannot release any locks.
- Shrinking phase: releases locks but cannot acquire any new locks.
- Rules:
- a transaction must acquire a lock on an item before operating on it,
- once a transaction releases a lock it can never acquire any new locks,
- upgrading of locks can only take place in the growing phase,
- downgrading of locks can only take place during the shrinking phase.
- 2PL can prevent the lost update problem etc.
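The 2PL rules above can be sketched as a small guard class (lock modes and an actual lock table are omitted; all names are illustrative):

```python
# Minimal sketch of the 2PL rule: once a transaction releases any lock
# (entering its shrinking phase), it may never acquire another lock.
class TwoPhaseError(Exception):
    pass

class Transaction:
    def __init__(self, name):
        self.name = name
        self.locks = set()
        self.shrinking = False  # flips to True on the first unlock

    def lock(self, item):
        if self.shrinking:
            raise TwoPhaseError(f"{self.name}: cannot lock {item} after unlocking")
        self.locks.add(item)

    def unlock(self, item):
        self.shrinking = True   # first unlock ends the growing phase
        self.locks.discard(item)

t = Transaction("T1")
t.lock("balx")
t.lock("baly")      # growing phase: fine
t.unlock("balx")    # shrinking phase begins
try:
    t.lock("balz")  # violates 2PL
except TwoPhaseError as e:
    print(e)        # T1: cannot lock balz after unlocking
```

A real scheduler would also track lock modes (shared/exclusive) and block conflicting requesters; only the two-phase ordering rule is shown here.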
21 Deadlock
- Deadlock: an impasse that may result when two (or more) transactions are each waiting for locks held by the other to be released.
- Once it occurs, the applications involved cannot resolve the problem; the DBMS has to recognize it and break it.
- Only one way to break a deadlock: abort one or more of the transactions or applications involved.
- Three general techniques for handling deadlock:
- Timeouts: a lock request only waits for a certain amount of time, after which the transaction is aborted and restarted. Very simple and practical.
- Deadlock prevention: order transactions using timestamps. Wait-die or wound-wait algorithms.
- Deadlock detection and recovery: the DBMS allows deadlock to occur but recognizes it and breaks it. Construct a wait-for graph (WFG).
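Deadlock detection on a wait-for graph can be sketched as follows, assuming for simplicity that each transaction waits for at most one other (all names are illustrative):

```python
# Sketch of deadlock detection via a wait-for graph (WFG): Ti has an edge
# to Tj when Ti is waiting for a lock that Tj holds. A cycle in the WFG
# means deadlock; the DBMS then picks a victim transaction to abort.
def find_deadlock(waits_for):
    """waits_for: dict mapping each transaction to the one it waits for.
    Follow edges from every start node; revisiting a node already on the
    current path means a cycle. Returns one cycle as a list, or None."""
    for start in waits_for:
        path, node = [], start
        while node is not None:
            if node in path:
                return path[path.index(node):]  # the cycle itself
            path.append(node)
            node = waits_for.get(node)
    return None

# T1 waits for T2 and T2 waits for T1 (classic two-transaction deadlock);
# T3 waits for T1 but is not itself part of the cycle.
print(find_deadlock({"T1": "T2", "T2": "T1", "T3": "T1"}))  # ['T1', 'T2']
print(find_deadlock({"T3": "T1"}))  # None
```

A full lock manager allows a transaction to wait on several holders at once, in which case the general cycle search from the precedence-graph slide applies.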
22 Recovery from Deadlock Detection
- Several issues:
- Choice of deadlock victim: the choice of which transaction to abort may not be clear. Abort the T that incurs minimal cost. Consider:
- how long T has been running,
- how many data items have been updated by the T,
- how many data items T still has left to update.
- How far to roll a transaction back: undoing all changes T made is the simplest solution, but not necessarily the most efficient. It may be possible to resolve the deadlock whilst only partly rolling back.
- Avoiding starvation: starvation occurs when the same T is always chosen as the victim (similar to livelock). Avoid by storing a counter of the number of times T has been selected.
23 Timestamping Methods
- Timestamp: a unique identifier created by the DBMS that indicates the relative starting time of a transaction.
- A read/write proceeds only if the last update on that data item was carried out by an older transaction.
- Otherwise, the transaction requesting the read/write is restarted and given a new timestamp.
- Also timestamps for data items:
- read-timestamp: timestamp of the last transaction to read the item,
- write-timestamp: timestamp of the last transaction to write the item.
24 Basic timestamp ordering
- Consider a transaction T with timestamp ts(T):
- T wants to read x.
- If ts(T) < write_timestamp(x):
- x already updated by a younger (later) transaction,
- transaction must be aborted and restarted with a new timestamp.
- Otherwise proceed, and set read_timestamp(x) = max(ts(T), read_timestamp(x)).
25 Basic timestamp ordering
- Consider a transaction T with timestamp ts(T):
- T wants to write x.
- If ts(T) < read_timestamp(x):
- x already read by a younger transaction,
- roll back transaction and restart it using a later timestamp.
- If ts(T) < write_timestamp(x):
- x already written by a younger transaction,
- the write can safely be ignored: the ignore obsolete write rule.
- Otherwise proceed, and set write_timestamp(x) = max(ts(T), write_timestamp(x)).
- Guarantees transactions are conflict serializable.
- Does not guarantee recoverable schedules.
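The read and write rules from these two slides can be sketched together (a minimal sketch: timestamps are plain integers and restart logic is left to the caller; all names are illustrative):

```python
# Sketch of basic timestamp ordering. Each item tracks a read-timestamp and
# a write-timestamp; "abort" means restart the transaction with a new ts.
class Item:
    def __init__(self):
        self.read_ts = 0
        self.write_ts = 0

def read(ts, item):
    if ts < item.write_ts:
        return "abort"   # item already updated by a younger transaction
    item.read_ts = max(ts, item.read_ts)
    return "ok"

def write(ts, item):
    if ts < item.read_ts:
        return "abort"   # item already read by a younger transaction
    if ts < item.write_ts:
        return "ignore"  # ignore obsolete write rule: safely skip the write
    item.write_ts = max(ts, item.write_ts)
    return "ok"

x = Item()
print(write(5, x))  # ok: write_timestamp(x) becomes 5
print(read(3, x))   # abort: an older transaction cannot read x now
print(read(8, x))   # ok: read_timestamp(x) becomes 8
print(write(6, x))  # abort: x already read by a younger transaction (ts 8)
```

Note the asymmetry: an obsolete write is silently skipped, but an obsolete read or a write after a younger read forces a restart.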
26 Multiversion timestamp ordering
- Versioning of data can be used to increase concurrency.
- The basic timestamp ordering protocol assumes only one version of each data item exists.
- Can allow multiple transactions to read and write different versions of the same data item.
- In multiversion concurrency control, each write operation creates a new version of a data item while retaining the old version.
- When a transaction attempts to read a data item, the system selects one version that ensures serializability.
- Versions can be deleted once they are no longer required.
27 Optimistic Techniques
- Optimistic techniques assume that conflict is rare. No locking is required, so greater concurrency.
- At commit, a check is made to determine whether a conflict has occurred.
- If there is a conflict, the transaction must be rolled back and restarted.
28 Optimistic Techniques
- Three phases:
- 1. Read
- Extends from start until immediately before commit.
- Values from the database are stored in local variables. Updates are applied to a local copy of the data.
- 2. Validation
- For a read-only transaction, checks that the data read are still the current values. If no interference, commit; else abort and restart.
- For an update transaction, checks that the transaction leaves the database in a consistent state, with serializability maintained.
- 3. Write
- Follows a successful validation phase for update transactions.
- Updates made to the local copy are applied to the database.
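One common way to realise the validation phase is version checking; this sketch (invented names, a dict standing in for the database) tags each item with a version number recorded at read time:

```python
# Sketch of optimistic concurrency: reads record the version seen, writes go
# to a local copy, and validation at commit checks nothing changed underneath.
db = {"balx": (100, 0), "baly": (50, 0)}  # item -> (value, version)

class OptimisticTxn:
    def __init__(self):
        self.read_versions = {}  # item -> version seen during the read phase
        self.local_writes = {}   # item -> new value (local copy only)

    def read(self, item):
        value, version = db[item]
        self.read_versions[item] = version
        return value

    def write(self, item, value):
        self.local_writes[item] = value  # read phase: update local copy only

    def commit(self):
        # Validation phase: every item we read must still be at that version.
        for item, seen in self.read_versions.items():
            if db[item][1] != seen:
                return False  # conflict: abort and let the caller restart
        # Write phase: apply local copies, bumping each item's version.
        for item, value in self.local_writes.items():
            db[item] = (value, db[item][1] + 1)
        return True

t1 = OptimisticTxn()
t1.write("balx", t1.read("balx") + 10)
t2 = OptimisticTxn()
t2.write("balx", t2.read("balx") - 5)
print(t1.commit())  # True: nothing touched balx since t1 read it
print(t2.commit())  # False: t1's commit changed balx, so t2 must restart
```

The failed `t2` would simply re-run its read phase against the new value of `balx` and try again, which is cheap when conflicts are rare.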
29 Granularity of Data
- Granularity: the size of the data items chosen as the unit of protection (i.e. for locking) by the concurrency control protocol.
- Ranging from coarse to fine:
- the entire database,
- a file,
- a page (or area or database space),
- a record,
- a field value of a record.
- The granularity of the data item that can be locked in a single operation has a significant effect on concurrency performance.
34 Hierarchy of granularity
- Could represent granularity of locks in a hierarchical structure.
- Root node represents the entire database, level-1 nodes represent files, etc.
- When a node is locked, all its descendants are also locked.
- DBMS should check the hierarchical path before granting a lock.
- An intention lock could be used to lock all ancestors of a node that is to be locked.
- Intention locks can be read or write. Applied top-down, released bottom-up.
- To ensure serializability, 2PL is used as follows:
- no lock can be granted once any node has been unlocked,
- no node may be locked until its parent is locked by an intention lock,
- no node may be unlocked until all its descendants are unlocked.
35 Database Recovery
- Database recovery: the process of restoring the database to the correct state in the event of a failure.
- Earlier we introduced the concept that the DBMS needs to provide a service to ensure consistency in the event of failure.
- How to provide this service?
36 Database Recovery: Needs
Data storage includes four different types of media (reliability increases downwards):
- 1. Main memory: volatile. Usually does not survive system crashes. (Primary storage.)
- 2. Magnetic disk: online non-volatile. More reliable, cheaper, but slower by 3-4 orders of magnitude.
- 3. Magnetic tape: offline non-volatile. More reliable, fairly inexpensive, slower, only sequential access.
- 4. Optical disk: more reliable, generally cheaper, faster, providing random access.
(2-4 are secondary storage.)
37 Database Recovery: Needs
- Stable storage: information has been replicated in several non-volatile storage media (usually disk), e.g. simulate stable storage with RAID.
- CAUSES OF FAILURE:
- system crashes,
- media failure,
- application software errors,
- natural physical disasters,
- carelessness,
- sabotage.
- TWO PRINCIPAL EFFECTS:
- loss of main memory,
- loss of disk copy of database.
- How to minimize these effects?
38 Transactions and Recovery
- Transactions (T) represent the basic unit of recovery.
- Role of the Recovery Manager (RM): to ensure atomicity and durability (ACID) in the event of failures.
- If failure occurs between commit and database buffers being flushed to secondary storage:
- to ensure durability, RM must redo (rollforward) the transaction's updates.
- If the transaction had not committed at failure time:
- to ensure atomicity, RM must undo (rollback) any effects of that transaction.
- Partial undo: only one T has to be undone. Global undo: all active Ts have to be undone.
39 Transactions and Recovery
Example: UNDO/REDO
DBMS starts at time t0, but fails at time tf. Assume data for transactions T2 and T3 have been written to secondary storage.
- T1 and T6 have to be undone (not committed at crash time).
- In the absence of any other information, RM has to redo T2, T3, T4, and T5.
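The undo/redo decision can be sketched from a log of begin/commit events; the log below reproduces the slide's scenario, with T1 and T6 still active at the crash (names are illustrative):

```python
# Sketch of the undo/redo decision at crash time: committed transactions are
# redone (durability), transactions still active are undone (atomicity).
def recovery_lists(log):
    active, committed = set(), set()
    for event, t in log:
        if event == "begin":
            active.add(t)
        elif event == "commit":
            active.discard(t)
            committed.add(t)
    return sorted(committed), sorted(active)  # (redo list, undo list)

# The slide's scenario: T1 and T6 still active at the crash, T2-T5 committed.
log = [("begin", "T1"), ("begin", "T2"), ("begin", "T3"), ("commit", "T2"),
       ("begin", "T4"), ("commit", "T3"), ("begin", "T5"), ("commit", "T4"),
       ("begin", "T6"), ("commit", "T5")]  # crash happens here
redo, undo = recovery_lists(log)
print(redo, undo)  # ['T2', 'T3', 'T4', 'T5'] ['T1', 'T6']
```

With the checkpoint information from the later slide, the redo list shrinks to the transactions not already flushed at the checkpoint.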
40 Transactions and Recovery
- Buffer manager (BM):
- plays an important role in the recovery process,
- responsible for efficient database buffer management,
- used to transfer pages to and from secondary storage.
- Replacement strategies decide which buffer(s) to force-write to disk to make space, e.g. first-in-first-out (FIFO) and least recently used (LRU).
- Steal policy: allows the BM to write a buffer to disk before T commits ("steals" a page from T). Alternative is no-steal.
- Force policy: ensures all pages updated by T are immediately written to disk when it commits. Alternative is no-force.
- Simplest: use a no-steal, force policy (no undo if T aborts, no redo after a crash).
- Most DBMSs use a steal, no-force policy (smaller buffer space, fewer rewrites).
41 Recovery Facilities
- DBMS should provide the following facilities to assist with recovery:
- Backup mechanism: makes periodic backup copies of the database regularly without stopping the system.
- Can be complete or incremental (changes since the last complete backup only).
- Stored on offline storage such as magnetic tape.
- Logging facilities: keep track of the current state of transactions and database changes.
- Checkpoint facility: enables in-progress updates to the database to be made permanent.
- Recovery manager: allows the DBMS to restore the database to a consistent state following a failure.
42 Recovery Facilities
- Log file (or journal):
- keeps track of database transactions; contains info about all updates to the database.
- May contain:
- Transaction records:
- T identifier,
- type of log record,
- identifier of data item,
- before-image of data,
- after-image of data,
- log management info.
- Checkpoint records (described shortly).
43 Recovery Facilities
Example
- Log duplexed or triplexed.
- Stored online on a fast direct-access storage device.
- Old files transferred to offline storage if there is a vast amount of logging info.
- The log file is a potential bottleneck.
44 Recovery Facilities
- Checkpoint:
- a point of synchronization between database and log file. All buffers are force-written to secondary storage.
- Scheduled at predetermined points.
- Involves:
- writing all log records in main memory to secondary storage,
- writing modified blocks in the database buffers to secondary storage,
- writing a checkpoint record to the log file (listing all Ts active at the time of the checkpoint).
- Relatively inexpensive operation in terms of time overhead.
45 Recovery Facilities
Example: UNDO/REDO with checkpointing
DBMS starts at time t0, but fails at time tf. A checkpoint occurred at time tc.
- T1 and T6 have to be undone (not committed at crash time).
- RM need not redo T2, T3.
- RM has to redo T4 and T5.
46 Recovery Techniques
- If the database has been damaged:
- need to restore the last backup copy of the database,
- reapply updates of committed transactions using the log file.
- If the database is only inconsistent:
- need to undo the changes that caused the inconsistency,
- may also need to redo some transactions to ensure their updates reach secondary storage,
- do not need a backup; can restore the database using the before- and after-images in the log file.
- Three main recovery techniques:
- Deferred Update
- Immediate Update
- Shadow Paging
47 Deferred Update
- Updates are not written to the database (or buffers) until after a T has reached its commit point.
- If T fails before commit, it will not have modified the database and so no undoing of changes is required.
- May be necessary to redo the updates of committed Ts, as their effect may not have reached the database.
48 Immediate Update
- Updates are applied to the database (via buffers) as they occur.
- May need to redo the updates of committed Ts following a failure.
- May need to undo the effects of Ts that had not committed at the time of failure.
- Essential that log records are written before the write to the database (write-ahead log protocol).
- If there is no transaction commit record in the log, then that T was active at failure and must be undone.
- Undo operations are performed in the reverse order to that in which they were written.
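A minimal write-ahead logging sketch, assuming a dict as the database and a list as the log (all names and values are illustrative):

```python
# Sketch of immediate update with write-ahead logging: the log record (with
# before- and after-images) is appended BEFORE the database is changed, so
# uncommitted updates can always be undone from their before-images.
db = {"balx": 100}
log = []  # entries: (txn, item, before_image, after_image) or (txn, "COMMIT", ...)

def wal_write(txn, item, new_value):
    log.append((txn, item, db[item], new_value))  # log first (WAL protocol)
    db[item] = new_value                          # then update the database

def wal_commit(txn):
    log.append((txn, "COMMIT", None, None))

def recover():
    committed = {t for (t, item, *_) in log if item == "COMMIT"}
    # Undo uncommitted transactions, in reverse order of their log records.
    for txn, item, before, _after in reversed(log):
        if item != "COMMIT" and txn not in committed:
            db[item] = before

wal_write("T1", "balx", 150)
wal_commit("T1")
wal_write("T2", "balx", 999)  # T2 never commits before the "crash"
recover()
print(db["balx"])  # 150: T1's committed update survives, T2's is undone
```

A full recovery manager would also redo committed updates whose pages never reached disk; here the dict plays both roles, so only the undo pass is shown.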
49 Shadow Paging
- Maintain two page tables during the life of a transaction: the current page table and the shadow page table.
- When the transaction starts, the two page tables are the same.
- The shadow page table is never changed thereafter. It is used to restore the database in the event of failure.
- During T, the current page table records all updates to the database.
- When the transaction completes, the current page table becomes the shadow page table.
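A minimal shadow-paging sketch, assuming a dict of pages and page tables mapping item names to page ids (all names and values are illustrative):

```python
# Sketch of shadow paging: the shadow page table is frozen at transaction
# start, writes go to brand-new pages via the current page table, and
# commit is just a table swap. Abort simply discards the current table.
pages = {0: "old balx", 1: "old baly"}  # page_id -> contents
shadow_table = {"balx": 0, "baly": 1}   # item -> page_id

def run_transaction(writes, commit=True):
    global shadow_table
    current_table = dict(shadow_table)  # starts identical to the shadow table
    for item, value in writes.items():
        new_page = max(pages) + 1       # each write goes to a brand-new page
        pages[new_page] = value
        current_table[item] = new_page  # the shadow table is never touched
    if commit:
        shadow_table = current_table    # commit: current becomes the shadow

run_transaction({"balx": "new balx"}, commit=False)  # failure: just discard
print(pages[shadow_table["balx"]])  # old balx: shadow table still intact
run_transaction({"balx": "new balx"})                # success
print(pages[shadow_table["balx"]])  # new balx
```

Because the shadow table never changes during the transaction, recovery after a crash needs no undo or redo: the shadow table simply remains authoritative.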
50 Advanced Transaction Models
- Protocols considered so far are suitable for the types of Ts that arise in traditional business applications.
- Advanced Ts are characterized by:
- data has many types, each with a small number of instances,
- designs may be very large,
- design is not static but evolves through time,
- updates are far-reaching,
- cooperative engineering.
Look at five advanced transaction models:
- Nested Transaction Model
- Sagas
- Multi-level Transaction Model
- Dynamic Restructuring
- Workflow Models
51 Nested Transaction Model
- Nested transaction model: a T is viewed as a collection of subtransactions (subTs), which may themselves contain any number of subTs (Moss, 1981).
- Only leaf-level subTs are allowed to perform database operations.
- Ts have to commit from the bottom upwards.
- However, a T abort at one level does not have to affect a T in progress at a higher level.
- Advantages:
- Modularity: a T can be decomposed into a number of subTs for purposes of concurrency and recovery.
- Finer level of granularity for concurrency control and recovery.
- Intra-transaction parallelism.
- Intra-transaction recovery control.
52 Nested Transaction Model
- A parent is allowed to perform its own recovery by either:
- retrying the subT,
- ignoring the failure, in which case the subT is non-vital,
- running a contingency subT (an alternative subT),
- aborting.
- Updates of committed subTs at intermediate levels are visible only within the scope of their immediate parents.
- Further, the commit of a subT is conditionally subject to the commit or abort of its superiors.
- Using this model, top-level Ts conform to the traditional ACID properties of a flat transaction.
53 Emulating Nested Ts using Savepoints
- Savepoint: an identifiable point in a flat transaction representing some partially consistent state.
- Can be used as a restart point for the transaction if a subsequent problem is detected.
- During the execution of a transaction, the user can establish a savepoint, which the user can later roll the transaction back to.
- Unlike nested transactions, savepoints do not support any form of intra-transaction parallelism.
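SQLite supports savepoints directly; a minimal sketch using Python's built-in sqlite3 module (the staff table and names are invented for illustration):

```python
import sqlite3

# Savepoints in SQL: roll part of a flat transaction back without losing
# the work done before the savepoint was established.
conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # manage the transaction manually
cur = conn.cursor()
cur.execute("CREATE TABLE staff (name TEXT)")
cur.execute("BEGIN")
cur.execute("INSERT INTO staff VALUES ('Ann')")
cur.execute("SAVEPOINT sp1")                  # a partially consistent state
cur.execute("INSERT INTO staff VALUES ('Bob')")
cur.execute("ROLLBACK TO SAVEPOINT sp1")      # undo only the work after sp1
cur.execute("COMMIT")
names = [row[0] for row in cur.execute("SELECT name FROM staff")]
print(names)  # ['Ann']: Bob's insert was rolled back, Ann's survived
```

The rollback is sequential and in-place, which is exactly why savepoints give no intra-transaction parallelism: there is still only one thread of control inside the flat transaction.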
54 Sagas
- Saga: a sequence of (flat) Ts that can be interleaved with other Ts (Garcia-Molina & Salem, 1987).
- DBMS guarantees that either all Ts in the saga are successfully completed, or compensating Ts are run to undo the partial execution.
- A saga has only one level of nesting.
- For every subT defined, there is a corresponding compensating T that will semantically undo the subT's effect.
- Relaxes the isolation property by allowing a saga to reveal its partial results to other concurrently executing Ts before it completes.
- Useful when subTs are relatively independent and compensating Ts can be produced.
- Sometimes difficult to define a compensating T in advance, so the DBMS may need to interact with the user to determine compensation.
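A saga can be sketched as a list of (action, compensation) pairs; on failure the compensations for the completed steps run in reverse (all names are illustrative):

```python
# Sketch of a saga: each step has a compensating action that semantically
# undoes it. If a step fails, compensations run in reverse for the steps
# that had already completed.
def run_saga(steps):
    """steps: list of (action, compensation) pairs; actions may raise."""
    done = []
    try:
        for action, compensation in steps:
            action()
            done.append(compensation)
    except Exception:
        for compensation in reversed(done):  # undo the partial execution
            compensation()
        return "compensated"
    return "completed"

booked = []
def book_hotel():
    raise RuntimeError("hotel full")  # this subT fails

saga = [
    (lambda: booked.append("flight"), lambda: booked.remove("flight")),
    (book_hotel,                      lambda: None),
]
print(run_saga(saga), booked)  # compensated []: the flight booking was undone
```

Note that the flight booking was visible to other transactions between its commit and its compensation, which is exactly the relaxation of isolation the slide describes.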
55 Multilevel Transaction Model
- Closed nested transaction: atomicity enforced at the top level.
- Open nested transaction: allows partial results of subTs to be seen outside the transaction, e.g. a saga.
- Multi-level transaction model: a tree of subTs that is balanced (an open nested transaction).
- Nodes at the same depth of the tree correspond to operations at the same level of abstraction in the DBMS.
- Edges represent the implementation of an operation by a sequence of operations at the next lower level.
- A traditional flat T ensures no conflicts at the lowest level (L0).
- In the multi-level model, two operations at level Li may not conflict even though their implementations at the next lower level Li-1 do.
- Example:
- T7 consists of T71, which increases balx by 5, and T72, which subtracts 5 from baly.
- T8 consists of T81, which increases baly by 10, and T82, which subtracts 2 from balx.
- As addition and subtraction commute, these subTs can be executed in any order, and the correct result will always be generated.
56 Dynamic Restructuring
- To address constraints imposed by the ACID properties of flat Ts, two new operations have been proposed.
- 1. Split-transaction:
- splits a T into two serializable Ts and divides its actions and resources (for example, locked data items) between the new Ts,
- the resulting Ts proceed independently,
- allows partial results of a transaction to be shared, while preserving its semantics,
- can be applied only when it is possible to generate two serializable Ts.
- Conditions that permit a T to be split into A and B are:
- AWriteSet ∩ BWriteSet ⊆ BWriteLast (if both A and B write to the same object, B's writes must follow A's writes),
- AReadSet ∩ BWriteSet = ∅ (A cannot see any results from B),
- BReadSet ∩ AWriteSet = ShareSet (B may see the results of A).
- These guarantee that A is serialized before B. If A aborts, B must also abort.
- If BWriteLast and ShareSet are both empty, then A and B can be serialized in any order and both can be committed independently.
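The three split conditions translate directly into Python set operations; the read/write sets below are invented for illustration:

```python
# Sketch of checking the split-transaction conditions as set operations.
def can_split(a_read, a_write, b_read, b_write, b_write_last, share):
    return (a_write & b_write <= b_write_last  # B's shared writes come last
            and not (a_read & b_write)         # A sees no results from B
            and b_read & a_write == share)     # B sees A's results via ShareSet

# A writes balx and baly; B rewrites baly afterwards and reads balx from A.
ok = can_split(a_read={"balz"}, a_write={"balx", "baly"},
               b_read={"balx"}, b_write={"baly"},
               b_write_last={"baly"}, share={"balx"})
print(ok)  # True: A can be serialized before B
```

If instead A read an item that B writes, the second condition fails and the split is not permitted, since A could then observe B's results.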
57 Dynamic Restructuring
- To address constraints imposed by the ACID properties of flat Ts, two new operations have been proposed.
- 2. Join-transaction:
- performs the reverse operation, merging the ongoing work of two or more independent transactions as though they had always been a single transaction.
- Main advantages of dynamic restructuring are:
- adaptive recovery,
- reducing isolation.
58 Workflow Models
- It has been argued that the previous models are still not powerful enough to model some business activities.
- More complex models have been proposed: combinations of open and nested transactions.
- These hardly conform to any of the ACID properties, hence the term workflow model.
- Workflow: an activity involving the coordinated execution of multiple tasks performed by different processing entities (people or software systems).
- Two general problems involved in workflow systems:
- specification of the workflow,
- execution of the workflow.
- Both problems are complicated by the fact that many organizations use multiple, independently managed systems to automate different parts of the process.
59 Concurrency Control and Recovery in Oracle
- Uses a multiversion read consistency protocol: guarantees each user sees a consistent view.
- Oracle places no locks on read operations (a read never blocks a write).
- Oracle implements levels 2 and 4 of the four SQL isolation levels:
- Read uncommitted (fastest)
- Read committed
- Repeatable read
- Serializable (the highest level, "correct"):
- uses row-level locking,
- waits if a T tries to change a row updated by an uncommitted T.
- Oracle refers to a relation as a table with columns and rows.
60 Multiversion Read Consistency
- Rollback segments: store undo information. There are also redo logs.
- System change number (SCN): a logical timestamp that records the order in which operations occur. Oracle stores the SCN in the redo log. Used to determine:
- which version to use within a T,
- when to clean out info from rollback segments.
- Locks: implicit locking; the user never needs to explicitly lock resources, but can do so manually.
- Default locking mechanisms lock data at the lowest level of restrictiveness:
- guarantees integrity,
- allows the highest degree of concurrency.
- Deadlock detection: automatically detects deadlocks and rolls back one of the statements involved.
61 Multiversion Read Consistency
- Backup and recovery:
- Recovery Manager (RMAN): server-managed backup and recovery. Includes facilities to:
- back up one or more datafiles to disk or tape,
- back up archived redo logs to disk or tape,
- restore datafiles from disk or tape,
- restore and apply archived redo logs to perform recovery.
- Instance recovery: when an Oracle instance is restarted after failure, it detects the crash from file headers and the control file. Recovers from the redo log files using rollforward and rollback. Allows checkpoints (set the interval to 0 to disable).
- Point-in-time recovery: allows one or more tablespaces to be restored to a particular point (SCN).
- Standby database: used in the event of primary database failure. Kept at an alternative location. Can be opened for read-only access too.
62 Questions?
- Next week:
- Security & Hacking