Transactions and Wrap-Up - PowerPoint PPT Presentation

About This Presentation
Title:

Transactions and Wrap-Up

Description:

Sometimes we can tolerate dirty reads; other times we cannot: ... Forbids the reading of dirty (uncommitted) data, but allows a transaction T to ... – PowerPoint PPT presentation

Number of Views:181
Avg rating:3.0/5.0
Slides: 38
Provided by: zack4
Category:

less

Transcript and Presenter's Notes

Title: Transactions and Wrap-Up


1
Transactions and Wrap-Up
  • Zachary G. Ives
  • University of Pennsylvania
  • CIS 550 Database Information Systems
  • December 9, 2004

Some slide content derived from Ramakrishnan
Gehrke
2
Reminders
  • Please be sure youre signed up for a project
    demo
  • Due at that time 8-15 page report describing
  • What your project goals were
  • What you implemented
  • Basic architecture and design
  • Division of labor
  • And the code!
  • Also please email me an assessment of how well
    your group worked group members contributions
    your contributions
  • Final examination Dec. 17th, Meyerson Hall B3,
    830AM

3
Recall Good Executions
  • An execution is good if it is serial
    (transactions are executed atomically and
    consecutively) or serializable (i.e. equivalent
    to some serial execution)
  • Equivalent to executing Deposit 1 then 3, or vice
    versa

Deposit 1 Deposit
3 read(X.bal)
read(Y.bal) X.bal
X.bal 50
Y.bal Y.bal 10 write(X.bal)

write(Y.bal)
4
Atomicity
  • Problems can also occur if a crash occurs in the
    middle of executing a transaction
  • Need to guarantee that the write to X does not
    persist (ABORT)
  • Default assumption if a transaction doesnt commit

5
Transactions in SQL
  • A transaction begins when any SQL statement that
    queries the db begins.
  • To end a transaction, the user issues a COMMIT or
    ROLLBACK statement.

Transfer UPDATE Accounts SET balance
balance - 100 WHERE account 1234 UPDATE
Accounts SET balance balance 100 WHERE
account 5678 COMMIT
6
Read-Only vs. Read-Write Transactions
  • We can tell the DBMS that we wont be performing
    any updates (What does this allow the DBMS to
    do?)
  • If we are going to modify the DBMS, we need

SET TRANSACTION READ ONLY SELECT FROM
Accounts WHERE account1234
SET TRANSACTION READ WRITE UPDATE Accounts SET
balance balance - 100 WHERE account 1234
...
7
Dirty Reads
  • Dirty data is data written by an uncommitted
    transaction a dirty read is a read of dirty data
    (WR conflict)
  • Sometimes we can tolerate dirty reads other
    times we cannot
  • e.g., if we wished to ensure balances never went
    negative in the transfer example, we should test
    that there is enough money first!

8
Bad Dirty Read
EXEC SQL select balance into bal
from Accounts where
account1234 if (bal gt 100) EXEC SQL
update Accounts set balance
balance - 100 where account
1234 EXEC SQL update Accounts
set balance balance 100
where account 5678 EXEC SQL COMMIT
If the initial read (italics) were dirty, the
balance could become negative!
9
Acceptable Dirty Read
  • If we are just checking availability of an
    airline seat, a dirty read might be fine! (Why is
    that?)
  • Reservation transaction

EXEC SQL select occupied into occ
from Flights where Num
123 and date11-03-99
and seat23f if (!occ) EXEC SQL
update Flights set
occupiedtrue where Num 123 and
date11-03-99 and
seat23f else notify user that seat is
unavailable
10
Other Undesirable Phenomena
  • Unrepeatable read a transaction reads the same
    data item twice and gets different values (RW
    conflict)
  • Phantom problem a transaction retrieves a
    collection of tuples twice and sees different
    results

11
Phantom Problem Example
  • T1 find the students with best grades who Take
    either cis550-f03 or cis570-f04
  • T2 insert new entries for student 1234 in the
    Takes relation, with grade A for cis570-f04 and
    cis550-f03
  • Suppose that T1 consults all students in the
    Takes relation and finds the best grades for
    cis550-f03
  • Then T2 executes, inserting the new student at
    the end of the relation, perhaps on a page not
    seen by T1
  • T1 then completes, finding the students with best
    grades for cis570-f04 and now seeing student 1234

12
Isolation
  • The problems weve seen are all related to
    isolation
  • General rules of thumb w.r.t. isolation
  • Fully serializable isolation is more expensive
    than no isolation
  • We cant do as many things concurrently (or we
    have to undo them frequently)
  • For performance, we generally want to specify the
    most relaxed isolation level thats acceptable
  • Note that were slightly violating a
    correctness constraint to get performance!

13
Specifying Acceptable Isolation Levels
  • The default isolation level is SERIALIZABLE (as
    for the transfer example)
  • To signal to the system that a dirty read is
    acceptable,

SET TRANSACTION READ WRITE ISOLATION LEVEL READ
UNCOMMITTED
14
READ COMMITTED
  • Forbids the reading of dirty (uncommitted) data,
    but allows a transaction T to issue the same
    query several times and get different answers
  • No value written by T can be modified until T
    completes
  • For example, the Reservation example could also
    be READ COMMITTED the transaction could
    repeatably poll to see if the seat was available,
    hoping for a cancellation

15
REPEATABLE READ
  • What it is NOT a guarantee that the same query
    will get the same answer!
  • However, if a tuple is retrieved once it will be
    retrieved again if the query is repeated
  • For example, suppose Reservation were modified to
    retrieve all available seats
  • If a tuple were retrieved once, it would be
    retrieved again (but additional seats may also
    become available)

16
Summary of Isolation Levels
Level Dirty Read
Unrepeatable Read Phantoms READ
UN- Maybe Maybe Maybe COMMITTED READ
No Maybe Maybe COMMITTED REPEATABLE No
No Maybe READ SERIALIZABLE No No No
17
Implementing Isolation Levels
  • One approach use locking at some level
  • each data item is either locked (in some mode,
    e.g. shared or exclusive) or is available (no
    lock)
  • an action on a data item can be executed if the
    transaction holds an appropriate lock
  • consider granularity of locks how big of an
    item to lock
  • Larger granularity fewer locking operations but
    more contention!
  • tuple, page, table, etc.
  • Appropriate locks
  • Before a read, a shared lock must be acquired
  • Before a write, an exclusive lock must be acquired

18
Lock Compatibility Matrix
  • Locks on a data item are granted based on a lock
    compatibility matrix
  • When a transaction requests a lock, it must wait
    (block) until the lock is granted

19
Locks Prevent Bad Execution
  • If the system used locking, the first bad
    execution could have been avoided

Deposit 1 Deposit
2 xlock(X) read(X.bal)
xlock(X)
is not granted X.bal X.bal 50
write(X.bal) release(X)
xlock(X)
read(X.bal)
X.bal X.bal 10

write(X.bal)
release(X)
20
Lock Types and Read/Write Modes
  • When we specify read-only, the system only uses
    shared-mode locks
  • Any transaction that attempts to update will be
    illegal
  • When we specify read-write, the system may also
    acquire locks in exclusive mode
  • Obviously, we can still query in this mode

21
Isolation Levels and Locking
  • Always update with exclusive lock held to end of
    transaction
  • READ UNCOMMITTED
  • read data without acquiring any lock
  • READ COMMITTED
  • read data lock all tuples read, immediately
    release locks
  • REPEATABLE READ
  • read data grab shared lock on all tuples read,
    hold to end of transaction
  • SERIALIZABLE
  • read data grab shared lock on all tuples read
    and the index, hold to end of transaction
  • Holding locks to the end of a transaction is
    called strict locking

22
Theory of Serializability
  • A schedule of a set of transactions is a linear
    ordering of their actions
  • e.g. for the simultaneous deposits example
  • R1(X.bal) R2(X.bal) W1(X.bal) W2(X.bal)
  • A serial schedule is one in which all the steps
    of each transaction occur consecutively
  • A serializable schedule is one which is
    equivalent to some serial schedule (i.e. given
    any initial state, the final state is the same as
    one produced by some serial schedule)
  • The example above is neither serial nor
    serializable

23
Questions to Address
  • Given a schedule S, is it serializable?
  • How can we "restrict" transactions in progress to
    guarantee that only serializable schedules are
    produced?

24
When Actions Conflict
  • Consider a schedule S in which there are two
    consecutive actions Ii and Ij of transactions Ti
    and Tj respectively
  • If Ii and Ij refer to different data items, then
    swapping Ii and Ij does not matter
  • If Ii and Ij refer to the same data item Q, then
    swapping Ii and Ij matters if and only if one of
    the actions is a write
  • Ri(Q) Wj(Q) produces a different final value for
    Q than Wj(Q) Ri(Q)

25
Testing for Serializability
  • Given a schedule S, we can construct a di-graph
    G(V,E) called a precedence graph
  • V all transactions in S
  • E Ti ? Tj whenever an action of Ti precedes
    and conflicts with an action of Tj in S
  • Theorem A schedule S is conflict serializable
    if and only if its precedence graph contains no
    cycles
  • Note that testing for a cycle in a digraph can be
    done in time O(V2)

26
An Example
27
Locking and Serializability
  • We said that a transaction must hold all locks
    until it terminates (a condition called strict
    locking)
  • It turns out that this is crucial to guarantee
    serializability
  • Note that the first (bad) example could have been
    produced if transactions acquired and immediately
    released locks.

28
Well-Formed, Two-Phased Transactions
  • A transaction is well-formed if it acquires at
    least a shared lock on Q before reading Q or an
    exclusive lock on Q before writing Q and doesnt
    release the lock until the action is performed
  • Locks are also released by the end of the
    transaction
  • A transaction is two-phased if it never acquires
    a lock after unlocking one
  • i.e., there are two phases a growing phase in
    which the transaction acquires locks, and a
    shrinking phase in which locks are released

29
Two-Phased Locking Theorem
  • If all transactions are well-formed and
    two-phase, then any schedule in which conflicting
    locks are never granted ensures serializability
  • i.e., there is a very simple scheduler!
  • However, if some transaction is not well-formed
    or two-phase, then there is some schedule in
    which conflicting locks are never granted but
    which fails to be serializable
  • i.e., one bad apple spoils the bunch

30
Summary
  • Transactions are all-or-nothing units of work
    guaranteed despite concurrency or failures in the
    system
  • Theoretically, the correct execution of
    transactions is serializable (i.e. equivalent to
    some serial execution)
  • Practically, this may adversely affect throughput
    ? isolation levels
  • With isolation levels, users can specify the
    level of incorrectness they are willing to
    tolerate

31
What to Look for Down the Road
  • well, no one really knows the answer to this
  • But here are some hints, ideas, and hot
    directions
  • Sensors and streaming data
  • Peer-to-peer meets databases and data integration
  • The Semantic Web

32
Sensors and Streaming Data
  • No databases at all
  • Instead we have networks of simple sensors
  • queries are in SQL
  • data is live and streaming
  • we compute aggregates over windows

33
Whats Interesting Here
  • Were not talking about data on disk were
    talking about queries over current readings
  • Sensors are generally stupid and may be
    battery-operated
  • A lot of challenges are networking-related how
    to aggregate data before it gets sent, etc.
  • Future challenges what happens when we have
    lots of different kinds of sensors

34
Peer-to-Peer Computing
  • Fundamentally, our model of DBMSs tends to be
    centralized
  • Even for data integration theres a single
    mediator
  • What can be gained from borrowing a page from
    peer-to-peer systems like Napster, Kazaa, etc.?
  • A better architecture?
  • Solutions to many problems unsolved by
    distributed DBMSs?
  • Replication, object location, distributed
    optimization, resiliency to failure,
  • New types of applications, e.g., in integration?
  • Can we exchange data between databases in some
    controlled way?
  • e.g., share parts of your Palm contact list with
    someone elses cell phone contact list

35
The Semantic Web
  • In some ways, a very pie-in-the-sky vision
  • A World Wide Web thats machine-understandable
  • The ultimate automated librarian
  • But some real and concrete problems might be
    partly solvable
  • Goal is really very similar to data integration,
    where somehow we have mappings between the
    schemas
  • Need
  • Languages for not only describing relationships,
    but transformations between formats (e.g., XML
    schemas)
  • Automatic or partly automated ways of discovering
    mappings and correspondences
  • These are all database problems, and the solution
    likely must come from the DB community

36
My Take on the Future of Data Management
  • Weve evolved from a world where data management
    is about controlling the data
  • Instead, data management is about translating and
    transforming data using declarative languages
  • It should ultimately become much like TCP or SOAP
    a set of standard services for getting stuff
    from one point to another, or from one form to
    another
  • Its the plumbing that connects different
    applications using different formats

37
A Plug for Next Semester
  • CIS 650 focus is on techniques for constructing
    data management systems
  • Databases distributed databases P2P databases
    data integration middleware etc.
  • Well read many of the definitive papers in the
    DB field
  • Meanwhile Best of luck on your projects and
    exams and have a wonderful break
  • I hope you learned a lot in this course and that
    it at least for stretches was enjoyable!
Write a Comment
User Comments (0)
About PowerShow.com