Advanced DBI

1
Advanced DBI
  • Perl Conference 3.0 Tutorial
  • August 1999
  • Tim Bunce

2
Topical Topics
  • Wheels within Wheels
  • Error Checking and Handling
  • Transactions
  • Speed Speed Speed!
  • Portability
  • The Power of the Proxy
  • DBI for the Web
  • Handling LONG/BLOB data
  • Tainting
  • Flexing the Multiplex

3
Wheels within Wheels
  • The DBI architecture
  • and how to watch it at work

4
Under the hood
  • DBI defines and implements an interface
  • Driver modules do much of the real work
  • DBI provides default methods, functions, tools
    etc for drivers
  • Not limited to the lowest common denominator -
    mechanism provided for driver specific extensions
  • Designed and built for speed
  • Valuable detailed call tracing/debugging built-in

5
A picture is worth?
[Diagram: Perl Application → DBI Module → DBD::Oracle / DBD::Informix / DBD::Other → Oracle Server / Informix Server / Other Server]
6
Setting the scene
  • Inner and outer worlds
  • Application and Drivers
  • Inner and outer handles
  • DBI handles are references to tied hashes
  • The DBI Method Dispatcher
  • gateway between the worlds and the heart of the
    DBI
  • Now we'll go all deep and visual for a while...

7
Architecture of the DBI classes 1
8
Architecture of the DBI classes 2
9
Anatomy of a DBI handle
[Diagram: a DBI handle is a reference to a tied hash; the tie links the outer handle seen by the application to the driver's inner hash]
10
Method call walk-through
  • Consider a simple prepare call:
    $dbh->prepare(...)
  • $dbh is a reference to a DBI::db object (regardless
    of driver)
  • DBI::db::prepare is an alias for the DBI dispatch
    method
  • Dispatch calls the driver's prepare method something
    like this:
  • my $inner_hash_ref     = ...;  # from tie magic
  • my $implementor_class  = ...;  # from DBI magic data
  • $inner_hash_ref->$implementor_class::prepare(...)
  • Driver code gets the inner hash so it has fast
    access to the hash contents

11
Watching the DBI in action
  • DBI has detailed call tracing built-in
  • The trace can be very helpful in understanding
    application behavior and for debugging
  • Shows parameters and results
  • Trace information can be written to a file
  • Not used often enough!

12
Our program for today...
  • 1  #!/usr/bin/perl -w
  • 2
  • 3  use DBI;
  • 4
  • 5  $dbh = DBI->connect('', '', '', { RaiseError => 1 });
  • 6
  • 7  $upd = $dbh->prepare("UPDATE prices SET price=? WHERE prod=?");
  • 8  $ins = $dbh->prepare("INSERT INTO prices (prod,price) VALUES (?,?)");
  • 9  $rows = $upd->execute(42, "Widgets");
  • 10 $ins->execute("Widgets", 42) if $rows == 0;
  • 11
  • 12 $dbh->disconnect;

13
Enabling Tracing
  • Per handle:
  • $h->trace($level);
  • $h->trace($level, $filename);
  • Only affects that handle and any new child
    handles created from it
  • Child handles get the trace level of the parent in
    effect at time of creation
  • Global (internal to application):
  • DBI->trace(...);
  • Sets effective global default minimum trace level
  • Global (external to application):
  • Enabled using the DBI_TRACE environment variable
  • DBI_TRACE=digits           same as DBI->trace(digits);
  • DBI_TRACE=filename         same as DBI->trace(2, filename);
  • DBI_TRACE=digits=filename  same as DBI->trace(digits, filename);

(extra)
14
Trace level 1
  • Trace level 1 shows method results and line
    numbers:
  • <- connect= DBI::db=HASH(0xe0abc) at DBI.pm line 356.
  • <- STORE('PrintError', 1)= 1 at DBI.pm line 382.
  • <- STORE('AutoCommit', 1)= 1 at DBI.pm line 382.
  • <- STORE('RaiseError', 1)= 1 at DBI.pm line 382.
  • <- prepare('UPDATE ...')= DBI::st=HASH(0xe1238) at test.pl line 7.
  • <- prepare('INSERT ...')= DBI::st=HASH(0xe1504) at test.pl line 8.
  • <- execute= '0E0' at test.pl line 9.
  • <- execute= 1 at test.pl line 10.
  • <- disconnect= 1 at test.pl line 11.
  • <- DESTROY= undef
  • <- DESTROY= undef
  • <- DESTROY= undef
  • <- DESTROY= undef

15
Trace level 2 and above
  • Trace level 2 shows calls with parameters and
    more:
  • -> connect for DBD::ODBC::dr (DBI::dr=HASH(0x13dfec)~0xe14a4
    '' '' HASH(0xe0a10))
  • <- connect= DBI::db=HASH(0xe0ab0) at DBI.pm line 356.
  • -> STORE for DBD::ODBC::db (DBI::db=HASH(0xe0abc)~INNER
    'PrintError' 1)
  • <- STORE= 1 at DBI.pm line 382.
  • -> prepare for DBD::ODBC::db (DBI::db=HASH(0xe0ab0)~0xe0abc
    'UPDATE prices SET price=? WHERE prod=?')
  • <- prepare= DBI::st=HASH(0xe1274) at test.pl line 7.
  • Trace level 3 and above shows more processing
    details

16
Error Checking and Handling
  • To err is human, to detect, divine.

17
The importance of error checking
  • Errors happen!
  • Failure happens when you don't expect errors!
  • database crash / network disconnection
  • lack of disk space for insert or select (sort
    space for order by)
  • server math error on select (divide by zero)
  • and maybe, just maybe, errors in your own code
    (Gasp!)
  • Beat failure by expecting errors!
  • Detect errors early to limit effects
  • Defensive Programming, e.g., check assumptions
  • Thorough Programming, e.g., check for errors after
    fetch loops

18
Error checking - ways and means
  • Error checking the hard way...
  • $h->method or die "DBI method failed: $DBI::errstr";
  • $h->method or die "DBI method failed: $DBI::errstr";
  • $h->method or die "DBI method failed: $DBI::errstr";
  • Error checking the easy way...
  • $h->{RaiseError} = 1;
  • $h->method;
  • $h->method;
  • $h->method;

19
Handling errors the smart way
  • For simple applications immediate death on error
    is fine
  • For more advanced applications greater control is
    needed
  • Life after death:
  • $h->{RaiseError} = 1;
  • eval { $h->method; };
  • if ($@) { ... }
  • Bonus prize
  • Other, non-DBI, code within the eval block may
    also raise an exception that will be caught and
    handled cleanly
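Putting the pieces together, here is a minimal runnable sketch of the RaiseError-plus-eval pattern. It assumes the DBD::SQLite driver is available and uses an in-memory database purely so the example is self-contained:

```perl
#!/usr/bin/perl -w
use strict;
use DBI;

# In-memory SQLite database, used only for illustration (assumes DBD::SQLite)
my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "",
                       { RaiseError => 1, PrintError => 0 });

eval {
    # Deliberately broken SQL: with RaiseError set, prepare() throws
    # an exception instead of returning undef
    my $sth = $dbh->prepare("select * from no_such_table");
    $sth->execute;
};
if ($@) {
    warn "Caught DBI exception: $@";
}
$dbh->disconnect;
```

The same eval also traps non-DBI exceptions raised inside the block, which is the "bonus prize" mentioned above.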

20
Transactions
  • To do or to undo,
  • that is the question

21
Transactions - What's it all about?
  • Far more than just locking
  • The A.C.I.D. test
  • True transactions give true safety
  • even from power failures and system crashes!
  • Incomplete transactions are automatically
    rolled-back by the database server when it's
    restarted.
  • Also removes burden of undoing incomplete changes
  • Hard to implement (for the vendor) and has
    significant performance cost
  • A very large topic worthy of an entire tutorial

22
Transactions - Life Preserving
  • Classic system crash between one bank account
    being debited and another being credited.
  • Dramatic power failure during an update statement
    on 3 million rows, two minutes in and only part
    way through.
  • Real-world complex series of inter-related
    updates, deletes and inserts on many separate
    tables fails at the last step due to a duplicate
    primary key on an insert.
  • Transaction recovery would handle all these
    situations automatically.
  • Makes a system far more robust and trustworthy
    over the long term.
  • Use transactions if your database supports them.
  • If it doesn't and you need them, switch to a
    different database.

23
Transactions - How the DBI helps
  • Tools of the trade:
  • Set AutoCommit off, and RaiseError on
  • Wrap eval { ... } around the code
  • Use $dbh->commit; and $dbh->rollback;
  • Disable AutoCommit via $dbh->{AutoCommit} = 0;
  • to enable transactions and thus rollback-on-error
  • Enable RaiseError via $dbh->{RaiseError} = 1;
  • to automatically 'throw an exception' on error
  • The surrounding eval { ... }
  • catches the exception and puts the error message
    into $@
  • Test $@ and rollback() if set, else commit()

24
Transactions - Example code
  • $dbh->{RaiseError} = 1;
  • eval {
  •     $dbh->method(...);  # assorted DBI calls
  •     foo(...);           # application code
  •     $dbh->commit;       # commit the changes
  • };
  • if ($@) {
  •     warn "Transaction aborted because $@";
  •     $dbh->rollback;
  •     ...
  • }
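The same pattern as a complete runnable sketch; DBD::SQLite, the prices table, and the deliberate die are assumptions for illustration:

```perl
#!/usr/bin/perl -w
use strict;
use DBI;

my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "",
            { RaiseError => 1, PrintError => 0, AutoCommit => 0 });
$dbh->do("CREATE TABLE prices (prod TEXT, price INTEGER)");
$dbh->commit;

eval {
    $dbh->do("INSERT INTO prices VALUES ('Widgets', 42)");
    die "something went wrong\n";   # simulate a failure mid-transaction
    $dbh->commit;                   # never reached
};
if ($@) {
    warn "Transaction aborted because $@";
    $dbh->rollback;                 # the insert is undone
}

my ($n) = $dbh->selectrow_array("SELECT count(*) FROM prices");
print "rows after rollback: $n\n";  # prints "rows after rollback: 0"
$dbh->commit;
$dbh->disconnect;
```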

25
Transactions - Further comments
  • The eval catches all exceptions
  • not just from DBI calls. Also catches fatal
    runtime errors from Perl.
  • Put commit() inside the eval
  • ensures commit failure is caught cleanly.
  • remember that commit() itself may fail for many
    reasons.
  • Don't forget that rollback() may also fail
  • due to database crash or network failure etc.
  • Other points
  • Always explicitly commit or rollback before
    disconnect().
  • Destroying a connected dbh should always
    rollback.
  • END blocks can catch exit-without-disconnect to
    rollback and disconnect cleanly.

26
Speed Speed Speed!
  • What helps, what doesn't

27
Give me speed!
  • DBI was designed for speed from day one
  • DBI dispatch written in hand-crafted XS/C
  • Dispatch to XS driver method calls optimized
  • Cached attributes returned directly by DBI
    dispatch
  • DBI overhead is generally insignificant
  • So we'll talk about other speed issues instead ...

28
Partition for speed
  • Application partitioning
  • stop and think - work smarter not harder
  • Pick the right database for the job, if you have
    the choice
  • Work close to the data
  • Use stored procedures where appropriate
  • Moving data to/from the client is always
    expensive
  • think of latency as well as bandwidth
  • Use proprietary bulk-load where appropriate
  • Consider database maths vs. Perl maths
  • Multiple simple queries with 'joins' in Perl may
    be faster
  • Consider local caching, in memory or DBM file
    etc.
  • Mix 'n Match techniques as needed

29
Prepare for speed
  • prepare() - what happens in the server
  • Receive and parse the SQL statement into internal
    form
  • Get details for all the selected tables
  • Check access rights for each
  • Get details for all the selected fields
  • Check data types in expressions
  • Get details for all the indices of the tables
  • Develop an optimised query 'access plan' for best
    execution
  • Return a handle for all this cached information
  • This can be an expensive process - especially the
    'access plan'
  • Some databases, like MySQL, don't cache the
    information but have simpler, and thus faster,
    plan creation

30
The Best Laid Plans
  • Query optimisation is hard
  • Intelligent high quality cost based query
    optimisation is really hard!
  • Know your optimiser
  • Oracle, Informix, Sybase, DB2, SQL Server etc.
    all slightly different.
  • Check what it's doing
  • Use tools to see the plans used for your queries
    - very helpful
  • Help it along
  • Most 'big name' databases have a mechanism to
    analyse and store the key distributions of
    indices to help the optimiser make good plans
  • beware: keep it fresh - old key distributions
    might be worse than none
  • Some also allow you to embed 'hints' into the SQL
    as comments
  • beware: take it easy - over-hinting hinders
    dynamic optimisation

(extra)
31
Respect your server's SQL cache
  • Optimised Access Plan etc. is cached within the
    server
  • keyed by the exact original SQL string used
  • Without placeholders, SQL string varies each time
  • so cached one is not reused
  • and time is wasted creating a new access plan
  • the new statement and access plan are added to
    cache
  • so the cache fills and other statements get
    pushed out
  • Compare do("insert ... $id ...")
  • with do("insert ... ?", undef, $id)
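The difference can be sketched like this (DBD::SQLite and the log_ids table are invented for illustration; against a real server only the placeholder form lets the cached access plan be reused):

```perl
#!/usr/bin/perl -w
use strict;
use DBI;

my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "",
                       { RaiseError => 1 });
$dbh->do("CREATE TABLE log_ids (id INTEGER)");

my $id = 42;

# Interpolated: the SQL text differs for every value of $id,
# so each statement is a cache miss and needs a fresh access plan
$dbh->do("insert into log_ids (id) values ($id)");

# Placeholder: one constant SQL string, the value travels separately,
# so the server finds the cached plan every time
$dbh->do("insert into log_ids (id) values (?)", undef, $id);

$dbh->disconnect;
```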

32
Hot handles
  • Use prepare() to get a handle on the statement in
    the SQL cache and keep it there.
  • The handle for the pre-prepared statement avoids
    a round-trip to server for cache-check / parsing
    etc.
  • Avoid using $dbh->do(...) in a loop.
  • Use $sth = $dbh->prepare(...)
  • and $sth->execute(...) instead.
  • For example, convert a looped do("insert ... $id ...")
  • to prepare("insert ... ?")
  • and a looped execute($id).
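A runnable sketch of that conversion, again using an in-memory SQLite database (DBD::SQLite assumed) with invented table data:

```perl
#!/usr/bin/perl -w
use strict;
use DBI;

my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "",
                       { RaiseError => 1 });
$dbh->do("CREATE TABLE prices (prod TEXT, price INTEGER)");

# Prepare once, outside the loop...
my $ins = $dbh->prepare("INSERT INTO prices (prod, price) VALUES (?, ?)");

# ...then execute many times with different values
my %stock = (Widgets => 42, Sprockets => 17, Grommets => 3);
while (my ($prod, $price) = each %stock) {
    $ins->execute($prod, $price);
}

my ($n) = $dbh->selectrow_array("SELECT count(*) FROM prices");
print "$n rows inserted\n";   # prints "3 rows inserted"
$dbh->disconnect;
```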

33
Let the DBI cache your handles
  • Sometimes it's not easy to hold all your handles
  • e.g., library code to lookup values from the
    database given a dbh
  • The prepare_cached() method gives you a client
    side statement handle cache
  • while ( my ($field, $value) = each %search_fields ) {
  •     push @sql,    "$field = ?";
  •     push @values, $value;
  • }
  • my $where = "";
  • $where = "where " . join(" and ", @sql) if @sql;
  • $sth = $dbh->prepare_cached("select * from table $where");
  • $sth->execute(@values);
  • Can avoid the need for global statement handle
    variables
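Wrapped up as library-style code, the idea looks something like this (DBD::SQLite, the products table, and the lookup() helper are all invented for illustration):

```perl
#!/usr/bin/perl -w
use strict;
use DBI;

my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "",
                       { RaiseError => 1 });
$dbh->do("CREATE TABLE products (prod TEXT, price INTEGER)");
$dbh->do("INSERT INTO products VALUES ('Widgets', 42)");

sub lookup {
    my ($dbh, %search_fields) = @_;
    my (@sql, @values);
    while (my ($field, $value) = each %search_fields) {
        push @sql,    "$field = ?";
        push @values, $value;
    }
    my $where = @sql ? "where " . join(" and ", @sql) : "";
    # prepare_cached returns the same handle for the same SQL string,
    # so repeated calls skip the prepare round-trip
    my $sth = $dbh->prepare_cached("select * from products $where");
    $sth->execute(@values);
    my @row = $sth->fetchrow_array;
    $sth->finish;    # release the cached handle for reuse
    return @row;
}

my ($prod, $price) = lookup($dbh, prod => 'Widgets');
print "$prod costs $price\n";   # prints "Widgets costs 42"
$dbh->disconnect;
```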

34
Sling less for speed
  • while(@row = $sth->fetchrow_array) { ... }
  • one field: 3,100 fetches per cpu second
  • ten fields: 1,000 fetches per cpu second
  • while($row = $sth->fetchrow_arrayref) { ... }
  • one field: 5,300 fetches per cpu second
  • ten fields: 4,000 fetches per cpu second
  • Notes:
  • Timings made on an old SPARC 10 using DBD::Oracle
  • Timings assume instant record fetch within driver
  • Fields all one char. @row would be even slower
    for more/bigger fields
  • Use bind_columns() for direct access to fetched
    fields
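A short sketch of bind_columns() in action (DBD::SQLite and the sample rows are assumptions for illustration):

```perl
#!/usr/bin/perl -w
use strict;
use DBI;

my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "",
                       { RaiseError => 1 });
$dbh->do("CREATE TABLE prices (prod TEXT, price INTEGER)");
$dbh->do("INSERT INTO prices VALUES ('Widgets', 42)");
$dbh->do("INSERT INTO prices VALUES ('Sprockets', 17)");

my $sth = $dbh->prepare("SELECT prod, price FROM prices ORDER BY prod");
$sth->execute;

# Bind Perl variables to the result columns: each fetch() then
# updates $prod and $price directly, with no per-row list copying
my ($prod, $price);
$sth->bind_columns(\$prod, \$price);
while ($sth->fetch) {     # fetch() is the fastest row-fetch method
    print "$prod: $price\n";
}
$dbh->disconnect;
```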

35
Keep a handle on your databases
  • Connecting to a database can be slow
  • Try to connect once and stay connected where
    practical
  • We'll discuss web server issues later
  • New method
  • The new connect_cached() method acts like
    prepare_cached() but for database handles
  • It also checks the connection and reconnects if
    it's broken
  • Similar, but not quite the same as Apache::DBI
  • New and subject to change with experience
  • Potentially most useful with DBD::Proxy and
    DBI::ProxyServer

36
Portability
  • A Holy Grail
  • To be taken with a pinch of salt

37
Portability in practice
  • Portability requires care and testing - it can be
    tricky
  • Platform Portability - the easier bit
  • Availability of database client software (and
    server if required)
  • Availability of DBD driver
  • DBD::Proxy can address both these issues - see
    later
  • Database Portability - more tricky
  • Differences in SQL dialects cause most problems
  • Differences in data types can also be a problem
  • Driver capabilities (placeholders etc)
  • DBIx::Compat module (in DBIx::RecordSet) may be
    useful.
  • A standard DBI test suite is needed (and planned).

38
The Power of the Proxy
  • Thin clients ...
  • and other buzz words

39
DBD::Proxy and DBI::ProxyServer
  • Networking for non-networked databases
  • The DBD::Proxy driver forwards calls over the network to
    a remote DBI::ProxyServer
  • No changes in application behavior
  • Only the DBI->connect statement needs to be
    changed
  • Proxy can be made completely transparent by
    setting the DBI_AUTOPROXY environment variable
  • So not even the DBI->connect statement needs to
    be changed!
  • DBI::ProxyServer works on Win32
  • Access to Access and other Win32 ODBC and ADO
    data sources
  • Developed by Jochen Wiedmann

40
A proxy picture
41
Thin clients and other buzz words
  • Proxying for remote access "thin-client"
  • No need for database client code on the DBI
    client
  • Proxying for network security "encryption"
  • Can use Crypt::IDEA, Crypt::DES etc.
  • Proxying for "access control" and "firewalls"
  • handy for web servers
  • Proxying for action control
  • e.g., only allow specific select or insert
    statements
  • Proxying for local access "n-tier middleware"
  • connect_cached(), memory use, licensing limits
  • Proxying for performance "compression"
  • Can compress data transfers using Compress::Zlib

42
DBI for the Web
  • Hand waving from 30,000 feet

43
Web DBI - Connect speed
  • Databases can be slow to connect
  • Traditional CGI forces a new connect per request
  • Move Perl and DBI into the web server
  • Apache with mod_perl and the Apache::DBI module
  • Microsoft IIS with ActiveState's PerlEx
  • Connections can then persist between requests
  • Apache::DBI is automatically used by DBI if loaded
  • No CGI script changes required to get persistence
  • Take care not to change the shared session
    behaviour
  • Leave the dbh in the same state you found it!
  • Other alternatives include
  • FastCGI, CGI::SpeedyCGI and CGI::MiniSvr

44
Web DBI - Too many connections
  • Limits on Database Connections
  • Database platform resources or licensing
  • Memory consumption of web servers
  • Partition web servers into General and Database
  • Direct database work to the Database web servers
  • Use Reverse Proxy / Redirect / Rewrite to achieve
    this
  • Allows each subset of servers to be tuned to best
    fit workload
  • And be run on appropriate platforms

45
Web DBI - State-less-ness
  • No fixed client-server pair
  • Each request can be handled by a different
    process.
  • So can't simply stop fetching when one page is
    complete and continue fetching from the same sth
    when the next page is requested.
  • And transactions can't span requests. Even if
    they could you'd have problems with database
    locks being held etc.
  • Need access to 'accumulated state' somehow
  • via the client (e.g., hidden form fields)
  • via the database (records in a session_state
    table)
  • via the web server file system (DBM files etc)
  • Need to purge old state info if stored on server,
    so timestamp it
  • See the Apache::Session module
  • DBI::ProxyServer connect_cached with a session ID
    may suit

46
Web DBI - Browsing pages of results
  • Re-execute query each time then count/discard
    (simple but expensive)
  • works well for small results sets or where users
    rarely view many pages
  • fast initial response, degrades gradually for
    later pages
  • count/discard in server is better but still
    inefficient for large result sets
  • affected by inserts and deletes from other
    processes
  • Re-execute query with where clause using min/max
    keys from last results
  • works well where original query can be qualified
    in that way, not common
  • Select and cache full result rows somewhere for
    fast access
  • can be expensive for large result sets with big
    fields
  • Select and cache only the row keys, fetch full
    rows as needed
  • optimisation of above; use ROWID if supported,
    "select ... where ... in (...)"
  • If data is static and queries predictable then
    custom pre-built indexes may be useful
  • The cache can be on web server, using DBM file
    for example, or on database server using a table
    keyed by session id

47
Web DBI - Concurrent editing
  • How to prevent updates overwriting each other?
  • You can use Optimistic Locking via 'qualified
    update'
  • update table set ...
  • where key = old_key
  • and field1 = old_field1
  • and field2 = old_field2 ... and so on for all fields
  • Potential problems with floating point data
    values not matching
  • Some databases support a high-resolution 'update
    timestamp' field that can be checked instead
  • Check the update row count
  • If it's zero then you know the record has been
    changed or deleted by another process
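A minimal sketch of the qualified-update check (DBD::SQLite and the accounts table are invented for illustration; do() returns the number of rows the update actually matched):

```perl
#!/usr/bin/perl -w
use strict;
use DBI;

my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "",
                       { RaiseError => 1 });
$dbh->do("CREATE TABLE accounts (id INTEGER, balance INTEGER)");
$dbh->do("INSERT INTO accounts VALUES (1, 100)");

# The values this user originally read from the record
my ($old_id, $old_balance) = (1, 100);

# Qualified update: only matches if the row is still as we read it
my $rows = $dbh->do(
    "UPDATE accounts SET balance = ? WHERE id = ? AND balance = ?",
    undef, 150, $old_id, $old_balance);

if ($rows == 0) {
    warn "Record was changed or deleted by another process\n";
}
$dbh->disconnect;
```

A second update using the now-stale $old_balance would match zero rows, which is exactly the signal to report a concurrent edit.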

48
Web DBI - Tips for the novice
  • Test one step at a time
  • Test perl + DBI + DBD driver outside the web
    server first
  • Test web server + non-DBI CGI next
  • Remember that CGI scripts run as a different user
    with a different environment - expect to be
    tripped up by that
  • DBI trace is your friend - use it
  • Use the Perl "-w" and "-T" options. Always "use
    strict"
  • Read and inwardly digest the WWW Security FAQ:
  • http://www-genome.wi.mit.edu/WWW/faqs/www-security-faq.html
  • Read the Perl CGI FAQ:
  • http://www.perl.com/cgi-bin/pace/pub/doc/FAQs/cgi/perl-cgi-faq.html
  • And, if appropriate, read the mod_perl
    information available from
  • http://perl.apache.org

49
Handling LONG/BLOB data
  • What makes LONG / BLOB / MEMO data special?
  • Not practical to pre-allocate fixed size buffers
    for worst case
  • Fetching LONGs - treat as normal fields after
    setting:
  • $dbh->{LongReadLen} - buffer size to allocate for
    expected data
  • $dbh->{LongTruncOk} - should truncating-to-fit be
    allowed?
  • Inserting LONGs
  • The limitations of string literals
  • The benefits of placeholders
  • Chunking / Piecewise processing not supported
  • So you're limited to available memory
  • Some drivers support an unofficial blob_read()
    method
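A sketch of the fetch-side attributes. LongReadLen and LongTruncOk are standard DBI handle attributes; the example runs against DBD::SQLite (assumed), which returns full values regardless, so the attributes really matter on drivers such as DBD::Oracle:

```perl
#!/usr/bin/perl -w
use strict;
use DBI;

my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "",
                       { RaiseError => 1 });

# How much to allocate per LONG field, and whether a value too big
# for that buffer is an error (0) or silently truncated (1)
$dbh->{LongReadLen} = 512 * 1024;
$dbh->{LongTruncOk} = 0;

$dbh->do("CREATE TABLE docs (id INTEGER, body TEXT)");

# Placeholders let long values be inserted without building
# a huge SQL string literal
my $long_text = "x" x 100_000;
$dbh->do("INSERT INTO docs VALUES (?, ?)", undef, 1, $long_text);

my ($body) = $dbh->selectrow_array("SELECT body FROM docs WHERE id = 1");
print length($body), "\n";   # prints 100000
$dbh->disconnect;
```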

50
DBI security tainting
  • By default the DBI ignores Perl tainting
  • doesn't taint returned data
  • doesn't check that parameters are not tainted
  • The Taint attribute enables that behaviour
  • if Perl itself is in taint mode
  • Each handle has its own inherited Taint
    attribute
  • So it can be enabled for a connection and disabled
    for particular statements, for example:
  • $dbh = DBI->connect(..., { Taint => 1 });
  • $sth = $dbh->prepare("select * from safe_table");
  • $sth->{Taint} = 0;  # no tainting on this handle

51
Flexing the Multiplex
  • A new kid on the block

52
DBD::Multiplex
  • DBD::Multiplex - new and experimental
  • Connects to multiple databases at once (via DBI)
  • Executes statements on all of them by default
  • Could be configured to
  • insert into all databases but select from one
  • fallback to alternate database if primary is
    unavailable
  • select from all and check results (pick most
    common)
  • select round-robin / or at random to distribute
    load
  • Can be used with DBD::Proxy, either above or
    below
  • May also acquire fancy caching in later versions
  • Watch this space!

53
Reference Materials
  • http://www.symbolstone.org/technology/perl/DBI
  • The DBI Home Page
  • http://www.isc.org/dbi-lists.html
  • To subscribe to the DBI Mailing Lists
  • http://www.perl.com/CPAN/authors/id/TIMB/DBI_Talk1_1997.tar.gz
  • My DBI session at The Perl Conference 1.0 -
    general introduction
  • http://www.perl.com/CPAN/authors/id/TIMB/DBI_Talk2_1998.tar.gz
  • My DBI session at The Perl Conference 2.0 - more
    depth
  • http://www.perl.com/CPAN/authors/id/TIMB/DBI_Talk3_1999.tar.gz
  • This DBI tutorial at The Perl Conference 3.0!
  • http://www.oreilly.com/catalog/perldbi/
  • or http://www.amazon.com/exec/obidos/ASIN/1565926994/dbi
  • The forthcoming DBI book! Due late 1999 or early
    2000.
  • (Order in advance at the conference to get the
    special discounted rate :-)

54
The end.
  • Till next year...