Title: Advanced DBI
1. Advanced DBI
- Perl Conference 3.0 Tutorial
- August 1999
- Tim Bunce
2. Topical Topics
- Wheels within Wheels
- Error Checking and Handling
- Transactions
- Speed Speed Speed!
- Portability
- The Power of the Proxy
- DBI for the Web
- Handling LONG/BLOB data
- Tainting
- Flexing the Multiplex
3. Wheels within Wheels
- The DBI architecture
- and how to watch it at work
4. Under the hood
- DBI defines and implements an interface
- Driver modules do much of the real work
- DBI provides default methods, functions, tools etc. for drivers
- Not limited to the lowest common denominator - a mechanism is provided for driver-specific extensions
- Designed and built for speed
- Valuable detailed call tracing/debugging built-in
5. A picture is worth?
[Diagram: Perl Application -> DBI Module -> DBD::Oracle / DBD::Informix / DBD::Other -> Oracle Server / Informix Server / Other Server]
6. Setting the scene
- Inner and outer worlds
  - Application and Drivers
- Inner and outer handles
  - DBI handles are references to tied hashes
- The DBI Method Dispatcher
  - gateway between the worlds and the heart of the DBI
- Now we'll go all deep and visual for a while...
7. Architecture of the DBI classes 1
[Diagram: outer handle classes]
8. Architecture of the DBI classes 2
[Diagram]
9. Anatomy of a DBI handle
[Diagram: handle reference and tied hash]
10. Method call walk-through
- Consider a simple prepare call: $dbh->prepare(...)
- $dbh is a reference to a DBI::db object (regardless of driver)
- DBI::db::prepare is an alias for the DBI dispatch method
- Dispatch calls the driver's prepare method something like this (pseudo-code):
    my $inner_hash_ref    = ...;  # from tie magic
    my $implementor_class = ...;  # from DBI magic data
    $inner_hash_ref->$implementor_class::prepare(...);
- Driver code gets the inner hash so it has fast access to the hash contents
11. Watching the DBI in action
- DBI has detailed call tracing built-in
- The trace can be very helpful in understanding application behavior and for debugging
- Shows parameters and results
- Trace information can be written to a file
- Not used often enough!
12. Our program for today...

     1  #!/usr/bin/perl -w
     2
     3  use DBI;
     4
     5  $dbh = DBI->connect('', '', '', { RaiseError => 1 });
     6
     7  $upd = $dbh->prepare("UPDATE prices SET price=? WHERE prod=?");
     8  $ins = $dbh->prepare("INSERT INTO prices (prod,price) VALUES (?,?)");
     9  $rows = $upd->execute(42, "Widgets");
    10  $ins->execute("Widgets", 42) if $rows == 0;
    11
    12  $dbh->disconnect;
13. Enabling Tracing
- Per handle
  - $h->trace($level);
  - $h->trace($level, $filename);
  - Only affects that handle and any new child handles created from it
  - Child handles get the trace level of the parent in effect at time of creation
- Global (internal to application)
  - DBI->trace(...);
  - Sets the effective global default minimum trace level
- Global (external to application)
  - Enabled using the DBI_TRACE environment variable
  - DBI_TRACE=digits           equivalent to DBI->trace(digits)
  - DBI_TRACE=filename         equivalent to DBI->trace(2, filename)
  - DBI_TRACE=digits=filename  equivalent to DBI->trace(digits, filename)
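Put together, tracing a suspect statement might look like this sketch (the trace level, log filename, and table name are illustrative, not from the slides):

```perl
$dbh->trace(2, "/tmp/dbi.log");   # level 2: calls, parameters and results
my $sth = $dbh->prepare("SELECT prod, price FROM prices");
$sth->execute;                    # appears in /tmp/dbi.log
$dbh->trace(0);                   # tracing off again
```

Or, without touching the application at all: DBI_TRACE=2=/tmp/dbi.log perl test.pl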
(extra)
14. Trace level 1
- Trace level 1 shows method results and line numbers:

    <- connect= DBI::db=HASH(0xe0abc) at DBI.pm line 356.
    <- STORE('PrintError', 1)= 1 at DBI.pm line 382.
    <- STORE('AutoCommit', 1)= 1 at DBI.pm line 382.
    <- STORE('RaiseError', 1)= 1 at DBI.pm line 382.
    <- prepare('UPDATE ...')= DBI::st=HASH(0xe1238) at test.pl line 7.
    <- prepare('INSERT ...')= DBI::st=HASH(0xe1504) at test.pl line 8.
    <- execute= '0E0' at test.pl line 9.
    <- execute= 1 at test.pl line 10.
    <- disconnect= 1 at test.pl line 11.
    <- DESTROY= undef
    <- DESTROY= undef
    <- DESTROY= undef
    <- DESTROY= undef
15. Trace level 2 and above
- Trace level 2 shows calls with parameters and more:

    -> connect for DBD::ODBC::dr (DBI::dr=HASH(0x13dfec)~0xe14a4 '' '' HASH(0xe0a10))
    <- connect= DBI::db=HASH(0xe0ab0) at DBI.pm line 356.
    -> STORE for DBD::ODBC::db (DBI::db=HASH(0xe0abc)~INNER 'PrintError' 1)
    <- STORE= 1 at DBI.pm line 382.
    -> prepare for DBD::ODBC::db (DBI::db=HASH(0xe0ab0)~0xe0abc 'UPDATE prices SET price=? WHERE prod=?')
    <- prepare= DBI::st=HASH(0xe1274) at test.pl line 7.

- Trace level 3 and above shows more processing details
16. Error Checking and Handling
- To err is human, to detect, divine.
17. The importance of error checking
- Errors happen!
- Failure happens when you don't expect errors!
  - database crash / network disconnection
  - lack of disk space for insert or select (sort space for order by)
  - server math error on select (divide by zero)
  - and maybe, just maybe, errors in your own code (Gasp!)
- Beat failure by expecting errors!
- Detect errors early to limit their effects
  - Defensive Programming, e.g., check assumptions
  - Thorough Programming, e.g., check for errors after fetch loops
18. Error checking - ways and means
- Error checking the hard way...

    $h->method or die "DBI method failed: $DBI::errstr";
    $h->method or die "DBI method failed: $DBI::errstr";
    $h->method or die "DBI method failed: $DBI::errstr";

- Error checking the easy way...

    $h->{RaiseError} = 1;
    $h->method;
    $h->method;
    $h->method;
19. Handling errors the smart way
- For simple applications immediate death on error is fine
- For more advanced applications greater control is needed
- Life after death:

    $h->{RaiseError} = 1;
    eval { $h->method };
    if ($@) { ... }

- Bonus prize:
  - Other, non-DBI, code within the eval block may also raise an exception that will be caught and handled cleanly
20. Transactions
- To do or to undo,
- that is the question
21. Transactions - What's it all about?
- Far more than just locking
- The A.C.I.D. test
- True transactions give true safety
  - even from power failures and system crashes!
  - Incomplete transactions are automatically rolled back by the database server when it's restarted.
- Also removes the burden of undoing incomplete changes
- Hard to implement (for the vendor) and has a significant performance cost
- A very large topic worthy of an entire tutorial
22. Transactions - Life Preserving
- Classic system crash between one bank account being debited and another being credited.
- Dramatic power failure during an update statement on 3 million rows, two minutes in and only part way through.
- Real-world complex series of inter-related updates, deletes and inserts on many separate tables fails at the last step due to a duplicate primary key on an insert.
- Transaction recovery would handle all these situations automatically.
- Makes a system far more robust and trustworthy over the long term.
- Use transactions if your database supports them.
- If it doesn't and you need them, switch to a different database.
23. Transactions - How the DBI helps
- Tools of the trade:
  - Set AutoCommit off, and RaiseError on
  - Wrap an eval around the code
  - Use $dbh->commit and $dbh->rollback
- Disable AutoCommit via $dbh->{AutoCommit} = 0
  - to enable transactions and thus rollback-on-error
- Enable RaiseError via $dbh->{RaiseError} = 1
  - to automatically 'throw an exception' on error
- The surrounding eval
  - catches the exception and puts the error message into $@
- Test $@ and rollback() if set, else commit()
24. Transactions - Example code

    $dbh->{RaiseError} = 1;
    eval {
        $dbh->method(...);  # assorted DBI calls
        foo(...);           # application code
        $dbh->commit;       # commit the changes
    };
    if ($@) {
        warn "Transaction aborted because $@";
        $dbh->rollback;
        ...
    }
25. Transactions - Further comments
- The eval catches all exceptions
  - not just from DBI calls. Also catches fatal runtime errors from Perl.
- Put commit() inside the eval
  - ensures commit failure is caught cleanly.
  - remember that commit() itself may fail for many reasons.
- Don't forget that rollback() may also fail
  - due to database crash or network failure etc.
- Other points:
  - Always explicitly commit or rollback before disconnect().
  - Destroying a connected $dbh should always rollback.
  - END blocks can catch exit-without-disconnect to rollback and disconnect cleanly.
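The END-block safety net in the last point might be sketched like this, assuming a package-global $dbh connected with AutoCommit off (the structure is illustrative, not from the slides):

```perl
our $dbh;   # assumed set elsewhere by DBI->connect(..., { AutoCommit => 0 })

END {
    # Runs even on die() or plain exit(): undo any uncommitted work.
    if ($dbh && $dbh->{Active}) {
        eval { $dbh->rollback };    # rollback itself may fail - don't die in END
        eval { $dbh->disconnect };
    }
}
```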
26. Speed Speed Speed!
27. Give me speed!
- DBI was designed for speed from day one
- DBI dispatch written in hand-crafted XS/C
- Dispatch to XS driver method calls optimized
- Cached attributes returned directly by DBI dispatch
- DBI overhead is generally insignificant
- So we'll talk about other speed issues instead...
28. Partition for speed
- Application partitioning
  - stop and think - work smarter not harder
  - Pick the right database for the job, if you have the choice
- Work close to the data
  - Use stored procedures where appropriate
  - Moving data to/from the client is always expensive - think of latency as well as bandwidth
  - Use proprietary bulk-load where appropriate
- Consider database maths vs. Perl maths
  - Multiple simple queries with 'joins' done in Perl may be faster
- Consider local caching, in memory or DBM file etc.
- Mix 'n' match techniques as needed
29. Prepare for speed
- prepare() - what happens in the server:
  - Receive and parse the SQL statement into internal form
  - Get details for all the selected tables
  - Check access rights for each
  - Get details for all the selected fields
  - Check data types in expressions
  - Get details for all the indices of the tables
  - Develop an optimised query 'access plan' for best execution
  - Return a handle for all this cached information
- This can be an expensive process - especially the 'access plan'
- Some databases, like MySQL, don't cache the information but have simpler, and thus faster, plan creation
30. The Best Laid Access Plans
- Query optimisation is hard
  - Intelligent high quality cost-based query optimisation is really hard!
- Know your optimiser
  - Oracle, Informix, Sybase, DB2, SQL Server etc. are all slightly different.
- Check what it's doing
  - Use tools to see the plans used for your queries - very helpful
- Help it along
  - Most 'big name' databases have a mechanism to analyse and store the key distributions of indices to help the optimiser make good plans
  - beware: keep it fresh, old key distributions might be worse than none
  - Some also allow you to embed 'hints' into the SQL as comments
  - beware: take it easy, over-hinting hinders dynamic optimisation
(extra)
31. Respect your server's SQL cache
- The optimised access plan etc. is cached within the server
  - keyed by the exact original SQL string used
- Without placeholders, the SQL string varies each time
  - so the cached one is not reused
  - and time is wasted creating a new access plan
  - the new statement and access plan are added to the cache
  - so the cache fills and other statements get pushed out
- Compare do("insert ... $id")
- with do("insert ... ?", undef, $id)
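In practice the contrast might look like this (the table name and loop are illustrative):

```perl
# Without a placeholder: a different SQL string per call, so the server
# parses and plans each statement afresh and pollutes its SQL cache.
for my $id (@ids) {
    $dbh->do("INSERT INTO log (id) VALUES ($id)");
}

# With a placeholder: one SQL string, so one cached access plan is reused.
# (undef is the \%attr slot of do's argument list.)
for my $id (@ids) {
    $dbh->do("INSERT INTO log (id) VALUES (?)", undef, $id);
}
```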
32. Hot handles
- Use prepare() to get a handle on the statement in the SQL cache and keep it there.
- The handle for the pre-prepared statement avoids a round-trip to the server for cache-check / parsing etc.
- Avoid using $dbh->do() in a loop.
- Use $sth = $dbh->prepare(...) and $sth->execute(...) instead.
- For example, convert a looped do("insert ... ?", undef, $id)
- to prepare("insert ... ?") and a looped execute($id).
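That conversion might be sketched as follows (table and data are illustrative):

```perl
# Before: do() in a loop - every iteration round-trips a prepare
$dbh->do("INSERT INTO prices (prod,price) VALUES (?,?)", undef, @$_)
    for @rows;

# After: prepare once outside the loop, execute many times
my $ins = $dbh->prepare("INSERT INTO prices (prod,price) VALUES (?,?)");
$ins->execute(@$_) for @rows;
```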
33. Let the DBI cache your handles
- Sometimes it's not easy to hold all your handles
  - e.g., library code to lookup values from the database given a $dbh
- The prepare_cached() method gives you a client-side statement handle cache:

    while ( my ($field, $value) = each %search_fields ) {
        push @sql,    "$field = ?";
        push @values, $value;
    }
    my $where = "";
    $where = "where " . join(" and ", @sql) if @sql;
    my $sth = $dbh->prepare_cached("select * from table $where");
    $sth->execute(@values);

- Can avoid the need for global statement handle variables
34. Sling less for speed
- while (@row = $sth->fetchrow_array) { ... }
  - one field:  3,100 fetches per cpu second
  - ten fields: 1,000 fetches per cpu second
- while ($row = $sth->fetchrow_arrayref) { ... }
  - one field:  5,300 fetches per cpu second
  - ten fields: 4,000 fetches per cpu second
- Notes:
  - Timings made on an old SPARC 10 using DBD::Oracle
  - Timings assume instant record fetch within the driver
  - Fields were all one char; @row would be even slower for more/bigger fields
- Use bind_columns() for direct access to fetched fields
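A sketch of the bind_columns() approach: bind Perl variables to the result columns once, then each fetch updates them in place with no per-row copying into a fresh list (table and column names are illustrative):

```perl
my $sth = $dbh->prepare("SELECT prod, price FROM prices");
$sth->execute;
my ($prod, $price);
$sth->bind_columns(undef, \$prod, \$price);   # undef is an ignored \%attr slot
while ($sth->fetch) {    # fetch() is an alias for fetchrow_arrayref()
    print "$prod: $price\n";
}
```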
35. Keep a handle on your databases
- Connecting to a database can be slow
- Try to connect once and stay connected where practical
- We'll discuss web server issues later
- New method:
  - The new connect_cached() method acts like prepare_cached() but for database handles
  - It also checks the connection and reconnects if it's broken
  - Similar, but not quite the same as Apache::DBI
  - New and subject to change with experience
  - Potentially most useful with DBD::Proxy and DBI::ProxyServer
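For example, library code can call connect_cached() freely and share one underlying connection; this sketch assumes a hypothetical lookup routine, DSN and table:

```perl
sub lookup_price {
    my ($prod) = @_;
    # Same arguments => same cached $dbh; reconnects if the handle has died.
    my $dbh = DBI->connect_cached("dbi:Oracle:mydb", $user, $pass,
                                  { RaiseError => 1 });
    my ($price) = $dbh->selectrow_array(
        "SELECT price FROM prices WHERE prod = ?", undef, $prod);
    return $price;
}
```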
36. Portability
- A Holy Grail
- To be taken with a pinch of salt
37. Portability in practice
- Portability requires care and testing - it can be tricky
- Platform Portability - the easier bit
  - Availability of database client software (and server if required)
  - Availability of the DBD driver
  - DBD::Proxy can address both these issues - see later
- Database Portability - more tricky
  - Differences in SQL dialects cause most problems
  - Differences in data types can also be a problem
  - Driver capabilities (placeholders etc.)
  - The DBIx::Compat module (in DBIx::RecordSet) may be useful.
- A standard DBI test suite is needed (and planned).
38. The Power of the Proxy
- Thin clients ...
- and other buzz words
39. DBD::Proxy and DBI::ProxyServer
- Networking for non-networked databases
- The DBD::Proxy driver forwards calls over the network to a remote DBI::ProxyServer
- No changes in application behavior
  - Only the DBI->connect statement needs to be changed
- Proxy use can be made completely transparent by setting the DBI_AUTOPROXY environment variable
  - So not even the DBI->connect statement needs to be changed!
- DBI::ProxyServer works on Win32
  - Access to Access, and other Win32 ODBC and ADO data sources
- Developed by Jochen Wiedmann
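The one-line change might look like this sketch (hostname, port and the inner DSN are placeholders for your own setup):

```perl
# The outer DSN names the proxy host/port; the 'dsn=' part is the DSN the
# remote DBI::ProxyServer should itself connect with.
my $dbh = DBI->connect(
    "dbi:Proxy:hostname=dbhost;port=3334;dsn=dbi:ODBC:mydb",
    $user, $pass, { RaiseError => 1 });
```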
40. A proxy picture
[Diagram]
41. Thin clients and other buzz words
- Proxying for remote access: "thin-client"
  - No need for database client code on the DBI client
- Proxying for network security: "encryption"
  - Can use Crypt::IDEA, Crypt::DES etc.
- Proxying for "access control" and "firewalls"
  - handy for web servers
- Proxying for action control
  - e.g., only allow specific select or insert statements
- Proxying for local access: "n-tier middleware"
  - connect_cached(), memory use, licensing limits
- Proxying for performance: "compression"
  - Can compress data transfers using Compress::Zlib
42. DBI for the Web
- Hand waving from 30,000 feet
43. Web DBI - Connect speed
- Databases can be slow to connect
  - Traditional CGI forces a new connect per request
- Move Perl and DBI into the web server
  - Apache with mod_perl and the Apache::DBI module
  - Microsoft IIS with ActiveState's PerlEx
- Connections can then persist between requests
  - Apache::DBI is automatically used by DBI if loaded
  - No CGI script changes required to get persistence
- Take care not to change the shared session behaviour
  - Leave the $dbh in the same state you found it!
- Other alternatives include
  - FastCGI, CGI::SpeedyCGI and CGI::MiniSvr
44. Web DBI - Too many connections
- Limits on database connections
  - Database platform resources or licensing
  - Memory consumption of web servers
- Partition web servers into General and Database
  - Direct database work to the Database web servers
  - Use Reverse Proxy / Redirect / Rewrite to achieve this
- Allows each subset of servers to be tuned to best fit its workload
  - And be run on appropriate platforms
45. Web DBI - State-less-ness
- No fixed client-server pair
  - Each request can be handled by a different process.
  - So you can't simply stop fetching when one page is complete and continue fetching from the same $sth when the next page is requested.
  - And transactions can't span requests. Even if they could, you'd have problems with database locks being held etc.
- Need access to 'accumulated state' somehow:
  - via the client (e.g., hidden form fields)
  - via the database (records in a session_state table)
  - via the web server file system (DBM files etc.)
- Need to purge old state info if stored on the server, so timestamp it
- See the Apache::Session module
- DBI::ProxyServer connect_cached with a session ID may suit
46. Web DBI - Browsing pages of results
- Re-execute the query each time then count/discard (simple but expensive)
  - works well for small result sets or where users rarely view many pages
  - fast initial response, degrades gradually for later pages
  - count/discard in the server is better but still inefficient for large result sets
  - affected by inserts and deletes from other processes
- Re-execute the query with a where clause using min/max keys from the last results
  - works well where the original query can be qualified in that way - not common
- Select and cache the full result rows somewhere for fast access
  - can be expensive for large result sets with big fields
- Select and cache only the row keys, fetch full rows as needed
  - an optimisation of the above; use ROWID if supported, "select ... where ... in (...)"
- If the data is static and queries are predictable then custom pre-built indexes may be useful
- The cache can be on the web server, using a DBM file for example, or on the database server using a table keyed by session id
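The min/max key technique above might be sketched like this: re-qualify the query with the last key the previous page showed, so each request resumes where the last left off (table, key column and page size are illustrative):

```perl
my $sth = $dbh->prepare(
    "SELECT prod, price FROM prices WHERE prod > ? ORDER BY prod");
$sth->execute($last_key);     # $last_key carried over from the previous request
my $count = 0;
while (my ($prod, $price) = $sth->fetchrow_array) {
    print "$prod: $price\n";
    $last_key = $prod;        # remember for the next request
    last if ++$count >= 20;   # one page of 20 rows
}
$sth->finish;                 # discard any unfetched rows
```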
47. Web DBI - Concurrent editing
- How to prevent updates overwriting each other?
- You can use Optimistic Locking via a 'qualified update':

    update table set ...
    where key    = $old_key
    and   field1 = $old_field1
    and   field2 = $old_field2   -- and so on for all fields

- Potential problems with floating point data values not matching
- Some databases support a high-resolution 'update timestamp' field that can be checked instead
- Check the update row count
  - If it's zero then you know the record has been changed or deleted by another process
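A minimal sketch of the qualified update plus row-count check (table and columns are illustrative):

```perl
# Update succeeds only if the row still holds the values this user last saw.
my $rows = $dbh->do(
    "UPDATE prices SET price = ? WHERE prod = ? AND price = ?",
    undef, $new_price, $prod, $old_price);
# do() returns '0E0' for zero rows - true as a string, 0 numerically.
if ($rows == 0) {
    warn "Record changed or deleted by another process - reload and retry\n";
}
```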
48. Web DBI - Tips for the novice
- Test one step at a time
  - Test perl + DBI + DBD driver outside the web server first
  - Test the web server with a non-DBI CGI script next
- Remember that CGI scripts run as a different user with a different environment - expect to be tripped up by that
- DBI trace is your friend - use it
- Use the Perl "-w" and "-T" options. Always "use strict"
- Read and inwardly digest the WWW Security FAQ:
  - http://www-genome.wi.mit.edu/WWW/faqs/www-security-faq.html
- Read the Perl CGI FAQ:
  - http://www.perl.com/cgi-bin/pace/pub/doc/FAQs/cgi/perl-cgi-faq.html
- And, if appropriate, read the mod_perl information available from
  - http://perl.apache.org
49. Handling LONG/BLOB data
- What makes LONG / BLOB / MEMO data special?
  - Not practical to pre-allocate fixed size buffers for the worst case
- Fetching LONGs - treat as normal fields after setting:
  - $dbh->{LongReadLen} - buffer size to allocate for expected data
  - $dbh->{LongTruncOk} - should truncating-to-fit be allowed
- Inserting LONGs
  - The limitations of string literals
  - The benefits of placeholders
- Chunking / piecewise processing not supported
  - So you're limited to available memory
  - Some drivers support an unofficial blob_read() method
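Putting those attributes to work might look like this sketch (table, column and sizes are illustrative):

```perl
$dbh->{LongReadLen} = 512 * 1024;   # expect LONG values up to 512KB
$dbh->{LongTruncOk} = 0;            # error out rather than silently truncate
my ($memo) = $dbh->selectrow_array(
    "SELECT memo FROM notes WHERE id = ?", undef, $id);

# Inserting via a placeholder sidesteps the quoting and length limits
# of embedding the value as a string literal in the SQL:
$dbh->do("INSERT INTO notes (id, memo) VALUES (?,?)", undef, $id, $memo);
```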
50. DBI security tainting
- By default the DBI ignores Perl tainting
  - doesn't taint returned data
  - doesn't check that parameters are not tainted
- The Taint attribute enables that behaviour
  - If Perl itself is in taint mode
- Each handle has its own inherited Taint attribute
  - So it can be enabled for a connection and disabled for particular statements, for example:

    $dbh = DBI->connect(..., { Taint => 1 });
    $sth = $dbh->prepare("select * from safe_table");
    $sth->{Taint} = 0;   # no tainting on this handle
51. Flexing the Multiplex
52. DBD::Multiplex
- DBD::Multiplex - new and experimental
- Connects to multiple databases at once (via DBI)
- Executes statements on all of them by default
- Could be configured to:
  - insert into all databases but select from one
  - fall back to an alternate database if the primary is unavailable
  - select from all and check results (pick the most common)
  - select round-robin or at random to distribute load
- Can be used with DBD::Proxy, either above or below it
- May also acquire fancy caching in later versions
- Watch this space!
53. Reference Materials
- http://www.symbolstone.org/technology/perl/DBI
  - The DBI Home Page
- http://www.isc.org/dbi-lists.html
  - To subscribe to the DBI Mailing Lists
- http://www.perl.com/CPAN/authors/id/TIMB/DBI_Talk1_1997.tar.gz
  - My DBI session at The Perl Conference 1.0 - a general introduction
- http://www.perl.com/CPAN/authors/id/TIMB/DBI_Talk2_1998.tar.gz
  - My DBI session at The Perl Conference 2.0 - more depth
- http://www.perl.com/CPAN/authors/id/TIMB/DBI_Talk3_1999.tar.gz
  - This DBI tutorial at The Perl Conference 3.0!
- http://www.oreilly.com/catalog/perldbi/
  - or http://www.amazon.com/exec/obidos/ASIN/1565926994/dbi
  - The forthcoming DBI book! Due late 1999 or early 2000.
  - (Order in advance at the conference to get the special discounted rate :-)
54. The end.