Title: Advanced DBI
1. Advanced DBI
- Perl Conference 3.0 Tutorial
- August 1999
- Tim Bunce
2. Topical Topics
- Wheels within Wheels
- Error Checking and Handling
- Transactions
- Speed Speed Speed!
- Portability
- The Power of the Proxy
- DBI for the Web
- Handling LONG/BLOB data
- Tainting
- Flexing the Multiplex
3. Wheels within Wheels
- The DBI architecture
- and how to watch it at work
4. Under the hood
- DBI defines and implements an interface
- Driver modules do much of the real work
- DBI provides default methods, functions, tools etc. for drivers
- Not limited to the lowest common denominator - a mechanism is provided for driver-specific extensions
- Designed and built for speed
- Valuable detailed call tracing/debugging built-in
5. A picture is worth?
[Diagram: Perl Application -> DBI Module -> DBD::Oracle / DBD::Informix / DBD::Other -> Oracle Server / Informix Server / Other Server]
6. Setting the scene
- Inner and outer worlds
  - Application and Drivers
- Inner and outer handles
  - DBI handles are references to tied hashes
- The DBI Method Dispatcher
  - gateway between the worlds and the heart of the DBI
- Now we'll go all deep and visual for a while...
7. Architecture of the DBI classes 1
[Diagram: outer handle classes]
8. Architecture of the DBI classes 2
[Diagram]
9. Anatomy of a DBI handle
[Diagram: handle reference and tied hash]
10. Method call walk-through
- Consider a simple prepare call: $dbh->prepare(...)
- $dbh is a reference to a DBI::db object (regardless of driver)
- DBI::db::prepare is an alias for the DBI dispatch method
- Dispatch calls the driver's prepare method something like this (pseudo-code):
    my $inner_hash_ref    = ...;  # from tie magic
    my $implementor_class = ...;  # from DBI magic data
    $inner_hash_ref->$implementor_class::prepare(...);
- Driver code gets the inner hash so it has fast access to the hash contents
11. Watching the DBI in action
- DBI has detailed call tracing built-in
- The trace can be very helpful in understanding application behavior and for debugging
- Shows parameters and results
- Trace information can be written to a file
- Not used often enough!
12. Our program for today...

     1  #!/usr/bin/perl -w
     2
     3  use DBI;
     4
     5  $dbh = DBI->connect('', '', '', { RaiseError => 1 });
     6
     7  $upd = $dbh->prepare("UPDATE prices SET price=? WHERE prod=?");
     8  $ins = $dbh->prepare("INSERT INTO prices (prod,price) VALUES (?,?)");
     9  $rows = $upd->execute(42, "Widgets");
    10  $ins->execute("Widgets", 42) if $rows == 0;
    11
    12  $dbh->disconnect;
13. Enabling Tracing
- Per handle
  - $h->trace($level);
  - $h->trace($level, $filename);
  - Only affects that handle and any new child handles created from it
  - Child handles get the trace level of the parent in effect at time of creation
- Global (internal to application)
  - DBI->trace(...);
  - Sets the effective global default minimum trace level
- Global (external to application)
  - Enabled using the DBI_TRACE environment variable
  - DBI_TRACE=digits           equivalent to DBI->trace(digits)
  - DBI_TRACE=filename         equivalent to DBI->trace(2, filename)
  - DBI_TRACE=digits=filename  equivalent to DBI->trace(digits, filename)
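Put together, tracing a suspect statement might look like this sketch (the trace level, log filename, and table name are illustrative, not from the slides):

```perl
$dbh->trace(2, "/tmp/dbi.log");   # level 2: calls, parameters and results
my $sth = $dbh->prepare("SELECT prod, price FROM prices");
$sth->execute;                    # appears in /tmp/dbi.log
$dbh->trace(0);                   # tracing off again
```

Or, without touching the application at all: DBI_TRACE=2=/tmp/dbi.log perl test.pl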
(extra)
14. Trace level 1
- Trace level 1 shows method results and line numbers:

    <- connect= DBI::db=HASH(0xe0abc) at DBI.pm line 356.
    <- STORE('PrintError', 1)= 1 at DBI.pm line 382.
    <- STORE('AutoCommit', 1)= 1 at DBI.pm line 382.
    <- STORE('RaiseError', 1)= 1 at DBI.pm line 382.
    <- prepare('UPDATE ...')= DBI::st=HASH(0xe1238) at test.pl line 7.
    <- prepare('INSERT ...')= DBI::st=HASH(0xe1504) at test.pl line 8.
    <- execute= '0E0' at test.pl line 9.
    <- execute= 1 at test.pl line 10.
    <- disconnect= 1 at test.pl line 11.
    <- DESTROY= undef
    <- DESTROY= undef
    <- DESTROY= undef
    <- DESTROY= undef
15. Trace level 2 and above
- Trace level 2 shows calls with parameters and more:

    -> connect for DBD::ODBC::dr (DBI::dr=HASH(0x13dfec)~0xe14a4 '' '' HASH(0xe0a10))
    <- connect= DBI::db=HASH(0xe0ab0) at DBI.pm line 356.
    -> STORE for DBD::ODBC::db (DBI::db=HASH(0xe0abc)~INNER 'PrintError' 1)
    <- STORE= 1 at DBI.pm line 382.
    -> prepare for DBD::ODBC::db (DBI::db=HASH(0xe0ab0)~0xe0abc 'UPDATE prices SET price=? WHERE prod=?')
    <- prepare= DBI::st=HASH(0xe1274) at test.pl line 7.

- Trace level 3 and above shows more processing details
16. Error Checking and Handling
- To err is human, to detect, divine.
17. The importance of error checking
- Errors happen!
- Failure happens when you don't expect errors!
  - database crash / network disconnection
  - lack of disk space for insert or select (sort space for order by)
  - server math error on select (divide by zero)
  - and maybe, just maybe, errors in your own code (Gasp!)
- Beat failure by expecting errors!
- Detect errors early to limit their effects
  - Defensive Programming, e.g., check assumptions
  - Thorough Programming, e.g., check for errors after fetch loops
18. Error checking - ways and means
- Error checking the hard way...

    $h->method or die "DBI method failed: $DBI::errstr";
    $h->method or die "DBI method failed: $DBI::errstr";
    $h->method or die "DBI method failed: $DBI::errstr";

- Error checking the easy way...

    $h->{RaiseError} = 1;
    $h->method;
    $h->method;
    $h->method;
19. Handling errors the smart way
- For simple applications immediate death on error is fine
- For more advanced applications greater control is needed
- Life after death:

    $h->{RaiseError} = 1;
    eval { $h->method };
    if ($@) { ... }

- Bonus prize:
  - Other, non-DBI, code within the eval block may also raise an exception that will be caught and handled cleanly
20. Transactions
- To do or to undo,
- that is the question
21. Transactions - What's it all about?
- Far more than just locking
- The A.C.I.D. test
- True transactions give true safety
  - even from power failures and system crashes!
  - Incomplete transactions are automatically rolled back by the database server when it's restarted.
- Also removes the burden of undoing incomplete changes
- Hard to implement (for the vendor) and has a significant performance cost
- A very large topic worthy of an entire tutorial
22. Transactions - Life Preserving
- Classic system crash between one bank account being debited and another being credited.
- Dramatic power failure during an update statement on 3 million rows, two minutes in and only part way through.
- Real-world complex series of inter-related updates, deletes and inserts on many separate tables fails at the last step due to a duplicate primary key on an insert.
- Transaction recovery would handle all these situations automatically.
- Makes a system far more robust and trustworthy over the long term.
- Use transactions if your database supports them.
- If it doesn't and you need them, switch to a different database.
23. Transactions - How the DBI helps
- Tools of the trade:
  - Set AutoCommit off, and RaiseError on
  - Wrap an eval around the code
  - Use $dbh->commit and $dbh->rollback
- Disable AutoCommit via $dbh->{AutoCommit} = 0
  - to enable transactions and thus rollback-on-error
- Enable RaiseError via $dbh->{RaiseError} = 1
  - to automatically 'throw an exception' on error
- The surrounding eval
  - catches the exception and puts the error message into $@
- Test $@ and rollback() if set, else commit()
24. Transactions - Example code

    $dbh->{RaiseError} = 1;
    eval {
        $dbh->method(...);  # assorted DBI calls
        foo(...);           # application code
        $dbh->commit;       # commit the changes
    };
    if ($@) {
        warn "Transaction aborted because $@";
        $dbh->rollback;
        ...
    }
25. Transactions - Further comments
- The eval catches all exceptions
  - not just from DBI calls. Also catches fatal runtime errors from Perl.
- Put commit() inside the eval
  - ensures commit failure is caught cleanly.
  - remember that commit() itself may fail for many reasons.
- Don't forget that rollback() may also fail
  - due to database crash or network failure etc.
- Other points:
  - Always explicitly commit or rollback before disconnect().
  - Destroying a connected $dbh should always rollback.
  - END blocks can catch exit-without-disconnect to rollback and disconnect cleanly.
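The END-block safety net in the last point might be sketched like this, assuming a package-global $dbh connected with AutoCommit off (the structure is illustrative, not from the slides):

```perl
our $dbh;   # assumed set elsewhere by DBI->connect(..., { AutoCommit => 0 })

END {
    # Runs even on die() or plain exit(): undo any uncommitted work.
    if ($dbh && $dbh->{Active}) {
        eval { $dbh->rollback };    # rollback itself may fail - don't die in END
        eval { $dbh->disconnect };
    }
}
```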
26. Speed Speed Speed!
27. Give me speed!
- DBI was designed for speed from day one
- DBI dispatch written in hand-crafted XS/C
- Dispatch to XS driver method calls optimized
- Cached attributes returned directly by DBI dispatch
- DBI overhead is generally insignificant
- So we'll talk about other speed issues instead...
28. Partition for speed
- Application partitioning
  - stop and think - work smarter not harder
  - Pick the right database for the job, if you have the choice
- Work close to the data
  - Use stored procedures where appropriate
  - Moving data to/from the client is always expensive - think of latency as well as bandwidth
  - Use proprietary bulk-load where appropriate
- Consider database maths vs. Perl maths
  - Multiple simple queries with 'joins' done in Perl may be faster
- Consider local caching, in memory or DBM file etc.
- Mix 'n' match techniques as needed
29. Prepare for speed
- prepare() - what happens in the server:
  - Receive and parse the SQL statement into internal form
  - Get details for all the selected tables
  - Check access rights for each
  - Get details for all the selected fields
  - Check data types in expressions
  - Get details for all the indices of the tables
  - Develop an optimised query 'access plan' for best execution
  - Return a handle for all this cached information
- This can be an expensive process - especially the 'access plan'
- Some databases, like MySQL, don't cache the information but have simpler, and thus faster, plan creation
30. The Best Laid Access Plans
- Query optimisation is hard
  - Intelligent high quality cost-based query optimisation is really hard!
- Know your optimiser
  - Oracle, Informix, Sybase, DB2, SQL Server etc. are all slightly different.
- Check what it's doing
  - Use tools to see the plans used for your queries - very helpful
- Help it along
  - Most 'big name' databases have a mechanism to analyse and store the key distributions of indices to help the optimiser make good plans
  - beware: keep it fresh, old key distributions might be worse than none
  - Some also allow you to embed 'hints' into the SQL as comments
  - beware: take it easy, over-hinting hinders dynamic optimisation
(extra)
31. Respect your server's SQL cache
- The optimised access plan etc. is cached within the server
  - keyed by the exact original SQL string used
- Without placeholders, the SQL string varies each time
  - so the cached one is not reused
  - and time is wasted creating a new access plan
  - the new statement and access plan are added to the cache
  - so the cache fills and other statements get pushed out
- Compare do("insert ... $id")
- with do("insert ... ?", undef, $id)
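In practice the contrast might look like this (the table name and loop are illustrative):

```perl
# Without a placeholder: a different SQL string per call, so the server
# parses and plans each statement afresh and pollutes its SQL cache.
for my $id (@ids) {
    $dbh->do("INSERT INTO log (id) VALUES ($id)");
}

# With a placeholder: one SQL string, so one cached access plan is reused.
# (undef is the \%attr slot of do's argument list.)
for my $id (@ids) {
    $dbh->do("INSERT INTO log (id) VALUES (?)", undef, $id);
}
```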
32. Hot handles
- Use prepare() to get a handle on the statement in the SQL cache and keep it there.
- The handle for the pre-prepared statement avoids a round-trip to the server for cache-check / parsing etc.
- Avoid using $dbh->do() in a loop.
- Use $sth = $dbh->prepare(...) and $sth->execute(...) instead.
- For example, convert a looped do("insert ... ?", undef, $id)
- to prepare("insert ... ?") and a looped execute($id).
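That conversion might be sketched as follows (table and data are illustrative):

```perl
# Before: do() in a loop - every iteration round-trips a prepare
$dbh->do("INSERT INTO prices (prod,price) VALUES (?,?)", undef, @$_)
    for @rows;

# After: prepare once outside the loop, execute many times
my $ins = $dbh->prepare("INSERT INTO prices (prod,price) VALUES (?,?)");
$ins->execute(@$_) for @rows;
```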
33. Let the DBI cache your handles
- Sometimes it's not easy to hold all your handles
  - e.g., library code to lookup values from the database given a $dbh
- The prepare_cached() method gives you a client-side statement handle cache:

    while ( my ($field, $value) = each %search_fields ) {
        push @sql,    "$field = ?";
        push @values, $value;
    }
    my $where = "";
    $where = "where " . join(" and ", @sql) if @sql;
    my $sth = $dbh->prepare_cached("select * from table $where");
    $sth->execute(@values);

- Can avoid the need for global statement handle variables
34. Sling less for speed
- while (@row = $sth->fetchrow_array) { ... }
  - one field:  3,100 fetches per cpu second
  - ten fields: 1,000 fetches per cpu second
- while ($row = $sth->fetchrow_arrayref) { ... }
  - one field:  5,300 fetches per cpu second
  - ten fields: 4,000 fetches per cpu second
- Notes:
  - Timings made on an old SPARC 10 using DBD::Oracle
  - Timings assume instant record fetch within the driver
  - Fields were all one char; @row would be even slower for more/bigger fields
- Use bind_columns() for direct access to fetched fields
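A sketch of the bind_columns() approach: bind Perl variables to the result columns once, then each fetch updates them in place with no per-row copying into a fresh list (table and column names are illustrative):

```perl
my $sth = $dbh->prepare("SELECT prod, price FROM prices");
$sth->execute;
my ($prod, $price);
$sth->bind_columns(undef, \$prod, \$price);   # undef is an ignored \%attr slot
while ($sth->fetch) {    # fetch() is an alias for fetchrow_arrayref()
    print "$prod: $price\n";
}
```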
35. Keep a handle on your databases
- Connecting to a database can be slow
- Try to connect once and stay connected where practical
- We'll discuss web server issues later
- New method:
  - The new connect_cached() method acts like prepare_cached() but for database handles
  - It also checks the connection and reconnects if it's broken
  - Similar, but not quite the same as Apache::DBI
  - New and subject to change with experience
  - Potentially most useful with DBD::Proxy and DBI::ProxyServer
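For example, library code can call connect_cached() freely and share one underlying connection; this sketch assumes a hypothetical lookup routine, DSN and table:

```perl
sub lookup_price {
    my ($prod) = @_;
    # Same arguments => same cached $dbh; reconnects if the handle has died.
    my $dbh = DBI->connect_cached("dbi:Oracle:mydb", $user, $pass,
                                  { RaiseError => 1 });
    my ($price) = $dbh->selectrow_array(
        "SELECT price FROM prices WHERE prod = ?", undef, $prod);
    return $price;
}
```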
36. Portability
- A Holy Grail
- To be taken with a pinch of salt
37. Portability in practice
- Portability requires care and testing - it can be tricky
- Platform Portability - the easier bit
  - Availability of database client software (and server if required)
  - Availability of the DBD driver
  - DBD::Proxy can address both these issues - see later
- Database Portability - more tricky
  - Differences in SQL dialects cause most problems
  - Differences in data types can also be a problem
  - Driver capabilities (placeholders etc.)
  - The DBIx::Compat module (in DBIx::RecordSet) may be useful.
- A standard DBI test suite is needed (and planned).
38. The Power of the Proxy
- Thin clients ...
- and other buzz words
39. DBD::Proxy and DBI::ProxyServer
- Networking for non-networked databases
- The DBD::Proxy driver forwards calls over the network to a remote DBI::ProxyServer
- No changes in application behavior
  - Only the DBI->connect statement needs to be changed
- Proxy use can be made completely transparent by setting the DBI_AUTOPROXY environment variable
  - So not even the DBI->connect statement needs to be changed!
- DBI::ProxyServer works on Win32
  - Access to Access, and other Win32 ODBC and ADO data sources
- Developed by Jochen Wiedmann
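The one-line change might look like this sketch (hostname, port and the inner DSN are placeholders for your own setup):

```perl
# The outer DSN names the proxy host/port; the 'dsn=' part is the DSN the
# remote DBI::ProxyServer should itself connect with.
my $dbh = DBI->connect(
    "dbi:Proxy:hostname=dbhost;port=3334;dsn=dbi:ODBC:mydb",
    $user, $pass, { RaiseError => 1 });
```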
40. A proxy picture
[Diagram]
41. Thin clients and other buzz words
- Proxying for remote access: "thin-client"
  - No need for database client code on the DBI client
- Proxying for network security: "encryption"
  - Can use Crypt::IDEA, Crypt::DES etc.
- Proxying for "access control" and "firewalls"
  - handy for web servers
- Proxying for action control
  - e.g., only allow specific select or insert statements
- Proxying for local access: "n-tier middleware"
  - connect_cached(), memory use, licensing limits
- Proxying for performance: "compression"
  - Can compress data transfers using Compress::Zlib
42. DBI for the Web
- Hand waving from 30,000 feet
43. Web DBI - Connect speed
- Databases can be slow to connect
  - Traditional CGI forces a new connect per request
- Move Perl and DBI into the web server
  - Apache with mod_perl and the Apache::DBI module
  - Microsoft IIS with ActiveState's PerlEx
- Connections can then persist between requests
  - Apache::DBI is automatically used by DBI if loaded
  - No CGI script changes required to get persistence
- Take care not to change the shared session behaviour
  - Leave the $dbh in the same state you found it!
- Other alternatives include
  - FastCGI, CGI::SpeedyCGI and CGI::MiniSvr
44. Web DBI - Too many connections
- Limits on database connections
  - Database platform resources or licensing
  - Memory consumption of web servers
- Partition web servers into General and Database
  - Direct database work to the Database web servers
  - Use Reverse Proxy / Redirect / Rewrite to achieve this
- Allows each subset of servers to be tuned to best fit its workload
  - And be run on appropriate platforms
45. Web DBI - State-less-ness
- No fixed client-server pair
  - Each request can be handled by a different process.
  - So you can't simply stop fetching when one page is complete and continue fetching from the same $sth when the next page is requested.
  - And transactions can't span requests. Even if they could, you'd have problems with database locks being held etc.
- Need access to 'accumulated state' somehow:
  - via the client (e.g., hidden form fields)
  - via the database (records in a session_state table)
  - via the web server file system (DBM files etc.)
- Need to purge old state info if stored on the server, so timestamp it
- See the Apache::Session module
- DBI::ProxyServer connect_cached with a session ID may suit
46. Web DBI - Browsing pages of results
- Re-execute the query each time then count/discard (simple but expensive)
  - works well for small result sets or where users rarely view many pages
  - fast initial response, degrades gradually for later pages
  - count/discard in the server is better but still inefficient for large result sets
  - affected by inserts and deletes from other processes
- Re-execute the query with a where clause using min/max keys from the last results
  - works well where the original query can be qualified in that way - not common
- Select and cache the full result rows somewhere for fast access
  - can be expensive for large result sets with big fields
- Select and cache only the row keys, fetch full rows as needed
  - an optimisation of the above; use ROWID if supported, "select ... where ... in (...)"
- If the data is static and queries are predictable then custom pre-built indexes may be useful
- The cache can be on the web server, using a DBM file for example, or on the database server using a table keyed by session id
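The min/max key technique above might be sketched like this: re-qualify the query with the last key the previous page showed, so each request resumes where the last left off (table, key column and page size are illustrative):

```perl
my $sth = $dbh->prepare(
    "SELECT prod, price FROM prices WHERE prod > ? ORDER BY prod");
$sth->execute($last_key);     # $last_key carried over from the previous request
my $count = 0;
while (my ($prod, $price) = $sth->fetchrow_array) {
    print "$prod: $price\n";
    $last_key = $prod;        # remember for the next request
    last if ++$count >= 20;   # one page of 20 rows
}
$sth->finish;                 # discard any unfetched rows
```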
47. Web DBI - Concurrent editing
- How to prevent updates overwriting each other?
- You can use Optimistic Locking via a 'qualified update':

    update table set ...
    where key    = $old_key
    and   field1 = $old_field1
    and   field2 = $old_field2   -- and so on for all fields

- Potential problems with floating point data values not matching
- Some databases support a high-resolution 'update timestamp' field that can be checked instead
- Check the update row count
  - If it's zero then you know the record has been changed or deleted by another process
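A minimal sketch of the qualified update plus row-count check (table and columns are illustrative):

```perl
# Update succeeds only if the row still holds the values this user last saw.
my $rows = $dbh->do(
    "UPDATE prices SET price = ? WHERE prod = ? AND price = ?",
    undef, $new_price, $prod, $old_price);
# do() returns '0E0' for zero rows - true as a string, 0 numerically.
if ($rows == 0) {
    warn "Record changed or deleted by another process - reload and retry\n";
}
```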
48. Web DBI - Tips for the novice
- Test one step at a time
  - Test perl + DBI + DBD driver outside the web server first
  - Test the web server with a non-DBI CGI script next
- Remember that CGI scripts run as a different user with a different environment - expect to be tripped up by that
- DBI trace is your friend - use it
- Use the Perl "-w" and "-T" options. Always "use strict"
- Read and inwardly digest the WWW Security FAQ:
  - http://www-genome.wi.mit.edu/WWW/faqs/www-security-faq.html
- Read the Perl CGI FAQ:
  - http://www.perl.com/cgi-bin/pace/pub/doc/FAQs/cgi/perl-cgi-faq.html
- And, if appropriate, read the mod_perl information available from
  - http://perl.apache.org
49. Handling LONG/BLOB data
- What makes LONG / BLOB / MEMO data special?
  - Not practical to pre-allocate fixed size buffers for the worst case
- Fetching LONGs - treat as normal fields after setting:
  - $dbh->{LongReadLen} - buffer size to allocate for expected data
  - $dbh->{LongTruncOk} - should truncating-to-fit be allowed
- Inserting LONGs
  - The limitations of string literals
  - The benefits of placeholders
- Chunking / piecewise processing not supported
  - So you're limited to available memory
  - Some drivers support an unofficial blob_read() method
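Putting those attributes to work might look like this sketch (table, column and sizes are illustrative):

```perl
$dbh->{LongReadLen} = 512 * 1024;   # expect LONG values up to 512KB
$dbh->{LongTruncOk} = 0;            # error out rather than silently truncate
my ($memo) = $dbh->selectrow_array(
    "SELECT memo FROM notes WHERE id = ?", undef, $id);

# Inserting via a placeholder sidesteps the quoting and length limits
# of embedding the value as a string literal in the SQL:
$dbh->do("INSERT INTO notes (id, memo) VALUES (?,?)", undef, $id, $memo);
```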
50. DBI security tainting
- By default the DBI ignores Perl tainting
  - doesn't taint returned data
  - doesn't check that parameters are not tainted
- The Taint attribute enables that behaviour
  - If Perl itself is in taint mode
- Each handle has its own inherited Taint attribute
  - So it can be enabled for a connection and disabled for particular statements, for example:

    $dbh = DBI->connect(..., { Taint => 1 });
    $sth = $dbh->prepare("select * from safe_table");
    $sth->{Taint} = 0;   # no tainting on this handle
51. Flexing the Multiplex
52. DBD::Multiplex
- DBD::Multiplex - new and experimental
- Connects to multiple databases at once (via DBI)
- Executes statements on all of them by default
- Could be configured to:
  - insert into all databases but select from one
  - fall back to an alternate database if the primary is unavailable
  - select from all and check results (pick the most common)
  - select round-robin or at random to distribute load
- Can be used with DBD::Proxy, either above or below it
- May also acquire fancy caching in later versions
- Watch this space!
53. Reference Materials
- http://www.symbolstone.org/technology/perl/DBI
  - The DBI Home Page
- http://www.isc.org/dbi-lists.html
  - To subscribe to the DBI Mailing Lists
- http://www.perl.com/CPAN/authors/id/TIMB/DBI_Talk1_1997.tar.gz
  - My DBI session at The Perl Conference 1.0 - a general introduction
- http://www.perl.com/CPAN/authors/id/TIMB/DBI_Talk2_1998.tar.gz
  - My DBI session at The Perl Conference 2.0 - more depth
- http://www.perl.com/CPAN/authors/id/TIMB/DBI_Talk3_1999.tar.gz
  - This DBI tutorial at The Perl Conference 3.0!
- http://www.oreilly.com/catalog/perldbi/
  - or http://www.amazon.com/exec/obidos/ASIN/1565926994/dbi
  - The forthcoming DBI book! Due late 1999 or early 2000.
  - (Order in advance at the conference to get the special discounted rate :-)
54. The end.