Title: Varsity SQL for the Busy DBA
1Varsity SQL for the Busy DBA
- Fred Sobotka
- FRS Consulting, Inc.
- fred_at_frsconsulting.com
2Stick around for
- How to spot the SQL shortcut for a seemingly
tedious task - Common table expressions, the foundation for
versatile SQL that is structured, testable, and
easily debugged - Replacing unreliable, hand-coded XML generators
with ANSI-standard SQL/XML - Identifying the most useful parts of the system
catalog - SQL functions and patterns that provide the most
bang for the buck
3What does subsistence SQL smell like?
- Application joins, high ratio of statements per
transaction - Always with the temp tables (or perma-temp
tables, even) - Hand-rolled programs that exist solely to
generate XML - App anti-patterns that cleverly avoid SQLs
strength in sets - Poor filtering of rows, fetching a row only to
discard it
4Which is worse - being asked these questions by
developers
- How can I get this report to run faster?
- Whats a more flexible alternative to EXPORT?
- Can DB2 escape special characters for HTML XML?
- Is it possible to audit changes to a few specific
tables? - Why arent I getting the same access plans in
production?
or not being asked at all?
5Why might you as the DBA care about scruffy SQL?
- Are long-running, inefficiently written report
jobs crowding your batch window and cutting into
your maint. time? - Have cumbersome ETL tools become a crutch that
programmers rely on instead of coding smarter
SQL? - Do developers swear up and down that they need
you to build a replicated ODS just to handle some
basic reports? - Do the problematic programs use cursors to
iterate through data that is changing underneath
it? - Is your manager hearing complaints that database
performance is slow or unreliable when you
know the primary cause is poor application
design?
6One Possible Solution Write Better SQL Queries
- Strive to minimize the number of SQL roundtrips
from app - Make user-defined temp tables the exception, not
the rule - Keep joins inside the database, where they belong
- Exploit DB2s mature, built-in XML generator
- Use EXPLAIN to identify which parts drive up the
cost
7Common Table ExpressionsNot just for recursive
SQL youll rarely use
- At the very least, CTEs can add meaning and
structure - Often easier to read and follow than a single
monster join
WITH mdctables AS ( SELECT DISTINCT tabschema,
tabname FROM syscat.indexesWHERE indextype
'DIM' )SELECT m.tabschema, m.tabname, t.npages,
t.fpagesFROM mdctables m INNER JOIN
syscat.tables t ON t.tabschema m.tabschema
AND t.tabname m.tabname ORDER BY t.fpages
DESC
CTE
Final SELECT
8More about CTEs
- Introduced in SQL1999 (a.k.a. SQL3)
- Multiple CTEs can be declared used in a single
stmt - CTEs can reference each other, or themselves
(recursion) - Some CTE support offered in Oracle, MS SQL, all
DB2 - CTEs dont necessarily result in a worse access
plan (EXPLAIN reveals that the statement is
heavily re-written)
9Why not just create views instead?
- If its a vendor app, you may not be allowed to
add them - Views create hard dependencies that could
complicate schema evolution or even block a DB2
upgrade - The query may be a one-off or an infrequent
report/extract - Will the benefits be worth the team code review
that new database objects may (should) require?
10Ever make stir-fry?
- The real work is in the prep, not the little bit
of stirring - Do you ever sample ingredients before throwing
them in? - Each CTE is a like a prep bowl with an ingredient
of data - Sample (SELECT), sniff (EXPLAIN) until youre
confident - Then uncomment the final SELECT at the bottom
cook
11Reasons to add more CTEs to your SQL diet
- Testing/debugging a big stmt without disruptive
changes - Grouping/aggregation can be combined w/ detailed
rows - Targeted, precise row numbering ranking
capabilities - Separate the business logic and filtering from
final output - Allows you to query the logical data model you
really want, not the physical model youre stuck
with
12Test any CTE at just about any time
WITH filteredorders AS (SELECT orderID, custID,
saleDT, itemID, salesregion FROM
sales.ordersWHERE referralsource IN
('WWW','EMAIL','FACEBOOK')) --Orig. final
SELECT is temporarily commented out SELECT
COUNT(), MIN(fo.saleDT), MAX(fo.saleDT),
COUNT(DISTINCT fo.itemID), COUNT(DISTINCT
fo.salesregion) FROM filteredorders fo
- Are we grabbing the orders we really want?
Theres no need to wait until the query is
completely coded to find out.
13Easy switching between literals and parameters
WITH parms( custID, saleDT ) AS ( SELECT--
While testing interactively, comment-- out
parameter markers -- CAST( ? AS INTEGER ), CAST(
? AS DATE ) 999999, '2010-05-01' FROM
sysibm.sysdummy1,
The rest of your query still references
parms.custID and parms.saleDT, and is not
impacted when you switch back and forth between
hardcoded literals and parameter markers.
14Make ROW_NUMBER() RANK() work your way
WITH ordertimeline AS ( SELECT custID, saleDT,
ROWNUMBER() OVER ( PARTITION BT custID ORDER BY
saleDT ASC ) orderseq FROM sales.ordersWHERE
saleDT gt '2010-01-01 000000' ) SELECT
o1.custID, DATE( o1.saleDT ) saleDT1, DATE(
o3.saleDT ) saleDT3, TIMESTAMPDIFF( 16, CHAR(
o3.saleDT o1.saleDT )) elapsedDaysFROM
ordertimeline o1 INNER JOIN ordertimeline o3ON
o3.custID o1.custIDWHERE o1.orderseq 1 AND
o3.orderseq 3ORDER BY elapsedDays ASC
Which customers have already racked up three or
more orders this year? How many days elapsed
between each customers 1st and 3rd order?
15Generating one-liners from SQL
- Use CONCAT or to intersperse text literals in
output - Numerics will need to be cast as CHAR() to mix
with text - Always RTRIM() trailing spaces off of schema
names - Escape any apostrophe (') with a double
apostrophe ('') - Thats two apostrophe characters, not a quote
SELECT 'GRANT SELECT ON 'RTRIM( tabschema
)'.'tabname' TO ROLE READONLY'FROM
syscat.tables WHERE create_time gt CURRENT
TIMESTAMP 2 DAYS
and another
SELECT 'ALTER TABLESPACE 'tbspace'
REBALANCE' FROM syscat.tablespaces
16Selecting and sorting are two different things
- Your ORDER BY doesnt need to reference SELECT
cols - ORDER BY can be very sophisticated if necessary
- Re-order the alphabet with a CASE expression in
a CTE - When something absolutely, positively needs to
follow an exact ordering sequence, DB2 SQL has
got you covered
17Generating more complex code from SQL data
- Build a query or CTE for each distinct section of
the code - Initial preamble/heading, then repeating list of
items, then a closer - Include a numeric literal in each result set for
easy sorting - Define a second sorting column just for items in
a list - Usually produced by a ROWNUMBER() expression in
that query - Typical column layout for each query/CTE is
- Key/name, code fragment, major sort seq, minor
sort seq - Glue them all together via UNION ALL order the
result by the PK/name, major sort seq, and the
minor sort seq - But select just the column that contains the code
fragments
18How can SQL generate sophisticated, valid code?
- obj_name code_fragment outer_sort inner_sort
prod.sales CREATE VIEW 0 0
UNION ALL
prod.sales some_colname 1 0 prod.sales ,
other_colname 1 1 prod.sales ,
yet_another_col 1 2
UNION ALL
prod.sales FROM x.tbl.. 2 0
Just be sure to ORDER BY obj_name, outer_sort
ASC, inner_sort ASCbut dont include them in
your SELECT list!
19Generating complete CREATE VIEW statements
WITH codelines AS (SELECT tabschema, tabname,
'CREATE VIEW NEWSCHEMA.' tabname' AS SELECT
' AS code, 0 AS outersort, 0 AS innersort FROM
syscat.tables -- THE HEADER UNION ALL SELECT
tabschema, tabname, colname AS code, 1 AS
outersort, ROWNUMBER() OVER (PARTITION BY
tabschema, tabname ORDER BY colno) AS
innersortFROM syscat.columns WHERE remarks ltgt '
' --THE COLUMNS UNION ALL SELECT tabschema,
tabname, 'FROM 'RTRIM(tabschema)'.'tabname
' ' AS code, 2 AS outersort, 0 AS innersort
FROM syscat.tables -- THE FOOTER) -- NOW FOR
THE FINAL SELECT SELECT CASE WHEN ( innersort gt
1 ) THEN ', ' codeELSE code END AS code FROM
codelinesWHERE tabschema in () AND tabname IN
()ORDER BY tabschema, tabname, outersort,
innersort
20How to handle commas when generating lists
- You have 2 choices comma at end of line or at
beginning - Its a lot easier to find the first item than the
last item - Dont trust sequence numbers you didnt generate
- Their sequence number 1 could be duplicated, or
missing entirely - So, generate your own sequence number for your
list, and make sure you put a comma before items
2 and up
SELECT CASE WHEN ( innersort gt 1 ) THEN ', '
codeELSE code END AS code FROM
21So what kind of code should we generate?
- RUNSTATS/REORGCHK/REORG/REBIND
- Recreate inoperative views
- Grants and Revokes
- Export Import/Load scripts with
business-specific twists - Extract stored procedures to separate, named
files - DDL for change capture tables
- Mass ATTACH/DETACH scripts for partitioned tables
- One-at-a-time program calls for each primary key
of user data in need of some sort of ETL or other
processing
22Picking the least worst data for a report
WITH allchoices ( itemid, dataitem, weightscore
)AS ( SELECT itemkey, goodattr, 1 FROM
best.optionsUNION ALLSELECT itemid, altval, 2
FROM notasgood.infoUNION ALLSELECT item_id,
someguess, 3 FROM lastresort.hunch)SELECT
a.itemid,b,c,( SELECT x.dataitem FROM allchoices
x WHERE x.itemid a.itemid ORDER BY
weightscoreFETCH FIRST ROW ONLY ) AS
itemstatus,
Think of how often youve seen people write
entire programs full of subsistence SQL just to
handle this.
23SQL/XML publishing functions
- XML may not be in your database, but lots of
people probably want XML out of your database - XML/SQL is a mature solution, introduced in
SQL2003 - Creates sophisticated, well-formed XML, including
namespaces, from your tabular data, entirely from
SQL - Does not require pureXML structures, XPath, or
XSL - DB2 optimizer continues to improve at predicate
pushdown for SQL/XML, yielding better access plans
24XMLSERIALIZE and XMLELEMENT
- XMLSERIALIZE specifies the output data type
returned - Next is typically the outermost (top-level)
XMLELEMENT - Everything else is embedded within that element
SELECT XMLSERIALIZE( XMLELEMENT(NAME
"table", XMLELEMENT(NAME "schemaname",
tabschema ), XMLELEMENT(NAME "tablename",
tabname ) )AS VARCHAR( 255 ) )FROM
syscat.tables
This returns the same number of rows as a
traditional SELECT that doesnt use SQL/XML.
25Suppressing empty XML tags with XMLFOREST
- XMLELEMENT will always generate an XML tag, even
when the payload is NULL ltexcusesgtlt/excusesgt - If that isnt what you want, use XMLFOREST, which
omits the tag whenever the payload happens to be
NULL - XMLFOREST is also handy for generating a series
of sibling elements for a list of values
SELECT XMLSERIALIZE( XMLELEMENT( NAME
"table", XMLFOREST( tabschema AS
"schemaname", tabname AS
tablename, remarks AS "comment" ) )
AS VARCHAR( 400 ) )FROM syscat.tables
26Collapsing repeating items into a child element
- XMLAGG and XMLGROUP are used to squash
potentially multiple rows into sibling elements - XMLGROUP is the more sophisticated of the two
- Offers more features with less typing
- You can do one XMLAGG in an expression, but not
more - Cartesian products will occur if you dont
separate them - One XMLAGG containing five rows and another
XMLAGG for three rows in the same SELECT will
produce 15 elements inside each XMLAGG, which is
probably not what you want.
27Example of unwanted Cartesian product
- WITH purchases( p ) AS ( VALUES 0 ),
- coupons( p, c ) AS ( VALUES( 0, 1 ), ( 0, 2 ),(
0, 3 ) ), - purchaseditems( p, s ) AS ( VALUES( 0, 'A' ), (
0, 'B' ), ( 0,'C' ), ( 0,'D' ), ( 0,'E' ) - )
- SELECT XMLSERIALIZE( XMLELEMENT( NAME "Purchase",
- XMLAGG( XMLELEMENT( NAME "CouponUsed", c ) ) ,
- XMLAGG( XMLELEMENT( NAME "ItemBought", s ) )
- )
- AS VARCHAR( 2048 )
- )
- FROM
- purchases p
- LEFT OUTER JOIN coupons c ON p.p c.p
- LEFT OUTER JOIN purchaseditems s ON p.p s.p
- GROUP BY p.p
Although the GROUP BY collapses the final output
to one row, the number of items produced by each
XMLAGG is incorrect.
28Correct results when grouping is pushed to CTE
- WITH purchases( p ) AS ( VALUES 0 ),
- coupons( p, c ) AS ( VALUES( 0, 1 ), ( 0, 2 ),(
0, 3 ) ), - purchaseditems( p, i ) AS ( VALUES( 0, 'A' ), (
0, 'B' ), ( 0,'C' ), ( 0,'D'), ( 0,'E' ) - ),
- cx( p, cxml ) AS (SELECT p, XMLAGG( XMLELEMENT(
NAME "CouponUsed", c ) ) FROM coupons GROUP BY p
), - ix(p, ixml ) AS (SELECT p, XMLAGG( XMLELEMENT(
NAME "ItemBought", i ) ) FROM purchaseditems
GROUP BY p ) - SELECT XMLSERIALIZE( XMLELEMENT( NAME "Purchase",
cxml , ixml ) - AS VARCHAR( 2048 )
- )
- FROM purchases p
- LEFT OUTER JOIN cx ON cx.p p.p
- LEFT OUTER JOIN ix ON ix.p p.p
-
The CTEs will return either one row or nothing,
which eliminates the risk of a cross-product.
29Building the Perfect (XML) Beast
- Build and test CTEs for major components/elements
- The CTEs should handle any filtering and
formatting - A second layer of CTEs collapses repeating
children - Its easier if the CTEs referenced by the final
SELECT all return at most one row each - The final SELECT lays out the XML tags for all of
the CTEs, which are connected via LEFT JOIN to
the main table
30Varsity SQL isnt necessarily complex SQL
- Describe the ideal collection of datasets youd
like to see - Make the query do as much heavy lifting as
possible - Assume the client app cant or wont refine the
results - Favor a single moderately expensive query over
hundreds or thousands of primitive SQL statements
sent by an app - Think in stages finding, filtering, grouping,
and presenting
31Noteworthy SQL resources
- DB2 SQL Cookbook by Graeme Birchall (free e-book)
- Anything written by Joe Celko
- DB2 XML Cookbook by Matthias Nicola and Pav
Kumar-Chatterjee - Questions and answers posted to StackOverflow.com
- DB2-L and IDUG Solutions Journal
32Just a few presenters to read and watch
- DB2 z/OS optimizer Terry Purcell
- DB2 z/OS SQL Sheryl Larsen
- DB2 LUW optimizer Calisto Zuzarte
- DB2 LUW SQL Serge Rielau
33Fred SobotkaIBM Gold Consultant IBM Data
ChampionFRS Consulting, Inc.fred_at_frsconsulting.c
om503-484-5032Skype/Twitter/AIM/Yahoo! db2fred