Oracle8i Analytical SQL Features - PowerPoint PPT Presentation

1 / 48
About This Presentation
Title:

Oracle8i Analytical SQL Features

Description:

All Regions Shoes 130. All Regions Suits 90. All Regions Sweaters ... QUERY: Calculate a running total of the amount of sales by region. select region, amount, ... – PowerPoint PPT presentation

Number of Views:50
Avg rating:3.0/5.0
Slides: 49
Provided by: thomascsh
Category:

less

Transcript and Presenter's Notes

Title: Oracle8i Analytical SQL Features


1
Oracle8i Analytical SQL Features

2
Analytical SQL Features Overview
  • Available in Oracle 8.1.6 and above
  • Analytical Features/Enhancements
  • GROUP BY Extensions
  • SQL GROUP BY clause has been augmented to make
    querying and reporting easier
  • Analytical SQL Functions
  • Analytical functions enabling rankings, moving
    window calculations, lead/lag analysis
  • CASE Expressions
  • Increased/Efficient if-then-else capabilities
    provided

3
Oracle8i Analytical SQL Features
  • GROUP BY
  • Extensions

4
GROUP BY Extensions
  • ROLLUP
  • Calculates subtotals at increasing levels of
    aggregation from the most detail to a grand total
  • used to generate simple cross tabular reports
  • CUBE
  • Calculates all possible combinations of subtotals
  • used to generate full cross tabular reports

5
GROUP BY Extensions Syntax
  • ROLLUP
  • SELECT ltcolumn listgt
  • FROM lttablegt
  • GROUP BY ROLLUP(column_list)
  • CUBE
  • SELECT ltcolumn listgt
  • FROM lttablegt
  • GROUP BY CUBE(column_list)

6
Standard GROUP BY Example
  • select region, product,
  • SUM(amount)
  • from region, product,
  • product_sales
  • where region_id reg_id
  • and region 'East'
  • and product_id prod_id
  • group by region,product
  • REGION PRODUCT SUM(AMOUNT)
  • ------------ ------------ -----------
  • East Hats 75
  • East Jackets 100
  • East Pants 100
  • East Shirts 75
  • East Shoes 130
  • East Suits 90
  • East Sweaters 75
  • East T-Shirts 20
  • East Ties 60

7
GROUP BY with ROLLUP
select region, product, SUM(amount)
from region, product, product_sales
where region_id reg_id and region 'East'
and product_id prod_id group by
ROLLUP(region,product)
REGION PRODUCT SUM(AMOUNT) ------------
------------ ----------- East Hats
75 East Jackets
100 East Pants 100 East
Shirts 75 East Shoes
130 East Suits
90 East Sweaters 75 East
T-Shirts 20 East Ties
60 East -
725 - - 725
8
GROUP BY with CUBE
select region, product, SUM(amount)
from region, product, product_sales
where region_id reg_id and region 'East'
and product_id prod_id group by
CUBE(region,product)
REGION PRODUCT SUM(AMOUNT) ------------
------------ -----------
East Ties
60 East - 725 -
Hats 75 - Jackets
100 - Pants
100 - Shirts 75 -
Shoes 130 - Suits
90 - Sweaters
75 - T-Shirts 20 -
Ties 60 - -
725
9
Comments about ROLLUP/CUBE
  • ROLLUP creates subtotals at n1 levels where n
    equals the number of grouping columns
  • CUBE creates 2n combinations of subtotals where
    n equals the number of grouping columns
  • Sub-total generation more efficient than
    equivalent SQL code (a 4 column CUBE grouping,
    has a 93.75 reduction in table access, ROLLUP
    has 80)

10
Comments about ROLLUP/CUBE
  • Partial rollups/cubes can be specified
  • GROUP BY exp1, CUBE(exp2, exp3, ....)
  • ROLLUP/CUBE can be used with all aggregating
    functions (MAX, MIN, AVG, etc.)
  • HAVING clause applies to all the data returned
  • NULLs are generated for dimensions at subtotal
    levels

11
GROUPING() Function
  • Used to distinguish between NULLs in data and
    NULLs generated by ROLLUP/CUBE extensions
  • GROUPING() return values
  • 1 for extension-generated NULLs, 0 for NULL data
    values
  • Can be passed to DECODE for custom interpretation
  • SYNTAX
  • SELECT .. GROUPING(column name) .. GROUP BY ..
  • SELECT .. DECODE(GROUPING(col), 1, Sub, col))

12
GROUPING() Function Example
select DECODE(GROUPING(region), 1, 'All
Regions', 0, region) region,
DECODE(GROUPING(product), 1, 'All Products',
0, product) product, SUM(amount) from region,
product, product_sales where region_id
reg_id and region 'East' and product_id
prod_id group by CUBE(region,product)
REGION PRODUCT SUM(AMOUNT) ------------
------------ -----------
East Ties
60 East All Products 725 All
Regions Hats 75 All Regions
Jackets 100 All Regions Pants
100 All Regions Shirts
75 All Regions Shoes 130 All
Regions Suits 90 All Regions
Sweaters 75 All Regions T-Shirts
20 All Regions Ties
60 All Regions All Products 725
13
Oracle8i Analytical SQL Features
  • Analytical SQL
  • Functions

14
Analytical SQL Functions
  • Analytical Function Categories
  • Ranking Functions
  • Windowing Functions
  • Reporting Functions
  • Lag/Lead Functions
  • Statistics Functions
  • Functions are applied after all joins, WHERE,
    GROUP BY and HAVING clauses are performed, but
    before the ORDER BY clause is applied

15
Basic Analytical Function Syntax
  • Syntax
  • ltfunction_namegt() OVER
  • (
  • PARTITION BY ltexp1gt ,
  • ORDER BY ltexp2gt ASCDESC
  • NULLS FIRSTNULLS LAST
  • )
  • Example
  • SELECT RANK(amount) OVER
  • (PARTITION BY region ORDER BY amount)
  • FROM REG_SALES

16
Function Syntax Comments
  • PARTITION BY ltexpgt , - this clause divides
    the query result into groups within which the
    analytical function operates
  • If the PARTITION BY clause is missing, the
    function operates over the entire dataset
  • ltexpgt can be any valid expression involving
    column references

17
Function Syntax Comments
  • ORDER BY ltexpgtASCDESC
    NULLS FIRSTNULLS LAST
  • specifies how the data is ordered within a group
    (partition)
  • ASCDESC specifies the sorting order for the
    grouping. The default sorting order is ASC.
  • The presence of ORDER BY affects the outcome of
    analytical functions
  • With ORDER BY, the set of rows used is the
    current row and all preceding rows in the
    partition (a growing window)
  • Without ORDER BY, all the rows in the partition
    will be used
  • The ORDER BY clause can be used to resolve ties
    between repeated values in a set.

18
Function Syntax Comments
  • ORDER BY ltexpgtASCDESC
    NULLS FIRSTNULLS LAST
  • The NULLS FIRSTNULLS LAST clause determines the
    position of NULLs in the ordered sequence. If
    omitted, the position depends on the ASC, DESC
    arguments.
  • NULLs are considered to be larger than any other
    values.
  • It is not guaranteed that the data will be sorted
    on the measures. Use the ORDER BY clause to
    specify the ordering sequence.

19
Ranking Functions
  • Computes the rank of a record with respect to
    other records in the dataset based on the values
    of a set of measures
  • Ranking Functions
  • RANK() and DENSE_RANK()
  • CUME_DIST() and PERCENT_RANK()
  • NTILE()
  • ROW_NUMBER()

20
RANK() and DENSE_RANK() Functions
  • The RANK() and DENSE_RANK() functions allow you
    to rank items in a dataset or sub-group.
  • The RANK() function leaves gaps in the ranking
    sequence when there are ties in the rankings.
  • The DENSE_RANK() function does not leave gaps in
    the ranking sequence when there are ties in the
    rankings.

21
Example of RANK()/DENSE_RANK()
select amount, RANK() OVER (ORDER BY
amount) AS rank_asc, DENSE_RANK() OVER
(ORDER BY amount) AS dense_rank from
product_sales, product, region where
prod_idproduct_id and region_idreg_id
and region'East' order by amount
  • AMOUNT RANK_ASC DENSE_RANK
  • ------------ ------------ -----------
  • 20 1 1
  • 60 2 2
  • 75 3 3
  • 75 3 3
  • 75 3 3
  • 90 6 4
  • 100 7 5
  • 100 7 5
  • 130 9 6
  • NOTES
  • - Ranking value will repeat (leaving gaps) when
    the same data values are found in the dataset
  • - The order or the rows with repeated values is
    non-deterministic
  • - DENSE_RANK() does not leave gaps in the rank
    values for repeated data values (RANK() does)
  • - The largest rank value produced by DENSE_RANK
    equals the number of distinct values in the
    dataset

22
Using RANK() For Top-N Values List
select from ( select region,
product,SUM(amount) amt, SUM(profit) profit,
RANK() OVER (PARTITION BY region ORDER BY
SUM(amount) DESC) AS rank from product_sales,
product, region where prod_idproduct_id
and region_idreg_id GROUP BY region, product
) where rank_sum_amt lt 4
Region Product Amt Profit Rank ------
------- ---- ------ ---- Central Sweaters
242 92 1 Central Shirts 213 87
2 Central Pants 123 57 3 East
Shoes 130 30 1 East Jackets
100 28 2 East Pants 100
24 2 West Shoes 100 13
1 West Jackets 99 17 2 West
T-Shirts 89 23 3
NOTES - Using RANK() or DENSE_RANK(), you can
get the top N ranks within a dataset - DENSE_RANK(
) could yield different results than RANK()
depending on repeated values within the dataset.
(The function ROW_NUMBER() would provide more
accurate results.) - The bottom N rankings can be
generated by changing the ordering sequence
within the rank expression (e.g., ORDER BY
SUM(amount) ASC))
23
CUME_DIST() and PERCENT_RANK()
  • CUME_DIST() computes the position of a specified
    value relative to the set of values (also known
    as inverse of percentile in statistics books)
  • CD ( values different from or equal to
    x)/(total of values)
  • Return values are between 0 and 1
  • PERCENT_RANK() returns the percent rank of a
    value relative to a group of values
  • PR (rank of row in partition - 1)/( of rows in
    the partition - 1)

24
Example of CUME_DIST() PERCENT_RANK()
select region, product, SUM(amount),
CUME_DIST() OVER (PARTITION BY region ORDER BY
SUM(amount) ASC) AS cume_dist, PERCENT_RANK()
OVER (PARTITION BY region ORDER BY SUM(amount)
ASC) AS pct_rnk from product_sales, product,
region where prod_idproduct_id and
region_idreg_id GROUP BY region, product
Region Product Amt Cume_Dist Pct_Rnk Central
Belts 85 0.1429 0 Central Suits 98 0.2857
0.1667 Central Ties 104 0.4286
0.3333 Central Hats 111 0.5714 0.5 Central
Pants 123 0.7143 0.6667 Central Shirts 213
0.8571 0.8333 Central Sweaters 242 1
1 East T-Shirts 20 0.1111 0 East
Ties 60 0.2222 0.125 East Hats 75
0.5556 0.25 East Shirts 75 0.5556
0.25 East Sweaters 75 0.5556 0.25 East
Suits 90 0.6667 0.625 East Jackets 100
0.8889 0.75 East Pants 100 0.8889
0.75 East Shoes 130 1 1


25
NTILE(n) and ROW_NUMBER()
  • NTILE(n)
  • Divides dataset into a specified number of
    buckets
  • Takes the number of buckets as an argument
  • ROW_NUMBER()
  • assigns a unique number to each row within a
    partition
  • row numbers start with 1 and increase
    sequentially within each partition
  • better than RANK() or DENSE_RANK() for top-N
    queries
  • Rows with rankings that are ties will not
    necessarily be assigned to the same bucket (if
    they span buckets) or the same row number in
    subsequent runs of the query using the same
    dataset

26
Example of NTILE() ROW_NUMBER()
select region, product, SUM(amount), NTILE(3)
OVER (PARTITION BY region ORDER BY SUM(amount)
ASC) AS bucket, ROW_NUMBER() OVER (PARTITION BY
region ORDER BY SUM(amount) ASC) AS row from
product_sales, product, region where
prod_idproduct_id and region_idreg_id GROUP
BY region, product
Region Product Amt Bucket Row -----------
------------- -------
------------ ----- Central Belts 85
1 1 Central Suits 98 1 2 Central Ties 104
1 3 Central Hats 111 2 4 Central Pants 123
2 5 Central Shirts 213 3 6 Central Sweaters 242
3 7 East T-Shirts 20 1 1 East Ties 60
1 2 East Hats 75 1 3 East Shirts 75
2 4 East Sweaters 75 2 5 East Suits 90
2 6 East Jackets 100 3 7 East Pants 100
3 8 East Shoes 130 3 9
27
Oracle8i Analytical SQL Features
  • Windowing Functions

28
Windowing Functions
  • Used to compute cumulative, moving or centered
    aggregates
  • Returns a value for each row in a dataset which
    depends on other rows in the corresponding window
  • Windowing functions include moving sum, moving
    average, moving min/max, cumulative sum and
    statistical functions, first and last value in
    window

29
Windowing Function Syntax
  • SUMAVGMAXMINCOUNTFIRST_VALUELAST_VALUE
    OVER
  • ( PARTITION BY ltexp1gt ,
  • ORDER BY ltexp2gt ASCDESC NULLS FIRSTNULLS
    LAST
  • ROWS RANGE
  • UNBOUNDED PRECEDING ltexp3gt PRECEDING
  • BETWEEN
  • UNBOUNDED PRECEDING ltexp4gt
    PRECEDING
  • AND
  • CURRENT ROW ltexp5gt FOLLOWING )

30
Windowing Function Syntax Comments
  • ltexpgt
  • Must be a constant or an expression which
    evaluates to a positive value
  • If ROWS was specified, its a physical offset
    which represents number of rows in the window
  • If RANGE was specified, its a logical offset
    (value or interval literal)
  • An interval literal is specified as follows
  • RANGE INTERVAL n DAYSMONTHSYEARS
  • RANGE x PRECEDINGFOLLOWING

31
Windowing Function Syntax Comments
  • ROWS RANGE
  • ROWS specifies the window in physical units
  • RANGE specifies the window as a logical offset
  • BETWEEN AND
  • Specifies the start and end point of the window
  • If BETWEEN is omitted and an end point is
    specified, that point will be considered the
    start point and the current row will be used as
    the end point

32
Windowing Function Syntax Comments
  • UNBOUNDED PRECEDING
  • Specifies that the window starts at the first row
    of the partition, (or the start of the dataset,
    if the PARTITION BY clause is omitted)
  • UNBOUNDED FOLLOWING
  • Specifies that the window ends at the last row of
    the partition, (or the last row of the dataset,
    if the PARTITION BY clause is omitted)

33
Windowing Function Syntax Comments
  • CURRENT_ROW
  • As a start point
  • If ROWS was specified, makes the current row the
    start of the window.
  • If RANGE was specified, then the current value is
    the start of the window
  • As an end point
  • If ROWS was specified, makes the current row the
    end of the window.
  • If RANGE was specified, then the current value is
    the end of the window

34
Windowing Function Syntax Comments
  • ltexpgt FOLLOWING
  • If this is the start point, then the end point
    must be ltexpgt FOLLOWING or UNBOUNDED FOLLOWING
  • ltexpgt PRECEDING
  • If this is the end point, then the start point
    must be ltexpgt PRECEDING or UNBOUNDED PRECEDING
  • This applies whether ROWS or RANGE was specified

35
Example of a Partition (Sub-Grouping) Based
Moving Window
QUERY Calculate a running total of the amount of
sales by region select region, amount,
SUM(amount) OVER (PARTITION BY region
ORDER BY amount ROWS UNBOUNDED PRECEDING)
as mov_amt_sum from product_sales, product,
region where prod_idproduct_id and
region_idreg_id
  • REGION AMOUNT MOV_AMT_SUM
  • ------------ ------------ -----------
  • Central 85 85
  • Central 98 183
  • Central 104 287
  • Central 111 398
  • Central 123 521
  • Central 213 734
  • Central 242 976
  • East 20 20
  • East 60 80
  • East 75 155
  • East 75 230
  • East 75 305
  • East 90 395
  • East 100 495
  • East 100 595
  • East 130 725

36
Example of Date Based Moving Window Summaries
select cust_id, trans_dt, amt, sum(amt) over
(partition by cust_id order by trans_dt range
interval '1' month preceding) sum_1_mnth,
sum(amt) over (partition by cust_id order by
trans_dt range between interval '1' month
preceding and interval '1' month following)
sum_2_mnth, sum(amt) over (partition by cust_id
order by trans_dt range between interval
'7' DAY preceding and interval '7' DAY
following) sum_wk from cust_daily_summary order
by cust_id, trans_dt
cust_id Trans_dt Amt Sum_1_mth Sum_2_mth
Sum_wk ------- ------- ------ ------ ---------
------ 1643060 3/15/00 -3.08 -3.08 118.78
-3.08 1643060 4/15/00 121.86 118.78 103.27
121.86 1643060 5/11/00 -6.80 115.06 101.85
-15.51 1643060 5/12/00 0.25 115.31 100.88
-15.51 1643060 5/13/00 -0.85 114.46 96.63
-15.51 1643060 5/14/00 -1.68 112.78 92.06
-15.51 1643060 5/15/00 -6.43 106.35 88.95
-15.51 1643060 6/11/00 -4.50 -20.01 -36.91
-17.40 1643060 6/12/00 -0.97 -14.18 -32.51
-17.40 1643060 6/13/00 -4.25 -18.68 -35.76
-17.40 1643060 6/14/00 -4.57 -22.40 -35.42
-17.40 1643060 6/15/00 -3.11 -23.83 -34.63
-17.40 1643060 7/11/00 -4.00 -21.40 -28.95
-10.80 1643060 7/12/00 -2.40 -19.30 -27.57
-10.80 1643060 7/13/00 -3.00 -21.33 -27.73
-10.80 1643060 7/14/00 -0.51 -17.59 -24.83
-10.80 1643060 7/15/00 -0.89 -13.91 -23.82
-10.80
37
Oracle8i Analytical SQL Features
  • Reporting Functions

38
Reporting Functions
  • Allow for the calculation of aggregate values
    within a data partition
  • Return the same aggregate value for every row in
    a partition
  • Syntax
  • SUM AVG MAX MIN COUNT STDDEV
    VARIANCE
  • (ALL DISTINCT ltvalue expression1gt )
  • OVER (PARTITION BY ltvalue expression2gt,...
    )

39
Example of Using Reporting Functions
Query Find the region where each product was
best seller SELECT product, region,
sum_amt FROM (SELECT product, region,
SUM(amount) AS sum_amt, MAX(SUM(amount)) OVER
(PARTITION BY product) AS max_sum_amt
FROM product_sales, region, product
WHERE region_idreg_id AND product_idprod_id
GROUP BY product, region) WHERE sum_amt
max_sum_amt
  • REGION REGION SUM_AMT
  • ------------ ------------ ---------
  • BELTS Central 85
  • HATS Central 111
  • JACKETS East 100
  • JEANS West 50
  • PANTS Central 123
  • SHIRTS Central 213
  • SHOES East 130
  • SOCKS West 78
  • SUITS Central 98
  • SWEATERS Central 242
  • T-SHIRTS West 89
  • TIES Central 104

40
New Reporting Functions
  • RATIO_TO_REPORT(exp)
  • Computes the ratio of a value to the sum of a set
    of values
  • LEAD() and LAG()
  • Useful for comparing values in different time
    period
  • Allows access to more than one row in a table
    without
  • a self-join
  • LAG() provides access to a prior row (at a given
    offset)
  • LEAD() provides access to a row after the current
    position
  • These functions are position, not value based

41
New Reporting Function Syntax
  • Syntax
  • RATIO_TO_REPORT(ltexp1gt) OVER
  • (PARTITION BY ltexp2gt ,)
  • LEAD LAG
  • (ltexp1gt ,ltoffsetgt , ltdefaultgt) OVER
  • (PARTITION BY ltexp2gt ,)
  • ORDER BY ltexp3gt ASCDESC
  • NULLS FIRST NULLS LAST ,)
  • ltoffsetgt is optional and defaults to 1
  • ltdefaultgt is optional and is the value returned
    if the ltoffsetgt falls outside the bounds of the
    dataset

42
Example of RATIO_TO_REPORT()
Query Find ratio of total sales per product
to total sales SELECT product, SUM(amount)
AS sum_amt, SUM(SUM(amount)) OVER() AS
total_amt, RATIO_TO_REPORT(SUM(amount))
OVER () AS ratio FROM product_sales,
product, region WHERE prod_id product_id
AND region'East' GROUP BY product
PRODUCT SUM_AMT TOTAL_AMT RATIO --------
------- ---------- ------- Belts 150
2443 .061 Hats 186 2443
.076 Jackets 199 2443 .081 Jeans
50 2443 .020 Pants 268
2443 .110 Shirts 363 2443
.149 Shoes 230 2443 .094 Socks
78 2443 .032 Suits 188
2443 .077 Sweaters 392 2443
.160 T-Shirts 109 2443 .045 Ties
230 2443 .094
43
Example of LAG Function
Query Compare custs present amount to the
amount 2 days ago SELECT cust_id, acct_date,
sum(amt_usd) amount, LAG(SUM(amt_usd),2,-999)
OVER (PARTITION BY cust_id ORDER BY
acct_trans_date ) AS old_amt FROM
acct WHERE acct_date gt '01-NOV-00' GROUP BY
cust_id, acct_date
  • cust_ID ACCT_DATE AMOUNT OLD_AMT
  • -------- ---------- ------- -------
  • 1643060 11/11/2000 147.47 -999
  • 1643060 11/12/2000 -5.5 -999
  • 1643060 11/13/2000 -2.27 147.47
  • 1643060 11/14/2000 -11.72 -5.5
  • 1643060 11/15/2000 -4.25 -2.27
  • 1643060 12/11/2000 -7.15 -11.72
  • 1643060 12/12/2000 -2.25 -4.25
  • 1643060 12/13/2000 -2.05 -7.15
  • 1643060 12/14/2000 -15.13 -2.25
  • 1643060 12/15/2000 -0.71 -2.05
  • 1659880 11/11/2000 169.17 -999
  • 1659880 11/12/2000 -14.25 -999
  • 1659880 11/13/2000 -12.5 169.17

44
Oracle8i Analytical SQL Features
  • CASE Expressions

45
CASE Expressions
  • Used for bucketing data
  • allows for differently sized buckets
  • Very similar to DECODE statement
  • Provides more flexibility and logical power
  • Offers better performance and is easier to read
  • Syntax
  • CASE WHEN ltcond1gt THEN ltv1gt
  • WHEN ltcond2gt THEN ltv2gt
  • ELSE ltvngt END

46
Example 1 of Using CASE Expressions
SELECT SUM(CASE WHEN SUM(amount) BETWEEN 0
AND 50 THEN 1 ELSE 0 END) AS "0-50",
SUM(CASE WHEN SUM(amount) BETWEEN 51 AND 150
THEN 1 ELSE 0 END) AS "51-150", SUM(CASE WHEN
SUM(amount) BETWEEN 151 AND 250 THEN 1 ELSE 0
END) AS "151-250", SUM(CASE WHEN SUM(amount) gt
251 THEN 1 ELSE 0 END) "251" FROM
product_sales, product, region WHERE
prod_id product_id AND region'East' GROUP
BY product
0-50 51-150 151-250 251 ---- ------ -------
---- 1 3 5 5
47
Example 2 of Using CASE Expressions
SELECT CASE WHEN amount BETWEEN 0 AND 50 THEN
' 0-50' WHEN amount BETWEEN 51 AND 150 THEN
' 51-150' WHEN amount BETWEEN 151 AND 250
THEN '151-250' WHEN amount gt 250 THEN '251' END
bucket, COUNT() cnt, SUM(amount) amt FROM
product_sales, product, region WHERE prod_id
product_id AND region'East' GROUP BY
CASE WHEN amount BETWEEN 0 AND 50 THEN 0-50'
WHEN amount BETWEEN 51 AND 150 THEN 51-150'
WHEN amount BETWEEN 151 AND 250 THEN '151-250'
WHEN amount gt 250 THEN '251' END ORDER BY
bucket
BUCKET CNT AMT ------ --- ---- 0-50
3 115 51-150 21 1873 151-250 2
455 251 3 176
48
Summary
  • New analytical functionality in Oracle 8.1.6 ()
  • Makes it easier to code certain types of SQL
  • Allows for more efficient SQL code when compared
    to the equivalent pure SQL implementation
  • Enhancements include
  • SQL GROUP BY clause has been augmented to make
    querying and reporting easier
  • Analytical functions enabling rankings, moving
    window calculations, lead/lag analysis
  • Better if-then-else capabilities provided
    through CASE
Write a Comment
User Comments (0)
About PowerShow.com