UserDefined Aggregates for Advanced Database Applications - PowerPoint PPT Presentation

1 / 50
About This Presentation
Title:

UserDefined Aggregates for Advanced Database Applications

Description:

Commercial success and dominance through SQL standards (in the 80's) ... State of the Art: the seamy side. A patchwork of major extensions ... – PowerPoint PPT presentation

Number of Views:53
Avg rating:3.0/5.0
Slides: 51
Provided by: bru24
Category:

less

Transcript and Presenter's Notes

Title: UserDefined Aggregates for Advanced Database Applications


1
User-Defined Aggregates for Advanced Database
Applications
  • Haixun Wang
  • hxwang_at_cs.ucla.edu
  • Computer Science Dept.
  • University of California, Los Angeles

2
State of the Art the big picture
  • Relational Databases
  • Single data model of spartan simplicity
  • Logic-based database languages
  • Commercial success and dominance through SQL
    standards (in the 80s)
  • A new wave of DB applications (in the 90s)
  • Knowledge discovery, Data Mining, OLAP, Time
    Series analysis, Spatial/Temporal DBs, XML,
  • Underscores limitations of RDBMS
  • Prompted major extensions leading to SQL3
    standards (SQL99 is a subset of SQL3)

3
State of the Art RD highlights
  • Deductive DBs
  • Rule based syntax
  • Logic based formalization of query language
    semantics e.g., nonmonotonic reasoning and
    stratification
  • Recursive queries
  • OO-DBs
  • Complex data types / inheritance
  • Expressive power by merging PL and query
    languages
  • OR-DBs
  • Rich data types / Path Expression (SQL)
  • UDFs and Database Extenders (Data Blades)

4
State of the Art the seamy side
  • A patchwork of major extensions
  • DBMSs have become more powerful but much hard and
    complex to build and use
  • Still not powerful enough
  • Data Mining clustering, classification,
    association
  • New language constructs not helping either
  • Limited expressive power in other applications
  • Bill of Materials (BoM) type of applications
  • Temporal reasoning

5
This thesis Many of the problems can be solved
by UDAs
  • User Defined Aggregates (UDAs)
  • insufficient support in commercial world and DB
    standards
  • Our claim UDAs provide a more general and
    powerful mechanism for DB extensions
  • AXL a system to make it easier to define UDAs
  • AXL where SQL and Data Mining intersect

6
A Brief History of AXL (and of my thesis)
  • Logic formalization of aggregates
    DDLP98, LBAI00
  • Early returns, monotonic aggregates used freely
    in recursive queries. Extensions of LDL (Logic
    Database Language)
  • SQL-AG Implementing and extending SQL3 UDAs
    DBPL99
  • Implemented on DB2 using extended user-defined
    aggregates with early returns
  • SADL Simple Aggregate Definition Language
    ICDE00
  • using SQL to define new aggregates
  • easy to use, but with limited performance and
    power
  • AXL Aggregate eXtension Language
    VLDB00
  • powerful, efficient and still SQL-based

7
Defining UDAs in SQL3
  • AGGREGATE FUNCTION myavg(val NUMBER)
  • RETURN NUMBER
  • STATE state
  • INITIALIZE myavg_init
  • ITERATE myavg_iterate
  • TERMINATE myavg_return
  • INITIALIZE gives an initial value to the
    aggregate
  • ITERATE computes the intermediate aggregate
    value for each new record
  • TERMINATE returns the final value computed for
    the aggregate
  • myavg_init, myavg_iterate, myavg_return are 3
    functions that the user must write in a
    procedural programming language

8
Limitation of SQL3 UDAs
  • UDAs in SQL3, Postgres, Informix, and early
    versions of LDL share the same limitations
  • Aggregates can not be used inside recursion
  • No support for early returns and on-line
    aggregation
  • Also Ease of Use is a major issue (except for
    LDL)

9
Ease of Use
  • THE PROBLEM UDFs are very hard to write and
    debug. In unfenced mode they jeopardize the
    integrity of the system. UDAs defined using
    several UDFs are prone to the same problem.
  • A SOLUTION Use a high-level language for
    defining UDAs. But there are many potential
    problems with any new language.
  • THE IDEAL SOLUTION use SQL to define new
    aggregates. Substantial benefits
  • Users are already familiar with SQL
  • No impedance mismatch of data types and
    programming paradigms
  • DB advantages scalability, data independence,
    optimizability, parallelizability

10
Simple Aggregates
  • AGGREGATE avg(value INT) REAL
  • TABLE state(sum INT, cnt INT)
  • INITIALIZE
  • INSERT INTO state (value, 1)
  • ITERATE
  • UPDATE state SET sumsumvalue,
    cntcnt1
  • TERMINATE
  • INSERT INTO RETURN SELECT sum/cnt FROM
    state

11
Avoiding Multiple Scans
  • Show the average salary of senior managers
    who make 3 times more than the average employees.
  • SQL SELECT avg(salary)
  • FROM employee
  • WHERE title senior manager
  • AND salary gt 3 (SELECT avg(salary)
    FROM employee)
  • Two scans of the employee table required
  • With AXL UDAs SELECT sscan(title, salary)
  • FROM employee

12
AXL Using a Single Scan
  • AGGREGATE sscan(title CHAR(20), salary INT)
    REAL
  • TABLE state(sum INT, cnt INT) AS VALUES
    (0,0)
  • TABLE seniors(salary INT)
  • INITIALIZE ITERATE
  • UPDATE state SET sumsumsalary,
    cntcnt1
  • INSERT INTO seniors VALUES(salary) WHERE
    title senior manager
  • TERMINATE
  • INSERT INTO RETURN
  • SELECT avg(s.salary) FROM seniors
    AS s
  • WHERE s.salary gt 3 (SELECT
    sum/cnt FROM state)

13
Ordered Sequences and Time Series
  • We have a sequence of events, each of which
    is active during a certain interval (from, end).
    Find out at which point of time we have the
    largest number of active events.
  • SQL
  • Group-by on the start time of each interval, and
    count!
  • With AXL UDAs SELECT density(from, end)
  • FROM events

14
AXL Using a Single Scan
  • AGGREGATE density(start TIME, end TIME) (time
    TIME, count INT)
  • TABLE state(time TIME, count INT) AS (0, 0)
  • TABLE active(endpoint TIME)
  • INITIALIZE ITERATE
  • DELETE FROM active WHERE endpoint lt start
  • INSERT INTO active VALUES(end)
  • UPDATE state SET timestart, count count 1
  • WHERE count lt (SELECT
    count() FROM active)
  • TERMINATE
  • INSERT INTO RETURN
  • SELECT time, count FROM state

15
Define UDA based on other UDAs
  • Instead of using INITIALIZE/ITERATE/TERMINATE
    structure, we can also build new UDAs on top of
    UDAs already defined.
  • ORDER BY,
  • AGGREGATE sortdensity(from TIME, to TIME) (time
    TIME, count INT)
  • ORDER BY from ASC
  • INSERT INTO RETURN
  • SELECT density(from, to)

16
Early Returns
  • AVG normally converges early an early
    approximation is all is needed in several
    applications
  • Online aggregation means that early returns are
    produced during the computation
  • Many applications e.g., find the local max and
    min in a sequence of values, various temporal
    aggregates, rollups, etc.
  • These might depend on the order same as new
    OLAP extensions in SQL3

17
Return avg for Every 100 Records
  • AGGREGATE olavg(value INT) REAL
  • TABLE state(sum INT, cnt INT)
  • INITIALIZE
  • INSERT INTO state VALUES (value,1)
  • ITERATE
  • UPDATE state SET sumsumvalue, cntcnt1
  • INSERT INTO RETURN
    SELECT sum/cnt FROM state WHERE cnt MOD 100 0
  • TERMINATE
  • INSERT INTO RETURN SELECT sum/cnt FROM state

18
Temporal Coalescing
  • AGGREGATE coalesce(from TIME, to TIME) (start
    TIME, end TIME)
  • TABLE state(cFrom TIME, cTo TIME)
  • INITIALIZE
  • INSERT INTO state VALUES (from, to)
  • ITERATE
  • UPDATE state SET cTo to WHERE cTo gt from
    AND cTo lt to
  • INSERT INTO RETURN SELECT
    cFrom, cTo FROM state WHERE cTo lt from
  • UPDATE state SET cFrom from, cTo to
    WHERE cTo lt from
  • TERMINATE
  • INSERT INTO RETURN SELECT cFrom, cTo FROM
    state

19
Recursive Aggregates
  • In AXL, aggregates can call other aggregates.
    Particularly, an aggregate can call itself
    recursively.
  • AGGREGATE alldesc(P CHAR(20)) CHAR(20)
  • INITIALIZE ITERATE
  • INSERT INTO RETURN
  • VALUES(P)
  • INSERT INTO RETURN
  • SELECT alldesc(Child)
  • FROM children
  • WHERE Parent P

Find all the descendents of Tom SELECT
alldesc(Child) FROM children WHERE Parent
Tom
20
Check Point
  • Simple applications AXL UDAs provide a solution
    with better performance and good ease of use.
  • Many advanced database applications
  • Time Series, Temporal Database, Spatial Database
  • In particular data mining applications

21
Data Mining and Database Systems
  • Current Approach
  • Cursor-based loose-coupling, stored procedures
  • UDFs ease of use problems
  • Cache-Mine
  • Using DBMSs as containers of data
  • Many attempts to closely integrate data mining
    functions into DBMS have shown major problems

22
What we need
  • SQL-aware Data Mining Systems
  • Surajit Chaudhuri Data Mining and Database
    Systems Where is the Intersection? IEEE Data
    Engineering Bulletin, 1998

23
AXL Where SQL and Data Mining Intersects
  • SQL-based language
  • Mining data created on the fly
  • Strong expressive power
  • Better performance

24
Bayesian Classifiers
  • Bayesian Classifier computes a series of
    probabilities

25
Decision Tree Classifiers
Training set tennis
Stream of Column/Value Pairs (together with
RecId and Category)
26
Convert training set to column/value pairs
  • AGGREGATE dissemble(v1 INT, v2 INT, v3 INT, v4
    INT, yorn INT)
  • (col INT, val INT, YorN INT)
  • INITIALIZE ITERATE
  • INSERT INTO RETURN
  • VALUES(1, v1, yorn), (2,v2,yorn),
    (3,v3,yorn), (4,v4,yorn)
  • CREATE VIEW col-val-pairs(recId INT, col INT, val
    INT, YorN INT)
  • SELECT mcount(), dissemble(Outlook, Temp,
    Humidity, Wind, PlayTennis)
  • FROM tennis
  • SELECT classify(recId, col, val, YorN)
  • FROM col-val-pairs

27
Categorical Classifier in AXL
1 AGGREGATE classify(RecId INT, iNode INT,
iCol INT, iValue REAL, iYorN INT) 2 TABLE
treenodes(RecId INT,Node INT, Col INT, Val REAL,
YorN INT) 3 TABLE summary(Col
INT, Value INT, Yc INT, Nc INT, KEY Col,
Value) 4 TABLE mincol(Col INT, MinGini
REAL) 5 TABLE ginitable(Col INT, Gini
REAL) 6 INITIALIZE ITERATE 7
INSERT INTO treenodes VALUES (RecId, iNode, iCol,
iValue, iYorN) 8 UPDATE summary 9
SET YcYciYorN, NcNc1-iYorN
WHERE Col iCol AND Value iValue 10
INSERT INTO summary SELECT iCol,
iValue, iYorN, 1-iYorN WHERE SQLCDE0 11
12 TERMINATE 13 INSERT
INTO ginitable SELECT Col, sum((YcNc)/(YcNc))/su
m(YcNc) 14
FROM summary GROUP BY
Col 15 INSERT INTO mincol SELECT
minpointvalue(Col, Gini) FROM ginitable 16
INSERT INTO result SELECT iNode, Col FROM
mincol 17 SELECT classify(t.RecId,
t.NodeMAXVALUEm.Value1, t.Col, t.Value,
t.YorN) 18 FROM treenodes AS t, 19
(SELECT tt.RecId
RecId, tt.Value Value FROM treenodes tt, mincol
m 20 WHERE
tt.Colm.Col AND m.MinGinigt0 ) AS m 21
WHERE t.RecId m.RecId GROUP BY m.Value 22
23
28
Performance
SPRINT Algorithm AXL vs. C
Categorical Classifier AXL vs. C
29
SPRINT Algorithm in AXL
  • 1 AGGREGATE sprint(iNode INT, iRec INT,
    iCol INT, iValue REAL, iYorN INT)
  • 2 TABLE treenodes(Rec INT, Col INT, Val
    REAL, YorN INT, KEY(Col, Value))
  • 3 TABLE summary(Col INT,
    SplitGini REAL, SplitVal REAL, Yc INT, Nc INT)
  • 4 TABLE split(Rec INT, LeftOrRight INT, KEY
    (RecId))
  • 5 TABLE mincol(Col INT, Val REAL, Gini
    REAL)
  • 6 TABLE node(Node INT) AS VALUES(iNode)
  • 7 INITIALIZE ITERATE
  • 8 INSERT INTO treenodes VALUES (iRec,
    iCol, iValue, iYorN)
  • 9 UPDATE summary
  • 10 SET YcYciYorN,
    NcNc1-iYorN, (SplitGini, SplitVal)
    giniudf(Yc, Nc, N, SplitGini, SplitVal)
  • 11 WHERE ColiCol
  • 12
  • 13 TERMINATE
  • 14 INSERT INTO mincol SELECT
    minpointvalue(Col, SplitGini, SplitVal) FROM
    summary
  • 15 INSERT INTO result SELECT n.Node,
    m.Col, m.Value FROM mincol AS m, node AS n
  • 16 INSERT INTO split SELECT t.Rec,
    (t.Valuegtm.Value) FROM treenodes AS t, mincol AS
    m
  • 17
    WHERE t.Col m.Col AND m.Gini gt 0
  • 18 SELECT sprint(n.Node2s.LeftOrRight,
    t.Rec, t.Col, t.Val, t.YorN)
  • 19 FROM treenodes AS t, split AS s, node
    AS n WHERE t.Rec s.Rec

30
Comparison with Other Architectures
Datasets with 6.6 million records, support level
.25.
31
Implementation of AXL
Standalone Mode
DB2 Add-on Mode
.axl
.axl
.cc
Berkeley DB
.cc
DB2 .lib
SQL
.exe
UDFs
DB2
32
Implementation of AXL
  • Open interface of physical data model.
  • Currently using Berkeley DB as our storage
    manager
  • In memory tables
  • Limited Optimization
  • Using B-Tree indexes to support equality/range
    query
  • Predicate push-down / push-up
  • User Defined Aggregates
  • Hash based
  • Return multiple rows early return
  • Return multiple columns employees name and
    salary

33
Implementation of AXL (contd)
  • Non-blocking aggregation
  • Keeping the state of aggregation between calls to
    the aggregate routines
  • Local tables defined inside aggregation are
    passed as parameters to the aggregates
  • Explicit sorting (and implicit hash-based
    aggregation)
  • AXL V1.2 above 30,000 lines of code

34
Check Point
  • Simple applications AXL UDAs provide a solution
    with better performance and good ease of use.
  • Data Mining applications
  • Formal Semantics of Aggregates and Monotonic
    Aggregation

35
Aggregates in Recursion
  • Stratification
  • shaves(barber, X) - villager(X), ?shaves(X, X).
  • villager(barber).
  • Aggregates? p ? count(p) 0
  • Aggregates in many applications are actually
    monotonic (and should be allowed inside
    recursion).

36
Beyond Stratification
  • Significant previous efforts
  • I. S. Mumick, H. Pirahesh and R. Ramakrishnan,
    The magic of duplicates and aggregates, VLDB
    1990
  • A. Van Gelder, Foundations of aggregates in
    deductive databases, DOOD 1993
  • K. A. Ross and Y. Sagiv, Monotonic aggregates in
    deductive databases, JCSS 1997
  • S. Greco and C. Zaniolo, Greedy algorithms in
    Datalog with choice and negation, JICSLP 1998

37
Formal Semantics of Aggregates
  • choice((X),(Y))
  • Enforcing functional dependency. FD X-gtY
  • Multiplicity of stable models, monotonic
    transformation
  • Ordering a domain
  • Once (X,Y) is generated, choice ensures this is
    the only arc leaving source node X and entering
    sink node Y
  • Formal semantics of UDA
  • return(Y,V) - ordered(X,Y), ?ordered(Y,_),
    terminate(Y,V).

38
Early Returns ? Monotonic Aggregates
Aggregates with only early returns and no
final returns are monotonic w.r.t. set
containment
  • AGGREGATE mcount() INT

  • TABLE state(cnt INT) AS VALUES (0)
  • INITIALIZE ITERATE
  • UPDATE state SET cntcnt1
  • INSERT INTO RETURN SELECT cnt FROM state

39
Early Returns ? Monotonic Aggregates
SELECT mcount() FROM employee
v.s. SELECT count() FROM employee
mcountJohn, Mary, Tom ? 1,2,3 countJohn,
Mary, Tom ? 3
mcountJohn, Mary, Tom, Jerry ? 1,2,3,4
countJohn, Mary, Tom, Jerry ? 4
40
Return sum at the nth value
  • AGGREGATE sumat(value INT, n INT) INT
  • TABLE state (sum INT, cnt INT) AS VALUES (0,0)
  • INITIALIZE ITERATE
  • UPDATE state SET sumsumvalue, cntcnt 1
  • INSERT INTO RETURN
  • SELECT sum FROM state WHERE
    cnt n

41
Extended UDAs in DBMSs
  • A logic-based formalization of aggregates
  • Simple characterization of monotonic aggregates
  • Monotonic aggregates can be used freely inside
    recursion
  • BoM, Company Control, Join-the-Party
  • Greedy optimization algorithms, such as
    Dijkstras single source shortest path.

42
Monotonic Aggregation
  • Monotonic aggregates can be used without any
    restriction and without changing the underlying
    implementation.
  • This solves the problem that had eluded database
    researchers since the introduction of relational
    systems
  • BoM, Company Control, Join-the-Party
  • Greedy optimization algorithms, such as
    Dijkstras single source shortest path.

43
Join-the-Party Problem
Some people will come to the party no matter
what, and their names are stored in a sure(PName)
relation. But many other people will join only
after they know that at least K3 of their
friends will be there.
WITH wllcm(Name) AS((SELECT Pname FROM
sure)UNION ALL (SELECT f.Pname FROM friend
AS f, wllcm AS w WHERE w.Name f.Fname
GROUP BY f.Pname HAVING mcount()3))SELECT
Name FROM wllcm
Density-based Clustering M. Ester et al. KDD
96
44
BoM Cost of Parts
001
assembly
1
5
1
004
3
002
003
100
2
3
part-cost
5
10
8
006
fan-out
005
45
BoM Using AXL
  • Bottom up solution.
  • Computes the cost for each part once and only
    once.
  • Monotonic sumat(cost, n) returns sum when exactly
    n items are aggregated.
  • Works in DB2 after AXL rewrites callings of
    sumat() to callings of automatically generated
    UDFs.
  • WITH cst(part, cost) AS
  • ((SELECT part, cost
  • FROM part-cost)
  • UNION ALL
  • (SELECT a.part, sumat(a.qty
    c.cost, p.ChC)
  • FROM assembly AS a, cst AS c, fan-out AS p
  • WHERE a.subpart c.part
  • AND p.part a.part
  • GROUP BY a.part))
  • SELECT part, cost
  • FROM cst

46
BoM Using Recursive SQL
  • WITH mpath(subpart, qty) AS
  • ((SELECT subpart, qty
  • FROM assembly
  • WHERE part 001)
  • UNION ALL
  • (SELECT c.subpart, m.qty c.qty
  • FROM mpath m, assembly c
  • WHERE m.subpart c.part))
  • SELECT sum(m.qty c.cost)
  • FROM mpath m, part_cost c
  • WHERE m.subpart c.part
  • Top down solution computes the cost of part
    001
  • Explosion all edges that descend from part 001
    are stored in mpath
  • What if we want to compute the cost of each part?

47
Check Point
  • Simple applications
  • Data Mining and Decision Support
  • Formal Semantics Monotonic aggregates
  • OLAP and other aggregate extensions
  • SUCH THAT
  • CUBE, ROLLUP, GROUPING SET
  • OLAP Functions

48
Such That
  • For each division, show the average salary of
    senior managers who make 3 times more than the
    average employees, and the average salary of
    senior engineers who make 2 times more than the
    average employees (in the same output record).
  • D. Chatziantoniou, Kenneth Ross, VLDB 1996
  • SELECT division, avg(X.salary),
  • avg(Y.salary)
  • FROM employee
  • GROUP BY division X, Y
  • SUCH THAT X.title senior manager AND
    X.salary gt 3avg(salary) AND Y.title senior
    engineer AND Y.salary gt 2avg(salary)

49
Expressing Such That in AXL
  • TABLE seniors(salary INT)
  • AGGREGATE sscan2(title CHAR(20), salary INT,
    qtitle CHAR(20), ratio INT) REAL
  • TABLE state(sum INT, cnt INT) AS VALUES
    (0,0)
  • INITIALIZE ITERATE UPDATE state SET
    sumsumsalary, cntcnt1 INSERT INTO
    seniors VALUES (salary) WHERE title qtitle
  • TERMINATE
  • SELECT avg(s.salary)
  • FROM seniors AS s
  • WHERE s.salary gt ratio (SELECT sum/cnt
    FROM state)

50
Using UDA sscan2
  • SELECT division, sscan2(title, salary, senior
    manager, 3),
  • sscan2(title, salary, senior engineer, 2)
  • FROM employee
  • GROUP BY division
  • No joins or sub-queries required.
  • One pass through the employee relation (standard
    SQL requires at least 2 passes).

51
Other Aggregate Extensions
  • GROUPING SETS, ROLL-UP, CUBE
  • New OLAP extensions
  • Windows containing a partitioning, an ordering of
    rows and an aggregate group
  • every standard must be prepared to tackle new
    issues that arise as the market evolves. If SQL
    does not respond positively to this challenge,
    SQL risks becoming irrelevant -- F. Zemke,
    K. Kulkarni, A. Witkowski, B. Lyle Introduction
    to OLAP Functions

52
Building New UDAs Based On UDAs Already Defined
  • AGGREGATE gssum(territory CHAR(10), product
    CHAR(10), sum INT) (territory CHAR(10), product
    CHAR(10), sum INT)
  • SELECT territory, -all-, sum(sales) GROUP
    BY territory
  • SELECT -all-, -all-, sum(sales)
  • SELECT -all-, product, sum(sales) GROUP
    BY product
  • Equivalent to the GROUPING SET construct
  • SELECT territory, product, sum(sales)
  • FROM data
  • GROUP BY GROUPING SET ((territory), (product),
    ())

53
Ordering Input Stream
  • Some aggregates require that the input stream
    ordered
  • AGGREGATE gssum(territory CHAR(10), product
    CHAR(10), sum INT)
    (territory CHAR(10), product CHAR(10), sum INT)
  • SELECT -all-, product, sum(sales) GROUP BY
    product
  • SELECT -all-, -all-, sum(sales)
  • SORT BY territory ASC
  • SELECT territory, -all-, sum(sales) GROUP BY
    territory
  • We can also use SORT BY GROUPBY to inform the
    compiler that all the group by columns to the
    aggregate should be ordered.

54
OLAP Functions
  • An OLAP function is defined using a window, which
    may specify a partitioning, an ordering of rows
    within partitions, and an aggregate group.

SELECT d.product, d.territory
d.sales / sum(d.sales) OVER Whole
d.sales / sum(d.sales) OVER By_product,
d.sales / sum(d.sales) OVER
By_territory FROM data AS d WHERE year
1999 WINDOW Whole AS (),
By_product AS (PARTITION BY d.product),
By_territory AS (PARTITION BY
d.territory)
55
Rewrite OLAP Functions in AXL
  • AGGREGATE mwin(territory CHAR (10), product
    CHAR(10), sales INT)
  • (territory CHAR(10), product
    CHAR(10), p0 REAL, p1 REAL, p2 REAL)
  • TABLE summary (territory CHAR(10), product
    CHAR(10), sum INT) AS
  • (SELECT gssum(territory,
    product, sales) FROM data)
  • INITIALIZE ITERATE
  • INSERT INTO RETURN
  • SELECT territory, product,
    sales/s1.sales, sales/s2.sales, sales/s3.sales
  • FROM summary AS s1, s2, s3
  • WHERE s1.territory
    territory AND s1.produce -all-
  • AND s2.territory
    -all- AND s2.product product
  • AND s3.territory
    -all- AND s3.product -all-

56
Contributions
  • Adding extended UDAs to O-R systems
  • high level language, minimal additions to SQL
  • monotonic aggregates, recursive aggregates
  • designed for general purpose applications
  • Tightly couple data mining functions with DBMS
  • SPRINT Algorithm, Categorical Classifier,
  • Performance
  • More efficient than cursor-based languages like
    PL/SQL, JDBC and UDF-based approaches

57
Future Directions
  • Parallelization
  • Extenders/Data Blades
  • build on top of UDAs instead of UDFs
  • Decision Support
  • the Apriori algorithm
  • Windows and OLAP functions
  • Spatial/Temporal extensions
  • the TENOR system

58
Future Direction Parallelization
  • Current parallel aggregation algorithms valid for
    AXL
  • Inter-Partition parallelism all tuples of the
    same group-by value are in one node
  • Two phase algorithm user provides a COMBINE
    routine
  • Unlike SQL3, AXLs aggregate routines are written
    in SQL, so we can apply traditional query
    parallelization techniques to INITIALIZE,
    ITERATE, and TERMINATE
  • Since aggregate routines are written in SQL, the
    COMBINE routine can be generated automatically by
    the system for simple UDAs

59
SQL X Data Mining
  • X Procedural Extension, e.g. PL/SQL
  • X C/C
  • A minimalists point of view
  • X extended User Defined Aggregates

60
Conclusion
  • UDAs are more useful than UDFs in many
    applications
  • AXL is an SQL-like language for developing UDAs
  • AXL is a powerful database language
  • more efficient than cursor-based languages like
    PL/SQL, JDBC, and etc.
  • AXL is suitable for Decision Support applications
  • Future work optimization, parallelizability and
    O-R extensions
Write a Comment
User Comments (0)
About PowerShow.com