Integrating Query Processing and Data Mining in Relational DBMSs

About This Presentation

Title:

Integrating Query Processing and Data Mining in Relational DBMSs

Description:

Integrating Query Processing and Data Mining in Relational DBMSs, by Qiang Ding, William Perrizo, and Victor Shi (North Dakota State University)

Slides: 18

Provided by: DAT110

Learn more at: http://www.cs.ndsu.nodak.edu

Transcript and Presenter's Notes

Title: Integrating Query Processing and Data Mining in Relational DBMSs

1
Integrating Query Processing and Data Mining in Relational DBMSs

Qiang Ding (North Dakota State University)
William Perrizo (ditto)
Victor Shi (ditto)
Kirk Scott (University of Alaska)

2
Introduction

Our Goal
To optimize data mining and query processing together
A unified approach
To minimize I/O
To reduce disk storage (compression)

3
Introduction (Cont.)

Vertical Partitioning
Decomposition Storage Model (DSM, Copeland et al.)
Attribute Transposed File (ATF)
Band Sequential (BSQ)
Bit Transposed File (BTF, Wang et al.)
bSQ P-tree
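
The idea shared by the bit-level schemes in this list can be sketched in a few lines of Python: one attribute is split into one bit vector per bit position, and the original values can be rebuilt from those vectors. The function names `bit_slices` and `reassemble` and the sample values are illustrative assumptions, not taken from the slides.

```python
def bit_slices(values, width):
    """Split integers into `width` bit vectors, most significant bit first."""
    return [[(v >> (width - 1 - k)) & 1 for v in values] for k in range(width)]


def reassemble(slices):
    """Rebuild the integers from their bit vectors (the split is lossless)."""
    width = len(slices)
    n = len(slices[0])
    return [
        sum(slices[k][i] << (width - 1 - k) for k in range(width))
        for i in range(n)
    ]


values = [0, 1, 2, 3, 4, 5]      # hypothetical 3-bit attribute
slices = bit_slices(values, 3)   # one vector per bit position
restored = reassemble(slices)
```

Here `slices` holds three vectors (high, middle, and low bit), and `restored` equals `values`, which is what makes a fully vertical layout usable as the only stored copy of the data.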

4
P-trees

Represent data bit-by-bit in a recursive quadrant-by-quadrant arrangement
Lossless representations of the original data
Facilitate compression and fast ANDing
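
Why a quadrant tree makes ANDing fast can be illustrated with a simplified sketch. It assumes a reduced representation in which a node is `0` (pure-0 quadrant), `1` (pure-1 quadrant), or a list of four child nodes; actual P-trees also carry counts, and the names `build`, `tree_and`, and `count` as well as the two 4x4 grids are hypothetical.

```python
def build(grid, r0=0, c0=0, size=None):
    """Compress a 2^k x 2^k bit grid: 0 = pure-0, 1 = pure-1, list = mixed quadrant."""
    if size is None:
        size = len(grid)
    cells = [grid[r][c] for r in range(r0, r0 + size) for c in range(c0, c0 + size)]
    if not any(cells):
        return 0
    if all(cells):
        return 1
    h = size // 2
    return [build(grid, r0, c0, h), build(grid, r0, c0 + h, h),
            build(grid, r0 + h, c0, h), build(grid, r0 + h, c0 + h, h)]


def tree_and(a, b):
    """AND two trees; pure quadrants end the recursion immediately."""
    if a == 0 or b == 0:
        return 0
    if a == 1:
        return b
    if b == 1:
        return a
    children = [tree_and(x, y) for x, y in zip(a, b)]
    return 0 if all(ch == 0 for ch in children) else children


def count(node, size):
    """Number of 1-bits represented by a node covering a size x size area."""
    if node == 0:
        return 0
    if node == 1:
        return size * size
    return sum(count(ch, size // 2) for ch in node)


A = [[1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 1, 1], [0, 0, 1, 1]]
B = [[1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 1, 0], [0, 0, 0, 0]]
both = tree_and(build(A), build(B))
ones = count(both, 4)
```

In this example three of the four top-level quadrants are resolved without looking at any individual bit, and `ones` is 5, the same as a bit-by-bit AND of the two grids.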

5
bSQ, 2-D Peano order, and P-trees
11111100
11111000
11111100
11111110
11110000
11110000
11110000
01110000
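
As a sketch of how such a bit file maps to a P-tree, the code below reads the 64 bits above as an 8x8 raster (an assumption based on the bit count and the slide title) and computes the 1-bit count of the root and of its four quadrants in Peano order (upper-left, upper-right, lower-left, lower-right). The function name `ptree` and the `(count, children)` node layout are illustrative.

```python
BITS = ("11111100" "11111000" "11111100" "11111110"
        "11110000" "11110000" "11110000" "01110000")
GRID = [[int(BITS[8 * r + c]) for c in range(8)] for r in range(8)]


def ptree(grid, r0=0, c0=0, size=None):
    """Return (count, children); children is None for a pure-0 or pure-1 quadrant."""
    if size is None:
        size = len(grid)
    count = sum(grid[r][c]
                for r in range(r0, r0 + size)
                for c in range(c0, c0 + size))
    if count == 0 or count == size * size:
        return (count, None)          # pure quadrant: no need to descend further
    h = size // 2
    children = [ptree(grid, r0, c0, h), ptree(grid, r0, c0 + h, h),
                ptree(grid, r0 + h, c0, h), ptree(grid, r0 + h, c0 + h, h)]
    return (count, children)


root_count, quadrants = ptree(GRID)
quadrant_counts = [q[0] for q in quadrants]
```

For this raster the root count is 39 and the quadrant counts are 16, 8, 15, and 0, so two of the four quadrants (one pure-1, one pure-0) are stored as a single node.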
6
SPJ Queries

Consider an SPJ query involving more than one join
Constellation model
Our strategy
Selection masks
Semi-joins
Full elimination of all non-participants

7
An Example
SELECT DISTINCT C.c, R.capacity
FROM S, C, E, O, R
WHERE S.s = E.s AND C.c = O.c AND O.o = E.o AND O.r = R.r
  AND C.cred > 1 AND (E.grade = 'B' OR E.grade = 'A')
  AND R.capacity > 10 AND S.gen = 'F'
ORDER BY C.c DESC
S
s        n  gen
0 (000)  A  M (0)
1 (001)  T  M (0)
2 (010)  S  F (1)
3 (011)  B  F (1)
4 (100)  C  F (1)
5 (101)  J  F (1)

C
c       n  cred
0 (00)  B  1 (01)
1 (01)  D  3 (11)
2 (10)  M  3 (11)
3 (11)  S  2 (10)

E
s        o        grade
0 (000)  1 (001)  B (10)
0 (000)  0 (000)  A (11)
1 (001)  3 (011)  D (00)
1 (001)  0 (000)  B (10)
3 (011)  1 (001)  A (11)
3 (011)  3 (011)  D (00)
2 (010)  2 (010)  B (10)
2 (010)  3 (011)  A (11)
4 (100)  4 (100)  B (10)
5 (101)  5 (101)  B (10)

O
o        c       r
0 (000)  0 (00)  0 (01)
1 (001)  0 (00)  1 (01)
2 (010)  1 (01)  0 (00)
3 (011)  1 (01)  1 (01)
4 (100)  2 (10)  0 (00)
5 (101)  2 (10)  2 (10)
6 (110)  2 (10)  3 (11)
7 (111)  3 (11)  2 (10)

R
r       capacity
0 (00)  30 (11)
1 (01)  20 (10)
2 (10)  30 (11)
3 (11)  10 (01)
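
For reference, the answer this query should produce can be checked with a naive evaluation over the five example relations. This is only a brute-force baseline for comparison, not the P-tree strategy of the slides; the tuple layouts are assumptions read off the tables, and the secondary sort on capacity is added only to make the printed order deterministic.

```python
from itertools import product

# Example relations, decoded to plain values.
S = [(0, "A", "M"), (1, "T", "M"), (2, "S", "F"),
     (3, "B", "F"), (4, "C", "F"), (5, "J", "F")]            # (s, n, gen)
C = [(0, "B", 1), (1, "D", 3), (2, "M", 3), (3, "S", 2)]     # (c, n, cred)
E = [(0, 1, "B"), (0, 0, "A"), (1, 3, "D"), (1, 0, "B"), (3, 1, "A"),
     (3, 3, "D"), (2, 2, "B"), (2, 3, "A"), (4, 4, "B"), (5, 5, "B")]  # (s, o, grade)
O = [(0, 0, 0), (1, 0, 1), (2, 1, 0), (3, 1, 1),
     (4, 2, 0), (5, 2, 2), (6, 2, 3), (7, 3, 2)]             # (o, c, r)
R = [(0, 30), (1, 20), (2, 30), (3, 10)]                     # (r, capacity)

# Cross product filtered by the WHERE clause; the set gives DISTINCT.
rows = {
    (c, cap)
    for (s, _n, gen), (c, _cn, cred), (es, eo, grade), (o, oc, o_r), (r, cap)
    in product(S, C, E, O, R)
    if s == es and c == oc and o == eo and o_r == r
    and cred > 1 and grade in ("A", "B") and cap > 10 and gen == "F"
}

# ORDER BY C.c DESC (capacity descending is an extra tie-breaker).
result = sorted(rows, key=lambda t: (-t[0], -t[1]))
```

The baseline touches all 7,680 combinations of tuples, which is the kind of work the selection masks and semijoins in the following slides are meant to avoid.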
8
Full Vertical Partitioning
One bit file per attribute bit (bits listed in tuple order):

S
Ss1   0000 11
Ss2   0011 00
Ss3   0101 01
Sgen  0011 11
Sn    ATSB CJ

E
Es1      0000 0000 11
Es2      0000 1111 00
Es3      0011 1100 01
Eo1      0000 0000 11
Eo2      0010 0111 00
Eo3      1010 1101 01
Egrade1  1101 1011 11
Egrade2  0100 1001 00

C
Cc1     0011
Cc2     0101
Ccred1  0111
Ccred2  1110
Cn      BDMS

O
Oo1  0000 1111
Oo2  0011 0011
Oo3  0101 0101
Oc1  0000 1111
Oc2  0011 0001
Or1  0000 0111
Or2  1101 0010

R
Rr1    0011
Rr2    0101
Rcap1  1110
Rcap2  1011
9
Applying Selection Masks
Selection masks:
mE = Egrade1  1101 1011 11
mR = Rcap1    1110
mC = Ccred1   0111
mS = Sgen     0011 11

Applying the masks results in:

E
Es1  000 000 11
Es2  000 111 00
Es3  001 100 01
Eo1  000 000 11
Eo2  000 011 00
Eo3  100 101 01

S
Ss1  00 11
Ss2  11 00
Ss3  01 01

R
Rr1  00 1
Rr2  01 0

C
Cc1  0 11
Cc2  1 01
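
The masking step can be sketched as follows. Each selection mask is simply one existing bit column (for example, the high bit of the grade code is 1 exactly for grades A and B), and the surviving key values are rebuilt from the key bit columns at the positions the mask keeps. The bit strings are derived from the example relations, with E in the row order of its bit files; the helper names `col` and `decode` are illustrative.

```python
def col(bits):
    """Parse a bit column written as a string such as '0011 11'."""
    return [int(b) for b in bits.replace(" ", "")]


def decode(bit_columns, mask):
    """Rebuild the key value (MSB first) of every row the mask keeps."""
    values = []
    for i, keep in enumerate(mask):
        if keep:
            v = 0
            for column in bit_columns:
                v = (v << 1) | column[i]
            values.append(v)
    return values


S_key = [col("0000 11"), col("0011 00"), col("0101 01")]            # Ss1..Ss3
C_key = [col("0011"), col("0101")]                                  # Cc1, Cc2
R_key = [col("0011"), col("0101")]                                  # Rr1, Rr2
E_o = [col("0000 0000 11"), col("0010 0111 00"), col("1010 1101 01")]  # Eo1..Eo3

mS = col("0011 11")        # Sgen:    gen = 'F'
mC = col("0111")           # Ccred1:  cred > 1
mR = col("1110")           # Rcap1:   capacity > 10
mE = col("1101 1011 11")   # Egrade1: grade = 'A' or 'B'

s_values = decode(S_key, mS)
c_values = decode(C_key, mC)
r_values = decode(R_key, mR)
o_values = sorted(set(decode(E_o, mE)))
```

With these inputs the surviving keys are s = 2, 3, 4, 5; c = 1, 2, 3; r = 0, 1, 2; and o = 0 through 5, which are the values the semijoins start from.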
10
Semijoining Toward Center
Semijoining toward center: S to E (on s = 2, 3, 4, 5), E to O (on o = 0, 1, 2, 3, 4, 5), R to O (on r = 0, 1, 2), C to O (on c = 1, 2, 3)

O before:
Oo1  0000 1111
Oo2  0011 0011
Oo3  0101 0101
Oc1  0000 1111
Oc2  0011 0001
Or1  0000 0111
Or2  1101 0010

O after:
Oo1  00 11
Oo2  11 00
Oo3  01 01
Oc1  00 11
Oc2  11 00
Or1  00 01
Or2  01 00

Thus, the participants are o = 2, 3, 4, 5.
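
One way to picture the semijoins into the center relation O is as mask construction on its bit columns. In the sketch below, the mask for "key = v" is the AND of each bit column or its complement, the mask for a set of values is the OR of those, and the three incoming semijoins are ANDed together. The decoded O tuples are read off the example table, and the helper names are hypothetical.

```python
def bit_slices(values, width):
    """Split integers into `width` bit vectors, most significant bit first."""
    return [[(v >> (width - 1 - k)) & 1 for v in values] for k in range(width)]


def value_mask(bit_columns, value):
    """Rows whose key equals `value`: AND of each bit column or its complement."""
    width = len(bit_columns)
    mask = [1] * len(bit_columns[0])
    for k, column in enumerate(bit_columns):
        want = (value >> (width - 1 - k)) & 1
        mask = [m & (b if want else 1 - b) for m, b in zip(mask, column)]
    return mask


def semijoin_mask(bit_columns, values):
    """Rows whose key is any of `values`: OR of the per-value masks."""
    mask = [0] * len(bit_columns[0])
    for v in values:
        mask = [m | x for m, x in zip(mask, value_mask(bit_columns, v))]
    return mask


O = [(0, 0, 0), (1, 0, 1), (2, 1, 0), (3, 1, 1),
     (4, 2, 0), (5, 2, 2), (6, 2, 3), (7, 3, 2)]     # (o, c, r)
Oo = bit_slices([o for o, _, _ in O], 3)
Oc = bit_slices([c for _, c, _ in O], 2)
Or = bit_slices([r for _, _, r in O], 2)

from_E = semijoin_mask(Oo, [0, 1, 2, 3, 4, 5])       # o values surviving in E
from_C = semijoin_mask(Oc, [1, 2, 3])                # c values surviving in C
from_R = semijoin_mask(Or, [0, 1, 2])                # r values surviving in R
mask = [a & b & c for a, b, c in zip(from_E, from_C, from_R)]
participants = [o for (o, _, _), keep in zip(O, mask) if keep]
```

The combined mask keeps exactly the rows with o = 2, 3, 4, 5.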
11
Semijoining Back
Semijoining back again produces:

C
Cc1  0 1
Cc2  1 0

R
Rr1  00 1
Rr2  01 0

E
Es1  00 11
Es2  11 00
Es3  00 01
Eo1  00 11
Eo2  11 00
Eo3  01 01

S
Ss1  0 11
Ss2  1 00
Ss3  0 01

Thus the participants are c = 1, 2; r = 0, 1, 2; s = 2, 4, 5.
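
The outward pass can be sketched on decoded values for readability (the slides work on the bit columns directly). The inputs below are the relations as they stand after the masks and the inward semijoins, read off the example; the helper `semijoin` is a hypothetical name.

```python
# State after the selection masks and the semijoins toward the center.
O = [(2, 1, 0), (3, 1, 1), (4, 2, 0), (5, 2, 2)]     # (o, c, r), participants only
E = [(0, 1), (0, 0), (1, 0), (3, 1),
     (2, 2), (2, 3), (4, 4), (5, 5)]                 # (s, o) rows passing the grade mask
S_keys = [2, 3, 4, 5]                                # gen = 'F'
C_keys = [1, 2, 3]                                   # cred > 1
R_keys = [0, 1, 2]                                   # capacity > 10


def semijoin(keys, other_values):
    """Keep only the keys that also occur on the other side of the join."""
    wanted = set(other_values)
    return [k for k in keys if k in wanted]


# O back to C and R.
c_part = semijoin(C_keys, [c for _o, c, _r in O])
r_part = semijoin(R_keys, [r for _o, _c, r in O])

# O back to E (which was already restricted by S), then E back to S.
o_part = {o for o, _c, _r in O}
E_part = [(s, o) for s, o in E if o in o_part and s in S_keys]
s_part = semijoin(S_keys, [s for s, _o in E_part])
```

After this pass every remaining tuple in every relation takes part in the result, which is what the slides call full elimination of all non-participants.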
12
Generating Output
C.c = 2: the mask over Oc1 and Oc2 is 00 11, giving O.r = 0, 2. Semijoin to R: R.capacity = 30.
C.c = 1: the mask over Oc1 and Oc2 is 11 00, giving O.r = 0, 1. Semijoin to R: R.capacity = 30, 20.

Final output:
c  capacity
2  30
1  30
1  20
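
The output step can be sketched as one pass per participating value of C.c, in descending order. For each value, a mask over the two reduced Oc bit columns selects the matching O rows, their r values are looked up in R, and duplicates are dropped. The reduced columns and the capacity lookup are taken from the example; the helper name `value_mask` and the ordering of capacities within one c value are assumptions.

```python
def value_mask(bit_columns, value):
    """Rows whose key equals `value`: AND of each bit column or its complement."""
    width = len(bit_columns)
    mask = [1] * len(bit_columns[0])
    for k, column in enumerate(bit_columns):
        want = (value >> (width - 1 - k)) & 1
        mask = [m & (b if want else 1 - b) for m, b in zip(mask, column)]
    return mask


# Reduced O (rows o = 2, 3, 4, 5) and reduced R.
O_c = [[0, 0, 1, 1], [1, 1, 0, 0]]       # Oc1, Oc2
O_r = [0, 1, 0, 2]                       # decoded O.r for the same rows
capacity = {0: 30, 1: 20, 2: 30}         # R.r -> R.capacity

output = []
for c in (2, 1):                         # participating c values, ORDER BY C.c DESC
    mask = value_mask(O_c, c)
    r_values = sorted({r for r, keep in zip(O_r, mask) if keep})
    caps = sorted({capacity[r] for r in r_values}, reverse=True)   # DISTINCT
    output.extend((c, cap) for cap in caps)
```

For c = 2 the two matching rows both map to capacity 30, so DISTINCT leaves a single output row; for c = 1 the capacities 30 and 20 both appear.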
13
Data Mining Operations

P-tree-based mining algorithms
Association, Classification, and Clustering
Faster and/or more accurate
P-trees: data-mining-ready compressed data structures
P-ARM, Closed P-KNN

14
Data Mining Using P-trees: P-ARM
15
Data Mining Using P-trees: P-KNN
16
Integrating Query Processing and Data Mining

Without necessitating the creation of a massive universal relation
Full vertical partitioning
Saving space
Efficiently and directly (boolean operations)

17
Conclusion

SPJ strategies can be combined with proven data mining strategies in a unified way
Achieved by using P-trees
Complete vertical decomposition
Only participating fields are retrieved
Fast and accurate
I/O minimized
Indexes eliminated
