Title: Mining for Empty Rectangles in Large Data Sets
1 Mining for Empty Rectangles in Large Data Sets
Jeff Edmonds, Jarek Gryz, Dongming Liang, Renee Miller
2 Matrix representation
π_{A,B}(R ⋈ S)
3 Find All Maximal 0-Rectangles
π_{A,B}(R ⋈ S)
4 Example
π_{A,B}(R ⋈ S)
[Figure: 0/1 matrix with rows BMW Z3, Honda L2, Toyota 6A and columns 95, 96, 97]
First BMW Z3 series cars were made in 1997.
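The matrix view of a projected join can be sketched in a few lines of Python. This is only an illustration: `to_matrix` is a hypothetical helper, and the tuples for Honda L2 and Toyota 6A are made-up placeholders. Only the BMW Z3 row (no cars before 1997) reflects the statement above.

```python
def to_matrix(pairs, a_values, b_values):
    """Return a 0/1 matrix: entry [i][j] is 1 iff (a_values[i], b_values[j]) is in pairs."""
    present = set(pairs)
    return [[1 if (a, b) in present else 0 for b in b_values] for a in a_values]

# Hypothetical projection of a join onto (model, year).
# Only the BMW Z3 tuple is motivated by the slide; the others are placeholders.
pairs = [("BMW Z3", 97), ("Honda L2", 95), ("Toyota 6A", 97)]
models = ["BMW Z3", "Honda L2", "Toyota 6A"]
years = [95, 96, 97]

matrix = to_matrix(pairs, models, years)
for model, row in zip(models, matrix):
    print(f"{model:10s} {row}")
```

The two leading zeros in the BMW Z3 row form a small empty rectangle: no join result can have that model with year 95 or 96.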
5 Relation to Previous Work
[Table comparing Naamad, Hsu, Lee | Our Work | Liu, Ku, Hsu; Orlowski on:]
- Problem
- Purpose
- # of maximal 0-rectangles
6 Relation to Previous Work
[Table comparing Naamad, Hsu, Lee | Our Work | Liu, Ku, Hsu; Orlowski on:]
- Time
- Space
7 Relation to Previous Work
[Table comparing Naamad, Hsu, Lee | Our Work | Liu, Ku, Hsu; Orlowski on:]
- Practical Implementation
- Scalable
- Practical?
8 Structure of Algorithm
- loop y = 1..Y
  - loop x = 1..X
    - Construct staircase(x,y)
    - Output all maximal 0-rectangles with <x,y> as bottom-right corner
Timing: O(1) amortized time per <x,y>
[Figure: X-by-Y 0/1 matrix with <x,y> marked]
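The row-by-row scan can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: coordinates are zero-based with y growing downward, a step is stored as a hypothetical `(left, height)` pair, and a rectangle is reported at the moment its step is removed from the staircase, so its bottom-right corner is <x-1,y> rather than <x,y>. The names `maximal_zero_rectangles` and `zeros_below` are invented for this example.

```python
def maximal_zero_rectangles(M):
    """Return all maximal 0-rectangles of a 0/1 matrix M (list of rows)
    as tuples (x1, y1, x2, y2) with inclusive, zero-based corners."""
    Y = len(M)
    X = len(M[0]) if Y else 0
    height = [0] * X          # zeros stacked upward from row y in each column
    result = []
    for y in range(Y):
        # zeros_below[x]: length of the run of zeros ending at column x in row y+1
        zeros_below = [0] * X
        if y + 1 < Y:
            run = 0
            for x in range(X):
                run = run + 1 if M[y + 1][x] == 0 else 0
                zeros_below[x] = run
        stack = []            # staircase steps (left, h), heights strictly increasing
        for x in range(X + 1):
            if x < X:
                height[x] = height[x] + 1 if M[y][x] == 0 else 0
                h = height[x]
            else:
                h = 0         # sentinel column flushes the remaining steps
            left = x
            while stack and stack[-1][1] > h:
                left, step_h = stack.pop()
                width = x - left
                # A removed step cannot grow up, left, or right.
                # It is maximal if it also cannot grow down.
                if zeros_below[x - 1] < width:
                    result.append((left, y - step_h + 1, x - 1, y))
            if h > 0 and (not stack or stack[-1][1] < h):
                stack.append((left, h))
    return result

grid = [[0, 0, 0],
        [0, 1, 0],
        [0, 0, 0]]
print(sorted(maximal_zero_rectangles(grid)))
```

Each step is pushed once and popped once, so the staircase maintenance costs O(1) amortized per position in this sketch.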
9 Structure of Algorithm
- loop y = 1..Y
  - loop x = 1..X
    - Construct staircase(x,y)
    - Output all maximal 0-rectangles with <x,y> as bottom-right corner
Query Optimization Experimental Results
[Figure: X-by-Y 0/1 matrix with <x,y> marked]
10 Staircase(x,y)
[Figure: Staircase(x,y) drawn in the X-by-Y matrix, with <x,y> marked]
11 Constructing Maximal Rectangles
[Figure: staircase with <x,y> marked]
12 Constructing Maximal Rectangles
- Too Narrow
- Maximal
- Too short
[Figure: staircase with <x,y> marked]
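The three labels can be illustrated with a small classifier. The reading used here is an assumption: "too narrow" is taken to mean the rectangle can still be extended to the right, and "too short" that it can still be extended downward. The function name `classify` and the zero-based `(x1, y1, x2, y2)` convention are hypothetical, and the rectangle passed in is assumed to be a 0-rectangle that already cannot grow up or left.

```python
def classify(M, x1, y1, x2, y2):
    """Classify a 0-rectangle of M (inclusive corners, y grows downward)
    that is already maximal upward and to the left."""
    Y, X = len(M), len(M[0])
    # Can the rectangle absorb the next column on the right?
    can_right = x2 + 1 < X and all(M[r][x2 + 1] == 0 for r in range(y1, y2 + 1))
    # Can the rectangle absorb the next row below?
    can_down = y2 + 1 < Y and all(M[y2 + 1][c] == 0 for c in range(x1, x2 + 1))
    if can_right:
        return "too narrow"
    if can_down:
        return "too short"
    return "maximal"

M = [[0, 0],
     [0, 1]]
print(classify(M, 0, 0, 0, 0))  # single top-left cell
print(classify(M, 0, 0, 1, 0))  # whole top row
```

This direct check scans a full row or column. It is meant to show the definition, not the constant-time test the timing claim relies on.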
13 Constructing staircase(x,y) from staircase(x-1,y)
Case 1
[Figure: 0/1 matrix with <x-1,y> marked]
14 Constructing staircase(x,y) from staircase(x-1,y)
Case 2
[Figure: 0/1 matrix with <x-1,y> marked]
15 Constructing staircase(x,y) from staircase(x-1,y)
- Too Narrow
- Maximal
- Too short
[Figure: X-by-Y 0/1 matrix with <x-1,y> and step corners (x_r, y_r) marked]
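One way to derive staircase(x,y) from staircase(x-1,y) is a stack-style update, sketched below. The step representation `(left, height)`, the name `next_staircase`, and the split into an "entry is 1" case and an "entry is 0" case are assumptions made for illustration. They are not taken from the slides.

```python
def next_staircase(steps, x, h):
    """Build staircase(x, y) from staircase(x-1, y).

    steps: list of (left, height) pairs, heights strictly increasing left to right.
    h: number of consecutive zeros ending at <x, y> going upward (0 if the entry is 1).
    Returns (new_steps, deleted_steps).
    """
    if h == 0:
        # The entry is a 1: no 0-rectangle ends here, every step is deleted.
        return [], list(steps)
    kept = [s for s in steps if s[1] <= h]
    deleted = [s for s in steps if s[1] > h]   # taller than column x allows
    # A new step of height h starts where the leftmost deleted step started.
    left = deleted[0][0] if deleted else x
    if not kept or kept[-1][1] < h:
        kept.append((left, h))
    return kept, deleted

steps = [(0, 1), (2, 3)]
print(next_staircase(steps, 4, 2))
print(next_staircase(steps, 4, 0))
```

In this sketch the deleted steps are exactly the ones that cannot be extended to the right, which makes them the candidates for maximal rectangles.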
16 Constructing x(x,y) and y(x,y)
[Figure: 0/1 matrix with <x-1,y>, x(x-1,y), and step corners (x_r, y_r) marked]
17 Constructing x(x,y) and y(x,y) from x(x-1,y) and y(x,y-1)
[Figure: 0/1 matrix with <x-1,y>, x(x-1,y), y(x,y-1) (saved), and step corners (x_r, y_r) marked]
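The two recurrences can be sketched as a single pass over the matrix. The definitions used here are assumptions: x(x,y) is taken to be the leftmost column of the run of zeros ending at <x,y> in row y, and y(x,y) the topmost row of the run of zeros ending at <x,y> in column x. When the entry is a 1, the run is empty and the value is set one past the entry. The function name `run_starts` is hypothetical, and indices are zero-based.

```python
def run_starts(M):
    """Return (xs, ys) where xs[y][x] and ys[y][x] are the start of the run of
    zeros ending at <x, y> along the row and along the column respectively."""
    Y = len(M)
    X = len(M[0]) if Y else 0
    xs = [[0] * X for _ in range(Y)]
    ys = [[0] * X for _ in range(Y)]
    for y in range(Y):
        for x in range(X):
            if M[y][x] == 1:
                xs[y][x] = x + 1   # empty run: starts just past the 1
                ys[y][x] = y + 1
            else:
                # Reuse the value from the left neighbour and from the row above.
                xs[y][x] = xs[y][x - 1] if x > 0 else 0
                ys[y][x] = ys[y - 1][x] if y > 0 else 0
    return xs, ys

xs, ys = run_starts([[0, 1, 0],
                     [0, 0, 0]])
print(xs)
print(ys)
```

Only the previous row of `ys` is ever read, which matches the idea of saving y(x,y-1) from the preceding pass. A streaming version could keep a single row instead of the full table.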
18 Structure of Algorithm
- loop y = 1..Y
  - loop x = 1..X
    - Construct staircase(x,y)
    - Output all maximal 0-rectangles with <x,y> as bottom-right corner
Timing: O(1) amortized time per <x,y>
[Figure: X-by-Y 0/1 matrix with <x,y> marked]
19 Timing
The only work that is not constant time: deleting steps.
- Too Narrow
- Maximal
- Too short
[Figure: X-by-Y 0/1 matrix with <x,y> and step corners (x_r, y_r) marked]
20 Timing
Amortized # of steps deleted (per <x,y>) ≤ # of steps created (per <x,y>) ≤ 1
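The amortized bound can be written as a sum over all matrix positions. The symbols d(x,y) and c(x,y), for the number of steps deleted and created while processing <x,y>, are introduced here only for illustration. The argument assumes that at most one step is created per position and that a step can be deleted only after it has been created.

```latex
\sum_{y=1}^{Y}\sum_{x=1}^{X} d(x,y)
\;\le\;
\sum_{y=1}^{Y}\sum_{x=1}^{X} c(x,y)
\;\le\;
X \cdot Y
\qquad\Longrightarrow\qquad
\frac{1}{XY}\sum_{y=1}^{Y}\sum_{x=1}^{X} d(x,y) \;\le\; 1
```

So although a single position may delete many steps, the average over the whole scan is at most one deletion per position.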
21 Number of Maximal Rectangles
# of maximal 0-rectangles
- O((# 1s)^2) [Naamad, Hsu, Lee]
- Running time of alg: O(# 0s)
22 How many empty rectangles are there?
Tests were done on 4 pairs of attributes with numerical domains that appear in typical joins in a real-world workload of a health insurance company.
23 How big are the rectangles?
24 Query rewrite: simple case
Original:
select * from R, S, ... where R.C = S.C and 60 < R.A < 80 and 20 < S.B < 80 and ...
Rewritten:
select * from R, S, ... where R.C = S.C and 60 < R.A < 80 and 20 < S.B < 60 and ...
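The simple rewrite amounts to trimming one range of the query when a known empty rectangle spans the other range completely. The sketch below is illustrative only: `trim_b_range` is a hypothetical helper, query ranges are treated as open intervals, and the empty rectangle is a closed box. The box `(60, 80, 60, 80)` is inferred from the before/after predicates above and is not stated in the slides.

```python
def trim_b_range(query, empty):
    """Shrink the B-range of a query using an empty rectangle.

    query: (a_lo, a_hi, b_lo, b_hi), meaning a_lo < A < a_hi and b_lo < B < b_hi.
    empty: (a_lo, a_hi, b_lo, b_hi), a closed box known to contain no join tuples.
    Returns the trimmed query, or None if the query cannot return any tuple.
    """
    qa_lo, qa_hi, qb_lo, qb_hi = query
    ea_lo, ea_hi, eb_lo, eb_hi = empty
    if not (ea_lo <= qa_lo and qa_hi <= ea_hi):
        return query                      # box does not span the whole A-range
    if eb_lo <= qb_lo and qb_hi <= eb_hi:
        return None                       # the whole query lies in the empty box
    if qb_lo < eb_lo < qb_hi and eb_hi >= qb_hi:
        qb_hi = eb_lo                     # upper end of the B-range is empty
    elif qb_lo < eb_hi < qb_hi and eb_lo <= qb_lo:
        qb_lo = eb_hi                     # lower end of the B-range is empty
    return (qa_lo, qa_hi, qb_lo, qb_hi)

print(trim_b_range((60, 80, 20, 80), (60, 80, 60, 80)))  # -> (60, 80, 20, 60)
```

Trimming the A-range is symmetric: swap the roles of the two attributes.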
25 Query rewrite: complex case
Original:
select * from R, S, ... where R.C = S.C and 60 < R.A < 80 and 20 < S.B < 80 and ...
Rewritten:
select * from R, S, ... where R.C = S.C and ( ... and ... ) or ( ... and ... ) or ( ... and ... ) or ...
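When the empty rectangle lies strictly inside the query region, the remaining region is no longer a single rectangle, which is presumably why the rewritten predicate becomes a disjunction. The sketch below shows one possible decomposition. It is not the authors' rewrite rule. It assumes integer-valued attributes with inclusive bounds, and `subtract` and the example boxes are hypothetical.

```python
def subtract(query, empty):
    """Return disjoint boxes covering query minus empty.

    Boxes are (a_lo, a_hi, b_lo, b_hi) with inclusive integer bounds.
    """
    qa_lo, qa_hi, qb_lo, qb_hi = query
    ea_lo, ea_hi, eb_lo, eb_hi = empty
    # Intersection of the empty box with the query box.
    ia_lo, ia_hi = max(qa_lo, ea_lo), min(qa_hi, ea_hi)
    ib_lo, ib_hi = max(qb_lo, eb_lo), min(qb_hi, eb_hi)
    if ia_lo > ia_hi or ib_lo > ib_hi:
        return [query]                    # no overlap, nothing to remove
    parts = []
    if qa_lo < ia_lo:                     # slab to the left of the empty box
        parts.append((qa_lo, ia_lo - 1, qb_lo, qb_hi))
    if ia_hi < qa_hi:                     # slab to the right of the empty box
        parts.append((ia_hi + 1, qa_hi, qb_lo, qb_hi))
    if qb_lo < ib_lo:                     # piece with smaller B, same A-range as the hole
        parts.append((ia_lo, ia_hi, qb_lo, ib_lo - 1))
    if ib_hi < qb_hi:                     # piece with larger B, same A-range as the hole
        parts.append((ia_lo, ia_hi, ib_hi + 1, qb_hi))
    return parts

# Integer version of 60 < A < 80 and 20 < B < 80, with a made-up interior hole.
for box in subtract((61, 79, 21, 79), (65, 70, 40, 50)):
    print(box)
```

Each returned box corresponds to one parenthesised conjunct of range predicates in a disjunctive rewrite.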
26 How much do the rectangles overlap with queries?
27 Query optimization experiments
- real-world workload of 26 queries
- 5 of the queries qualified for the rewrite
- only simple rewrites were considered
- all rewrites led to improved performance