Finding skyline on the fly - PowerPoint PPT Presentation

1
Finding skyline on the fly
  • HKU CS DB Seminar
  • 21 July 2004
  • Speaker: Eric Lo

2
Skyline
  • A new operator (like ORDER BY) in database
    systems
  • The set of data points that are not dominated by
    any other data point

3
Example
  • Find some good places for us to hold the next DB
    Seminar
  • Good → closer to HKU (Min)
  • Good → larger area (Max)
  • Return those homes that are not dominated: no
    other home is at least as good in ALL DIMENSIONS
    and strictly better in at least one
  • Dataset (Table Homes)
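
The dominance test behind this example can be sketched in a few lines of Python. The home names and values below are hypothetical (the Homes table itself is not part of this transcript), and `dominates` and `skyline` are illustrative helper names, not part of any system described in the talk.

```python
def dominates(a, b, senses):
    """True if a is no worse than b in every dimension and strictly better in one.

    senses[i] is "min" or "max" and says which direction is better in dimension i.
    """
    no_worse = all((x <= y) if s == "min" else (x >= y)
                   for x, y, s in zip(a, b, senses))
    better = any((x < y) if s == "min" else (x > y)
                 for x, y, s in zip(a, b, senses))
    return no_worse and better


def skyline(points, senses):
    """Brute-force skyline: keep every point that no other point dominates."""
    return {name: p for name, p in points.items()
            if not any(dominates(q, p, senses)
                       for other, q in points.items() if other != name)}


# Hypothetical homes: (distance from HKU in km, area in m2).
homes = {"A": (2, 60), "B": (5, 120), "C": (8, 250), "D": (6, 100)}
result = skyline(homes, ("min", "max"))
print(sorted(result))  # D is dominated by B (closer and larger)
```

This quadratic scan is only meant to pin down the definition; the algorithms in the rest of the talk exist to avoid fetching and comparing every object.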

4
Outline
  • Introduction to skyline queries
  • Non-progressive skylining on the Web
  • Basic Distributed Skyline Algorithm (BDS)
  • Progressive skylining on the Web
  • Experimental result
  • Conclusion and future directions

5
Skylining on the Web
  • One distributed site holds one attribute
  • Attribute "Distance from HKU" is stored at HKU
  • Attribute "Area (m²)" is stored at Purdue

[Figure: sites HKU and Purdue connected via the Internet]

6
Accessing interfaces
[Figure: sites HKU and Purdue connected via the Internet]
  • Interfaces of Web-accessible sites
  • Sorted Access (SA)
  • HKU.getNext() returns the 1st-ranked data tuple,
    K.K. Loo
  • HKU.getNext() → 2nd, Ivy; HKU.getNext() → 3rd,
    Nikos; ...
  • Random Access (RA)
  • Purdue.getScore(K.K. Loo) → 10 m²
  • HKU.getScore(Nikos) → 8 km
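
A minimal sketch of the two access interfaces, assuming each site can be modelled as an in-memory table. The class name `Source` and its methods are illustrative, not an actual Web API. Nikos's 8 km and 250 m² and K.K. Loo's 10 m² come from the slides; the remaining numbers are made up so that the ranking matches the slide.

```python
class Source:
    """One Web-accessible site holding a single attribute."""

    def __init__(self, scores, descending=False):
        self._scores = dict(scores)
        # Best-first order: ascending for "min" attributes, descending for "max".
        self._order = sorted(self._scores, key=self._scores.get,
                             reverse=descending)
        self._pos = 0

    def get_next(self):
        """Sorted access: return the next (object, score) pair, or None when exhausted."""
        if self._pos >= len(self._order):
            return None
        key = self._order[self._pos]
        self._pos += 1
        return key, self._scores[key]

    def get_score(self, key):
        """Random access: return the score of a named object."""
        return self._scores[key]


# Distances for K.K. Loo and Ivy, and Ivy's area, are hypothetical values.
hku = Source({"K.K. Loo": 2, "Ivy": 5, "Nikos": 8})               # km, smaller is better
purdue = Source({"K.K. Loo": 10, "Ivy": 90, "Nikos": 250}, True)  # m2, larger is better

first = hku.get_next()
second = hku.get_next()
third = hku.get_next()
area = purdue.get_score("K.K. Loo")
```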

7
Basic distributed skyline algorithm (EDBT 04)
  • Phase 1: find all possible skyline objects
  • Perform sorted access on each source, one by one
  • S1.getNext(), S2.getNext(), S3.getNext()
  • S1.getNext(), S2.getNext(), ...
  • ...
  • Stop once there is an object whose attribute
    values are all known
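
Phase 1 can be sketched as a round-robin loop over the sources. This is a simplified reading of the slide, not the published algorithm: each source is modelled as a best-first list of `(object, value)` pairs, all attributes are minimized, and the loop is assumed to stop at the very access that completes an object. The function name `phase1` and the data are hypothetical.

```python
def phase1(sources):
    """Round-robin sorted access until one object has been seen in every source.

    Returns (terminating_object, seen, accesses), where seen maps each object
    to the {source_index: value} pairs discovered so far.
    """
    n = len(sources)
    seen = {}
    pos = [0] * n
    accesses = 0
    while True:
        progressed = False
        for i, src in enumerate(sources):
            if pos[i] >= len(src):
                continue  # this source is exhausted
            obj, val = src[pos[i]]
            pos[i] += 1
            accesses += 1
            progressed = True
            seen.setdefault(obj, {})[i] = val
            if len(seen[obj]) == n:
                return obj, seen, accesses
        if not progressed:
            return None, seen, accesses


# Hypothetical data: three sources, six objects, values are ranks.
s1 = [("a", 1), ("b", 2), ("f", 3), ("c", 4), ("d", 5), ("e", 6)]
s2 = [("c", 1), ("f", 2), ("a", 3), ("b", 4), ("e", 5), ("d", 6)]
s3 = [("d", 1), ("e", 2), ("f", 3), ("a", 4), ("c", 5), ("b", 6)]
terminating, seen, accesses = phase1([s1, s2, s3])
```

In this run, f is completed on the third round, after 9 sorted accesses. Every object seen so far is a skyline candidate, and phase 2 still has to fill in their missing values.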

8
Phase 1
  • f is the terminating object

9
Phase 1 (15 sorted accesses)

10
Implication
  • f is the terminating object → objects that have
    not appeared in any source must be dominated by f

11
Phase 2
  • Find the skyline from the candidates of phase 1
  • During sequential scanning of sources, data
    structures K1, K2, K3, ..., Kn are created
  • n is the number of dimensions
  • If Si.getNext() returns a data object d
  • create an entry for d in Ki
  • update the lower_bound of source i

12
Phase 2: find skyline from candidates Ki
  • A lemma shows that objects can only be dominated
    by objects in the same set Ki

13
Motivations
  • BDS returns skyline results in a batch
  • In practice, it would be useful to return skyline
    results progressively such that users could
    adjust their decisions right away
  • Consider the next DB seminar skyline example
  • minimize Distance from HKU, maximize Area
  • <Nikos, 8 km, 250 m²> is returned first
  • Getting from HKU to Nikos's home requires a 50 bus!
  • Add the travel-expense attribute into the
    skyline query

14
Progressive Distributed Skylining (PDS)
  • Goal
  • Evaluates skyline queries progressively with
    minimal overhead
  • Overhead
  • Network/Data source accesses
  • Computational time

15
Enable progressiveness
  • To decide whether a data point belongs to the
    final skyline, we rely on the following lemma
    (assuming the data values are distinct)
  • If a data source Di returns data objects in a
    strictly monotonic order, an object O retrieved
    from Di can only be dominated by objects that
    were retrieved from Di before O

16
  • If an object O is retrieved from a data source by
    sorted access, we only need to test whether O is
    dominated by an object that appeared before O in
    the same source
  • Two consequences
  • We don't need to consider objects that appear in
    other data sources
  • If O passes the test, we can output O as a skyline
    object immediately: O must be in the skyline, so
    we need not worry that objects appearing later
    would dominate O
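
The lemma can be checked on a small example. The sketch below assumes all attributes are minimized and all values are distinct, as the lemma requires. It scans objects in the order one source would return them and compares each object only with those seen earlier. The function name and the five points are hypothetical.

```python
def progressive_skyline(objects, i):
    """Yield skyline objects in the order source i returns them.

    objects maps a name to a tuple of attribute values, all minimized and
    distinct within each dimension. Each object is compared only with
    objects that precede it in source i, as the lemma allows.
    """
    earlier = []
    for name in sorted(objects, key=lambda n: objects[n][i]):
        o = objects[name]
        dominated = any(all(p[d] <= o[d] for d in range(len(o)))
                        for p in earlier)
        if not dominated:
            yield name  # safe to output right away
        earlier.append(o)


# Hypothetical 2-D data; source 0 returns p, q, r, s, t in that order.
points = {"p": (1, 9), "q": (3, 4), "r": (4, 6), "s": (6, 2), "t": (7, 3)}
output = list(progressive_skyline(points, 0))
print(output)
```

Here r is dropped because of q and t because of s, and p, q, s are reported as soon as they are retrieved. Scanning by the other attribute (`i = 1`) yields the same three objects in a different order.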

17
An R-tree approach
  • Build an R-tree Ri for each attribute/data source
    i involved in the skyline query
  • For each object O retrieved from source i, we
    check whether any object in Ri dominates O
  • If no such object exists, O is a skyline object
    (output it immediately)
  • If some object in Ri dominates O, O is not a
    skyline object (O is discarded immediately)

18
D3.getNext() the 1st time
[Figure: axes D1 and D2, point e(7,4)]
  • SA on D3 returns e<1>
  • e is a skyline object (no object is better than e
    on D3); e(7,4) is projected into R-tree R3

19
D3.getNext() the 2nd time
[Figure: points c(2,5) and e(7,4)]
  • SA on D3 returns c<2>
  • Construct a query Q(origin, c) on R3
  • Q returns no answer → c is a skyline object →
    insert c into R3

20
D3.getNext() the 3rd time
[Figure: points j(6,10), c(2,5) and e(7,4)]
  • SA on D3 returns j<3>
  • Construct a query Q(origin, j) on R3
  • Q returns c as an answer → j is dominated by c →
    discard j

21
D3.getNext() the 4th time
[Figure: points c(2,5) and e(7,4)]
  • SA on D3 returns f<4>; construct a query
    Q(origin, f) on R3
  • Q returns no answer → f is a skyline object
  • Delete e after insertion of f to make the R-tree
    more compact and efficient
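
The four steps on D3 can be replayed with a small sketch. A plain dictionary stands in for the R-tree R3, so the window query Q(origin, O) is a linear scan here rather than a real index lookup. The coordinates of f are not given in the transcript; f(5,3) is a hypothetical choice that is consistent with the slides (Q finds nothing, and f makes e redundant). All names are illustrative.

```python
def window_query(tree, corner):
    """Stand-in for Q(origin, corner): stored points inside the box [origin, corner]."""
    return [name for name, p in tree.items()
            if all(a <= b for a, b in zip(p, corner))]


def process(tree, name, proj):
    """Handle one object returned by sorted access; return True if it is a skyline object."""
    if window_query(tree, proj):
        return False  # an earlier object dominates it: discard
    # Stored points that proj dominates in the projection prune nothing
    # that proj does not also prune, so they can be deleted.
    for other in [n for n, p in tree.items()
                  if all(a <= b for a, b in zip(proj, p))]:
        del tree[other]
    tree[name] = proj
    return True


r3 = {}
# Order returned by sorted access on D3; projections onto (D1, D2).
stream = [("e", (7, 4)), ("c", (2, 5)), ("j", (6, 10)), ("f", (5, 3))]
reported = [name for name, proj in stream if process(r3, name, proj)]
print(reported, sorted(r3))
```

The run reports e, c and f, discards j because Q(origin, j) finds c, and leaves only c and f in the structure after e is deleted.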

22
The R-tree approach
  • The R-tree is very small since it stores only the
    skyline objects with the highest pruning power
  • Containment query operation is very efficient

23
A linear regression based heuristic
  • The R-tree approach enables progressiveness with
    better efficiency
  • We use a linear regression based heuristic to
    minimize the number of source accesses during the
    evaluation process

24
A rank based approach
  • We use linear regression to estimate the rank of
    objects during the process
  • Assume the object with the lowest estimated rank
    is the real terminating object and probe the
    sources accordingly (rather than round-robin)
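
The transcript does not say which variables the regression is fitted on, so the following is only one plausible reading. It fits a least-squares line to the (value, rank) pairs already returned by sorted access on one source, then uses the line to guess the rank of an object whose value in that source is known (for example from a random access). The function names and numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_rank(seen, value):
    """Estimate the rank at which `value` would appear in this source.

    seen is a list of (value, rank) pairs already returned by sorted access.
    """
    values = [v for v, _ in seen]
    ranks = [r for _, r in seen]
    slope, intercept = fit_line(values, ranks)
    return slope * value + intercept


# Hypothetical source: the first four sorted accesses returned these values.
seen = [(10.0, 1), (20.0, 2), (30.0, 3), (40.0, 4)]
estimated = estimate_rank(seen, 75.0)
print(round(estimated, 2))
```

With such per-source estimates, a scheduler could favour the sources in which the presumed terminating object is expected to appear soonest. How PDS actually combines the estimates is not described in the transcript.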

25
Extensions
  • Evaluation of top-K skyline queries
  • Progress indicator (based on the estimated ranks)

26
Experimental results: number of source accesses

27
Experimental results: number of source accesses
[Figure panels: Random Distribution; Denormalized Domain]

28
Experimental results: progressive behavior

29
Experimental results: progress indicator

30
Conclusion and future directions
  • Skyline queries on the Web
  • Return skyline points on-the-fly
  • Future work
  • Improve the usability of PDS by allowing users to
    trade off progressiveness against efficiency
  • Compute skylines from real-time stream data
  • Handle the case where only one data source
    supports sorted access and the rest support
    random access only

31
References
  • S. Börzsönyi, D. Kossmann, K. Stocker, The Skyline
    Operator, in ICDE 2001.
  • D. Kossmann, F. Ramsak, S. Rost, Shooting Stars in
    the Sky: An Online Algorithm for Skyline Queries,
    in VLDB 2002.
  • W.-T. Balke, U. Güntzer, J. X. Zheng, Efficient
    Distributed Skylining for Web Information
    Systems, in EDBT 2004.