GiST - PowerPoint PPT Presentation

Slides: 32
Provided by: poste5

Transcript and Presenter's Notes

Title: GiST


1
GiST
  • Yong

Most slides are adapted from ChengXiang Zhai's
lecture slides
2
My taste
My slides
Zhai's Lecture Slides
3
Not from scratch
Domain-specific data types; queries definable
B-tree R-tree R -tree kDB-tree hB-tree RD-tree
Your own tree
Your methods
GiST (Generalized Search Tree) Data Structure
API
4
GiST: Generalized Search Tree
  • General: covers B-tree, R-tree, etc.
  • Extensible
  • domain-specific data types and queries definable
  • Easy to extend: six key methods for a new tree
  • Efficient: can match specialized trees
  • Reusable: concurrency, recovery for indexes

5
Example: Indexing Book Titles
  • You are a programmer at the Postech Digital Library

Develop a library book search system
6
O.K. Let's Start
Titles of 4 books
  • T1: database optimization
  • T2: web database
  • T3: complexity of optimization algorithms
  • T4: algorithms and complexity
Indexable with (extensible) B-tree? Linear
ordering: T4, T3, T1, T2
7
Extensible B-Tree for Titles
[Figure: B-tree with root separator d, second-level separators c and w, and leaves T4 alg., T3 complexity, T1 database, T2 web]
  • Observations
  • indexed values have a linear ordering: T4, T3, T1,
    T2
  • keys are simply separators: T4, c, T3, d, T1, w,
    T2
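The separator idea can be sketched in a few lines of Python. This is only an illustration: it flattens the two-level tree from the slide into a single sorted separator list, and the names `leaves`, `separators`, and `find_leaf` are hypothetical.

```python
from bisect import bisect_right

# The four example titles, already in their linear (alphabetical) order.
leaves = [
    ("T4", "algorithms and complexity"),
    ("T3", "complexity of optimization algorithms"),
    ("T1", "database optimization"),
    ("T2", "web database"),
]

# Separator keys between consecutive leaves: T4, c, T3, d, T1, w, T2.
# Assumption: the two-level tree is flattened into one sorted list.
separators = ["c", "d", "w"]


def find_leaf(title):
    """Return the (id, title) leaf whose key range covers `title`."""
    return leaves[bisect_right(separators, title)]


print(find_leaf("database optimization")[0])
```

Because the separators only encode an ordering, the lookup works for equality and prefix probes but gives no help for a word that appears in the middle of a title.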

8
Queries on Titles
  • Equality predicates
  • WHERE book.title = 'web databases'
  • Containment predicates
  • WHERE book.title has 'web'
  • Prefix predicates
  • WHERE book.title start-with 'web'
  • RegEx predicates (generalize all the others)
  • WHERE book.title like 'web database'
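As a rough illustration, the four predicate kinds can be written as plain Python functions over a title string. The function names are hypothetical, and treating "has" as whole-word containment is an assumption.

```python
import re


def equals(title, s):
    """Equality predicate: the title is exactly `s`."""
    return title == s


def has(title, word):
    """Containment predicate (assumed to mean whole-word containment)."""
    return word in title.split()


def starts_with(title, prefix):
    """Prefix predicate."""
    return title.startswith(prefix)


def like(title, pattern):
    """RegEx predicate: `pattern` matches somewhere in the title."""
    return re.search(pattern, title) is not None


title = "web database"
# Each of the other predicates can be phrased as a regular expression.
print(equals(title, "web database"), like(title, r"^web database$"))
print(has(title, "web"), like(title, r"\bweb\b"))
print(starts_with(title, "web"), like(title, r"^web"))
```

Each pair printed above agrees, which is the sense in which regular expressions generalize the other three predicates.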

9
Using B-Tree: What's Wrong?
  • What predicates can a B-tree support well?
  • equality, containing, prefix, regex?

[Figure: the same B-tree for titles as on slide 7]
10
New Index
  • index pages on disk
  • the algorithms for searching indexes and deleting from indexes
  • complex transactional details
  • page-level locking for high concurrency
  • write-ahead logging for crash recovery

11
We Need an API for a New Search Index
12
GiST: Generalizing Balanced Search Trees
  • GiST is not universal (just a reasonable
    generalization)
  • balanced tree of <key, ptr> pairs; keys can
    overlap
  • B-Tree R-Tree R-Tree GiST
  • What is the key generalization?

[Figure: internal nodes (directory) holding keys key1, key2, ... above leaf nodes (linked list)]
13
The Key Generalization: The Key
  • Key evolution: 1-D separator --> 2-D MBR -->
    predicates
  • R-Tree generalizes B-Tree
  • generalizing key from 1-D line to 2-D area
  • bounding range to (minimal) bounding region
  • GiST generalizes R-Tree
  • generalizing key from 2-D MBR to predicates
  • a predicate that all values v in the subtree will
    satisfy
  • B-tree keys
  • [k1, k2) --> contains([k1, k2), v)
  • R-tree keys
  • (x1, y1, x2, y2) --> contains((x1, y1, x2, y2), v)
  • RD-tree keys
  • {x1, ..., xk} --> subset({x1, ..., xk}, v)
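One way to picture "key as predicate" is to build each key as a Python function that every value in the subtree must satisfy. This is a sketch only: the factory names are hypothetical, and reading the RD-tree predicate as "v is a subset of the bounding set" is an assumption.

```python
def btree_key(k1, k2):
    """B-tree key: the half-open range [k1, k2) as a predicate on v."""
    return lambda v: k1 <= v < k2


def rtree_key(x1, y1, x2, y2):
    """R-tree key: a bounding rectangle as a predicate on a point v = (x, y)."""
    return lambda v: x1 <= v[0] <= x2 and y1 <= v[1] <= y2


def rdtree_key(xs):
    """RD-tree key: a bounding set; assumed to mean v is a subset of it."""
    bound = frozenset(xs)
    return lambda v: set(v) <= bound


in_c_range = btree_key("c", "d")
in_box = rtree_key(0, 0, 2, 2)
in_set = rdtree_key({"alg", "comp", "opt"})

print(in_c_range("complexity"), in_c_range("database"))
print(in_box((1, 1)), in_box((3, 1)))
print(in_set({"alg", "comp"}), in_set({"web"}))
```

All three keys have the same shape (a Boolean function of v), which is what lets one tree template serve them all.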

14
GiST for Title Indexing: Predicates
  • Must first determine predicates
  • What query predicates to support?
  • equality: equal(v, 'web db')
  • containing: has(v, 'web')
  • What key predicate to use?
  • Criteria for choosing key predicates?
  • What do you suggest?

15
GiST for Title Indexing: Predicates
  • Key predicate: Contains(S, v)

[Figure: title GiST]
Root: SL = {alg, comp, opt}, SR = {db, opt, web}
  • SLL = {alg, comp}: T4 alg.
  • SLR = {comp, opt}: T3 complexity
  • SRL = {db, opt}: T1 database
  • SRR = {db, web}: T2 web
16
GiST Built-in Tree Operations
  • Search(root R, predicate q)
  • Insert(root R, entry E, level l)
  • Delete(root R, entry E)

17
GiST Application-Specific Methods
  • Search
  • Consistent(E, q): search subtree E for predicate
    q?
  • Labeling
  • Union(E1, ..., En): how to label the union of E1,
    ..., En?
  • Categorization
  • Penalty(E1, E2): penalty for inserting E2 in
    subtree E1
  • PickSplit(E1, ..., En): how to split into two
    groups of entries
  • Compression (storage/time tradeoff)
  • Compress(E): E --> Ec
  • Decompress(Ec): Ec --> E' such that E.p implies E'.p
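The six methods can be pictured as an abstract interface that a new tree type fills in. The sketch below is not any real library's API; the class name `GiSTExtension`, the method signatures, and the identity defaults for compression are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class GiSTExtension(ABC):
    """Hypothetical interface for the six application-specific methods."""

    @abstractmethod
    def consistent(self, key, query):
        """Return False only if no value under `key` can satisfy `query`."""

    @abstractmethod
    def union(self, keys):
        """Return a key that holds for everything under any of `keys`."""

    @abstractmethod
    def penalty(self, key, new_key):
        """Return the cost of inserting `new_key` under `key`."""

    @abstractmethod
    def pick_split(self, keys):
        """Return two groups of keys for an overflowing node."""

    def compress(self, key):
        """Default: no compression (stored form equals the key)."""
        return key

    def decompress(self, stored):
        """Default: identity; may return a looser key when lossy."""
        return stored


print(sorted(GiSTExtension.__abstractmethods__))
```

Making compression optional mirrors the slide's grouping: search, labeling, and categorization are required for a working tree, while compression is a storage/time tradeoff.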

18
Search Operation: Consistent Method
  • Search(root R, predicate q)
  • traverse subtrees where Consistent is true
  • return leaf entries that are consistent
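A minimal sketch of this traversal, assuming a hypothetical in-memory `Node`/`Entry` layout and a two-level version of the title tree (the slides show three levels):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    key: frozenset                  # key predicate, here a set of words
    child: Optional["Node"] = None  # subtree pointer (internal entries)
    value: Optional[str] = None     # stored item (leaf entries)


@dataclass
class Node:
    entries: list
    is_leaf: bool


def search(node, query, consistent):
    """Collect leaf values reachable through consistent entries only."""
    results = []
    for e in node.entries:
        if not consistent(e.key, query):
            continue  # prune: nothing below can match
        if node.is_leaf:
            results.append(e.value)
        else:
            results.extend(search(e.child, query, consistent))
    return results


def has_word(key, word):
    """Consistent check for has(v, word) on set-of-words keys."""
    return word in key


left = Node([Entry(frozenset({"alg", "comp"}), value="T4"),
             Entry(frozenset({"comp", "opt", "alg"}), value="T3")], True)
right = Node([Entry(frozenset({"db", "opt"}), value="T1"),
              Entry(frozenset({"web", "db"}), value="T2")], True)
root = Node([Entry(frozenset({"alg", "comp", "opt"}), child=left),
             Entry(frozenset({"db", "opt", "web"}), child=right)], False)

print(search(root, "web", has_word))
print(search(root, "opt", has_word))
```

For the query word "web" the left subtree is never visited, because its key does not contain the word.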

19
Consistent Method
  • Consistent(E, q)
  • Can E.p and q both hold?
  • Does E.p imply (not q)?
  • Title GiST
  • key predicate p: Contains(S, v), or simply S
  • e.g., SL = {alg, comp, opt}
  • e.g., SR = {db, opt, web}
  • Consistent(SL, has(v, 'web'))?
  • how to implement? SL ∩ {web} ≠ Ø
  • Consistent(SR, equals(v, 'web database'))?
  • how to implement? SR ⊇ {web, database}
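Reading each key as a set S of words, the two checks might look like the sketch below. The function names are hypothetical, and the equality test (the key must contain every word of the queried title) is an assumption about what the slide intends.

```python
SL = frozenset({"alg", "comp", "opt"})
SR = frozenset({"db", "opt", "web"})


def consistent_has(S, word):
    """has(v, word) can hold below S only if S contains the word."""
    return word in S


def consistent_equals(S, title_words):
    """equals(v, title) can hold below S only if S covers all title words.

    Assumption: titles are compared as sets of (abbreviated) words.
    """
    return set(title_words) <= S


print(consistent_has(SL, "web"))
print(consistent_equals(SR, {"web", "db"}))
```

Both functions may return True for a subtree that holds no match (a false positive costs time), but they must never return False for a subtree that does hold one.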

20
Insert Operation
  • Insert(root R, entry E, level l )
  • descend tree minimizing potential increase in
    Penalty
  • stop at level specified
  • if there is room at node, insert there
  • else split according to PickSplit
  • propagate changes using Union to adjust keys
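The steps above can be sketched for set-valued keys as follows. This is a simplified illustration, not the reference algorithm: it always inserts at the leaf level (the level parameter is omitted), uses a tiny hypothetical node capacity, and splits an overflowing node in the middle as a placeholder for PickSplit.

```python
MAX_ENTRIES = 2  # hypothetical node capacity, kept tiny to force splits


class Node:
    def __init__(self, is_leaf, entries=None):
        self.is_leaf = is_leaf
        self.entries = entries or []  # list of [key, child_or_value]


def union(keys):
    return frozenset().union(*keys)


def penalty(key, new_key):
    return len(key | new_key) - len(key)


def pick_split(entries):
    mid = len(entries) // 2  # placeholder policy: split in the middle
    return entries[:mid], entries[mid:]


def label(node):
    """Key for a node: the union of the keys it holds."""
    return union(k for k, _ in node.entries)


def insert(node, key, value):
    """Insert below `node`; return two nodes if it had to split, else None."""
    if node.is_leaf:
        node.entries.append([key, value])
    else:
        best = min(node.entries, key=lambda e: penalty(e[0], key))
        split = insert(best[1], key, value)
        if split is None:
            best[0] = label(best[1])  # adjust the key with Union
        else:
            node.entries.remove(best)
            node.entries.extend([label(half), half] for half in split)
    if len(node.entries) > MAX_ENTRIES:
        a, b = pick_split(node.entries)
        return Node(node.is_leaf, a), Node(node.is_leaf, b)
    return None


def insert_root(root, key, value):
    """Insert and grow a new root when the old root splits."""
    split = insert(root, key, value)
    if split is None:
        return root
    return Node(False, [[label(half), half] for half in split])


def all_values(node):
    if node.is_leaf:
        return [v for _, v in node.entries]
    return [v for _, child in node.entries for v in all_values(child)]


titles = {"T1": {"db", "opt"}, "T2": {"web", "db"},
          "T3": {"comp", "opt", "alg"}, "T4": {"alg", "comp"},
          "T5": {"comp", "web", "alg"}}
root = Node(True)
for name, words in titles.items():
    root = insert_root(root, frozenset(words), name)
print(sorted(all_values(root)))
```

Splits propagate upward through the return value, so every leaf stays at the same depth and the tree remains balanced.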

21
Title GiST: Insert
  • Where to insert T5: complexity of web algorithms?

[Figure: the title GiST from slide 15, with root keys SL = {alg, comp, opt} and SR = {db, opt, web}]
22
Penalty Method
  • Penalty(E1, E2)
  • penalty for inserting E2 in subtree E1
  • Title GiST
  • E2 with S = {comp, web, alg} (i.e., T5: complexity
    of web algorithms)
  • Where to insert?
  • root: SL = {alg, comp, opt} vs. SR = {db, opt,
    web}?
  • Penalty
  • how to implement? |E1 ∪ E2| - |E1|
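A small worked example, assuming the penalty is the number of words the subtree key would gain (size of the union minus size of the existing key); the variable names are illustrative.

```python
def penalty(E1, E2):
    """Growth of key E1 if E2 were inserted beneath it."""
    return len(E1 | E2) - len(E1)


SL = frozenset({"alg", "comp", "opt"})
SR = frozenset({"db", "opt", "web"})
T5 = frozenset({"comp", "web", "alg"})  # complexity of web algorithms

print(penalty(SL, T5))  # SL would gain only "web"
print(penalty(SR, T5))  # SR would gain "comp" and "alg"
```

Under this measure T5 goes into the SL subtree, since SL grows by one word while SR would grow by two.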

23
PickSplit Method
  • PickSplit(E1, ..., En)
  • how to split into two groups of entries
  • Title GiST
  • suppose we have 3 entries (after an Insert)
  • S1 = {alg, comp}
  • S2 = {comp, opt}
  • S3 = {comp, web, alg} (new)
  • how to split S1, S2, S3 into two groups?
  • something similar to R-tree algorithm will do
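One possible adaptation of the R-tree quadratic split to set-valued keys is sketched below. It is an assumption about what "something similar" could mean: seeds are the pair that would waste the most if kept together, the remaining entries join the group whose label grows least, and the minimum-fill rule of the real R-tree algorithm is left out.

```python
from itertools import combinations


def pick_split(keys):
    """Split a list of set keys into two groups (simplified quadratic split)."""

    def waste(pair):
        a, b = keys[pair[0]], keys[pair[1]]
        # Analogue of R-tree "dead space": large when a and b share little.
        return len(a | b) - len(a) - len(b)

    i, j = max(combinations(range(len(keys)), 2), key=waste)
    groups = ([keys[i]], [keys[j]])
    labels = [set(keys[i]), set(keys[j])]
    for n, k in enumerate(keys):
        if n in (i, j):
            continue
        # Put k where the group label grows the least.
        g = min((0, 1), key=lambda g: len(labels[g] | k) - len(labels[g]))
        groups[g].append(k)
        labels[g] |= k
    return groups


S1 = frozenset({"alg", "comp"})
S2 = frozenset({"comp", "opt"})
S3 = frozenset({"comp", "web", "alg"})
left, right = pick_split([S1, S2, S3])
print(left, right)
```

With these three entries S1 and S2 become the seeds, and S3 joins S1 because it shares two words with it.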

24
Union Method
  • Union(E1, ..., En)
  • Generates a label for the subtree with E1, ..., En
  • Title GiST
  • key predicate p: Contains(S, v), or simply S
  • S1 = {alg, comp}, S2 = {comp, opt}
  • Combined key: {alg, comp, opt}
  • Union(E1(SL, ptr1), E2(SR, ptr2))?
  • how to implement? set union: SL ∪ SR
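For set-valued keys, a natural reading is that Union is plain set union. A minimal sketch under that assumption (the function name and variadic signature are illustrative):

```python
def union(*keys):
    """Label for a subtree: every word that appears in any entry key."""
    return frozenset().union(*keys)


S1 = frozenset({"alg", "comp"})
S2 = frozenset({"comp", "opt"})
SL = frozenset({"alg", "comp", "opt"})
SR = frozenset({"db", "opt", "web"})

print(sorted(union(S1, S2)))
print(sorted(union(SL, SR)))
```

The result always contains each input key, so any value covered by an entry stays covered by the parent label.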

25
Compress/Decompress Method?
  • Key storage vs. search time tradeoff
  • Compress(E): E --> Ec
  • Decompress(Ec): Ec --> E', where E'.p can be looser than E.p
    (less pruning power)
  • Lossy compression may need more time for search

26
Title GiST: Compress/Decompress
  • Example 1: no compression
  • Compress(E) --> Ec = E
  • Decompress(Ec) --> E' = Ec
  • Example 2: compress by taking word initials
  • Compress
  • {algorithm, complexity, optimization} --> {al,
    co, op}
  • Decompress
  • {al, co, op} --> {al*, co*, op*}
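Example 2 can be sketched as follows, assuming "initials" means the first two letters of each word and that the decompressed key is read as a set of prefix patterns. The function names are hypothetical.

```python
def compress(words):
    """Keep only the first two letters of each word (lossy)."""
    return frozenset(w[:2] for w in words)


def decompress(stored):
    """Return the stored prefixes; each now stands for 'any word starting so'."""
    return frozenset(stored)


def may_contain(prefixes, word):
    """Consistent check against a decompressed (prefix) key."""
    return any(word.startswith(p) for p in prefixes)


E = {"algorithm", "complexity", "optimization"}
Ec = compress(E)
E_loose = decompress(Ec)

print(sorted(Ec))
print(may_contain(E_loose, "algorithm"))  # a true match is still found
print(may_contain(E_loose, "alphabet"))   # false positive: looser predicate
print(may_contain(E_loose, "web"))        # still pruned
```

The "alphabet" case shows the cost of lossy compression: the key is smaller, but the search descends into a subtree that holds no match.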

27
GiST: No Magic
  • It offers (only) what its model is based on
  • It does not represent all possible index
    structures
  • e.g., duplicating objects by multiple inserts
    (R+-tree)
  • e.g., supporting a notion of distance and similarity
    rather than Boolean-based predicates
  • any more?

28
What You Should Know
  • What is GiST?
  • What are the six key methods?
  • How does GiST generalize other more specialized
    trees?
  • What are some limitations of GiST?

29
Carry Away Messages
  • Once again, generalize whenever it's possible
  • 1-dimension indexing (B-tree, interval-based) -->
    multi-dimension indexing (R-tree, region-based)
    --> arbitrary objects (GiST, predicate-based)
  • Avoid over-generalization
  • While a predicate is quite general, it doesn't
    guarantee pruning power
  • Where's the notion of bounding in GiST?
  • Whenever you see yet another X, think about
    possibilities for a more general formulation of X

30
Usage of GiST
  • PostgreSQL Spatial Indexes
  • http://www.cmarschner.net/mtree.html

31
Q&A