Spatial data structures: kd-trees

1
Spatial data structures: kd-trees
  • Jianping Fan
  • Department of Computer Science
  • UNC-Charlotte

2
Summary
  • This lecture introduces multi-dimensional queries
    in databases and addresses how to represent and
    query multi-dimensional data.
  • "A reasonable man adapts himself to his
    environment. An unreasonable man persists in
    attempting to adapt his environment to suit
    himself. Therefore, all progress depends on the
    unreasonable man."
  • George Bernard Shaw

3
Contents
  • Definitions
  • Basic operations and construction
  • Range queries on multi-attributes
  • Variants
  • Applications

4
Usage
  • Rendering
  • Surface reconstruction
  • Collision detection
  • Vision and machine learning
  • Intel Interactive technology

5
KD tree definition
  • A recursive space partitioning tree.
  • Partition along x and y axis in an alternating
    fashion.
  • Each internal node stores the splitting value
    along x (or y).

6
K-d tree
  • Used for point location and multi-attribute
    database queries; k is the number of attributes
    used in the search.
  • Geometric interpretation: a search in 2D space
    uses a 2-d tree.
  • The search components (x, y) alternate from one
    level to the next!

7
K-d tree example
[Figure: k-d tree example over the points a, b, c, d, e, f]
8
Kd tree example
9
3D kd tree
10
Construction
  • The canonical method of kd-tree construction has
    the following constraints:
  • As one moves down the tree, one cycles through
    the axes used to select the splitting planes.
    (For example, the root would have an x-aligned
    plane, the root's children would both have
    y-aligned planes, the root's grandchildren would
    all have z-aligned planes, the next level would
    have an x-aligned plane, and so on.)
  • Points are inserted by selecting the median of
    the points being put into the subtree, with
    respect to their coordinates in the axis being
    used to create the splitting plane. (Note the
    assumption that we feed the entire set of points
    into the algorithm up-front.)
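
The construction rule above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the names Node and build_kdtree are hypothetical, each node stores the median point itself, and a plain sort is used to find the median at every level.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]

@dataclass
class Node:
    point: Point                    # median point stored at this node
    axis: int                       # axis of the splitting plane
    left: Optional["Node"] = None   # points with smaller-or-equal coordinate
    right: Optional["Node"] = None  # points with larger-or-equal coordinate

def build_kdtree(points: Sequence[Point], depth: int = 0) -> Optional[Node]:
    """Build a kd-tree from all points up-front, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])                    # x, y, (z, ...), x, ...
    ordered = sorted(points, key=lambda p: p[axis])  # sort to find the median
    mid = len(ordered) // 2
    return Node(
        point=ordered[mid],
        axis=axis,
        left=build_kdtree(ordered[:mid], depth + 1),
        right=build_kdtree(ordered[mid + 1:], depth + 1),
    )

tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(tree.point, tree.left.point, tree.right.point)
```

Storing the point at the internal node is one common design choice; variants that keep all points at the leaves store only the splitting value in internal nodes.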

11
Construction
  • This method leads to a balanced kd-tree, in which
    each leaf node is about the same distance from
    the root. However, balanced trees are not
    necessarily optimal for all applications.
  • Note also that it is not required to select the
    median point. In that case, the result is simply
    that there is no guarantee that the tree will be
    balanced. A simple heuristic to avoid coding a
    complex linear-time median-finding algorithm or
    using an O(n log n) sort over all points is to
    sort a fixed number of randomly selected points
    and use their median as the cut line.
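
A possible sketch of the sampling heuristic follows. The function name approximate_median and the default sample size of 5 are assumptions made for illustration; the slides do not fix either.

```python
import random
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]

def approximate_median(points: Sequence[Point], axis: int,
                       sample_size: int = 5,
                       rng: Optional[random.Random] = None) -> float:
    """Estimate the median coordinate along `axis` from a small random sample."""
    rng = rng or random.Random()
    # Sorting a fixed-size sample costs O(1) regardless of len(points).
    sample = rng.sample(list(points), min(sample_size, len(points)))
    sample.sort(key=lambda p: p[axis])
    return sample[len(sample) // 2][axis]

cut = approximate_median([(i, 100 - i) for i in range(100)], axis=0)
print(cut)  # some x-coordinate from the data, usually near the middle
```

Because the cut is only an estimate, a tree built this way is not guaranteed to be balanced.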

12
Kd tree mean vs median
kd-tree partitions of a uniform set of data
points, using the mean (left image) and the
median (right image) thresholding options.
Median: the middle value of a set of values.
Mean: the arithmetic average. (Andrea Vedaldi and
Brian Fulkerson) http://www.vlfeat.org/overview/kdtree.html
13
Example of using Median
14
Additions
  • One adds a new point to a kd-tree in the same way
    as one adds an element to any other search tree.
  • First, traverse the tree, starting from the root
    and moving to either the left or the right child
    depending on whether the point to be inserted is
    on the "left" or "right" side of the splitting
    plane.
  • Once you get to the node under which the child
    should be located, add the new point as either
    the left or right child of the leaf node, again
    depending on which side of the node's splitting
    plane contains the new point.
  • Adding points in this manner can cause the tree
    to become unbalanced, leading to decreased tree
    performance.
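
The insertion steps can be sketched as follows. Node and insert are hypothetical names, and the tie rule (a strictly smaller coordinate goes left, everything else goes right) is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, ...]

@dataclass
class Node:
    point: Point
    axis: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def insert(node: Optional[Node], point: Point, depth: int = 0) -> Node:
    """Walk down from the root and attach `point` as a new leaf."""
    if node is None:
        return Node(point, depth % len(point))
    if point[node.axis] < node.point[node.axis]:
        node.left = insert(node.left, point, depth + 1)    # "left" side of plane
    else:
        node.right = insert(node.right, point, depth + 1)  # "right" side of plane
    return node

root = None
for p in [(5, 4), (2, 3), (9, 6), (4, 7)]:
    root = insert(root, p)
print(root.left.right.point)
```

Inserting points in sorted order with this routine produces a degenerate, list-like tree, which is the imbalance the last bullet warns about.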

15
Deletions
  • To remove a point from an existing kd-tree,
    without breaking the invariant, the easiest way
    is to form the set of all nodes and leaves from
    the children of the target node, and recreate
    that part of the tree.
  • Another approach is to find a replacement for the
    point removed. First, find the node R that
    contains the point to be removed. For the base
    case where R is a leaf node, no replacement is
    required. For the general case, find a
    replacement point, say p, from the sub-tree
    rooted at R (for example, the point in R's right
    sub-tree with the smallest coordinate along R's
    splitting axis). Replace the point stored at R
    with p. Then, recursively remove p.
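
The replacement-based deletion might look like the sketch below. It assumes the tree was built by repeated insertion with the rule "strictly smaller goes left", so that every right sub-tree holds coordinates greater than or equal to its parent's; Node, insert, find_min, delete and points_of are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]

@dataclass
class Node:
    point: Point
    axis: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def insert(node, point, depth=0):
    if node is None:
        return Node(point, depth % len(point))
    side = "left" if point[node.axis] < node.point[node.axis] else "right"
    setattr(node, side, insert(getattr(node, side), point, depth + 1))
    return node

def find_min(node: Optional[Node], axis: int) -> Optional[Point]:
    """Point with the smallest coordinate along `axis` in this sub-tree."""
    if node is None:
        return None
    if node.axis == axis:  # the minimum can only be here or to the left
        return node.point if node.left is None else find_min(node.left, axis)
    found = [node.point] + [m for m in (find_min(node.left, axis),
                                        find_min(node.right, axis)) if m]
    return min(found, key=lambda p: p[axis])

def delete(node: Optional[Node], point: Point) -> Optional[Node]:
    if node is None:
        return None
    a = node.axis
    if node.point == point:
        if node.right is None and node.left is None:
            return None                       # base case: R is a leaf
        if node.right is None:                # move the left sub-tree over
            node.right, node.left = node.left, None
        node.point = find_min(node.right, a)  # replacement point p
        node.right = delete(node.right, node.point)  # recursively remove p
    elif point[a] < node.point[a]:
        node.left = delete(node.left, point)
    else:
        node.right = delete(node.right, point)
    return node

def points_of(node: Optional[Node]) -> List[Point]:
    return [] if node is None else (
        points_of(node.left) + [node.point] + points_of(node.right))

root = None
for p in [(5, 4), (2, 3), (9, 6), (4, 7), (8, 1), (7, 2)]:
    root = insert(root, p)
root = delete(root, (5, 4))
print(root.point, sorted(points_of(root)))
```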

16
Balancing
  • Balancing a kd-tree requires care. Because
    kd-trees are sorted in multiple dimensions, the
    tree rotation technique cannot be used to balance
    them, since it may break the invariant.
  • Several variants of balanced kd-trees exist. They
    include the divided kd-tree, pseudo kd-tree,
    K-D-B-tree, hB-tree and Bkd-tree. Many of these
    variants are adaptive k-d trees.

17
Querying
  • A kd-tree query uses a best-bin-first search
    heuristic. This is a branch-and-bound technique
    that maintains an estimate of the smallest
    distance from the query point to any of the data
    points down all of the open paths.
  • A kd-tree query supports two important
    operations: nearest-neighbor search and k-nearest
    neighbor search. The first returns the nearest
    neighbor to a query point; the latter returns the
    k nearest neighbors to a given query point Q.

18
Nearest-neighbor search
  • Starting with the root node, the algorithm moves
    down the tree recursively (i.e. it goes right or
    left depending on whether the point is greater or
    less than the current node in the split
    dimension).
  • Once the algorithm reaches a leaf node, it saves
    that node's point as the "current best".
  • The algorithm unwinds the recursion of the tree,
    performing the following steps at each node:

19
Recursion step
  • If the current node is closer than the current
    best, then it becomes the current best.
  • The algorithm checks whether there could be any
    points on the other side of the splitting plane
    that are closer to the search point than the
    current best. In concept, this is done by
    intersecting the splitting hyperplane with a
    hypersphere around the search point that has a
    radius equal to the current nearest distance.
  • If the hypersphere crosses the plane, there could
    be nearer points on the other side of the plane,
    so the algorithm must move down the other branch
    of the tree from the current node looking for
    closer points, following the same recursive
    process as the entire search.
  • If the hypersphere doesn't intersect the
    splitting plane, then the algorithm continues
    walking up the tree, and the entire branch on the
    other side of that node is eliminated.
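
The descent and the unwinding steps above can be sketched as one recursive function. Names such as nearest and build_kdtree are hypothetical, Euclidean distance is assumed, and math.dist requires Python 3.8 or later.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, ...]

@dataclass
class Node:
    point: Point
    axis: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def build_kdtree(points, depth=0):
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return Node(ordered[mid], axis,
                build_kdtree(ordered[:mid], depth + 1),
                build_kdtree(ordered[mid + 1:], depth + 1))

def nearest(node: Optional[Node], query: Point,
            best: Optional[Point] = None) -> Optional[Point]:
    if node is None:
        return best
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, best)          # move down towards a leaf first
    # Unwinding: is the current node closer than the current best?
    if best is None or math.dist(query, node.point) < math.dist(query, best):
        best = node.point
    # Does the hypersphere around the query cross the splitting plane?
    if abs(diff) < math.dist(query, best):
        best = nearest(far, query, best)       # search the other branch too
    return best                                # otherwise that branch is pruned

points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(points)
print(nearest(tree, (9, 2)))
```

The hypersphere test reduces to comparing one coordinate difference with the current best distance, because the splitting planes are axis-aligned.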

20
Nearest-neighbor search
  • kd-trees are not suitable for efficiently finding
    the nearest neighbour in high dimensional spaces.
  • In very high dimensional spaces, the curse of
    dimensionality causes the algorithm to need to
    visit many more branches than in lower
    dimensional spaces. In particular, when the
    number of points is only slightly higher than the
    number of dimensions, the algorithm is only
    slightly better than a linear search of all of
    the points.
  • The algorithm can be extended. It can provide the
    k-Nearest Neighbors to a point by maintaining k
    current bests instead of just one. A branch is
    only eliminated when it can't have points closer
    than the farthest of the k current bests.
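
One way to keep k current bests is a bounded max-heap, as in the sketch below. The heap is an implementation choice of this example rather than something the slides prescribe, and k_nearest and build_kdtree are hypothetical names.

```python
import heapq
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]

@dataclass
class Node:
    point: Point
    axis: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def build_kdtree(points, depth=0):
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return Node(ordered[mid], axis,
                build_kdtree(ordered[:mid], depth + 1),
                build_kdtree(ordered[mid + 1:], depth + 1))

def k_nearest(root: Optional[Node], query: Point, k: int) -> List[Point]:
    heap: List[Tuple[float, Point]] = []  # (-distance, point): farthest on top

    def visit(node: Optional[Node]) -> None:
        if node is None:
            return
        diff = query[node.axis] - node.point[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        d = math.dist(query, node.point)
        if len(heap) < k:
            heapq.heappush(heap, (-d, node.point))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, node.point))  # evict the farthest best
        # Prune only if the far side cannot beat the farthest current best.
        if len(heap) < k or abs(diff) < -heap[0][0]:
            visit(far)

    visit(root)
    return [p for _, p in sorted(heap, reverse=True)]  # closest first

points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(points)
print(k_nearest(tree, (9, 2), 3))
```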

21
Range search
  • Kd-trees provide a convenient tool for range
    queries in databases with more than one key. The
    search may descend from a node in both directions
    (left and right), but a branch can be skipped
    whenever the key value at that tree level lies
    outside the query range.
  • The kd-tree is one of several data structures
    (alongside R-trees, quad-trees and grid files)
    that support multi-key search.
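
A range query over an axis-parallel box can be sketched as below. The box is given by its lower and upper corners, lo and hi; range_search and build_kdtree are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]

@dataclass
class Node:
    point: Point
    axis: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def build_kdtree(points, depth=0):
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return Node(ordered[mid], axis,
                build_kdtree(ordered[:mid], depth + 1),
                build_kdtree(ordered[mid + 1:], depth + 1))

def range_search(node: Optional[Node], lo: Point, hi: Point,
                 found: Optional[List[Point]] = None) -> List[Point]:
    """Report every point p with lo[i] <= p[i] <= hi[i] on all axes."""
    if found is None:
        found = []
    if node is None:
        return found
    if all(l <= c <= h for l, c, h in zip(lo, node.point, hi)):
        found.append(node.point)
    a = node.axis
    if lo[a] <= node.point[a]:   # the range reaches into the left half-space
        range_search(node.left, lo, hi, found)
    if hi[a] >= node.point[a]:   # the range reaches into the right half-space
        range_search(node.right, lo, hi, found)
    return found

tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(sorted(range_search(tree, (4, 1), (8, 5))))
```

Only one key is compared per level, which is why a query can still visit both children of a node when the range straddles the splitting plane.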

22
Kd tree
http://upload.wikimedia.org/wikipedia/en/9/9c/KDTree-animation.gif
23
Complexity
  • Building a static kd-tree from n points takes
    O(n log^2 n) time if an O(n log n) sort is used
    to compute the median at each level.
  • The complexity is O(n log n) if a linear
    median-finding algorithm such as the one
    described in Cormen et al. is used.
  • Inserting a new point into a balanced kd-tree
    takes O(log n) time.
  • Removing a point from a balanced kd-tree takes
    O(log n) time.
  • Querying an axis-parallel range in a balanced
    kd-tree takes O(n^(1-1/k) + m) time, where m is
    the number of the reported points, and k the
    dimension of the kd-tree.
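
The two construction bounds follow from the usual divide-and-conquer recurrences, assuming each median split sends half of the points to each child. The subscripts below are labels introduced only for this sketch.

```latex
\begin{align*}
T_{\mathrm{sort}}(n)   &= 2\,T_{\mathrm{sort}}(n/2) + O(n \log n)
                       &&\Longrightarrow\quad T_{\mathrm{sort}}(n) = O(n \log^2 n) \\
T_{\mathrm{select}}(n) &= 2\,T_{\mathrm{select}}(n/2) + O(n)
                       &&\Longrightarrow\quad T_{\mathrm{select}}(n) = O(n \log n)
\end{align*}
```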

24
Kd tree of rectangles
  • Instead of points, a kd-tree can also contain
    rectangles.
  • A 2D rectangle is considered a 4D object (xlow,
    xhigh, ylow, yhigh).
  • Thus range search becomes the problem of
    returning all rectangles intersecting the search
    rectangle.
  • The tree is constructed the usual way with all
    the rectangles at the leaves. In an orthogonal
    range search, the opposite coordinate is used
    when comparing against the median. For example,
    if the current level is split along xhigh, we
    check the xlow coordinate of the search
    rectangle. If the median is less than the xlow
    coordinate of the search rectangle, then no
    rectangle in the left branch can ever intersect
    with the search rectangle and so can be pruned.
    Otherwise both branches should be traversed.
  • Note that the interval tree is a 1-dimensional
    special case.
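
The "opposite coordinate" pruning rule can be sketched as follows. For brevity this example stores one rectangle in every node rather than only at the leaves, which is a simplification of the layout described above; RectNode, build, intersects and search are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]   # (xlow, xhigh, ylow, yhigh)
OPPOSITE = {0: 1, 1: 0, 2: 3, 3: 2}         # xlow<->xhigh, ylow<->yhigh

@dataclass
class RectNode:
    rect: Rect
    axis: int                               # which of the 4 coordinates splits
    left: Optional["RectNode"] = None
    right: Optional["RectNode"] = None

def build(rects: List[Rect], depth: int = 0) -> Optional[RectNode]:
    if not rects:
        return None
    axis = depth % 4
    ordered = sorted(rects, key=lambda r: r[axis])
    mid = len(ordered) // 2
    return RectNode(ordered[mid], axis,
                    build(ordered[:mid], depth + 1),
                    build(ordered[mid + 1:], depth + 1))

def intersects(r: Rect, q: Rect) -> bool:
    return r[0] <= q[1] and q[0] <= r[1] and r[2] <= q[3] and q[2] <= r[3]

def search(node: Optional[RectNode], q: Rect, out: List[Rect]) -> List[Rect]:
    if node is None:
        return out
    if intersects(node.rect, q):
        out.append(node.rect)
    split, bound = node.rect[node.axis], q[OPPOSITE[node.axis]]
    if node.axis in (1, 3):        # split on xhigh or yhigh
        if split >= bound:         # else every left "high" is below q's "low"
            search(node.left, q, out)
        search(node.right, q, out)
    else:                          # split on xlow or ylow
        search(node.left, q, out)
        if split <= bound:         # else every right "low" is above q's "high"
            search(node.right, q, out)
    return out

rects = [(0, 2, 0, 2), (1, 3, 1, 3), (5, 6, 5, 6), (4, 8, 0, 1), (7, 9, 7, 9)]
query = (2.5, 5.5, 0.5, 5.5)
hits = search(build(rects), query, [])
print(sorted(hits))
```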

25
Applications
  • Query processing in sensor networks
  • Nearest-neighbor searches
  • Optimization
  • Ray tracing
  • Database search by multiple keys

26
Examples of applications
27
Progressive Meshes
Developed by Hugues Hoppe, Microsoft Research
Inc. Published first in SIGGRAPH 1996.
28
Terrain visualization applications
29
Geometric subdivision
Problems with Geometric Subdivisions
30
ROAM principle
The basic operating principle of ROAM
31
Quad-tree and Bin-tree for ROAM (Real-time
Optimally Adapting Meshes)
32
Review questions
  • Define kd tree
  • What is the difference from B tree? R tree? Quad
    tree? Grid file? Interval tree?
  • Define complexity of basic operations
  • What is the difference between mean and median kd
    tree?
  • List typical queries: nearest-neighbor, k
    nearest neighbors
  • Provide examples of kd tree applications

33
Sources
  • In-line references to current research in the
    area and variety of research papers and web
    sources and applications.

34
Decision Tree
  • A database indexing structure is built for
    decision making and tries to reach the decision
    as fast as possible!

[Figure: decision tree for classifying fruit. Internal nodes test color (green? yellow?), size (big? medium? small?), shape (round?) and taste (sweet?); the leaves are watermelon, apple, grape, grapefruit, lemon, banana and cherry.]
35
Decision Tree
  • How do we obtain a decision tree for a database?
  • Obtain a labeled training data set from the
    database.
  • Calculate the entropy impurity.

c. Classifier is built by
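
The entropy impurity mentioned above is commonly defined as minus the sum of P(c) log2 P(c) over the class labels c present at a node. A minimal sketch under that assumption follows; the function name entropy_impurity and the toy fruit labels are illustrative only.

```python
import math
from collections import Counter
from typing import Hashable, Sequence

def entropy_impurity(labels: Sequence[Hashable]) -> float:
    """Entropy (in bits) of the class labels that reach one tree node."""
    n = len(labels)
    counts = Counter(labels)
    # P(c) = count / n for each class c that actually occurs.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(entropy_impurity(["apple", "apple", "grape", "grape"]))   # evenly mixed
print(entropy_impurity(["lemon", "lemon", "lemon"]))            # pure node
```

A pure node has impurity 0, and a test is usually chosen so that the children are as pure as possible.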
36
KD-tree
  • By treating a query as a decision-making
    procedure, we can use decision trees to build
    more effective database indexes!

[Figure: kd-tree index drawn as a decision tree. The database root node tests "Salary > 75000?", the nodes below it test "Age > 60?", and the yes/no branches lead to the data table.]
37
KD-tree
  • At each internal node, only one attribute is
    used!
  • It is not balanced! Searches that end at
    different nodes may have different I/O costs!
  • It can support multi-attribute database
    indexing, like the R-tree!
  • It integrates decision making and database
    querying!

38
KD-tree
Tree levels: N. Leaf nodes: M. Number of data
entries per leaf node: K. The internal nodes of
the kd-tree at the same level are stored on the
same page.
  • Equality query: N + M
  • Range query: N + M
  • Insert: N + M + 1
  • Delete: N + M + 1

39
Storage Management for High-Dimensional Indexing
Structures
We want to put the similar data in the same page
or neighboring pages!
[Figure: clustered vs. unclustered index. In each case, index entries in the index file direct the search for data entries, which lead to the data records in the data file.]
40
Storage Management for High-Dimensional Indexing
Structures
It is very hard to sort multi-dimensional data!
A Hilbert curve maps multi-dimensional data onto
one dimension.
[Figure: Hilbert curve on a grid with axis labels 00, 01, 10, 11]
41
Storage Management for High-Dimensional Indexing
Structures
[Figure: 4 x 4 grid with cells numbered 0 to 15 along the Hilbert curve]
From multi-dimensional indexing to
one-dimensional storage on disk!
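
The mapping from grid cells to positions along a Hilbert curve can be sketched with the widely used bit-manipulation algorithm below. The function name hilbert_index is hypothetical, the grid side n is assumed to be a power of two, and the curve orientation chosen here need not match the numbering in the slide's figure.

```python
def hilbert_index(n: int, x: int, y: int) -> int:
    """Position of cell (x, y) along a Hilbert curve filling an n x n grid."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)   # which quadrant, in curve order
        if ry == 0:                    # rotate/flip so the sub-curve lines up
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

# Sort 2-D keys by their 1-D Hilbert position before writing them to pages.
cells = [(x, y) for x in range(4) for y in range(4)]
cells.sort(key=lambda c: hilbert_index(4, *c))
print(cells)
```

Cells that are consecutive in this order are always neighbors in the grid, which is why similar data tends to land in the same or neighboring pages.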