Implementation on Cluster Feature Tree - PowerPoint PPT Presentation

About This Presentation
Title:

Implementation on Cluster Feature Tree

Description:

Implementation on Cluster Feature Tree. MAIDS group, NCSA and Dept. of CS, University of Illinois at Urbana-Champaign. www.maids.ncsa.uiuc.edu. 11/4/09 ...

Provided by: jiaw186

Transcript and Presenter's Notes


1
Implementation on Cluster Feature Tree
  • MAIDS group
  • NCSA and Dept. of CS
  • University of Illinois at Urbana-Champaign
  • www.maids.ncsa.uiuc.edu

2
CF Tree For Cluster Analysis
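[Figure: a CF tree. The root and non-leaf nodes hold CF entries (labeled CF1, CF2, CF3, CF5 in the diagram), each with a child pointer (child1, child2, child3, child5). Leaf nodes hold CF entries (CF1, CF2, CF6 in one leaf; CF1, CF2, CF4 in another) and are linked by prev and next pointers.]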
3
Cluster Feature Tree
  • Built dynamically as new data are inserted
  • Used to guide a new insertion into the correct
    cluster
  • A height-balanced tree
  • Has three parameters: branching factor B for
    non-leaf nodes, branching factor L for leaf nodes,
    and threshold T for the diameter (or radius)
  • Each non-leaf node contains at most B entries
  • Each leaf node contains at most L entries
  • Each leaf node has prev and next pointers to
    chain all leaf nodes together
  • The diameter (or radius) of every entry in a leaf
    node must be less than T

4
Non-leaf Node in CF Tree
  • Each entry has a CF vector to store summary
    information
  • The CF vector is defined as a triple CF = (N, LS, SS)
      • N: number of data points
      • LS = $\sum_{i=1}^{N} X_i$ (linear sum of the points)
      • SS = $\sum_{i=1}^{N} X_i^2$ (square sum of the points)
  • The CF vector of an entry is the sum of the CF
    vectors of its children, and is used for computing
    the centroid, radius, diameter, and distance.
  • Each entry has a pointer to its child node.

[Figure: a non-leaf entry, consisting of (N, LS, SS) and a child pointer.]
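
To make the role of the triple concrete, here is a minimal Python sketch of a CF vector. The class name `CF` and its methods are illustrative and do not come from the slides. It also assumes that SS is stored as a single scalar (the sum of squared norms); an implementation could equally keep one square sum per dimension. The radius and diameter use the usual root-mean-square definitions, which can be computed from (N, LS, SS) alone.

```python
from dataclasses import dataclass
import math


@dataclass
class CF:
    """Cluster feature (N, LS, SS) for points of a fixed dimension."""
    n: int       # number of data points
    ls: list     # linear sum, one component per dimension
    ss: float    # sum of squared norms (assumed scalar in this sketch)

    @classmethod
    def from_point(cls, x):
        return cls(1, list(x), sum(v * v for v in x))

    def __add__(self, other):
        # Additivity: the CF of a union is the component-wise sum.
        return CF(self.n + other.n,
                  [a + b for a, b in zip(self.ls, other.ls)],
                  self.ss + other.ss)

    def centroid(self):
        return [v / self.n for v in self.ls]

    def radius(self):
        # Root-mean-square distance from the points to the centroid.
        c2 = sum(v * v for v in self.centroid())
        return math.sqrt(max(self.ss / self.n - c2, 0.0))

    def diameter(self):
        # Root-mean-square pairwise distance between the points.
        if self.n < 2:
            return 0.0
        ls2 = sum(v * v for v in self.ls)
        d2 = (2 * self.n * self.ss - 2 * ls2) / (self.n * (self.n - 1))
        return math.sqrt(max(d2, 0.0))


# A parent entry's CF is the sum of its children's CFs.
cf = CF.from_point([0.0, 0.0]) + CF.from_point([2.0, 0.0])
print(cf.centroid(), cf.radius(), cf.diameter())
```

Because the triple is additive, a non-leaf entry never needs the raw points: summing the CF vectors of its children is enough.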
5
Leaf Node in CF Tree
  • Each entry of the leaf node is a micro-cluster
  • Data structure in each entry of the leaf node:
    N, LS, SS (sum of all time slots in the tilted
    time window), together with the tilted time window
  • Data structure in each time slot of the tilted
    time window: N, LS, SS
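
The leaf-entry layout can be sketched as follows. The slides do not say how the tilted time window assigns its slots, so this sketch simply keys slots by an arbitrary label; the class name `MicroCluster`, its fields, and the slot labels are all hypothetical.

```python
from collections import defaultdict


class MicroCluster:
    """Leaf entry: a total (N, LS, SS) plus one (N, LS, SS) per time slot."""

    def __init__(self, dim):
        self.dim = dim
        self.total = [0, [0.0] * dim, 0.0]
        # slot label -> [N, LS, SS]; the labels stand in for the slots of
        # the tilted time window, whose exact layout is not modeled here.
        self.slots = defaultdict(lambda: [0, [0.0] * dim, 0.0])

    def add(self, point, slot):
        """Record a point in the given time slot and in the running total."""
        sq = sum(v * v for v in point)
        for cf in (self.slots[slot], self.total):
            cf[0] += 1
            cf[1] = [a + b for a, b in zip(cf[1], point)]
            cf[2] += sq


mc = MicroCluster(dim=2)
mc.add([1.0, 2.0], slot="08:00")
mc.add([3.0, 4.0], slot="09:00")
print(mc.total)
```

Keeping the total alongside the per-slot triples means insertion can test the diameter without re-summing the slots each time.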
6
Insert into a CF Tree
  • Start from the root and recursively descend the CF
    tree by choosing the closest child node.
  • Modify the leaf
      • If the diameter < threshold T, the data point is
        absorbed by the leaf.
      • If the diameter > threshold T, add a new entry to
        the leaf.
      • Split the leaf if the number of entries in the
        leaf > L.
  • Modify the path to the leaf
      • Update the CF info for each non-leaf entry on
        the path.
      • A non-leaf node may split as well.
      • If the root splits, the tree height increases by
        one.
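
The leaf-modification step above can be sketched in a few lines of Python. This is a simplified illustration, not the presenters' implementation: it handles a single leaf, represents each entry as a plain `(N, LS, SS)` tuple with a scalar SS, tests the diameter of the closest entry after tentatively adding the point, and only reports that a split is needed instead of performing it. The function names are hypothetical.

```python
import math


def merged(cf, x):
    """Return the (N, LS, SS) triple obtained by adding point x to cf."""
    n, ls, ss = cf
    return (n + 1, [a + b for a, b in zip(ls, x)], ss + sum(v * v for v in x))


def diameter(cf):
    """Root-mean-square pairwise distance, computed from the triple alone."""
    n, ls, ss = cf
    if n < 2:
        return 0.0
    ls2 = sum(v * v for v in ls)
    return math.sqrt(max((2 * n * ss - 2 * ls2) / (n * (n - 1)), 0.0))


def insert_into_leaf(entries, x, threshold, max_entries):
    """Absorb x into the closest entry, or add a new entry for it.

    Returns True when the leaf now holds more than max_entries entries,
    i.e. when the caller would have to split the leaf.
    """
    if entries:
        def dist(cf):
            n, ls, _ = cf
            return math.dist([v / n for v in ls], x)

        i = min(range(len(entries)), key=lambda k: dist(entries[k]))
        candidate = merged(entries[i], x)
        if diameter(candidate) < threshold:
            entries[i] = candidate      # absorbed by the closest entry
            return False
    entries.append((1, list(x), sum(v * v for v in x)))
    return len(entries) > max_entries


leaf = []
for p in ([0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [20.0, 0.0]):
    needs_split = insert_into_leaf(leaf, p, threshold=1.5, max_entries=2)
    print(p, len(leaf), needs_split)
```

In a full tree the same additive update would then be applied to every non-leaf entry on the path back to the root.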

7
Delete or Merge Micro-Cluster
  • May need to delete or merge micro-clusters to
    free memory space for new micro-clusters.
  • Delete any of the current micro-clusters as
    outliers if it is safe to do so.
  • Merge clusters if no clusters can be deleted.
  • The path to the leaf must be updated after a
    deletion or a merge.
  • Never split a micro-cluster because there is no
    tracking information for individual data point.
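
Merging is cheap because CF vectors are additive. The sketch below merges the two micro-clusters whose centroids are nearest; the slides do not state which pair is chosen or when a deletion counts as safe, so the nearest-centroid rule and the function names are assumptions, and deletion is left out.

```python
import math
from itertools import combinations


def centroid(cf):
    n, ls, _ = cf
    return [v / n for v in ls]


def merge_closest_pair(clusters):
    """Merge the two (N, LS, SS) triples with the nearest centroids, in place."""
    if len(clusters) < 2:
        return clusters
    i, j = min(
        combinations(range(len(clusters)), 2),
        key=lambda p: math.dist(centroid(clusters[p[0]]), centroid(clusters[p[1]])),
    )
    (n1, ls1, ss1), (n2, ls2, ss2) = clusters[i], clusters[j]
    clusters[i] = (n1 + n2, [a + b for a, b in zip(ls1, ls2)], ss1 + ss2)
    del clusters[j]  # j > i, so index i is unaffected
    return clusters


clusters = [(1, [0.0, 0.0], 0.0), (1, [1.0, 0.0], 1.0), (3, [30.0, 0.0], 300.0)]
print(merge_closest_pair(clusters))
```

The reverse operation is not available: once two triples are summed, nothing records which points came from which cluster, which is why a micro-cluster is never split.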

8
Macro-Cluster Creation
  • Modified k-means algorithm
  • Initialization stage
      • Seeds are sampled with probability proportional
        to the number of points in a given micro-cluster
  • Partitioning stage
      • The distance from a seed to a micro-cluster is
        computed to the centroid of that micro-cluster
  • Seed adjustment stage
      • The new seed is defined as the weighted centroid of
        the micro-clusters in that partition
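
The three stages can be sketched as follows, with micro-clusters given as `(N, LS, SS)` tuples. This is an illustration under assumptions rather than the presenters' code: the function name, the fixed iteration count, and the use of `random.choices` (which samples with replacement, so two seeds may coincide) are all choices made for the sketch.

```python
import math
import random


def macro_cluster(micro, k, iters=10, seed=0):
    """Group micro-clusters, given as (N, LS, SS) triples, into k partitions.

    Returns (seeds, labels), where labels[i] is the partition of micro[i].
    """
    rng = random.Random(seed)
    weights = [n for n, _, _ in micro]
    cents = [[v / n for v in ls] for n, ls, _ in micro]

    # Initialization: sample seeds with probability proportional to N.
    seeds = [list(c) for c in rng.choices(cents, weights=weights, k=k)]

    labels = [0] * len(micro)
    for _ in range(iters):
        # Partitioning: assign each micro-cluster to the seed nearest
        # to its centroid.
        labels = [min(range(k), key=lambda s: math.dist(seeds[s], c))
                  for c in cents]
        # Seed adjustment: N-weighted centroid of each partition,
        # which equals (sum of LS) / (sum of N).
        for s in range(k):
            members = [i for i, lab in enumerate(labels) if lab == s]
            if not members:
                continue  # an empty partition keeps its previous seed
            total = sum(weights[i] for i in members)
            dim = len(seeds[s])
            seeds[s] = [sum(micro[i][1][d] for i in members) / total
                        for d in range(dim)]
    return seeds, labels


micro = [(1, [0.0, 0.0], 0.0), (3, [12.0, 0.0], 48.0)]
print(macro_cluster(micro, k=1))
```

Weighting by N makes each micro-cluster count as many times as the points it summarizes, so the seeds approximate what k-means on the raw points would produce.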

9
Usage of CF Tree
  • Create macro-clusters for a user-requested time
    horizon using the modified k-means algorithm
      • Find clusters of the network flow between 8 AM and
        10 AM
          • Retrieve all CF vectors of leaf nodes for the
            time slots from 8 AM to 10 AM
          • Compute macro-clusters
  • Evolution analysis of clusters
      • Find changes of the network flow from yesterday
        to today
          • Retrieve all CF vectors of leaf nodes for
            yesterday and today
          • Compute macro-clusters for yesterday and today
          • Compare the two sets of macro-clusters
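
The retrieval step for a time horizon can be sketched as follows. The sketch assumes each leaf entry exposes its time slots as a dictionary from an integer hour to an `(N, LS, SS)` tuple; that layout, the function name `horizon_cf`, and the sample data are hypothetical, and a real tilted time window would use slots of varying granularity.

```python
def horizon_cf(slots, start, end):
    """Sum the (N, LS, SS) triples of the slots with start <= hour < end.

    `slots` maps an integer hour to an (N, LS, SS) triple. Returns None
    when the micro-cluster received no points in the horizon.
    """
    picked = [cf for hour, cf in slots.items() if start <= hour < end]
    if not picked:
        return None
    n = sum(cf[0] for cf in picked)
    ls = [sum(vals) for vals in zip(*(cf[1] for cf in picked))]
    ss = sum(cf[2] for cf in picked)
    return (n, ls, ss)


# Two hypothetical leaf entries with per-hour summaries.
leaf_entries = [
    {8: (2, [2.0, 4.0], 10.0), 9: (1, [1.0, 1.0], 2.0), 11: (5, [5.0, 5.0], 10.0)},
    {7: (3, [30.0, 30.0], 600.0)},
]

morning = [horizon_cf(entry, 8, 10) for entry in leaf_entries]
morning = [cf for cf in morning if cf is not None]
print(morning)
```

The resulting list of triples is what the macro-clustering step would take as input. For evolution analysis, the same retrieval would be run once per period and the two sets of macro-clusters compared.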