Tracking High Quality Clusters over Uncertain Data Streams

1
Tracking High Quality Clusters over Uncertain
Data Streams
  • Chen Zhang, Ming Gao and Aoying Zhou
  • ICDE 2009

2
Outline
  • Introduction
  • Uncertain Stream Model
  • Definition
  • Clustering Algorithm
  • Experiment
  • Conclusion
  • My thought

3
Introduction
  • Goal of clustering
  • To find clusters in a dataset.
  • Clustering over uncertain data
  • Data quality should be considered
  • For example

4
Uncertain Stream Model
  • Probability density function (pdf)
  • Denoted as $(D_t, f_t)$
  • Discrete probability
  • Used to approximate the pdf
  • Denoted as $(x_{t,1}, 0.2), (x_{t,2}, 0.3), \ldots, (x_{t,k}, 0.35)$
  • Standard error/deviation
  • Denoted as $x_t \pm r$
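
As a rough illustration of the discrete form above, the following Python sketch stores one uncertain tuple as a list of (value, probability) instances. The class name `DiscreteUncertainTuple`, its fields, and the sample values are assumptions made for this sketch; they do not come from the paper.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscreteUncertainTuple:
    """Hypothetical container for one uncertain stream tuple at time t."""
    timestamp: int
    instances: tuple  # ((value, probability), ...), illustrative layout

    def __post_init__(self):
        # The instance probabilities must form a valid discrete distribution.
        probs = [p for _, p in self.instances]
        if any(p < 0 for p in probs) or not math.isclose(sum(probs), 1.0):
            raise ValueError("instance probabilities must be >= 0 and sum to 1")

    def expected_value(self):
        # Probability-weighted mean of the possible instance values.
        return sum(v * p for v, p in self.instances)


# Made-up example: three possible readings for the tuple arriving at t = 1.
reading = DiscreteUncertainTuple(
    timestamp=1,
    instances=((1.0, 0.2), (2.0, 0.3), (3.0, 0.5)),
)
```

Validating the distribution at construction time keeps later uncertainty computations from silently working on malformed input.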

5
Definition
  • Definition 1 (instance uncertainty)
  • $U(x_{t,i}) = -\log_2 p_t(x_{t,i})$
  • Definition 2 (tuple uncertainty)
  • For example
  • $U(x_a) = -6 \cdot \frac{1}{6} \log_2 \frac{1}{6} \approx 2.585$
  • $U(x_b) = -2 \cdot \frac{1}{2} \log_2 \frac{1}{2} = 1$
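
The two example values can be checked with a short Python sketch. It assumes that tuple uncertainty is the probability-weighted sum of the instance uncertainties (that is, Shannon entropy in bits), which is consistent with both examples but is not stated explicitly in the transcript. The function names are illustrative.

```python
import math


def instance_uncertainty(p):
    """Definition 1: uncertainty of one instance with probability p, in bits."""
    return -math.log2(p)


def tuple_uncertainty(probs):
    """Assumed form of Definition 2: expected instance uncertainty (entropy)."""
    return sum(p * instance_uncertainty(p) for p in probs if p > 0)


u_a = tuple_uncertainty([1 / 6] * 6)  # six equally likely instances
u_b = tuple_uncertainty([1 / 2] * 2)  # two equally likely instances
print(round(u_a, 3), round(u_b, 3))   # 2.585 1.0
```

Under this reading, a tuple with more equally likely instances is more uncertain, which matches the ordering of the two examples.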

6
Clustering Algorithm

7
Implementation
  • Two-phase method
  • Reduces computation
  • First phase (k-NN)
  • Select the top k clusters
  • Second phase
  • $\Delta U(C) = U(C) - U(C \cup \{x_t, u_t\})$
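
The two phases can be sketched in Python as follows. This is only a sketch under several assumptions that the transcript does not confirm: points are one-dimensional, the cluster uncertainty U(C) is taken to be the mean tuple uncertainty of the members, and the incoming tuple goes to the candidate with the largest uncertainty change. The names `Cluster` and `assign` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    centroid: float
    uncertainties: list = field(default_factory=list)  # member tuple uncertainties

    def uncertainty(self, extra=None):
        # Assumed U(C): mean tuple uncertainty of the members (not from the paper).
        values = self.uncertainties + ([extra] if extra is not None else [])
        return sum(values) / len(values) if values else 0.0


def assign(x, u, clusters, k=2):
    """Assign a tuple with value x and uncertainty u to one of the clusters."""
    # Phase 1 (k-NN): keep only the k clusters whose centroids are nearest to x.
    candidates = sorted(clusters, key=lambda c: abs(c.centroid - x))[:k]
    # Phase 2: evaluate the uncertainty change only on those candidates and
    # pick the largest (assumed selection rule).
    best = max(candidates, key=lambda c: c.uncertainty() - c.uncertainty(u))
    best.uncertainties.append(u)
    return best


clusters = [
    Cluster(0.0, [2.0, 2.0]),
    Cluster(1.0, [0.5, 0.5]),
    Cluster(10.0, [3.0]),
]
chosen = assign(x=0.6, u=1.0, clusters=clusters, k=2)
```

In this made-up run the distant cluster at 10.0 is never scored, which is where the reduction in computation comes from: the uncertainty change is computed for k clusters rather than all of them.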

8
Experiment
  • Validation Measures
  • UM

9
(No Transcript)

10
Conclusion
  • Main contributions of this paper
  • Proposes a two-phase clustering method over uncertain
    data streams
  • Quantifies tuple uncertainty

11
My thought
  • The idea is very simple.
  • How could it be implemented under a sliding window model?