Clustering I - PowerPoint PPT Presentation

About This Presentation
Title:

Clustering I

Description:

Title: Steven F. Ashby Center for Applied Scientific Computing Month DD, 1997 Author: Computations Last modified by: ii Created Date: 3/18/1998 1:44:31 PM – PowerPoint PPT presentation

Slides: 143
Provided by: Computations

Transcript and Presenter's Notes



1
Clustering I
Introduction, the k-means algorithm, distances, hierarchical clustering
Most of the slides are from P.-N. Tan, M. Steinbach, V. Kumar, Introduction to Data Mining, Addison Wesley, 2006
2
What is clustering?
Finding clusters (groups) of objects such that the objects in each cluster are similar (or related) to one another and different from (or unrelated to) the objects in other clusters.
3-dimensional points, Euclidean distance
3
Applications
  • Grouping genes and proteins that have the same
    functionality,
  • images,
  • disease characteristics,
  • stocks with similar price fluctuations
  • Grouping weblogs to find similar access
    patterns
  • Grouping related documents for browsing
  • Grouping texts
  • Customers with similar behavior

Clustering of rainfall levels (precipitation)
in Australia!
4
Applications
  • Understanding: as a stand-alone application/tool
  • Visualization, insights into the data distribution
  • Preprocessing step
  • Summarization: reducing the size of large data
    sets by using representative points from each
    cluster, called prototypes
  • Compression
  • Efficient index construction, finding nearest
    neighbors, etc.

5
Ambiguity
6
Introduction
When is a clustering good?
  • A clustering method is good if it produces
    clusters of good quality:
    • high similarity within each cluster and
    • low similarity between clusters
  • The quality depends on:
    • the similarity measure and
    • the method used to implement the clustering

7
Types of clustering
A clustering is a set of clusters. An important distinction is between hierarchical and partitional sets of clusters.
Partitional clustering: a division of the data objects into non-overlapping subsets (clusters) such that each object belongs to exactly one subset.
Hierarchical clustering: a set of nested clusters; a cluster may have subclusters, and the clusters are organized as a hierarchical tree.
8
Partitional and hierarchical clustering
9
Partitional and hierarchical clustering
Partitional clustering
(Figure: points p1, p2, p3, p4)
Traditional dendrogram
Hierarchical clustering
  • Leaves: single points or singleton clusters
  • Cutting the tree at some level gives the
    corresponding partitional clustering

10
Types of clusters: Well-separated clusters
A cluster is a set of points such that any point in a cluster is closer to (or more similar to) every other point in the cluster than to any point not in the cluster.
3 well-separated clusters
Usually involves the notion of a threshold. Not necessarily circular (any shape).
11
Types of clusters: Center-based (prototype-based) clusters
  • A cluster is a set of objects such that an object
    in a cluster is closer to (or more similar to) the
    center or prototype of its cluster than to the
    center of any other cluster.
  • The center of a cluster is usually
    • a centroid, the mean of the points in the
      cluster, or
    • a medoid, the most representative point of the
      cluster (e.g., when the attributes are categorical)

They tend to be circular (globular)
4 center-based clusters
12
Types of clusters: Contiguous clusters
Contiguous cluster (nearest neighbor or transitive). Neighborhood-based: a cluster is a set of points such that each point is closer to one or more points of its cluster than to any point outside the cluster.
Useful for clusters of irregular or intertwined shapes, or when we have graphs and want to find connected subgraphs.
Problem with noise
8 contiguous clusters
13
Types of clusters: Density-based clusters
A cluster is a dense region of points that is separated from other high-density regions by regions of low density.
Useful for clusters of irregular or intertwined shapes, or when noise or outliers are present.
6 density-based clusters
14
Types of clusters: Conceptual clustering
Clusters that share some common property or represent a particular concept.
2 overlapping circles
15
Types of clusters: Clusters defined by an objective function
  • Find clusters that minimize or maximize an
    objective function
  • Enumerating all possible ways of dividing the
    points into clusters and evaluating the goodness
    of each potential set of clusters with the given
    objective function is NP-hard
  • Objectives can be global or local
    • Hierarchical algorithms: local
    • Partitional algorithms: global

16
General requirements
Handling noise and outliers
(Figure: a cluster and outliers)
Outlier (anomalous point): a value that is exceptional compared with the usual or expected values
17
Other distinctions between sets of clusters
Overlapping or not: a point may belong to more than one cluster (e.g., border points).
Fuzzy clustering: a point belongs to every cluster with some weight between 0 and 1; usually the weights of each point sum to 1. Probabilistic clustering has similar characteristics.
Partial vs. complete: in some cases we want to cluster only some of the data (e.g., leaving out noise or an uninteresting background).
Heterogeneous vs. homogeneous: clusters of very different sizes, shapes and densities.
18
Clustering algorithms
Among others, we will look at:
  • K-means and its variants
  • Hierarchical clustering
  • Density-based clustering (DBSCAN)
  • BIRCH (data on disk!)

19
K-means
20
K-means: General
Partitional algorithm (prototype-based). Each cluster is associated with a center point (centroid). Each point is assigned to the cluster with the closest center. The number of clusters, K, is an input to the algorithm.
21
K-means: Basic algorithm
Basic algorithm
22
K-means: Basic algorithm
Example: 2 4 10 12 3 20 30 11 15
Let k = 2, and initially we choose 3 and 4
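The basic algorithm on this example can be sketched as follows: a minimal sketch assuming 1-D numeric data, absolute difference as the distance and the slide's initial centers 3 and 4. The function name `kmeans_1d` is hypothetical; ties go to the lower-index center, and an empty cluster keeps its previous center.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Basic k-means on 1-D data, starting from the given centers."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centroids]
        for x in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
            clusters[nearest].append(x)
        # Update step: each center becomes the mean of its cluster
        # (an empty cluster keeps its old center in this sketch).
        new_centroids = [sum(c) / len(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:  # stop when the centers no longer change
            break
        centroids = new_centroids
    return centroids, clusters

data = [2, 4, 10, 12, 3, 20, 30, 11, 15]
centroids, clusters = kmeans_1d(data, [3, 4])
print(clusters)   # [[2, 4, 3], [10, 12, 20, 30, 11, 15]]
print(centroids)
```

With these starting centers the loop stops after three passes, with clusters {2, 3, 4} and {10, 11, 12, 15, 20, 30}; other initial centers may give different clusters.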
23
K-means: Basic algorithm
Remarks
  • 1. The initial centers are usually chosen
    randomly
  • The clusters produced vary from one run of the
    algorithm to another

24
K-means: Basic algorithm
Remarks (continued)
  • 2. The closeness of points is measured by some
    distance that depends on the type of the points;
    in the examples we will use the Euclidean
    distance
  • Since the distance is computed often, its
    computation must be relatively simple
  • 3. The center is (usually) the mean of the points
    in the cluster (which need not be one of the
    input data points)

25
K-means: Basic algorithm
Initial state, 3 clusters, initial points
k1, k2, k3
26
K-means: Basic algorithm
The points are assigned to the closest of the 3
initial points
27
K-means: Basic algorithm
Recomputing the center (center of gravity) of
each cluster
28
K-means: Basic algorithm
Reassigning the points to the new centers of gravity
29
K-means: Basic algorithm
Nothing changes -> END
30
K-means: Basic algorithm
Remarks (continued)
  • We store only the centers
  • The complexity is O(I · n · K · d), where
    • n = number of points,
    • K = number of clusters,
    • I = number of iterations,
    • d = number of attributes (dimensions)

31
K-means: Basic algorithm
Remarks (continued)
  • For common similarity measures, the algorithm
    converges
  • Convergence usually happens in the first few
    iterations
  • Often the stopping condition becomes "until":
    • relatively few points change cluster, or
    • the distance between the new centers and the
      old ones is small

32
K-means: Evaluating quality
Essentially, the algorithm repeatedly tries to reduce the distance of each point from the center of its cluster.
  • The most common measure is the Sum of Squared
    Error (SSE)
  • For each point, the error is its distance from
    the nearest cluster center
  • To get the SSE, we square these errors and sum
    them:
    $SSE = \sum_{i=1}^{K} \sum_{x \in C_i} dist(m_i, x)^2$
  • where dist is the Euclidean distance, x is a point
    in cluster C_i and m_i is the representative
    (center) of cluster C_i
  • We can show that the point that minimizes the SSE
    of a cluster is its mean:
    $m_i = \frac{1}{|C_i|} \sum_{x \in C_i} x$
  • Given two clusterings, we can choose the one with
    the smaller error
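A small sketch of the SSE computation, assuming points are tuples of numbers and the Euclidean distance; `sq_dist`, `sse` and `mean_point` are illustrative helper names. It also checks, on one toy cluster, that the mean gives a lower SSE than another candidate center.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sse(clusters, centers):
    """Sum over clusters of the squared distances of their points to the center."""
    return sum(sq_dist(x, c) for points, c in zip(clusters, centers) for x in points)

def mean_point(points):
    """Coordinate-wise mean (centroid) of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

cluster = [(1.0, 1.0), (2.0, 1.0), (3.0, 4.0)]
centroid = mean_point(cluster)            # (2.0, 2.0)
print(sse([cluster], [centroid]))         # 8.0
# A different representative, e.g. the data point (2.0, 1.0), gives a larger SSE.
print(sse([cluster], [(2.0, 1.0)]))       # 11.0
```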

33
K-means: Evaluating quality
One way to improve a clustering (reduce the SSE) is to increase K. However, a good clustering with a small K may have a lower SSE than a bad clustering with a larger K.
34
K-means: Basic algorithm
  • The result depends on the choice of the initial
    points

35
K-means: Example
Original points
36
K-means: Choosing the initial points
37
K-means: Choosing the initial points
38
K-means: Choosing the initial points
39
K-means: Choosing the initial points
40
Example with 10 clusters
Starting with two initial points in one cluster of each
pair of clusters
41
Example with 10 clusters
Starting with two initial points in one cluster of each
pair of clusters
42
Example with 10 clusters
Starting with some pairs of clusters having three
initial centers, while others have only one
43
Example with 10 clusters
Starting with some pairs of clusters having three
initial centers, while others have only one
44
K-means: Choosing the initial points
If there are K "real" clusters, the probability of choosing one center from each cluster is small, especially when K is large. If all clusters have the same size n, then
$P = \frac{K!\, n^K}{(Kn)^K} = \frac{K!}{K^K}$
For example, if K = 10, the probability is 10!/10^10 ≈ 0.00036.
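The probability above can be computed directly; this sketch assumes K equal-size clusters and K centers drawn at random as in the formula (`prob_one_centroid_per_cluster` is a hypothetical name).

```python
from math import factorial

def prob_one_centroid_per_cluster(k):
    """P(k random initial centers fall one in each of k equal-size clusters) = k!/k^k."""
    return factorial(k) / k ** k

print(prob_one_centroid_per_cluster(10))   # about 0.00036
```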
45
K-means: Solutions to the initial-point problem
  • Multiple runs
    • Helps, but not in all cases
  • Sampling and use of some hierarchical
    technique
  • Select more than k initial points and then choose
    k of them as the initial centers (e.g., the most
    widely separated)
  • Incremental selection
    • Choose the first point at random or as the mean
      of all points
    • For each subsequent initial point, choose the
      one that is farthest from the initial points
      chosen so far
    • May lead to choosing outliers
    • Computing the farthest point is expensive
    • Usually applied to samples
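A sketch of the incremental (farthest-first) selection of initial points, assuming tuple points and the Euclidean distance via `math.dist`; the function name and the `first` parameter (which can be set to the mean of all points) are assumptions for illustration. In the toy data the isolated point (5, 20) is picked second, which shows the risk of choosing outliers.

```python
import math
import random

def farthest_first_init(points, k, first=None, seed=0):
    """Pick k initial centers: the first at random (or as given), then each next
    one as the point farthest from all centers chosen so far."""
    rng = random.Random(seed)
    centers = [first if first is not None else rng.choice(points)]
    while len(centers) < k:
        # For each point, its distance to the nearest already-chosen center.
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(nxt)
    return centers

pts = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 20)]
print(farthest_first_init(pts, 3, first=(0, 0)))   # [(0, 0), (5, 20), (10, 1)]
```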

46
K-means: Empty clusters
The basic algorithm may produce empty clusters. Several strategies exist:
  • Choose the point that is farthest from all current centers
  • Choose the point that contributes most to the SSE
  • Choose a point from the cluster with the highest SSE; this splits that cluster and reduces the error
If there are several empty clusters, the above steps can be repeated several times.
47
K-means: Incremental update of the centers
  • In basic K-means, the centers are updated after
    all points have been assigned to a center
  • An alternative is to update the centers after
    each assignment (incremental approach)
    • Each assignment updates 0 or 2 centers
    • More expensive
    • The order in which points are inserted/examined
      matters
    • Empty clusters never occur
    • Weights can be used; with an arbitrary objective
      function we can check at each step which choice
      is most beneficial
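With incremental updates, moving one point changes at most two centers, and each can be updated in O(d) time from its current size instead of being recomputed from scratch. A minimal sketch with hypothetical helper names `add_point` and `remove_point`, assuming tuple centers and that a cluster keeps at least one point after a removal.

```python
def add_point(center, size, x):
    """Center and size after adding point x to a cluster (running mean)."""
    new_size = size + 1
    new_center = tuple(c + (xi - c) / new_size for c, xi in zip(center, x))
    return new_center, new_size

def remove_point(center, size, x):
    """Center and size after removing point x from a cluster (requires size >= 2)."""
    new_size = size - 1
    new_center = tuple(c + (c - xi) / new_size for c, xi in zip(center, x))
    return new_center, new_size

# Move (4, 0) from cluster A = {(0,0), (2,0), (4,0)} to cluster B = {(10,0)}:
center_a, size_a = remove_point((2.0, 0.0), 3, (4.0, 0.0))   # ((1.0, 0.0), 2)
center_b, size_b = add_point((10.0, 0.0), 1, (4.0, 0.0))     # ((7.0, 0.0), 2)
```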

48
Pre- and post-processing
Total SSE and per-cluster SSE.
Pre-processing
  • Normalize the data
  • Remove outliers
Post-processing: split-merge (keeping the same K)
  • Split a cluster with relatively large SSE by creating a new cluster, e.g., choosing the point that is farthest from all centers, or a random point, or the point with the largest SSE
  • Merge clusters that are relatively close (their centers have the smallest distance), or the two clusters whose merge causes the smallest increase in SSE
  • Delete a cluster and reassign its points to other clusters (to the one causing the smallest increase in SSE)
49
Bisecting K-means
A variant that can produce a partitional or a hierarchical clustering
50
Bisecting K-means
  • Which cluster to split:
    • the largest one
    • the one with the largest SSE
    • a combination of the above
  • It can also be used for hierarchical clustering
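A compact sketch of bisecting K-means, assuming tuple points, squared Euclidean distance, the "largest SSE" rule for choosing which cluster to split, and a fixed number of 2-means trials per split (keeping the one with the lowest SSE). All names (`two_means`, `bisecting_kmeans`, `trials`) are illustrative, not taken from a library.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    return tuple(sum(c) / len(points) for c in zip(*points))

def sse(points):
    """Sum of squared distances of the points to their centroid."""
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)

def two_means(points, rng, iters=20):
    """Split points in two with basic k-means (k = 2), seeded with 2 random points."""
    centers = rng.sample(points, 2)
    groups = ([], [])
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            closer_to_first = sq_dist(p, centers[0]) <= sq_dist(p, centers[1])
            groups[0 if closer_to_first else 1].append(p)
        if not groups[0] or not groups[1]:
            return None                      # degenerate split; try another seed
        centers = [centroid(groups[0]), centroid(groups[1])]
    return groups

def bisecting_kmeans(points, k, trials=5, seed=0):
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        # Split the cluster with the largest SSE (one of the criteria listed above).
        idx = max(range(len(clusters)), key=lambda i: sse(clusters[i]))
        target = clusters[idx]
        if len(target) < 2:
            break
        best = None
        for _ in range(trials):              # keep the split with the lowest total SSE
            split = two_means(target, rng)
            if split and (best is None or
                          sse(split[0]) + sse(split[1]) < sse(best[0]) + sse(best[1])):
                best = split
        if best is None:
            break
        clusters.pop(idx)
        clusters.extend(best)
    return clusters

A = [(0,), (1,)]
B = [(10,), (11,)]
C = [(30,), (31,)]
result = bisecting_kmeans(A + B + C, 3)
print(sorted(sorted(c) for c in result))
```

Recording the sequence of splits, instead of only the final list, would give the hierarchical version mentioned above.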

51
Bisecting K-means
52
K-means: Limitations
K-means has problems when the clusters have different sizes, different densities, or non-globular shapes. It also has problems when the data contain outliers.
53
K-means limitations: Different sizes
K-means (3 clusters)
Original points
It cannot find the large cluster, because it is much larger than the others
54
K-means limitations: Different densities
K-means (3 clusters)
Original points
It cannot separate the two small clusters, because they are much denser than the large one
55
K-means limitations: Non-globular shapes
Original points
K-means (2 clusters)
It cannot find the two clusters, because they have non-globular shapes
56
K-means limitations
Original points, K-means clusters
One solution is to use many clusters: K-means then finds parts of the clusters, but these parts must be combined
57
K-means limitations
Original points, K-means clusters
58
K-means limitations: Different sizes
Original points, K-means clusters
59
K-medoid
For a general d-dimensional space, find a representative point among the data that minimizes the total distance of the cluster's points from it. Medoid: the most central point of the cluster (used instead of the mean). Reduces sensitivity to outliers. Can be applied to data of any type (e.g., also to categorical data).
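A medoid can be found by brute force as the data point with the smallest total distance to the other points of the cluster. This sketch assumes tuple points and the Euclidean distance by default, but `dist` can be any distance function (for example one for categorical data); the names are hypothetical. The toy cluster has one outlier, which pulls the mean to (7.2, 7.2) while the medoid stays at (2, 2).

```python
import math

def medoid(points, dist=math.dist):
    """Return the data point with the smallest total distance to all the points."""
    return min(points, key=lambda m: sum(dist(m, p) for p in points))

cluster = [(1, 1), (2, 1), (1, 2), (2, 2), (30, 30)]   # (30, 30) is an outlier
print(medoid(cluster))   # (2, 2)
```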
60
Aside: definitions
An aside: mean value, distance
61
Central tendency
  • Arithmetic mean / mean value (algebraic measure)
    (sample vs. population):
    $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, $\mu = \frac{1}{N}\sum x$
  • Weighted arithmetic mean:
    $\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$
  • Trimmed mean: drop the extreme values (e.g., the
    largest and the smallest (p/2)%)
  • Median: the middle value
  • The middle value if the number of values is odd,
    otherwise the mean of the two middle values

The median behaves better than the mean for data with a non-symmetric (skewed) distribution.
Example: 1 2 3 4 5 90. Mean? Median? Trimmed mean (40%)?
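The example above can be checked with Python's `statistics` module; `trimmed_mean` is a hypothetical helper that drops p/2 of the values at each end (rounding the count down), one common convention among several.

```python
import statistics

def trimmed_mean(values, p):
    """Mean after dropping a fraction p/2 of the sorted values at each end."""
    xs = sorted(values)
    cut = int(len(xs) * p / 2)     # number of values removed from each end
    kept = xs[cut:len(xs) - cut]
    return sum(kept) / len(kept)

data = [1, 2, 3, 4, 5, 90]
print(statistics.mean(data))      # 17.5 (pulled up by the value 90)
print(statistics.median(data))    # 3.5
print(trimmed_mean(data, 0.40))   # 3.5 (drops 1 and 90)
```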
62
Central tendency
  • Mode
    • The value that occurs most frequently in the data
    • Unimodal, bimodal, trimodal (one, two or three
      different values occur most frequently)
  • Midrange (middle of the range)
    • (min() + max())/2
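Mode and midrange for a small illustrative data set (the data are made up); `statistics.multimode` returns every most frequent value, so a bimodal set yields two values.

```python
import statistics

data = [1, 2, 2, 3, 3, 7]
modes = statistics.multimode(data)           # [2, 3]: bimodal
midrange = (min(data) + max(data)) / 2       # (1 + 7) / 2 = 4.0
print(modes, midrange)
```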

63
Central tendency
Median, mean and mode of symmetric, positively
and negatively skewed data
64
Central tendency
Distributive measure: can be computed by partitioning the data into smaller subsets, computing the value in each subset and combining the results, e.g., sum(), count(), max(), min().
Algebraic measure: can be computed by applying an algebraic (polynomial) function to one or more distributive measures, e.g., avg() = sum()/count().
Holistic measure: must be computed on the entire data set.
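A sketch of the three kinds of measures on data split into partitions (the partitions and values are made up for illustration). Sum, count and max combine from per-partition results and the average follows from sum and count, while the median of the partition medians differs from the true median.

```python
import statistics

# Data split into partitions (e.g. stored on different machines or disk blocks).
partitions = [[4, 8, 15], [16, 23], [42]]

# Distributive: compute per partition, then combine the partial results.
total = sum(sum(part) for part in partitions)        # sum()
count = sum(len(part) for part in partitions)        # count()
maximum = max(max(part) for part in partitions)      # max()

# Algebraic: a function of distributive measures.
average = total / count                              # avg() = sum()/count()

# Holistic: the median needs all the values at once.
all_values = [x for part in partitions for x in part]
median = statistics.median(all_values)
median_of_medians = statistics.median([statistics.median(p) for p in partitions])
print(total, count, maximum, average)   # 108 6 42 18.0
print(median, median_of_medians)        # 15.5 19.5
```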
65
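Dispersion
Mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$
Variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$
Standard deviation: $\sigma = \sqrt{\sigma^2}$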
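A minimal sketch of the population variance and standard deviation (dividing by N); the sample versions divide by n - 1 instead. The data are illustrative.

```python
import math

def population_variance(xs):
    """sigma^2 = (1/N) * sum of (x_i - mean)^2"""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

data = [2, 4, 4, 4, 5, 5, 7, 9]
var = population_variance(data)   # mean 5, variance 32/8 = 4.0
std = math.sqrt(var)              # 2.0
print(var, std)
```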
66
Distance and similarity
67
Similarity measures - distance
  • Similarity
    • A numerical measure of how alike two objects are
    • Higher when the objects are more alike
    • Often takes values in [0, 1]
  • Dissimilarity
    • A numerical measure of how different two objects
      are
    • Lower when the objects are more alike
    • The minimum is often 0 (when the objects are
      identical); the upper limit varies

68
Similarity measures
  • Well-known properties of similarity
    • s(p, q) = 1 (maximum similarity) only if p = q
    • s(p, q) = s(q, p) for all p and q (symmetry)

69
Similarity measures - distance
The similarity or dissimilarity between two objects is usually measured with a distance function between the objects. The choice depends on the type of the data, i.e., on the type of their attributes.
70
Similarity measures
Distance functions. Common properties:
1. $d(i, j) \ge 0$
2. $d(i, i) = 0$ (reflexivity)
3. $d(i, j) = d(j, i)$ (symmetry)
4. $d(i, j) \le d(i, h) + d(h, j)$ (triangle inequality)
When all 4 hold, the distance function is also called a distance metric.
71
Distances
  • Two broad categories
  • Euclidean (based on the location of the points:
    distances of points in space, notion of an
    "average")
  • Non-Euclidean (based on other properties of the
    points besides their location)

72
Distance definitions
Let i and j be two objects with n attributes x_ik and x_jk, i = (x_i1, x_i2, ..., x_in) and j = (x_j1, x_j2, ..., x_jn). The most common is the Euclidean distance:
$d(i, j) = \sqrt{\sum_{k=1}^{n} (x_{ik} - x_{jk})^2}$
It is a distance metric.
73
Distance definitions
Let i and j be two objects with n attributes x_ik and x_jk, i = (x_i1, x_i2, ..., x_in) and j = (x_j1, x_j2, ..., x_jn). Manhattan or city-block distance:
$d(i, j) = \sum_{k=1}^{n} |x_{ik} - x_{jk}|$
It is a distance metric.
74
Distance definitions
Example
75
Distance definitions
Example
a = (5, 5), b = (9, 8)
L2-norm: dist(a, b) = √(4² + 3²) = 5
L1-norm: dist(a, b) = 4 + 3 = 7
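The two distances of the example, computed with the standard library (`math.dist` gives the Euclidean distance, Python 3.8+).

```python
import math

a = (5, 5)
b = (9, 8)
l2 = math.dist(a, b)                          # sqrt(4**2 + 3**2) = 5.0
l1 = sum(abs(x - y) for x, y in zip(a, b))    # 4 + 3 = 7
print(l2, l1)
```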
76
Distance definitions
Example
Data matrix
Distance matrix
77
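Distance definitions
Let i and j be two objects with n attributes x_ik and x_jk, i = (x_i1, x_i2, ..., x_in) and j = (x_j1, x_j2, ..., x_jn). Minkowski distance (p-norm):
$d(i, j) = \left(\sum_{k=1}^{n} |x_{ik} - x_{jk}|^p\right)^{1/p}$
It is a distance metric (for p ≥ 1).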
78
Distance definitions
  • p = 1: city block (Manhattan, taxicab, L1 norm)
    • Hamming distance, for binary vectors: the number
      of bits that differ
  • p = 2: Euclidean distance
  • p → ∞: supremum (Lmax norm, L∞ norm) distance
    • The maximum difference over any attribute
      (dimension) of the two vectors
    • It is the maximum because, as p → ∞, the Lp norm
      raises the differences to the p-th power, sums
      them and takes the p-th root, so the largest
      difference dominates
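A sketch of the Minkowski (L_p) distance that also covers the supremum case; `minkowski` is a hypothetical helper, and `p = float("inf")` is this sketch's convention for L∞. With differences 3, 4 and 0, the value decreases from 7 (p = 1) through 5 (p = 2) toward the largest difference, 4.

```python
def minkowski(x, y, p):
    """L_p distance; p = float('inf') gives the supremum (L_max) distance."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if p == float("inf"):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1 / p)

x, y = (0, 2, 7), (3, 6, 7)
for p in (1, 2, 10, float("inf")):
    print(p, minkowski(x, y, p))

# On binary vectors, p = 1 counts the differing bits (Hamming distance).
hamming = minkowski((1, 0, 1, 1), (0, 0, 1, 0), 1)   # 2.0
```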

79
Distance definitions
Example
Distance matrices
80
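Distance definitions
Often, weights are used, e.g., a weighted Euclidean distance:
$d(i, j) = \sqrt{w_1 (x_{i1} - x_{j1})^2 + \dots + w_n (x_{in} - x_{jn})^2}$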
81
Non-Euclidean distances
  • Jaccard distance
  • Cosine distance
  • Edit distance
  • Hamming Distance

82
Distance definitions
Binary variables
  • Often, data with only binary attributes (binary
    vectors)
  • Symmetric (the values 0 and 1 carry the same
    importance)
    • Invariant similarity
  • Asymmetric (agreement on 1 is more important,
    e.g., when 1 indicates the presence of some
    disease)
    • Non-invariant (Jaccard)

83
Distance definitions
  • Between two objects i and j with binary attributes:
    • M01 = the number of attributes where i has value 0
      and j has value 1
    • M10 = the number of attributes where i has value 1
      and j has value 0
    • M00 = the number of attributes where i has value 0
      and j has value 0
    • M11 = the number of attributes where i has value 1
      and j has value 1
  • Coefficients
    • Simple matching, for symmetric variables:
      SMC = number of matches / number of attributes
      $SMC = \frac{M_{11} + M_{00}}{M_{01} + M_{10} + M_{11} + M_{00}}$
    • Jaccard coefficient, for asymmetric variables
      (different importance for the value 1 and the
      value 0):
      J = number of 11 matches / number of attributes
      that are not both zero
      $J = \frac{M_{11}}{M_{01} + M_{10} + M_{11}}$

84
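Distance definitions
Example
  • p = 1 0 0 0 0 0 0 0 0 0
  • q = 0 0 0 0 0 0 1 0 0 1
  • M01 = 2
  • M10 = 1
  • M00 = 7
  • M11 = 0
  • $SMC = \frac{M_{11} + M_{00}}{M_{01} + M_{10} + M_{11} + M_{00}} = \frac{0 + 7}{2 + 1 + 0 + 7} = 0.7$
  • $J = \frac{M_{11}}{M_{01} + M_{10} + M_{11}} = \frac{0}{2 + 1 + 0} = 0$

J = |p AND q| / |p OR q| (the number of 1s in p AND q over the number of 1s in p OR q)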
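The counts, SMC and Jaccard coefficient of the example can be reproduced as follows; the helper names are hypothetical, and returning 1.0 when two vectors have no 1s at all is an assumed convention (the slides do not define that case).

```python
def binary_counts(i, j):
    """Return (M01, M10, M00, M11) for two equal-length 0/1 vectors."""
    m01 = sum(1 for a, b in zip(i, j) if a == 0 and b == 1)
    m10 = sum(1 for a, b in zip(i, j) if a == 1 and b == 0)
    m00 = sum(1 for a, b in zip(i, j) if a == 0 and b == 0)
    m11 = sum(1 for a, b in zip(i, j) if a == 1 and b == 1)
    return m01, m10, m00, m11

def smc(i, j):
    m01, m10, m00, m11 = binary_counts(i, j)
    return (m11 + m00) / (m01 + m10 + m11 + m00)

def jaccard(i, j):
    m01, m10, m00, m11 = binary_counts(i, j)
    denom = m01 + m10 + m11
    return m11 / denom if denom else 1.0   # assumed convention for two all-zero vectors

p = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
q = [0, 0, 0, 0, 0, 0, 1, 0, 0, 1]
print(binary_counts(p, q), smc(p, q), jaccard(p, q))   # (2, 1, 7, 0) 0.7 0.0
```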
85
Distance definitions
Contingency table for binary data:

                   Object j
                   0      1
  Object i   1    M10    M11
             0    M00    M01

Distance measure for symmetric binary variables (1 - symmetric similarity):
$d(i, j) = \frac{M_{01} + M_{10}}{M_{01} + M_{10} + M_{11} + M_{00}}$

Distance measure for asymmetric binary variables:
$d(i, j) = \frac{M_{01} + M_{10}}{M_{01} + M_{10} + M_{11}}$

Jaccard coefficient:
$sim_{Jaccard}(i, j) = \frac{M_{11}}{M_{01} + M_{10} + M_{11}}$
86
Distance definitions
Example: the attributes are asymmetric; let Y and P correspond to 1 and N to 0
Name   Gender  Fever  Cough  Test-1  Test-2  Test-3  Test-4
Jack   M       Y      N      P       N       N       N
Mary   F       Y      N      P       N       P       N
Jim    M       Y      P      N       N       N       N
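A sketch of the asymmetric binary distance $d(i,j) = (M_{01}+M_{10})/(M_{01}+M_{10}+M_{11})$ on this table. It assumes, as is common for this example, that Gender is left out (it is a symmetric attribute), and it does not handle two all-zero vectors; the names are hypothetical.

```python
def asym_binary_distance(i, j):
    """d(i, j) = (M01 + M10) / (M01 + M10 + M11); 0-0 matches are ignored."""
    m11 = sum(a == 1 and b == 1 for a, b in zip(i, j))
    mismatches = sum(a != b for a, b in zip(i, j))   # M01 + M10
    return mismatches / (mismatches + m11)

def encode(record):
    """Map Y/P to 1 and N to 0 for Fever, Cough, Test-1 .. Test-4."""
    return [1 if v in ("Y", "P") else 0 for v in record]

patients = {
    "Jack": encode(["Y", "N", "P", "N", "N", "N"]),
    "Mary": encode(["Y", "N", "P", "N", "P", "N"]),
    "Jim":  encode(["Y", "P", "N", "N", "N", "N"]),
}
for a, b in [("Jack", "Mary"), ("Jack", "Jim"), ("Jim", "Mary")]:
    print(a, b, round(asym_binary_distance(patients[a], patients[b]), 2))
```

This gives d(Jack, Mary) = 1/3, d(Jack, Jim) = 2/3 and d(Jim, Mary) = 3/4, so Jack and Mary are the most similar pair.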
87
Distance definitions
Categorical (nominal) variables
A generalization of binary variables (attributes): they can take more than 2 values, e.g., red, yellow, blue, green.
Method 1: simple matching; m = number of matches, p = total number of variables
Method 2: use many binary variables, one for each of the M values
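A sketch of both methods, assuming the usual simple-matching form d(i, j) = (p - m)/p for method 1 and one binary attribute per value for method 2; the function names and the example records are illustrative.

```python
def simple_matching_distance(i, j):
    """d(i, j) = (p - m) / p: m matching attributes out of p."""
    p = len(i)
    m = sum(a == b for a, b in zip(i, j))
    return (p - m) / p

def one_hot(value, categories):
    """Method 2: one binary attribute per possible value."""
    return [1 if value == c else 0 for c in categories]

colors = ["red", "yellow", "blue", "green"]
x = ("red", "small", "round")
y = ("red", "large", "round")
print(simple_matching_distance(x, y))   # 0.333... (1 mismatch out of 3)
print(one_hot("blue", colors))          # [0, 0, 1, 0]
```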
88
Jaccard similarity for sets
  • The Jaccard similarity of two sets is the size of
    their intersection divided by the size of their
    union
  • $Sim(C_1, C_2) = \frac{|C_1 \cap C_2|}{|C_1 \cup C_2|}$

89
Example
3 in the intersection, 8 in the union: Jaccard similarity = 3/8
90
Jaccard similarity for sets
  • Binary representation of sets (1 if the element is
    present, 0 otherwise)
  • p1 = 10111, p2 = 10011
  • Size of intersection = 3, size of union = 4,
    Jaccard similarity (not distance) = 3/4
  • d(x, y) = 1 - (Jaccard similarity) = 1/4
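The set view and the bit-vector view give the same result; this sketch turns each bit string into the set of positions holding a 1 (helper names are hypothetical).

```python
def jaccard_similarity(a, b):
    """|A ∩ B| / |A ∪ B| for two Python sets."""
    return len(a & b) / len(a | b)

def bits_to_set(bits):
    """Binary representation -> set of positions that hold a 1."""
    return {i for i, bit in enumerate(bits) if bit == "1"}

p1 = bits_to_set("10111")
p2 = bits_to_set("10011")
sim = jaccard_similarity(p1, p2)    # 3/4
dist = 1 - sim                      # 1/4
print(sim, dist)
```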

91
Jaccard similarity for sets
  • d(x, x) = 0, because x ∩ x = x ∪ x
  • d(x, y) = d(y, x), because union and intersection
    are symmetric
  • d(x, y) > 0 for x ≠ y, because |x ∩ y| < |x ∪ y|
  • d(x, y) ≤ d(x, z) + d(z, y) (triangle inequality)

92
Distance definitions
Cosine similarity
  • If d1 and d2 are document vectors,
  • $\cos(d_1, d_2) = \frac{d_1 \cdot d_2}{\|d_1\| \, \|d_2\|}$,
  • where · is the inner product and ||d|| is the
    length of vector d.
  • Example
  • d1 = 3 2 0 5 0 0 0 2 0 0
  • d2 = 1 0 0 0 0 0 0 1 0 2
  • d1 · d2 = 3*1 + 2*0 + 0*0 + 5*0 + 0*0 + 0*0 + 0*0
    + 2*1 + 0*0 + 0*2 = 5
  • ||d1|| = (3*3 + 2*2 + 0*0 + 5*5 + 0*0 + 0*0 + 0*0
    + 2*2 + 0*0 + 0*0)^0.5 = (42)^0.5 = 6.481
  • ||d2|| = (1*1 + 0*0 + 0*0 + 0*0 + 0*0 + 0*0 + 0*0
    + 1*1 + 0*0 + 2*2)^0.5 = (6)^0.5 = 2.449
  • cos(d1, d2) = 5 / (6.481 * 2.449) = 0.3150

We need a measure that ignores the 0s (as Jaccard does) but also works for non-binary data; cosine similarity ignores the length of the vectors.
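The example can be checked with a few lines of plain Python (`cosine_similarity` is an illustrative name). Scaling a vector does not change the result, which is what ignoring vector length means.

```python
import math

def cosine_similarity(d1, d2):
    dot = sum(a * b for a, b in zip(d1, d2))
    norm1 = math.sqrt(sum(a * a for a in d1))
    norm2 = math.sqrt(sum(b * b for b in d2))
    return dot / (norm1 * norm2)

d1 = [3, 2, 0, 5, 0, 0, 0, 2, 0, 0]
d2 = [1, 0, 0, 0, 0, 0, 0, 1, 0, 2]
print(round(cosine_similarity(d1, d2), 4))   # 0.315
```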
93
Distance definitions
Cosine similarity
Geometric interpretation
Similarity 1 when the angle is 0, which means that x and y are identical (if we ignore their length). Similarity 0 when the angle is 90° (no common terms).
94
Edit distance
  • For two strings: the minimum number of character
    insertions/deletions needed to get from one
    string to the other
  • x = abcde, y = bcduve.
  • Turn x into y by deleting a, then inserting u
    and v after d.
  • Edit distance = 3.
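For the insert/delete-only edit distance defined above, one standard way to compute it is via the longest common subsequence (LCS): distance = len(x) + len(y) - 2·LCS(x, y). This is a sketch of that approach (names are hypothetical); the Levenshtein distance, which also allows substitutions, is a different variant.

```python
def edit_distance_indel(x, y):
    """Minimum number of single-character insertions/deletions turning x into y,
    computed as len(x) + len(y) - 2 * LCS(x, y)."""
    m, n = len(x), len(y)
    lcs = [[0] * (n + 1) for _ in range(m + 1)]   # lcs[i][j]: LCS of x[:i], y[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                lcs[i][j] = lcs[i - 1][j - 1] + 1
            else:
                lcs[i][j] = max(lcs[i - 1][j], lcs[i][j - 1])
    return m + n - 2 * lcs[m][n]

print(edit_distance_indel("abcde", "bcduve"))   # 3 (LCS = "bcde")
```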

95
Aside: definitions
End of the aside
96
General requirements
  • Scalability in the number of points and dimensions
  • Handling different types of data
  • Discovering clusters of different shapes (not only
    the usual spherical ones)
  • Input parameters (number of clusters, size, etc.)
    should be easy to set
  • Results should not depend on the order in which
    the input points are processed

97
General requirements
  • Dynamically changing data
  • Change of clusters over time
  • Performance (scaling)
  • Disk-resident vs. main memory

98
Types of clusters
  • Well-separated clusters
  • Center-based clusters
  • Contiguous clusters
  • Density-based clusters
  • Property- or concept-based clusters
  • Clusters described by an objective function

99
Clustering algorithms
Among others, we will look at:
  • K-means and its variants
  • Hierarchical clustering
  • Density-based clustering (DBSCAN)
  • BIRCH (data on disk!)

100
Hierarchical clustering
101
Hierarchical clustering: Basics
Produces a set of nested clusters organized as a hierarchical tree. It can be visualized as a dendrogram: a tree-like diagram that records the sequences of merges and splits.
102
Hierarchical clustering: Advantages
  • No need to assume any particular number of
    clusters
    • Any desired number of clusters can be obtained by
      cutting the dendrogram at the appropriate level
  • The clusters may correspond to meaningful
    taxonomies
    • For example, in the biological sciences (animal
      kingdom, phylogeny reconstruction, ...)

103
Hierarchical clustering
  • Two main types of hierarchical clustering
  • Agglomerative
    • Start with the points as individual clusters
    • At each step, merge the closest pair of clusters,
      until only one (or k) clusters remain
  • Divisive
    • Start with one cluster that contains all the
      points
    • At each step, split a cluster, until each cluster
      contains only one point (or k clusters have been
      formed)

104
Hierarchical clustering
  • Traditional algorithms
    • use a similarity or distance matrix
    • split or merge one cluster at a time

105
Agglomerative hierarchical clustering (AHC)
The most popular hierarchical clustering technique
Basic algorithm
1. Compute the proximity matrix
2. Let each point be a cluster
3. Repeat
4.   Merge the two closest clusters
5.   Update the proximity matrix
6. Until only a single cluster remains
The key operation is computing the proximity of two clusters. Different algorithms differ in how they define the distance between two clusters.
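A naive sketch of the basic algorithm on a distance matrix, with the MIN (single), MAX (complete) and group-average definitions of cluster distance discussed in the following slides. Instead of updating the proximity matrix in place, it recomputes cluster distances from the point distances on each pass, which is simple but slower than a real implementation; the function name and the `linkage` values are assumptions. The example matrix is the 6-point Euclidean distance matrix used later in these slides (p1 to p6 are indices 0 to 5).

```python
def agglomerative(dist, linkage="single"):
    """Naive agglomerative clustering on a symmetric distance matrix (list of lists).
    Returns the merges as (cluster_a, cluster_b, distance), in order."""
    clusters = [frozenset([i]) for i in range(len(dist))]   # each point is a cluster

    def cluster_dist(a, b):
        pair_d = [dist[i][j] for i in a for j in b]
        if linkage == "single":        # MIN: closest pair of points
            return min(pair_d)
        if linkage == "complete":      # MAX: farthest pair of points
            return max(pair_d)
        if linkage == "average":       # group average of all pairwise distances
            return sum(pair_d) / len(pair_d)
        raise ValueError(f"unknown linkage: {linkage}")

    merges = []
    while len(clusters) > 1:
        # Find the closest pair of clusters (on ties, the first pair found).
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = cluster_dist(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        merges.append((clusters[x], clusters[y], d))
        merged = clusters[x] | clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return merges

D = [
    [0.00, 0.24, 0.22, 0.37, 0.34, 0.23],
    [0.24, 0.00, 0.15, 0.20, 0.14, 0.25],
    [0.22, 0.15, 0.00, 0.15, 0.28, 0.11],
    [0.37, 0.20, 0.15, 0.00, 0.29, 0.22],
    [0.34, 0.14, 0.28, 0.29, 0.00, 0.39],
    [0.23, 0.25, 0.11, 0.22, 0.39, 0.00],
]
for a, b, d in agglomerative(D, "complete"):
    print(sorted(p + 1 for p in a), sorted(p + 1 for p in b), round(d, 3))
```

For complete linkage this merges {p3, p6} at 0.11, {p2, p5} at 0.14, adds p4 at 0.22, joins p1 with {p2, p5} at 0.34 and merges everything at 0.39.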
106
Agglomerative hierarchical clustering
Start: each point is a cluster, and we have a proximity matrix
(Figure: points p1, p2, ... and their proximity matrix)
Proximity matrix
107
Agglomerative hierarchical clustering
After some merges, we have some clusters
(Figure: clusters C1, C2, C3, C4, C5 and their proximity matrix)
108
Agglomerative hierarchical clustering
We need to merge the two closest clusters (C2 and C5) and update the proximity matrix.
(Figure: clusters C1, C2, C3, C4, C5 and their proximity matrix)
109
Agglomerative hierarchical clustering
After the merge, the question is how to update the proximity matrix.
(Figure: proximity matrix with rows and columns C1, C2 U C5, C3, C4; the entries involving C2 U C5 are marked ?)
110
AHC: Defining the distance (similarity) between clusters
  • MIN
  • MAX
  • Group average
  • Distance between the centers (centroids)
  • Other methods driven by an objective function
    • Ward's method uses squared error

Proximity matrix
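A sketch computing the first four definitions for two small point sets (the points are made up; names are hypothetical). Ward's method is different in kind, since it measures the increase in SSE caused by a merge.

```python
import math
from itertools import product

def centroid(points):
    return tuple(sum(c) / len(points) for c in zip(*points))

def inter_cluster_distances(A, B):
    pair = [math.dist(a, b) for a, b in product(A, B)]   # all cross-cluster pairs
    return {
        "min": min(pair),                        # single link
        "max": max(pair),                        # complete link
        "group_average": sum(pair) / len(pair),  # average pairwise distance
        "centroid": math.dist(centroid(A), centroid(B)),
    }

A = [(0, 0), (0, 1)]
B = [(3, 0), (4, 0)]
d = inter_cluster_distances(A, B)
print(d)
```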
111
S?S ???sµ?? ap?stas?? µeta?? s?st?d??
  • MIN
  • MAX
  • ??s?? ???? t?? ?µ?da?
  • ? ap?stas? µeta?? t?? ?e?t????? s?µe???
  • ???e? µ???d?? ßas?sµ??e? se µ?a a?t??e?µe????
    s????t?s?
  • ? µ???d?? t?? Ward ???s?µ?p??e? tet?a?????? ????

???a?a? Ge?t??as??
112
AHC: Defining the distance between clusters: MIN
MIN or single link (single linkage): the similarity of two clusters is based on the two most similar (closest) points in the different clusters (in graph terms, the shortest edge). It is determined by one pair of points, i.e., by one link (edge) in the proximity graph.
Also called the nearest-neighbor clustering method.
113
AHC: Defining the distance between clusters: MIN
Maximum similarity -> the most similar points
114
Points: p1 (0.40, 0.53), p2 (0.22, 0.38), p3 (0.35, 0.32), p4 (0.26, 0.19), p5 (0.08, 0.41), p6 (0.45, 0.30)
Distance matrix (Euclidean):

      p1    p2    p3    p4    p5    p6
p1  0.00  0.24  0.22  0.37  0.34  0.23
p2  0.24  0.00  0.15  0.20  0.14  0.25
p3  0.22  0.15  0.00  0.15  0.28  0.11
p4  0.37  0.20  0.15  0.00  0.29  0.22
p5  0.34  0.14  0.28  0.29  0.00  0.39
p6  0.23  0.25  0.11  0.22  0.39  0.00
115
Determined by only one link, the shortest one
116
117
It suffices to look at one link
118
AHC: Defining the distance between clusters: MIN
Dendrogram
The dendrogram (y-axis) also gives the distances
Nested clusters
119
AHC: Defining the distance between clusters: MIN
Strengths
Original points
Contiguity-based (contiguous clusters): can handle non-elliptical shapes
120
AHC: Defining the distance between clusters: MIN
Limitations
Original points
  • Sensitive to noise and outliers

121
AHC: Defining the distance between clusters
  • MIN
  • MAX
  • Group average
  • Distance between the centers (centroids)
  • Other methods driven by an objective function
    • Ward's method uses squared error

Proximity matrix
122
AHC: Defining the distance between clusters
Distance matrix (5 points):
0       1.1180  2.6926  3.5     3.3541
1.1180  0       2.8284  3.1623  4.1231
2.6926  2.8284  0       1.4142  2.2361
3.5     3.1623  1.4142  0       3.6056
3.3541  4.1231  2.2361  3.6056  0
123
AHC: Defining the distance between clusters: MAX
MAX or complete linkage: the similarity of two clusters is based on the two least similar (most distant) points in the different clusters (the longest edge); i.e., the clusters with the smallest such distance are merged. It is determined by all pairs of points in the two clusters.
Similarity
124
(Same points p1 to p6 and Euclidean distance matrix as in the MIN example.)
125
126
127
128
AHC: Defining the distance between clusters: MAX
Nested clusters
Dendrogram
129
AHC: Defining the distance between clusters: MAX
Strengths
Original points
  • Less susceptible to noise and outliers

130
AHC: Defining the distance between clusters: MAX
Limitations
Original points
  • Tends to break large clusters
  • Biased towards globular clusters

131
AHC: Defining the distance between clusters: MAX
MIN MAX
132
AHC: Defining the distance between clusters: MAX
MIN MAX
133
AHC: Defining the distance between clusters
  • MIN
  • MAX
  • Group average
  • Distance between the centers (centroids)
  • Other methods driven by an objective function
    • Ward's method uses squared error

Proximity matrix
134
AHC: Defining the distance between clusters: Group average
  • The proximity of two clusters is the average of
    the pairwise proximity between the points of the
    two clusters:
    $proximity(C_i, C_j) = \frac{\sum_{p_i \in C_i,\, p_j \in C_j} proximity(p_i, p_j)}{|C_i| \cdot |C_j|}$
  • The average is used because the total would favor
    large clusters

Similarity
135
AHC: Defining the distance between clusters: Group average
Nested clusters
Dendrogram
136
AHC: Defining the distance between clusters: Group average
  • A compromise between MIN and MAX
  • Strengths: less susceptible to noise and outliers
  • Limitations: biased towards globular clusters

137
AHC: Defining the distance between clusters
(Figure: two clusters with their centroids marked)
  • MIN
  • MAX
  • Group average
  • Distance between the centers (centroids)
  • Other methods driven by an objective function
    • Ward's method uses squared error

Proximity matrix
Problem: the distance does not increase monotonically, i.e., two clusters being merged may be closer to each other than clusters that were merged in earlier steps.
138
AHC: Defining the distance between clusters: Ward's method
  • Based on the increase in SSE when two clusters are
    merged
  • The hierarchical analogue of k-means
  • Can be used to initialize k-means

Ward's distance between clusters Ci and Cj is the difference between the within-cluster sum of squares of the merged cluster Cij and the total within-cluster sum of squares of Ci and Cj taken separately. Let ri be the centroid of Ci, rj the centroid of Cj and rij the centroid of Cij:
$\Delta(C_i, C_j) = \sum_{x \in C_{ij}} \|x - r_{ij}\|^2 - \sum_{x \in C_i} \|x - r_i\|^2 - \sum_{x \in C_j} \|x - r_j\|^2$
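A minimal sketch that computes Ward's distance directly as the increase in SSE when Ci and Cj are merged, and compares it with the equivalent closed form |Ci||Cj|/(|Ci| + |Cj|) · ||ri - rj||². Names are hypothetical and the points are made up.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    return tuple(sum(c) / len(points) for c in zip(*points))

def within_ss(points):
    """Within-cluster sum of squares (SSE) of one cluster."""
    r = centroid(points)
    return sum(sq_dist(x, r) for x in points)

def ward_distance(ci, cj):
    """Increase in SSE caused by merging clusters ci and cj."""
    return within_ss(ci + cj) - within_ss(ci) - within_ss(cj)

Ci = [(0.0, 0.0), (2.0, 0.0)]
Cj = [(6.0, 0.0)]
direct = ward_distance(Ci, Cj)    # 18.67 - 2 - 0 = 16.67 (= 50/3)
closed = len(Ci) * len(Cj) / (len(Ci) + len(Cj)) * sq_dist(centroid(Ci), centroid(Cj))
print(direct, closed)
```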
139
AHC: Defining the distance between clusters: Comparison
MIN
MAX
Ward's method
Group average
140
AHC: Time and space complexity
  • O(m²) space for storing the proximity matrix
    • m = number of points
  • O(m³) time
    • We start with m clusters and reduce their number
      by 1 at each step
    • Each linear search/update of the matrix costs
      O(m²)
    • Better time if we maintain some ordering of the
      distances, e.g., a heap

141
AHC: Limitations and problems
Decisions are final: once two clusters have been merged, this cannot be undone. No objective function is directly minimized.
142
AHC
  • A different variant of MIN is based on a spanning
    tree
    • Use the distance matrix and build a minimum
      spanning tree
    • Create a new cluster by breaking the tree at the
      edge with the largest distance (smallest
      similarity)
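A sketch of this MST variant, assuming a dense symmetric distance matrix: Prim's algorithm builds the tree, and removing the k - 1 longest edges leaves k connected components, which are the single-link (MIN) clusters. Names are hypothetical; on the 6-point matrix used earlier (p1 to p6 are indices 0 to 5), breaking the longest edge (0.22) separates p1 from the rest.

```python
def minimum_spanning_tree(dist):
    """Prim's algorithm on a dense symmetric distance matrix; returns (d, i, j) edges."""
    n = len(dist)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Cheapest edge from the tree to a point outside it.
        d, i, j = min((dist[i][j], i, j) for i in in_tree for j in range(n) if j not in in_tree)
        in_tree.add(j)
        edges.append((d, i, j))
    return edges

def mst_clusters(dist, k):
    """Drop the k - 1 longest MST edges and return the connected components."""
    edges = sorted(minimum_spanning_tree(dist))
    kept = edges[: len(edges) - (k - 1)]
    parent = list(range(len(dist)))       # union-find over the kept edges

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _, i, j in kept:
        parent[find(i)] = find(j)
    groups = {}
    for p in range(len(dist)):
        groups.setdefault(find(p), []).append(p)
    return sorted(groups.values())

D = [
    [0.00, 0.24, 0.22, 0.37, 0.34, 0.23],
    [0.24, 0.00, 0.15, 0.20, 0.14, 0.25],
    [0.22, 0.15, 0.00, 0.15, 0.28, 0.11],
    [0.37, 0.20, 0.15, 0.00, 0.29, 0.22],
    [0.34, 0.14, 0.28, 0.29, 0.00, 0.39],
    [0.23, 0.25, 0.11, 0.22, 0.39, 0.00],
]
print(mst_clusters(D, 2))   # [[0], [1, 2, 3, 4, 5]]
```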