Title: Clustering I
1. Clustering I
Introduction, the k-means algorithm, distances, hierarchical clustering
Part of the slides is from P.-N. Tan, M. Steinbach, V. Kumar, Introduction to Data Mining, Addison Wesley, 2006
2. What is clustering?
Finding clusters (groups) of objects such that the objects in each cluster are similar (or related) to one another and different from (or unrelated to) the objects in the other clusters.
[Figure: 3-dimensional points, Euclidean distance]
3?fa?µ????
- ?µad?p???s? ????d??? ?a? p??te???? p?? ????? t??
?d?a ?e?t?????a, - e????e?,
- ?a?a?t???st??? as?e?e???
- µet???? µe pa??µ??a d?a??µa?s? t?µ??,
- ?µad?p???s? weblog ??a e??es? pa??µ???? p??t?p??
p??sp??as??, - ?µad?p???s? s?et???µe??? a??e??? ??a browsing,
- ?µad?p???s? ?e?µ????
- pe??te? µe pa??µ??a s?µpe??f???
S?stad?p???s? ep?p?d?? ß????? (precipitation)
st?? ??st?a??a!
4?fa?µ????
- ?ata???s? Stand-alone efa?µ???/e??a?e??
- ?pt???p???s?, s?µpe??sµata ??a t?? ?ata??µ?
- ??µa ???epe?e??as?a?
- ?e?????? ???tt?s? t?? µe?????? µe????? s??????
???s? a?t?p??s?pe?t???? s?µe??? ap? ???e s?st?da
p??t?t?pa (prototypes), - S?µp?es? ?
- ?p?d?t??? ?atas?e?? e??et????? e??es?
???t???te??? ?e?t??a ??p
5?s?fe?a
6??sa????
??te µ?a s?stad?p???s? e??a? ?a??
- ??a µ???d?? s?stad?p???s?? e??a? ?a?? a? pa???e?
s?st?de? ?a??? p???t?ta? - ?e???? ?µ???t?ta e?t?? t?? s?st?da? ?a?
- ????? ?µ???t?ta a??µesa st?? s?st?de?
- ? p???t?ta e?a?t?ta? ap? t?
- ??t??s? ?µ???t?ta? ?a?
- ????d? ???p???s?? t?? s?stad?p???s??
7. Kinds of clustering
A clustering is a set of clusters. The basic distinction is between a hierarchical and a partitional set of clusters.
Partitional clustering: a division of the objects into non-overlapping subsets (clusters) such that each object belongs to exactly one subset.
Hierarchical clustering: a set of nested clusters. A cluster is allowed to have sub-clusters, organized as a hierarchical tree.
8??a????st??? ?a? ?e?a????? S?stad?p???s?
9??a????st??? ?a? ?e?a????? S?stad?p???s?
??a????st??? S?stad?p???s?
p1
p3
p4
p
2
?a?ad?s?a?? ???d??-???µµa (Dendrogram)
?e?a????? S?stad?p???s?
- F???a ap?? s?µe?a ? ap??? s?st?de?
- O? a???????a d?a????st????
- ?a ?????µe t? d??t??
10. Types of clusters: well-separated clusters
A cluster is a set of points such that every point in the cluster is closer (or more similar) to all the other points in the cluster than to any point that does not belong to the cluster.
[Figure: 3 well-separated clusters]
Often there is the notion of a threshold. The clusters are not necessarily globular (they can have any shape).
11. Types of clusters: center-based or prototype-based clusters
- A cluster is a set of objects such that an object in the cluster is closer (or more similar) to the center or prototype of its cluster than to the center of any other cluster.
- The center of a cluster is often
- a centroid, the mean of the points in the cluster, or
- a medoid, the most representative point of the cluster (e.g. when the attributes are categorical)
Such clusters tend to be globular.
[Figure: 4 center-based clusters]
12??p?? s?st?d?? S??e??? S?st?de?
S??e??? S?st?de? (Contiguous Cluster)
(???t???te??? ?e?t??a ? µetaßat???) ??se?
?e?t??as?? ??a s?st?da e??a? ??a s????? s?µe???
t?t??? ?ste ???e s?µe?? e??a? p?? ???t? se ??a ?
pe??ss?te?a s?µe?a t?? s?st?da? ap? ?,t? se
?p???d?p?te ???? s?µe?? e?t?? s?st?da? S???? se
pe??pt?se?? s?st?d?? µe µ? ?a?????? s??µa ? µe
a?????p?e??µe?a s??µata ? ?ta? ????µe ??af?µata
?a? ?????µe ?a ß???µe s??e?t???
?p???af?µata ???ß??µa µe ????ß?
8 s??e?e?? s?st?de?
13??p?? s?st?d?? S?st?de? ßas?sµ??e? st?? p????t?ta
??a s?st?da e??a? µ?a p???? pe????? ap? s?µe?a
t?? ?p??a ???????? ap? ???e? pe?????? µe?????
p????t?ta? pe?????? ?aµ???? p????t?ta? S???? se
pe??pt?se?? s?st?d?? µe µ? ?a?????? s??µa ? µe
a?????p?e??µe?a s??µata ? ?ta? ????ß?? ? outliers
6 s?st?de? ßas?sµ??e? st?? p????t?ta
14??p?? s?st?d?? ???????????? s?stad?p???s?
S?st?de? µe ????? ?d??t?ta ? e????????????
s?st?de?.
2 a??????a??pt?µe??? ??????
15??p?? s?st?d?? S?st?de? ßas?sµ??e? se µ?a
??t??e?µe???? S????t?s?
- ???es? s?st?d?? p?? e?a??st?p????? ?
µe??st?p????? µ?a a?t??e?µe???? s????t?s? - ?pa???µ?s? ???? t?? d??at?? t??p?? ????sµ?? t??
s?µe??? se s?st?de? ?a? ?p?????sµ?? t?? p?s?
?a?? (goodness) e??a? ???e p??a?? s????? ap?
s?st?de? ???s?µ?p????ta? t? d??e?sa a?t??e?µe????
s????t?s? (NP-hard) -
- ?? st???? (objectives) µp??e? ?a e??a? ??????
(global) ? t?p???? (local) - ?? ?e?a?????? s?????? t?p????
- ?? d?a????st???? ??????
16Ge????? ?pa?t?se??
??t?µet?p?s? ????ß?? ?a? outliers
s?st?da
outliers
Outlier (a??a?? s?µe??) t?µ?? p?? e??a?
e?a???se?? ?? p??? ta s?????sµ??e? ? a?aµe??µe?e?
t?µ??
17???e? d?a???se?? µeta?? s?????? s?st?d??
?p??a??pt?µe?? ? ??? ??a s?µe?? a???e? se
pe??ss?te?e? ap? µ?a s?st?de? (p? ???a??
s?µe?a) ?saf? s?stad?p???s? St?? asaf?
s?stad?p???s? ??a s?µe?? a???e? se ???e s?st?da
µe ??p??? ß???? µeta?? t?? 0 ?a? t?? 1 S???? ta
ß??? ??a ???e s?µe?? ????? ?????sµa 1 ?
p??a??t??? s?stad?p???s? ??e? pa??µ??a
?a?a?t???st??? ?e???? - ?????? Se ???sµ??e?
pe??pt?se?? ?????µe ?a ?µad?p???s??µe µ??? ??p??a
ap? ta ded?µ??a (???a ????ß??, ? µ? e?d?af????sa
p????f???a) ?te???e?? - ?µ??e?? S?st?de? µe p???
d?af??et??? µe????, s??µata ?a? p????t?te?
(densities)
18. Clustering algorithms
Among others, we will look at:
- K-means and its variants
- Hierarchical clustering
- Density-based clustering (DBSCAN)
- BIRCH (data on disk!)
19. K-means
20. K-means: overview
A partitional (prototype-based) algorithm. Each cluster is associated with a central point (centroid). Each point is assigned to the cluster with the nearest centroid. The number of clusters, K, is an input to the algorithm.
21. K-means: basic algorithm
Basic algorithm
22. K-means: basic algorithm
Example: 2, 4, 10, 12, 3, 20, 30, 11, 15
Let k = 2, and initially choose 3 and 4 as the centroids
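The example above can be traced with a minimal sketch of basic k-means on scalars: assign each point to its nearest centroid, recompute each centroid as the mean of its cluster, and stop when the centroids no longer change. The function name `kmeans_1d` and the rule that an empty cluster keeps its old centroid are assumptions made for this illustration, not part of the slides.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Basic k-means on scalars; returns (final centroids, clusters)."""
    centroids = list(centroids)
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its cluster.
        # Assumption: an empty cluster simply keeps its old centroid.
        new_centroids = [sum(c) / len(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:  # nothing moved: converged
            break
        centroids = new_centroids
    return centroids, clusters


points = [2, 4, 10, 12, 3, 20, 30, 11, 15]
centroids, clusters = kmeans_1d(points, [3, 4])
print(centroids, clusters)
```

With these starting centroids the run settles on the clusters {2, 3, 4} and {10, 11, 12, 15, 20, 30}; a different initial choice may give a different result.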
23K-means ?as???? ???????µ??
?a?at???se??
- 1. ?a a????? ?e?t???? s?µe?a s?????? ep??????ta?
t??a?a - ?? s?st?de? p?? pa?????ta? d?af????? ap? t? ??a
t????µ? t?? a??????µ?? st? ????
24K-means ?as???? ???????µ??
?a?at???se?? (s????e?a)
- 2. ? e???t?ta t?? s?µe??? ?p??????eta? µe ß?s?
??p??a ap?stas? p?? e?a?t?ta? ap? t? e?d?? t??
s?µe???, sta pa?ade??µata ?a ?e???s??µe t??
????e?de?a ap?stas? - ?pe?d? ? ap?stas? ?p??????eta? s???? o
?p?????sµ?? t?? p??pe? ?a e??a? s?et??? ap??? - 3. ?? ?e?t???? s?µe?? e??a? (s??????) t? µ?s?
(mean) t?? s?µe??? t?? s?st?da? (t? ?p??? µp??e?
?a µ?? e??a? ??a ap? ta ded?µ??a e?s?d??)
25K-means ?as???? ???????µ??
?????? ?at?stas?, ? 3 s?st?de? ?????? s?µe?a
k1, k2, k3
26K-means ?as???? ???????µ??
?a s?µe?a a?at??e?ta? st? p?? ?e?t????? ap? ta 3
a????? s?µe?a
27K-means ?as???? ???????µ??
?pa?a-?p?????sµ?? t?? ???t??? (???t??? ß?????)
???e s?µe???
28K-means ?as???? ???????µ??
??a a???es? t?? s?µe??? ??a ???t?a ß?????
29. K-means: basic algorithm
Nothing changes -> END
30. K-means: basic algorithm
Remarks (continued)
- Space: we store only the centroids
- The complexity is O(I * n * K * d), where
- n = number of points,
- K = number of clusters,
- I = number of iterations,
- d = number of attributes (dimensions)
31. K-means: basic algorithm
Remarks (continued)
- For common similarity measures, the algorithm converges
- Most of the convergence usually happens in the first few iterations
- Often the stopping condition is changed to
- "until relatively few points change cluster", or
- until the distance between the new centroids and the old ones is small
32. K-means: evaluating quality
Essentially, the algorithm tries iteratively to reduce the distance of all the points from one point of their cluster.
- The most common measure is the sum of squared errors (Sum of Squared Error, SSE)
- For each point, the error is its distance from the nearest cluster centroid
- To obtain the SSE, we square these errors and add them up:
  $$\mathrm{SSE} = \sum_{i=1}^{K} \sum_{x \in C_i} \mathrm{dist}(m_i, x)^2$$
- where dist is the Euclidean distance, x is a point in cluster C_i and m_i is the representative (central point) of cluster C_i
- One can show that the point that minimizes the SSE for a cluster is the mean, $c_i = \frac{1}{|C_i|} \sum_{x \in C_i} x$
- Given two clusterings, we can choose the one with the smaller error
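As a minimal sketch of the SSE computation described above, the following uses hypothetical 2-D toy data; the helper names `centroid` and `sse` are illustrative. It also checks numerically, on this one example, that moving a centroid away from the cluster mean increases the SSE.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def sse(clusters, centroids):
    """Sum of squared Euclidean distances of points to their centroid."""
    total = 0.0
    for points, c in zip(clusters, centroids):
        for x in points:
            total += sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return total


# Hypothetical toy data: two small 2-D clusters.
clusters = [[(1.0, 1.0), (3.0, 1.0)], [(8.0, 8.0), (8.0, 10.0)]]
means = [centroid(c) for c in clusters]
best = sse(clusters, means)
# Shifting the first representative off the mean makes the SSE worse.
shifted = sse(clusters, [(2.5, 1.0), means[1]])
print(means, best, shifted)
```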
33K-means ??t?µ?s? p???t?ta?
??a? t??p?? ?a ße?t??s??µe t? s?stad?p???s?
(e??tt?s? t?? SSE) e??a? ?a µe?a??s??µe t?
K ???? ?e???? µ?a ?a?? s?stadop???s? µe µ???? ?
µp??e? ?a ??e? µ????te?? SSE ap? µ?a ?a??
s?stad?p???s? µe µe???? K
34K-means ?as???? ???????µ??
- ?? ap?t??esµa e?a?t?ta? ap? t?? ep????? t??
a?????? s?µe???
35K-means ?a??de??µa
?????? s?µe?a
36K-means ?p????? a?????? s?µe???
37K-means ?p????? a?????? s?µe???
38K-means ?p????? a?????? s?µe???
39K-means ?p????? a?????? s?µe???
40?a??de??µa 10 s?st?d??
?e?????ta? µe d?? a????? s?µe?a se ???e s?st?da
???e ?e????? s?st?d??
41?a??de??µa 10 s?st?d??
?e?????ta? µe d?? a????? s?µe?a se ???e s?st?da
???e ?e????? s?st?d??
42?a??de??µa 10 s?st?d??
?e?????ta? µe ??p??a ?e?????a s?st?d?? ?a ?????
t??a ?e?t???? s?µe?a ?a? ???a µ??? ??a
43?a??de??µa 10 s?st?d??
?e?????ta? µe ??p??a ?e?????a s?st?d?? ?a ?????
t??a ?e?t???? s?µe?a ?a? ???a µ??? ??a
44. K-means: choosing the initial points
If there are K real clusters, the probability of choosing one centroid from each cluster is small. Specifically, if all the clusters have the same size n, then
$$P = \frac{K!\, n^K}{(Kn)^K} = \frac{K!}{K^K}$$
For example, if K = 10, the probability is 10!/10^10 ≈ 0.00036
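The arithmetic in this example is easy to check. The sketch below evaluates K!/K^K, assuming equal-sized clusters as the slide does; the function name is illustrative only.

```python
from math import factorial


def prob_one_centroid_per_cluster(k):
    """K!/K^K: chance that K random picks land in K distinct equal-sized clusters."""
    return factorial(k) / k ** k


p10 = prob_one_centroid_per_cluster(10)
print(round(p10, 5))
```

The probability drops quickly with K, which is why the slides go on to discuss better ways of choosing the initial points.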
45K-means ??se?? ??a t?? ep????? a?????? s?µe???
- ????ap?? t?e??µata
- ?????, a??? p????? pe??pt?se??
- ?e??µat?????a ?a? ???s? ??p??a? ?e?a??????
te?????? - ?p????? pa?ap??? ap? k a?????? s?µe??? ?a? µet?
ep????? k ap? a?t? ta a????? ?e?t???? s?µe?a (p?
ta p?? ap?µa???sµ??a µeta?? t???) - Stad?a?? ep?????
- ?p????? t?? p??t?? s?µe??? t??a?a ? ?? t? µ?s?
???? t?? s?µe??? - G?a ?a???a ap? ta ?p????pa a????? s?µe?a
- ep??e?e a?t? p?? e??a? p?? µa???? ap? ta µ????
t??a ep??e?µ??a a????? s?µe?a - ?p??e? ?a ?d???se? st?? ep????? outliers
- ? ?p?????sµ?? t?? p?? ap?µa???sµ???? s?µe???
e??a? dapa????? - S???? efa?µ??eta? se de??µata
46K-means ?de?e? s?st?de?
? ßas???? a??????µ?? µp??e? ?a ?d???se? se ?de?e?
a?????? s?st?de? ?????? st?at?????? ?p????? t??
s?µe??? p?? e??a? p?? µa???? ap? ??a ta t?????
???t?a ep????? t?? s?µe??? p?? s?µß??e?
pe??ss?te?? st? SSE ??a s?µe?? ap? t? s?st?da
µe t? ?????te?? SSE ?a ?d???se? se sp?s?µ?
t?? ??a se µe??s? t?? ?????? ?? p????? ?de?e?
s?st?de?, ta pa?ap??? ß?µata µp??e? ?a
epa?a??ft??? p????? f????
47K-means Stad?a?? e??µ???s? ?e?t????? s?µe???
- St? ßas??? ?-means, t? ???t?a e??µe???eta? af??
??? ta s?µe?a ????? a?ate?e? st? ???t?? - ??a pa?a??a?? e??a? ?a e??µe?????ta? ta ???t?a
µet? ap? ???e a???es? (incremental approach) - ???e a???es? e??µe???e? 0 ? 2 ???t?a
- ??? dapa????
- ??e? s?µas?a ? se??? e?sa?????/e??tas?? t??
s?µe??? - ?e? ?p?????? ?de?e? s?st?de?
- ?p??e? ?a ???s?µ?p??????? ß??? a? ?p???e?
??p??a t??a?a a?t??e?µe???? s????t?s? ??e????
t? s?µf??e? ???e f???
48??? ?a? ?eta ?pe?e??as?a
????? SSE ?a? SSE S?st?da? ???-epe?e??as?a ?a?????
?p???s? t?? ded?µ???? ?p?µ?????s?
outliers Post-processing Split-Merge
(d?at????ta? t? ?d?? K) ??a????sµ?? (split)
s?st?d?? µe t? s?et??? µe?a??te?? SSE ??µ??????a
µ?a ??a? s?st?da? p? ep??????ta? t? s?µe?? p??
e??a? p?? µa???? ap? ??a ta ???t?a ? t??a?a
ep????? s?µe??? ? ep????? t?? s?µe??? µe t?
µe?a??te?? SSE S?????s? (merge) s?st?d?? p??
e??a? s?et??? ???t???? (ta ???t?a t??? ????? t??
µ????te?? ap?stas?) ? t?? d?? s?st?de? p??
?d????? st?? µ????te?? a???s? t?? SSE ??a??af?
s?st?da? ?a? a?a?ata??µ? t?? s?µe??? t?? se ???e?
s?st?de? (a?t? p?? ?d??e? st?? µ????te?? a???s?
t?? SSE)
49K-means µe d???t?µ?s? (bisecting k-means)
?a?a??a?? p?? µp??e? ?a pa???e? µ?a d?a????st???
? ?e?a????? s?stad?p???s?
50K-means µe d???t?µ?s? (bisecting k-means)
- ???a s?st?da ?a d?asp?s??µe
- ?? µe?a??te??
- ??t? µe t? µe?a??te?? SSE
- S??d?asµ? t?? pa?ap???
- ?p??e? ?a ???s?µ?p????e? ?a? ?? ?e?a??????
51K-means µe d???t?µ?s?
52. K-means: limitations
K-means has problems when the clusters have
- different sizes
- different densities
- non-globular shapes
It also has problems when the data contain outliers
53K-means ?e?????sµ?? d?af??et??? µe????
K-means (3 s?st?de?)
?????? s?µe?a
?e? µp??e? ?a ß?e? t? µe???? ???????, ??at? e??a?
p??? µe?a??te??? ap? t??? ??????
54K-means ?e?????sµ?? d?af??et???? p????t?te?
K-means (3 s?st?de?)
?????? s?µe?a
?e? µp??e? ?a d?a????se? t??? d?? µ?????? ??at?
e??a? p??? p????? se s??s? µe t?? ??a µe????
55K-means ?e?????sµ?? µ? ??????? s??µata
?????? s?µe?a
K-means (2 s?st?de?)
?e? µp??e? ?a ß?e? t?? d?? s?st?de? ??at? ?????
µ? ??????? s??µata
56K-means ?e?????sµ??
?????? S?µe?a K-means S?st?de?
??a ??s? e??a? ?a ???s?µ?p??????? p?????
s?st?de? ???s?e? tµ?µata t?? s?st?d??, a???
p??pe? ?a ta s???e?t??s??µe
57K-means ?e?????sµ??
?????? s?µe?a K-means S?st?de?
58K-means ?e?????sµ?? d?af??et??? µe????
?????? S?µe?a K-means S?st?de?
59. K-medoid
K-means usually assumes a continuous d-dimensional space. K-medoid picks a representative point from the data and minimizes the distance from it.
Medoid: the most central point of the cluster (used instead of the mean).
It reduces the sensitivity to outliers and can be applied to data of any type (e.g. also to categorical data).
60?a????es? ???sµ??
??a pa????es? ??s? t?µ? ?p?stas?
61 Ge???? ??s?
- ????µ?t??? ??s?/??s? ??µ?- Mean (a??eß????
µ?t??s?) (sample vs. population) - ????µ?t??? µ?s? µe ß???? (Weighted arithmetic
mean) - Trimmed mean ??ß??µe t?? a??a?e? t?µ?? (p? ta
µe?a??te?a ?a? µ????te?a (p/2)) - ??s? µesa?a t?µ? (median) - d??µes?
- ?esa?a t?µ? a? µ???? a???µ??, ? µ?s?? ???? t??
d?? µesa??? t?µ??, a?????
?? µ?s? s?µpe??f??eta? ?a??te?a ?ta? ded?µ??a µe
µ? ?µ???µ??f? ?ata??µ? (skewed)
?a??de??µa 1 2 3 4 5 90 ??s? ??s?
t?µ? Trimmed 40
62Ge???? ??s?
- Mode
- ? t?µ? p?? eµfa???eta? p?? s???? sta ded?µ??a
- Unimodal, bimodal, trimodal (p?? s????
eµfa???eta? µ?a, d?? ? t?e?? d?af??et???? t?µ??)
- Midrange (midpoint of the range)
- (min() + max())/2
63Ge???? ??s?
Median, mean and mode of symmetric, positively
and negatively skewed data
64 Ge???? ??s?
Distributed measure: can be computed by partitioning the original data into smaller subsets, computing the value on each subset and merging the results, e.g. sum(), count(), max(), min()
Algebraic measure: can be computed by applying an algebraic (polynomial) function to one or more distributed measures, e.g. avg() = sum()/count()
Holistic measure: must be computed on the data set as a whole
65??asp???
Mean (µ?s? t?µ?)
Variance (s2)
Standard deviation (s)
66?p?stas? ?a? ?µ???t?ta
67???t???a ?µ???t?ta? -?p?stas?
- ?µ???t?ta
- ??a a???µ?t??? µ?t??s? ??a t? p?s? ?µ??a e??a?
d?? a?t??e?µe?a - ?e?a??te?? ?s? p?? ?µ??a e??a? ta a?t??e?µe?a
µeta?? t??? - S???? t?µ?? st? 0, 1
- ?? ?µ???t?ta (dissimilarity)
- ??a a???µ?t??? µ?t??s? ??a t? p?s? d?af??et???
e??a? d?? a?t??e?µe?a - ?????te?? ?s? p??? ?µ??a e??a? ta a?t??e?µe?a
µeta?? t??? - ? e????st? t?µ? e??a? s?????? 0 (?ta? ta ?d?a),
a??? t? p??? ??? d?af??e?
68. Similarity criteria
- Well-known properties of similarity
- s(p, q) = 1 (or the maximum similarity) only if p = q
- s(p, q) = s(q, p) for all p and q (symmetry)
69???t???a ?µ???t?ta? -?p?stas?
? ?µ???t?ta-µ? ?µ???t?ta µeta?? d?? a?t??e?µ????
µet???ta? s?????? ß?s? µ?a? s????t?s?? ap?stas??
a??µesa sta a?t??e?µe?a ??a?t?ta? ap? t? e?d??
t?? ded?µ????, d??ad? ap? t? e?d?? t??
?????sµ?t?? t???
70. Similarity criteria
Distance functions: usual properties
1. d(i, j) ≥ 0
2. d(i, i) = 0 (reflexivity)
3. d(i, j) = d(j, i) (symmetry)
4. d(i, j) ≤ d(i, h) + d(h, j) (triangle inequality)
When all 4 hold, the distance function is also called a distance metric
71?p?st?se??
- ??? µe???e? ?at?????e?
- ????e?de?e? (ßas?sµ??e? st? ??s? t?? s?µe???,
ap?st?se?? t?? s?µe??? st? ????, ?????a t?? µ?s??
t?µ??) - ?? ????e?de?e? (ßas?sµ??e? se ???e? ?d??t?te? t??
s?µe??? p??? t?? ??s?? t???)
72. Defining distance
Let i and j be two objects with n attributes x_ik and x_jk: i = (x_i1, x_i2, ..., x_in) and j = (x_j1, x_j2, ..., x_jn)
The most common choice is the Euclidean distance:
$$d(i, j) = \sqrt{\sum_{k=1}^{n} (x_{ik} - x_{jk})^2}$$
It is a distance metric
73. Defining distance
Let i and j be two objects with n attributes x_ik and x_jk: i = (x_i1, x_i2, ..., x_in) and j = (x_j1, x_j2, ..., x_jn)
Manhattan or city-block distance:
$$d(i, j) = \sum_{k=1}^{n} |x_{ik} - x_{jk}|$$
It is a distance metric
74. Defining distance
Example
75. Defining distance
Example: a = (5, 5), b = (9, 8); the coordinate differences are 4 and 3
L2-norm: dist(a, b) = √(4² + 3²) = 5
L1-norm: dist(a, b) = 4 + 3 = 7
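The two numbers in this example are special cases of the L_p (Minkowski) distance. A minimal sketch, with the function name `minkowski` chosen here for illustration, reproduces them and also shows the supremum case:

```python
def minkowski(x, y, p):
    """L_p distance between two equal-length sequences.

    p = 1 is Manhattan, p = 2 is Euclidean, and
    p = float("inf") is the supremum (L_max) distance.
    """
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if p == float("inf"):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1 / p)


a, b = (5, 5), (9, 8)
l1 = minkowski(a, b, 1)
l2 = minkowski(a, b, 2)
lmax = minkowski(a, b, float("inf"))
print(l1, l2, lmax)
```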
76???sµ?? ?p?stas??
?a??de??µa
???a?a? ?ed?µ????
???a?a? ?p?stas??
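77. Defining distance
Let i and j be two objects with n attributes x_ik and x_jk: i = (x_i1, x_i2, ..., x_in) and j = (x_j1, x_j2, ..., x_jn)
Minkowski distance (p-norm):
$$d(i, j) = \left( \sum_{k=1}^{n} |x_{ik} - x_{jk}|^p \right)^{1/p}$$
It is a distance metric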
78. Defining distance
- p = 1: city block (Manhattan, taxicab, L1 norm)
- Hamming distance, for binary vectors: the number of bits that differ
- p = 2: Euclidean distance
- p → ∞: supremum (Lmax norm, L∞ norm) distance
- The maximum distance over any attribute (dimension) of the two vectors
- The maximum is the limit as p tends to ∞: in the Lp norm we take the p-th power of the differences, sum them, and take the p-th root
79???sµ?? ?p?stas??
?a??de??µa
???a?e? ?p?stas??
80???sµ?? ?p?stas??
S????, ???? p? ??a ????e?de?a ap?stas?
81?? ????e?de?e? ap?st?se??
- Jaccard distance
- Cosine distance
- Edit distance
- Hamming Distance
82???sµ?? ?p?stas??
??ad???? ?etaß??t??
- S???? ded?µ??a µe µ??? d?ad??? ?????sµata
(d?ad??? d?a??sµata) - S?µµet????? (t?µ?? 0 ?a? 1 ????? t?? ?d?a
s?µas?a) - Invariant ?µ???t?ta
- ?? s?µµet????? (? s?µf???a st? 1 p?? s?µa?t???
p? ?ta? t? 1 s?µat?d?te? t?? ?pa??? ??p??a?
as???e?a?) - Non-invariant (Jaccard)
83. Defining distance
- Between two objects i and j with binary attributes:
- M01 = the number of attributes where i has value 0 and j has value 1
- M10 = the number of attributes where i has value 1 and j has value 0
- M00 = the number of attributes where i has value 0 and j has value 0
- M11 = the number of attributes where i has value 1 and j has value 1
- Simple matching (symmetric variables):
- SMC = number of matches / number of attributes = (M11 + M00) / (M01 + M10 + M11 + M00)
- Jaccard coefficient (asymmetric variables: the values 1 and 0 carry different meaning):
- J = number of 11 matches / number of attributes that are not both zero = (M11) / (M01 + M10 + M11)
84. Defining distance
Example
- p = 1 0 0 0 0 0 0 0 0 0
- q = 0 0 0 0 0 0 1 0 0 1
- M01 = 2
- M10 = 1
- M00 = 7
- M11 = 0
- SMC = (M11 + M00)/(M01 + M10 + M11 + M00) = (0 + 7)/(2 + 1 + 0 + 7) = 0.7
- J = (M11)/(M01 + M10 + M11) = 0/(2 + 1 + 0) = 0
J = (number of 1s in p AND q) / (number of 1s in p OR q), with bitwise AND and OR
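The counts in this example can be reproduced with a short sketch. The function names are illustrative, and returning 0.0 for the Jaccard coefficient when both vectors are all zeros is an assumption of this sketch (the ratio is undefined in that case).

```python
def binary_counts(p, q):
    """Return (M01, M10, M00, M11) for two equal-length 0/1 vectors."""
    pairs = list(zip(p, q))
    return (pairs.count((0, 1)), pairs.count((1, 0)),
            pairs.count((0, 0)), pairs.count((1, 1)))


def smc(p, q):
    """Simple matching coefficient: matches over all attributes."""
    m01, m10, m00, m11 = binary_counts(p, q)
    return (m11 + m00) / (m01 + m10 + m11 + m00)


def jaccard(p, q):
    """Jaccard coefficient: 1-1 matches over attributes that are not both 0."""
    m01, m10, _, m11 = binary_counts(p, q)
    denom = m01 + m10 + m11
    return m11 / denom if denom else 0.0  # assumption for the all-zero case


p = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
q = [0, 0, 0, 0, 0, 0, 1, 0, 0, 1]
print(binary_counts(p, q), smc(p, q), jaccard(p, q))
```

The two vectors agree on seven zeros, so SMC is high, while Jaccard ignores those 0-0 matches and reports no similarity.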
85. Defining distance
Contingency table for binary data (rows: object i, columns: object j)

         j = 1   j = 0
i = 1    M11     M10
i = 0    M01     M00

Distance measure for symmetric binary variables (1 - simple matching similarity):
d(i, j) = (M10 + M01) / (M11 + M10 + M01 + M00)
Distance measure for asymmetric binary variables:
d(i, j) = (M10 + M01) / (M11 + M10 + M01)
Jaccard coefficient:
sim_Jaccard(i, j) = M11 / (M11 + M10 + M01)
86. Defining distance
Example: the attributes are asymmetric; let Y and P correspond to 1 and N to 0

Name  Gender  Fever  Cough  Test-1  Test-2  Test-3  Test-4
Jack  M       Y      N      P       N       N       N
Mary  F       Y      N      P       N       P       N
Jim   M       Y      P      N       N       N       N
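A minimal sketch of how the table above could be turned into pairwise distances, using the asymmetric binary distance (M10 + M01) / (M11 + M10 + M01). Leaving Gender out, on the grounds that it is a symmetric attribute, is an assumption of this sketch; the slide itself does not say how Gender is handled.

```python
# Fever, Cough, Test-1 .. Test-4 for each patient (Gender left out by assumption).
records = {
    "Jack": ["Y", "N", "P", "N", "N", "N"],
    "Mary": ["Y", "N", "P", "N", "P", "N"],
    "Jim":  ["Y", "P", "N", "N", "N", "N"],
}


def encode(values):
    """Map Y and P to 1 and N to 0, as stated on the slide."""
    return [1 if v in ("Y", "P") else 0 for v in values]


def asymmetric_distance(a, b):
    """(M10 + M01) / (M11 + M10 + M01); assumes the denominator is non-zero."""
    m11 = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / (m11 + mismatches)


vectors = {name: encode(values) for name, values in records.items()}
d_jack_mary = asymmetric_distance(vectors["Jack"], vectors["Mary"])
d_jack_jim = asymmetric_distance(vectors["Jack"], vectors["Jim"])
d_jim_mary = asymmetric_distance(vectors["Jim"], vectors["Mary"])
print(d_jack_mary, d_jack_jim, d_jim_mary)
```

Under these assumptions Jack and Mary come out as the most similar pair and Jim and Mary as the least similar.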
87???sµ?? ?p?stas??
?at???????? ?etaß??t?? ????? ???ta?? (nominal)
Ge???e?s? t?? d?ad???? µetaß??t?? (?????sµ?t??)
?p?? µp????? ?a p????? pa?ap??? ap? 2 t?µ??, p?
???????, p??s???, ??t????
1? ????d?? ?p?? ta???asµa m ta????sµata, p
s???????? µetaß??t??
2? ????d?? ???s? p????? d?ad???? µetaß??t?? ??a
??a ???e µ?a ap? t?? µ t?µ??
88. Jaccard similarity for sets
- The Jaccard similarity of two sets is the size of their intersection divided by the size of their union
- Sim(C1, C2) = |C1 ∩ C2| / |C1 ∪ C2|
89. Example
3 in the intersection, 8 in the union: Jaccard similarity = 3/8
90. Jaccard similarity for sets
- Binary representation of a set (1: the element is present, 0: otherwise)
- p1 = 10111; p2 = 10011
- Size of the intersection = 3; size of the union = 4; Jaccard similarity (not distance) = 3/4
- d(x, y) = 1 - (Jaccard similarity) = 1/4
91. Jaccard similarity for sets
- d(x, x) = 0 because x ∩ x = x ∪ x
- d(x, y) = d(y, x) because union and intersection are symmetric
- d(x, y) ≥ 0 because |x ∩ y| ≤ |x ∪ y|
- d(x, y) ≤ d(x, z) + d(z, y) (triangle inequality)
92. Defining distance
Cosine similarity
- If d1 and d2 are document vectors, then
- cos(d1, d2) = (d1 · d2) / (||d1|| ||d2||),
- where · is the inner product and ||d|| is the length of d
- Example:
- d1 = 3 2 0 5 0 0 0 2 0 0
- d2 = 1 0 0 0 0 0 0 1 0 2
- d1 · d2 = 3*1 + 2*0 + 0*0 + 5*0 + 0*0 + 0*0 + 0*0 + 2*1 + 0*0 + 0*2 = 5
- ||d1|| = (3*3 + 2*2 + 0*0 + 5*5 + 0*0 + 0*0 + 0*0 + 2*2 + 0*0 + 0*0)^0.5 = (42)^0.5 = 6.481
- ||d2|| = (1*1 + 0*0 + 0*0 + 0*0 + 0*0 + 0*0 + 0*0 + 1*1 + 0*0 + 2*2)^0.5 = (6)^0.5 = 2.449
- cos(d1, d2) = 0.3150
We want a distance that ignores the 0s (like Jaccard) but also works for non-binary data. Cosine similarity also ignores the length of the vectors.
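The worked example can be verified with a few lines; the function name `cosine_similarity` is chosen here for illustration.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Inner product of u and v divided by the product of their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


d1 = [3, 2, 0, 5, 0, 0, 0, 2, 0, 0]
d2 = [1, 0, 0, 0, 0, 0, 0, 1, 0, 2]
cos12 = cosine_similarity(d1, d2)
print(round(cos12, 4))
```

Scaling either vector by a positive constant leaves the result unchanged, which is the sense in which the measure ignores vector length.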
93. Defining distance
Cosine similarity: geometric interpretation
Similarity 1 when the angle is 0, which means that x and y are the same (if we ignore their length). Similarity 0 when the angle is 90° (no common term).
94. Edit distance
- For two strings: the minimum number of character insertions/deletions needed to get from one to the other
- x = abcde; y = bcduve.
- Turn x into y by deleting a, then inserting u and v after d.
- Edit distance = 3.
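When only insertions and deletions are allowed, as in the definition above, the edit distance equals |x| + |y| minus twice the length of the longest common subsequence (LCS). The sketch below uses that identity; the function name is illustrative.

```python
def edit_distance_indel(x, y):
    """Minimum number of single-character insertions and deletions turning x into y."""
    # lcs[i][j] = length of the longest common subsequence of x[:i] and y[:j]
    lcs = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i, a in enumerate(x, 1):
        for j, b in enumerate(y, 1):
            if a == b:
                lcs[i][j] = lcs[i - 1][j - 1] + 1
            else:
                lcs[i][j] = max(lcs[i - 1][j], lcs[i][j - 1])
    # Every character outside the LCS is either deleted from x or inserted from y.
    return len(x) + len(y) - 2 * lcs[len(x)][len(y)]


dist_xy = edit_distance_indel("abcde", "bcduve")
print(dist_xy)
```

Here the LCS is "bcde", so one character of x is deleted and two characters of y are inserted.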
95?a????es? ???sµ??
??e??e? ? pa????es?
96Ge????? ?pa?t?se??
- Scalability st?? a???µ? s?µe??? ?a? d?ast?se??
- ?a ?p?st????e? d?af??et????? t?p??? ded?µ????
- ?a ?p?st????e? s?st?de? µe d?af??et??? s??µata
(s??????, sfa??e?) - ?a e??a? e????? ?a d?s??µe t?µ?? st?? pa?aµ?t????
e?s?d?? (a???µ? s?st?d??, µ??e??? ??p) - ?a µ?? e?a?t?ta? ap? t? se??? epe?e??as?a? t??
s?µe??? e?s?d??
97Ge????? ?pa?t?se??
- ???aµ??? µetaßa???µe?a ded?µ??a
- ???a?? s?st?d?? µe t? p??asµa t?? ??????
- ?p?d?s? (scaling)
- Disk-resident vs Main memory
98??d? S?st?d??
- ?a??? d?a????sµ??e? s?st?de?
- S?st?de? ßas?sµ??e? se ???t??
- S??e??? (contiguous) s?st?de?
- S?st?de? ?as?sµ??e? se p????t?ta
- ?as?sµ??a se ?d??t?te? ? ?????e?
- ?e?????f??ta? ap? µ?a a?t??e?µe???? s????t?s?
(Objective Function)
99. Clustering algorithms
Among others, we will look at:
- K-means and its variants
- Hierarchical clustering
- Density-based clustering (DBSCAN)
- BIRCH (data on disk!)
100?e?a????? S?stad?p???s?
101?e?a????? S?stad?p???s? ?as???
?a???e? ??a s????? ap? eµf??e?µ??e? s?st?de?
???a??µ??e? se ??a ?e?a????? d??t?? ?p??e? ?a
pa?asta?e? µe ??a d??d??-??aµµa ??a d????aµµa
p?? µ????e? µe d??d?? ?a? ?ata???fe? t??
a???????e? ap? s?????e?se?? (merges) ?a?
d?a????sµ??? (splits)
102?e?a????? S?stad?p???s? ??e??e?t?µata
- ?e ??e???eta? ?a ?p???s??µe ??a s???e???µ???
a???µ? ap? s?st?de? - ?p???sd?p?te ep???µ?t?? a???µ?? ap? s?st?de?
µp??e? ?a ep?te???e? ??ß??ta? t? de?d????aµµa st?
?at?????? ep?ped? - ?p??e? ?a a?t?st?????? se ??????? ta????µ?se??
- G?a pa??de??µa st?? ß????????? ep?st?µe? (?????
ßas??e??, phylogeny reconstruction, )
103?e?a????? S?stad?p???s?
- ??? ßas???? t?p?? ?e?a?????? s?stad?p???s??
- S?ss??e?t???? (Agglomerative)
- ?????e? µe ta s?µe?a ?? ?e????st?? s?st?de?
- Se ???e ß?µa, s?????e?e? t? p?? ???t??? ?e?????
s?st?d?? µ???? ?a µe??e? µ??? µ?a (? k) s?st?de? - ??a??et???? (Divisive)
- ?????e? µe µ?a s?st?da p?? pe????e? ??a ta
s?µe?a - Se ???e ß?µa, d?a?????e? µ?a s?st?da, ??? ???e
s?st?da ?a pe????e? µ??? ??a s?µe?? (? ?a
d?µ?????????? k s?st?de?)
104?e?a????? S?stad?p???s?
- ?? pa?ad?s?a??? a??????µ??
- ???s?µ?p????? ??a? p??a?a ?µ???t?ta ? ap?stas??
- d?a????sµ?? ? s?????e?s? µ?a? ?µ?da? t? f???
105. Agglomerative hierarchical clustering (AHC)
The most popular clustering technique
Basic algorithm
1. Compute the proximity matrix
2. Let each point be a cluster
3. Repeat
4.   Merge the two closest clusters
5.   Update the proximity matrix
6. Until only a single cluster remains
The key operation is the computation of the proximity of two clusters. The different algorithms differ in how the distance between two clusters is defined.
106S?ss??e?t??? ?e?a????? S?stad?p???s?
?????? ???e s?µe?? ?a? s?st?da ?a? ??a? ???a?a?
Ge?t??as?? (proximity matrix)
???a?a? Ge?t??as??
107S?ss??e?t??? ?e?a????? S?stad?p???s?
?et? ap? ??p??e? s?????e?se??, ????µe ??p??e?
s?st?de?
C3
C4
???a?a? Ge?t??as??
C1
C5
C2
108S?ss??e?t??? ?e?a????? S?stad?p???s?
T????µe ?a s?????e?s??µe t?? d?? ???t???te?e?
s?st?de? (C2 ?a? C5) ?a? ?a e??µe??s??µe t??
p??a?a ?e?t??as??.
C3
C4
???a?a? Ge?t??as??
C1
C5
C2
109S?ss??e?t??? ?e?a????? S?stad?p???s?
?et? t? s?????e?s? ? e??t?s? e??a? ???
e??µe?????µe t?? p??a?a ?e?t??as??
C2 U C5
C1
C3
C4
?
C1
? ? ? ?
C2 U C5
C3
?
C3
C4
?
C4
???a?a? Ge?t??as??
C1
C2 U C5
110S?S ???sµ?? ap?stas?? µeta?? s?st?d??
?µ???t?ta?
- MIN
- MAX
- ??s?? ???? t?? s?st?da?
- ? ap?stas? µeta?? t?? ?e?t????? s?µe???
- ???e? µ???d?? ßas?sµ??e? se µ?a a?t??e?µe????
s????t?s? - ? µ???d?? t?? Ward ???s?µ?p??e? tet?a?????? ????
???a?a? Ge?t??as??
111S?S ???sµ?? ap?stas?? µeta?? s?st?d??
- MIN
- MAX
- ??s?? ???? t?? ?µ?da?
- ? ap?stas? µeta?? t?? ?e?t????? s?µe???
- ???e? µ???d?? ßas?sµ??e? se µ?a a?t??e?µe????
s????t?s? - ? µ???d?? t?? Ward ???s?µ?p??e? tet?a?????? ????
???a?a? Ge?t??as??
112S?S ???sµ?? ap?stas?? µeta?? s?st?d?? ???
MIN ? µ??ad???? a?µ?? ? ap??? s??d?sµ?? (single
link) ? ?µ???t?ta µeta?? d?? s?st?d?? ßas??eta?
sta d?? p?? ?µ??a (p?? ?e?t?????) s?µe?a st??
d?af??et???? s?st?de? (µe ????? ??af?µ?t??
shortest edge) ?a?????eta? ap? ??a ?e????
t?µ??, d??ad? µ?a a?µ? (link) t?? ??af?µat??
?e?t??as??.
???µ??eta? ?a? µ???d?? s?stad?p???s??
???t???te??? ?e?t??a
113S?S ???sµ?? ap?stas?? µeta?? s?st?d?? ???
MIN ? µ??ad???? a?µ?? ? ap??? s??d?sµ?? (single
link) ? ?µ???t?ta µeta?? d?? s?st?d?? ßas??eta?
sta d?? p?? ?µ??a (p?? ?e?t?????) s?µe?a st??
d?af??et???? s?st?de? (µe ????? ??af?µ?t??
shortest edge) ?a?????eta? ap? ??a ?e???? t?µ??,
d??ad? µ?a a?µ? (link) t?? ??af?µat?? ?e?t??as??.
???s??? ?µ???t?ta gt ta p??? ?µ??a
114. Points: 1 (0.4, 0.53), 2 (0.22, 0.38), 3 (0.35, 0.32), 4 (0.26, 0.19), 5 (0.08, 0.41), 6 (0.45, 0.30)
Distance matrix (Euclidean)

      p1    p2    p3    p4    p5    p6
p1  0.00  0.24  0.22  0.37  0.34  0.23
p2  0.24  0.00  0.15  0.20  0.14  0.25
p3  0.22  0.15  0.00  0.15  0.28  0.11
p4  0.37  0.20  0.15  0.00  0.29  0.22
p5  0.34  0.14  0.28  0.29  0.00  0.39
p6  0.23  0.25  0.11  0.22  0.39  0.00
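The following is a minimal sketch of agglomerative clustering on the six-point distance matrix above, with the linkage passed in as a function: `min` for MIN (single link) and `max` for MAX (complete link). For brevity it recomputes cluster distances from the original matrix at every step instead of updating a proximity matrix, which is a simplification of the basic algorithm, and ties are broken by enumeration order (an arbitrary choice).

```python
D = [
    [0.00, 0.24, 0.22, 0.37, 0.34, 0.23],
    [0.24, 0.00, 0.15, 0.20, 0.14, 0.25],
    [0.22, 0.15, 0.00, 0.15, 0.28, 0.11],
    [0.37, 0.20, 0.15, 0.00, 0.29, 0.22],
    [0.34, 0.14, 0.28, 0.29, 0.00, 0.39],
    [0.23, 0.25, 0.11, 0.22, 0.39, 0.00],
]


def agglomerative(dist, linkage=min):
    """Merge the two closest clusters until one remains.

    Returns a list of (cluster_a, cluster_b, distance) merges,
    with points numbered from 1 as on the slides.
    """
    clusters = [frozenset([i]) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        # Cluster distance = linkage over all cross-cluster point pairs.
        candidates = [
            (linkage(dist[i][j] for i in a for j in b), a, b)
            for idx, a in enumerate(clusters)
            for b in clusters[idx + 1:]
        ]
        d, a, b = min(candidates, key=lambda t: t[0])
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((sorted(i + 1 for i in a), sorted(i + 1 for i in b), d))
    return merges


single = agglomerative(D, linkage=min)
complete = agglomerative(D, linkage=max)
for step in single:
    print("MIN", step)
for step in complete:
    print("MAX", step)
```

Both linkages start by merging points 3 and 6, then 2 and 5; they diverge afterwards, which is what the MIN and MAX dendrograms in the slides illustrate.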
1151 (0.4, 0.53) 2 (0.22, 0.38) 3 (0.35, 0.32) 4
(0.26, 0.19) 5 (0.08, 0.41) 6 (0.45, 0.30)
?a?????eta? µ??? ap? µ?a a?µ? t?? µ????te??
p1 p2 p3 p4 p5 p6 p1 0.00 0.24 0.22 0.37 0.34 0.2
3 p2 0.24 0.00 0.15 0.20 0.14 0.25 p3 0.22 0.15 0.
00 0.15 0.28 0.11 p4 0.37 0.20 0.15 0.00 0.29 0.22
p5 0.34 0.14 0.28 0.29 0.00 0.39 p6 0.23 0.25 0.1
1 0.22 0.39 0.00
1171 (0.4, 0.53) 2 (0.22, 0.38) 3 (0.35, 0.32) 4
(0.26, 0.19) 5 (0.08, 0.41) 6 (0.45, 0.30)
???e? ?a d? µ?a a?µ?
p1 p2 p3 p4 p5 p6 p1 0.00 0.24 0.22 0.37 0.34 0.2
3 p2 0.24 0.00 0.15 0.20 0.14 0.25 p3 0.22 0.15 0.
00 0.15 0.28 0.11 p4 0.37 0.20 0.15 0.00 0.29 0.22
p5 0.34 0.14 0.28 0.29 0.00 0.39 p6 0.23 0.25 0.1
1 0.22 0.39 0.00
118S?S ???sµ?? ap?stas?? µeta?? s?st?d?? ???
?e?t????aµµa
?? de?t????aµµa (y-????a?) d??e? ?a? t??
ap?st?se??
F???asµ??e? S?st?de?
119S?S ???sµ?? ap?stas?? µeta?? s?st?d?? ???
???te??µata
?????? s?µe?a
Contiguity-based (s??e??µe?e? s?st?de?) ?p??e? ?a
?e???ste? µ? e??e?pt??? (non-elliptical) s??µata
120S?S ???sµ?? ap?stas?? µeta?? s?st?d?? ???
?e???e?t?µata
?????? s?µe?a
- ??a?s??t? se ????ß? ?a? outliers
121S?S ???sµ?? ap?stas?? µeta?? s?st?d??
- MIN
- MAX
- ??s?? ???? t?? ?µ?da?
- ? ap?stas? µeta?? t?? ?e?t????? s?µe???
- ???e? µ???d?? ßas?sµ??e? se µ?a a?t??e?µe????
s????t?s? - ? µ???d?? t?? Ward ???s?µ?p??e? tet?a?????? ????
???a?a? Ge?t??as??
122. AHC: defining the distance between clusters

0       1.1180  2.6926  3.5     3.3541
1.1180  0       2.8284  3.1623  4.1231
2.6926  2.8284  0       1.4142  2.2361
3.5     3.1623  1.4142  0       3.6056
3.3541  4.1231  2.2361  3.6056  0
123S?S ???sµ?? ap?stas?? µeta?? s?st?d?? MAX
M?? ? p?????? s??des?µ?t?ta? (complete linkage) -
??a??t? ????e? ? ?µ???t?ta µeta?? d?? s?st?d??
ßas??eta? sta d?? ????te?? ?µ??a (p?? µa?????)
s?µe?a st?? d?af??et???? s?st?de? (longest edge)
d??ad?, ?? s?st?de? µe t?? µ????te?? t?t??a
ap?stas? ?a?????eta? ap? ??a ta ?e??? t?µ?? st??
d?? s?st?de?.
?µ???t?ta
1241 (0.4, 0.53) 2 (0.22, 0.38) 3 (0.35, 0.32) 4
(0.26, 0.19) 5 (0.08, 0.41) 6 (0.45, 0.30)
p1 p2 p3 p4 p5 p6 p1 0.00 0.24 0.22 0.37 0.34 0.2
3 p2 0.24 0.00 0.15 0.20 0.14 0.25 p3 0.22 0.15 0.
00 0.15 0.28 0.11 p4 0.37 0.20 0.15 0.00 0.29 0.22
p5 0.34 0.14 0.28 0.29 0.00 0.39 p6 0.23 0.25 0.1
1 0.22 0.39 0.00
128S?S ???sµ?? ap?stas?? µeta?? s?st?d?? MAX
F???asµ??e? S?st?de?
?e?t????aµµa
129S?S ???sµ?? ap?stas?? µeta?? s?st?d?? MAX
??e??e?t?µata
?????? S?µe?a
- ????te?? e???t?s? se ????ß? ?a? outliers
130S?S ???sµ?? ap?stas?? µeta?? s?st?d?? MAX
?e???e?t?µata
?????? s?µe?a
- ?e??e? ?a d?asp? µe???e? s?st?de?
- ?d??e? s?????? se ??????? s??µata
131S?S ???sµ?? ap?stas?? µeta?? s?st?d?? MAX
MIN MAX
132S?S ???sµ?? ap?stas?? µeta?? s?st?d?? MAX
MIN MAX
133S?S ???sµ?? ap?stas?? µeta?? s?st?d??
- MIN
- MAX
- ??s?? ???? t?? ?µ?da? (group average)
- ? ap?stas? µeta?? t?? ?e?t????? s?µe???
- ???e? µ???d?? ßas?sµ??e? se µ?a a?t??e?µe????
s????t?s? - ? µ???d?? t?? Ward ???s?µ?p??e? tet?a?????? ????
???a?a? Ge?t??as??
134S?S ???sµ?? ap?stas?? µeta?? s?st?d?? ??s?
?µ?da?
- ???t???t?ta d?? s?st?d?? e??a? ? µ?s? t?µ? t??
a?a-d?? ???t???t?ta? (average of pairwise
proximity) µeta?? t?? s?µe??? t?? d?? s?st?d??. - ???s? µ?s?? ??at? ? ????? ?a ?d??e p??t?µ?s? st??
µe???e? s?st?de?
?µ???t?ta
135S?S ???sµ?? ap?stas?? µeta?? s?st?d?? ??s?
?µ?da?
F???asµ??e? S?st?de?
Dendrogram
136S?S ???sµ?? ap?stas?? µeta?? s?st?d?? ??s?
?µ?da?
- ???µesa se MIN-MAX
- ??e??e?t?µata µ????te?? e?a?s??s?a se ????ß? ?a?
outliers - ?e???e?t?µata ????e? ???????? s?st?de?
137S?S ???sµ?? ap?stas?? µeta?? s?st?d??
?
?
- MIN
- MAX
- ??s?? ???? t?? ?µ?da?
- ? ap?stas? µeta?? t?? ?e?t????? s?µe???
- ???e? µ???d?? ßas?sµ??e? se µ?a a?t??e?µe????
s????t?s? - ? µ???d?? t?? Ward ???s?µ?p??e? tet?a?????? ????
???a?a? Ge?t??as??
???ß??µa µ? µ???t??? a???s? t??
ap?stas?? ???ad?, d?? s?st?de? p?? s?????e???ta?
µp??e? ?a ????? µ????te?? ap?stas? ap? s?st?de?
p?? ????? s?????e?te? se p??????µe?a ß?µata
138. AHC: defining the distance between clusters: Ward's method
- Based on the increase in SSE when the two clusters are merged
- Hierarchical analogue of k-means
- Can be used to initialize k-means
Ward's distance between clusters Ci and Cj is the difference between the total error of the two clusters and the error obtained if we merge the 2 clusters into one cluster, say Cij, where ri = centroid of Ci, rj = centroid of Cj, rij = centroid of Cij.
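As a minimal sketch, the quantity Ward's method works with can be computed as the SSE of the merged cluster minus the SSEs of the two separate clusters (the sign convention of "increase in SSE" is the one assumed here). The toy points are hypothetical, and the check against the closed form |Ci||Cj| / (|Ci| + |Cj|) * ||ri - rj||² is included only as a sanity test.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def sse(points):
    """Sum of squared Euclidean distances of the points to their own centroid."""
    c = centroid(points)
    return sum(sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for x in points)


def ward_increase(ci, cj):
    """Increase in SSE caused by merging clusters ci and cj."""
    return sse(ci + cj) - sse(ci) - sse(cj)


# Hypothetical toy clusters.
ci = [(0.0, 0.0), (2.0, 0.0)]
cj = [(10.0, 0.0)]
increase = ward_increase(ci, cj)

# Closed form in terms of the two centroids ri and rj.
ri, rj = centroid(ci), centroid(cj)
sq_dist = sum((a - b) ** 2 for a, b in zip(ri, rj))
closed_form = len(ci) * len(cj) / (len(ci) + len(cj)) * sq_dist
print(increase, closed_form)
```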
139S?S ???sµ?? ap?stas?? µeta?? s?st?d?? S?????s?
MIN
MAX
????d?? t?? Ward
??s? ?µ?da?
140. AHC: time and space complexity
- O(m²) space for storing the proximity matrix, where m = number of points
- O(m³) time
- We start with m clusters and reduce them by 1 each time
- With a linear search of the matrix, O(m²) per step
- Better time if we keep the distances in some sorted structure, e.g. a heap
141S?S ?e?????sµ?? ?a? ???ß??µata
?? ap?f?se?? e??a? te????? af?? d?? s?st?de?
s?????e?t??? a?t? de? µp??e? ?a a????e? ?e?
e?a??st?p????? ?µesa ??p??a a?t??e?µe????
s????t?s?
142S?S
- ??a d?a??et??? pa?a??a?? t?? MIN ßas??eta? se
spanning tree (s?e?et??? d??t?a) - ???s?µ?p???se t?? p??a?a ap?stas?? ?a?
?atas?e?ase ??a e????st? s?e?et??? d??t?? - ??µ??????se µ?a ??a s?st?da sp????ta? t? d??t??
st?? a?µ? µe t? µe?a??te?? ap?stas? (µ????te??
?µ???t?ta)