Title: Comparative Evaluation of Lexicographic and Semantic Similarity Methods
1 Comparative Evaluation of Lexicographic and Semantic Similarity Methods
Supervisor: ??st????? ?at??e???µ??d??
2 MOTIVATION
- The need for entity matching, which arises in many areas of computer science such as Information Retrieval, Computational Biology, Musicology, Text Editing, Meteorology, Signal Processing, etc.
- The construction of a tool for comparing elements, based on a combination of lexicographic and semantic similarity.
- The lack of a tool that uses several algorithms so that they cover one another's weaknesses.
3 ENTITY MATCHING
- Goal: to extract a numeric similarity value.
- For every element of the entities under examination the following must hold:
  - $L(s,t) \in [0, 1]$, where L is the similarity extraction procedure.
- Use of lexicographic and semantic similarity.
4 LEXICOGRAPHIC SIMILARITY (1/4)
PART 1
- Aims to extract a numeric value that indicates either the position at which one string occurs inside another (or inside a text), or the degree of similarity of the two entities.
- The numeric similarity values depend on the algorithm used (they do not lie exclusively in the interval 0..1).
5 LEXICOGRAPHIC SIMILARITY (2/4)
- Exact vs Approximate
- Exact Matching: finding all positions at which one string occurs inside another.
  - e.g. s = A B C A A B
  - t = A B A B C A A B C A A B C A A B A A
  - Answer: 3, 7, 11
- Approximate Matching: extracting a numeric value that determines how closely the compared entities resemble each other. The main representative of this category is the distance between strings.
  - e.g. s = ABCABC
  - t = ABBAAC
  - Hamming distance = 2
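The two examples on this slide can be reproduced with a few lines of Python. This is only a minimal sketch: the function names `find_all` and `hamming` are illustrative and do not come from the presentation, and positions are reported 1-based to match the slide.

```python
def find_all(pattern: str, text: str) -> list[int]:
    """Return the 1-based start positions of every (possibly overlapping) match."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start + 1)          # convert 0-based index to 1-based
        start = text.find(pattern, start + 1)
    return positions


def hamming(s: str, t: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(s) != len(t):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(a != b for a, b in zip(s, t))


print(find_all("ABCAAB", "ABABCAABCAABCAABAA"))  # [3, 7, 11]
print(hamming("ABCABC", "ABBAAC"))               # 2
```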
6 LEXICOGRAPHIC SIMILARITY (3/4)
- Smart vs Naive Methods
- Smart Methods: use metrics based on morphological or phonetic characteristics of the compared strings.
  - e.g. Soundex Algorithm
- Naive Methods: simply compare the characters of the strings.
  - e.g. Edit distance Algorithms, etc.
7 LEXICOGRAPHIC SIMILARITY (4/4)
- References
- Exact Matching
  - Handbook of Exact String-Matching Algorithms, C. Charras, T. Lecroq.
- Approximate Matching
  - A Guided Tour to Approximate String Matching, G. Navarro, ACM Computing Surveys.
  - Selecting the Right Objective Measure for Association Analysis, P. Tan, V. Kumar, J. Srivastava.
8 LEXICOGRAPHIC SIMILARITY ALGORITHMS (1/9)
- The oldest distance algorithm is the Levenshtein metric. It computes the cost of transforming one string into the other (each change costs 1). e.g. s = test, t = tend, distance = 2.
- Needleman-Wunsch: based on the Levenshtein algorithm, but uses weights (≠ 1) for each transformation move.
  - Moves: substitution/copy, insertion, deletion
  - d(si,tk) = distance function, G = cost of a change
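The recurrence behind both metrics can be sketched as a single dynamic-programming routine. The sketch below assumes the usual formulation in which each cell takes the cheapest of substitution/copy, insertion and deletion; `edit_distance`, `sub_cost` and `gap` are illustrative names, with `sub_cost` playing the role of d(si,tk) and `gap` the role of G.

```python
def edit_distance(s, t, sub_cost=lambda a, b: 0 if a == b else 1, gap=1):
    """Weighted edit distance; the defaults give plain Levenshtein distance."""
    # prev[j] holds the cost of turning the processed prefix of s into t[:j]
    prev = [j * gap for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        curr = [i * gap]
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j - 1] + sub_cost(a, b),  # substitution / copy
                prev[j] + gap,                 # deletion of a
                curr[j - 1] + gap,             # insertion of b
            ))
        prev = curr
    return prev[-1]


print(edit_distance("test", "tend"))         # 2, as on the slide
print(edit_distance("abc", "abcd", gap=2))   # 2: one insertion at cost G = 2
```

Passing a different `sub_cost` or `gap` gives a Needleman-Wunsch style weighted variant without changing the loop.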
9 LEXICOGRAPHIC SIMILARITY ALGORITHMS (2/9)
- Smith-Waterman: a technique similar to the previous algorithm.
  - Moves: start, substitution/copy, insertion, deletion
  - d(si,tk) = distance function, G = cost of a change
- Jaro: takes into account the transformations relative to the length of the original sequence.
  - Similarity = …
10 LEXICOGRAPHIC SIMILARITY ALGORITHMS (3/9)
- Jaro-Winkler: a variation of the Jaro algorithm.
  - Similarity = $\mathrm{Jaro} + \frac{P}{10}\,(1 - \mathrm{Jaro})$, $P = \max(\mathrm{prefix}, 4)$
- Maedche-Staab: uses the string distance relative to the minimum length of the strings.
  - Similarity = $\max\left(0, \frac{\min(|s|,|t|) - \mathrm{ed}(s,t)}{\min(|s|,|t|)}\right)$
- Dice: a simple metric based exclusively on the common characters.
  - Similarity = $\frac{2 \cdot |\mathrm{common}(s,t)|}{|s| + |t|}$
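As a rough check, the Maedche-Staab and Dice measures can be sketched as follows. The formulas are assumptions: they follow the commonly cited definitions (edit distance relative to the shorter string, and twice the common characters over the total length). They do reproduce the values the results slide reports for student/dentist. All function names are illustrative.

```python
from collections import Counter


def levenshtein(s: str, t: str) -> int:
    """Plain Levenshtein distance with unit costs."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        curr = [i]
        for j, b in enumerate(t, start=1):
            curr.append(min(prev[j - 1] + (a != b), prev[j] + 1, curr[j - 1] + 1))
        prev = curr
    return prev[-1]


def maedche_staab(s: str, t: str) -> float:
    """max(0, (min length - edit distance) / min length)."""
    shortest = min(len(s), len(t))
    if shortest == 0:
        return 0.0
    return max(0.0, (shortest - levenshtein(s, t)) / shortest)


def dice_common_chars(s: str, t: str) -> float:
    """Twice the number of shared characters (as a multiset) over the total length."""
    if not s and not t:
        return 1.0
    common = sum((Counter(s) & Counter(t)).values())
    return 2 * common / (len(s) + len(t))


print(maedche_staab("student", "dentist"))      # 1/7
print(dice_common_chars("student", "dentist"))  # 6/7
```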
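11 LEXICOGRAPHIC SIMILARITY ALGORITHMS (4/9)
- Lin: proposed three metrics.
  - 1st: based on the distance between the strings.
    - Similarity = $\frac{1}{1 + \mathrm{editDist}(x,y)}$
  - 2nd: based on common subsequences (trigrams).
    - Similarity = $\frac{1}{1 + |tri(x)| + |tri(y)| - 2\,|tri(x) \cap tri(y)|}$
  - 3rd: based on common trigrams and their probabilities of occurrence.
    - Similarity = $\frac{2 \sum_{t \in tri(x) \cap tri(y)} \log P(t)}{\sum_{t \in tri(x)} \log P(t) + \sum_{t \in tri(y)} \log P(t)}$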
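Lin's second (trigram) measure is easy to sketch. The code assumes the formula 1 / (1 + |tri(x)| + |tri(y)| - 2|tri(x) ∩ tri(y)|) over sets of distinct, unpadded trigrams. Under those assumptions it gives 1/7 for student/dentist, the value reported on the results slide. The helper names are illustrative.

```python
def trigrams(s: str) -> set[str]:
    """Distinct character trigrams of s, without padding."""
    return {s[i:i + 3] for i in range(len(s) - 2)}


def lin_trigram_similarity(x: str, y: str) -> float:
    tx, ty = trigrams(x), trigrams(y)
    return 1.0 / (1 + len(tx) + len(ty) - 2 * len(tx & ty))


print(sorted(trigrams("student") & trigrams("dentist")))  # ['den', 'ent']
print(lin_trigram_similarity("student", "dentist"))        # 1/7
```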
12 LEXICOGRAPHIC SIMILARITY ALGORITHMS (5/9)
- Longest Common Subsequence (LCSs): the longest common subsequence of characters, which need not be contiguous.
  - e.g. s = houseboat, t = computer, LCSs = out
- Longest Common Substring (LCSt): the longest common subsequence of contiguous characters.
  - e.g. s = hello, t = aloha, LCSt = lo
- Q-Grams: use of a window of Q characters within which the comparison is made.
  - e.g. 2-grams, 3-grams, …, N-grams
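Both LCS notions are standard dynamic-programming exercises. The sketch below uses illustrative function names and returns the length for LCSs and the actual string for LCSt. The LCSs function returns only a length because several subsequences of the same length may exist.

```python
def lcs_subsequence_length(s: str, t: str) -> int:
    """Length of the longest common (not necessarily contiguous) subsequence."""
    prev = [0] * (len(t) + 1)
    for a in s:
        curr = [0]
        for j, b in enumerate(t, start=1):
            curr.append(prev[j - 1] + 1 if a == b else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def longest_common_substring(s: str, t: str) -> str:
    """Longest run of contiguous characters shared by s and t (first one found)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(t) + 1)
    for i, a in enumerate(s, start=1):
        curr = [0] * (len(t) + 1)
        for j, b in enumerate(t, start=1):
            if a == b:
                curr[j] = prev[j - 1] + 1      # extend the run ending at (i-1, j-1)
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return s[best_end - best_len:best_end]


print(lcs_subsequence_length("houseboat", "computer"))  # 3, e.g. "out"
print(longest_common_substring("hello", "aloha"))       # lo
```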
13 LEXICOGRAPHIC SIMILARITY ALGORITHMS (6/9)
- Ratcliff-Obershelp: computes the similarity of two strings as twice the number of common characters divided by the total number of characters in the two strings. The common characters are those that belong to the longest common subsequence (LCS), plus the common characters in the region that does not belong to the LCS.
  - e.g. for s = ALEXANDRE and t = ALEKSANDER:
  - LCS = ALEANDE, plus the R
  - hence Sim(s,t) = $\frac{2 \cdot 8}{9 + 10} \approx 0.84$
14 LEXICOGRAPHIC SIMILARITY ALGORITHMS (7/9)
- Yang-Yuan-Zhao-Chun-Peng: use the moving character window technique to estimate the degree of similarity.
- The numeric similarity value is derived from the expression
  - Similarity = …
  - where SSNC = …, w = window size (1..min(m,n))
  - and n, m are the lengths of the strings.
- e.g. for s = abc de and t = abc k de:
  - Sim(s,t) = 0.638
15 LEXICOGRAPHIC SIMILARITY ALGORITHMS (8/9)
- Soundex Algorithm: based on the idea that spelling variations within certain groups of syllables or letters still leave the words that contain them sounding alike. It assigns each name a four-character code that starts with a letter and is followed by three digits. Similar words get the same code.
  - e.g. s1 = Darwin, s2 = Davinson, s3 = Derwin
  - Darwin → Drn → D65 → D650
  - Davinson → Dvnsn → D1525 → D152
  - Derwin → Drn → D65 → D650
- Letter codes:
  - i) "1" to B, F, P, V
  - ii) "2" to C, G, J, K, Q, S, X, Z
  - iii) "3" to D, T
  - iv) "4" to L
  - v) "5" to M, N
  - vi) "6" to R
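The coding steps above can be sketched in Python. This is a minimal version of the common American Soundex rules, which is an assumption since the slide does not list them all: vowels are dropped, adjacent letters with the same code collapse, and H and W do not break such a run. The name `soundex` is illustrative.

```python
# Letter-to-digit table from the slide
CODES = {
    letter: digit
    for digit, letters in {
        "1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R",
    }.items()
    for letter in letters
}


def soundex(name: str) -> str:
    """Return the letter-plus-three-digits Soundex code of a name."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    digits = []
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        code = CODES.get(c, "")
        if code and code != prev:
            digits.append(code)
        if c not in "HW":          # H and W do not separate equal codes
            prev = code
    return (letters[0] + "".join(digits) + "000")[:4]   # pad or truncate to 4


for name in ("Darwin", "Davinson", "Derwin"):
    print(name, soundex(name))   # D650, D152, D650
```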
16 LEXICOGRAPHIC SIMILARITY ALGORITHMS (9/9)
- Token-Based Methods: extract the similarity of two entities that each consist of a set of elements (tokens). Some of them also use statistics from text corpora or probability values (e.g. TFIDF, Fellegi-Sunter, etc).
- The simplest method is the Jaccard metric, which extracts the degree of similarity from the relation
  - Sim(s,t) = $\frac{|S \cap T|}{|S \cup T|}$
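A token-level Jaccard measure takes only a few lines. The sketch assumes the standard definition |S ∩ T| / |S ∪ T| over whitespace-separated, lower-cased tokens. The tokenisation choice and the function name are illustrative, not taken from the presentation.

```python
def jaccard(s: str, t: str) -> float:
    """Jaccard similarity of the token sets of two strings."""
    a, b = set(s.lower().split()), set(t.lower().split())
    if not a and not b:
        return 1.0                     # two empty token sets are treated as identical
    return len(a & b) / len(a | b)


print(jaccard("real estate agent", "real estate office"))  # 0.5
```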
17 PROPOSALS (1/2)
- Q-grams series: in the ideal case the strings will have N(N+1)/2 common subsequences. Hence
  - Similarity = …
  - where L is the smaller of the two string lengths.
- Combination of the Jaro-Winkler algorithm with
  - LCSs
  - LCSt
  - Common bi-grams
  - Common tri-grams
18 PROPOSALS (2/2)
- Combination of the Dice algorithm with
  - LCSt
  - Common bi-grams
- Use of the LCSs and LCSt methods.
  - Similarity = …
19 ALGORITHMS USED
- Lin Second Measure
- Maedche-Staab
- Jaro
- Jaro-Winkler
- Jaro-Winkler LCSs
- Jaro-Winkler LCSt
- Jaro-Winkler bi-grams
- Jaro-Winkler tri-grams
- Smith-Waterman
- Needleman-Wunsch
- Q-grams series
- Dice
- Dice bi-grams
- Dice LCSt
- Simple LCSs
- Simple LCSt
The Ratcliff-Obershelp and Yang-Yuan-Zhao-Chun-Peng algorithms were used in the extensions of the original algorithms.
20 SEMANTIC SIMILARITY
PART 2
- Determines the common and the differing elements of two entities.
- The elements considered are the structural concepts of each entity.
- Assigns a numeric similarity value.
- This value cannot represent the psychological dimension of the similarity of two entities.
21 SEMANTIC SIMILARITY vs SEMANTIC RELATEDNESS
- They are different concepts.
- e.g. the concept car is related to the concept petrol. The car is more similar to the concept bicycle, because they share more common elements, such as having wheels, moving, etc.
22 CATEGORIES
- Ontology Based: use of an ontology (e.g. WordNet) and of the relations that exist among the concepts.
- Corpus Based: use of a text corpus to extract statistics for each concept.
- Information Content Approaches: use of the Information Content (IC) of the concepts. A hybrid approach. Text corpora are usually used.
- Dictionary Based: use of machine-readable dictionaries to establish the relations between the concepts.
23 SEMANTIC SIMILARITY REFERENCES
- References
  - Evaluating WordNet-based Measures of Lexical Semantic Relatedness, A. Budanitsky, G. Hirst, Computational Linguistics.
  - Computational Models of Similarity in Lexical Ontologies, N. Seco, MSc Thesis.
24 DATA RETRIEVAL
- The electronic dictionary WordNet is used.
- Use of the noun hierarchy.
- It contains 79689 noun synonym sets (synsets).
- Use of the relations between the synonym sets (hypernyms, hyponyms, meronyms, etc.)
- The most important relation is is-a-kind-of (hypernyms/hyponyms), which indicates that one concept is a specialisation of another.
25 WORDNET - EXAMPLES
The noun good has 3 senses (first 3 from tagged texts)
1. (11) good -- (benefit; "for your own good"; "what's the good of worrying?")
2. (9) good, goodness -- (moral excellence or admirableness; "there is much good to be found in people")
3. (6) good, goodness -- (that which is good or valuable or useful; "weigh the good against the bad"; "among the highest goods of all are happiness and self-realization")
10 senses of bank
Sense 1
depository financial institution, bank, banking concern, banking company -- (a financial institution that accepts deposits and channels the money into lending activities; "he cashed a check at the bank"; "that bank holds the mortgage on my home")
  => financial institution, financial organization, financial organisation -- (an institution (public or private) that collects funds (from the public or other institutions) and invests them in financial assets)
    => institution, establishment -- (an organization founded and united for a specific purpose)
      => organization, organisation -- (a group of people who work together)
        => social group -- (people sharing some social relation)
          => group, grouping -- (any number of entities (members) considered as a unit)
26 SEMANTIC SIMILARITY ALGORITHMS (1/7)
- Leacock-Chodorow: uses the hypernym/hyponym relations to measure the path length.
  - $sim_{lch}(c_1,c_2) = -\log\left(\frac{len(c_1,c_2)}{2 \cdot D_{max}}\right)$, $D_{max}$ = Depth
- Rada: the distance depends on the number of edges that separate the two concepts.
  - dist(c1,c2) = number of edges separating c1 and c2
- Path Length: the similarity is the inverse of the shortest path length.
  - $sim_{path}(c_1,c_2) = \frac{1}{len(c_1,c_2)}$
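27 SEMANTIC SIMILARITY ALGORITHMS (2/7)
- Wu-Palmer: relies on the distance between the two concepts and on the depth at which they lie in the hierarchy.
  - $sim_{wup}(c_1,c_2) = \frac{2 \cdot N_3}{N_1 + N_2 + 2 \cdot N_3}$, with $N_1$, $N_2$ the distances of $c_1$, $c_2$ from their common parent and $N_3$ the depth of the common parent
- Wu-Palmer-Resnik: relies exclusively on the depth of the concepts and of the common parent.
  - $sim_{rwup}(c_1,c_2) = \frac{2 \cdot depth(LCS(c_1,c_2))}{depth(c_1) + depth(c_2)}$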
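The path-based measures can be tried on a toy hierarchy. Everything in the sketch is hypothetical: the five-concept taxonomy echoes the car/bicycle/petrol example from the similarity vs relatedness slide and is not WordNet. Depth is counted in nodes with the root at depth 1, and the Wu-Palmer function uses the depth-only form 2·depth(LCS) / (depth(c1) + depth(c2)).

```python
# Hypothetical is-a hierarchy: child -> parent
PARENT = {
    "vehicle": "entity", "fuel": "entity",
    "car": "vehicle", "bicycle": "vehicle", "petrol": "fuel",
}


def path_to_root(concept: str) -> list[str]:
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(concept: str) -> int:
    """Number of nodes from the concept up to the root (root has depth 1)."""
    return len(path_to_root(concept))


def lcs(c1: str, c2: str) -> str:
    """Least common subsumer: the deepest ancestor shared by both concepts."""
    ancestors = set(path_to_root(c1))
    return next(a for a in path_to_root(c2) if a in ancestors)


def rada_distance(c1: str, c2: str) -> int:
    """Number of edges on the shortest is-a path between the concepts."""
    return depth(c1) + depth(c2) - 2 * depth(lcs(c1, c2))


def wu_palmer(c1: str, c2: str) -> float:
    return 2 * depth(lcs(c1, c2)) / (depth(c1) + depth(c2))


print(rada_distance("car", "bicycle"), wu_palmer("car", "bicycle"))  # 2 edges, 2/3
print(rada_distance("car", "petrol"), wu_palmer("car", "petrol"))    # 4 edges, 1/3
```

In this toy tree the car is closer to the bicycle than to petrol, matching the intuition that similarity and relatedness differ.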
28 SEMANTIC SIMILARITY ALGORITHMS (3/7)
- Resnik: based on the Information Content of the common parent (LCS, Least Common Subsumer).
  - $IC(c) = -\log(p(c))$, where p(c) is derived from a text corpus: $p(c) = freq(word)/N$, with N the total number of words.
  - Hence $sim_{res}(c_1,c_2) = IC(LCS(c_1,c_2))$
- Jiang-Conrath: addresses the drawbacks of the previous method. The similarity is derived from the information content of the concepts and of the common parent.
  - Hence $dist_{jcn}(c_1,c_2) = IC(c_1) + IC(c_2) - 2 \cdot IC(LCS(c_1,c_2))$
29 SEMANTIC SIMILARITY ALGORITHMS (4/7)
- Lin: a variation of the previous method.
  - $sim_{lin}(c_1,c_2) = \frac{2 \cdot IC(LCS(c_1,c_2))}{IC(c_1) + IC(c_2)}$
- Tversky: uses set theory to derive the final value.
  - $sim_{tvr}(c_1,c_2) = x \cdot f(\Psi(c_1) \cap \Psi(c_2)) - y \cdot f(\Psi(c_1) \setminus \Psi(c_2)) - z \cdot f(\Psi(c_2) \setminus \Psi(c_1))$, with x, y, z parameters.
  - $f(\Psi(c_1) \cap \Psi(c_2))$: assigns a value to the intersection
  - $f(\Psi(c_1) \setminus \Psi(c_2))$: assigns a value to the difference (elements of the 1st concept that do not exist in the 2nd)
  - $f(\Psi(c_2) \setminus \Psi(c_1))$: assigns a value to the difference (elements of the 2nd concept that do not exist in the 1st)
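The three information-content measures differ only in how they combine IC(c1), IC(c2) and IC(LCS). The sketch below uses made-up probabilities. The table `P` is purely hypothetical and would normally come from corpus counts, and the common parent is passed in explicitly to keep the example short.

```python
import math

# Hypothetical concept probabilities (p(c) = freq / N in a real corpus)
P = {"entity": 1.0, "vehicle": 0.1, "car": 0.02, "bicycle": 0.01}


def ic(concept: str) -> float:
    """Information content: IC(c) = -log p(c)."""
    return -math.log(P[concept])


def resnik(common_parent: str) -> float:
    return ic(common_parent)


def jiang_conrath_distance(c1: str, c2: str, common_parent: str) -> float:
    return ic(c1) + ic(c2) - 2 * ic(common_parent)


def lin(c1: str, c2: str, common_parent: str) -> float:
    return 2 * ic(common_parent) / (ic(c1) + ic(c2))


print(resnik("vehicle"))
print(jiang_conrath_distance("car", "bicycle", "vehicle"))
print(lin("car", "bicycle", "vehicle"))
```

Note that Resnik's value depends only on the common parent, so every pair under "vehicle" scores the same. Jiang-Conrath and Lin also take the concepts themselves into account.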
30 SEMANTIC SIMILARITY ALGORITHMS (5/7)
- Lesk: based on the descriptions (glosses) of the concepts.
- Word 1: pine, Senses: 2
  - Sense 1: kind of evergreen tree with needle-shaped leaves
  - Sense 2: waste away through sorrow or illness
- Word 2: cone, Senses: 3
  - Sense 1: solid body which narrows to a point
  - Sense 2: something of this shape whether solid or hollow
  - Sense 3: fruit of certain evergreen tree
- Extended Lesk: also searches the descriptions of the neighbouring concepts.
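31 SEMANTIC SIMILARITY ALGORITHMS (6/7)
- Rodriguez-Egenhofer: uses Tversky's set theory.
  - $S(s,t) = \frac{|S \cap T|}{|S \cap T| + \alpha(s,t)\,|S \setminus T| + (1 - \alpha(s,t))\,|T \setminus S|}$
  - $\alpha(s,t) = \frac{depth(s)}{depth(s) + depth(t)}$ when $depth(s) \le depth(t)$
  - $\alpha(s,t) = 1 - \frac{depth(s)}{depth(s) + depth(t)}$ when $depth(s) > depth(t)$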
32 SEMANTIC SIMILARITY ALGORITHMS (7/7)
- Li-Zuhair-Bandar-McLean: experimented with 10 metrics that are linear or non-linear combinations of the information content, the depth (M), the shortest path length (l), the depth of the common parent (h), the local density of the two concepts (d) and various parameters that lie in the interval [0,1] (a, b, γ).
  - S1 = 2·M - l
  - S2 = a·S1 + b·d
  - S3 = $e^{-al}$
  - S4 = $e^{-al}$ · …
  - S5 = S4 + γ·IC(LCS(c1,c2))
  - S6 = S1 · …
  - S7 = S2 · …
  - S8 = S3 · …
  - S9 = S4 · …
  - S10 = …
33 SECO VARIATIONS
- In his thesis, Seco proposed deriving the information content from the number of hyponyms that a concept has in the WordNet hierarchy.
  - $IC_{wn}(c) = 1 - \frac{\log(hypo(c) + 1)}{\log(max_{wn})}$
  - where $max_{wn}$ is the maximum number of concepts
- Based on the above, the similarity under Tversky's method is proposed to be computed as
  - $sim_{tvr}(c_1,c_2) = 3 \cdot IC(LCS(c_1,c_2)) - IC(c_1) - IC(c_2)$
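Seco's intrinsic information content needs only hyponym counts, so it can be sketched without a corpus. The formula used here, IC_wn(c) = 1 - log(hypo(c) + 1) / log(max_wn), is the commonly cited form and is an assumption as far as this deck is concerned. The hyponym counts in the usage lines are invented, and 79689 is the noun synset count quoted earlier in the presentation.

```python
import math

MAX_WN = 79689  # number of noun synsets quoted in the presentation


def ic_seco(hyponym_count: int, max_wn: int = MAX_WN) -> float:
    """Intrinsic IC: leaves score 1, a root subsuming everything scores 0."""
    return 1.0 - math.log(hyponym_count + 1) / math.log(max_wn)


def tversky_seco(ic_c1: float, ic_c2: float, ic_lcs: float) -> float:
    """sim_tvr = 3 * IC(LCS) - IC(c1) - IC(c2)."""
    return 3 * ic_lcs - ic_c1 - ic_c2


leaf = ic_seco(0)                 # concept with no hyponyms
parent = ic_seco(40)              # hypothetical parent with 40 hyponyms
print(leaf, parent)
print(tversky_seco(leaf, leaf, parent))
```

For c1 = c2 the expression reduces to IC(c), so under this formula a concept compared with itself scores its own information content rather than 1.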
34 ALGORITHMS USED
- Leacock-Chodorow
- Jiang-Conrath
- Lin
- Wu-Palmer
- Wu-Palmer-Resnik
- Tversky
- S1
- S2
- S3
- S4
- S5
- S10
- Simple Distance
- Rada
The Seco variation is used to compute the information content.
35 METHODOLOGY
- Main goal: comparing strings that come from the field of ontologies and databases.
- The plain average of the algorithms' results is taken.
- Problems arise, mainly when extracting the semantic similarity, because the strings may not be valid words.
- The solution is to split the strings.
- If not even one valid word results, the outcome relies exclusively on the lexicographic similarity.
36 SPLITTING STRINGS
- We distinguish two cases:
- The strings contain special symbols such as _ etc., or contain digits or capital letters at some points.
  - The split is made at those characters.
- The strings contain no special characters, digits or capital letters.
  - The split is made into substrings of 3 or more letters that are valid WordNet entries. The resulting strings are examined for similarity using the Monge-Elkan algorithm.
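The first splitting case (special symbols, digits, capital letters) can be sketched with regular expressions. The exact rules the tool applied are not given, so the ones below are assumptions: split on any non-alphanumeric character, keep digit runs as separate tokens, and start a new token at each capital letter while keeping acronyms together. `split_identifier` is an illustrative name. The second case needs a WordNet lookup and is not covered here.

```python
import re


def split_identifier(name: str) -> list[str]:
    """Split a schema or ontology label into lower-cased tokens."""
    tokens = []
    for part in re.split(r"[^A-Za-z0-9]+", name):      # special symbols
        tokens.extend(
            # acronym | capitalised or lower-case word | digit run
            re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", part)
        )
    return [token.lower() for token in tokens]


print(split_identifier("social_science"))   # ['social', 'science']
print(split_identifier("courseTitle2"))     # ['course', 'title', '2']
print(split_identifier("RDFSchema"))        # ['rdf', 'schema']
```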
37 MONGE-ELKAN ALGORITHM
- The similarity of two sets of elements A, B is expressed as follows:
  - $match(A,B) = \frac{1}{|A|} \sum_{i=1}^{|A|} \max_{j=1}^{|B|} match(A_i, B_j)$
- The algorithm is not symmetric.
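A direct transcription of the usual Monge-Elkan definition (the average, over the tokens of A, of the best match found in B) makes the asymmetry easy to see. The inner measure `exact` is a deliberately trivial stand-in so the output is easy to verify. In practice any of the string measures from the first part could be passed instead. Both function names are illustrative.

```python
from typing import Callable, Sequence


def monge_elkan(a_tokens: Sequence[str], b_tokens: Sequence[str],
                inner: Callable[[str, str], float]) -> float:
    """Average over A of the best inner similarity against any token of B."""
    if not a_tokens or not b_tokens:
        return 0.0
    return sum(max(inner(a, b) for b in b_tokens) for a in a_tokens) / len(a_tokens)


def exact(a: str, b: str) -> float:
    """Toy inner measure: 1.0 for identical tokens, else 0.0."""
    return 1.0 if a == b else 0.0


print(monge_elkan(["real", "estate"], ["estate"], exact))  # 0.5
print(monge_elkan(["estate"], ["real", "estate"], exact))  # 1.0
```

Swapping the arguments changes the result, which is the asymmetry noted on the slide.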
38 VALIDATION
- The data come from:
- The computer science department of the University of Illinois, which provides data for matching schemas and ontology elements. The university course matching data and the Real Estate domain data are used.
- The University of Berlin, where the D2R language is described. The example of mapping the database of a CD shop to an RDF schema is used.
39 RESULTS (1/4)
- The data files contain 116 correspondences.
- 88 were identified correctly, with a similarity above 75% (the threshold that was set). The success rate is 75.86%.
- The number of algorithms proves to be quite large, especially when a large volume of data has to be processed.
- The variation proposed by Seco proved to be a rather heavy procedure, especially for such a large amount of data.
40 RESULTS (2/4)
- Useful conclusions were drawn about the behaviour of the algorithms, especially those for lexicographic similarity.
- To obtain a more objective value, a non-linear combination of lexicographic and semantic similarity is advisable.
- To obtain an objective similarity value, a combination of the algorithms' values is advisable (perhaps based on the strings themselves!!).
41 RESULTS (3/4)
First String: student, Second String: dentist
LIN SECOND                 0.14285714285714285
MAEDCHE STAAB              0.14285714285714285
JARO                       0.6
JARO WINKLER               0.76
JARO WINKLER LCSSt         0.72
JARO WINKLER LCSSs         0.76
JARO WINKLER TRIGRAMS      0.6799999999999999
JARO WINKLER BIGRAMS       0.76
SMITH WATERMAN             0.5714285714285714
NEEDLEMAN WUNCH            0.5714285714285714
Q GRAMS SERIES             0.5357142857142857
DICE (COMMON CHARS)        0.8571428571428571
DICE (COMMON BIGRAMS)      0.6666666666666666
DICE LCSSt                 0.5714285714285714
LCSSt                      0.5714285714285714
LCSSs                      0.5714285714285714
RATCLIFF OBERSHELP         7 1.0
YANG YUAN ZHAO CHUN PENG   0.5714285714285714
42 RESULTS (4/4)
First String: <archaeology>, Second String: <social_science>
-------------------------------------------------------------
1. Leacock-Chodorow 0.8339850002884617
2. Jiang-Conrath 0.9513280478322936
3. Lin 0.9315807363671041
4. Wu-Palmer 0.875
5. Wu-Palmer-Resnik 0.875
6. Tversky(Seco) 0.8551206815796943
7. S1 0.9444444444444444
8. S2 0.413326816552623
9. S3 0.6065306597126334
10. S4 0.6062579431847053
11. S5 0.6725285380921548
12. S10 0.9999983369439447
13. Simple Distance 0.5
14. Rada et Al 0.3333333333333333
Average of All measures = 0.7427453241665282
Max Similarity = 0.9999983369439447 at 12
First String: <social_science>, Second String: <social_science>
-------------------------------------------------------------
1. Leacock-Chodorow 1.0
2. Jiang-Conrath 1.0
3. Lin 1.0
4. Wu-Palmer 1.0
5. Wu-Palmer-Resnik 1.0
6. Tversky(Seco) 0.8875686496914987
7. S1 1.0
8. S2 1.0
9. S3 1.0
10. S4 0.9995503664595333
11. S5 1.0
12. S10 0.9999983369439447
13. Simple Distance 1.0
14. Rada et Al 1.0
Average of All measures = 0.9919369537924984
Max Similarity = 1.0 at 1
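The summary lines of the first run can be recomputed from the fourteen individual scores. The sketch below simply takes the plain average described in the methodology and the maximum. The dictionary copies the archaeology / social_science values from this slide, and `summarize` is an illustrative name.

```python
# Scores reported for archaeology vs social_science
SCORES = {
    "Leacock-Chodorow": 0.8339850002884617,
    "Jiang-Conrath": 0.9513280478322936,
    "Lin": 0.9315807363671041,
    "Wu-Palmer": 0.875,
    "Wu-Palmer-Resnik": 0.875,
    "Tversky(Seco)": 0.8551206815796943,
    "S1": 0.9444444444444444,
    "S2": 0.413326816552623,
    "S3": 0.6065306597126334,
    "S4": 0.6062579431847053,
    "S5": 0.6725285380921548,
    "S10": 0.9999983369439447,
    "Simple Distance": 0.5,
    "Rada et Al": 0.3333333333333333,
}


def summarize(scores: dict[str, float]) -> tuple[float, str, float]:
    """Return (plain average, name of the best measure, best score)."""
    average = sum(scores.values()) / len(scores)
    best = max(scores, key=scores.get)
    return average, best, scores[best]


average, best, best_score = summarize(SCORES)
print(round(average, 6), best, best_score)   # 0.742745 S10 0.9999983369439447
```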
43 EXTENSIONS (1/2)
- Creation of a tool for selecting algorithms
44 EXTENSIONS (2/2)
- Extracting results for Precision, Recall and F-measure, and comparing them with other work.
- Examining the effect of the threshold (75% was chosen) on those values.
- Assigning specific weights to the algorithms automatically, based on certain characteristics of the entities.
- Excluding algorithms based on their results (e.g. using the variance).
- Describing all the algorithms in terms of one common notion (e.g. LCSs), or selecting algorithms that can be described with one common notion.
- Using specific algorithms depending on the data to be processed, so as to avoid any drawbacks that are observed.
45 OBSERVATIONS ON THE LEXICOGRAPHIC SIMILARITY ALGORITHMS (1/2)
- The following hold:
  - LCSs = (m + n - Lev)/2 (exact when Lev counts only insertions and deletions)
  - LCSs = a1 + LCSt + a2, with a1 between 0 .. indexof(LCSt)-1 and a2 between 0 .. indexof(last character)-(indexof(LCSt)+length(LCSt))-1
  - LCSt ≤ LCSs ≤ min(m,n)
  - We want Lev(s,t) → max(m,n) - min(m,n) → 0
- All of the above help us set some value bounds.
46 OBSERVATIONS ON THE LEXICOGRAPHIC SIMILARITY ALGORITHMS (2/2)
- As the Levenshtein distance increases, the length of the LCSs decreases. Therefore algorithms that are based on these methods should be avoided. But which value bounds should be used?
- It is better for the lengths of the compared strings to be equal or close, so that a small distance between them is more likely.
- The closer the LCSt gets to the LCSs, and the closer both get to min(m,n), the smaller the Levenshtein distance and hence the higher the degree of similarity.
- When comparing strings that are not valid words, or combinations or parts of valid words, algorithms that compare bigrams or, even worse, trigrams should be avoided.
47 CONCLUSION
- Characterising the lexicographic similarity algorithms in terms of LCSt, LCSs, Lev, max(m,n), min(m,n) → a difficult undertaking.
- To select an algorithm, one would probably have to build or find a function that takes the above values as arguments and either produces a weight to assign to each algorithm or rules out the use of specific metrics.
- Excluding algorithms whose values disagree with the values that experts would give (such work exists, but not for so many algorithms) → it is difficult to build the data on which the experts would be asked to give similarity values, and to decide which values they should assign to each pair of strings.
48 THE END!!