Title: Performance Evaluation Metrics
1 Performance Evaluation Metrics
- Based on the book and slides by R. Baeza-Yates, B. Ribeiro-Neto, Modern Information Retrieval, Addison Wesley, 1999 (second edition, 2011, http://mir2ed.org/)
2 Classic Metrics (Performance Evaluation)
- Time and space complexity of the indexing structures
- Communication with the operating system
- Delays in communication channels
- Overheads from multiple software layers
3 Specific Metrics (Retrieval Performance Evaluation)
- A reference (test) collection, consisting of:
-- a collection of documents
-- a set Q of example information needs
-- a set of relevant documents for each q ∈ Q
- A suitable retrieval performance measure
4 Reference Collections
- TREC (TREC evaluation collections: WSJ (Wall Street Journal), AP (Associated Press), ZIFF, FR, DOE, PATents)
- GOV2 (25-million-page GOV2 web page collection, TREC Terabyte track)
- NTCIR (NII Test Collections for IR systems, focusing on East Asian and cross-language information retrieval)
- CLEF (Cross Language Evaluation Forum, http://www.clef-campaign.org)
- Reuters (Reuters-21578 and Reuters Corpus Volume 1 collections)
- Cranfield (1398 abstracts of aerodynamics journal articles, 225 queries)
- CACM collection
- ISI (Institute of Scientific Information) collection
- Newsgroups
5 Recall and Precision
Let I be an example information need and R the set of its relevant documents. Suppose that a given retrieval strategy produces an answer set of documents A. Let Ra be the set of documents that belong to both R and A.
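Recall and precision follow directly from these sets: recall is $|Ra|/|R|$, the fraction of the relevant documents that were retrieved, and precision is $|Ra|/|A|$, the fraction of the retrieved documents that are relevant (the standard definitions in the book cited on slide 1). The minimal Python sketch below assumes documents are identified by string ids; the function name `precision_recall` and the sample sets are hypothetical.

```python
def precision_recall(relevant: set[str], answer: set[str]) -> tuple[float, float]:
    """Return (precision, recall) of the answer set A against the relevant set R."""
    ra = relevant & answer  # Ra = R ∩ A
    precision = len(ra) / len(answer) if answer else 0.0
    recall = len(ra) / len(relevant) if relevant else 0.0
    return precision, recall


R = {"d1", "d3", "d5", "d7"}  # hypothetical relevant set
A = {"d1", "d2", "d3"}        # hypothetical answer set
print(precision_recall(R, A))  # (0.6666666666666666, 0.5)
```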
6 Precision/Recall Relationship
[Figure: the collection D, the relevant set R, the answer set A and their intersection Ra]
7 Precision/Recall Relationship
8 Plotting the Curves
Let q be a query from the set of example information needs, and let Rq be the set of documents relevant to q, as determined by experts. For example, assume that Rq contains the following documents: Rq = {d1, d3, d5, d7, d9, d13, d21, d41, d43, d45}. Suppose a retrieval algorithm returns the following ranking for q:
1. d7
2. d2
3. d3
4. d6
5. d8
6. d5
7. d28
8. d12
9. d22
10. d13
11. d4
12. d40
13. d10
14. d36
15. d1
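Walking down this ranking and checking each document against Rq gives the (recall, precision) pairs from which the curve is drawn. A small Python sketch, assuming that a document is relevant exactly when it belongs to Rq; the helper name `pr_points` is hypothetical.

```python
RQ = {"d1", "d3", "d5", "d7", "d9", "d13", "d21", "d41", "d43", "d45"}
RANKING = ["d7", "d2", "d3", "d6", "d8", "d5", "d28", "d12",
           "d22", "d13", "d4", "d40", "d10", "d36", "d1"]


def pr_points(ranking: list[str], relevant: set[str]) -> list[tuple[int, float, float]]:
    """Return (rank, recall, precision) at every rank that holds a relevant document."""
    points = []
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            points.append((rank, hits / len(relevant), hits / rank))
    return points


for rank, recall, precision in pr_points(RANKING, RQ):
    print(f"rank {rank:2d}: recall {recall:.0%}, precision {precision:.1%}")
```

For this ranking the relevant documents appear at ranks 1, 3, 6, 10 and 15, giving recall/precision of 10%/100%, 20%/66.7%, 30%/50%, 40%/40% and 50%/33.3%. Recall never exceeds 50% because only five of the ten relevant documents are retrieved.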
9 Plotting the Curves
Assuming that 30 documents are returned, plot the recall/precision curves for the following results (the number of relevant documents and their positions in the result list are given):
- Engine 1: 10 relevant documents, at positions 1, 5, 7, 8, 9, 13, 17, 26, 27, 28
- Engine 2: 10 relevant documents, at positions 2, 3, 4, 5, 7, 10, 11, 12, 16, 27
Based on the two resulting curves, compare the two engines.
10 Plotting the Curves
- The curve is usually based on 11 standard recall levels, 0%, 10%, ..., 100%; at each level, precision is computed with the following interpolation procedure. Let $r_j$, $j \in \{0, 1, 2, \dots, 10\}$, be the j-th standard recall level; then
- $P(r_j) = \max_{r_j \le r \le r_{j+1}} P(r)$, i.e., the interpolated precision at $r_j$ is the highest known precision at any recall level between $r_j$ and $r_{j+1}$
- Evaluation steps (typical for TREC)
-- compute the interpolated precision at recall levels 0.0, 0.1, ..., 1.0
-- do so for every query in each evaluation benchmark
-- average the values over all queries at each recall level
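The sketch below computes 11-point interpolated precision for the two engines of the slide 9 exercise and averages curves level by level. Note one assumption: it uses the common TREC convention $P(r_j) = \max_{r \ge r_j} P(r)$, which takes the maximum over all higher recall levels instead of only up to $r_{j+1}$ as in the slide 10 formula. The two agree whenever that maximum is reached inside the interval, and the TREC form also yields a value when no measured point falls in an interval. The function names are hypothetical.

```python
def interpolated_11pt(relevant_positions: list[int], num_relevant: int) -> list[float]:
    """11-point interpolated precision, TREC convention: max precision at recall >= r_j."""
    # (recall, precision) measured at each relevant document, in rank order
    points = [(hits / num_relevant, hits / pos)
              for hits, pos in enumerate(sorted(relevant_positions), start=1)]
    curve = []
    for j in range(11):
        level = j / 10  # standard recall levels 0.0, 0.1, ..., 1.0
        curve.append(max((p for r, p in points if r >= level), default=0.0))
    return curve


def mean_curve(curves: list[list[float]]) -> list[float]:
    """Average per-query curves level by level (the last TREC step)."""
    return [sum(values) / len(values) for values in zip(*curves)]


engine1 = interpolated_11pt([1, 5, 7, 8, 9, 13, 17, 26, 27, 28], 10)
engine2 = interpolated_11pt([2, 3, 4, 5, 7, 10, 11, 12, 16, 27], 10)
for j, (p1, p2) in enumerate(zip(engine1, engine2)):
    print(f"recall {j * 10:3d}%: engine 1 {p1:.3f}, engine 2 {p2:.3f}")
```

Under this convention engine 1 is ahead only at recall 0% and 10% (its first relevant document is at rank 1), while engine 2 has the higher interpolated precision at every level from 20% to 100%.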
11 Summarizing the Curves
- Average precision at each retrieved relevant document; averaged over all queries this gives Mean Average Precision (MAP), used in the more recent TREC conferences
-- it can also be viewed as approximating the total area under the precision/recall curve
- R-Precision
-- a single summary value, computed as the precision at the R-th position of the ranking, where R is the total number of documents relevant to the current query (i.e., the number of documents in the set Rq)
- Precision histograms
-- let $RP_A(i)$ and $RP_B(i)$ be the R-precision values of two retrieval algorithms A and B for the i-th query; we define the difference $RP_{A/B}(i) = RP_A(i) - RP_B(i)$, so a positive value means that A did better on query i
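The three summaries can be sketched in Python as follows, reusing the example Rq and ranking of slide 8. Two assumptions are labeled in the code. First, average precision divides by |Rq| by default (the usual MAP convention, where unretrieved relevant documents count as zero), and divides by the number of relevant documents actually retrieved when `seen_only=True`, which matches averaging only over retrieved relevant documents. Second, the R-precision values fed to the histogram are hypothetical. All function names are illustrative.

```python
RQ = {"d1", "d3", "d5", "d7", "d9", "d13", "d21", "d41", "d43", "d45"}
RANKING = ["d7", "d2", "d3", "d6", "d8", "d5", "d28", "d12",
           "d22", "d13", "d4", "d40", "d10", "d36", "d1"]


def average_precision(ranking, relevant, seen_only=False):
    """Average of the precision values at each retrieved relevant document.

    seen_only=False divides by |Rq| (assumed MAP convention);
    seen_only=True divides by the number of relevant documents retrieved.
    """
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    denominator = hits if seen_only else len(relevant)
    return total / denominator if denominator else 0.0


def mean_average_precision(runs):
    """runs: list of (ranking, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def r_precision(ranking, relevant):
    """Precision at position R of the ranking, where R = |Rq|."""
    r = len(relevant)
    return sum(doc in relevant for doc in ranking[:r]) / r if r else 0.0


def precision_histogram(rp_a, rp_b):
    """RP_{A/B}(i) = RP_A(i) - RP_B(i) for every query i (positive: A better)."""
    if len(rp_a) != len(rp_b):
        raise ValueError("A and B must be evaluated on the same queries")
    return [a - b for a, b in zip(rp_a, rp_b)]


print(round(average_precision(RANKING, RQ), 2))                  # 0.29
print(round(average_precision(RANKING, RQ, seen_only=True), 2))  # 0.58
print(r_precision(RANKING, RQ))                                  # 0.4
diffs = precision_histogram([0.6, 0.4, 0.5], [0.4, 0.5, 0.5])    # hypothetical RP values
print([round(d, 2) for d in diffs])                              # [0.2, -0.1, 0.0]
```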
12 Receiver Operating Characteristics
- true positives (tp): retrieved and relevant
- false positives (fp): retrieved and non-relevant
- true negatives (tn): non-retrieved and non-relevant
- false negatives (fn): non-retrieved and relevant
- sensitivity $= tp/(tp + fn)$; false-positive rate, or 1 - specificity, $= fp/(fp + tn)$
- $P = tp/(tp + fp)$, $R = tp/(tp + fn)$ (so sensitivity equals recall)
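A minimal Python sketch of these counts, assuming a ranked list over a collection of known size; the class and function names and the counts in the example are hypothetical. `roc_points` moves the cutoff down the ranking one document at a time. This is one common way of tracing an ROC curve, with the false-positive rate on the x-axis and sensitivity on the y-axis.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int  # retrieved and relevant
    fp: int  # retrieved and non-relevant
    tn: int  # non-retrieved and non-relevant
    fn: int  # non-retrieved and relevant

    def sensitivity(self) -> float:  # also the recall R
        return self.tp / (self.tp + self.fn)

    def false_positive_rate(self) -> float:  # 1 - specificity
        return self.fp / (self.fp + self.tn)

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)


def roc_points(ranking, relevant, collection_size):
    """(false-positive rate, sensitivity) after each cutoff k of the ranking."""
    n_rel = len(relevant)
    n_nonrel = collection_size - n_rel
    points, tp = [(0.0, 0.0)], 0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            tp += 1
        fp = k - tp  # retrieved but not relevant
        points.append((fp / n_nonrel, tp / n_rel))
    return points


c = Counts(tp=4, fp=6, tn=84, fn=6)  # hypothetical: 10 retrieved, 10 relevant, 100 documents
print(c.sensitivity(), c.precision())  # 0.4 0.4
```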
13 Suitability of Precision/Recall
- They require detailed knowledge of all the documents in the collection, which is not available for large collections
- They are two measures, whereas reporting a single measure is usually more convenient
- In modern systems, the interface and the interaction with the user are key to processing a query, which makes measures that take the user into account essential
- Recall and precision are appropriate when the retrieved documents are given a linear ordering; otherwise they may be inaccurate
14 Alternative Metrics
- Harmonic mean
- The E measure
- User-oriented measures
15 Harmonic Mean
The harmonic mean F of recall and precision is defined as
$F(j) = \frac{2}{\frac{1}{R(j)} + \frac{1}{P(j)}}$
where R(j) is the recall at the j-th document in the ranking, P(j) is the precision at the j-th document in the ranking, and F(j) is the harmonic mean of R(j) and P(j). The rationale for this choice is that the harmonic mean is pulled toward the smaller of the two values rather than the larger, so F(j) is high only when both recall and precision are high.
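A direct Python transcription of F(j). It assumes F is taken as 0 when recall or precision is 0, where the formula is undefined; the function name is hypothetical. The example uses rank 1 of the example ranking on slide 8 (recall 10%, precision 100%) and shows the harmonic mean staying close to the smaller value.

```python
def harmonic_mean(recall: float, precision: float) -> float:
    """F(j) = 2 / (1/R(j) + 1/P(j)); taken as 0 when either value is 0 (assumption)."""
    if recall == 0 or precision == 0:
        return 0.0
    return 2 / (1 / recall + 1 / precision)


# Rank 1 of the example ranking: R = 0.1, P = 1.0
print(round(harmonic_mean(0.1, 1.0), 3))  # 0.182, far below the arithmetic mean 0.55
```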
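16 The E Measure
The E measure is defined as
$E(j) = 1 - \frac{1 + b^2}{\frac{b^2}{R(j)} + \frac{1}{P(j)}}$
-- R(j) is the recall at the j-th document in the ranking, P(j) is the precision at the j-th document in the ranking, and b is a user-specified parameter reflecting the relative importance of recall and precision; for b = 1, E(j) = 1 - F(j), where F(j) is the harmonic mean of R(j) and P(j).
-- With $b^2$ attached to the recall term as above, values b > 1 indicate that the user cares more about recall, and values b < 1 that the user cares more about precision (attaching $b^2$ to the precision term reverses this convention).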
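A Python sketch of E(j) with $b^2$ on the recall term, as in $E(j) = 1 - (1 + b^2)/(b^2/R(j) + 1/P(j))$. Taking E as 1 when recall or precision is 0 is an assumption of this sketch, and the function name is hypothetical. Running it for a low-recall, high-precision point (R = 0.2, P = 0.8) shows which side b favors in this form.

```python
def e_measure(recall: float, precision: float, b: float = 1.0) -> float:
    """E(j) = 1 - (1 + b^2) / (b^2 / R(j) + 1 / P(j)); taken as 1 when R or P is 0."""
    if recall == 0 or precision == 0:
        return 1.0
    b2 = b * b
    return 1 - (1 + b2) / (b2 / recall + 1 / precision)


for b in (0.5, 1.0, 2.0):
    print(b, round(e_measure(recall=0.2, precision=0.8, b=b), 3))
```

For b = 0.5, 1 and 2 the sketch gives E = 0.5, 0.68 and about 0.765. As b grows, E moves toward 1 - R = 0.8, so the low recall is penalized more; for b = 1 the result equals 1 - F.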
17 User-Oriented Measures (1)
Let R be the set of relevant documents for the information need I, A the set of retrieved documents, and U ⊆ R the set of documents that the user already knows to be relevant to the query. Let Rk be the intersection of the sets A and U, and let |Ru| be the number of relevant documents, previously unknown to the user, that have been retrieved.
- Coverage ratio
- Novelty ratio
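The two ratios are usually defined (as in the book cited on slide 1) as coverage $= |Rk|/|U|$, the fraction of the documents known to the user that were retrieved, and novelty $= |Ru|/(|Ru| + |Rk|)$, the fraction of the retrieved relevant documents that were new to the user. Below is a Python sketch that assumes R, A and U are given as sets of document ids. The names and sample data are hypothetical.

```python
def coverage_and_novelty(relevant: set[str], answer: set[str],
                         known: set[str]) -> tuple[float, float]:
    """relevant = R, answer = A, known = U (relevant documents the user already knows)."""
    rk = answer & known               # Rk = A ∩ U
    ru = (answer & relevant) - known  # Ru: retrieved relevant documents new to the user
    coverage = len(rk) / len(known) if known else 0.0
    found = len(rk) + len(ru)
    novelty = len(ru) / found if found else 0.0
    return coverage, novelty


R = {"d1", "d2", "d3", "d4", "d5", "d6"}
U = {"d1", "d2", "d3", "d4"}  # U ⊆ R
A = {"d1", "d2", "d5", "d9"}
print(coverage_and_novelty(R, A, U))  # (0.5, 0.3333333333333333)
```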
18 Other Measures
- Relative recall is defined as the ratio of the number of relevant documents retrieved to the number of relevant documents the user expects to find.
- Recall effort is defined as the ratio of the number of relevant documents the user expects to find to the number of documents the user examines until finding them.
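Both quantities depend on how the user reads the results. The sketch below assumes a simple, hypothetical user model: the user reads the ranking from the top and stops after finding the number of relevant documents they expected, or at the end of the list. It reuses the example Rq and ranking of slide 8; the function name is hypothetical.

```python
RQ = {"d1", "d3", "d5", "d7", "d9", "d13", "d21", "d41", "d43", "d45"}
RANKING = ["d7", "d2", "d3", "d6", "d8", "d5", "d28", "d12",
           "d22", "d13", "d4", "d40", "d10", "d36", "d1"]


def simulate_user(ranking: list[str], relevant: set[str], expected: int) -> tuple[float, float]:
    """Return (relative recall, recall effort) for a user expecting `expected` relevant docs."""
    found = examined = 0
    for doc in ranking:
        examined += 1
        if doc in relevant:
            found += 1
            if found == expected:
                break  # the user has found as many relevant documents as expected
    relative_recall = found / expected
    recall_effort = expected / examined if examined else 0.0
    return relative_recall, recall_effort


print(simulate_user(RANKING, RQ, expected=3))  # (1.0, 0.5)
```

With expected=3 the user stops at rank 6, so relative recall is 1 and recall effort is 3/6. With expected=8 the whole list is read and only 5 relevant documents are found, so relative recall drops to 5/8 and recall effort is 8/15.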
19 Other Search Engine Metrics
- How fast does it index
-- number of documents per hour
-- for a given average document size
- How fast does it search
- Expressiveness of the query language
-- ability to express complex information needs
-- speed on complex queries
20 Measuring User Satisfaction
- Key question: which user are we trying to satisfy?
-- it depends on the application
- Web search engine: the user finds what they want and returns to the same engine
-- measure the rate of returning users
- eCommerce site: the user finds what they want and makes a purchase
-- is it the end user or the eCommerce site whose satisfaction we measure?
-- measure the time to purchase, or the fraction of searchers who become buyers
21 Measuring User Satisfaction
- Enterprise (company/govt/academic): care about user productivity
-- how much time do my users save when looking for information?
-- breadth of access, secure access, etc.
22 Web Search Evaluation
- Recall is hard to compute on the Web
- Search engines often use precision at the top k documents, e.g., k = 10, or measures that reward getting the top-ranked pages right
- Search engines also use non-relevance-based measures
-- Example 1: clickthrough on the first result (not very reliable for a single click, but reliable on average)
-- Example 2: newer techniques that are not yet established in the field
-- Example 3: A/B testing
23 A/B Testing
Testing innovative algorithms. Prerequisite: an existing search engine that is already in operation. A small share of the traffic (about 1%) is diverted to a new system that includes the innovation, and the new system is evaluated with an automatic measure such as clickthrough on the first result. Users can then be given the option to switch to the new algorithm.
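A minimal sketch of the two mechanical parts of such a test, under stated assumptions. Users are routed deterministically by hashing a hypothetical user id, so that about 1% see the new system. Clickthrough on the first result is then computed per variant from a hypothetical log of (variant, clicked) pairs. The function names are illustrative only.

```python
import hashlib


def assign_variant(user_id: str, treatment_share: float = 0.01) -> str:
    """Route a stable ~treatment_share of users to 'B' (new system), the rest to 'A'."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:6], "big") / 2**48  # uniform in [0, 1)
    return "B" if bucket < treatment_share else "A"


def first_result_ctr(log):
    """log: iterable of (variant, clicked_first_result) pairs -> clickthrough rate per variant."""
    shown: dict[str, int] = {}
    clicks: dict[str, int] = {}
    for variant, clicked in log:
        shown[variant] = shown.get(variant, 0) + 1
        clicks[variant] = clicks.get(variant, 0) + int(clicked)
    return {v: clicks[v] / shown[v] for v in shown}


log = [("A", True), ("A", False), ("A", True), ("B", True)]  # hypothetical log
print(first_result_ctr(log))
```

Hashing rather than random sampling keeps each user in the same variant across visits, so the clickthrough figures for A and B come from disjoint user groups.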
24 Benchmark collection
- Document collection
-- representative of the documents we manage
- Collection of information needs
-- ... (incorrectly) referred to as queries
-- representative of the needs we expect
- Relevance judgments
-- require human judges (relevance assessors)
-- a costly and time-consuming process
-- judgments must be representative of users' assessments
-- judgments must be consistent with one another
-- the consistency of the judges can be assessed with the kappa measure
-- kappa values from 2/3 to 1 are considered satisfactory
25 The Kappa Measure
- Kappa is a measure of how much two judges agree or disagree
- Designed for categorical judgments
- P(A) is the proportion of cases in which the two judges agree
- P(E) is the proportion of agreement expected by chance
- Kappa is computed as
- $K = \frac{P(A) - P(E)}{1 - P(E)}$
- Both probabilities are computed from the assessment tables of the two judges.
- Using pooled marginals, $P(E) = P(\mathit{relevant})^2 + P(\mathit{nonrelevant})^2$, where the marginal probabilities are estimated from the judgments of both referees taken together.
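A Python sketch of kappa for two judges and binary relevance, with P(E) taken from pooled marginals as described above. The argument layout (counts of the four agreement cells) and the sample counts for 400 documents are hypothetical. The sketch raises an error when P(E) = 1, where kappa is undefined.

```python
def kappa_pooled(both_rel: int, both_nonrel: int, only_j1_rel: int, only_j2_rel: int) -> float:
    """Kappa for two judges; P(E) uses pooled marginals over both judges' judgments."""
    n = both_rel + both_nonrel + only_j1_rel + only_j2_rel
    p_agree = (both_rel + both_nonrel) / n  # P(A)
    # every document is judged twice, so the pooled marginals use 2n judgments
    p_rel = (2 * both_rel + only_j1_rel + only_j2_rel) / (2 * n)
    p_nonrel = 1 - p_rel
    p_chance = p_rel ** 2 + p_nonrel ** 2  # P(E)
    if p_chance == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_agree - p_chance) / (1 - p_chance)


# Hypothetical table for 400 documents
print(round(kappa_pooled(300, 70, 20, 10), 3))  # 0.776
```

Here P(A) = 0.925 and P(E) is about 0.665. The resulting kappa of about 0.776 falls in the 2/3 to 1 range that slide 24 considers satisfactory.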
26 The Cranfield Collection
- One of the first test collections, allowing precise quantitative measurement of retrieval effectiveness
- Late 1950s, UK
- 1398 abstracts of aerodynamics journal articles, a set of 225 queries, and exhaustive relevance judgments for all query-document pairs
- Too small and too atypical for serious information retrieval evaluation today
27 The TREC Collection
- TREC (Text Retrieval Conference)
- Organized by the U.S. National Institute of Standards and Technology (NIST)
- TREC is a collection of different benchmarks
- The best known is TREC Ad Hoc, used for the first 8 TREC evaluations, 1992-1999
- 1.89 million documents, mainly news articles, and 450 information needs
- No exhaustive relevance judgments, since they would be too expensive
- Relevance judgments exist only for documents that were among the top k returned, for a given information need, by some system that took part in the TREC evaluation
28 Collections
- GOV2
-- another TREC/NIST collection
-- 25 million web pages
-- among the largest collections available
-- about 3 orders of magnitude smaller than the collections indexed by Google/Yahoo/MSN
- NTCIR
-- East Asian language and cross-language information retrieval
- Cross Language Evaluation Forum (CLEF)
-- this collection focuses on European languages and cross-language information retrieval
29 Result List
- Most often: title, URL, some metadata
- and a summary
- How is the summary computed?
- Two basic kinds of summaries: static and dynamic
-- static: independent of the query
-- dynamic: dependent on the query
30 Static Summaries
- A summary of the document's content
- The first 50 or so words of the document
- More elaborate summaries use NLP techniques
-- NLP heuristics to score sentences
-- the summary is built from the top-scoring sentences
- More elaborate approaches use NLP to generate sentences
-- not yet ready for use in applications
31 Dynamic Summaries
- Present one or more windows, or snippets, of the document that contain several of the query terms
- They are generated together with the evaluation of the query
- Snippets in which the terms appear as a phrase, or in which their occurrences fall within a window of user-defined size, are usually preferred
- The summary computed this way shows all the terms in the window, not only those that appear in the query
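One simple way to pick such a window, sketched in Python under stated assumptions. The document is split into word tokens, and every window of a fixed (hypothetical) width is scored by the number of query-term occurrences it contains. The earliest best window is returned with all of its words, not only the query terms. This is a plain sliding-window heuristic, not the method of any particular engine, and the function name is hypothetical.

```python
import re


def best_window(text: str, query_terms: list[str], width: int = 8) -> str:
    """Return the width-token window with the most query-term occurrences (ties: earliest)."""
    tokens = re.findall(r"\w+", text)
    terms = {t.lower() for t in query_terms}
    best_start, best_score = 0, -1
    for start in range(max(1, len(tokens) - width + 1)):
        window = tokens[start:start + width]
        score = sum(1 for tok in window if tok.lower() in terms)
        if score > best_score:  # strict '>' keeps the earliest window on ties
            best_start, best_score = start, score
    return " ".join(tokens[best_start:best_start + width])


doc = ("Evaluation of retrieval systems uses test collections. "
       "Precision and recall are the classic retrieval measures used in evaluation.")
print(best_window(doc, ["precision", "recall"], width=5))
```

For this document the earliest window containing both terms is `test collections Precision and recall`, which includes the non-query words around the matches.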
32 Technical Issues
- To compute snippets quickly, we need cached copies of the documents on which the computation is done (with the risk that these copies become outdated)
- Perhaps caching should cover only a prefix of the document of suitable size
- Ideally, snippets should be short and faithfully convey the content of the document
- Dynamic summaries are an important issue that must be handled carefully so that end users are satisfied
33 Modeling
- IR systems use index terms to handle the user's information needs
- Index terms
-- a keyword or group of selected words
-- any word (more generally)
- Stemming (suffix removal) may be used
-- connect: connecting, connection, connections
- An inverted file is built for the chosen index terms
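A minimal Python sketch of an inverted file over stemmed index terms. The stemmer is a deliberately tiny suffix-stripping rule list, an assumption for illustration and not the Porter stemmer or any library's stemmer. It is just enough to map connect, connecting, connection and connections to `connect`. Function names are hypothetical.

```python
import re
from collections import defaultdict

SUFFIXES = ("ions", "ion", "ing", "s")  # toy rule list, NOT a real stemmer


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_inverted_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Map each stemmed index term to the sorted list of document ids that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[toy_stem(word)].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


index = build_inverted_index({1: "Connecting networks", 2: "a connection", 3: "no links"})
print(index["connect"])  # [1, 2]
```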
34
[Figure: documents → index terms; information need → query; matching the two → ranking]
35 Ad-Hoc Retrieval and Filtering
[Figure: ad-hoc retrieval, with queries Q1 to Q5 posed against a fixed document collection]
36 Ad-Hoc Retrieval and Filtering
[Figure: filtering, with a stream of documents matched against the profiles of User 1 and User 2 to produce documents for each user]
37
- Ranking is an ordering of the retrieved documents that reflects their relevance to the user's query
- A ranking is based on assumptions about the criteria of relevance, such as
-- a common set of index terms
-- sharing of weighted terms
-- probability of relevance
- Different sets of assumptions lead to different IR models
38 Formal Definition of an IR Model
An information retrieval model is a quadruple [D, Q, F, R(qi, dj)] where
1) D is a set of logical representations (views) of the documents in the collection
2) Q is a set of logical representations of the user's information needs; these representations are called queries
3) F is a framework for modeling the representations of documents and queries and the relationships between them
4) R(qi, dj) is a ranking function that associates a real number with a query qi ∈ Q and a document representation dj ∈ D. Such a ranking defines an ordering among the documents with respect to the query qi.
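To make the quadruple concrete, the sketch below fixes one hypothetical choice for each part. Documents (D) and queries (Q) are represented as sets of index terms, and the framework (F) is plain set operations. R(q, d) counts the query terms a document contains, a simple coordination-level score used here only for illustration. Sorting by R(q, d) then yields the ordering the definition describes.

```python
from typing import Callable

Score = Callable[[set[str], set[str]], float]


def coordination_score(query_terms: set[str], doc_terms: set[str]) -> float:
    """A toy ranking function R(q, d): the number of query terms present in d."""
    return float(len(query_terms & doc_terms))


def rank(query_terms: set[str], docs: dict[str, set[str]], score: Score) -> list[str]:
    """Order document ids by decreasing R(q, d_j); ties are broken by id."""
    return sorted(docs, key=lambda d: (-score(query_terms, docs[d]), d))


D = {
    "d1": {"precision", "recall", "evaluation"},
    "d2": {"indexing", "stemming"},
    "d3": {"recall", "ranking"},
}
q = {"precision", "recall"}
print(rank(q, D, coordination_score))  # ['d1', 'd3', 'd2']
```

Swapping `coordination_score` for another function of the same shape changes the model's ranking assumptions without touching `rank`, which mirrors how different IR models differ mainly in their choice of R.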
39 IR Models
40 IR Models
- In IR models, the logical view of the documents and the retrieval process are distinct aspects of the system.