Title: Information Retrieval
1. Information Retrieval
- Christos Papatheodorou
- Department of Archive and Library Science
- Ionian University
2. Information Retrieval
- HOW we express information needs (queries)
- HOW we locate and retrieve information that satisfies those needs
- HOW we evaluate the search results
3. Information Retrieval System
[Figure labels: Input (documents, queries), Processor (document classification, search strategy), Output, feedback]
4. Information retrieval process
[Figure labels: User Interface (user need, user feedback), Text Operations (tokenization, stopwords, stemming, etc.), Query operations, Indexing, DB manager, Docs database, Index, Searching, Retrieved docs, Ranking, ranked docs]
5. Recap
IR = <D, Q, F, R(qi, dj)>
- D: documents
- Q: queries
- F: framework for representing documents
- R: relevance of query qi to document dj, either a number or a 0/1 value
6. Categories of documents
- Structured
  - records, fields (databases)
- Fully unstructured
  - free text
- Pre-processing
  - Metadata
  - Stemming
7. Provided information
- Document identifier
- Classification field
- Words and key phrases (keywords)
- Abstract
- Extracts (extraction), not written by the author
- Reviews, not written by the author
8. Formal definition of a document
- Vocabulary V, controlled or not
- Term wi
- Document a
- Frequency of term wi in a
9. Boolean model (1/2)
- Based on set theory
- The query terms are connected with the operators AND, OR, NOT
- Example
  - Query: restaurants AND (Mideastern OR vegetarian) AND inexpensive
  - Answer: documents that contain the words restaurants, Mideastern, inexpensive, or the words restaurants, vegetarian, inexpensive
- The query is rewritten in disjunctive normal form (there is one component for every row of the truth table that has a true value)
10. Disjunctive normal form
ka  kb  kc  ¬kc  ka∧kb  ka∧¬kc  (ka∧kb)∨(ka∧¬kc)
1   1   1   0    1      0       1
1   1   0   1    1      1       1
1   0   1   0    0      0       0
1   0   0   1    0      1       1
0   1   1   0    0      0       0
0   1   0   1    0      0       0
0   0   1   0    0      0       0
0   0   0   1    0      0       0
q = ka ∧ (kb ∨ ¬kc)  ⇒  DNF: (ka ∧ kb) ∨ (ka ∧ ¬kc) = (1,1,1) ∨ (1,1,0) ∨ (1,0,0)
11. Boolean model (2/2)
- Query-document similarity: a document matches when at least one component of the query's disjunctive normal form is identical to the document
- Similarity values: 0 or 1
- Example
  - q = ka ∧ (kb ∨ ¬kc) = (1,1,1) ∨ (1,1,0) ∨ (1,0,0)
  - d = (0,1,0)
  - Similarity = 0 (even though the term kb is present in the document)
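The matching rule can be sketched in a few lines. This is only an illustration: the function names `dnf_components` and `boolean_similarity` are assumptions introduced here, and the DNF is obtained by enumerating the truth table rather than by symbolic rewriting.

```python
from itertools import product

def dnf_components(query, n_terms):
    """Return the binary term tuples (truth-table rows) for which the query is true."""
    return {bits for bits in product((0, 1), repeat=n_terms) if query(*bits)}

def boolean_similarity(doc, components):
    """1 if the document's binary vector equals some DNF component, else 0."""
    return 1 if tuple(doc) in components else 0

# q = ka AND (kb OR NOT kc), the query used in the slides
q = lambda ka, kb, kc: bool(ka and (kb or not kc))
components = dnf_components(q, 3)

print(sorted(components, reverse=True))            # the three DNF components
print(boolean_similarity((0, 1, 0), components))   # the document d from the example
```

The document (0,1,0) contains kb but not ka, so it matches no component and its similarity is 0.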
12. Disadvantages of the Boolean model
- The importance of each term is not defined
- Order of appearance of each term
- The NOT operator
- Boolean expressions are difficult to compose
- Data retrieval rather than information retrieval
- It makes no provision for
  - Ranking
  - Partial match
- It returns either too few or far too many documents
13. Vector Model (1/2)
- Similarity: the cosine of the angle between two documents dk, dj
- It is computed from the inner product of the document vectors
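A minimal sketch of this similarity, assuming documents are given as equal-length lists of term weights (the function name is an assumption made for illustration): the inner product is divided by the product of the two vector lengths.

```python
import math

def cosine_similarity(dk, dj):
    """Cosine of the angle between two document vectors."""
    dot = sum(a * b for a, b in zip(dk, dj))
    norm_k = math.sqrt(sum(a * a for a in dk))
    norm_j = math.sqrt(sum(b * b for b in dj))
    if norm_k == 0 or norm_j == 0:
        return 0.0  # convention chosen here for empty documents
    return dot / (norm_k * norm_j)
```

Two documents with proportional weights get similarity 1, and two documents with no term in common get similarity 0.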
14. Vector model (2/2)
- Selection with a threshold on the degree of similarity
- Problem: measuring the frequency of the terms
  - a representative measure of term weight is needed
15. Advantages and disadvantages
- Advantages
  - Documents are ranked and evaluated on the basis of their terms
  - Partial matching
  - Better performance
- Disadvantages
  - Index terms are assumed to be independent
  - Missing terms
16. TFIDF Model
- N documents; ni: the number of documents that contain the term ki; freqij: the frequency of the term in document dj
- Term frequency
- Inverse document frequency
- Term weighting: tf × idf
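The formulas themselves did not survive in the slides, so the sketch below assumes one common textbook variant: tf is the raw frequency divided by the largest frequency in the document, and idf is log(N / ni). The function name `tfidf_weights` and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tfidf_weights(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    N = len(docs)
    # n[term] = number of documents that contain the term (ni)
    n = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        freq = Counter(doc)                          # freq_ij
        max_freq = max(freq.values()) if freq else 1
        weights.append({
            term: (f / max_freq) * math.log(N / n[term])   # tf * idf
            for term, f in freq.items()
        })
    return weights

docs = [["car", "auto", "car"], ["auto", "garage"], ["car", "engine"]]
w = tfidf_weights(docs)
```

A term that occurs in every document gets idf = log(1) = 0 under this variant, so it contributes nothing to the ranking.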
17. Advantages and disadvantages
- Better performance
- Supports partial matching
- Ranking based on the value of the cosine
- Disadvantage: index terms are assumed to be independent
18. The concept of document clustering
- A collection C of documents (index terms)
- Query: a set A of index terms
- Which documents belong to A?
- Which features describe the terms of A (intra-cluster similarity, tf)?
- Which features distinguish the members of A (inter-cluster similarity, idf)?
19. Clustering
- Given a collection of documents, build a hierarchical grouping (taxonomy) based on a measure of their similarity (e.g. Yahoo)
- Document similarity measures
  - Representation based on TFIDF
  - Euclidean distances between documents
  - Cosine of the angle between documents
- Problems
  - Noise: a large number of useless terms
  - What counts as noise depends on the collection
20. Top-down clustering
- k-Means
  - Choose k arbitrary centroids
  - Repeat
    - Assign each document to nearest centroid
    - Recompute centroids
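The loop can be sketched as follows, under stated assumptions: documents are numeric vectors, "nearest" means smallest squared Euclidean distance, the "arbitrary" centroids are simply the first k documents, and iteration stops when the assignment no longer changes. The function name `kmeans` and the sample points are hypothetical.

```python
def kmeans(points, k, max_iters=100):
    """Return (assignment, centroids) for a list of equal-length numeric vectors."""
    centroids = [list(p) for p in points[:k]]  # arbitrary choice: the first k points
    assignment = None
    for _ in range(max_iters):
        # Assign each document to its nearest centroid.
        new_assignment = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0)]
assignment, centroids = kmeans(points, 2)
```

The result depends on the initial centroids, which is why real implementations usually try several random starts.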
21. Bottom-up clustering
- Initially G is a collection of singleton groups, each with one document
- Repeat
  - Find Γ, Δ in G with max s(Γ ∪ Δ)
  - Merge group Γ with group Δ
- For each Γ keep track of best Δ
22. Probabilistic Model (1/2)
- There is a subset R of the documents that are relevant to the query
- The user indicates the relevant documents
- Estimate the probability that a document is among the user's choices
- wij ∈ {0,1}: binary representation of the terms, as in the Boolean model
23. Probabilistic Model (2/2)
- The returned documents satisfy the query with a probability greater than a threshold
24. Extended similarity
- "Where can I get my motorbike fixed?"
- "A good garage to repair your two-wheeler is at ..."
- "auto" and "car" often occur together (co-occur)
- Documents with related words are related
- Basic approaches for search and clustering
  - Thesauri (WordNet)
  - Correlation of terms that occur together
[Figure: documents in which "auto" and "car" co-occur, suggesting car ≈ auto]
25. Latent Semantic Indexing
- Goal
  - Map the documents-terms matrix onto a matrix of lower dimension that corresponds to concepts
- Conceptual rather than lexical similarity
- A mathematically complex model based on linear algebra
26. Latent Semantic Indexing
[Figure: singular value decomposition (SVD) of the terms × documents matrix A (t × d) into U, D and V with r dimensions; example terms: car, auto]
27. Extended Boolean
- Goal
  - Assign weights to the terms of Boolean queries with logical operators (and, or, etc.)
28. Other models
- Set theory
  - Fuzzy set
- Algebraic
  - Generalised Vector model
  - Neural networks
- Probability theory
  - Bayesian networks
  - Inference networks
  - Belief networks
29. Retrieval evaluation
- Precision
  - Relevant answers (Ra) / Total answers (A)
- Recall
  - Relevant answers (Ra) / Relevant documents (R)
[Figure: within the set of documents, the answer set A and the relevant set R overlap in Ra]
30. Example
- For a query q, the relevant documents are d3, d5, d9, d25, d39, d44, d56, d71, d89, d123
- The search engine returned, in order of relevance, the documents d123, d84, d56, d6, d8, d9, d511, d129, d187, d25, d38, d48, d250, d113, d3
- Precision = Ra/A = 5/15 = 33.3%
- Recall = Ra/R = 5/10 = 50%
31. Precision / recall curve
Answer  Precision  Recall
1       1/1        1/10
2       -          -
3       2/3        2/10
4       -          -
5       -          -
6       3/6        3/10
7       -          -
8       -          -
9       -          -
10      4/10       4/10
11      -          -
12      -          -
13      -          -
14      -          -
15      5/15       5/10
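The points of the curve can be recomputed from the example's two lists. The sketch below is illustrative only (the function name `precision_recall_points` is an assumption); it records precision and recall at every rank where a relevant document appears.

```python
relevant = {"d3", "d5", "d9", "d25", "d39", "d44", "d56", "d71", "d89", "d123"}
ranking = ["d123", "d84", "d56", "d6", "d8", "d9", "d511", "d129",
           "d187", "d25", "d38", "d48", "d250", "d113", "d3"]

def precision_recall_points(ranking, relevant):
    """Return (rank, precision, recall) at each rank holding a relevant document."""
    points = []
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            points.append((rank, hits / rank, hits / len(relevant)))
    return points

for rank, p, r in precision_recall_points(ranking, relevant):
    print(f"{rank:2d}  precision={p:.3f}  recall={r:.1f}")
```

The relevant documents appear at ranks 1, 3, 6, 10 and 15, which are the filled rows of the table.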
32. Retrieval evaluation
- Measures that combine precision and recall
  - Harmonic mean
  - Other statistics
- User-oriented measures
  - V: relevant documents known to the user
  - Rk: retrieved documents that were known to the user (A ∩ V)
  - Ru: retrieved relevant documents that were unknown to the user
  - Coverage: proportion of the known relevant documents that were retrieved, Rk / V
  - Novelty: proportion of new relevant documents, Ru / (Rk + Ru)
- Document collections for evaluating methods
  - TREC
  - ISI
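A small sketch of these measures, assuming sets of document identifiers as input. The function names and the toy sets in the usage lines are hypothetical, and the harmonic mean is taken in its usual form 2PR / (P + R).

```python
def harmonic_mean(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def coverage(answer, known_relevant):
    """|Rk| / |V|, where Rk = A ∩ V."""
    return len(answer & known_relevant) / len(known_relevant)

def novelty(answer, relevant, known_relevant):
    """|Ru| / (|Rk| + |Ru|), where Ru are retrieved relevant documents new to the user."""
    rk = answer & known_relevant
    ru = (answer & relevant) - known_relevant
    return len(ru) / (len(rk) + len(ru))

f = harmonic_mean(5 / 15, 5 / 10)          # the precision/recall example above
cov = coverage({1, 3, 5}, {1, 2})          # toy sets, for illustration only
nov = novelty({1, 3, 5}, {1, 2, 3, 4}, {1, 2})
```

For precision 1/3 and recall 1/2 the harmonic mean is 0.4, lower than the arithmetic mean, because it penalises the weaker of the two values.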
33. Query Languages
- They help the user to submit queries and to rank the results (data retrieval languages do not perform ranking)
- Protocols: languages that are not aimed at the user but are used by systems (e.g. to submit queries to a CD-ROM archive or to on-line databases: Z39.50, CCL, WAIS)
- Capabilities of query languages
  - Keywords (single words, context, Boolean, natural language)
  - Pattern matching (words, prefixes, suffixes, error handling, ranges, regular expressions and extensions)
  - Structural queries (forms, hypertext, hierarchical)
34. Keyword based querying (1/2)
- Submitting single words
- Submitting phrases (searching for a set of words that form a phrase)
- Proximity
  - Words or phrases are submitted together with a maximum allowed distance between them
- Natural-language queries
  - queries and documents are converted into term vectors with a weight for each term
  - search for the documents that most resemble the queries
  - extraction of representative keywords from the queries
35. Keyword based querying (2/2)
- Boolean queries consist of
  - simple queries (atoms) that retrieve documents
  - Boolean operators (AND, OR, NOT/BUT) that are applied to sets of documents
- A query tree is defined whose leaves are the atomic queries and whose internal nodes are the operators
- Example: translation AND (syntax OR syntactic)
[Figure: query tree with root AND, whose children are the leaf "translation" and an OR node with leaves "syntax" and "syntactic"]
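One way to picture the query tree is as nested tuples evaluated against an inverted index. Everything in this sketch is an assumption made for illustration: the tuple encoding, the function name `evaluate`, and the tiny index mapping each term to a set of document identifiers.

```python
def evaluate(node, index, all_docs):
    """Evaluate a query tree: leaves are terms, internal nodes are operators."""
    if isinstance(node, str):                 # leaf: an atomic query
        return index.get(node, set())
    op, *children = node                      # internal node: an operator
    results = [evaluate(child, index, all_docs) for child in children]
    if op == "AND":
        return set.intersection(*results)
    if op == "OR":
        return set.union(*results)
    if op == "NOT":
        return all_docs - results[0]
    raise ValueError(f"unknown operator: {op}")

# Hypothetical inverted index: term -> set of document ids
index = {"translation": {1, 2, 4}, "syntax": {2, 3}, "syntactic": {4, 5}}
all_docs = {1, 2, 3, 4, 5}

query = ("AND", "translation", ("OR", "syntax", "syntactic"))
print(sorted(evaluate(query, index, all_docs)))
```

Each operator works on the document sets returned by its children, which mirrors the statement that Boolean operators are applied to sets of documents.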
36. Pattern matching (1/3)
- Searching for lexical patterns within documents
- Patterns are combined with Boolean operators to form keyword queries
- Substrings
  - e.g. "any flow" matches "many flowers"
- Ranges: alphabetical search for words that lie between a pair of strings
  - e.g. searching in a dictionary
37. Pattern matching (2/3)
- Queries allowing errors: a string is given and is altered in order to find similar words
  - Alteration: insertion, deletion or substitution of letters, and changes to their position
  - Threshold on the alterations (edit distance): the minimum number of alterations needed to make two strings identical
- Regular expressions: strings, or the following combinations of strings
  - Concatenation of strings: (te le) → tele
  - Union (alternative use): (με | σε)
  - Repetition of a string: e*
  - e.g. pro (tein | blem) (ε | 0 | 1 | 2)* matches "protein" and "problem02", where ε is the empty string
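Both ideas can be tried out directly. The sketch below is an illustration under assumptions: `edit_distance` is the standard dynamic-programming computation restricted to insertion, deletion and substitution (it does not model the change of position mentioned above), and the regular expression is one possible rendering of the slide's example in Python's `re` syntax.

```python
import re

def edit_distance(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = curr
    return prev[-1]

# pro, then "tein" or "blem", then any number of the digits 0, 1, 2
pattern = re.compile(r"pro(tein|blem)[012]*")

print(edit_distance("flower", "flowers"))
print(bool(pattern.fullmatch("problem02")))
```

A query "allowing errors" would then accept every indexed word whose edit distance from the query string is at most the chosen threshold.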
38. Pattern matching (3/3)
- Extended patterns
  - Classes of characters: sets of characters are combined at positions within a pattern (e.g. digits inserted at positions of a pattern)
  - Wild characters (e.g. tele* matches television, tele-education, teleconference, etc.)
  - Conditional expressions: a part of a pattern may or may not be searched for
39. Structural queries (1/3)
- They allow searches that combine the content of the documents with their structure
- Forms
  - The documents are structured in fields that neither overlap nor are nested
  - Search for patterns in a specific field
40. Structural queries (2/3)
- Hypertext
  - Documents that have links either
    - to each other
    - to specific points within the documents
  - Patterns are searched for in pages or in their neighbourhoods
- Hierarchical structure
  - Patterns are searched for in specific structures
  - The structures are encoded by tags (as in HTML), which delimit regions in the text
  - Regions may follow one another, overlap, or be nested
  - Indexing covers not only the terms but also the regions
41. Structural queries (3/3)
- Example of a hierarchical structure and a query
[Figure: a document "Chapter 4" with sections "4.1 Introduction" ("In this chapter ...") and "4.4 Structured queries", shown as a tree: chapter → sections → titles, figure]
Query: the figure of a section whose title contains "structured"
42. Query improvement (expansion) methods
- The user indicates the relevant documents (user relevance feedback)
- Without involving the user
  - Information from the returned documents (automatic local analysis)
  - Information from the document collection (automatic global analysis)
43. User Relevance Feedback (1/2)
- The user evaluates the returned documents (relevant, non-relevant clusters)
- Hypothesis: relevant documents have similar terms
- Goal: modify the query with relevant terms
- Methods
  - query expansion
  - term reweighting
44. User Relevance Feedback (2/2)
- query expansion
  - search for the query vector that best separates the relevant from the irrelevant documents
  - Used in the vector model
- term reweighting
  - the weight coefficients of the query terms are readjusted
  - Used in the vector and probabilistic models
45. Analysis of returned documents (local analysis)
- Automatic identification of terms related to the query terms
- Local clustering
  - The terms undergo stemming
  - Build the term-document matrix m (frequencies of the terms in the documents)
  - Term-term matrix s = m mt, which shows for each term its similarity to the others (mt is the transpose of m)
  - For each query term, the cluster with the most closely related terms is selected from the matrix s
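The matrix product can be sketched without any library. The function names, the four terms and their frequencies below are hypothetical; each row of `m` holds one term's frequencies across the returned documents, and the "cluster" of a query term is simplified here to its highest-scoring neighbours in s.

```python
def term_term_matrix(m):
    """s = m * m^T for a term-document matrix m (rows = terms, columns = documents)."""
    return [[sum(a * b for a, b in zip(row_u, row_v)) for row_v in m] for row_u in m]

def related_terms(s, terms, query_term, top=2):
    """Terms with the highest s-value for query_term, excluding the term itself."""
    u = terms.index(query_term)
    ranked = sorted((v for v in range(len(terms)) if v != u),
                    key=lambda v: s[u][v], reverse=True)
    return [terms[v] for v in ranked[:top]]

terms = ["car", "auto", "garage", "recipe"]
m = [
    [2, 1, 0],   # car
    [1, 2, 0],   # auto
    [1, 0, 0],   # garage
    [0, 0, 3],   # recipe
]
s = term_term_matrix(m)
print(related_terms(s, terms, "car"))
```

Entry s[u][v] is large when terms u and v are frequent in the same documents, which is what makes it usable as a similarity between terms.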
46. Local Context Analysis
- It is based on using groups of terms (instead of single keywords) taken from the most relevant returned documents (top ranked documents)
- The groups of terms correspond to concepts and are used to improve the query
- The top ranked documents are split into passages, i.e. pieces of text of fixed size (e.g. 300 words)
47. Local Context Analysis (algorithm)
- Retrieve the n top ranked documents returned by executing the query
- Split the top ranked documents into passages
- Determine the groups of terms (concepts) by checking their co-occurrence with the query terms
- Determine similarity(concept, query) with a method similar to tfidf, and rank the concepts by their similarity to the whole query
- The query is expanded with the m top ranked concepts, each with weight 1 - 0.9(i/m), where i is the position of the concept in the ranking of the concepts
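The weighting step is simple enough to check by hand. The sketch assumes positions are counted from 1, which the slide does not state, and the function name and concept labels are hypothetical.

```python
def expansion_weights(ranked_concepts):
    """Weight 1 - 0.9 * (i / m) for the concept at position i (1-based) of m concepts."""
    m = len(ranked_concepts)
    return {c: 1 - 0.9 * i / m for i, c in enumerate(ranked_concepts, start=1)}

weights = expansion_weights(["concept_a", "concept_b", "concept_c"])
```

With m = 3 the weights are approximately 0.7, 0.4 and 0.1, so the last concept added still carries a small positive weight.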
48. Automatic Global Analysis
- Similarity thesaurus
  - Index terms, similarity between query and index term
- Statistical thesaurus
  - Documents are clustered with similarity as the criterion
  - Terms are selected for each cluster
49. Similarity thesaurus
- Index terms are treated as concepts
- It looks for the relation (similarity) between the index terms of a document collection
- Inverse term frequency: itfj = log(t/tj), where t is the number of terms in the collection and tj is the number of terms in document dj
- The weight of each term in each document is computed from the itf values
- Build a term-document matrix whose values are the weights of the terms in the documents
- Compute the similarity of the terms: the inner product of the rows of this matrix
- Build the similarity thesaurus: a term-term matrix whose values are the similarities
50. Query Expansion with a similarity thesaurus
- The weights of the query terms are determined in the same way as the weights of the terms in the thesaurus
- Compute the similarity of each thesaurus term kv to the query, sim(q, kv)
- The top r ranked terms, according to sim(q, kv), are selected for the expansion
51. Statistical Thesaurus
- The thesaurus consists of classes of related terms taken from the document collection
- Requirement: the classes must be highly discriminating, so that they are easy to tell apart
- This property is ensured by terms with a low frequency of occurrence, i.e. very specific terms
- Method
  - Clustering of the documents
  - from the document clusters, the low-frequency terms are selected to define the classes of the thesaurus
52. Document clustering: complete link algorithm
- Initially each document is placed in a separate cluster
- Compute the similarity of every pair of clusters using the vector model and the cosine measure
- Merge the pair of clusters with the greatest similarity. The new cluster gets a similarity value equal to the similarity of the clusters that were merged
- Repeat the two steps above until no clusters remain to be merged
- The result of the process is a hierarchy of document clusters
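A compact sketch of these steps follows. Two things are assumptions rather than statements from the slides: the similarity of two clusters is taken as the smallest cosine between any pair of their documents (the usual meaning of "complete link"), and the names `cosine` and `complete_link` plus the sample vectors are hypothetical. The sketch recomputes similarities on every pass, which is fine for illustration but too slow for large collections.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def complete_link(docs):
    """Return the merge history as (sorted member indices, similarity value) pairs."""
    clusters = [[i] for i in range(len(docs))]   # one cluster per document

    def sim(c1, c2):
        # Complete link: the least similar pair decides the cluster similarity.
        return min(cosine(docs[i], docs[j]) for i in c1 for j in c2)

    merges = []
    while len(clusters) > 1:
        pairs = ((x, y) for x in range(len(clusters))
                 for y in range(x + 1, len(clusters)))
        a, b = max(pairs, key=lambda p: sim(clusters[p[0]], clusters[p[1]]))
        value = sim(clusters[a], clusters[b])
        merged = clusters[a] + clusters[b]
        merges.append((sorted(merged), value))
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges

docs = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0)]
merges = complete_link(docs)
```

The merge history is the hierarchy: each entry records which documents were joined and at what similarity value.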
53. Selection of thesaurus terms
- Selection of the clusters to be used
  - the clusters with similarity values greater than a threshold are selected
- Selection of the documents to be considered from the selected clusters
  - a threshold is used on the number of documents of the clusters to be processed
- Selection of the thesaurus terms
  - for each term of the selected documents, the Minimum inverse document frequency (MIDF) is computed
  - the terms with MIDF values smaller than a threshold are selected
54. Query Expansion with a statistical thesaurus
- Compute the weight of each term that belongs to a class
- From these term weights, compute the weight of each class
- Select the classes of terms to be used for query expansion
55. Web searching
- Search Engines
- Web directories
- Hyperlink structure exploitation
56. Problems
- Distributed data
- Volatile data
- Large volume of data
- Quality
- Heterogeneous data, multimedia
57. Search Engines
- Difference from IR
  - The search is carried out not on the documents but on a (central) index
- Indexing: inverted files
[Figure: search engine architecture with the components User, Interface, Query Engine, Indexer, Crawler, Web]
58. Query Engine
- Boolean, proximity, stemming, stop words
- AltaVista: returns pages that contain at least one term
- HotBot: returns pages that contain ALL the terms
- Several search engines share the same query engine (Magellan, Excite)
59. Ranking
- Tf-idf model
- Boolean spread: an extension of the Boolean model
  - How many terms belong to each page of the answer
  - How many terms belong to each page that is linked from the pages of the answer
- Vector spread: an extension of the vector model in which the similarity is also computed for the pages that are linked from the pages of the answer
- Most cited: only the terms of the pages that have links to the pages of the answer
- Web query: how connected the web pages are
- HITS: hubs, authorities
60. Crawling
- Goal: update the central catalogue with new pages
- Periodic updating (up to 2 months); updates also come from the administrators of the pages
- Methods
  - Depth first: one link is expanded in depth
  - Breadth first: all the links of a page
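The two traversal orders differ only in which end of the frontier the next page is taken from. The sketch below works on a hypothetical in-memory link graph instead of fetching real pages; the function name `crawl` and the page names are assumptions.

```python
from collections import deque

def crawl(links, start, breadth_first=True):
    """Return pages in visiting order; links maps a page to its outgoing links."""
    frontier = deque([start])
    seen = {start}
    visited = []
    while frontier:
        # Breadth first takes the oldest page (queue); depth first the newest (stack).
        page = frontier.popleft() if breadth_first else frontier.pop()
        visited.append(page)
        out = links.get(page, [])
        # For depth first, push in reverse so the first link is expanded first.
        for target in (out if breadth_first else reversed(out)):
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    return visited

links = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": [], "E": []}
print(crawl(links, "A", breadth_first=True))
print(crawl(links, "A", breadth_first=False))
```

Breadth first visits both links of page A before going deeper, while depth first follows A → B → D to the end before returning to C.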
61. Harvesting
- Distributed architecture
- Advantages (it addresses the drawbacks of crawlers)
  - Load on the operation of web servers
  - Traffic problem: pages are retrieved and their content is thrown away
  - Engines work independently, without coordination
- Disadvantage: it requires many servers
62. Architecture
[Figure: Harvest architecture; the only surviving label is "User"]
63. Brokers - Gatherers
- Gatherers: collect information at regular time intervals
  - Information goes to many brokers
  - Runs on a server (no traffic)
- Brokers: user interface and indexing from gatherers and other brokers
  - Topical and central brokers
  - Cooperation between brokers (filtering)
64. Replicator - Cache
- Replicator
  - Replication of brokers according to demand and size
  - Assignment of gatherers to brokers
- Object cache
  - reduces server load and traffic
65. Web directories - Browsing
- Hierarchical classifications of human knowledge (Yahoo)
- Advantage: retrieval precision
- Disadvantage: classification
- Combining searching and browsing (WebGlimpse, which allows searching within a site through page indexing)
- Meta-searchers
66. Hyperlink searching
- Web Query Languages
  - Combine content with link structure
- Software Agents
  - They search web pages by following the links
  - Heuristics for choosing the priority of pages
67. Recommendation systems
- Social recommendation or collaborative filtering
- Relevance feedback by many users for information ranking
- Methods
  - k-nearest neighbors (case-based reasoning)
- Applications
  - User actions prediction
  - User profile learning
  - Links evaluation and recommendation (Letizia, Syskill & Webert)
68. OPACs
- 1st generation
  - short, non-standard records
  - search by title, author
- 2nd generation
  - search by subject headings, keywords
- 3rd generation
  - search vocabulary
  - records with enriched information
  - GUI, Z39.50, metadata