Title: Keith van Rijsbergen
2Landmarks in Information Retrieval
the message out of the bottle
Keith van Rijsbergen Tampere 12th August, 2002
3Introductory Remarks
- Exclusions: IE, TM, ...
- Commercial successes and failures
- Caveats
- Why we have survived.
- Where we were, where we are, where we are going.
4Pre-history
Smee (1850)
Wells (1936)
Bush (1945)
Bagley (1951), MIT
Fairthorne (1945-52), RAE
Luhn (1958)
Mooers (1952)
5Experimental Methodology
Cleverdon: Cranfield
Lancaster: Medlars
Keen: Cranfield/Smart
Saracevic: CWRU
Salton: Smart
Sparck Jones: Ideal Test Collection
Blair and Maron: Stairs
Harman: TREC
6Evaluation
ABNO/OBNA (Fairthorne)
Precision, Recall -> trade-off (Cleverdon)
Probabilistic versions (Swets)
Measure-theoretic (Bollmann)
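As an illustration of the precision and recall measures named on this slide, here is a minimal sketch in Python. The function name `precision_recall`, the document identifiers, and the relevance judgments are all hypothetical and chosen only for the example; set-based precision and recall at a rank cutoff is the standard textbook definition, not something the slide itself spells out.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one query.

    precision = |retrieved AND relevant| / |retrieved|
    recall    = |retrieved AND relevant| / |relevant|
    Both are defined as 0.0 here when the denominator is empty (an assumption).
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ranked output and relevance judgments for a single query.
ranking = ["d3", "d1", "d7", "d2", "d9"]
relevant_docs = {"d1", "d2", "d5"}

# Moving the cutoff down the ranking shows the trade-off:
# recall can only grow, while precision usually falls.
for k in range(1, len(ranking) + 1):
    p, r = precision_recall(ranking[:k], relevant_docs)
    print(f"cutoff={k}  precision={p:.2f}  recall={r:.2f}")
```

Recall never decreases as the cutoff grows, while precision typically does, which is the trade-off the slide attributes to Cleverdon.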
7the world in 1980 according to Belver Griffith
Who is missing?
8Landmarks
Luhn's tf weighting
Architecture
Relevance Feedback
Stemming
Poisson Model -> BM25
Statistical weighting (tf-idf)
Various models
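To make the statistical weighting landmark concrete, the following Python sketch computes one common tf-idf variant: raw term frequency multiplied by log(N / df). This particular variant, the function name `tf_idf`, and the toy documents are assumptions made for illustration; the slide does not say which formulation was meant, and many others exist.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenised document.

    weight(t, d) = tf(t, d) * log(N / df(t)), where N is the number of
    documents and df(t) is the number of documents containing t.
    """
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term at most once per document
    weights = []
    for tokens in docs:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


# Hypothetical toy collection, already tokenised.
docs = [
    ["information", "retrieval", "retrieval"],
    ["information", "theory"],
    ["information", "systems"],
]
weights = tf_idf(docs)
for i, w in enumerate(weights):
    print(i, w)
```

In this toy collection "information" occurs in every document, so its weight is zero everywhere, while "retrieval" is both frequent in its document and rare in the collection, so it gets the largest weight.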
9Luhn's curve
10Fictive Objects
Information Problem
Representation
Representation
Indexed Objects
Query
Compare
What about evaluation?
11Architecture (Brenda Gerrie, 1983)
12Time I (highlights for me)
13Time II
14dimensions
Representation
a priori
a posteriori
Logical
Statistical
Language Models
15Probabilistic Retrieval
Maron and Kuhns
Miller (following Goffman)
SER/KSJ
Croft
16Vector Space Model
Salton Murray Rocchio
17Logical Model
For
Mooers/Fairthorne 1960
Hillman 1965
Cooper/Maron 1970
CvR 1986
Nie/Amati/Bruza/Huibers 1990
Against
Bar-Hillel 1950
Kasher 1966
18Buried Treasure
Dependence: e.g. C.T. Yu
Unified Probabilistic Model: Maron/Cooper/SER
Co-relevance: Ivie
Stochastic Processes: Mandelbrot/Herdan
Brouwerian Logics: Hillman
Error Analysis: Hughes/Cover/Duda
19Hypotheses/Principles
Items may be associated without apparent meaning but exploiting their association may help retrieval
P/R trade-off
ABNO/OBNA
Exhaustivity/Specificity
Cluster Hypothesis
Association Hypothesis
Probability Ranking Principle
Logical Uncertainty Principle
ASK
Polyrepresentation
20Postulates of Impotence (according to Swanson, 1988)
- An information need cannot be expressed independent of context
- It is impossible to instruct a machine to translate a request into adequate search terms
- A document's relevance depends on other seen documents
- It is never possible to verify whether all relevant documents have been found
- Machines cannot recognise meaning -> can't beat human indexing etc
21...more postulates
- Word-occurrence statistics can neither represent meaning nor substitute for it
- The ability of an IR system to support an iterative process cannot be evaluated in terms of single-iteration human relevance judgment
- You can have either subtle relevance judgments or highly effective mechanised procedures, but not both
- Thus, consistently effective fully automatic indexing and retrieval is not possible
22Conclusions
?
23Matching
"Co-ordination is positively correlated with external relevance." Jackson, 1969 - Association Hypothesis
"The larger the number of matching descriptive items, for a request and document, the more likely the document is to be relevant to the request." Sparck Jones, 1971 - Relevance Hypothesis
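The idea behind these two hypotheses, that more matching descriptive items means a more likely relevant document, can be sketched as simple co-ordination level ranking. The Python below is an illustrative assumption, not a method given on the slide: the function names, the toy documents, and the tie-break by document identifier are all hypothetical.

```python
def coordination_level(query_terms, doc_terms):
    """Number of distinct terms shared by the request and the document."""
    return len(set(query_terms) & set(doc_terms))


def rank_by_coordination(query_terms, docs):
    """Rank documents by descending co-ordination level.

    Ties are broken by document id so the output is deterministic
    (a choice made for this sketch only).
    """
    scored = [
        (coordination_level(query_terms, terms), doc_id)
        for doc_id, terms in docs.items()
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(doc_id, score) for score, doc_id in scored]


# Hypothetical indexed documents and request.
docs = {
    "d1": ["probabilistic", "retrieval", "model"],
    "d2": ["vector", "space", "model"],
    "d3": ["probabilistic", "ranking", "principle", "retrieval"],
}
query = ["probabilistic", "retrieval", "model"]

for doc_id, score in rank_by_coordination(query, docs):
    print(doc_id, score)
```

Each matching term counts equally here; the weighting schemes listed under Landmarks replace this uniform count with term-specific weights.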
24Inference
"It is a common fallacy, underwritten at this date by the investment of several million dollars in a variety of retrieval hardware, that the algebra of Boole (1847) is the appropriate formalism for retrieval design ... The logic of Brouwer, as invoked by Fairthorne, is one such weakening of the postulate system." Mooers, 1961
Another one: Logical Uncertainty Principle, CvR, 1986
25Classification
"Co-occurrence of terms as a basis for grouping makes for good swops i.e. permits substitutions which retrieve relevant rather than irrelevant documents." Sparck Jones, 1971 - Classification Hypothesis
"If an index term is good at discriminating relevant from non-relevant documents then any closely associated index term is also likely to be good at this." CvR, 1979 - Association Hypothesis
"Closely associated documents tend to be relevant to the same requests." CvR, 1971 - Cluster Hypothesis
26Models
Vector Space/LSI
Probabilistic
Logical
27Query Language
Artificial/Natural
Multilingual/cross-lingual
images
none at all
28Query Definition
Complete/Incomplete
Independence/Dependence
Weighted/Unweighted
Query Expansion/one shot (feedback, web)
Sense disambiguation
Cross-lingual
29Query Dependence
Relevance Feedback
Query Expansion
Ostensive Retrieval
Context
30Items wanted
Relevance
ASK: Anomalous State of Knowledge
Situated Relevance
31Error response
Precision and Recall
32Logic
standard/non-standard
probabilistic logic
information flow/logic
33Representation
Discrimination/Representation
Specificity/Exhaustivity
34Language Models
NLP
Montague Semantics
Stochastic