Title: Discussion Class 4
Discussion Classes
Format: Questions. Ask a member of the class to answer. Provide opportunity for others to comment.
When answering: Stand up. Give your name. Make sure that the TA hears it. Speak clearly so that all the class can hear.
Suggestions: Do not be shy at presenting partial answers. Differing viewpoints are welcome.
Question 1: Objectives
The TREC workshop series has four goals:
(a) Encourage research in text-based retrieval based on large test collections
(b) Communication among industry, academia and government
(c) Transfer of technology from research labs into products by demonstrating methodologies on real-world problems
(d) Increase availability of appropriate evaluation techniques
What does the ad hoc task contribute to each of these goals?
Question 2: The TREC Corpus
Source                          Size (Mbytes)     Docs  Median words/doc
Wall Street Journal, 87-89                267   98,732               245
Associated Press newswire, 89             254   84,678               446
Computer Selects articles                 242   75,180               200
Federal Register, 89                      260   25,960               391
Abstracts of DOE publications             184  226,087               111
Wall Street Journal, 90-92                242   74,520               301
Associated Press newswire, 88             237   79,919               438
Computer Selects articles                 175   56,920               182
Federal Register, 88                      209   19,860               396
Question 2: The TREC Corpus
- What characteristics of this data are likely to impact the results of experiments?
- Explain the statement, "Disks 1-5 were used as training data."
- Suppose that you were designing two search engines: (i) for use with a library catalog, (ii) for use with a Web search service. How does your data differ from the TREC corpus?
Question 3: TREC Topic Statement
<num> Number: 409
<title> legal, Pan Am, 103
<desc> Description:
What legal actions have resulted from the destruction of Pan Am Flight 103 over Lockerbie, Scotland, on December 21, 1988?
<narr> Narrative:
Documents describing any charges, claims, or fines presented to or imposed by any court or tribunal are relevant, but documents that discuss charges made in diplomatic jousting are not relevant.

A sample TREC topic statement
Question 3: TREC Topic Statement
(a) What is the relationship between TREC topic statements and queries?
(b) Distinguish between manual and automatic methods of query generation.
(c) Explain the process used by the manual methods.
(d) Some of the results used a time limit (e.g., "limited to no more than 10 minutes clock time"). What was being timed?
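As a starting point for part (b), the following is a minimal sketch of one possible automatic method: the query is derived from the topic text with no human intervention. The field parser, the tiny stopword list, and the function names `parse_topic` and `build_query` are assumptions made for illustration, not the procedure of any particular TREC participant.

```python
import re

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"what", "have", "from", "the", "of", "over", "on", "a", "an", "and", "in", "to"}

TAGS = "num|title|desc|narr"

def parse_topic(text):
    """Split a topic statement into fields keyed by tag name."""
    fields = {}
    # Each field runs from its tag to the next tag or the end of the text.
    pattern = rf"<({TAGS})>(.*?)(?=<(?:{TAGS})>|\Z)"
    for tag, body in re.findall(pattern, text, flags=re.S):
        body = " ".join(body.split())  # collapse line breaks and extra spaces
        # Drop the human-readable label that follows some tags.
        body = re.sub(r"^(?:Number|Description|Narrative):?\s*", "", body)
        fields[tag] = body
    return fields

def build_query(fields, use=("title",)):
    """Form a bag-of-words query from the chosen fields, with no manual editing."""
    text = " ".join(fields[name] for name in use)
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

topic = """<num> Number: 409
<title> legal, Pan Am, 103
<desc> Description:
What legal actions have resulted from the destruction of Pan Am
Flight 103 over Lockerbie, Scotland, on December 21, 1988?
"""

fields = parse_topic(topic)
print(build_query(fields))                         # title-only query
print(build_query(fields, use=("title", "desc")))  # longer query
```

A manual method, by contrast, would let a person read the topic and choose or revise the query terms; the choice of which fields to use is itself an experimental variable.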
Question 4: Relevance Assessments
(a) Explain the statement, "All TRECs have used the pooling method to assemble the relevance assessments."
(b) How is relevance assessed?
(c) What is the impact of some relevant documents being missed from the pool?
(d) What is the problem of some relevant documents in the pool coming from only a single run? How serious is this?
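To make parts (a) and (d) concrete, here is a minimal sketch of pooling for a single topic: the top-ranked documents from every submitted run are merged into one set, and only that set is judged by the assessors. The run names, document ids, and the `depth` parameter are illustrative assumptions; the sketch also reports which pooled documents were contributed by exactly one run.

```python
from collections import defaultdict

def build_pool(runs, depth=100):
    """Merge the top `depth` documents of each run into one pool.

    `runs` maps a run name to its ranked list of document ids (best first).
    Returns a dict: document id -> set of run names that contributed it.
    """
    pool = defaultdict(set)
    for run_name, ranking in runs.items():
        for doc_id in ranking[:depth]:
            pool[doc_id].add(run_name)
    return dict(pool)

def unique_contributions(pool):
    """Documents that reached the pool through exactly one run."""
    return {doc: next(iter(names)) for doc, names in pool.items() if len(names) == 1}

# Hypothetical runs for one topic.
runs = {
    "run_a": ["d1", "d2", "d3", "d4"],
    "run_b": ["d2", "d5", "d1", "d6"],
    "run_c": ["d7", "d2", "d3", "d8"],
}

pool = build_pool(runs, depth=3)
print(sorted(pool))                # documents the assessors would judge
print(unique_contributions(pool))  # documents only one run put in the pool
```

Documents below the pool depth in every run (here d4, d6 and d8) are never judged and are treated as not relevant, which is the situation part (c) asks about.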
Question 5: Evaluation
(a) What are:
    - the recall-precision curve?
    - the mean (non-interpolated) average precision?
(b) The report commented that, "two topics are fundamental to effective retrieval performance." What are they?
(c) How do the automatic tests differ from the manual?
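The two measures in the first part can be sketched in a few lines. This is a minimal illustration under the usual definitions: precision is recorded at the rank of each relevant document retrieved, non-interpolated average precision divides the sum of those precisions by the total number of relevant documents for the topic, and the mean is taken over topics. The rankings and relevance sets below are invented for the example.

```python
def recall_precision_points(ranking, relevant):
    """(recall, precision) at the rank of each relevant document retrieved."""
    points = []
    hits = 0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    return points

def average_precision(ranking, relevant):
    """Non-interpolated average precision for one topic.

    Relevant documents that are never retrieved contribute zero.
    """
    if not relevant:
        return 0.0
    total = sum(precision for _, precision in recall_precision_points(ranking, relevant))
    return total / len(relevant)

def mean_average_precision(results):
    """Mean of per-topic average precision; `results` is a list of (ranking, relevant)."""
    return sum(average_precision(r, rel) for r, rel in results) / len(results)

# Hypothetical results for two topics.
topic_1 = (["d3", "d9", "d1", "d7", "d4"], {"d3", "d1", "d8"})
topic_2 = (["d2", "d5"], {"d5"})

print(recall_precision_points(*topic_1))
print(average_precision(*topic_1))
print(mean_average_precision([topic_1, topic_2]))
```

For the first topic, relevant documents appear at ranks 1 and 3, so the average precision is (1/1 + 2/3) / 3 = 5/9; the third relevant document, d8, is never retrieved and lowers the score.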
Question 6: The future
(a) Why was TREC-8 the last year for the ad hoc task?
(b) Does this mean that text-based information retrieval is now solved?