Title: Basics: Task Definition
1. Basics: Task Definition, Evaluation, Characteristics of Texts
2. Outline
- Task definition
  - Tasks, types of systems, terminology
- Evaluation
  - Issues, test collections, metrics
- Statistical properties of text
  - Zipf's Law
3. Terminology
- Document
  - An information object with unknown structure
  - Types of documents: text (default), hypertext, multimedia
- Collection (database, corpus)
  - Examples: document database, text collection, corpus
  - An unordered set of documents
- Corpora
  - Several text databases
4. Information Needs
- Short-term information need (ad hoc retrieval)
  - Temporary need, e.g., info about used cars
  - Information source is relatively static
  - User pulls information
  - Application examples: library search, Web search
- Long-term information need (filtering)
  - Stable need, e.g., news stories about the war in Iraq
  - Information source is dynamic
  - System pushes information to the user
  - Application example: news filter
5. Relevance
- Relevance is difficult to define satisfactorily
- A relevant document is one judged useful in the context of a query
  - Who judges?
  - What is useful?
  - Judgment depends on more than the document and the query
- All retrieval models include an implicit definition of relevance
  - Satisfiability of a FOL expression
  - Distance
  - P(Relevance | query, document)
6. Relevance: Information Need vs. Query
- Information need i
  - You are looking for information on whether drinking red wine is more effective at reducing your risk of heart attacks than white wine.
- Query q
  - red or white wine related to heart attack
- Document d
  - "He then launched into the heart of his speech and attacked the wine industry lobby for downplaying the role of red and white wine in drunk driving."
- d is relevant to the query q ...
- ... but d is not relevant to the information need i.
7. Formal Formulation
- Vocabulary V = {w1, w2, ..., wN} of a language
- Query q = q1, ..., qm, where qi ∈ V
- Document di = di1, ..., di,mi, where dij ∈ V
- Collection C = {d1, ..., dk}
- Set of relevant documents R(q) ⊆ C
  - Generally unknown and user-dependent
  - The query is a hint about which documents are in R(q)
- Task: compute R'(q), an approximation of R(q)
8. Computing R'(q)
- Strategy 1: Document selection (see the sketch below)
  - Classification function f(d, q) ∈ {0, 1}
  - Outputs 1 for relevance, 0 for irrelevance
  - R'(q) is determined as the set {d ∈ C | f(d, q) = 1}
  - The system must decide whether a document is relevant or not (absolute relevance)
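A minimal sketch of Strategy 1, assuming a toy classifier that marks a document relevant only if it contains every query term; this particular f is an illustration, not a definition given on the slides.

```python
def f_select(doc_tokens, query_tokens):
    """Toy classification function f(d, q) -> {0, 1}:
    1 if the document contains every query term, else 0."""
    return 1 if set(query_tokens) <= set(doc_tokens) else 0

def select(collection, query_tokens):
    """R'(q) = {d in C | f(d, q) = 1}: an absolute relevance decision per document."""
    return [d for d in collection if f_select(d, query_tokens) == 1]

# Toy collection of tokenized documents
C = [["red", "wine", "heart", "attack"], ["white", "wine", "tasting"]]
print(select(C, ["wine", "heart"]))  # -> [['red', 'wine', 'heart', 'attack']]
```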
9. Computing R'(q)
- Strategy 2: Document ranking (see the sketch below)
  - Similarity function f(d, q) ∈ ℝ
  - Outputs a similarity score between document d and query q
  - Cutoff θ
    - The minimum similarity for a document to be considered relevant to the query
  - R'(q) is determined as the set {d ∈ C | f(d, q) ≥ θ}
  - The system must decide whether one document is more likely to be relevant than another (relative relevance)
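A minimal sketch of Strategy 2, assuming a toy similarity function (query-term overlap) and a cutoff θ; the scoring function is illustrative, not one prescribed by the slides.

```python
def f_sim(doc_tokens, query_tokens):
    """Toy similarity function f(d, q) -> R: count of query terms present in d."""
    return len(set(query_tokens) & set(doc_tokens))

def rank(collection, query_tokens, theta=1):
    """R'(q) = {d in C | f(d, q) >= theta}, sorted by decreasing similarity."""
    scored = [(f_sim(d, query_tokens), d) for d in collection]
    kept = [(score, d) for score, d in scored if score >= theta]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)

C = [["red", "wine", "heart", "attack"], ["white", "wine", "tasting"], ["used", "cars"]]
print(rank(C, ["wine", "heart", "attack"], theta=1))
# -> [(3, ['red', 'wine', 'heart', 'attack']), (1, ['white', 'wine', 'tasting'])]
```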
10. Document Selection vs. Ranking
[Figure comparing document selection and document ranking against the true R(q); figure omitted]
11. Which Strategy is Better?
12. Ranking is often preferred
- The similarity function is more general than the classification function
- Relevance is a subjective concept
- Factors other than the query and the document can be included in the ranking strategy through the cutoff θ
- The classifier is unlikely to be accurate
  - Ambiguous information needs
  - Over-constrained query (terms are too specific)
  - Under-constrained query (terms are too general)
  - The query is the only evidence for a user's information need
14. Ranking is often preferred
- Relevance is a subjective concept
  - A user can stop browsing anywhere, so the boundary is controlled by the user
  - High-recall users would view more items
  - High-precision users would view only a few
- Theoretical justification: the Probability Ranking Principle [Robertson 77]
15. Probability Ranking Principle [Robertson 77]
- As stated by Cooper (quoted below)
- Robertson provides two formal justifications
- Assumptions: independent relevance and sequential browsing

"If a reference retrieval system's response to each request is a ranking of the documents in the collection in order of decreasing probability of usefulness to the user who submitted the request, where the probabilities are estimated as accurately as possible on the basis of whatever data have been made available to the system for this purpose, then the overall effectiveness of the system to its users will be the best that is obtainable on the basis of that data."
16. Ad-hoc Retrieval
- Search a large collection of documents to find the ones that satisfy an information need (relevant documents)
- Example: Web search systems
17. Ad-hoc Retrieval
- Ranked ad-hoc retrieval
  - Return a set of documents that satisfy the query, ordered by (presumed) relevance/similarity
  - Good queries are still important, but large result sets are not a problem
  - Less time spent crafting queries
- Unranked ad-hoc retrieval
  - Return an unordered set of documents that satisfy the query
  - Usually used only in Boolean systems
  - It is important to create a good query so that the result set is small
  - But a small set may not have enough relevant documents
18. Cross-lingual Retrieval (CLIR)
- Query in one language (e.g., English)
- Return documents in other languages (e.g., Chinese)
- Sometimes called translingual or cross-language retrieval
19. Distributed Retrieval
- Ad-hoc retrieval in an environment with many text databases
- More complicated than centralized ad-hoc retrieval
  - Database selection
  - Merging results from different databases
20. Test Collections
- Retrieval performance is compared using a test collection
  - A set of documents, a set of queries, and a set of relevance judgments
- To compare two techniques
  - Each technique is used to evaluate the queries
  - Results (a set or a ranked list) are compared using some metric
  - Usually use multiple measures to get different perspectives
  - Usually test with multiple test collections, because performance is collection-dependent to some extent
21. Sample Test Collections
22. Test Collection I: Cranfield
- First testbed allowing precise quantitative measures of information retrieval effectiveness (late 1950s)
- 1,398 abstracts of aerodynamics journal articles
- A set of 225 queries
- Exhaustive relevance judgments of all query-document pairs
- Too small and too untypical for serious IR evaluation today
23. Test Collection II: TREC
- TREC: Text Retrieval Conference, organized by the U.S. National Institute of Standards and Technology (NIST)
- TREC Ad-hoc
  - 1.89 million documents, mainly newswire articles
  - 450 information needs
  - Relevance judgments are available only for the documents that were among the top k returned by the systems that entered the TREC evaluation
24. Test Collection III: Others
- GOV2
  - Another TREC/NIST collection
  - 25 million web pages
  - Largest collection that is easily available
  - But still 3 orders of magnitude smaller than what Google/Yahoo/MSN index
- NTCIR
  - East Asian language and cross-language information retrieval
- Cross Language Evaluation Forum (CLEF)
  - This evaluation series has concentrated on European languages and cross-language information retrieval
- Many others
25. Finding Relevant Documents
- Two factors make finding relevant documents difficult
  - Given a large collection, it is impossible to judge every document for a query
  - Relevance judgment is subjective
- How to solve this problem?
26. Finding Relevant Documents
[Figure: a single query against a collection of 1,000,000 documents; figure omitted]
27. Finding Relevant Documents
- Pooling strategy (see the sketch below)
  - Retrieve documents using several techniques
  - Judge the top K documents for each technique
  - The relevant set is the union of the relevant documents found by each technique
  - The relevant set is a subset of the true relevant set
  - Problem: an incomplete set of relevant documents for a given query
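A minimal sketch of the pooling strategy, assuming each system's output is a ranked list of document IDs and that human judgments are available as a simple lookup; the run data and the `judge` function are illustrative.

```python
def pool(runs, K):
    """Pool the top-K document IDs from each system's ranked run."""
    pooled = set()
    for run in runs:
        pooled.update(run[:K])
    return pooled

def judged_relevant(pooled, judge):
    """Keep only pooled documents a human assessor judges relevant.
    This approximates R(q); anything outside the pool stays unjudged."""
    return {doc_id for doc_id in pooled if judge(doc_id)}

# Illustrative runs from three systems and a toy judgment function
runs = [["d3", "d1", "d7"], ["d1", "d9", "d3"], ["d2", "d3", "d5"]]
relevant = judged_relevant(pool(runs, K=2), judge=lambda d: d in {"d1", "d3"})
print(relevant)  # -> {'d1', 'd3'} (set order may vary)
```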
28. Finding Relevant Documents
- Relevance judgment is subjective
  - Disagreement among assessors
29. Finding Relevant Documents
- Judges disagree a lot. How can judgments from multiple reviewers be combined? (see the sketch below)
  - Average
  - Union
  - Intersection
  - Majority vote
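A minimal sketch of the combination rules listed above, assuming each assessor provides a binary judgment (0/1) per document; the sample judgments are illustrative, and interpreting "average" as a fractional relevance score is an assumption.

```python
def combine(judgments, rule="majority"):
    """Combine one document's binary judgments from several assessors."""
    votes = sum(judgments)
    if rule == "union":         # relevant if any assessor says so
        return int(votes > 0)
    if rule == "intersection":  # relevant only if all assessors agree
        return int(votes == len(judgments))
    if rule == "average":       # fractional relevance score
        return votes / len(judgments)
    return int(votes * 2 > len(judgments))  # majority vote

print(combine([1, 0, 1], "majority"))      # -> 1
print(combine([1, 0, 1], "intersection"))  # -> 0
```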
30. Finding Relevant Documents
- Assessor disagreement has a large impact on absolute performance numbers
- But virtually no impact on the relative ranking of systems
31. Evaluation Criteria
- Effectiveness
  - Precision, Recall
- Efficiency
  - Space and time complexity
- Usability
  - How useful for real users?
32. Evaluation Criteria
- Effectiveness (Ad-hoc Task)
  - Precision, Recall
- Efficiency
  - Space and time complexity
- Usability (Interactive Task)
  - How useful for real users?
33. Evaluation Metrics
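For reference, the two effectiveness metrics used throughout the following slides are defined in the standard way:

```latex
\[
\text{Precision} = \frac{\#(\text{relevant items retrieved})}{\#(\text{retrieved items})},
\qquad
\text{Recall} = \frac{\#(\text{relevant items retrieved})}{\#(\text{relevant items})}
\]
```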
34. Precision and Recall Curve
- Evaluate the precision at every retrieved document
- Plot a precision-recall curve
35. Precision and Recall Curve
- Evaluate the precision at every retrieved document
- Plot a precision-recall curve
[Figure: precision-recall curve (precision vs. recall); figure omitted]
36. Precision and Recall Curve
- Interpolation: take the maximum of all the future points
- Why?
[Figure: interpolated precision-recall curve (precision vs. recall); figure omitted]
37. Precision and Recall Curve
- Interpolation: take the maximum of all the future points
- Why? Users are willing to read more if precision and recall keep getting better (see the sketch below)
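A minimal sketch of the interpolation step, assuming precision/recall pairs have already been computed at each retrieved relevant document; the sample values are illustrative.

```python
def interpolate(points):
    """Interpolated precision at recall r = max precision at any recall >= r.
    `points` is a list of (recall, precision) pairs in increasing recall order."""
    interpolated = []
    best = 0.0
    for recall, precision in reversed(points):  # sweep from high recall to low
        best = max(best, precision)
        interpolated.append((recall, best))
    return list(reversed(interpolated))

# Illustrative raw curve: precision dips, then recovers
raw = [(0.2, 1.0), (0.4, 0.5), (0.6, 0.75), (0.8, 0.4)]
print(interpolate(raw))  # -> [(0.2, 1.0), (0.4, 0.75), (0.6, 0.75), (0.8, 0.4)]
```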
38. Precision and Recall Curve
- How do we compare two precision-recall curves?
[Figure: precision-recall curves for System 1, System 2, System 3, and System 4; figure omitted]
39. 11-point Interpolated Average Precision
[Figure: worked example of 11-point interpolated average precision; Avg. Prec. = 0.425; figure omitted]
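For reference, the 11-point measure reads interpolated precision at the recall levels 0.0, 0.1, ..., 1.0 and averages them; this per-query value is then typically averaged over all queries.

```latex
\[
\text{AvgPrec}_{11\text{-pt}} = \frac{1}{11}\sum_{j=0}^{10} p_{\text{interp}}\!\left(\tfrac{j}{10}\right),
\qquad
p_{\text{interp}}(r) = \max_{r' \ge r} p(r')
\]
```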
40. Multiple Evaluation Criteria
- To obtain a comprehensive view of IR performance, it is often necessary to examine multiple criteria
41. Evaluating Web Search Engines
- Ultimate goal of a web search engine
  - Make the user happy
- Factors include
  - Speed of response
  - Number of web pages being indexed
  - User interface
  - Most important: relevance
42. Evaluating Web Search Engines
- The web pages retrieved on the first results page matter most
- Commonly used metrics
  - Precision at rank 5, 10, and 20
  - Mean average precision (MAP)
    - For each query, compute its average precision across all recall levels
    - Then average these average precisions across all the queries
    - Given K queries Q = {q1, q2, ..., qK}, with relevant documents {dj1, dj2, ..., djmj} for query qj, MAP(Q) is computed as shown below
43. Zipf's Law
[Figure: word frequency data collected from the WSJ 1987 collection; figure omitted]
44. Zipf's Law
[Figure: log-log rank-frequency plot with a fitted line of slope 1 (in magnitude); figure omitted]
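For reference, Zipf's law states that the collection frequency of the r-th most frequent term is roughly inversely proportional to r, which is why the rank-frequency plot above is close to a straight line of slope -1 on log-log axes:

```latex
\[
f(r) \propto \frac{1}{r}
\quad\Longleftrightarrow\quad
\log f(r) = \log c - \log r
\]
```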
45. Zipf's Law
- Excerpted from Jamie Callan's slides
46. Implication of Zipf's Law
- Term usage is highly skewed
- Important for retrieval algorithms
47. Statistical Profile
48. Zipf's Law
- Question: how to estimate the probability of a word that does not appear in a collection?
49. Heaps' Law
- Estimate the vocabulary size for a collection based on the number of tokens found in the collection
- M = k * T^b, where M is the vocabulary size, T is the number of tokens, b is the slope (about 0.5, i.e., sub-linear growth), and k is typically between 30 and 100 (a worked example follows)
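A minimal worked example of Heaps' law, with k and b set to illustrative values inside the ranges quoted above, not fitted to any real collection.

```python
def heaps_vocabulary_size(num_tokens, k=50, b=0.5):
    """Heaps' law estimate M = k * T**b; k and b here are illustrative defaults."""
    return k * num_tokens ** b

# Estimated vocabulary size for a collection of 1,000,000 tokens
print(int(heaps_vocabulary_size(1_000_000)))  # -> 50000 with k=50, b=0.5
```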