1
Basics: Task Definition, Evaluation, and
Characteristics of Texts
  • Rong Jin

2
Outline
  • Task definition
    • Tasks, types of systems, terminology
  • Evaluation
    • Issues, test collections, metrics
  • Statistical properties of text
    • Zipf's Law

3
Terminology
  • Document
    • An information object with unknown structure
    • Types of documents: text (default), hypertext,
      multimedia
  • Document database ≈ Text collection ≈ Corpus
    • Examples: document database, text collection,
      corpus
    • An unordered set of documents
  • Corpora
    • Several text databases

4
Information Needs
  • Short-term information need (ad hoc retrieval)
    • Temporary need, e.g., info about used cars
    • Information source is relatively static
    • User pulls information
    • Application examples: library search, Web search
  • Long-term information need (filtering)
    • Stable need, e.g., news stories about the Iraq war
    • Information source is dynamic
    • System pushes information to the user
    • Application example: news filtering

5
Relevance
  • Relevance is difficult to define satisfactorily
    • A relevant document is one judged useful in the
      context of a query
    • Who judges? What is useful?
    • Judgment depends on more than the document and the
      query
  • All retrieval models include an implicit definition
    of relevance
    • Satisfiability of a FOL expression
    • Distance
    • P(Relevance | query, document)

6
Relevance: Information Need vs. Query
  • Information need i
    • You are looking for information on whether
      drinking red wine is more effective at reducing
      your risk of heart attacks than white wine.
  • Query q
    • "Red or white wine related to heart attack"
  • Document d
    • "He then launched into the heart of his speech and
      attacked the wine industry lobby for downplaying
      the role of red and white wine in drunk driving."
  • d is relevant to the query q ...
  • ... but d is not relevant to the information need i.

7
Formal Formulation
  • Vocabulary V = {w1, w2, ..., wN} of the language
  • Query q = q1, ..., qm, where qi ∈ V
  • Document di = di1, ..., di,mi, where dij ∈ V
  • Collection C = {d1, ..., dk}
  • Set of relevant documents R(q) ⊆ C
    • Generally unknown and user-dependent
    • Query is a hint on which docs are in R(q)
  • Task: compute R'(q), an approximation of R(q)

8
Computing R'(q)
  • Strategy 1: Document selection
    • Classification function f(d,q) ∈ {0,1}
    • Outputs 1 for relevance, 0 for irrelevance
    • R'(q) is determined as the set {d ∈ C | f(d,q) = 1}
    • System must decide if a doc is relevant or not
      (absolute relevance)

9
Computing R'(q)
  • Strategy 2: Document ranking
    • Similarity function f(d,q) ∈ ℝ
    • Outputs a similarity between document d and query q
    • Cut-off θ: the minimum similarity for a document to
      be considered relevant to the query
    • R'(q) is determined as the set {d ∈ C | f(d,q) ≥ θ}
    • System must decide if one doc is more likely to be
      relevant than another (relative relevance); both
      strategies are sketched in code below
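To make the two strategies concrete, here is a minimal Python
sketch. The term-overlap similarity function, the toy
collection, and the threshold/cut-off values are illustrative
assumptions and are not part of the lecture.

# Minimal sketch of the two strategies for approximating R(q).
# The term-overlap similarity, toy collection, and thresholds
# are illustrative assumptions, not from the slides.

def similarity(doc_text, query):
    """Toy f(d,q): fraction of query terms that also occur in the document."""
    doc_terms = set(doc_text.lower().split())
    query_terms = set(query.lower().split())
    return len(doc_terms & query_terms) / max(len(query_terms), 1)

def document_selection(collection, query, threshold=0.3):
    """Strategy 1 (absolute relevance): unordered set of docs whose score clears the threshold."""
    return {doc_id for doc_id, text in collection.items()
            if similarity(text, query) >= threshold}

def document_ranking(collection, query, cutoff=0.0):
    """Strategy 2 (relative relevance): documents sorted by f(d,q), above the cut-off."""
    scored = [(doc_id, similarity(text, query))
              for doc_id, text in collection.items()]
    return sorted((pair for pair in scored if pair[1] >= cutoff),
                  key=lambda pair: pair[1], reverse=True)

collection = {
    "d1": "red wine reduces the risk of heart attacks",
    "d2": "the wine industry lobby and drunk driving",
    "d3": "used cars for sale",
}
query = "red or white wine related to heart attack"
print(document_selection(collection, query))   # hard yes/no decision per document
print(document_ranking(collection, query))     # ranked list; the user decides where to stop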

10
Document Selection vs. Ranking
(Figure: relevant (+) and non-relevant (-) documents
relative to the true R(q), comparing document selection
with document ranking)
11
Which Strategy is Better?
12
Ranking is often preferred
  • Similarity function is more general than a
    classification function
  • Relevance is a subjective concept
  • Factors other than the query and document can be
    included in the ranking strategy through the cut-off θ
  • The classifier is unlikely to be accurate
    • Ambiguous information needs
    • Over-constrained query (terms are too specific)
    • Under-constrained query (terms are too general)
    • The query is the only evidence for a user's
      information need

14
Ranking is often preferred
  • Relevance is a subjective concept
  • A user can stop browsing anywhere, so the boundary is
    controlled by the user
    • High-recall users would view more items
    • High-precision users would view only a few
  • Theoretical justification: the Probability Ranking
    Principle [Robertson 77]

15
Probability Ranking Principle [Robertson 77]
  • As stated by Cooper
  • Robertson provides two formal justifications
  • Assumptions: independent relevance and sequential
    browsing

"If a reference retrieval system's response to each
request is a ranking of the documents in the collection
in order of decreasing probability of usefulness to the
user who submitted the request, where the probabilities
are estimated as accurately as possible on the basis of
whatever data have been made available to the system for
this purpose, then the overall effectiveness of the
system to its users will be the best that is obtainable
on the basis of that data."
16
Ad-hoc Retrieval
  • Search a large collection of documents to find
    the ones that satisfy an information need
    (relevant documents)
  • Example: Web search systems

17
Ad-hoc Retrieval
  • Ranked ad-hoc retrieval
    • Return a set of documents that satisfy the query,
      ordered by (presumed) relevance/similarity
    • Good queries are still important, but large result
      sets are not a problem
    • Less time spent crafting queries
  • Unranked ad-hoc retrieval
    • Return an unordered set of documents that satisfy
      the query
    • Usually used only in Boolean systems
    • It is important to create a good query so that the
      set is small
    • But a small set may not have enough relevant
      documents

18
Cross-lingual Retrieval (CLIR)
  • Query in one language (e.g., English)
  • Return documents in other languages (e.g.,
    Chinese)
  • Sometimes called translingual/cross-language
    retrieval

19
Distributed Retrieval
  • Ad-hoc retrieval in an environment with many text
    databases
  • More complicated than centralized ad-hoc retrieval
    • Database selection
    • Merging results from different databases

20
Test Collections
  • Retrieval performance is compared using a test
    collection
    • A set of documents, a set of queries, and a set of
      relevance judgments
  • To compare two techniques
    • Each technique is used to run the queries
    • Results (a set or a ranked list) are compared using
      some metric
    • Usually use multiple measures to get different
      perspectives
    • Usually test with multiple test collections, because
      performance is collection-dependent to some extent

21
Sample Test Collections
22
Test Collection I Cranfield
  • First testbed allowing precise quantitative measures
    of information retrieval effectiveness (late 1950s)
  • 1398 abstracts of aerodynamics journal articles
  • A set of 225 queries
  • Exhaustive relevance judgments of all query-document
    pairs
  • Too small and too untypical for serious IR evaluation
    today

23
Test Collection II TREC
  • TREC: Text REtrieval Conference, organized by the
    U.S. National Institute of Standards and Technology
    (NIST)
  • TREC Ad-hoc
    • 1.89 million documents, mainly newswire articles
    • 450 information needs
    • Relevance judgments are available only for the
      documents that were among the top k returned by the
      systems that entered the TREC evaluation

24
Test Collection III Others
  • GOV2
    • Another TREC/NIST collection
    • 25 million web pages
    • Largest collection that is easily available, but
      still 3 orders of magnitude smaller than what
      Google/Yahoo/MSN index
  • NTCIR
    • East Asian language and cross-language information
      retrieval
  • Cross Language Evaluation Forum (CLEF)
    • This evaluation series has concentrated on European
      languages and cross-language information retrieval
  • Many others

25
Finding Relevant Documents
  • Two factors make finding relevant documents difficult
    • Given a large collection, it is impossible to judge
      every document for a query
    • Relevance judgment is subjective
  • How to solve these problems?

26
Finding Relevant Documents
  • Pooling strategy

(Figure: pooling strategy applied to a query over a
collection of 1,000,000 docs)
27
Finding Relevant Documents
  • Pooling strategy (sketched in code below)
    • Retrieve documents using several techniques
    • Judge the top K documents for each technique
    • The relevant set is the union of the relevant
      documents found by each technique
  • The relevant set is a subset of the true relevant set
  • Problem: incomplete set of relevant documents for a
    given query
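A hedged sketch of the pooling strategy just described; the
run results, the pool depth K, and the toy assessor
judgments below are hypothetical (in TREC, human assessors
judge the pooled documents).

# Sketch of the pooling strategy above; the runs, pool depth K,
# and assessor judgments are hypothetical toy data.

def build_pool(runs, k):
    """Union of the top-k documents returned by each retrieval technique."""
    pool = set()
    for ranked_list in runs:
        pool.update(ranked_list[:k])
    return pool

runs = [
    ["d7", "d2", "d9", "d4"],   # technique A
    ["d2", "d5", "d7", "d1"],   # technique B
    ["d3", "d2", "d8", "d7"],   # technique C
]
assessor_says_relevant = {"d2", "d5", "d9"}   # hypothetical human judgments

pool = build_pool(runs, k=3)
relevant_set = {doc for doc in pool if doc in assessor_says_relevant}
print(pool)            # only pooled documents are ever judged
print(relevant_set)    # a subset of the true relevant set: relevant documents
                       # outside the pool are never found -> incomplete judgments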

28
Finding Relevant Documents
  • Relevance judgment is subjective
  • Disagreement among assessors

29
Finding Relevant Documents
  • Judges disagree a lot. How can judgments from
    multiple reviewers be combined? (A majority-vote
    sketch follows below.)
    • Average
    • Union
    • Intersection
    • Majority vote
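A small sketch of the last option, majority vote; the
per-assessor judgment sets are hypothetical. Union and
intersection fall out of the same set representation.

# Combining relevance judgments from multiple assessors.
# The per-assessor judgment sets below are hypothetical.
from collections import Counter

def majority_vote(judgment_sets):
    """Documents judged relevant by more than half of the assessors."""
    votes = Counter(doc for judged in judgment_sets for doc in judged)
    needed = len(judgment_sets) // 2 + 1
    return {doc for doc, count in votes.items() if count >= needed}

assessors = [{"d1", "d2", "d5"}, {"d2", "d5"}, {"d2", "d3"}]
print(set.union(*assessors))          # union: relevant if anyone says so
print(set.intersection(*assessors))   # intersection: relevant only if everyone agrees
print(majority_vote(assessors))       # majority vote: {'d2', 'd5'}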

30
Finding Relevant Documents
  • Assessor disagreement has a large impact on absolute
    performance numbers
  • But it has virtually no impact on the relative ranking
    of systems

31
Evaluation Criteria
  • Effectiveness
    • Precision, recall
  • Efficiency
    • Space and time complexity
  • Usability
    • How useful for real users?

32
Evaluation Criteria
  • Effectiveness (ad-hoc task)
    • Precision, recall
  • Efficiency
    • Space and time complexity
  • Usability (interactive task)
    • How useful for real users?

33
Evaluation Metrics
34
Precision and Recall Curve
  • Evaluate the precision at every retrieved document
  • Plot a precision-recall curve

35
Precision and Recall Curve
  • Evaluate the precision at every retrieved document
  • Plot a precision-recall curve (a sketch of the
    computation follows below)

(Figure: precision-recall curve, with recall on the x-axis
and precision on the y-axis)
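A small sketch of how the curve's points are obtained:
precision and recall are computed after each retrieved
document. The ranked list and the relevant set are toy data.

# Compute (recall, precision) after every retrieved document,
# i.e., the points of a precision-recall curve. Toy data.

def precision_recall_points(ranked, relevant):
    points, hits = [], 0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / rank))
    return points

ranked = ["d3", "d7", "d1", "d9", "d2"]
relevant = {"d3", "d1", "d2", "d8"}
for recall, precision in precision_recall_points(ranked, relevant):
    print(f"recall={recall:.2f}  precision={precision:.2f}")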
36
Precision and Recall Curve
  • Interpolation: take the maximum of all the future
    points
  • Why?

(Figure: interpolated precision-recall curve, with recall
on the x-axis and precision on the y-axis)
37
Precision and Recall Curve
  • Interpolation: take the maximum of all the future
    points (sketched in code below)
  • Why? Users are willing to read more if precision and
    recall are getting better
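A sketch of the interpolation rule stated above; the raw
(recall, precision) points are made up for illustration.

# Interpolated precision: replace each precision value by the maximum
# precision at that recall level or any higher ("future") recall level.

def interpolate(points):
    """points: (recall, precision) pairs sorted by increasing recall."""
    interpolated, best = [], 0.0
    for recall, precision in reversed(points):   # walk from high recall down
        best = max(best, precision)
        interpolated.append((recall, best))
    return list(reversed(interpolated))

raw = [(0.25, 1.00), (0.50, 0.67), (0.75, 0.75), (0.75, 0.60), (1.00, 0.50)]
print(interpolate(raw))
# The dip to 0.67 at recall 0.50 is lifted to 0.75, because a later point on
# the curve achieves higher precision at higher recall.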

38
Precision and Recall Curve
  • How to compare two precision-recall curves?

(Figure: precision-recall curves of System 1, System 2,
System 3, and System 4)
39
11-point Interpolated Average Precision

(Table: interpolated precision at the 11 recall levels
0.0, 0.1, ..., 1.0; average precision 0.425; a code sketch
follows below)
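A sketch of how a number like the 0.425 above could be
computed: interpolated precision is read off at the 11
standard recall levels and averaged. The raw curve points
are toy data, so the printed value will differ from 0.425.

# 11-point interpolated average precision: average the interpolated precision
# at recall levels 0.0, 0.1, ..., 1.0. Interpolated precision at level r is
# the maximum precision achieved at any recall >= r. Curve points are toy data.

def eleven_point_avg_precision(points):
    """points: raw (recall, precision) pairs from a ranked result list."""
    levels = [i / 10 for i in range(11)]
    interp = []
    for level in levels:
        at_or_above = [p for r, p in points if r >= level]
        interp.append(max(at_or_above) if at_or_above else 0.0)
    return sum(interp) / len(levels)

raw = [(0.25, 1.00), (0.50, 0.67), (0.75, 0.75), (0.75, 0.60), (1.00, 0.50)]
print(f"{eleven_point_avg_precision(raw):.3f}")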
40
Multiple Evaluation Criteria
  • To obtain a comprehensive view of IR performance,
    it is often necessary to examine multiple
    criteria

41
Evaluating Web Search Engines
  • Ultimate goal of a web search engine: make the user
    happy
  • Factors include
    • Speed of response
    • Number of web pages indexed
    • User interface
    • Most important: relevance

42
Evaluating Web Search Engines
  • Web pages retrieved on the first results page matter
    most
  • Commonly used metrics
    • Precision at rank 5, 10, and 20
    • Mean average precision (MAP)
      • For each query, compute its average precision
        across all recall levels
      • Average the average precision across all queries
  • Given K queries Q = {q1, q2, ..., qK}, where the
    relevant documents for query qj are {dj1, dj2, ...,
    dj,mj}, MAP(Q) is computed as

      MAP(Q) = (1/K) Σ_{j=1..K} (1/mj) Σ_{k=1..mj} Precision(Rjk)

    where Rjk is the set of ranked results from the top
    result down to document djk (a code sketch follows
    below)
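A sketch of MAP following the formula above; the per-query
rankings and relevance judgments are toy data chosen only
to show the computation.

# Mean average precision (MAP) following the formula above.
# Rankings and relevance judgments are toy data.

def average_precision(ranked, relevant):
    """Average of Precision(Rjk) over the relevant documents of one query."""
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank   # precision of the top-`rank` results
    return precision_sum / len(relevant) if relevant else 0.0

def mean_average_precision(runs, judgments):
    """Average the per-query average precision over all K queries."""
    return sum(average_precision(runs[q], judgments[q]) for q in runs) / len(runs)

runs = {"q1": ["d1", "d4", "d2"], "q2": ["d9", "d3", "d5", "d7"]}
judgments = {"q1": {"d1", "d2"}, "q2": {"d3", "d7"}}
print(mean_average_precision(runs, judgments))  # ((1 + 2/3)/2 + (1/2 + 2/4)/2) / 2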

43
Zipf's Law

(Figure: term rank vs. frequency, collected from the WSJ
1987 collection)
44
Zipf's Law

(Figure: log-log plot of term rank vs. frequency, with
slope 1)
45
Zipf's Law
  • Excerpted from Jamie Callan's slides

46
Implication of Zipf's Law
  • Term usage is highly skewed
  • Important for retrieval algorithms (see the sketch
    below)
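A small sketch that makes the skew visible on any plain-text
corpus: under Zipf's law the frequency of the r-th most
frequent term is roughly proportional to 1/r, so
rank × frequency stays roughly constant. The file path
"corpus.txt" is a placeholder.

# Rank-frequency table for a text file; under Zipf's law,
# rank * frequency is roughly constant. "corpus.txt" is a placeholder.
from collections import Counter
import re

def rank_frequency(path):
    with open(path, encoding="utf-8") as f:
        tokens = re.findall(r"[a-z]+", f.read().lower())
    return Counter(tokens).most_common()   # [(term, freq), ...], most frequent first

for rank, (term, freq) in enumerate(rank_frequency("corpus.txt")[:20], start=1):
    print(f"{rank:3d}  {term:<15s}  freq={freq:<8d}  rank*freq={rank * freq}")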

47
Statistical Profile
48
Zipf's Law
  • Question: how to estimate the probability of a word
    that does not appear in a collection?

49
Heaps' Law
  • Estimate the vocabulary size of a collection from the
    number of tokens found in the collection
  • M = k · T^b, where M is the vocabulary size, T is the
    number of tokens, b is the slope (≈ 0.5, sub-linear),
    and k is typically between 30 and 100 (a code sketch
    follows below)
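A minimal sketch of using Heaps' law as stated above to
estimate vocabulary size; the particular constants k = 50
and b = 0.5 are just one choice from the ranges given on
the slide.

# Heaps' law estimate M = k * T^b; k = 50 and b = 0.5 are one choice from
# the typical ranges above (k between 30 and 100, b about 0.5).

def heaps_vocabulary_size(num_tokens, k=50, b=0.5):
    """Estimated vocabulary size M for a collection with T = num_tokens tokens."""
    return k * num_tokens ** b

for tokens in (10_000, 1_000_000, 100_000_000):
    print(f"T = {tokens:>11,d}  ->  M approx. {heaps_vocabulary_size(tokens):,.0f}")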