An Information Retrieval Approach based on Discourse Type

1
An Information Retrieval Approach based on
Discourse Type
NLDB 2006
  • D. Y. Wang, R. W. P. Luk, K. F. Wong (1) and
    K. L. Kwok (2)

Department of Computing, The Hong Kong Polytechnic
University
(1) Department of Systems Engineering and
Engineering Management, The Chinese University
of Hong Kong
(2) Department of Computer Science, City
University of New York
2
Content
  • Introduction
  • Motivation
  • Discourse Type
  • Information Unit
  • Problem Formulation
  • Score of topic terms
  • Score of discourse type
  • Document Re-ranking
  • Experimental Results
  • Conclusion

3
Motivation
  • The effectiveness of information retrieval (IR)
    systems varies substantially from one topic to
    another.
  • One reason: users' information needs are very
    diverse.
  • Our approach: find the discourse type of the
    topic and adopt an appropriate strategy.

4
Discourse Type
  • Definition of discourse type:
    the functions (including properties and
    relations that cannot exist independently) of the
    independent entities

5
Performance Difference
(Chart not transcribed; average: 0.2768)
6
Why Choose Advantage / Disadvantage as our
example?
  • Its performance is worse than the average
    (0.204 vs. 0.277).
  • It is relatively abstract compared with
    concrete things (e.g., people, country), so it
    is unlikely to have been investigated before.
  • It is related to some cue phrases (e.g., "more
    than") that are composed of stop words, and
    conventional IR ignores stop words.

7
Why Choose Advantage / Disadvantage as example?
(cont.)
  • It is a popular discourse type of information
    need.
  • We found at least 40 questions asking about the
    advantages and disadvantages of something at
    one website (http://www.answerbag.com).
  • It has a reasonable number (i.e., eight) of TREC
    topics for investigation.
  • See next slide.

8
Eight Queries with Discourse Type Advantage /
Disadvantage
(Table not transcribed. Columns: Query No., Query
Title; for example, 308: Implant Dentistry.)
9
Information Unit (IU)
(Diagram: a document containing occurrences of the
topic terms term1 and term2. An IU is the window
around a topic term t, extending w words on each
side.)
10
Why IU?
  • Assumption: terms inside an IU (around topic
    terms) are more important to the relevance of a
    document than the terms outside the IU.
  • Simplifies the processing of the documents:
  • Compute a score for each IU.
  • Aggregate the scores of all IUs as the score of
    the document.
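
The IU idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the whitespace tokenization, the handling of windows at document boundaries, and the use of a plain sum for aggregation are hypothetical choices, not details given on the slides.

```python
from typing import Callable, List, Set, Tuple


def find_information_units(
    tokens: List[str], topic_terms: Set[str], w: int
) -> List[Tuple[int, int]]:
    """Return one (start, end) token span per topic-term occurrence.

    Each span covers the topic term plus up to w words on each side
    (end is exclusive). Windows are clipped at the document boundaries.
    """
    spans = []
    for i, tok in enumerate(tokens):
        if tok in topic_terms:
            spans.append((max(0, i - w), min(len(tokens), i + w + 1)))
    return spans


def score_document(
    tokens: List[str],
    topic_terms: Set[str],
    w: int,
    score_iu: Callable[[List[str]], float],
) -> float:
    """Score every IU and aggregate; summation is an assumption here."""
    spans = find_information_units(tokens, topic_terms, w)
    return sum(score_iu(tokens[start:end]) for start, end in spans)


# Usage: two topic terms, window of 2 words on each side.
doc = "a b term1 c d e f term2 g".split()
units = find_information_units(doc, {"term1", "term2"}, w=2)
```

Overlapping windows are kept as separate IUs in this sketch; whether the original system merges them is not stated in the transcript.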

11
Score of Topic Terms
  • sumtf = 4
  • dtf = 3 (d: distinct)
  • Graph-based model
  • at S3: 1/1, 1/5, 1/3
  • at S4: 1/5, 1/3
(Graph not transcribed; edge labels: 1, 5, 3)
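
As a rough illustration of the two counts on this slide, the sketch below reads sumtf as the total number of topic-term occurrences in an IU and dtf as the number of distinct topic terms that occur. That reading is an assumption based on the slide's "(d distinct)" note, the function name is hypothetical, and the graph-based model is not covered because its formula did not survive in the transcript.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def topic_term_counts(iu_tokens: Iterable[str], topic_terms: Set[str]) -> Tuple[int, int]:
    """Return (sumtf, dtf) for one IU.

    sumtf: total occurrences of topic terms (assumed meaning).
    dtf:   number of distinct topic terms present (assumed meaning).
    """
    counts = Counter(tok for tok in iu_tokens if tok in topic_terms)
    sumtf = sum(counts.values())
    dtf = len(counts)
    return sumtf, dtf


# Usage: four topic-term occurrences, three of them distinct.
iu = ["term1", "x", "term2", "term1", "y", "term3", "z"]
sumtf, dtf = topic_term_counts(iu, {"term1", "term2", "term3"})
```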
12
Example Score of Discourse Type
  • more (comparative words): 3
  • support: 'back', 'confirm', 'contest',
    'contrari', 'defend', 'encourag', 'endors',
    'object', 'oppon', 'oppos', 'opposit', 'prove',
    'quibbl', 'refer', 'sponsor', 'support'
    (from www.answers.com)
  • support: 2
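
One simple way to turn cue lists like these into a score is to count cue matches inside an IU. The sketch below does that; the counting rule, the function name, and the one-word comparative list are assumptions for illustration, while the support stems are copied from the slide. The input is assumed to be stemmed already and to still contain stop words, since cue words such as "more" would otherwise be lost.

```python
from typing import Dict, Iterable, Set

# Stems listed on the slide for "support" (source given there: www.answers.com).
SUPPORT_STEMS: Set[str] = {
    "back", "confirm", "contest", "contrari", "defend", "encourag",
    "endors", "object", "oppon", "oppos", "opposit", "prove",
    "quibbl", "refer", "sponsor", "support",
}

# Hypothetical, minimal comparative list; the slide names only "more".
COMPARATIVE_STEMS: Set[str] = {"more"}


def discourse_cue_counts(iu_stems: Iterable[str]) -> Dict[str, int]:
    """Count cue matches per cue group within one IU (assumed scoring rule)."""
    counts = {"comparative": 0, "support": 0}
    for stem in iu_stems:
        if stem in COMPARATIVE_STEMS:
            counts["comparative"] += 1
        if stem in SUPPORT_STEMS:
            counts["support"] += 1
    return counts


# Usage on a small stemmed IU.
iu = ["critic", "oppos", "the", "plan", "more", "than", "support"]
cue_counts = discourse_cue_counts(iu)
```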

13
Documents Re-ranking
  • IU score before re-ranking: S0
  • S0 = similarity score of the document that
    contains the IU
  • IU re-ranking score S, computed in one of three
    ways:
  • S from S0 and the score of topic terms
  • S from S0 and the score of discourse type
  • S from S0, the score of topic terms and the
    score of discourse type
  • Aggregate the re-ranking score of all IUs in a
    document as the final score of the document.
  • Re-rank the documents by the final score.
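
The re-ranking steps can be sketched as follows. The transcript does not preserve how S0 is combined with the two scores or how IU scores are aggregated, so this sketch assumes multiplication and summation respectively; both are assumptions, and the data layout and function name are hypothetical.

```python
from typing import Dict, List, Tuple

# doc_id -> (S0, [(topic_term_score, discourse_type_score) for each IU])
DocInput = Dict[str, Tuple[float, List[Tuple[float, float]]]]


def rerank(docs: DocInput) -> List[Tuple[str, float]]:
    """Return (doc_id, final_score) pairs sorted by descending final score.

    Assumed rule: each IU gets S = S0 * topic_score * discourse_score,
    and the document's final score is the sum of its IU scores.
    """
    final_scores = []
    for doc_id, (s0, iu_scores) in docs.items():
        total = sum(s0 * topic * discourse for topic, discourse in iu_scores)
        final_scores.append((doc_id, total))
    return sorted(final_scores, key=lambda pair: pair[1], reverse=True)


# Usage: d2 has the higher initial score, but d1 overtakes it after re-ranking.
ranking = rerank({
    "d1": (0.5, [(2.0, 1.0), (1.0, 3.0)]),
    "d2": (0.8, [(1.0, 1.0)]),
})
```

Setting one of the two per-IU scores to 1.0 for every IU gives the topic-term-only or discourse-type-only variant under the same assumed rule.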

14
Re-ranking Results in MAP
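
For reference, MAP (mean average precision) is commonly computed as below. This is the standard non-interpolated definition, shown as a sketch; the evaluation settings actually used for the results on this slide (cut-off depth, topic set) are not in the transcript, and the function names are hypothetical.

```python
from typing import List, Set, Tuple


def average_precision(ranked_ids: List[str], relevant: Set[str]) -> float:
    """Non-interpolated average precision for one query."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant hit
    return precision_sum / len(relevant)


def mean_average_precision(runs: List[Tuple[List[str], Set[str]]]) -> float:
    """Mean of per-query average precision over (ranking, relevant-set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Usage with two toy queries.
runs = [(["a", "b", "c", "d"], {"a", "c"}), (["x", "y"], {"y"})]
map_score = mean_average_precision(runs)
```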
15
Conclusion
  • Re-ranking based on topic terms and re-ranking
    based on discourse type can each improve
    retrieval performance.
  • Combining the two improves the results most
    significantly (at the 95% confidence level,
    already taking the sample size into account).
  • This approach is promising and is worth further
    investigation.

Acknowledgement: We thank the Center for
Intelligent Information Retrieval, University of
Massachusetts, for facilitating Robert Luk to
develop the basic IR system when he was on leave
there. This work is supported by the CERG Project
PolyU 5226/05E.