An Information Retrieval Approach based on Discourse Type

1
An Information Retrieval Approach based on Discourse Type
NLDB 2006

D. Y. Wang, R. W. P. Luk, K. F. Wong¹ and K. L. Kwok²

Department of Computing, The Hong Kong Polytechnic University
¹Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong
²Department of Computer Science, City University of New York
2
Contents

Introduction
    Motivation
    Discourse Type
    Information Unit
Problem Formulation
    Score of Topic Terms
    Score of Discourse Type
    Document Re-ranking
Experimental Results
Conclusion

3
Motivation

The effectiveness of information retrieval (IR)
systems varies substantially from one topic to
another.
One reason: users' information needs are very diverse.
Our approach: find the discourse type of the topic and adopt an appropriate strategy.

4
Discourse Type

Definition of discourse type: the functions (including properties and relations, which cannot exist independently) of independent entities.

5
Performance Difference
[Chart: retrieval performance across discourse types; average = 0.2768]
6
Why Choose Advantage / Disadvantage as Our Example?

Its performance is worse than the average: 0.204 vs. 0.277.
It is relatively abstract and therefore unlikely to have been investigated before,
    compared with concrete things (e.g., people, countries).
It is related to cue phrases (e.g., "more than") that are composed of stop words,
    which conventional IR ignores.
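To illustrate the stop-word point, here is a minimal Python sketch. The stop-word list and the example sentence are hypothetical; common stop lists (for example, the SMART list) include both "more" and "than", so a cue phrase like "more than" is gone before indexing even starts.

```python
# Hypothetical, tiny stop-word list; real IR systems use much longer ones.
STOP_WORDS = {"a", "an", "and", "are", "is", "more", "of", "than", "the"}


def remove_stop_words(text):
    """Lower-case, split on whitespace, and drop stop words."""
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]


print(remove_stop_words("Implants are more durable than dentures"))
# ['implants', 'durable', 'dentures']  (the comparative cue has vanished)
```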

7
Why Choose Advantage / Disadvantage as Our Example? (cont.)

It is a popular discourse type of information need.
    We found at least 40 questions asking about the advantages and disadvantages of something on one website (http://www.answerbag.com).
It has a reasonable number (i.e., eight) of TREC topics for investigation.
    See next slide.

8
Eight Queries with Discourse Type Advantage / Disadvantage
[Table: Query Title and Query No. of the eight topics, e.g., Implant Dentistry (308)]
9
Information Unit (IU)
[Figure: a document in which each occurrence of a topic term t (term1, term2, ...) is surrounded by a window of w words on either side; each such window is an information unit (IU)]
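A minimal sketch of IU extraction, assuming an IU is simply a topic-term occurrence plus up to w tokens on each side, clipped at the document boundaries. The function name `extract_ius`, whitespace tokenization, and the choice to keep overlapping windows as separate IUs are assumptions, not details given on the slide.

```python
def extract_ius(tokens, topic_terms, w):
    """Return one IU per topic-term occurrence: the occurrence plus up to
    w tokens on each side, clipped at the start and end of the document.
    Overlapping windows are kept as separate IUs (an assumption)."""
    ius = []
    for i, tok in enumerate(tokens):
        if tok in topic_terms:
            start = max(0, i - w)
            end = min(len(tokens), i + w + 1)
            ius.append(tokens[start:end])
    return ius


doc = "a b term1 c d e f g term2 h term1".split()
for iu in extract_ius(doc, {"term1", "term2"}, w=2):
    print(iu)
# ['a', 'b', 'term1', 'c', 'd']
# ['f', 'g', 'term2', 'h', 'term1']
# ['term2', 'h', 'term1']
```

Merging overlapping windows into one IU would be an equally plausible design; the slide does not say which was used.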
10
Why IU?

Assumption: terms inside an IU (i.e., around topic terms) are more important to the relevance of a document than terms outside the IU.
It simplifies the processing of documents:
    compute a score for each IU;
    aggregate the scores of all IUs as the score of the document.

11
Score of Topic Terms

sumtf = 4
dtf = 3 (d: distinct)
Graph-based model:
    atS3: 1/1, 1/5, 1/3
    atS4: 1/5, 1/3
[Figure: graph-based model example; labels 1, 5 and 3]
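The two counts on this slide can be sketched as follows, assuming sumtf is the total number of topic-term occurrences in an IU and dtf is the number of distinct topic terms in it. The graph-based model is harder to recover: if the fractions listed for atS3 (1/1, 1/5, 1/3) and atS4 (1/5, 1/3) are meant to be summed, which is an assumption since the operators were lost, `reciprocal_sum` reproduces those values. The example IU and all function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def topic_term_counts(iu_tokens, topic_terms):
    """Return (sumtf, dtf): total topic-term occurrences in the IU and
    the number of distinct topic terms that occur in it."""
    counts = Counter(tok for tok in iu_tokens if tok in topic_terms)
    return sum(counts.values()), len(counts)


def reciprocal_sum(labels):
    """Hypothetical reading of the graph-based model: sum of 1/label."""
    return sum(Fraction(1, n) for n in labels)


iu = "term1 x term2 y term1 z term3".split()
print(topic_term_counts(iu, {"term1", "term2", "term3"}))  # (4, 3)
print(reciprocal_sum([1, 5, 3]))  # 23/15  (atS3, if summed)
print(reciprocal_sum([5, 3]))     # 8/15   (atS4, if summed)
```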
12
Example: Score of Discourse Type

more (comparative words) = 3
support: 'back', 'confirm', 'contest', 'contrari', 'defend', 'encourag', 'endors', 'object', 'oppon', 'oppos', 'opposit', 'prove', 'quibbl', 'refer', 'sponsor', 'support' (from www.answers.com)
support = 2
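A rough sketch of counting discourse-type cues in one IU. The "support" stems are copied from the slide; the comparative-word list, the prefix matching (a crude substitute for a real stemmer such as Porter's), and the example IU are assumptions, with the IU made up so that its counts match the 3 and 2 shown above.

```python
# Stems listed on the slide for "support" (from www.answers.com).
SUPPORT_STEMS = (
    "back", "confirm", "contest", "contrari", "defend", "encourag", "endors",
    "object", "oppon", "oppos", "opposit", "prove", "quibbl", "refer",
    "sponsor", "support",
)
# Hypothetical comparative cue words; this list is not from the slides.
COMPARATIVES = {"more", "less", "better", "worse"}


def discourse_cue_counts(iu_tokens):
    """Return (comparative-word count, 'support'-stem count) for one IU.
    Prefix matching stands in for a real stemmer, so it over-matches
    (e.g. 'background' would count as 'back')."""
    comparative = sum(1 for t in iu_tokens if t in COMPARATIVES)
    # str.startswith accepts a tuple and is true if any prefix matches.
    support = sum(1 for t in iu_tokens if t.startswith(SUPPORT_STEMS))
    return comparative, support


iu = ("dentists support implants more than dentures "
      "but critics object because more cost less").split()
print(discourse_cue_counts(iu))  # (3, 2)
```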

13
Document Re-ranking

IU score before re-ranking: S0
    S0 = similarity score of the document that contains the IU
IU re-ranking score S, computed from S0 and
    the score of topic terms, or
    the score of discourse type, or
    both the score of topic terms and the score of discourse type
Aggregate the re-ranking scores of all IUs in a document as the final score of the document.
Re-rank the documents by the final score.
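A sketch of the re-ranking loop under two loudly flagged assumptions: the operator that combines S0 with the other scores (lost in extraction) is taken to be multiplication, and "aggregate" is taken to mean a sum over the IUs of a document. Passing a different `combine` function (for example, S0 times the topic-term score only) covers the other two variants. The `IU` class, its fields, and the sample scores are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class IU:
    """Hypothetical record for one information unit."""
    doc_id: str
    s0: float               # similarity score of the document containing the IU
    topic_score: float      # score of topic terms for this IU
    discourse_score: float  # score of discourse type for this IU


def combine_both(iu: IU) -> float:
    # Assumption: S = S0 * topic-term score * discourse-type score.
    return iu.s0 * iu.topic_score * iu.discourse_score


def rerank(ius: List[IU], combine: Callable[[IU], float] = combine_both) -> List[str]:
    """Sum the re-ranking scores of each document's IUs (assumed aggregation),
    then return document ids sorted by that final score, highest first."""
    final: Dict[str, float] = {}
    for iu in ius:
        final[iu.doc_id] = final.get(iu.doc_id, 0.0) + combine(iu)
    return sorted(final, key=final.get, reverse=True)


ius = [IU("d1", 0.9, 1.0, 0.5), IU("d2", 0.6, 2.0, 1.0), IU("d2", 0.6, 0.5, 1.0)]
print(rerank(ius))  # ['d2', 'd1']
```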

14
Re-ranking Results in MAP
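MAP (mean average precision) is the measure reported here. For reference, a minimal sketch of the standard non-interpolated definition: average precision averages precision@k over the ranks k at which relevant documents appear, counting relevant documents that are never retrieved as zero, and MAP averages that over queries. The rankings below are made up, and the slides do not say which evaluation tool produced their numbers.

```python
def average_precision(ranking, relevant):
    """Sum precision@k at each rank k holding a relevant document, then divide
    by the total number of relevant documents (unretrieved ones add 0)."""
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """runs: list of (ranking, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
    (["d5", "d6"], {"d6", "d9"}),              # AP = (1/2) / 2; d9 never retrieved
]
print(round(mean_average_precision(runs), 4))  # 0.5417
```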
15
Conclusion

Re-ranking based on topic terms and re-ranking based on discourse type can each improve retrieval performance.
Combining the two improves the results the most, significantly so at the 95% confidence level (already taking the sample size into account).
This approach is promising and worth further investigation.

Acknowledgement: We thank the Center for Intelligent Information Retrieval, University of Massachusetts, for enabling Robert Luk to develop the basic IR system while he was on leave there. This work is supported by the CERG Project PolyU 5226/05E.