Robust Pseudo Feedback - PowerPoint PPT Presentation

To test the effectiveness of some recent language modeling methods for genomics retrieval. Slides dated 11/16/06.

Provided by: jingj5
Learn more at: http://www.mysmu.edu
Title: Robust Pseudo Feedback


1
Robust Pseudo Feedback and HMM Passage Extraction
UIUC at TREC 2006 Genomics Track
  • Jing Jiang, Xin He, ChengXiang Zhai
  • University of Illinois at Urbana-Champaign

2
Goal of Participation
  • To test the effectiveness of some recent language
    modeling methods for genomics retrieval
    • Robust pseudo feedback [Tao & Zhai 06]
    • HMM passage extraction [Jiang & Zhai 06]
  • Tasks at the 2006 Genomics Track
    • Document-level retrieval
    • Passage-level retrieval
    • Aspect-level retrieval

3
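Overall Approach
[Diagram: Medline articles are split into paragraphs. The Document Retrieval Module takes the query Q and returns ranked paragraphs (1, 2, ..., k). Pseudo relevance feedback or user relevance feedback is drawn from the ranked paragraphs. The Passage Extraction Module turns the ranked paragraphs into ranked passages (1, 2, ..., k).]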

4
Goal of Participation
  • To test the effectiveness of some recent language
    modeling methods for genomics retrieval
  • Robust pseudo feedback [Tao & Zhai 06]
  • HMM passage extraction [Jiang & Zhai 06]

5
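KL-Divergence Retrieval Model [Lafferty & Zhai 01]
  • Documents D1, D2, ..., Dk (text fragments shown on the slide):
    D1: "The ... for ... spongiform ... PrP protein"
    D2: "Prion diseases ... that ... (PrP C) ... This"
    Dk: "which ... (PrP C) ... to the ... prion protein"
  • Document model: the 0.020, for 0.015, prp 0.102, mad 0.034, cow 0.034,
    diseas 0.068
  • Topic model: role 0.2, prnp 0.2, mad 0.2, cow 0.2, diseas 0.2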
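
A minimal sketch of how the two models on this slide could be compared. It scores a document by the cross-entropy term of the negative KL divergence, which ranks documents the same way because the topic-model entropy does not depend on the document. The function name `kl_score`, the linear smoothing with a background model, the background values for the topic words, and the second document `d_other` are all assumptions made for illustration; the slide does not say how the document model is smoothed.

```python
import math


def kl_score(topic_model, doc_model, background, smoothing=0.1, floor=1e-6):
    """Rank-equivalent negative KL divergence: sum_w p(w|topic) * log p(w|doc).

    The document model is linearly smoothed with a background model so that
    words missing from the document do not produce log(0). The smoothing
    scheme and its weight are assumptions, not taken from the slides.
    """
    score = 0.0
    for word, p_topic in topic_model.items():
        p_doc = (1.0 - smoothing) * doc_model.get(word, 0.0) \
            + smoothing * background.get(word, floor)
        score += p_topic * math.log(p_doc)
    return score


# Topic and document models copied from the slide.
topic = {"role": 0.2, "prnp": 0.2, "mad": 0.2, "cow": 0.2, "diseas": 0.2}
d1 = {"the": 0.020, "for": 0.015, "prp": 0.102,
      "mad": 0.034, "cow": 0.034, "diseas": 0.068}

# Hypothetical off-topic document and background model (assumed values).
d_other = {"the": 0.6, "for": 0.4}
background = {"the": 0.02, "for": 0.01, "role": 0.001, "prnp": 0.0001,
              "mad": 0.0005, "cow": 0.0005, "diseas": 0.001}

docs = {"D1": d1, "D_other": d_other}
ranked = sorted(docs, key=lambda name: kl_score(topic, docs[name], background),
                reverse=True)
print(ranked)
```

Because D1 puts real probability mass on "mad", "cow" and "diseas", it scores above the off-topic document, which only receives the smoothed background mass for those words.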

7
Model-Based Feedback [Zhai & Lafferty 01]
  • Documents: D1, D2, ..., Dk
  • Background model: the 0.02, for 0.01, prp 0.003, prion 0.004
  • Topic model: role 0.2, prnp 0.2, mad 0.2, cow 0.2, diseas 0.2
  • Feedback model (to be estimated): the ?, for ?, prp ?, prion ?

8
Model-Based Feedback [Zhai & Lafferty 01]
  • Documents: D1, D2, ..., Dk
  • Background model: the 0.02, for 0.01, prp 0.003, prion 0.004
  • Topic model: role 0.2, prnp 0.2, mad 0.2, cow 0.2, diseas 0.2
  • Feedback model (estimated with the EM algorithm): the 0.003, for 0.002,
    prp 0.02, prion 0.05

9
Model-Based Feedback [Zhai & Lafferty 01]
  • Documents: D1, D2, ..., Dk
  • Background model: the 0.02, for 0.01, prp 0.003, prion 0.004
  • Topic model: role 0.2, prnp 0.2, mad 0.2, cow 0.2, diseas 0.2
  • Feedback model: the 0.003, for 0.002, prp 0.02, prion 0.05
  • 2 parameters: α and ?
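
The feedback estimation on these slides can be sketched as a two-component mixture fitted with EM: each word in the feedback documents is explained either by the unknown feedback model or by the fixed background model. This is a sketch under assumptions, not the authors' code. The names `noise_weight` (weight of the background component) and `interp_weight` (weight of the feedback model when it is mixed back into the topic model) are hypothetical stand-ins for the two parameters mentioned on the slide, and the word counts are invented.

```python
def estimate_feedback_model(feedback_counts, background,
                            noise_weight=0.5, iterations=30):
    """Fit p(w | feedback) by EM, with the background model held fixed."""
    vocab = list(feedback_counts)
    total = sum(feedback_counts.values())
    # Start from the maximum-likelihood estimate over the feedback text.
    theta = {w: feedback_counts[w] / total for w in vocab}
    for _ in range(iterations):
        # E-step: probability that each word was generated by the feedback
        # model rather than by the background model.
        resp = {}
        for w in vocab:
            topic = (1.0 - noise_weight) * theta[w]
            noise = noise_weight * background.get(w, 1e-9)
            resp[w] = topic / (topic + noise)
        # M-step: re-estimate the feedback model from the expected counts.
        norm = sum(feedback_counts[w] * resp[w] for w in vocab)
        theta = {w: feedback_counts[w] * resp[w] / norm for w in vocab}
    return theta


def interpolate(topic_model, feedback_model, interp_weight=0.5):
    """Mix the estimated feedback model into the original topic model."""
    words = set(topic_model) | set(feedback_model)
    return {w: (1.0 - interp_weight) * topic_model.get(w, 0.0)
            + interp_weight * feedback_model.get(w, 0.0) for w in words}


# Background model from the slide; the counts are made up for illustration.
background = {"the": 0.02, "for": 0.01, "prp": 0.003, "prion": 0.004}
counts = {"the": 20, "for": 10, "prp": 5, "prion": 8}
feedback = estimate_feedback_model(counts, background)
updated_topic = interpolate({"prp": 1.0}, feedback, interp_weight=0.5)
```

Compared with raw relative frequencies, the fitted model shifts a little mass away from "the" and toward "prion", because the background model already accounts for part of the common-word counts.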
10
Regularized Estimation [Tao & Zhai 06]
  • Documents: D1, D2, ..., Dk
  • Background model: the 0.02, for 0.01, prp 0.003, prion 0.004
  • Topic model: role 0.2, prnp 0.2, mad 0.2, cow 0.2, diseas 0.2
  • Feedback model (to be estimated): the ?, for ?, prp ?, prion ?

11
Regularized Estimation [Tao & Zhai 06]
  • Documents: D1, D2, ..., Dk
  • Background model: the 0.02, for 0.01, prp 0.003, prion 0.004
  • Topic model (used as prior): role 0.2, prnp 0.2, mad 0.2, cow 0.2,
    diseas 0.2
  • Feedback model (estimated with the regularized EM algorithm): the 0.003,
    for 0.002, prp 0.02, prion 0.05

12
Regularized Estimation [Tao & Zhai 06]
  • Documents: D1, D2, ..., Dk
  • Background model: the 0.02, for 0.01, prp 0.003, prion 0.004
  • Topic model (used as prior): role 0.2, prnp 0.2, mad 0.2, cow 0.2,
    diseas 0.2
  • Feedback model: the 0.003, for 0.002, prp 0.02, prion 0.05
  • 1 parameter: ?
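
One way to read "regularized EM with a prior" is a MAP-style EM in which the topic model contributes pseudo-counts to every M-step and its weight shrinks over the iterations, so the topic model anchors the estimate early and the feedback text takes over later. The sketch below implements only that reading. The names `prior_weight` and `decay`, the geometric decay schedule, and the fixed iteration count are assumptions; the slides do not give the actual schedule or stopping rule used in [Tao & Zhai 06].

```python
def regularized_em(feedback_counts, background, query_prior,
                   noise_weight=0.5, prior_weight=100.0, decay=0.9,
                   iterations=30):
    """MAP-style EM: the query prior adds pseudo-counts in each M-step."""
    vocab = sorted(set(feedback_counts) | set(query_prior))
    # Uniform start, so words absent from the prior can still gain mass.
    theta = {w: 1.0 / len(vocab) for w in vocab}
    mu = prior_weight
    for _ in range(iterations):
        numer = {}
        for w in vocab:
            # E-step: expected count of w attributed to the feedback model.
            topic = (1.0 - noise_weight) * theta[w]
            noise = noise_weight * background.get(w, 1e-9)
            expected = feedback_counts.get(w, 0) * topic / (topic + noise)
            # M-step numerator: expected count plus prior pseudo-count.
            numer[w] = expected + mu * query_prior.get(w, 0.0)
        norm = sum(numer.values())
        theta = {w: numer[w] / norm for w in vocab}
        mu *= decay  # assumed schedule: trust the prior less each round
    return theta


# Background model from the slide; counts and prior are illustrative only.
background = {"the": 0.02, "for": 0.01, "prp": 0.003, "prion": 0.004}
counts = {"the": 20, "for": 10, "prp": 5, "prion": 8}
prior = {"prp": 0.5, "prion": 0.5}

with_prior = regularized_em(counts, background, prior)
without_prior = regularized_em(counts, background, prior, prior_weight=0.0)
```

With the prior switched on, "prp" ends up with more probability than in the unregularized fit, which is the intended pull toward the topic model.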
13
Original vs. Regularized EM
  • Original EM: α manually set
  • Regularized EM: α dynamically set
14
Goal of Participation
  • To test the effectiveness of some recent language
    modeling methods for genomics retrieval
  • Robust pseudo feedback [Tao & Zhai 06]
  • HMM passage extraction [Jiang & Zhai 06]

15
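HMM Passage Extraction [Jiang & Zhai 06]
  • A paragraph is a sequence of words w w ... w; the relevant passage is a
    contiguous span of words inside the paragraph
  • HMM states: B1, R, B2
  • Output probabilities:
    p(w|B1): the 0.02, for 0.01, prp 0.001
    p(w|R): the 0.003, for 0.002, prp 0.02
    p(w|B2): the 0.02, for 0.01, prp 0.001
  • Transition probabilities:
    p(R|B1) = 0.1, p(B2|R) = 0.05, p(B1|B1) = 0.9, p(R|R) = 0.95,
    p(B2|B2) = 1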
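
As an illustration of how such an HMM can mark a passage, the sketch below runs Viterbi decoding over the three states with the probabilities listed on this slide. Starting in B1, the emission floor `FLOOR` for words not listed on the slide, and the toy word sequence are assumptions made for the example.

```python
import math

STATES = ["B1", "R", "B2"]
# Transition and output probabilities as listed on the slide.
TRANS = {"B1": {"B1": 0.9, "R": 0.1},
         "R": {"R": 0.95, "B2": 0.05},
         "B2": {"B2": 1.0}}
EMIT = {"B1": {"the": 0.02, "for": 0.01, "prp": 0.001},
        "R": {"the": 0.003, "for": 0.002, "prp": 0.02},
        "B2": {"the": 0.02, "for": 0.01, "prp": 0.001}}
FLOOR = 1e-4  # assumed probability for words not listed on the slide


def viterbi(words):
    """Return the most likely state sequence, assuming the path starts in B1."""
    neg_inf = float("-inf")
    best = {s: neg_inf for s in STATES}
    best["B1"] = math.log(EMIT["B1"].get(words[0], FLOOR))
    backpointers = []
    for word in words[1:]:
        new_best, pointer = {}, {}
        for state in STATES:
            # Predecessors that can reach this state and are reachable so far.
            candidates = [(best[prev] + math.log(TRANS[prev][state]), prev)
                          for prev in STATES
                          if state in TRANS[prev] and best[prev] > neg_inf]
            if candidates:
                score, prev = max(candidates)
                new_best[state] = score + math.log(EMIT[state].get(word, FLOOR))
                pointer[state] = prev
            else:
                new_best[state] = neg_inf
                pointer[state] = None
        best = new_best
        backpointers.append(pointer)
    # Trace back from the best final state.
    state = max(best, key=best.get)
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    return path[::-1]


words = ["the", "for", "the", "prp", "prp", "prp", "prp", "the", "for", "the"]
path = viterbi(words)
passage = [w for w, s in zip(words, path) if s == "R"]
```

On this toy sequence the decoded path labels the run of "prp" tokens as R and the surrounding function words as B1 and B2, so the extracted passage is the contiguous R span.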
16
HMM Passage Extraction [Jiang & Zhai 06]
  • HMM states: B1, R, B3, E, B2
  • B3: a background state for smoothing
  • E: end-of-paragraph state
  • Transition probabilities estimated from observations
17
Experiment Design
  • Pre-processing
  • HTML parsing
  • paragraph boundaries
  • Tokenization
  • User relevance feedback
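
The pre-processing steps listed above can be sketched with the Python standard library. This assumes that paragraph boundaries come from `<p>` tags and that tokens are lowercase alphanumeric runs; the slides do not say how the articles are marked up or which tokenizer was used. Stemming, which the truncated terms such as "diseas" on earlier slides suggest, is left out. The class and function names are hypothetical.

```python
import re
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text content of each <p> element as one paragraph."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buffer = None  # None while outside a <p> element

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            # Collapse whitespace left over from inline markup.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._buffer = None

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


def tokenize(text):
    """Lowercase the text and keep alphanumeric runs as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def preprocess(html):
    """Return one token list per paragraph found in the HTML string."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return [tokenize(p) for p in parser.paragraphs]


sample = ("<html><body><p>Prion <b>protein</b> (PrP C).</p>"
          "<p>Mad cow disease.</p></body></html>")
paragraphs = preprocess(sample)
```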

18
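Official Runs
[Diagram: Medline articles are split into paragraphs. KL-Div Retrieval takes the query Q and returns ranked paragraphs (1, 2, ..., k), from which an updated query Q' is formed. HMM Passage Extraction turns the ranked paragraphs into ranked passages (1, 2, ..., k).]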

19
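UIUCauto
[Diagram: Medline articles are split into paragraphs. KL-Div Retrieval takes the query Q and returns ranked paragraphs (1, 2, ..., k), from which an updated query Q' is formed by regularized estimation. HMM Passage Extraction turns the ranked paragraphs into ranked passages (1, 2, ..., k).]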

20
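UIUCinter
[Diagram: Medline articles are split into paragraphs. KL-Div Retrieval takes the query Q and returns ranked paragraphs (1, 2, ..., k), from which an updated query Q' is formed by regularized estimation. HMM Passage Extraction turns the ranked paragraphs into ranked passages (1, 2, ..., k).]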

21
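UIUCinter2
[Diagram: Medline articles are split into paragraphs. KL-Div Retrieval takes the query Q and returns ranked paragraphs (1, 2, ..., k), from which an updated query Q' is formed by original estimation; the query update also carries the label F. HMM Passage Extraction turns the ranked paragraphs into ranked passages (1, 2, ..., k).]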

22
Pseudo Relevance Feedback (k = 10)
? is similar to ? / (1 - ?)
23
Pseudo Relevance Feedback (k = 10)
? is similar to ? / (1 - ?)
24
Pseudo Relevance Feedback (k = 10)
? is similar to ? / (1 - ?)
25
Parameter Sensitivity (pseudo feedback, k = 10)
26
User Relevance Feedback
27
User Relevance Feedback
28
User Relevance Feedback
29
HMM Passage Extraction
30
Passage Length (In Bytes)
HMM passages are generally too long!
31
Example Passage
Prion diseases, which include Creutzfeldt-Jakob
disease in humans, mad cow disease in cattle, and
scrapie in sheep, involve the misfolding of the
benign cellular prion protein (PrP C) to the
infectious disease-causing scrapie isoform PrP
Sc. The prion protein (PrP C) is a copper-binding
cell surface glycoprotein. The role of copper in
the normal function of PrP, as well as in prion
diseases, has been the subject of a number of
excellent reviews. The mature cellular form of
PrP consists of residues 23 to 231 and is
tethered to the cell surface via a
glycosylphosphatidylinositol anchor at the C
terminus. There are now a number of NMR solution
structures of copper-free mammalian PrPs. A
crystal structure of PrP C has also been
published; this structure is dimeric, involving
domain swapping of the monomeric form.
33
Conclusions and Future Work
  • The two language modeling methods generally
    work well in the genomics domain
  • Regularized feedback estimation can effectively
    eliminate parameter α
  • HMM passages improve over paragraphs
  • User relevance feedback is effective
  • Limitations and future work
  • Regularized feedback estimation still has
    parameter ? to tune
  • How to eliminate ??
  • The inherent coherence property of HMM passages
    may not suit the task well
  • Different/better HMM architecture?

34
The End
  • Questions?