Title: CMU Y2 Rosetta GnG Distillation
- Jonathan Elsas
- Jaime Carbonell
Slide 2: Rosetta GnG System Evolution
(Figure label: Y1 Eval)
Slide 3: Distillation Challenges
- Multiple aspects to the information need
- Query arguments, Locations, Related Words
- Static expansion terms/phrases
- Bigrams, trigrams, term windows
- Named-entity wildcards & constraints
- Occurrence of each of these in a document (or sentence, paragraph, nugget, etc.) is a feature indicating the document's relevance to the information need.
- Question: How best to choose the weight for each feature?
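Assuming the features are combined linearly (a common choice in IR, not stated on the slide), the weighting question can be written as:

```latex
\mathrm{score}(q, d) = \sum_{k=1}^{K} w_k \, f_k(q, d)
```

Here f_k(q, d) is the value of feature k for query q and document d (for instance, the number of times a query bigram occurs in d), and the question above asks how to set the weights w_1, ..., w_K.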
Slide 4: Query Feature Construction
- DESCRIBE THE ACTIONS OF Mahmoud Abbas DURING
- Location: Middle East
- Equivalent terms:
  - Mahmoud Abbas
  - Abu Mazen
  - President of the Palestinian National Authority
(Figure labels: Query Features, Unigram Features)
Potentially many more structural features, e.g., pseudo-relevance feedback (PRF) and semantic role labeling (SRL) annotations
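A minimal sketch of how some of these query features might be computed for the Mahmoud Abbas example. Everything here is an assumption for illustration: the tokenizer, the feature names (`uni:`, `phrase:`, `win8:`), the default 8-token window, and all helper functions are hypothetical, and PRF, SRL and named-entity features are left out.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters/digits as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def count_phrase(tokens: list[str], phrase: list[str]) -> int:
    # Exact n-gram matches (bigram, trigram, ...) of the phrase.
    n = len(phrase)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase)


def in_window(tokens: list[str], terms: list[str], width: int) -> float:
    # 1.0 if some span of `width` tokens contains every term, in any order.
    needed = set(terms)
    starts = range(max(len(tokens) - width + 1, 1))
    return 1.0 if any(needed <= set(tokens[i:i + width]) for i in starts) else 0.0


def query_features(doc: str, equivalents: list[str], location: str,
                   width: int = 8) -> dict[str, float]:
    # Hypothetical feature names: uni:<term>, phrase:<words>, win<width>:<words>.
    tokens = tokenize(doc)
    counts = Counter(tokens)
    feats: dict[str, float] = {}
    for text in equivalents + [location]:
        words = tokenize(text)
        for w in words:
            feats[f"uni:{w}"] = counts[w]  # unigram occurrence count
        if len(words) > 1:
            feats["phrase:" + " ".join(words)] = count_phrase(tokens, words)
            feats[f"win{width}:" + " ".join(words)] = in_window(tokens, words, width)
    return feats
```

For a document that mentions "Abu Mazen" and "Middle East", the corresponding phrase features are 1, while the full "President of the Palestinian National Authority" phrase scores 0 even though its unigram "the" still matches; each such value is one f_k to be weighted.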
Slide 10: Learning Approach to Setting Feature Weights
- Goal: Use existing relevance judgments to learn optimal feature weights
- "Learning to Rank" has recently become a hot research area in IR
Slide 11: Pairwise Preference Learning
- Learning a document scoring function
- Treated as a classification problem on pairs of documents
- The resulting scoring function is used as the learned document ranker
(Figure labels: Correct, Incorrect)
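The pairwise reduction can be sketched with a plain perceptron. This is an illustrative sketch, not the system's code: it assumes a linear scoring function score(d) = w · x(d) over feature vectors, and the names `train_pairwise_perceptron` and `rank` are hypothetical. A pair counts as correctly classified when the more relevant document gets the higher score; otherwise the weights move toward the difference vector.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise_perceptron(pairs, dim, epochs=10, lr=1.0):
    """pairs: list of (x_better, x_worse) feature vectors, where x_better
    was judged more relevant than x_worse for the same query."""
    w = [0.0] * dim
    for _ in range(epochs):
        mistakes = 0
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]
            # Pair is "correct" only if score(better) > score(worse).
            if dot(w, diff) <= 0:
                w = [wi + lr * di for wi, di in zip(w, diff)]
                mistakes += 1
        if mistakes == 0:  # every training pair is ordered correctly
            break
    return w


def rank(w, docs):
    """docs: {doc_id: feature_vector}; returns ids sorted best first."""
    return sorted(docs, key=lambda d: dot(w, docs[d]), reverse=True)
```

Because only the difference vector is classified, the learned w can score any single document, which is what lets the classifier double as a ranker.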
Slide 12: Committee Perceptron Algorithm
- Online algorithm (instance-at-a-time)
- Fast training, low memory requirements
- Ensemble method
- Selectively chooses the N best hypotheses encountered during training
- "N heads are better than 1" approach
- Significant advantages over previous perceptron variants
- Many ways to combine the hypotheses' outputs
  - Voting, score averaging, hybrid approaches
  - This is the focus of current research
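Two of the combination schemes listed above, voting and score averaging, can be sketched as follows. Assumptions: each committee member is a linear weight vector paired with a nonnegative reliability weight (for example, how long it survived during training); the function names are hypothetical, and this is not the published combination rule.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def averaged_score(committee, x):
    """Score averaging: mean of the members' scores for document x,
    weighted by each member's reliability weight."""
    total = sum(c for _, c in committee)
    return sum(c * dot(w, x) for w, c in committee) / total


def vote_prefers_first(committee, x_a, x_b):
    """Weighted voting: each member votes for the document it scores
    higher (ties count toward x_b); True if x_a wins the majority."""
    votes = sum(c if dot(w, x_a) > dot(w, x_b) else -c for w, c in committee)
    return votes > 0
```

With committee `[([1, 0], 3), ([0, 1], 1)]`, the rules disagree on `[2, 0]` versus `[0, 10]`: averaging prefers `[0, 10]` (2.5 vs. 1.5) because one member's large score dominates, while voting prefers `[2, 0]` because the heavier member votes for it.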
Slide 13: Committee Perceptron Training
(Diagram labels: Training Data, Committee, Current Hypothesis)
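A minimal sketch of the training loop as we read these slides. It assumes the voted-perceptron convention that a hypothesis's quality is its success count (the number of consecutive training pairs it ranked correctly before its next mistake) and that the committee keeps the N hypotheses with the highest counts. `train_committee`, `n_members`, and the retirement rule are illustrative assumptions, not the published algorithm's exact definition.

```python
import heapq
import itertools


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_committee(pairs, dim, n_members=3, epochs=5):
    """pairs: (x_better, x_worse) feature-vector tuples.
    Returns up to n_members (weights, success_count) tuples, best first."""
    committee = []  # min-heap of (count, tiebreak, weights); weakest on top
    tiebreak = itertools.count()  # avoids ever comparing weight lists
    w = [0.0] * dim
    successes = 0  # consecutive pairs the current hypothesis got right

    def retire(weights, count):
        # Assumption: a hypothesis that never ranked a pair correctly is dropped.
        if count == 0:
            return
        entry = (count, next(tiebreak), weights)
        if len(committee) < n_members:
            heapq.heappush(committee, entry)
        elif count > committee[0][0]:  # beats the weakest current member
            heapq.heapreplace(committee, entry)

    for _ in range(epochs):
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]
            if dot(w, diff) > 0:
                successes += 1
            else:
                # Mistake: retire the current hypothesis, then perceptron update.
                retire(w, successes)
                w = [wi + di for wi, di in zip(w, diff)]
                successes = 0
    retire(w, successes)  # the final hypothesis is also a candidate
    return [(weights, count) for count, _, weights in sorted(committee, reverse=True)]
```

Only one weight vector is updated at a time and the committee is bounded at N members, which is consistent with the online, low-memory properties listed for the algorithm. The returned (weights, count) pairs can be fed to any of the combination schemes.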
Slide 17: Committee Perceptron Performance
Slide 18: Committee Perceptron Learning Curves
Slide 19: Next Steps
- (In progress) Integrate current work with the GALE GnG system
  - Document ranking is the obvious first step
  - Passage ranking poses additional challenges
  - Both will be addressed this year
- Implement a feature-based query generation framework for the Rosetta GnG system
- Extend our rank-learning algorithm and improve its performance
Slide 20: Future Work
- Investigate applying preference learning in the Utility system, adapting to real-time user preference feedback