Title: User Based Evaluation of Summary Quality
Author: Fennie Liang
Introduction
- Summary types
- - Extraction vs. abstraction
- Approaches
- - Cut-and-paste vs. understanding, identifying and rewriting
- Evaluation categories
- - Intrinsic vs. extrinsic
- - There is no single best rule for evaluation (Jing, 1998).
Motivation
- Problem
- - The time users spend searching.
- - The summarisation techniques used by search engines.
- Solution
- - The Query Terms Collocation (QTC) algorithm.
- Hypothesis
- - Implementing the QTC algorithm will produce summaries that represent the original pages more accurately and concisely, and that help users make quicker, more accurate judgements without accessing the actual pages.
The QTC algorithm
1. Accept query
2. Foreach stop_word in @stop_word_list
       mark stop_word
3. @query = split query by marked stop_word
4. Foreach phrase in @query
       push phrase into @total_term
5. Foreach phrase in @query
       if (phrase contains space)
           @term = split phrase by space
           foreach single_term in @term
               push single_term into @total_term
6. Foreach term in @total_term
       add 1 to total_number_of_term
7. Foreach term in @total_term
       assign total_number_of_term to score_of_term
       subtract 1 from total_number_of_term
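The seven steps above can be sketched in Python as follows. This is a minimal sketch, not the author's implementation: the slides do not give the stop-word list, so `STOP_WORDS` is a hypothetical list that only needs to contain "how", "is" and "to" for the worked example, and the function name `qtc_scores` is likewise illustrative. Treating stop words as separators while scanning the query stands in for steps 2 and 3 (mark, then split).

```python
# Hypothetical stop-word list (assumption): the slides do not publish the real
# one. It must contain "how", "is" and "to" to reproduce the worked example.
STOP_WORDS = frozenset(
    {"a", "an", "the", "how", "is", "are", "to", "of", "in", "on", "for", "what"}
)


def qtc_scores(query, stop_words=STOP_WORDS):
    """Return (term, score) pairs following steps 1-7 of the QTC pseudocode."""
    # Steps 2-3: stop words act as separators, splitting the query into phrases.
    phrases, current = [], []
    for word in query.split():
        if word.lower() in stop_words:
            if current:
                phrases.append(" ".join(current))
                current = []
        else:
            current.append(word)
    if current:
        phrases.append(" ".join(current))

    # Step 4: every phrase is a term in its own right.
    total_terms = list(phrases)
    # Step 5: phrases containing a space also contribute their single words.
    for phrase in phrases:
        if " " in phrase:
            total_terms.extend(phrase.split(" "))

    # Step 6: count the terms.
    total_number_of_terms = len(total_terms)
    # Step 7: scores count down from the total, so earlier terms score higher.
    return [(term, total_number_of_terms - i) for i, term in enumerate(total_terms)]


print(qtc_scores("how is water supplied to mojave desert region"))
```

On the example query "how is water supplied to mojave desert region", the sketch yields the two phrases first and then their single words, scored from 7 down to 1. Because phrases are pushed before single words, a multi-word collocation always outranks its component terms.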
Example of the QTC process
- Query: how is water supplied to mojave desert region
- Stop words marked: <how> <is> water supplied <to> mojave desert region
- Split at stop words: water supplied, mojave desert region
- After step 4: @total_term = water supplied, mojave desert region
- After step 5: @total_term = water supplied, mojave desert region, water, supplied, mojave, desert, region
- Total number of terms: 7
- Scores:
- - water supplied: 7
- - mojave desert region: 6
- - water: 5
- - supplied: 4
- - mojave: 3
- - desert: 2
- - region: 1
The QTC summarisation system
Experiment (1)
- Design
- 1. Representativeness: How well do our summaries represent the contents of their corresponding pages?
- 2. Judgeability: What percentage of the system's summaries is meaningful to users?
- Subjects: mature native English speakers who are regular search engine users and within a close range of English language proficiency.
- Data set: 12 TREC-9 Web Track queries were selected as test queries for retrieving English-language web pages. Each selected query:
- - contained no misspelled terms;
- - had its first 10 returned pages in English;
- - contained two or more terms after stop-word removal.
Experiment (2)
- Test sheets
- 1. Summary representativeness test sheets (intrinsic, relative to the original).
- 2. Summary judgeability test sheets (extrinsic, task-based).

Day             Group A                    Group B
The first day   Google: queries 1, 2, 3    QTC: queries 4, 5, 6
The second day  QTC: queries 7, 8, 9       Google: queries 10, 11, 12
Experiment (3)
- (F1) Summary representativeness.
- (F2) Summary judgeability.
- (F3) A score computed from Mscore and Sscore.
Result

Summary representativeness (F1)

Subject  QTC   Google
A        0.57  0.61
B        0.61  0.64
C        0.53  0.36
D        0.60  0.69
E        0.69  0.75
F        0.69  0.67
G        0.51  0.51
H        0.71  0.53
I        0.47  0.49
J        0.53  0.53
Mean     0.59  0.58

Summary judgeability (F2)

         QTC                      Google
Subject  R     I     K     U      R     I     K     U
A        0.44  0.43  0.87  0.13   0.20  0.53  0.73  0.27
B        0.57  0.33  0.90  0.10   0.43  0.37  0.80  0.20
C        0.33  0.37  0.70  0.30   0.27  0.43  0.70  0.30
D        0.44  0.33  0.77  0.23   0.23  0.20  0.43  0.57
E        0.44  0.33  0.77  0.23   0.23  0.17  0.40  0.60
F        0.47  0.37  0.84  0.16   0.30  0.43  0.73  0.27
G        0.27  0.57  0.84  0.16   0.40  0.43  0.83  0.17
H        0.17  0.60  0.77  0.23   0.23  0.47  0.70  0.30
I        0.67  0.23  0.90  0.10   0.57  0.20  0.77  0.23
J        0.17  0.33  0.50  0.50   0.27  0.10  0.37  0.63
Mean     0.40  0.39  0.79  0.21   0.32  0.33  0.65  0.35
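Every row of the judgeability table satisfies K = R + I and U = 1 - K. The sketch below copies the table values as given and checks that relation, then recomputes the column means; the dictionary names `QTC` and `GOOGLE` and both helper functions are illustrative, not from the slides. Recomputed from the two-decimal entries, the means match the Mean row to two decimals for every column except Google R, which comes out at 0.313 rather than 0.32.

```python
# Summary judgeability (F2) table: (R, I, K, U) per subject, copied from the slide.
QTC = {
    "A": (0.44, 0.43, 0.87, 0.13), "B": (0.57, 0.33, 0.90, 0.10),
    "C": (0.33, 0.37, 0.70, 0.30), "D": (0.44, 0.33, 0.77, 0.23),
    "E": (0.44, 0.33, 0.77, 0.23), "F": (0.47, 0.37, 0.84, 0.16),
    "G": (0.27, 0.57, 0.84, 0.16), "H": (0.17, 0.60, 0.77, 0.23),
    "I": (0.67, 0.23, 0.90, 0.10), "J": (0.17, 0.33, 0.50, 0.50),
}
GOOGLE = {
    "A": (0.20, 0.53, 0.73, 0.27), "B": (0.43, 0.37, 0.80, 0.20),
    "C": (0.27, 0.43, 0.70, 0.30), "D": (0.23, 0.20, 0.43, 0.57),
    "E": (0.23, 0.17, 0.40, 0.60), "F": (0.30, 0.43, 0.73, 0.27),
    "G": (0.40, 0.43, 0.83, 0.17), "H": (0.23, 0.47, 0.70, 0.30),
    "I": (0.57, 0.20, 0.77, 0.23), "J": (0.27, 0.10, 0.37, 0.63),
}

TOL = 1e-9  # absorbs floating-point error only, not rounding in the slide


def rows_consistent(table):
    """True if K == R + I and U == 1 - K for every subject."""
    return all(
        abs(r + i - k) < TOL and abs(1 - k - u) < TOL
        for r, i, k, u in table.values()
    )


def column_means(table):
    """Mean of each column (R, I, K, U) over all subjects."""
    n = len(table)
    return tuple(sum(row[c] for row in table.values()) / n for c in range(4))


for name, table in (("QTC", QTC), ("Google", GOOGLE)):
    r, i, k, u = column_means(table)
    print(f"{name}: consistent={rows_consistent(table)}, "
          f"means R={r:.3f} I={i:.3f} K={k:.3f} U={u:.3f}")
```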
Evaluation (1)
r = 0.623, df = 9, α = 0.05, t = 0.452

Subject No   1   2   3   4   5   6   7   8   9  10
Subject      I   G   C   J   A   D   B   E   F   H

Evaluation (2)
r = 0.711, df = 9, α = 0.05, t = 3.588

Subject No   1   2   3   4   5   6   7   8   9  10
Subject      J   C   D   E   H   F   G   A   B   I

Evaluation (3)
r = 0.745, df = 9, α = 0.05, t = 4.735

Subject No   1   2   3   4   5   6   7   8   9  10
Subject      J   C   I   G   D   A   E   H   B   F
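The slides report r and t without naming the tests. Assuming r is the Pearson correlation between each subject's QTC and Google scores and t is a paired t-test on their differences (df = 9 for ten subjects), the sketch below reproduces, to within rounding, the Evaluation (1) figures from the representativeness table and the Evaluation (2) figures from the K column of the judgeability table. It also shows that the subject rows of Evaluations (1) and (2) list subjects in ascending order of their QTC score. Evaluation (3) cannot be checked this way because the (F3) scores are not given. The critical value 2.262 is the two-tailed Student's t value for df = 9 at α = 0.05; whether the original test was one- or two-tailed is an assumption.

```python
import math

SUBJECTS = "ABCDEFGHIJ"
# Summary representativeness (F1) per subject A..J, copied from the Result slide.
REP_QTC = [0.57, 0.61, 0.53, 0.60, 0.69, 0.69, 0.51, 0.71, 0.47, 0.53]
REP_GOOGLE = [0.61, 0.64, 0.36, 0.69, 0.75, 0.67, 0.51, 0.53, 0.49, 0.53]
# K column of the summary judgeability (F2) table, subjects A..J.
K_QTC = [0.87, 0.90, 0.70, 0.77, 0.77, 0.84, 0.84, 0.77, 0.90, 0.50]
K_GOOGLE = [0.73, 0.80, 0.70, 0.43, 0.40, 0.73, 0.83, 0.70, 0.77, 0.37]

# Two-tailed critical t for df = 9, alpha = 0.05 (tail choice is an assumption).
T_CRIT_DF9 = 2.262


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def paired_t(x, y):
    """Paired t statistic on the differences x - y; returns (t, df)."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n), n - 1


def order_by_score(scores):
    """Subjects sorted by ascending score; ties fall back to alphabetical order."""
    return "".join(s for _, s in sorted(zip(scores, SUBJECTS)))


for label, qtc, google in [("Evaluation (1)", REP_QTC, REP_GOOGLE),
                           ("Evaluation (2)", K_QTC, K_GOOGLE)]:
    t, df = paired_t(qtc, google)
    print(f"{label}: r = {pearson_r(qtc, google):.3f}, df = {df}, "
          f"t = {t:.3f}, |t| > {T_CRIT_DF9}: {abs(t) > T_CRIT_DF9}, "
          f"order = {order_by_score(qtc)}")
```

Under these assumptions, only the judgeability difference (Evaluation (2)) clears the critical value; the representativeness difference (Evaluation (1)) does not.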
Conclusion
- A user-based evaluation method.
- The QTC algorithm.
- Two aspects of a summary's quality: representativeness and judgeability.
- Three formulas for measuring summary quality.
- A t-test shows a significant result for QTC's summary quality.
- Current automatic summarisation techniques used in search engines do not yet perform well enough to satisfy users' requirements.
- Human judgements need to consider various aspects, which depend on the purpose of the evaluation.