User Based Evaluation of Summary Quality Fennie Liang

Slides: 15


1
User Based Evaluation of Summary Quality
Fennie Liang
2
Introduction

Summary types
- Extraction vs abstraction
Approaches
- Cut-and-paste vs understanding, identifying and rewriting
Evaluation categories
- Intrinsic vs extrinsic
- There is no single best rule for evaluation (Jing, 1998)

3
Motivation

Problem
- Users' searching time.
- Search engines' techniques.
Solution
- Query Terms Collocation (QTC) algorithm
Hypothesis
- Implementing the QTC algorithm will produce
summaries that more accurately and concisely
represent the original pages, helping users to
make quicker, more accurate judgements without
accessing the actual pages.

4
The QTC algorithm

1. Accept query
2. Foreach stop_word in @stop_word_list
     mark stop_word
3. @query = split query by marked stop_word
4. Foreach phrase in the @query
     @total_term = push phrase into @total_term
5. Foreach phrase in the @query
     if (phrase contains space)
       @term = split phrase by space
       foreach single_term in @term
         @total_term = push single_term into @total_term
6. Foreach term in @total_term
     total_number_of_term = add 1 into total_number_of_term
7. Foreach score_of_term in @total_term
     assign total_number_of_term into score_of_term
     total_number_of_term = subtract 1 from total_number_of_term
5
Example of the QTC process

how is water supplied to mojave desert region
<how> <is> water supplied <to> mojave desert region
water supplied, mojave desert region
@total_term = (water supplied, mojave desert region)
@total_term = (water supplied, mojave desert region,
               water, supplied, mojave, desert, region)
Total terms: 7
water supplied => 7
mojave desert region => 6
water => 5
supplied => 4
mojave => 3
desert => 2
region => 1
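
The seven steps of the QTC term-scoring pseudocode can be sketched in Python as below. This is an illustrative reading of the slides, not the author's implementation: the function name `qtc_term_scores`, the three-word stop list (taken only from the words marked in the example query) and the lower-casing of the query are all assumptions.

```python
# Hypothetical stop-word list: only the three words marked in the slide's example.
STOP_WORDS = frozenset({"how", "is", "to"})


def qtc_term_scores(query, stop_words=STOP_WORDS):
    """Return (term, score) pairs following steps 1-7 of the QTC pseudocode."""
    # Steps 2-3: split the query into phrases at every stop word.
    # Lower-casing is an assumption; the slides do not mention case handling.
    phrases, current = [], []
    for word in query.lower().split():
        if word in stop_words:
            if current:
                phrases.append(" ".join(current))
                current = []
        else:
            current.append(word)
    if current:
        phrases.append(" ".join(current))

    # Step 4: every phrase is a term.
    total_terms = list(phrases)

    # Step 5: each word of a multi-word phrase is also a term.
    for phrase in phrases:
        if " " in phrase:
            total_terms.extend(phrase.split(" "))

    # Steps 6-7: count the terms, then hand out scores from the count down to 1.
    n = len(total_terms)
    return [(term, n - i) for i, term in enumerate(total_terms)]


if __name__ == "__main__":
    example = "how is water supplied to mojave desert region"
    for term, score in qtc_term_scores(example):
        print(f"{term} => {score}")
```

Run on the example query, the sketch yields the same seven terms and scores as the slide, from "water supplied" at 7 down to "region" at 1. Like the pseudocode, it does not remove duplicate terms.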

6
The QTC summarisation system
7
Experiment (1)

Design
1. Representativeness: How well do our
summaries represent their corresponding page
contents?
2. Judgeability: What percentage of the
system's summaries is meaningful to users?
Subjects
Mature, native English speakers and regular
search engine users, within a close range of
English language proficiency.
Data set
Selected 12 TREC-9 Web Track queries as
our test queries to retrieve English-language web
pages. For each selected query:
- the query contained no misspelled terms.
- the first 10 returned pages were in
English.
- the query contained two or more terms
after removing stop words.

8
Experiment (2)

Test sheets
1. Summary representativeness test sheets
(intrinsic, relative to the original).
2. Summary judgeability test sheets
(extrinsic, task-based).

Day             Group A                  Group B
The first day   Google: Query 1, 2, 3    QTC: Query 4, 5, 6
The second day  QTC: Query 7, 8, 9       Google: Query 10, 11, 12
9
Experiment (3)

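(F1) Summary representativeness [formula not preserved in the transcript]
(F2) Summary judgeability [formula not preserved in the transcript]
(F3) Formula in terms of Mscore and Sscore [not preserved in the transcript]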
10
Result
Summary representativeness (F1)

Subject  QTC   Google
A        0.57  0.61
B        0.61  0.64
C        0.53  0.36
D        0.60  0.69
E        0.69  0.75
F        0.69  0.67
G        0.51  0.51
H        0.71  0.53
I        0.47  0.49
J        0.53  0.53
Mean     0.59  0.58

Summary judgeability (F2)

         QTC                     Google
Subject  R     I     K     U     R     I     K     U
A        0.44  0.43  0.87  0.13  0.20  0.53  0.73  0.27
B        0.57  0.33  0.90  0.10  0.43  0.37  0.80  0.20
C        0.33  0.37  0.70  0.30  0.27  0.43  0.70  0.30
D        0.44  0.33  0.77  0.23  0.23  0.20  0.43  0.57
E        0.44  0.33  0.77  0.23  0.23  0.17  0.40  0.60
F        0.47  0.37  0.84  0.16  0.30  0.43  0.73  0.27
G        0.27  0.57  0.84  0.16  0.40  0.43  0.83  0.17
H        0.17  0.60  0.77  0.23  0.23  0.47  0.70  0.30
I        0.67  0.23  0.90  0.10  0.57  0.20  0.77  0.23
J        0.17  0.33  0.50  0.50  0.27  0.10  0.37  0.63
Mean     0.40  0.39  0.79  0.21  0.32  0.33  0.65  0.35
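
The transcript does not spell out what R, I, K and U stand for, but the per-subject rows show a regularity that helps when reading the table: in every row K equals R + I, and K and U sum to 1. The short check below is only a reading aid under that observation. The name `row_is_consistent` is made up for the sketch, and the values are entered in hundredths to avoid floating-point noise.

```python
# Per-subject rows from the R/I/K/U table, in hundredths:
# (QTC R, I, K, U, Google R, I, K, U)
ROWS = {
    "A": (44, 43, 87, 13, 20, 53, 73, 27),
    "B": (57, 33, 90, 10, 43, 37, 80, 20),
    "C": (33, 37, 70, 30, 27, 43, 70, 30),
    "D": (44, 33, 77, 23, 23, 20, 43, 57),
    "E": (44, 33, 77, 23, 23, 17, 40, 60),
    "F": (47, 37, 84, 16, 30, 43, 73, 27),
    "G": (27, 57, 84, 16, 40, 43, 83, 17),
    "H": (17, 60, 77, 23, 23, 47, 70, 30),
    "I": (67, 23, 90, 10, 57, 20, 77, 23),
    "J": (17, 33, 50, 50, 27, 10, 37, 63),
}


def row_is_consistent(r, i, k, u):
    """True when K = R + I and K + U = 100 (i.e. 1.00)."""
    return r + i == k and k + u == 100


if __name__ == "__main__":
    for subject, row in ROWS.items():
        qtc_ok = row_is_consistent(*row[:4])
        google_ok = row_is_consistent(*row[4:])
        print(subject, "QTC ok" if qtc_ok else "QTC mismatch",
              "Google ok" if google_ok else "Google mismatch")
```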
11
Evaluation(1)
r = 0.623, df = 9, α = 0.05, t = 0.452.
Subject No 1 2 3 4 5 6 7 8 9 10
Subject I G C J A D B E F H
12
Evaluation(2)
r = 0.711, df = 9, α = 0.05, t = 3.588.
Subject No 1 2 3 4 5 6 7 8 9 10
Subject J C D E H F G A B I
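
The slides do not say how r and t were computed. One reading that reproduces the figures on the Evaluation(1) and Evaluation(2) slides is a Pearson correlation and a paired t-test (df = n - 1 = 9) between each subject's QTC and Google scores: the representativeness scores for Evaluation(1), and the K column of the judgeability table for Evaluation(2). The sketch below works under that assumption; the function names are illustrative only.

```python
import math
from statistics import mean, stdev

# Per-subject scores A..J copied from the Result slide.
QTC_F1 = [0.57, 0.61, 0.53, 0.60, 0.69, 0.69, 0.51, 0.71, 0.47, 0.53]
GOOGLE_F1 = [0.61, 0.64, 0.36, 0.69, 0.75, 0.67, 0.51, 0.53, 0.49, 0.53]
QTC_K = [0.87, 0.90, 0.70, 0.77, 0.77, 0.84, 0.84, 0.77, 0.90, 0.50]
GOOGLE_K = [0.73, 0.80, 0.70, 0.43, 0.40, 0.73, 0.83, 0.70, 0.77, 0.37]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def paired_t(xs, ys):
    """Paired t statistic: mean difference over its standard error."""
    diffs = [x - y for x, y in zip(xs, ys)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


if __name__ == "__main__":
    print(f"F1: r = {pearson_r(QTC_F1, GOOGLE_F1):.3f}, "
          f"t = {paired_t(QTC_F1, GOOGLE_F1):.3f}")
    print(f"K:  r = {pearson_r(QTC_K, GOOGLE_K):.3f}, "
          f"t = {paired_t(QTC_K, GOOGLE_K):.3f}")
```

If the test was two-tailed, which the slides do not state, the critical value at α = 0.05 with df = 9 is about 2.262. On that reading the representativeness difference (t = 0.452) would not be significant, while the K difference (t = 3.588) would be.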
13
Evaluation(3)
r = 0.745, df = 9, α = 0.05, t = 4.735.
Subject No 1 2 3 4 5 6 7 8 9 10
Subject J C I G D A E H B F
14
Conclusion

- A user-based evaluation method.
- The QTC algorithm.
- Two aspects of a summary's quality: representativeness
and judgeability.
- Three formulas for summary quality measurement.
- A t-test shows a significant result for QTC's
summary quality.
- Current automatic summarisation techniques used
in search engines are not yet performing well
enough to satisfy users' requirements.
- Human judgements need to consider various
aspects, which depend on the purposes of the
evaluation.