User Based Evaluation of Summary Quality Fennie Liang - PowerPoint PPT Presentation

About This Presentation
Slides: 15
Provided by: fen9

Transcript and Presenter's Notes


1
User Based Evaluation of Summary Quality
Fennie Liang
2
Introduction
  • Summary types: extraction vs abstraction
  • Approaches: cut-and-paste vs understanding,
    identifying, rewriting
  • Evaluation categories: intrinsic vs extrinsic
  • There is no single best rule for evaluation
    (Jing, 1998)

3
Motivation
  • Problem
  • - Users' searching time.
  • - Search engines' techniques.
  • Solution
  • - Query Terms Collocation (QTC) algorithm
  • Hypothesis
  • - Implementing the QTC algorithm produces
    summaries that more accurately and concisely
    represent the original pages and help users
    make quicker, more accurate judgements without
    accessing the actual pages.

4
The QTC algorithm
1. Accept query
2. Foreach stop_word in @stop_word_list
       mark stop_word
3. @query = split query by marked stop_word
4. Foreach phrase in the @query
       @total_term = push phrase into @total_term
5. Foreach phrase in the @query
       if (phrase contains space)
           @term = split phrase by space
           foreach single_term in @term
               @total_term = push single_term into @total_term
6. Foreach term in @total_term
       total_number_of_term = add 1 into total_number_of_term
7. Foreach score_of_term in @total_term
       assign total_number_of_term into score_of_term
       total_number_of_term = subtract 1 from total_number_of_term
5
Example of the QTC process
  • how is water supplied to mojave desert region
  • <how> <is> water supplied <to> mojave desert
    region
  • water supplied, mojave desert region
  • @total_term = water supplied, mojave desert
    region
  • @total_term = water supplied, mojave desert
    region, water, supplied, mojave, desert, region
  • Total number of terms: 7
  • water supplied => 7
  • mojave desert region => 6
  • water => 5
  • supplied => 4
  • mojave => 3
  • desert => 2
  • region => 1
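
The seven steps of the QTC algorithm and the worked example can be traced with a short Python sketch. It is a sketch only, not the presenter's implementation. The function name qtc_term_scores and the three-word stop list are assumptions made so the example query can be followed; the slides do not give the actual stop word list.

```python
# Assumed stop list: just the words marked as stop words in the example query.
EXAMPLE_STOP_WORDS = {"how", "is", "to"}


def qtc_term_scores(query, stop_words):
    """Return (term, score) pairs in the order described on the slides."""
    # Steps 2-3: split the query into phrases at every stop word.
    phrases, current = [], []
    for word in query.lower().split():
        if word in stop_words:
            if current:
                phrases.append(" ".join(current))
                current = []
        else:
            current.append(word)
    if current:
        phrases.append(" ".join(current))

    # Step 4: the phrases come first in the term list.
    total_terms = list(phrases)

    # Step 5: multi-word phrases also contribute their single terms.
    for phrase in phrases:
        if " " in phrase:
            total_terms.extend(phrase.split())

    # Steps 6-7: the first entry gets the total count, then count down by 1.
    total = len(total_terms)
    return [(term, total - i) for i, term in enumerate(total_terms)]


scores = qtc_term_scores(
    "how is water supplied to mojave desert region", EXAMPLE_STOP_WORDS
)
for term, score in scores:
    print(f"{term} => {score}")
```

The result is a list of pairs rather than a dictionary so that a term occurring twice in a query keeps both of its positions.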

6
The QTC summarisation system
7
Experiment (1)
  • Design
  • 1. Representativeness: How well do our
    summaries represent their corresponding page
    contents?
  • 2. Judgeability: What percentage of the
    system's summaries is meaningful to users?
  • Subjects
  • Mature, native English speakers who are regular
    search engine users and fall within a close
    range of English language proficiency.
  • Data set: Selected 12 TREC-9 Web Track queries as
    our test queries to retrieve English language web
    pages. For each selected query:
  • - the query contained no misspelled terms.
  • - the first 10 returned pages were in
    English.
  • - the query contained two or more terms
    after removing stop words.

8
Experiment (2)
  • Test sheets
  • 1. Summary representativeness test sheets
    (intrinsic, relative to the original).
  • 2. Summary judgeability test sheets
    (extrinsic, task-based).

                  Group A with Google    Group B with QTC
The first day     Query 1, 2, 3          Query 4, 5, 6

                  Group A with QTC       Group B with Google
The second day    Query 7, 8, 9          Query 10, 11, 12
9
Experiment (3)






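[Formulas (F1), (F2) and (F3) were shown as images on this slide. Only the labels and the fragments "Mscore", "Sscore" and "x Mscore" survive in the transcript.]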
10
Result
Summary representativeness (F1)
Subject  QTC   Google
A        0.57  0.61
B        0.61  0.64
C        0.53  0.36
D        0.60  0.69
E        0.69  0.75
F        0.69  0.67
G        0.51  0.51
H        0.71  0.53
I        0.47  0.49
J        0.53  0.53
Mean     0.59  0.58

Summary judgeability (F2)
         QTC                     Google
Subject  R     I     K     U     R     I     K     U
A        0.44  0.43  0.87  0.13  0.20  0.53  0.73  0.27
B        0.57  0.33  0.90  0.10  0.43  0.37  0.80  0.20
C        0.33  0.37  0.70  0.30  0.27  0.43  0.70  0.30
D        0.44  0.33  0.77  0.23  0.23  0.20  0.43  0.57
E        0.44  0.33  0.77  0.23  0.23  0.17  0.40  0.60
F        0.47  0.37  0.84  0.16  0.30  0.43  0.73  0.27
G        0.27  0.57  0.84  0.16  0.40  0.43  0.83  0.17
H        0.17  0.60  0.77  0.23  0.23  0.47  0.70  0.30
I        0.67  0.23  0.90  0.10  0.57  0.20  0.77  0.23
J        0.17  0.33  0.50  0.50  0.27  0.10  0.37  0.63
Mean     0.40  0.39  0.79  0.21  0.32  0.33  0.65  0.35
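
One relationship between the R, I, K and U columns can be read directly off the judgeability figures: in every subject row, K equals R + I and U equals 1 - K. The slides do not define the four letters, so the Python sketch below only checks that arithmetic on the ten subject rows as transcribed. The names QTC_ROWS, GOOGLE_ROWS and inconsistent_rows are illustrative, not from the presentation.

```python
# Judgeability rows as transcribed from the slide: subject -> (R, I, K, U).
QTC_ROWS = {
    "A": (0.44, 0.43, 0.87, 0.13), "B": (0.57, 0.33, 0.90, 0.10),
    "C": (0.33, 0.37, 0.70, 0.30), "D": (0.44, 0.33, 0.77, 0.23),
    "E": (0.44, 0.33, 0.77, 0.23), "F": (0.47, 0.37, 0.84, 0.16),
    "G": (0.27, 0.57, 0.84, 0.16), "H": (0.17, 0.60, 0.77, 0.23),
    "I": (0.67, 0.23, 0.90, 0.10), "J": (0.17, 0.33, 0.50, 0.50),
}
GOOGLE_ROWS = {
    "A": (0.20, 0.53, 0.73, 0.27), "B": (0.43, 0.37, 0.80, 0.20),
    "C": (0.27, 0.43, 0.70, 0.30), "D": (0.23, 0.20, 0.43, 0.57),
    "E": (0.23, 0.17, 0.40, 0.60), "F": (0.30, 0.43, 0.73, 0.27),
    "G": (0.40, 0.43, 0.83, 0.17), "H": (0.23, 0.47, 0.70, 0.30),
    "I": (0.57, 0.20, 0.77, 0.23), "J": (0.27, 0.10, 0.37, 0.63),
}


def inconsistent_rows(rows, tol=0.005):
    """List subjects whose row breaks K = R + I or U = 1 - K."""
    bad = []
    for subject, (r, i, k, u) in rows.items():
        if abs((r + i) - k) > tol or abs((1.0 - k) - u) > tol:
            bad.append(subject)
    return bad


print(inconsistent_rows(QTC_ROWS))     # subjects failing the check for QTC
print(inconsistent_rows(GOOGLE_ROWS))  # subjects failing the check for Google
```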
11
Evaluation (1)
r = 0.623, df = 9, α = 0.05, t = 0.452
Subject No  1  2  3  4  5  6  7  8  9  10
Subject     I  G  C  J  A  D  B  E  F  H
12
Evaluation (2)
r = 0.711, df = 9, α = 0.05, t = 3.588
Subject No  1  2  3  4  5  6  7  8  9  10
Subject     J  C  D  E  H  F  G  A  B  I
13
Evaluation (3)
r = 0.745, df = 9, α = 0.05, t = 4.735
Subject No  1  2  3  4  5  6  7  8  9  10
Subject     J  C  I  G  D  A  E  H  B  F
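
The slides do not name the statistical test behind these figures. The values for Evaluation (1) and Evaluation (2) are, however, consistent with a Pearson correlation and a paired-samples t-test (df = 9) over the ten subjects' QTC and Google scores. For Evaluation (1) the scores are the representativeness values from the Result slide; for Evaluation (2) they are the K column of the judgeability values. This is an inference from the numbers, not something the presentation states. The Python sketch below reproduces those values under that assumption; the function and variable names are illustrative. Evaluation (3) depends on formula F3, which did not survive in the transcript, so it is not reproduced.

```python
import math
import statistics

# Per-subject scores (A to J) as transcribed from the Result slide.
REP_QTC = [0.57, 0.61, 0.53, 0.60, 0.69, 0.69, 0.51, 0.71, 0.47, 0.53]
REP_GOOGLE = [0.61, 0.64, 0.36, 0.69, 0.75, 0.67, 0.51, 0.53, 0.49, 0.53]
K_QTC = [0.87, 0.90, 0.70, 0.77, 0.77, 0.84, 0.84, 0.77, 0.90, 0.50]
K_GOOGLE = [0.73, 0.80, 0.70, 0.43, 0.40, 0.73, 0.83, 0.70, 0.77, 0.37]


def paired_t(x, y):
    """Paired-samples t statistic for equal-length score lists."""
    diffs = [a - b for a, b in zip(x, y)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error


def pearson_r(x, y):
    """Pearson correlation coefficient between two score lists."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


for label, qtc, google in [
    ("representativeness", REP_QTC, REP_GOOGLE),
    ("judgeability (K)", K_QTC, K_GOOGLE),
]:
    r = pearson_r(qtc, google)
    t = paired_t(qtc, google)
    print(f"{label}: r = {r:.3f}, t = {t:.3f}, df = {len(qtc) - 1}")
```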
14
Conclusion
  • A user based evaluation method.
  • QTC algorithm.
  • A summary's quality: representativeness,
    judgeability
  • Three formulas for summary quality measurement.
  • A t-test shows a significant result for QTC's
    summary quality.
  • Current automatic summarisation techniques used
    in search engines are not yet performing well
    enough to satisfy users' requirements.
  • Human judgements need to consider various
    aspects, which depend on the purposes of the
    evaluation.