Part 3 Real World Applications: SumTimeMousam - PowerPoint PPT Presentation

Provided by: Somay


1
Part 3: Real World Applications: SumTime-Mousam
2
In this lecture you will learn

SumTime-Mousam
Knowledge acquisition
Design
Document planning
Microplanning
Realization
Evaluation
Post-edit
End-user

3
Introduction

So far we studied
Data analysis techniques
Time series data
Spatial data
Visualization techniques
NLG techniques
Now we will study
SumTime-Mousam
a weather forecast text generation system
HCE 3.0
a visual knowledge discovery tool

4
SumTime-Mousam

NLG system that automates the task of writing
weather forecasts
Developed in our department
Input: Numerical Weather Prediction (NWP) data
Data samples for a few dozen parameters, every
hour or every 3 hours, from two NWP models
Output: marine forecasts - forecasts for offshore
oilrig applications
Has been used by our industrial collaborator
since June 2002.
Forecasts for 150 locations per day

5
Example
6
Example
7
Knowledge Acquisition (KA)

KA Tasks
Think aloud sessions
Direct Acquisition of knowledge
Onsite Observations
Corpus analysis
Collaborative prototype development

8
Corpus Description

SumTime-Meteo - parallel Text-Data Corpus
Size - 1045 parallel Text-Data units
Unit
NWP Model Data
Human Written Forecast Text
Similar in concept to statistical MT (Machine
Translation)
Naturally Occurring
written for oilrig staff in the North Sea
Distribution of the Corpus
Available in the public domain

9
Parallel Text - Data
WSW 10-15 increasing 17-22 by early morning,
then gradually easing 9-14 by midnight.
10
Corpus Analyses

Meanings of Time phrases
Meanings of time phrases in terms of numerical
data
required for lexical choice in summarization
No standard time phrase mappings exist
Numerical time values not mentioned in forecasts

11
Alignment

Step 1
Parsing the forecast texts
parser tuned for forecast text syntax
break the text into phrases
extract information such as wind speed and wind
direction
parser carried forward values for the missing
fields (shown later in the example)

12
Example
SSW 12-16 BACKING ESE 16-20 IN THE MORNING,
BACKING NE EARLY AFTERNOON THEN NNW 24-28 LATE
EVENING
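The parsing step can be illustrated with a minimal sketch. It is not the project's parser: the splitting rule, the verb list, and the names `parse_forecast`, `SPLIT` and `SPEED` are assumptions made for illustration. It breaks the example text into phrases, extracts wind direction and speed, and carries forward the last known value when a field is missing.

```python
import re

# Assumed vocabulary; the real parser was tuned on the forecast corpus.
DIRECTIONS = {"N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
              "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"}
VERBS = {"BACKING", "VEERING", "INCREASING", "EASING"}
# Split on commas, on THEN, and before any change verb (hypothetical rule).
SPLIT = re.compile(r",\s*|\s+THEN\s+|\s+(?=(?:BACKING|VEERING|INCREASING|EASING)\b)")
SPEED = re.compile(r"^(\d+)-(\d+)$")


def parse_forecast(text):
    """Return one dict per phrase, carrying forward missing fields."""
    phrases = []
    last_direction, last_speed = None, None
    for chunk in SPLIT.split(text.strip().rstrip(".").upper()):
        tokens = chunk.split()
        if not tokens:
            continue
        verb = tokens.pop(0) if tokens[0] in VERBS else None
        direction = tokens.pop(0) if tokens and tokens[0] in DIRECTIONS else None
        speed = None
        if tokens:
            match = SPEED.match(tokens[0])
            if match:
                speed = (int(match.group(1)), int(match.group(2)))
                tokens.pop(0)
        # Carry forward the previous value for any missing field.
        direction = direction or last_direction
        speed = speed or last_speed
        last_direction, last_speed = direction, speed
        phrases.append({"verb": verb, "direction": direction,
                        "speed": speed, "time": " ".join(tokens) or None})
    return phrases


example = ("SSW 12-16 BACKING ESE 16-20 IN THE MORNING, "
           "BACKING NE EARLY AFTERNOON THEN NNW 24-28 LATE EVENING")
parsed = parse_forecast(example)
for phrase in parsed:
    print(phrase)
```

In this sketch the third phrase (BACKING NE EARLY AFTERNOON) has no speed of its own, so it inherits 16-20 from the previous phrase.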
13
Alignment (2)

Step 2
Associate each phrase with an entry in the input
data set
43% of the phrases matched with a single entry
(without ambiguity)
heuristics used for improving the accuracy of
alignment to 70%
Further improvements in alignment under
investigation

14
Example (2)
Example phrase: VEERING SW 10-14 BY EVENING
Input data: 1800 SW
By evening -> 1800 hours
Example phrase: BACKING ESE 16-20 IN THE MORNING
Input data: 0600 ESE 18, 0900 ESE 16
In the morning -> 0600 hours
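A minimal sketch of the matching step, under stated assumptions: a data entry is a candidate for a phrase when the direction agrees and the speed (if present) falls inside the phrase's range. The function name `candidate_entries` and the dictionary layout are hypothetical. The heuristics used to resolve ambiguous cases are not described in the slides, so the sketch only lists the candidates and does not choose among them.

```python
def candidate_entries(phrase, entries):
    """Return the input-data entries compatible with a parsed phrase."""
    low, high = phrase["speed"]
    return [
        entry for entry in entries
        if entry["direction"] == phrase["direction"]
        and (entry["speed"] is None or low <= entry["speed"] <= high)
    ]


# First example: one compatible entry, so the time phrase maps without ambiguity.
evening = {"direction": "SW", "speed": (10, 14), "time": "BY EVENING"}
evening_data = [{"hour": "1800", "direction": "SW", "speed": None}]
evening_matches = candidate_entries(evening, evening_data)

# Second example: two compatible entries, so the alignment is ambiguous
# and needs a heuristic (not specified in the slides) to settle on 0600.
morning = {"direction": "ESE", "speed": (16, 20), "time": "IN THE MORNING"}
morning_data = [
    {"hour": "0600", "direction": "ESE", "speed": 18},
    {"hour": "0900", "direction": "ESE", "speed": 16},
]
morning_matches = candidate_entries(morning, morning_data)

print([e["hour"] for e in evening_matches])
print([e["hour"] for e in morning_matches])
```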
15
Results
16
Limitations of Corpus Analysis

Quality of knowledge acquired
good in some cases
poor in many cases
required clarifications from experts
Useful when used along with other KA techniques

17
KA Methodology
Directly Ask Experts for Knowledge
Initial Prototype
Structured KA with Experts
Corpus Analysis
Initial Version of Full System
Expert Revision
Final System
18
SumTime-Mousam Architecture
Control Data

Document planning
content selection and organisation
Microplanning
selecting words and phrases
ellipsis
Realisation
output text using the words and phrases by
applying grammar rules
Control Data
derived from end user profile
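To make the last two stages concrete, here is a toy sketch that turns already-selected wind states into forecast text. Everything in it is an assumption for illustration: the `BY_TIME` table (built only from time phrases seen in the examples in this lecture), the fixed range of plus or minus 2 knots, and the function names. It shows word choice (INCREASING / EASING), ellipsis of an unchanged wind direction, and simple linearisation. Direction-change verbs (VEERING / BACKING) are left out.

```python
# Hypothetical time-phrase table; no standard mapping exists.
BY_TIME = {6: "MORNING", 15: "AFTERNOON", 18: "EVENING", 21: "EVENING", 0: "MIDNIGHT"}


def speed_range(speed, half_width=2):
    """Express a speed as a zero-padded range (assumed width)."""
    return f"{max(speed - half_width, 0):02d}-{speed + half_width:02d}"


def realise(states):
    """states: list of (hour, direction, speed) chosen by document planning."""
    _, prev_direction, prev_speed = states[0]
    opening = f"{prev_direction} {speed_range(prev_speed)}"
    changes = []
    for hour, direction, speed in states[1:]:
        words = []
        if speed != prev_speed:
            words.append("INCREASING" if speed > prev_speed else "EASING")
        if direction != prev_direction:
            # Ellipsis: the direction is only mentioned when it changes.
            words.append(direction)
        words.append(speed_range(speed))
        words.append(f"BY {BY_TIME[hour % 24]}")
        changes.append(" ".join(words))
        prev_direction, prev_speed = direction, speed
    if not changes:
        return opening
    return opening + " " + ", THEN ".join(changes)


text = realise([(6, "S", 4), (24, "S", 18)])
print(text)
```

The control data mentioned above would, in a fuller version, supply values such as the range width and the time-phrase table per end-user profile.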

19
Content Selection

What data items are worth picking up for the
summary?
Reasoning from first principles - no detailed
user model
Reusing data analysis techniques used by KDD
community
Attractive
but not developed for communication
Adapting data analysis techniques to suit needs
of communication using the Gricean Maxims

20
Data Analysis

Experts' View
Step Method
Report changes above thresholds (Significant
changes)
Corpus View
Segmentation Method
Report changes in slopes / report trends

21
Example

MAGNUS / THISTLE / NW HUTTON, EAST OF SHETLAND
day       hour  wind dir  wind speed (knots)
20-1-01   6     S         4
20-1-01   9     S         6
20-1-01   12    S         7
20-1-01   15    S         10
20-1-01   18    S         12
20-1-01   21    S         16
21-1-01   0     S         18
FORECAST FOR 06-24 GMT, 20 Jan 2001
S 02-06 INCREASING 16-20 BY EVENING

22
Experts' View - Step Model
S 3-8 INCREASING 8-13 BY AFTERNOON AND 13-18 BY
EVENING.
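A minimal sketch of the step method on the wind-speed data from the earlier example. The threshold of 5 knots and the function name `step_method` are assumptions; the slides only say that changes above thresholds are reported. A sample is reported whenever it differs from the last reported value by at least the threshold.

```python
def step_method(samples, threshold=5):
    """samples: list of (hour, speed). Returns the samples worth reporting."""
    reported = [samples[0]]
    for hour, speed in samples[1:]:
        # Compare against the last value that was actually reported.
        if abs(speed - reported[-1][1]) >= threshold:
            reported.append((hour, speed))
    return reported


# Wind speeds from the example table; hour 24 stands for midnight.
samples = [(6, 4), (9, 6), (12, 7), (15, 10), (18, 12), (21, 16), (24, 18)]
step_states = step_method(samples)
print(step_states)
```

With this assumed threshold the sketch selects three wind states, at hours 6, 15 and 21, which lines up with the three states (initial, by afternoon, by evening) in the step-model text above.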
23
Corpus View - Segmentation Model
S 3-8 INCREASING 15-20 BY MIDNIGHT.
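A minimal sketch of segmentation on the same data. The slides do not say which segmentation algorithm is used, so this is one simple top-down variant chosen for brevity: join the end points with a straight line (linear interpolation) and split at the worst-fitting sample until every sample lies within a tolerance. The tolerance of 2 knots and the names `segment` and `interpolate` are assumptions.

```python
def interpolate(start, end, hour):
    """Value at `hour` on the straight line joining two (hour, speed) points."""
    (x0, y0), (x1, y1) = start, end
    return y0 + (y1 - y0) * (hour - x0) / (x1 - x0)


def segment(samples, max_error=2.0):
    """Return the segment end points that approximate the series."""
    first, last = samples[0], samples[-1]
    worst_index, worst_error = None, 0.0
    for i in range(1, len(samples) - 1):
        hour, speed = samples[i]
        error = abs(speed - interpolate(first, last, hour))
        if error > worst_error:
            worst_index, worst_error = i, error
    if worst_index is None or worst_error <= max_error:
        return [first, last]
    # Split at the worst-fitting sample and segment each half.
    left = segment(samples[:worst_index + 1], max_error)
    right = segment(samples[worst_index:], max_error)
    return left[:-1] + right


# Wind speeds from the example table; hour 24 stands for midnight.
samples = [(6, 4), (9, 6), (12, 7), (15, 10), (18, 12), (21, 16), (24, 18)]
segment_states = segment(samples)
print(segment_states)
```

With this tolerance the whole period is covered by a single rising segment from 4 knots at 06:00 to 18 knots at midnight, which corresponds to the two wind states in the segmentation-model text above. A tighter tolerance produces more segments.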
24
Gricean Maxims (Grice 1975)

Maxim of Quality: Try to make your contribution
one that is true. More specifically:
Do not say what you believe to be false.
Do not say that for which you lack adequate
evidence.
Maxim of Quantity:
Make your contribution as informative as is
required (for the current purposes of the
exchange).
Do not make your contribution more informative
than is required.
Maxim of Relevance: Be relevant.
Maxim of Manner: Be perspicuous. More
specifically:
Avoid obscurity of expression.
Avoid ambiguity.
Be brief.
Be orderly.

25
Application of Gricean Maxims - Example

Maxim of Quality
Try to report true values from the input data
Use linear interpolation instead of linear
regression
Uncertainty in the input data needs to be
communicated to the user

26
Sample Data
27
Linear Regression Vs Linear Interpolation
28
Linear Regression Vs Linear Interpolation (2)

Linear Regression
S 03-07 INCREASING 16-20 BY MIDNIGHT
Linear Interpolation
S 06-10 INCREASING 18-22 BY MIDNIGHT
Human Written Forecast
S 06-10 INCREASING 18-22 BY MIDNIGHT
Although linear regression looks better visually,
forecasters do not use it.
Uncertainty
Speed values are mentioned as ranges, e.g. 06-10,
18-22
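The difference between the two fits can be sketched in a few lines. The sample data behind this slide is not in the transcript, so the sketch reuses the wind-speed table from the earlier East of Shetland example; its numbers therefore differ from the forecasts quoted above. The function names are hypothetical. Least-squares regression minimises overall error but its end points are generally not values from the input data, whereas linear interpolation between the first and last samples reports values that actually occur in the data.

```python
def regression_endpoints(samples):
    """Least-squares line through (hour, speed) samples, evaluated at both ends."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    first_hour, last_hour = samples[0][0], samples[-1][0]
    return intercept + slope * first_hour, intercept + slope * last_hour


def interpolation_endpoints(samples):
    """Linear interpolation keeps the observed first and last values."""
    return samples[0][1], samples[-1][1]


# Wind speeds from the earlier example table; hour 24 stands for midnight.
samples = [(6, 4), (9, 6), (12, 7), (15, 10), (18, 12), (21, 16), (24, 18)]
reg_start, reg_end = regression_endpoints(samples)
interp_start, interp_end = interpolation_endpoints(samples)
print(round(reg_start, 2), round(reg_end, 2))
print(interp_start, interp_end)
```

On this data the regression line starts near 3.25 knots and ends near 17.61 knots, neither of which appears in the input, while interpolation reports the true values 4 and 18.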

29
Intrinsic Evaluation of content determination

Metrics
Short - Size (Accessibility)
Accurate - Error (Informativeness)
Size Computation
measured at the conceptual level
number of wind states
Error Computation
Vertical distance from the line of approximation
combined error in wind speed and wind direction
normalized
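A minimal sketch of the two metrics for wind speed only. The exact normalisation and the way speed and direction errors are combined are not given in the slides, so this sketch simply averages the vertical distance over the samples; that choice, the function names, and holding the last reported value to the end of the period are all assumptions.

```python
def approx_at(states, hour):
    """Piecewise-linear approximation through the reported (hour, speed) states."""
    if hour >= states[-1][0]:
        # Assumption: hold the last reported value until the period ends.
        return states[-1][1]
    for (x0, y0), (x1, y1) in zip(states, states[1:]):
        if x0 <= hour <= x1:
            return y0 + (y1 - y0) * (hour - x0) / (x1 - x0)
    raise ValueError("hour lies before the first reported state")


def summary_size(states):
    """Size at the conceptual level: the number of wind states."""
    return len(states)


def summary_error(states, samples):
    """Mean vertical distance between the samples and the approximation."""
    distances = [abs(speed - approx_at(states, hour)) for hour, speed in samples]
    return sum(distances) / len(distances)


# Wind speeds from the earlier example table; hour 24 stands for midnight.
samples = [(6, 4), (9, 6), (12, 7), (15, 10), (18, 12), (21, 16), (24, 18)]
step_summary = [(6, 4), (15, 10), (21, 16)]      # three wind states
segmentation_summary = [(6, 4), (24, 18)]        # two wind states

for name, states in [("step", step_summary), ("segmentation", segmentation_summary)]:
    print(name, summary_size(states), round(summary_error(states, samples), 3))
```

On this single example the segmentation summary is shorter (2 states against 3) at a slightly higher speed error; one case says nothing about the aggregate picture, which is what the evaluation results report.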

30
Results of Evaluation

Segmentation produces shorter summaries without
losing accuracy
Details
In 16.5% of cases segmentation is better than step
in both size and error
In 0.56% of cases the step method is better than
segmentation in both size and error
In 2.5% of cases segmentation is better than step
error wise but worse size wise
In 32% of cases segmentation is better than step
size wise but worse error wise
In 31% of cases segmentation is better than step
error wise but equal size wise

31
Micro-planning and Realization

Based on Parallel corpus analysis (described
earlier) and
Expert KA/Revision
Details in Papers at
www.csd.abdn.ac.uk/research/sumtime/papers.html

32
SumTime-Mousam at Weathernews (UK) Ltd.
33
Post-edit Evaluation

Total number of forecasts analysed 2728
2728 texts divided into 73041 phrases
7608 (10%) phrases could not be aligned
Alignment failures imply that forecasters are not
happy with our content determination
Which is dependent on a process called
segmentation
Forecasters seem to perform more sophisticated
reasoning than simple segmentation

34
Analysis results (1)

Out of the successfully aligned phrases
43914 phrases matched perfectly
21519 phrases are mismatches
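The phrase counts reported for the post-edit evaluation are consistent with each other, as a short check shows. The variable names are for illustration only; the percentages are derived from the counts given in the slides.

```python
total_phrases = 73041
unaligned = 7608
perfect_matches = 43914
mismatches = 21519

aligned = total_phrases - unaligned
# The matched and mismatched phrases together account for every aligned phrase.
assert aligned == perfect_matches + mismatches

unaligned_pct = round(100 * unaligned / total_phrases, 1)
perfect_pct = round(100 * perfect_matches / aligned, 1)
mismatch_pct = round(100 * mismatches / aligned, 1)
print(aligned, unaligned_pct, perfect_pct, mismatch_pct)
```

So 65433 phrases were aligned; about 67.1% of those were left unchanged by the forecasters and about 32.9% were edited.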
Detailed analysis of the mismatches

35
Analysis Results (2)
36
End-user Evaluation

73 End-users (oil company staff supporting
offshore oilrigs) participated in this evaluation
used forecasts produced by the following three
methods
human written weather forecasts
SumTime-Mousam generated weather forecasts
SumTime-Mousam expressing human-selected content
Each participant completed a questionnaire that
has two parts
Part 1
forecast produced by one of the above three
methods (anonymous)
Participant is required to answer comprehension
questions based on the forecast
Part 2
showed any two forecasts from the above three
methods (anonymous)
Participant specified his/her preference for one
of the two forecasts
The main result
end-users consider the SumTime-Mousam generated
output linguistically better than human written
forecasts
Content of SumTime-Mousam is not as good as human
selected content

37
Conclusion

SumTime-Mousam is the result of knowledge
obtained from
several knowledge acquisition studies
Expert based
Corpus based
Several evaluation studies
Intrinsic evaluation
Post-edit evaluation
End-user evaluation
The development of SumTime-Mousam went through
many cycles
Building novel technology requires an iterative
approach with multiple KA and evaluation studies