Title: Testing methods for alarm-based earthquake prediction strategies
1 Testing methods for alarm-based earthquake prediction strategies
- Jeremy Zechar
- University of Southern California
2 Probabilistic predictions vs. alarm-based predictions
- Probabilistic predictions
- Forecast rate of target events or probability of occurrence
- Standard evaluation and comparison via likelihood methods
- Alarm-based predictions
- Forecast occurrence or non-occurrence of target events
- Several evaluation approaches
- Few attempts at establishing statistical significance
- Can be derived from probabilistic predictions
3 Deriving alarms from probabilistic predictions
(Figure: alarm maps derived with low, medium, and high probability thresholds; after Holliday et al. 2005. A minimal thresholding sketch follows below.)
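A minimal sketch of the thresholding idea, assuming a gridded forecast of target-earthquake probabilities; the grid values, thresholds, and function name are illustrative, not from the talk:

```python
import numpy as np

def derive_alarms(prob_grid, threshold):
    """Boolean alarm map: True wherever the forecast probability
    meets or exceeds the chosen threshold."""
    return prob_grid >= threshold

# Toy forecast: probability of a target earthquake in each grid cell.
prob_grid = np.array([[0.02, 0.10, 0.30],
                      [0.05, 0.25, 0.60],
                      [0.01, 0.15, 0.40]])

for threshold in (0.05, 0.20, 0.50):          # low, medium, high
    alarms = derive_alarms(prob_grid, threshold)
    print(f"threshold {threshold}: {int(alarms.sum())} alarm cells")
```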
4 Assumptions (or at least what I have in mind)
- The hypotheses we're interested in testing lead to forecasts of this nature: we expect one or more target earthquakes (earthquakes with magnitude > target magnitude) in a given space.
- If we're deriving alarms from a probabilistic forecast, we're interested in evaluating individual alarm sets and a range of derived alarm sets.
- For the examples in this presentation, alarms are fixed at the beginning of an experiment and do not expire until the end of the experiment.
5 Binary prediction, binary outcome
(Figure: alarms and target earthquakes plotted in space and time)
6 In other words
- "Having given the number of instances respectively in which things are both thus and so, in which they are thus but not so, in which they are so but not thus, and in which they are neither thus nor so, it is required to determine the special quantitative relativity subsisting between the thusness and the soness of the things." - M.H. Doolittle, 1888
7 Contingency table (Finley 1884)
- a: hits (alarm declared, target earthquake occurs)
- b: false alarms (alarm declared, no target earthquake)
- c: misses (no alarm, target earthquake occurs)
- d: correct negatives (no alarm, no target earthquake)
8 Scalar performance measures
- Some measures permit an optimal score to be obtained by a simple strategy.
- e.g., if one never declares an alarm, the false alarm rate is optimized.
- Some measures rely on explicit specification of negative alarms.
- Otherwise, the tester must infer negative alarms in order to count correct negatives, which introduces subjectivity.
- Some measures give a Type I error the same importance as a Type II error.
- e.g., the Critical Success Index combines false alarms and misses. Is this desirable?
- Considered alone, none of these measures seems ideal.
(Jolliffe and Stephenson 2003; a small sketch of these measures follows below.)
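For concreteness, a sketch of the measures mentioned above, computed from contingency-table counts (a = hits, b = false alarms, c = misses, d = correct negatives); the counts and function names are illustrative:

```python
# Example counts are made up for illustration only.
def hit_rate(a, b, c, d):
    return a / (a + c)                 # fraction of target events predicted

def false_alarm_rate(a, b, c, d):
    return b / (b + d)                 # needs d, i.e. explicit negative alarms

def critical_success_index(a, b, c, d):
    return a / (a + b + c)             # combines false alarms and misses

a, b, c, d = 12, 30, 5, 953
print(hit_rate(a, b, c, d),
      false_alarm_rate(a, b, c, d),
      critical_success_index(a, b, c, d))
```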
9 Receiver Operating Characteristic (ROC): a splitter's definition
- False alarm rate: b/(b+d)
- Hit rate: a/(a+c)
- One set of alarms corresponds to a single point on the ROC.
- Area under the curve (AUC) is a common performance measure.
- All hits are considered equally good, and all false alarms are considered equally bad.
(A small ROC/AUC sketch follows below.)
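An illustrative sketch of building an ROC curve and its AUC by sweeping the alarm threshold over a gridded forecast; the arrays here are toy values, not data from the talk:

```python
import numpy as np

prob_grid = np.array([0.02, 0.10, 0.30, 0.05, 0.25, 0.60, 0.01, 0.15, 0.40])
observed  = np.array([False, False, True, False, False, True, False, False, True])

points = []
for threshold in np.concatenate(([0.0], np.unique(prob_grid), [1.01])):
    alarm = prob_grid >= threshold
    a = np.sum(alarm & observed)        # hits
    b = np.sum(alarm & ~observed)       # false alarms
    c = np.sum(~alarm & observed)       # misses
    d = np.sum(~alarm & ~observed)      # correct negatives
    points.append((b / (b + d), a / (a + c)))

points.sort()                           # order by false alarm rate
auc = sum((x2 - x1) * (y1 + y2) / 2     # trapezoid rule over the curve
          for (x1, y1), (x2, y2) in zip(points, points[1:]))
print("AUC:", auc)
```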
10 What is the prior probability of a target earthquake within a given alarm space-time region?
- The standard estimate is given by the Poisson distribution.
- Here, r is the average target earthquake occurrence rate within the alarm region, based on a catalog of maximal length, and t is the duration of the alarm (Jackson 1996).
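Written out, the Poisson estimate implied here (assuming the standard form, with r and t as defined above) gives the prior probability of one or more target earthquakes during the alarm:

P = 1 - exp(-r t)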
11 Estimation by simulation (Jackson 1996)
- Create random catalogs and count successful alarms (hits and correct negatives). Repeat many times.
- Construct a frequency distribution of successes and sum those with as many or more successes.
- Each success counts equally, regardless of prior probability.
- What about misses? How do we determine the prior probability for a missed event?
(A small simulation sketch follows below.)
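A minimal sketch of this simulation, assuming each alarm region has a known prior probability of containing a target earthquake (for a negative alarm, the success probability would be one minus the prior); the priors, counts, and variable names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

success_probs = np.array([0.05, 0.20, 0.10, 0.40])  # one value per alarm
observed_successes = 3                               # hits + correct negatives

n_sims = 100_000
# Each row is one random catalog; count how many alarms succeed in it.
sim_successes = (rng.random((n_sims, success_probs.size)) < success_probs).sum(axis=1)
# Fraction of random catalogs with as many or more successes.
p_value = np.mean(sim_successes >= observed_successes)
print(p_value)
```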
12 Construct a simple model
- If a simple strategy can produce an alarm set that obtains as many hits as a more complex strategy's alarm set, the effectiveness of the complex model is questionable.
- e.g., VAN analyses (in particular, Kagan 1996)
13 Molchan diagram
- Miss rate ν = c/(a+c)
- Fraction of alarm space-time, τ, is the space-time occupied by alarms divided by the total amount of space-time during the experiment.
- The measure of space used to compute τ is important. Options:
- Map area
- Seismic intensity-weighted area
(A small sketch of the two measures of space follows below.)
(Figure: example alarm map and Molchan diagram point, with τ_map = 1/4 and τ_int = 1/2; after Molchan and Kagan 1992)
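A sketch of the point made above: τ depends on the measure of space. The alarm cells, areas, and intensity weights are illustrative values only:

```python
import numpy as np

alarm       = np.array([True, True, False, False])
cell_area   = np.array([1.0, 1.0, 1.0, 1.0])      # plain map area
cell_weight = np.array([0.5, 0.5, 2.0, 2.0])      # intensity-weighted area

tau_map = cell_area[alarm].sum() / cell_area.sum()      # 2/4 = 0.5
tau_int = cell_weight[alarm].sum() / cell_weight.sum()  # 1/5 = 0.2
print(tau_map, tau_int)
```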
14 As τ increases, ν decreases
15
- Alarm sets that are significantly better than random will yield points below the confidence steps (outside the shaded regions). These intervals are independent of the measure of space. (A sketch of how such confidence steps can be computed follows below.)
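One way such confidence steps can be computed, under the assumption that an unskilled alarm set covering a fraction τ of space-time hits each of N target earthquakes independently with probability τ, so the hit count is binomial; the function name and example values are illustrative:

```python
from scipy.stats import binom

def min_significant_hits(N, tau, alpha=0.05):
    """Smallest hit count that a random (unskilled) alarm set covering
    fraction tau of space-time reaches with probability <= alpha."""
    for h in range(N + 1):
        if binom.sf(h - 1, N, tau) <= alpha:   # P(hits >= h) under the null
            return h
    return None                                # not attainable at this alpha

print(min_significant_hits(N=17, tau=0.25))
```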
16 Retrospective testing: CA portion of national seismic hazard map
- Define target earthquakes (M > 5 in the ANSS catalog).
- Define study region (latitude 32 to 38.3, longitude -123 to -115) and grid spacing (0.1 x 0.1 degrees).
- Define experiment duration (2000-2009 inclusive).
- Details:
- 17 target events occurred in the study region between 2000 and 15 May 2006.
- Earthquake rate predictions for each box were obtained from the Frankel 2002 codes and converted to probabilities using the Poisson assumption.
17 Using map area for τ
- No surprise: the NSHMP is significantly better at predicting earthquakes than someone throwing darts at a map of California.
18 Using intensity-weighted area for τ
- The NSHMP does not regularly yield points outside the confidence regions. What can we say?
19 Introducing another level of abstraction
- To determine the skill of trajectories that cross, we can compute the area under the Molchan trajectory.
- This describes the predictive significance of all alarms produced up to the point of interest, rather than a single set of alarms.
20 Properties of the Molchan trajectory area measure
- Perfect prediction yields zero area; anti-perfect prediction yields unit area.
- The expected value of the area for unskilled predictions is 1/2.
- Preliminary simulations indicate that the area distribution for unskilled predictions quickly approaches a normal distribution as N, the number of target earthquakes, increases.
- Preliminary results suggest an exact analytic form for the variance of the area of unskilled predictions as a function of N.
- Together, these properties allow intuitive hypothesis testing using my ASS: the Area Skill Score (ASS) = 1 - Molchan trajectory area. (A small computational sketch follows below.)
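A sketch of computing the trajectory area and the Area Skill Score from a sampled Molchan trajectory; the trajectory values and function name are illustrative:

```python
def area_skill_score(taus, nus):
    """ASS = 1 - area under the Molchan trajectory (miss rate nu vs tau),
    integrated by the trapezoid rule; unskilled predictions give ~1/2."""
    area = sum((t2 - t1) * (n1 + n2) / 2
               for t1, t2, n1, n2 in zip(taus, taus[1:], nus, nus[1:]))
    return 1.0 - area

# Illustrative trajectory lying below the unskilled diagonal nu = 1 - tau.
taus = [0.0, 0.1, 0.25, 0.5, 1.0]
nus  = [1.0, 0.6, 0.35, 0.1, 0.0]
print(area_skill_score(taus, nus))
```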
21 Using map area for τ
- The NSHMP does extremely well, as we would expect.
22 Using intensity-weighted area for τ
- We cannot reject the null hypothesis that the NSHMP trajectory was obtained using a prediction method with no skill.
23 Conclusions
- Deriving target earthquake alarms might provide common ground for comparing forecast/prediction algorithms.
- There are a number of tools and performance measures available for testing alarm-based predictions.
- The questions CSEP wants to answer should drive the selection and development of testing procedures and performance measures, rather than the other way around.
- We may need to tailor existing performance measures and tools.