Title: TRAINING SESSION ON HOMOGENISATION METHOD
Our approach to homogenisation
Bologna, 17th-18th May 2005
Maurizio Maugeri, University of Milan
Within this context, a research program was set up in 2000 to better
investigate the impact of data quality and homogeneity issues on the
detection of Italian temperature and precipitation trends over the
last two centuries.
Final goal: to revise and update the results presented in Maugeri and
Nanni (1998), Buffoni et al. (1999) and Brunetti et al. (2000).
The program has been developed within both the EU project ALP-IMP and
the national project CLIMAGRI (Climate and Agriculture, Ministry for
Agriculture and Forests).
Principal steps of the program
- Data and metadata recovery
- Homogeneity testing and record adjusting
- Data analysis
- Understanding local versus larger scale
Why spend more time on data and metadata recovery?
Italy is well placed in the field of long-term records:
- Invention of some of the principal meteorological instruments
- Introduction of the first synoptic network
- Six series beginning in the 18th century: Bologna, Milano, Roma,
Padova, Palermo and Torino
So, over the last three centuries a huge amount of data and metadata
has been collected in Italian data archives.
The importance of these data has been known for a long time.
- Cantù V. and Narducci P. (1967) Lunghe serie di osservazioni
meteorologiche. Rivista di Meteorologia Aeronautica, Anno XXVII, n. 2,
71-79.
- Eredia F. (1908) Le precipitazioni atmosferiche in Italia dal 1880 al
1905. In Annali dell'Ufficio Centrale di Meteorologia. Serie II, Vol.
XXVII, anno 1905, Rome.
- Eredia F. (1919) Osservazioni pluviometriche raccolte a tutto l'anno
1915 dal R. Ufficio Centrale di Meteorologia e Geodinamica. Ministero
dei Lavori Pubblici, Rome.
- Eredia F. (1925) Osservazioni pluviometriche raccolte nel quinquennio
1916-1920 dal R. Ufficio Centrale di Meteorologia e Geodinamica.
Ministero dei Lavori Pubblici, Rome.
- Mennella C. (1967) Il Clima d'Italia. Fratelli Conti Editori, Napoli,
724 pp.
- Millosevich (1882) Sulla distribuzione della pioggia in Italia. In
Annali dell'Ufficio Centrale di Meteorologia. Serie II, Vol. III, anno
1881, Rome.
- Millosevich (1885) Appendice alla memoria sulla pioggia in Italia. In
Annali dell'Ufficio Centrale di Meteorologia. Serie II, Vol. V, anno
1883, Rome.
- Narducci P. (1991) Bibliografia Climatologica Italiana. Consiglio
Nazionale dei Geometri, Roma.
... but until a few years ago only a small part of the data was
available in digital format, and no attempt had been made to collect
the metadata.
Adapted from Anzaldi C., Mirri L. and Trevisan V. (1980) Archivio
Storico delle osservazioni meteorologiche. Pubblicazione CNR AQ/5/27,
Roma.
Data and metadata collection: air temperature
Data and metadata collection: precipitation
Data and metadata collection: other variables
The activities are still in progress (EU project ALP-IMP). They concern
air pressure, cloud cover, humidity and snow.
- HUMIDITY (i.e. dry / wet temperatures): daily data, 2 records
- AIR PRESSURE (secular records)
- CLOUD COVER (secular records)
- SNOW (HS: snow at ground; HN: fresh snow): daily / monthly data,
about 15 records from northern Italy
1951-2004 PERIOD: all variables available in digital format (Italian
Air Force data-set).
The role of national and international projects: CLIMAGRI (MiPAF),
ALP-IMP (EU), COFIN and FIRB (MIUR)
Data and metadata collection: metadata
- Metadata collection was performed with two main objectives:
- to understand the evolution of the Italian meteorological network
- to reconstruct the history of all the stations of the data-set.
The research on the history of the individual stations was performed
both by analysing a large amount of grey literature and by means of the
UCEA archive. All information was summarized in a card for each data
series. Each card is divided into three parts. The first part reports
all the information obtained from the literature. The second part
contains abstracts of the correspondence between the stations and the
Central Office. The third part summarizes the sources of the data used
to construct the record.
For full details see the CLIMAGRI project website (www.climagri.it).
Metadata for every station
- Abstracts of all published papers (grey literature)
- Abstracts of the correspondence between the observatories and the
Central Office
- Position
- Data sources
- Data availability
- Other notes
For more details see the CLIMAGRI project website.
Homogenisation: principal steps
We developed a method consisting of three steps:
1) Make a synthesis of the metadata and study the impact of possible
changes
2) Perform an initial homogenisation by means of direct methodologies
3) Perform a final homogenisation by means of indirect methodologies
Metadata for every station
Maugeri, M., Buffoni, L., Chlistovsky, F. (2002) Daily Milan
temperature and pressure series (1763-1998): history of the
observations and data and metadata recovery. Climatic Change, 53,
101-117.
Corrections by means of metadata: an example
Maugeri, M., Buffoni, L., Delmonte, B., Fassina, A. (2002) Daily Milan
temperature and pressure series (1763-1998): completing and
homogenising the data. Climatic Change, 53, 119-149.
Corrections applied to Milan daily air pressure data to eliminate the
bias introduced by calculating daily means from observations taken at
(A) 8 a.m., 2 p.m. and 7 p.m. and (B) sunrise and mid-afternoon.
Corrections A apply to the period 1 December 1932 - 31 December 1987,
corrections B to 1763-1834.
Homogenisation by means of indirect methods
- The indirect methods make use of meteorological data from
neighbouring stations.
- Formally, the data of a given series can be represented as a sum of
several terms. Let X(t) be the value of the meteorological variable X
at time t. Then it can be written as

$$X(t) = N + A(t) + IH(t), \qquad t = 1, 2, \dots, n \tag{1}$$

- where N is the normal value of X (defined as the mean value over a
suitable time interval, for example the period 1961-1990), A(t) is the
anomaly at time t (the departure of the variable X from its normal
value) and IH(t) is the possible inhomogeneity in the measured value
X(t). In the simplest case, IH(t) is a step function that equals 0
until the inhomogeneity-inducing event takes place and thereafter
equals a constant value representing the effect of the inhomogeneity.
- Using an analogous notation, a reference series consisting, for
example, of the data of a neighbouring station can be written as
follows:

$$R(t) = N' + A'(t) + IH'(t), \qquad t = 1, 2, \dots, n \tag{2}$$

- If the two series belong to the same climatic area, it can be assumed
that $A'(t) = A(t)$ for each value of t. Moreover, if the reference
series is assumed to be homogeneous, then $IH'(t) = 0$ always holds.
- Therefore, the series of the differences will be

$$Z(t) = X(t) - R(t) = (N - N') + IH(t), \qquad t = 1, 2, \dots, n \tag{3}$$

- In other words, in the absence of inhomogeneities the series of the
differences must be constant. The same approach is followed for the
series of the ratios, which is used particularly for precipitation
series. Deviations of Z(t) from a constant path are therefore
attributed to inhomogeneities.
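Relation (3) can be checked numerically with a small sketch. It builds a test series and a reference series that share the same anomaly, adds a step inhomogeneity to the test series only, and forms the difference series. All numerical values (normals, step size, step position, series length) are hypothetical and chosen only for illustration.

```python
import random

def difference_series(test, reference):
    """Z(t) = X(t) - R(t) for two series of equal length."""
    return [x - r for x, r in zip(test, reference)]

random.seed(0)
n = 100
anomaly = [random.gauss(0.0, 0.9) for _ in range(n)]  # shared anomaly A(t) = A'(t)

normal_x, normal_r = 13.3, 12.1   # N and N' (hypothetical values)
step_size, step_at = 0.8, 60      # IH(t): 0 before step_at, step_size afterwards

ih = [0.0 if t < step_at else step_size for t in range(n)]
x = [normal_x + anomaly[t] + ih[t] for t in range(n)]  # test series, relation (1)
r = [normal_r + anomaly[t] for t in range(n)]          # homogeneous reference, IH' = 0
z = difference_series(x, r)                            # relation (3)
```

Here `z` equals N - N' up to index 59 and N - N' plus the step afterwards, so the shared anomaly cancels and only the inhomogeneity remains. With real neighbouring stations the anomalies are only approximately equal, so the difference series is noisy rather than exactly piecewise constant.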
- The application of indirect methodologies is actually much more
complicated than the previous relations suggest.
- In fact, whenever a relation like (3) contains a signal characterised
by one or more steps, it is usually very hard to understand whether it
is due to the station under examination or to the station used as a
reference. Moreover, over periods that are not too short, both stations
may present significant inhomogeneities, producing several step-shaped
signals.
- So, the identification of a reference series is actually very
problematic.
How do we select the reference series?
A procedure that rejects the a priori existence
of homogeneous reference series is used. Each
series is tested against each other series in
subgroups of 10 series. Subsequently, the break
signals of one series against all others are
collected in a decision matrix and the breaks are
assigned to the single series according to
metadata and/or to probability.
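The bookkeeping of such a procedure can be sketched as follows. This is only an illustration under strong assumptions: `detect_break` is a toy detector (largest mean shift in the difference series), `min_votes` is a crude stand-in for the assignment "according to metadata and/or to probability", and all names and thresholds are hypothetical rather than the ones used in the program.

```python
from collections import Counter

def detect_break(diff, min_len=5, threshold=0.5):
    """Toy detector: index k maximising the mean shift between diff[:k] and diff[k:].
    Returns None when the largest shift is below the threshold."""
    best_k, best_shift = None, 0.0
    for k in range(min_len, len(diff) - min_len + 1):
        before = sum(diff[:k]) / k
        after = sum(diff[k:]) / (len(diff) - k)
        shift = abs(after - before)
        if shift > best_shift:
            best_k, best_shift = k, shift
    return best_k if best_shift >= threshold else None

def decision_matrix(series):
    """Compare every series with every other one; collect break signals per series."""
    names = list(series)
    matrix = {name: Counter() for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            diff = [x - y for x, y in zip(series[a], series[b])]
            k = detect_break(diff)
            if k is not None:
                matrix[a][k] += 1   # the break may belong to either series,
                matrix[b][k] += 1   # so it is recorded for both
    return matrix

def assign_breaks(matrix, min_votes):
    """Attribute a break to a series when enough comparisons signal it."""
    return {name: sorted(k for k, votes in counts.items() if votes >= min_votes)
            for name, counts in matrix.items()}

# Four hypothetical series sharing one signal; only "C" has a step at index 10.
base = [float(t % 3) for t in range(20)]
group = {
    "A": list(base),
    "B": list(base),
    "C": [v + (1.0 if t >= 10 else 0.0) for t, v in enumerate(base)],
    "D": list(base),
}
matrix = decision_matrix(group)
assigned = assign_breaks(matrix, min_votes=len(group) - 1)
```

A break that appears in every comparison involving one series, and in no comparison between the others, is attributed to that series. Real cases are far less clear-cut, which is why the metadata are needed to settle the assignment.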
How do we compare the test and the reference series?
- The comparison between a test series and a reference series can be
performed by a number of different mathematical techniques.
- Among them, we use the Craddock homogeneity test.
Homogenisation: the Craddock statistical test
- One of the most commonly used statistical tests is the Craddock test.
It was first developed for analysing precipitation series and has
subsequently been widely updated, improved and extended to thermometric
records. It accumulates the normalized differences between two series
(a and b) according to one of two formulas, in which the mean values of
the series are calculated over the entire period of the comparison. The
choice of formula depends on the underlying hypothesis, i.e. on whether
the difference or the ratio between stations of the same area is
assumed to be constant.
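The formulas themselves are not reproduced in this text. The sketch below assumes the form in which the test is commonly cited: for a constant difference, s_i = s_{i-1} + (a_i - b_i) - (a_m - b_m); for a constant ratio, s_i = s_{i-1} + a_i (b_m / a_m) - b_i, where a_m and b_m are the means over the comparison period. The function names are hypothetical and the sign convention may differ from the one used on the original slide.

```python
def craddock_additive(a, b):
    """Cumulative sums under the constant-difference hypothesis (e.g. temperature)."""
    mean_diff = sum(a) / len(a) - sum(b) / len(b)
    total, sums = 0.0, []
    for ai, bi in zip(a, b):
        total += (ai - bi) - mean_diff
        sums.append(total)
    return sums

def craddock_ratio(a, b):
    """Cumulative sums under the constant-ratio hypothesis (e.g. precipitation)."""
    scale = (sum(b) / len(b)) / (sum(a) / len(a))
    total, sums = 0.0, []
    for ai, bi in zip(a, b):
        total += ai * scale - bi
        sums.append(total)
    return sums

# Toy example: series a has a step of +1 from index 3 onwards, b is flat.
a = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
b = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
s = craddock_additive(a, b)
turning_point = s.index(min(s))  # last index before the slope changes
```

By construction both cumulative series return to zero at the end of the comparison period. For a homogeneous pair the curve fluctuates around zero without a persistent slope, while a step in one series shows up as a change of slope at the break, as at `turning_point` in the toy example.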
See also the example presented in the craddock.xls Excel file.
In order to show the ability of the Craddock homogeneity test to
identify some typical inhomogeneities, we made use of records generated
by means of random numbers. In particular, we generated records with
the features of Milan yearly mean temperature and yearly total
precipitation.
TEMPERATURE: series length 240 values, average 13.3 °C, st. dev. 0.9 °C
PRECIPITATION: series length 240 values, average 1015 mm, st. dev.
202 mm
We then applied the Craddock test to A) some pairs of completely random
temperature/precipitation records and B) some pairs of records obtained
partly from random series and partly from the series under test (i.e.
we introduced a 0.7 correlation between the two series subjected to the
Craddock homogeneity test). We then added to the series some typical
errors such as step functions, trends, ...
All results are displayed in the Excel files Craddock_TMED_12 and
Craddock_PREC_12.
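A comparable experiment can be set up in a few lines. The sketch below assumes Gaussian random numbers and builds the second series as a weighted mix of the first series and independent noise, which gives an expected correlation of 0.7; the step size (0.5 °C), its position (index 120), the seed and all function names are hypothetical and are not taken from the Excel files.

```python
import random

def correlated_pair(n, mean, std, rho, rng):
    """Two Gaussian series with the given mean and st. dev. and expected correlation rho."""
    a_anom = [rng.gauss(0.0, 1.0) for _ in range(n)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    b_anom = [rho * x + (1.0 - rho ** 2) ** 0.5 * e for x, e in zip(a_anom, noise)]
    a = [mean + std * x for x in a_anom]
    b = [mean + std * x for x in b_anom]
    return a, b

def add_step(series, start, size):
    """Add a step inhomogeneity of the given size from index start onwards."""
    return [v + size if i >= start else v for i, v in enumerate(series)]

def pearson(a, b):
    """Sample correlation coefficient of two series of equal length."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

rng = random.Random(42)
# Temperature-like records: 240 values, average 13.3 °C, st. dev. 0.9 °C.
temp_a, temp_b = correlated_pair(240, 13.3, 0.9, 0.7, rng)
temp_a_step = add_step(temp_a, start=120, size=0.5)
sample_rho = pearson(temp_a, temp_b)
```

The pair `temp_a_step`, `temp_b` can then be passed to any implementation of the Craddock test. A precipitation-like pair would use an average of 1015 mm and a standard deviation of 202 mm, together with the ratio form of the test.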
Homogenisation: statistical test and metadata
Craddock test - Bologna precipitation record
- News about damage to the pluviometer. When the damage was repaired,
the cause of the underestimation of precipitation for the period
1900-1928 was removed.
- "At the beginning of 1857 this pluviometer, reduced to a poor state
by long use, was replaced by another of better construction, made with
great precision..."
- Change in data origin from Osservatorio Astronomico to Istituto
Idrografico
- Introduction of a new pluviometer (Fuess recorder): "... it was
installed under the supervision of Prof. Bernardo Dessau in the period
1900-1903 ..."
Basic problem: what has to be corrected?
- a) All the periods found by statistical methods
- b) Only the periods for which there is evidence in metadata
The problem is, in part, still open.
Our methodology:
- Wide use of statistical methods (especially for air temperature)
- Critical analysis in the light of metadata
The CLIMAGRI project