Title: INTELLIGENT DATA REDUCTION ALGORITHMS FOR REAL-TIME DATA ASSIMILATION
1. INTELLIGENT DATA REDUCTION ALGORITHMS FOR REAL-TIME DATA ASSIMILATION
Xiang Li, Rahul Ramachandran, Sara Graves (ITSC/University of Alabama in Huntsville)
Bradley Zavodsky (ESSC/University of Alabama in Huntsville)
Steven Lazarus, Mike Splitt, Mike Lueken (Florida Institute of Technology)
May 5, 2009
2. Data Reduction
- It is common practice to remove or combine a portion of high spatial and temporal resolution observations to reduce the data volume in the DA process, due to
  - the high computational resources required for large data sets (cost increases exponentially with data volume)
  - data redundancy in large, high-resolution observation sets
  - the local spatial correlation of satellite data
  - observation resolution exceeding the assimilation grid resolution
- Reducing data redundancy may also improve analysis quality (Purser et al., 2000)
3. Computational Resources Required for Data Assimilation
[Figure: schematic comparing analysis techniques (Successive Corrections, Statistical Interpolation, 3D-Var, 4D-Var) by computational resources required (little to lot) against data volume, as horizontal resolution increases from 80 km to 1 km.]
4. Need for New Data Reduction Techniques
- Current data thinning approaches (a minimal sketch follows this slide)
  - Sub-sampling
  - Random sampling
  - Super-obing (sub-sampling with averaging)
- Limitations
  - All data points are treated equally
  - Observations differ in the information they contain and in their contribution to analysis quality
- Intelligent data thinning algorithms
  - Reduce the number of data points required for an analysis
  - Maintain the fidelity of the analysis (keep the most important data points)
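For concreteness, here is a minimal NumPy sketch of the three baseline schemes on a 2-D observation field; the function names, strides, and fractions are illustrative choices, not taken from the presentation.

```python
import numpy as np

def subsample(field, stride=3):
    """Regular sub-sampling: keep every stride-th point in each direction."""
    return field[::stride, ::stride]

def random_sample(field, frac=0.1, seed=0):
    """Random sampling: keep a fixed fraction of points, chosen uniformly."""
    rng = np.random.default_rng(seed)
    keep = rng.choice(field.size, size=int(frac * field.size), replace=False)
    return field.ravel()[keep]

def super_ob(field, box=3):
    """Super-obing: average non-overlapping box x box blocks into one value."""
    ny, nx = field.shape[0] // box, field.shape[1] // box
    blocks = field[:ny * box, :nx * box].reshape(ny, box, nx, box)
    return blocks.mean(axis=(1, 3))
```

All three treat every data point equally, which is exactly the limitation noted above.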
5. Example
- High data volumes from satellite platforms (e.g., infrared-based SST, scatterometer winds) carry redundant data and are computationally expensive to assimilate in full.
- Simple sub-sampling strategies are susceptible to missing significant data samples: the same sub-sampling interval with a shifted starting point can retain different features, as the snippet below illustrates.
- Analyses derived from simple sub-sampling can therefore be inconsistent and are not optimal in efficiency.
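A tiny illustration of the shifted-interval problem on a hypothetical 1-D signal (not data from the study): the same stride with two different offsets either captures a narrow feature exactly or misses it entirely.

```python
import numpy as np

x = np.linspace(0.0, 10.0, 101)
signal = np.exp(-((x - 5.0) ** 2) / 0.01)   # sharp peak centered at x = 5

print(signal[::10].max())    # offset 0 samples x = 0, 1, ..., 10 -> ~1.0 (peak kept)
print(signal[5::10].max())   # offset 5 samples x = 0.5, 1.5, ... -> ~1e-11 (peak lost)
```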
6. Intelligent Data Thinning Algorithms
- Objective: retain samples in the thinned data set that have high information content and a large impact on the analysis
- Assumption: samples with high local variance contain high information content (a local-variance sketch follows this slide)
- Approach: use a synthetic test to determine and validate the optimal thinning strategy, then apply it to real satellite observations
  - Synthetic data test: truncated Gaussian
  - Real data experiment: Atmospheric Infrared Sounder (AIRS) profiles
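One way to make the local-variance assumption concrete is a moving-window variance; this sketch assumes a square window and edge padding, neither of which is specified in the presentation.

```python
import numpy as np

def local_variance(field, win=3):
    """Variance of the field over a win x win neighborhood of each grid point;
    high values flag gradient regions assumed to be information-rich."""
    pad = win // 2
    padded = np.pad(field, pad, mode="edge")
    # Stack all win*win shifted views and take the variance across them.
    shifts = [padded[i:i + field.shape[0], j:j + field.shape[1]]
              for i in range(win) for j in range(win)]
    return np.var(np.stack(shifts), axis=0)
```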
7. Synthetic Data Test: Truncated Gaussian
- Explicitly defined truth and background fields
- Direct thinning method (a search sketch follows this slide)
  - all subsets of 5 of the 35 observations are tested to find the 5 yielding the best analysis (1-D variational approach)
  - ~325,000 unique spatial combinations
- First guess: base of the Gaussian function
- Observations created by adding white noise to the truth
[Figure: truth, first guess, analysis, and optimal observation locations.]
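The presentation does not give the analysis equations, but the exhaustive search is easy to sketch with a standard optimal-interpolation (1-D Var) update, assuming the observations are co-located with grid points and that a background covariance B and observation variance are supplied. With n = 35 and k = 5 the loop visits C(35, 5) = 324,632 subsets, matching the ~325,000 combinations quoted above.

```python
import numpy as np
from itertools import combinations

def oi_analysis(xb, y, obs_idx, B, obs_var):
    """Analysis x_a = x_b + K (y - H x_b) with obs at grid indices obs_idx."""
    H = np.zeros((len(obs_idx), xb.size))
    H[np.arange(len(obs_idx)), list(obs_idx)] = 1.0
    R = obs_var * np.eye(len(obs_idx))
    K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)   # gain matrix
    return xb + K @ (y[list(obs_idx)] - H @ xb)

def best_subset(xb, truth, y, B, obs_var, k=5):
    """Exhaustively score every k-observation subset against the truth."""
    best_idx, best_mse = None, np.inf
    for idx in combinations(range(xb.size), k):
        mse = np.mean((oi_analysis(xb, y, idx, B, obs_var) - truth) ** 2)
        if mse < best_mse:
            best_idx, best_mse = idx, mse
    return best_idx, best_mse
```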
8. Synthetic Data Test: Truncated Gaussian (cont'd)
- The optimal observation configuration retains data at the
  - peak
  - gradient
  - anchor points (where the gradient changes most sharply)
- The optimal configuration depends on key elements of the analysis itself
  - length scale (L)
  - quality of the background and observations
- Lesson learned: thinned data samples should combine homogeneous points, gradient points, and anchor points for optimal performance, and a dynamic length scale should be applied to each thinned data set.
9. Intelligent Data Reduction Algorithms
- Earlier versions of intelligent data thinning algorithms: IDT, DADT, mDADT
- Density-Balanced Data Thinning (DBDT)
  - Three metrics are calculated for the data samples, and samples are placed into a priority queue for each metric:
    - Thermal Front Parameter (TFP): high values indicate rapid change of the temperature gradient, i.e., anchor samples
    - Local Variance (LV): high values indicate gradient regions
    - Homogeneity: low values indicate homogeneous regions
  - Data are selected from the three queues; the user determines the proportion of samples drawn from each metric
  - Radius of impact (R): controls the uniform spatial distribution of the thinned data set; the distance between any two retained samples must be larger than R
  - Selection process: take the top-qualified samples from the priority queues, starting with the TFP queue, followed by the LV queue and the homogeneity queue (a selection sketch follows this slide)
- DBDT performs best among these thinning algorithms
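A sketch of the DBDT selection step as described above; the metric computations are omitted, and the per-queue fractions and tie-breaking are assumptions (the presentation only says the user sets the proportions).

```python
import numpy as np

def dbdt_select(coords, tfp, lv, hom, n_keep, R, fracs=(0.4, 0.4, 0.2)):
    """Take top-ranked samples from the TFP, local-variance, and homogeneity
    queues in turn, rejecting any sample within radius R of one already kept."""
    queues = (np.argsort(-tfp),   # high TFP first: anchor samples
              np.argsort(-lv),    # high local variance first: gradient regions
              np.argsort(hom))    # low homogeneity values first: smooth regions
    kept = []
    for queue, frac in zip(queues, fracs):
        quota, taken = int(frac * n_keep), 0
        for i in queue:
            if taken >= quota:
                break
            # enforce the minimum spacing R between retained samples
            if all(np.linalg.norm(coords[i] - coords[j]) > R for j in kept):
                kept.append(i)
                taken += 1
    return np.asarray(kept)
```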
10. AIRS + ADAS: Our Real-World Testing Ground
- Atmospheric Infrared Sounder (AIRS)
  - NASA hyperspectral sounder
  - generates temperature and moisture profiles with 50-km resolution at nadir
  - each profile contains a pressure level above which quality data are found
- ARPS Data Assimilation System (ADAS)
  - version 5.2.5, Bratseth scheme
  - background comes from a short-term Weather Research and Forecasting (WRF) model forecast
  - error covariances
    - background: standard short-term forecast errors cited in ADAS
    - observation: from the Tobin et al. (2006) AIRS validation study
  - dynamic length scale (L) calculated from the average distance between nearest observation neighbors (sketch after this slide)

Tobin, D. C., H. E. Revercomb, R. O. Knuteson, B. M. Lesht, L. L. Strow, S. E. Hannon, W. F. Feltz, L. A. Moy, E. J. Fetzer, and T. S. Cress, "ARM site atmospheric state best estimates for AIRS temperature and water vapor retrieval validation," J. Geophys. Res., D09S14, pp. 1-18, 2006.
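The dynamic length scale might be computed along these lines (a sketch using SciPy's cKDTree; the exact averaging in the study is not specified):

```python
import numpy as np
from scipy.spatial import cKDTree

def dynamic_length_scale(obs_xy_km):
    """L as the mean nearest-neighbor distance among the (thinned) observations,
    so sparser data sets automatically receive a longer correlation length."""
    tree = cKDTree(obs_xy_km)
    dists, _ = tree.query(obs_xy_km, k=2)   # k=2: the first hit is the point itself
    return dists[:, 1].mean()
```

This behavior is consistent with the results table below, where L grows from 80 km for the full data set to roughly 150 km for the thinned sets.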
11. Thinning Strategies (11% of full)
- Subsample
  - takes the profile with the most retrieved levels within a 3x3 box
- Random
  - searches observations and ensures that retained observations are thinned to a user-defined distance
  - 10 permutations performed to create an ensemble
- DBDT
  - thins on 2-D pressure levels using equivalent potential temperature; levels are then recombined to form the 3-D structure
  - thinning uses equivalent potential temperature (θe) to account for both temperature and moisture profiles (an approximate θe formula is sketched below)
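For reference, a common first-order approximation of θe; the presentation does not say which formulation the study used.

```python
import numpy as np

def theta_e(T_k, p_hpa, r_kgkg):
    """Approximate equivalent potential temperature (K) from temperature (K),
    pressure (hPa), and water-vapor mixing ratio (kg/kg)."""
    Rd, cp, Lv = 287.04, 1005.7, 2.501e6          # gas constant, heat capacity, latent heat (SI)
    theta = T_k * (1000.0 / p_hpa) ** (Rd / cp)   # potential temperature
    return theta * np.exp(Lv * r_kgkg / (cp * T_k))
```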
12. Case Study Day: 12 March 2005
- 700 hPa temperature gradient in observations and background over the Midwest and the northern Gulf of Mexico
- Observations and background show similar patterns
13. 700 hPa Temperature Analysis Comparison
- Overall analysis increments are about 1.5°C over the AIRS swath
- Largest differences between the analyses are in the upper Midwest and over southern Canada
[Figure: 700 hPa temperature analyses for the Subsample, Random, and DBDT strategies.]
14. Quantitative Results (Full vs. Thinned)

                    Full   Subsample   Random   DBDT
Observations         793          99      100     87
Analysis time (s)    244          56       56    106
L (km)                80         146      147    152
θe MSE               N/A        0.60     0.56   0.36
- Computation times are 50-70% faster for the thinned data sets
- MSEs compare each thinned analysis against the full-data analysis
- DBDT yields the superior analysis with the fewest observations
  - it has a longer computation time (the thinning algorithm is more rigorous)
  - it cuts the MSE almost in half using roughly 1/10 of the full observation count
15. Conclusions
- Intelligent data thinning strategies are important for eliminating redundant observations that may hinder convergence of DA schemes, and for reducing computation times
- Synthetic data tests have shown that observations must be retained in gradient, anchor, and homogeneous regions, and that results depend on key elements of the analysis system
- Analyses of AIRS thermodynamic profiles using different thinning strategies show DBDT to be the superior thinning technique
16. Future Work
- Manuscript in review with Weather and Forecasting (AMS)
- Testing forecasts spawned from the various thinned analyses to see whether the superior DBDT analysis also produces the best forecasts
- Demonstrating algorithm capabilities with respect to real-time data dissemination
- Using the gradient-detecting portion of the algorithm to locate cloud edges for radiance assimilation
17. Thank you for your attention. Are there any questions?