Title: Peter Bajcsy (1), Chulyun Kim (1), Jihua Wang (2) and Yu-Feng Lin (2)
1 A FRAMEWORK FOR GEOSPATIAL MODELING FROM SPARSE
FIELD MEASUREMENTS USING IMAGE PROCESSING AND
MACHINE LEARNING
- Peter Bajcsy (1), Chulyun Kim (1), Jihua Wang (2) and Yu-Feng Lin (2)
- (1) National Center for Supercomputing Applications (NCSA)
- (2) Illinois State Water Survey (ISWS)
- University of Illinois at Urbana-Champaign (UIUC)
2 Outline
- Introduction
- Problems Addressed by Spatial Pattern To Learn (SP2Learn)
- SP2Learn Architecture and Functionality Overview
- Running SP2Learn
- Summary
3 Introduction
4 General Problem
- Compute a set of geo-spatially dense, accurate predictions of variables
  - given a set of direct, geo-spatially sparse point measurements and
  - auxiliary variables with implicit relationships with respect to the predicted variable
- Motivation
  - minimize cost of taking direct point measurements
  - maximize accuracy of predictions and
  - automate discovering relationships among direct field measurements and indirect variables
5 Formulation
- Input: sets of geo-spatially sparse variables V_i(p_ij); dense auxiliary variables; a priori tacit knowledge of experts
- Output: geo-spatially dense (raster) O_k
- Unknown: selection of methods; workflow of operations/methods; parameters of methods; relationships of auxiliary variables w.r.t. O_k; quantitative metric of output goodness
[Figure labels: V1, V2 at points p1j, p2j; Interpolations; Mathematical models; Auxiliary Variables; Tacit Knowledge; output O1]
6 Applied Problem
Recharge and Discharge Rate Prediction
[Figure labels: Bedrock elevation, Discharged, Recharged, Water table elevation]
7 Interdisciplinary Objectives
- Ground Water (Hydrologic Science) View
  - Evaluation of Alternative Conceptual (implicit relationships) and Mathematical Models (explicit relationships)
  - Accurate Prediction of Groundwater Recharge and Discharge Rates from a Limited Number of Field Measurements
- Computer Science View
  - Computer-Assisted Learning to Assess Alternative Conceptual and Mathematical Models
  - Optimization of Prediction Models From a Set of Geo-Spatially Sparse Point Measurements
[Figure label: DIALOG]
8 State-of-the-Art Results
- Limited Spatial Resolution and Accuracy
9 Existing Software for Groundwater and Surface Water Modeling
- MODFLOW is a three-dimensional finite-difference ground-water model
  - http://water.usgs.gov/nrp/gwsoftware/modflow2005/modflow2005.html
  - freeware (2005)
- PEST is software for model calibration, parameter estimation and predictive uncertainty analysis
  - http://www.sspa.com/pest/
  - freeware (2007), University of Queensland, Australia
- Precipitation-Runoff Modeling System (PRMS) is a deterministic, distributed-parameter modeling system developed to evaluate the impacts of various combinations of precipitation, climate, and land use on streamflow, sediment yields, and general basin hydrology
  - http://water.usgs.gov/software/prms.html
  - freeware (1996), USGS
- Deep Percolation Model (DPM) facilitates estimation of ground-water recharge under a large range of climatic, landscape, and land-use and land-cover conditions
  - http://pubs.usgs.gov/sir/2006/5318/
  - USGS
10 Related Work
- Singh A. et al., "Expert-Driven Perceptive Models for Reducing User Fatigue in an Interactive Hydrologic Model Calibration Framework"
[Figure: Conductivity (K) and Hydraulic heads (H) for the hypothetical aquifer]
11 Motivation
- Ground Water (Hydrologic) Science
  - Currently, no single method can estimate recharge/discharge (R/D) rates and patterns for all practical applications.
  - Therefore, cross-analyzing results from various estimation methods and related field information is likely to be superior to using only a single estimation method.
- Computer Science
  - It is currently impossible
    - (a) to replace an expert who has a lot of tacit domain knowledge with computer algorithms, or
    - (b) for an expert to learn new I/O relationships from a plethora of possible variables and an extremely large space of processing methods and their parameters
  - Thus, assisting experts to discover, evaluate and validate new relationships in an iterative way will likely enable
    - (a) better understanding of the underlying phenomena, and
    - (b) more automated and cost-efficient predictions
12 Problems Addressed by Spatial Pattern To Learn
13 Our Approach
- Data-Driven Analyses to Test Alternative Models, and to Search the Space of Processing Operations and Their Parameters
  - Interpolation methods
  - Mathematical models
  - Image processing algorithms
  - Machine learning algorithms
  - Scalability of algorithms with large size data
- Computer-Assisted Comparisons and Evaluations of Multiple Models and Sub-Optimal Solutions
  - Model/Solution Representation
  - Closed Loop (Iterative) Workflows
  - Human Computer Interfaces
- Overall Approach: An Exploration Framework for a Class of Alternative Models/Hypotheses and Optimal Solutions
14 SP2Learn Problem Formulation
- Given a set of geo-spatially sparse field measurements and auxiliary variables, derive an accurate, spatially dense R/D rate map by
  - (a) using a physics-based model
  - (b) incorporating boundary conditions and
  - (c) exploring auxiliary variables that represent prior knowledge about R/D patterns but are missing in the physics-based model
15 Challenges
- (1) How to Recognize a Meaningful Pattern in the Predicted Map?
- (2) How to Quantify the Goodness of the Pattern?
- Approach
  - (1a) Recognize patterns by utilizing multiple image enhancement and segmentation techniques applied to R/D rate predictions
  - (1b) Introduce a relationship between the R/D pattern and auxiliary (a priori reference) information
  - (2a) Define goodness w.r.t. reference information using experts' selection of meaningful relationships
  - (2b) Define goodness w.r.t. reference information using the complexity of the machine learning model
16 Using Physics-Based Model
R/D Rate Prediction
[Figure labels: Field Measurements, Discharged, Recharged, Water table elevation, Hydraulic conductivity, Incoming water, Outgoing water, Bedrock elevation]
Ground water flux = hydraulic conductivity × cell area × gradient of water table elevation (head) over cell distance
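The flux relation on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SP2Learn implementation: the function names, the units, and the sign convention of the net rate are hypothetical.

```python
def darcy_flux(conductivity, cell_area, head_from, head_to, cell_distance):
    """Flux between two cells: conductivity * area * head gradient.

    The gradient is the head (water table elevation) difference
    divided by the distance between the two cells.
    """
    gradient = (head_from - head_to) / cell_distance
    return conductivity * cell_area * gradient


def net_rate(incoming_flux, outgoing_flux, cell_area):
    """Net rate per unit area from the water balance of one cell.

    Assumed sign convention (not taken from the slides): a positive
    value means more water leaves than enters laterally, which is
    attributed to recharge; a negative value is attributed to discharge.
    """
    return (outgoing_flux - incoming_flux) / cell_area


# Hypothetical numbers: K = 10 m/day, area = 100 m^2, heads 105 m and 100 m, 50 m apart.
example_flux = darcy_flux(10.0, 100.0, 105.0, 100.0, 50.0)
example_rate = net_rate(incoming_flux=80.0, outgoing_flux=example_flux, cell_area=100.0)
```

Evaluating such a balance for every cell of the interpolated water table raster would give a dense R/D rate map, which is the input to the filtering steps that follow.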
17 Incorporating Spatial Boundary Conditions
- Boundary conditions (BC): the R/D rate prediction could have smooth transitions, and recharge and discharge regions (contiguous pixels) should be clearly delineated
- Approach: Apply Image Restoration and De-noising Techniques
  - Moving average based low pass filter
  - TVL (Total Variation regularized L1-norm function) based filter
  - Morphological operation based filter
  - Using multiple techniques multiple times
[Figure labels: Discharged, Recharged]
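As an illustration of the first technique listed above, here is a minimal pure-Python sketch of a moving-average low-pass filter on a raster stored as a list of rows. The function name and the border policy (windows clipped at the map edge) are assumptions, and odd kernel sizes are assumed so that the window is centered.

```python
def moving_average(grid, kernel_rows, kernel_cols):
    """Replace each cell by the mean of its kernel_rows x kernel_cols neighborhood.

    Windows are clipped at the raster border (assumed policy), so border
    cells are averaged over fewer neighbors.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    half_r, half_c = kernel_rows // 2, kernel_cols // 2
    smoothed = []
    for r in range(n_rows):
        row_out = []
        for c in range(n_cols):
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - half_r), min(n_rows, r + half_r + 1))
                for cc in range(max(0, c - half_c), min(n_cols, c + half_c + 1))
            ]
            row_out.append(sum(window) / len(window))
        smoothed.append(row_out)
    return smoothed


# A single noisy spike in an otherwise flat R/D rate map is spread out and damped.
noisy = [[0.0, 0.0, 0.0],
         [0.0, 9.0, 0.0],
         [0.0, 0.0, 0.0]]
smooth = moving_average(noisy, 3, 3)
```

Applying the filter repeatedly, or chaining it with other filters, corresponds to the "multiple techniques multiple times" item.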
18 Exploring Auxiliary Variables Driving R/D Patterns
Prior Tacit Knowledge about R/D and Auxiliary Variables
- Proximity to River: P(R or D area | River is close) = high
- Soil Type: P(R or D area | Soil = Clay) = low
- Slope: P(R or D area | slope = high) = low
[Figure panel labels: moving average, normalization, TVL]
19 From Auxiliary Variables To Knowledge and Accurate R/D
Load Variables
Integrate Maps
Load R/D Map
Define ROI
Create Decision Tree
Apply Rules
20 SP2Learn Output
- A set of rules that define relationships between the predicted (R/D rate) variable and auxiliary variables
- Modified (more accurate) predictions according to the user-selected rules defining relationships of predicted and auxiliary variables
- Sensitivity analysis results with respect to
  - Methods (interpolations, image enhancement, ...)
  - Models
  - Parameters
21 Example Results
ROI
<RULE ID=138 NUM_OF_CASES=3975 SUPPORT=32.65>
<IF>Elevation is not in 330-344 AND
Soil type is in Rm=Roscommon muck AND
Proximity to water body is not near_water AND
Slope is in 0-0.9 </IF>
<THEN>R/D rate is -0.004,-0.002</THEN>
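To make the example rule concrete, the sketch below encodes its four conditions as plain Python data and checks whether one pixel's auxiliary values satisfy them. The data layout, the operator names, the inclusive range bounds, and the sample pixel values are all hypothetical; they are not the SP2Learn rule format.

```python
def condition_holds(value, operator, operand):
    """Evaluate a single rule condition (ranges are assumed inclusive)."""
    if operator == "in_range":
        low, high = operand
        return low <= value <= high
    if operator == "not_in_range":
        low, high = operand
        return not (low <= value <= high)
    if operator == "in_set":
        return value in operand
    if operator == "not_in_set":
        return value not in operand
    raise ValueError(f"unknown operator: {operator}")


def rule_matches(rule, pixel):
    """A rule fires only if all of its conditions hold (logical AND)."""
    return all(
        condition_holds(pixel[attribute], operator, operand)
        for attribute, operator, operand in rule["conditions"]
    )


# Hand-written encoding of rule 138 from the slide.
rule_138 = {
    "id": 138,
    "conditions": [
        ("elevation", "not_in_range", (330, 344)),
        ("soil_type", "in_set", {"Rm"}),
        ("proximity", "not_in_set", {"near_water"}),
        ("slope", "in_range", (0.0, 0.9)),
    ],
    "then_rd_rate": (-0.004, -0.002),
}

# Hypothetical pixel; "far_from_water" is an invented label for illustration.
pixel = {"elevation": 350, "soil_type": "Rm", "proximity": "far_from_water", "slope": 0.5}
fires = rule_matches(rule_138, pixel)
```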
22 SP2Learn Architecture and Functionality
23 Underlying SP2Learn Technology
24 SP2Learn Functionality Overview
Load Raster Step
Integration Step
Create Mask Step
Rules Step
Attribute Selection Step
Apply Rule Step
25 SP2Learn Workflow
26 On-Line Help
27 Software and Test Data Download
- Download web page of the Image Spatial Data Analysis group at NCSA: http://isda.ncsa.uiuc.edu/download/
28 Running SP2Learn
29 Input Data to SP2Learn
- Raster files (maps)
  - Predicted R/D rate models
  - Auxiliary variables
- For mask creation
  - Tables with geo-points
  - Vector files with boundaries
  - Raster files of categorical or continuous variables
30 Image Processing
- Filtering Methods
  - Low pass (moving average) filters
  - Morphological filters
  - TVL1 (Total Variation regularized L1 function)
  - Using multiple techniques multiple times
- Parameters
  - Kernel size (row dimension, column dimension)
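The morphological filters listed above can be illustrated on a binary recharge/discharge mask. The following is a rough sketch with a square kernel and windows clipped at the border; the function names and the restriction to binary masks and square kernels are assumptions made for brevity.

```python
def _neighborhood_reduce(mask, size, reducer):
    """Apply min or max over a size x size window around every cell."""
    n_rows, n_cols = len(mask), len(mask[0])
    half = size // 2
    return [
        [
            reducer(
                mask[rr][cc]
                for rr in range(max(0, r - half), min(n_rows, r + half + 1))
                for cc in range(max(0, c - half), min(n_cols, c + half + 1))
            )
            for c in range(n_cols)
        ]
        for r in range(n_rows)
    ]


def erode(mask, size):
    """A cell stays 1 only if its whole window is 1."""
    return _neighborhood_reduce(mask, size, min)


def dilate(mask, size):
    """A cell becomes 1 if any cell in its window is 1."""
    return _neighborhood_reduce(mask, size, max)


def opening(mask, size):
    """Erosion then dilation: removes regions smaller than the kernel."""
    return dilate(erode(mask, size), size)


def closing(mask, size):
    """Dilation then erosion: fills holes smaller than the kernel."""
    return erode(dilate(mask, size), size)


# An isolated pixel is removed by opening; an isolated hole is filled by closing.
speck = [[1 if (r, c) == (2, 2) else 0 for c in range(5)] for r in range(5)]
hole = [[0 if (r, c) == (2, 2) else 1 for c in range(5)] for r in range(5)]
cleaned = opening(speck, 3)
filled = closing(hole, 3)
```

Larger kernels remove larger specks and fill larger holes, which is why kernel size is the main parameter exposed to the user.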
31 Example Input Maps
[Figure: Morphological Opening, Morphological Closing and Low Pass Filter results, each shown with Kernel (10,10) and Kernel (5,5)]
32 Example Auxiliary Maps
- Slope
- DEM
- Soil
- River Stream
33 Loading Files
- Load R/D rate models (maps)
- Load auxiliary maps to explore alternative models
  - Proximity to water
  - Soil type
  - Slope
34 Mosaic Maps
- Large spatial coverage: a set of tiles
- Out-of-core representation
35 Viewing Images
- Right mouse click
- Image information
- Zoom
- Check boxes
- Pseudo-color
- Auto-fit images
36 Registration
- Integration of all maps (raster images) to a common projection and spatial resolution
[Figure panels: Before Convert, After Convert]
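One part of the integration step, bringing rasters to a common spatial resolution, can be sketched with nearest-neighbor resampling. This simplified example assumes both rasters already cover the same extent in the same projection; reprojection itself is not shown, and the function name is hypothetical.

```python
def resample_nearest(grid, out_rows, out_cols):
    """Resample a raster to out_rows x out_cols by nearest-neighbor lookup.

    Each output cell takes the value of the input cell that contains
    the output cell's center.
    """
    in_rows, in_cols = len(grid), len(grid[0])
    return [
        [
            grid[min(in_rows - 1, int((r + 0.5) * in_rows / out_rows))]
                [min(in_cols - 1, int((c + 0.5) * in_cols / out_cols))]
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]


# A coarse 2 x 2 soil-class map brought onto a 4 x 4 grid.
coarse = [[1, 2],
          [3, 4]]
fine = resample_nearest(coarse, 4, 4)
```

Nearest-neighbor lookup keeps categorical values such as soil classes intact, whereas averaging would invent classes that do not exist.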
37 Create Mask
[Figure: mask creation window with panels labeled A, B and C: Mask Parameters, Visualization Panel, Mask Operations]
38 Mask Creation Options in SP2Learn
39 User Defined Mask Creation
- Set Parameter: User defined
- Mouse click-and-drag selection of region
- Click Paint and Show
- Click Apply
40 Label Editor
- Assign categorical labels to colors
41 Attribute Selection
- Output: Predicted Variable
- Input: Auxiliary Variables
- Check-boxes
- Show Table
- Prune Tree
42 Decision Tree Based Modeling
- Tree structure can be represented as a set of rules
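The statement that a tree can be represented as a set of rules is easy to illustrate: every path from the root to a leaf becomes one rule. The nested-dictionary tree layout and the toy attribute values below are assumptions for illustration, not the SP2Learn data model.

```python
def tree_to_rules(node, path=()):
    """Return one (conditions, conclusion) pair per root-to-leaf path."""
    if "class" in node:  # leaf node: the accumulated path is a complete rule
        return [(list(path), node["class"])]
    rules = []
    for branch_value, child in node["branches"].items():
        condition = (node["attribute"], branch_value)
        rules.extend(tree_to_rules(child, path + (condition,)))
    return rules


# Toy tree over two auxiliary variables (values are invented).
toy_tree = {
    "attribute": "soil_type",
    "branches": {
        "clay": {"class": "discharge"},
        "sand": {
            "attribute": "slope",
            "branches": {
                "low": {"class": "recharge"},
                "high": {"class": "discharge"},
            },
        },
    },
}

rules = tree_to_rules(toy_tree)
```

Each returned rule is the conjunction of the conditions along its path, which mirrors the IF ... AND ... THEN form of the exported rules.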
43 Rules from Decision Tree
- Num: Node number in a decision tree.
- Support (%): Among all cases satisfying the conditions, the percentage of cases having the same class (conclusion).
- # of cases: The number of cases satisfying the conditions
- Class: Conclusion of a rule
- Conditions: Conditions of a rule
- MDL Score: MDL (minimum description length) score of a decision tree. The lower the score, the better the tree
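The support and case-count definitions above can be computed directly. The sketch below assumes each case is a dictionary with a "class" entry and that a rule's conditions are given as a predicate function; both are hypothetical choices for illustration.

```python
def rule_statistics(cases, conditions_hold, rule_class):
    """Return (number of cases, support in percent) for one rule.

    number of cases: cases that satisfy the rule's conditions.
    support: share of those cases whose class equals the rule's conclusion.
    """
    covered = [case for case in cases if conditions_hold(case)]
    if not covered:
        return 0, 0.0
    agreeing = sum(1 for case in covered if case["class"] == rule_class)
    return len(covered), 100.0 * agreeing / len(covered)


# Toy cases: four pixels with low slope, one with high slope.
cases = [
    {"slope": 0.2, "class": "recharge"},
    {"slope": 0.4, "class": "discharge"},
    {"slope": 0.6, "class": "discharge"},
    {"slope": 0.8, "class": "discharge"},
    {"slope": 3.0, "class": "recharge"},
]
num_cases, support = rule_statistics(cases, lambda case: case["slope"] <= 0.9, "recharge")
```

A low support, as in this toy example, signals that the rule's conclusion is shared by only a minority of the cases it covers, so an expert may choose not to apply it.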
44 Show Decision Tree
Show Tree Option
45 Export Rules
Export Rules Option
46 Apply Rules
- Visualization of
  - Modified output variable
  - Changed pixels
  - Magnitude of changes (differences)
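The last two visualizations can be derived from the original and modified maps by a per-pixel comparison. A minimal sketch, assuming both maps are equally sized lists of rows (the function name is hypothetical):

```python
def change_summary(original, modified):
    """Return (changed, magnitude): a boolean map of changed pixels and
    a map of absolute differences between the two rasters."""
    changed, magnitude = [], []
    for row_original, row_modified in zip(original, modified):
        differences = [m - o for o, m in zip(row_original, row_modified)]
        changed.append([d != 0 for d in differences])
        magnitude.append([abs(d) for d in differences])
    return changed, magnitude


# Toy R/D rate maps before and after applying a rule (values are invented).
before = [[0.5, 0.0], [-0.25, 1.0]]
after = [[0.5, 0.25], [-0.25, 0.5]]
changed, magnitude = change_summary(before, after)
```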
47 Summary
- Novel Frameworks and Methodologies for Exploratory Data-Driven Modeling and Scientific Discoveries
- Problems addressed in the prototype SP2Learn solution
  - Prediction accuracy improvement by a combination of mathematical models and data-driven (knowledge based) models, supervised and unsupervised iterative model optimization
- Better Data Utilization!
48 Extra Information
- A stack of informatics and cyber-infrastructure software is open source
- Other software of potential interest
  - GeoLearn is an exploratory framework for extracting information and knowledge from remote sensing imagery
  - CyberIntegrator to support creation of exploratory workflows, reuse of workflows, remote server execution, data and process provenance tracking and analysis, streaming data support
  - Image Provenance to Learn (IP2Learn) to support decision processes based on visual inspection of images
  - Load Estimation (work in progress) to support optimal sampling of sediment loads using several sediment-discharge rating curves, bias correction factors and Monte Carlo simulations to predict confidence limits
- Download web page of the Image Spatial Data Analysis group at NCSA: http://isda.ncsa.uiuc.edu/download/
49 Acknowledgement
- Funding Agencies
  - NASA, NARA, NSF, NIH, NAVY, DARPA, ONR, NCSA Industrial Partners, NCSA Internal, COM UIUC, State of Illinois
- Full Time Employees
  - Peter Bajcsy, Rob Kooper, Sang-Chul Lee, Luigi Marini
- Students
  - Shadi Ashnai, Melvin Casares, Miles Johnson, Chulyun Kim, Qi Li, Tim Nee, Arlex Torres, Ryo Kondo, Henrik Lomotan, James Rapp
- Collaborators
  - College of Applied Health Sciences UIUC, Kinesiology Dept. UIUC, CEE UIUC, CS UIUC, GISLIS UIUC
  - UIC, UC Berkeley, Univ. of Texas at Austin, Univ. of Iowa
  - ISWS, NARA, Nielsen, State Farm
  - Instituto Tecnológico de Costa Rica, UNESCO-IHE Netherlands
50 Thank you!
- Questions?
  - Peter Bajcsy, pbajcsy_at_ncsa.uiuc.edu
- Need More Details?
  - Publications: http://isda.ncsa.uiuc.edu
51 Backup