Title: Geographical Information System (GIS) to Knowledge
1Geographical Information System (GIS) to
Knowledge
Peter Bajcsy, Ph.D. Research Scientist, ALG,
NCSA Adjunct Assistant Professor, CS and ECE
Departments, UIUC
2Outline
- Introduction to GIS Related Decision Making
- Decision Making Scenario
- Input Information Extraction and Representation
- Georeferencing, Registration and Raster Information Extraction
- Boundary Aggregation and Evaluation
- Error Evaluation of New Boundary Aggregations and Decision Making
- Other Interesting Problems
- Summary
3Acknowledgement
- Project Team Members: Peter Bajcsy, Peter Groves, Sunayana Saha, Tyler Alumbaugh, Sang-Chul Lee
- Support: Michael Welge, Loretta Auvil, Dora Cai, Tom Redman, David Clutter, Duane Searsmith, Lisa Gatzke, Andrew Shirk, Ruth Aydt, Greg Pape, David Tcheng, Chris Navaro, Marquita Miller
4GIS Decision Making
- Decision Making: Driven by applications
- GIS Application Examples
- Urban planning: Where should we build new fire stations? Hospitals? Schools?
- Agriculture: Should we apply pesticides? How much per acre?
- Forestry: Should we cut trees? How many?
- Wildlife: What should we do to help endangered species?
- Watershed/Catchment management: Would we have a surface erosion problem after cutting trees?
- Archeology: Where should we excavate to find prehistoric sites?
- Geology and mining: Is it safe to build houses at a given location (lat/long)?
5Decision Making Motivation/Process/Goal
- Motivation: Seek Optimal Solutions
- How?
- Evaluate Multiple Solutions
- Assess Quality and Significance of Input Variables
- Predict Output Variables
- Impose Application Specific Constraints
- Construct Optimality Metric
- Goal: To Make the Optimal Decision
- Planning Decision
- Management Decision
6GIS Applications Natural Resources
- Agriculture: land conservation, market analysis, farm planning
- Forestry: timber assessment and management, harvest scheduling and planning, environmental impact assessment, pest management
- Wildlife: habitat assessment and management, rare species studies
- Catchment management: runoff and erosion modeling, sedimentation and water quality studies, integrated catchment management
- Archeology: prediction of prehistoric sites, site vandalism studies
- Geology and mining: geologic hazard mapping, oil, gas and mineral studies, open pit mine design
- Watershed management: guidelines for regulation
7GIS Applications Urban Resources
- Spatial Distribution of
- Utilities
- Hospitals
- Schools
- Fire Stations
- Management of Storm Water
- Crime Analysis
- Waste Collection Routing
- Hazardous Waste Transportation
- Disease Outbreak Patterns
8Feature Analysis Decision Making Scenario
- Problem Formulation
- Analyze FBI Crime Reports and other features per county, zip code, or US Census Bureau tract or block, and support the decision-making process for crime prevention
- Features
- FBI Crime Reports, Various Boundary Definitions, Forest Cover Maps, DEM, Satellite and Aerial Data, Prison Point Information
9Problem Statement
- Problem Statement: Search for the best partition of any geographical area that is
- (a) based on raster or point information,
- (b) formed by aggregations of known boundaries,
- (c) constrained or unconstrained by spatial locations of known boundaries, and
- (d) minimizing an error metric.
- Raster or Point Information
- Grid-based information, e.g., from satellite or airborne sensors
- Geographical point information, e.g., from GPS or an address database
- Boundaries (Vector Data)
- Man-made, e.g., Counties, US Census Bureau Territories
- Defined by environmental characteristics, e.g., Eco-regions, Historical iso-contours
- Spatial Constraints and Error Metric
- Defined by applications
10Proposed Approach
Feature Analysis and Decision Making
Data Preprocessing
11Input Information Extraction and Representation
2D or 3D
13Data Types and Representation Examples
- Raster Information: GeoImage Object
- Boundary Information: Shape Object
- Tabular Information: Table Object
- Neighborhood Information: NBH Object
14Raster Information File Formats
- USGS Digital Elevation Data (DEM) Files
- Header file with georeferencing information
- Floating point values, 30 m spatial resolution, IL coverage, published in 2002
- TIFF Files
- Georeferencing information comes from one of the following:
- One or more standardized files distributed along with the TIFF image data as .tfw and/or .txt files.
- Metadata encoded in the image file using private TIFF tags.
- An extension of the TIFF format called GeoTIFF.
- Forest labels, 1 km spatial resolution
- Forest Cover Types: 29 labels, USA coverage, published in 2000
- Forest Fragmentation Index Map of North America: 8 labels, USA coverage, published in 1993
- Land use labels: 1 km spatial resolution, worldwide coverage, published in 2001
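The .tfw option listed above can be illustrated with a short sketch. A world file stores six numbers, one per line, in the order A, D, B, E, C, F, which define an affine mapping from pixel (column, row) to map coordinates. The function names `parse_world_file` and `pixel_to_map` and the sample values are hypothetical and are not taken from the slides or from I2K.

```python
from typing import Tuple


def parse_world_file(text: str) -> Tuple[float, ...]:
    """Read the six world-file parameters in file order: A, D, B, E, C, F."""
    values = tuple(float(token) for token in text.split())
    if len(values) != 6:
        raise ValueError("a world file must contain exactly six numbers")
    return values


def pixel_to_map(params: Tuple[float, ...], col: float, row: float) -> Tuple[float, float]:
    """Map a pixel (col, row) to map coordinates (x, y) of the pixel center."""
    a, d, b, e, c, f = params  # file order, not alphabetical order
    x = a * col + b * row + c
    y = d * col + e * row + f
    return x, y


# Hypothetical north-up raster with 30 m pixels (E is negative because rows grow southward).
sample = "30.0\n0.0\n0.0\n-30.0\n500000.0\n4500000.0\n"
params = parse_world_file(sample)
x, y = pixel_to_map(params, col=2, row=3)
```

The world file carries no projection or datum, so this mapping is only meaningful once the coordinate system is known from another source.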
15Boundary Information File Formats
- Computational Tradeoffs Between Vector Information Retrieval and Data Storage
- US Census Bureau TIGER Files
- Elaboration of the chain file structure (CFS)
- Record files used: 1, 2, I, S, P
- Environmental Systems Research Institute (ESRI) Shapefiles
- Location list data structure (LLS)
- .shp, .shx, .dbf files
- TIGER to ESRI Shapefiles
16Point Information File Formats
- FBI Crime Reports
- United States Crimes Database, years 1994-1998, USA states, reports per county, published in 2001
- United States Crimes Database, years 1998-2000, IL state, reports per county, published in 2002
- Entries
- Theme_Keyword: crime, arrests, murder, forcible rape, rape, robbery, aggravated assault, assault, burglary, larceny, motor vehicle theft, theft, arson
- Challenges
- Multiple Files
- Varying notation
- Association with geographical boundary information
17Data Size
- Data-size-driven operations
- Sub-setting
- Sub-sampling
- Cropping
- Zooming
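The four operations listed above can be sketched with plain array slicing. This is a minimal illustration on a hypothetical in-memory raster, assuming that sub-setting means choosing a subset of bands and that zooming is done by nearest-neighbor pixel replication; the slides do not say how I2K implements them.

```python
import numpy as np

# Hypothetical raster: 6 rows, 8 columns, 3 bands.
raster = np.arange(6 * 8 * 3).reshape(6, 8, 3)

# Sub-setting (assumed here to mean keeping only some bands).
band_subset = raster[:, :, [0, 2]]

# Sub-sampling: keep every second row and column.
subsampled = raster[::2, ::2, :]

# Cropping: rows 1..3 and columns 2..5.
cropped = raster[1:4, 2:6, :]

# Zooming by a factor of 2 with nearest-neighbor replication.
zoomed = np.repeat(np.repeat(cropped, 2, axis=0), 2, axis=1)
```

Slicing returns views where possible, so cropping and sub-sampling do not copy the data.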
18Formation of Vector Data
- Iso-contour extraction from historical maps
- Segmentation and clustering of raster data into
homogeneous regions
20Georeferencing Data Sets and Raster Information
Extraction
22Why Registration and Georeferencing?
24Georeferencing Based on Data Types
- Raster and Raster
- Vector and Vector
- Raster and Vector
25Georeferencing Based on Coordinate Systems
26Image Registration Without Georeferencing
Information
- Registration: the act of correct alignment
- Registration Steps
- Determine locations of salient features in multiple data sets (spatial correspondence)
- Select registration transformation (geometric transformation)
- Evaluate accuracy with a metric
- Is my registration correct?
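The three registration steps above can be sketched as follows. The example assumes that point correspondences are already known, picks an affine transformation as the geometric model, and uses the root-mean-square residual as the accuracy metric. All three choices, and the function names, are assumptions made for illustration; the slides do not name a specific model or metric.

```python
import numpy as np


def fit_affine(src, dst):
    """Least-squares affine transform mapping src points onto dst points.

    Returns a (3, 2) coefficient matrix so that [x, y, 1] @ coeffs ~ [x', y'].
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    design = np.hstack([src, np.ones((len(src), 1))])
    coeffs, *_ = np.linalg.lstsq(design, dst, rcond=None)
    return coeffs


def apply_affine(coeffs, points):
    """Apply the fitted transform to an (n, 2) array of points."""
    points = np.asarray(points, dtype=float)
    return np.hstack([points, np.ones((len(points), 1))]) @ coeffs


def registration_rmse(coeffs, src, dst):
    """Root-mean-square distance between transformed src and dst points."""
    residuals = apply_affine(coeffs, src) - np.asarray(dst, dtype=float)
    return float(np.sqrt(np.mean(np.sum(residuals ** 2, axis=1))))
```

At least three non-collinear point pairs are needed for a unique affine fit; with more pairs the RMSE gives a first answer to "is my registration correct?".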
27Image Registration Computation
- Extract features from multiple images: select feature space
- Search for a finite set of parameters for a transformation function (also called deformation or warping function): select transformation function
- Select similarity metric
- Choose a search technique that can reduce the computational cost: select search strategy
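The four choices above (feature space, transformation function, similarity metric, search strategy) can be made concrete in a deliberately simple sketch: raw pixel intensities as features, integer translation as the transformation, normalized correlation as the similarity metric, and exhaustive search over a small shift window. These are illustrative assumptions rather than the method used in the talk, and `np.roll` wraps pixels around the image border, which a real implementation would avoid.

```python
import numpy as np


def normalized_correlation(a, b):
    """Similarity metric: correlation coefficient of two equally sized images."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def best_translation(reference, moving, max_shift):
    """Exhaustively search integer shifts (dy, dx) that best align moving to reference."""
    best = (-np.inf, 0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))  # circular shift
            score = normalized_correlation(reference, shifted)
            if score > best[0]:
                best = (score, dy, dx)
    return best  # (similarity, dy, dx)
```

The cost grows with the square of `max_shift`, which is why the slide calls for a search strategy that reduces computation, for example a coarse-to-fine search.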
29Raster Information Extraction Continuous Variable
Elevation Statistics Per County (figure panels: Sample Mean, Standard Deviation, Skew, Kurtosis)
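The four per-county statistics named on this slide can be computed from a raster and a same-shaped array of region identifiers. This is a minimal sketch assuming population (not sample-corrected) moments and non-excess kurtosis; the slides do not state which variants were used, and `region_statistics` is a hypothetical name.

```python
import numpy as np


def region_statistics(values, region_ids):
    """Mean, standard deviation, skew and kurtosis of raster values per region."""
    stats = {}
    for rid in np.unique(region_ids):
        x = values[region_ids == rid].astype(float)
        mean = x.mean()
        std = x.std()  # population standard deviation
        if std > 0:
            z = (x - mean) / std
            skew = float(np.mean(z ** 3))
            kurtosis = float(np.mean(z ** 4))  # equals 3 for a normal distribution
        else:
            skew, kurtosis = 0.0, 0.0  # constant region: moments are undefined, report 0
        stats[int(rid)] = {
            "mean": float(mean),
            "std": float(std),
            "skew": skew,
            "kurtosis": kurtosis,
        }
    return stats
```

In practice `region_ids` would come from rasterizing the county boundaries onto the georeferenced elevation grid.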
30Raster Information Extraction Categorical
Variable
Frequency of Occurrence
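For a categorical raster such as forest-type labels, the frequency of occurrence per region can be sketched as a normalized histogram. The function name and the assumption that labels are non-negative integers below `num_labels` are illustrative, not taken from the slides.

```python
import numpy as np


def label_frequencies(labels, region_ids, num_labels):
    """Relative frequency of each categorical label inside each region."""
    result = {}
    for rid in np.unique(region_ids):
        counts = np.bincount(labels[region_ids == rid], minlength=num_labels)
        result[int(rid)] = counts / counts.sum()
    return result
```

Each returned vector sums to 1, so a single entry (for example the oak hickory fraction) can serve as a numerical feature for a county.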
31Feature Driven Boundary Aggregation and Evaluation
33Aggregation of Territories
- Functionality: Aggregate territories based on similarity of attributes
- Example: if auto theft in County 1 is similar to auto theft in County 2, then aggregate Counties 1 and 2
- Aggregation Constraints
- Segmentation: aggregation with neighborhood constraint
- Clustering: aggregation without neighborhood constraint
- Hierarchy of results
- Desired Number of Aggregations
- Maximum Internal Dissimilarity of Aggregations
- Number of Layers in Hierarchy
- Resulting Labels Saved in ESRI Shape Files
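The difference between segmentation (with the neighborhood constraint) and clustering (without it) can be shown with a small agglomerative sketch on a single feature. The merge rule used here, repeatedly joining the two groups with the closest mean feature value until a desired number of aggregations remains, is an assumption for illustration and is not necessarily the algorithm behind SegGeoPts and ClustGeoPts.

```python
from itertools import combinations


def aggregate(features, neighbors, num_groups, constrained):
    """Merge territories until num_groups remain.

    features:    dict territory -> numeric feature value
    neighbors:   dict territory -> set of adjacent territories (symmetric)
    constrained: True  = segmentation (only adjacent groups may merge)
                 False = clustering (any two groups may merge)
    """
    groups = {t: {t} for t in features}

    def mean(g):
        return sum(features[t] for t in groups[g]) / len(groups[g])

    def adjacent(a, b):
        return any(n in groups[b] for t in groups[a] for n in neighbors.get(t, ()))

    while len(groups) > num_groups:
        candidates = [
            (abs(mean(a) - mean(b)), a, b)
            for a, b in combinations(sorted(groups), 2)
            if not constrained or adjacent(a, b)
        ]
        if not candidates:  # no adjacent groups left to merge
            break
        _, a, b = min(candidates)
        groups[a] |= groups.pop(b)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical chain of four counties A-B-C-D with one feature value each.
features = {"A": 10, "B": 50, "C": 12, "D": 55}
neighbors = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
clusters = aggregate(features, neighbors, 2, constrained=False)
segments = aggregate(features, neighbors, 2, constrained=True)
```

Without the constraint the similar but non-adjacent counties A and C end up together; with it, every aggregation stays spatially contiguous.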
34Territorial Aggregation Input Parameters
- Tools: Main Frame Menu/ImLabels/SegGeoPts and ClustGeoPts
- Input Parameters
- input .shp or .shx filename: ESRI shape files, including the .nbr file.
- select features: Any combination of numerical features in the .dbf file.
- select weights: Any numerical feature for weighting territories during aggregation.
- Output: Aggregation results in a dBASE (.dbf) file (figure: IL Counties, Statistics)
35Spatially Unconstrained Boundary Aggregation
- Hierarchical clustering of crime data, with the number of clusters as the exit criterion and auto theft in 2000 as the clustered feature, leads to six aggregations.
(Figure panels: Tabular Display; Geographical Display with Boundaries and Boundary Aggregations)
36Spatially Constrained Boundary Aggregation
- Hierarchical segmentation and hierarchical clustering of the oak hickory feature with an exit criterion of 18 county aggregations
(Figure panels: Boundaries; Boundary Aggregations With Spatial Constraint and Without Spatial Constraint)
37Boundary Aggregation With Hierarchical Output
- Hierarchical segmentation of extracted forest
statistics (oak hickory occurrence) with two
output partitions.
(Figure panels: Boundaries; Boundary Aggregations with 43 aggregations and with 21 aggregations)
38Visualization of Aggregation Results
- Functionality: Visualization of Labels and Hierarchy of Labels
- Data to Visualize: Labels in dBASE (.dbf) file, Hierarchy of Labels in dBASE (.dbf) file
- Input Parameters: None (ShowResults and ShowGeoResults)
(Figure: IL Counties)
39Error Evaluations of New Territorial Partitions
- Error evaluation of partitions obtained by
clustering and segmentation of mean elevation
feature per Illinois county with Variance error
metric
40Territorial Partition Error Evaluation
- Error Metrics
- Variance
- Normalized Variance
- City Block
- Normalized City Block
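The slides name the four error metrics but do not define them. The sketch below is one plausible reading, stated as an assumption: the Variance error sums squared deviations of each territory's feature value from the mean of its aggregation, the City Block error sums absolute deviations, and the normalized variants divide by the number of territories. The function name `partition_errors` is hypothetical.

```python
import numpy as np


def partition_errors(values, labels):
    """Error of a partition under four assumed metric definitions.

    values: one feature value per territory
    labels: aggregation label per territory
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    squared, absolute = 0.0, 0.0
    for group in np.unique(labels):
        members = values[labels == group]
        deviations = members - members.mean()
        squared += float((deviations ** 2).sum())
        absolute += float(np.abs(deviations).sum())
    n = len(values)
    return {
        "variance": squared,
        "normalized_variance": squared / n,
        "city_block": absolute,
        "normalized_city_block": absolute / n,
    }
```

Under any of these definitions a lower value means the aggregations are internally more homogeneous, which is what makes partitions comparable.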
41Geographical Error Evaluations and Decision Making
- Geographical error evaluation of partitions
obtained by clustering and segmentation of mean
elevation feature per Illinois county with
Variance error metric
(Figure labels: Partition Index; Eval0, Eval1, Eval2, Eval3)
43Example of Feature Analysis
- Crime Data Over Multiple Boundary Definitions
- Evaluation Parameters
- Features: Auto Theft, years 1999, 2000
- Error Evaluation using Variance
- Number of aggregations 50
- Spatial Aspect
- Boundary type
- Spatial constraints on aggregations (contiguity?)
- Optimal number of aggregations
- Temporal Aspect
- Temporal changes of crime report features
- Optimality Metric
- Feature weighting
- Cost/evaluation function
44Example Results Auto Theft
(Figure panels: Clustering99, Clustering00, Clustering Error; Segmentation99, Segmentation00, Segmentation Error)
45Example Results Auto Theft
- Crime data evaluations
- AutoTheft 00, 99
- Seg
- Evaluation Results
- Evaluation Metric: Variance
- Number of Evaluations: 2
- Eval 0: Seg_0
- Number of Aggregations: 54
- Error: 5051.924242424242
- Eval 1: Seg_0
- Number of Aggregations: 54
- Error: 420688.6572463768
- Crime data evaluations
- AutoTheft 00, 99
- Clust
- Evaluation Results
- Evaluation Metric: Variance
- Number of Evaluations: 2
- Eval 0: Clust_0
- Number of Aggregations: 37
- Error: 5575.483333333334
- Eval 1: Clust_0
- Number of Aggregations: 36
- Error: 4387.848695652174
46Decision Making
- Which global partition minimizes a chosen error metric?
- Which partition minimizes a chosen error metric at a selected boundary definition?
- What is the geographical error distribution given a territorial partition?
47Additional Interesting Problems
- Feature Selection
- High-Dimensional Data Visualization
- Data Fusion
48Feature Selection Problem Formulation
- General Formulation
- Given a set of candidate features, select the best subset in a classification problem.
- Specific Formulation with the Focus on Multispectral and Hyperspectral Image Analysis
- Given a set of spectral bands and ground measurements, select the subset of bands with the best predictive accuracy for the ground measurements
49Hyperspectral Band Selection
- Spectral band selection problem with ground measurements
- Soil conductivity: continuous variable
- Grass label: categorical variable
- Input Data sets
- Regional Data Assembly Centers Sensor (RDACS), model hyperspectral (H-3), a 120-channel prism-grating, push-broom sensor developed by NASA, plus ground soil conductivity from the Agriculture department, UIUC
- AVIRIS sensor developed by NASA, plus Gramma grass labels from UCSD
50Input Hyperspectral Data
(Figure panels: RDACS Sensor Data; Ground Measurements, e.g., Soil Conductivity)
51Input Hyperspectral Data
(Figure panels: AVIRIS Sensor, three bands; white marks the spatial locations of valid grass-category labels)
52Hyperspectral Band Selection
- Objectives
- What is the optimal number of selected bands?
- Which bands should form the optimal subset?
- What methods should be used for band selection?
- Challenges
- Computational complexity
- An exhaustive search over band subsets (up to 2^nb - 1 non-empty combinations) would have to be run with each supervised method, where nb is the number of bands
- No free lunch (NFL) theorem
53Proposed Approach
- Use multiple unsupervised methods to rank the spectral bands
- Test those band choices with multiple supervised methods (Wrapper Method)
- Select those band combinations that minimize classification error given a particular set of ground measurements
- Select the pair of unsupervised and supervised methods that minimizes classification error
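The wrapper idea above can be sketched with one unsupervised ranking method and one supervised method. Here bands are ranked by variance and the top-k bands are scored with a nearest-centroid classifier on a single held-out split. Both methods, the single split, and all function names are stand-ins chosen for brevity; the talk compares multiple methods of each kind.

```python
import numpy as np


def rank_bands_by_variance(samples):
    """Unsupervised ranking: bands with larger variance come first."""
    order = np.argsort(-samples.var(axis=0), kind="stable")
    return [int(band) for band in order]


def nearest_centroid_error(train_x, train_y, test_x, test_y):
    """Supervised check: misclassification rate of a nearest-centroid classifier."""
    classes = np.unique(train_y)
    centroids = np.stack([train_x[train_y == c].mean(axis=0) for c in classes])
    dists = ((test_x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    predicted = classes[dists.argmin(axis=1)]
    return float(np.mean(predicted != test_y))


def select_bands(train_x, train_y, test_x, test_y, max_bands):
    """Try the top-k ranked bands for k = 1..max_bands and keep the best k."""
    ranking = rank_bands_by_variance(train_x)
    errors = {}
    for k in range(1, max_bands + 1):
        bands = ranking[:k]
        errors[k] = nearest_centroid_error(
            train_x[:, bands], train_y, test_x[:, bands], test_y
        )
    best_k = min(errors, key=lambda k: (errors[k], k))  # prefer fewer bands on ties
    return ranking[:best_k], errors
```

Ranking first reduces the search from all band subsets to `max_bands` candidates per pair of methods, at the price of possibly missing a better unranked combination.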
54Cross Validation
- How to measure accuracy?
- Testing on the same data set as trained on is cheating.
- Overfitting: the model becomes a lookup table and cannot deal with new examples
- Use n-fold cross-validation
- Split data set into n subsets
- Train n models, with one subset held out for each
- Find the average error rate over the holdout sets, with predictions made by the appropriate model.
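The n-fold procedure above can be sketched in a few lines. The example assumes a least-squares linear model with an intercept and mean squared error as the error rate; the function name, the shuffling seed, and the model choice are illustrative assumptions.

```python
import numpy as np


def n_fold_cross_validation(x, y, n_folds, seed=0):
    """Average holdout mean squared error of a least-squares linear model."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), n_folds)  # n disjoint subsets
    errors = []
    for i, holdout in enumerate(folds):
        # Train on every subset except the held-out one.
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        design = np.column_stack([x[train], np.ones(len(train))])
        coeffs, *_ = np.linalg.lstsq(design, y[train], rcond=None)
        # Predict the held-out subset with the model that never saw it.
        predicted = np.column_stack([x[holdout], np.ones(len(holdout))]) @ coeffs
        errors.append(float(np.mean((predicted - y[holdout]) ** 2)))
    return float(np.mean(errors))
```

Because every sample is predicted exactly once by a model that was not trained on it, the averaged error estimates performance on new examples rather than memorization.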
55Analysis of Continuous Variable Soil Conductivity
Unsupervised Ranking
56Analysis of Continuous Variable Soil Conductivity
- Linear Regression
- Prediction is a linear combination of the inputs (Xe)
- Learning becomes finding the coefficients (Find that satisfies g(T)).
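The formulas on this slide did not survive extraction. As a reference point only, the standard ordinary least-squares formulation of the two bullets above is given below; the symbols X (matrix of selected band values), y (ground measurements) and β (coefficients) are assumed here and are not the slide's own notation.

```latex
\hat{y} = X\beta,
\qquad
\hat{\beta} = \arg\min_{\beta} \lVert X\beta - y \rVert_2^2
\quad\Longrightarrow\quad
X^{\top} X \hat{\beta} = X^{\top} y
```

When X has full column rank, the normal equations on the right have the unique solution \hat{\beta} = (X^{\top} X)^{-1} X^{\top} y.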
57Analysis of Continuous Variable Soil Conductivity
Supervised Linear Regression Based Classification
59Data Fusion of GIS Data
- Map Mosaicking Challenges: Heterogeneous Data Sets
- Multiple Data Sets with Different
- Data Types
- BYTE, SHORT, INT, LONG, FLOAT, DOUBLE
- Spatial Resolutions
- HORIZONTAL AND VERTICAL RESOLUTIONS
- Spectral Resolutions
- NUMBER OF BANDS
- OVERLAPPING ALIGNED OR MISALIGNED WAVELENGTHS
- Geographic Projections
- DATUMS (3D GLOBE MODEL)
- 2D PROJECTIONS
60Map Mosaicking Task
- Task: Automatic Data Fusion that Resolves Dissimilar Projections, Spatial Resolutions, Data Types, and Number of Bands
- Example of a Simplified Task
- Puzzle-Piece Problem
- Identical projections, resolutions, data types, number of bands
- Different geographic locations
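The simplified puzzle-piece task can be sketched directly: when all tiles share projection, resolution, data type and band count, mosaicking reduces to computing each tile's pixel offset from its upper-left map coordinate. The tile representation `(x_min, y_max, array)` and the function name `mosaic` are hypothetical, and overlapping tiles are simply overwritten in list order.

```python
import numpy as np


def mosaic(tiles, pixel_size, nodata=np.nan):
    """Place single-band tiles into one grid.

    tiles: list of (x_min, y_max, array), the upper-left corner in map units
           and a 2D array; all tiles share projection and pixel_size.
    """
    x0 = min(x for x, _, _ in tiles)
    y0 = max(y for _, y, _ in tiles)
    x1 = max(x + a.shape[1] * pixel_size for x, _, a in tiles)
    y1 = min(y - a.shape[0] * pixel_size for _, y, a in tiles)
    rows = int(round((y0 - y1) / pixel_size))
    cols = int(round((x1 - x0) / pixel_size))
    out = np.full((rows, cols), nodata, dtype=float)
    for x, y, a in tiles:
        r = int(round((y0 - y) / pixel_size))  # rows grow southward
        c = int(round((x - x0) / pixel_size))
        out[r:r + a.shape[0], c:c + a.shape[1]] = a
    return out
```

The general task on this slide is harder because tiles must first be reprojected, resampled and converted to a common data type and band set before this placement step applies.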
61ArcMap vs I2K Reprojection Results
- Tentative Results
- Resampling Method Highly Accurate
- On-the-fly projection not quite as good
- I2K in between
62(ArcMap on-the-fly reprojection shown)
63Table of Results
64Summary
- Applications of GIS tools
- Remote Sensing
- Agriculture
- Hydrology
- Water Quality Survey
- Atmospheric Science
- Military
- Socio-Economics
- Interested? Useful? Let us know.
- Email: pbajcsy_at_ncsa.uiuc.edu
- Reading: http://alg.ncsa.uiuc.edu/do/documents
- Teaching: http://cee.uiuc.edu/people/kumar1/cee498HI/lectures.htm
65Documentation
66References
- Journal Papers
- Peter Bajcsy and Peter Groves, Methodology for Hyperspectral Band Selection, Photogrammetric Engineering and Remote Sensing journal, accepted June 2003.
- Conferences
- Peter Groves and Peter Bajcsy, Methodology for Hyperspectral Band and Classification Model Selection, IEEE Workshop on Advances in Techniques for Analysis of Remotely Sensed Data, Washington DC, October 27, 2003.
- Peter Bajcsy and Tyler Jeffrey Alumbaugh, Georeferencing Maps With Contours, Proceedings of the 7th World Multiconference on Systemics, Cybernetics and Informatics (SCI 2003), Orlando, Florida, July 27-30, 2003.
- Peter Bajcsy, Automatic Extraction Of Isocontours From Historical Maps, Proceedings of the 7th World Multiconference on Systemics, Cybernetics and Informatics (SCI 2003), Orlando, Florida, July 27-30, 2003.
- ALG Technical Reports
- Tyler Alumbaugh and Peter Bajcsy, Georeferencing Maps with Contours in I2K, Technical Report NCSA-ALG-02-0001, October 2002.
- Peter Groves, Sunayana Saha and Peter Bajcsy, Boundary Information Storage, Retrieval, Georeferencing and Visualization, Technical Report NCSA-ALG-03-0001, February 2003.
- Peter Bajcsy, Peter Groves, Sunayana Saha, Tyler Alumbaugh, and David Tcheng, A System for Territorial Partitioning Based on GIS Raster and Vector Data, Technical Report NCSA-ALG-03-0002, February 2003.