Title: Geographical Information System (GIS) to Knowledge


1
Geographical Information System (GIS) to
Knowledge
Peter Bajcsy, Ph.D.
Research Scientist, ALG, NCSA
Adjunct Assistant Professor, CS and ECE Departments, UIUC
2
Outline
  • Introduction to GIS Related Decision Making
  • Decision Making Scenario
  • Input Information Extraction and Representation
  • Georeferencing, Registration and Raster
    Information Extraction
  • Boundary Aggregation and Evaluation
  • Error Evaluation of New Boundary Aggregations and
    Decision Making
  • Other Interesting Problems
  • Summary

3
Acknowledgement
  • Project Team Members: Peter Bajcsy, Peter Groves,
    Sunayana Saha, Tyler Alumbaugh, Sang-Chul Lee
  • Support: Michael Welge, Loretta Auvil, Dora Cai,
    Tom Redman, David Clutter, Duane Searsmith, Lisa
    Gatzke, Andrew Shirk, Ruth Aydt, Greg Pape, David
    Tcheng, Chris Navaro, Marquita Miller

4
GIS Decision Making
  • Decision Making: Driven by applications
  • GIS Application Examples
  • Urban planning: Where should we build new fire
    stations? Hospitals? Schools?
  • Agriculture: Should we apply pesticides? How
    much per acre?
  • Forestry: Should we cut trees? How many?
  • Wildlife: What should we do to help endangered
    species?
  • Watershed/Catchment management: Would we have a
    surface erosion problem after cutting trees?
  • Archeology: Where should we excavate to find
    prehistoric sites?
  • Geology and mining: Is it safe to build houses at
    the location (lat/long)?

5
Decision Making Motivation/Process/Goal
  • Motivation: Seek Optimal Solutions
  • How?
  • Evaluate Multiple Solutions
  • Assess Quality and Significance of Input
    Variables
  • Predict Output Variables
  • Impose Application Specific Constraints
  • Construct Optimality Metric
  • Goal: To Make the Optimal Decision
  • Planning Decision
  • Management Decision

6
GIS Applications: Natural Resources
  • Agriculture: land conservation, market analysis,
    farm planning
  • Forestry: timber assessment and management,
    harvest scheduling and planning, environmental
    impact assessment, pest management
  • Wildlife: habitat assessment and management,
    rare species studies
  • Catchment management: runoff and erosion
    modeling, sedimentation and water quality
    studies, integrated catchment management
  • Archeology: prediction of prehistoric sites,
    site vandalism studies
  • Geology and mining: geologic hazard mapping,
    oil, gas and mineral studies, open pit mine
    design
  • Watershed management: guidelines for regulation

7
GIS Applications: Urban Resources
  • Spatial Distribution of:
  • Utilities
  • Hospitals
  • Schools
  • Fire Stations
  • Management of Storm Water
  • Crime Analysis
  • Waste Collection Routing
  • Hazardous Waste Transportation
  • Disease Outbreak Patterns

8
Feature Analysis: Decision Making Scenario
  • Problem Formulation
  • Analyze FBI Crime Reports and Other Features per
    County, Zip Code, US Census Bureau Tract or
    Block, and Support the Decision Making Process
    about Crime Prevention
  • Features
  • FBI Crime Reports, Various Boundary Definitions,
    Forest Cover Maps, DEM, Satellite and Aerial
    Data, Prison Point Information

9
Problem Statement
  • Problem Statement: search for the best partition
    of any geographical area that is
  • (a) based on raster or point information,
  • (b) formed by aggregations of known boundaries,
  • (c) constrained or unconstrained by spatial
    locations of known boundaries, and
  • (d) minimizing an error metric.
  • Raster or Point Information
  • Grid-based information, e.g., from satellite or
    airborne sensors
  • Geographical point information, e.g., from GPS or
    an address database
  • Boundaries (Vector Data)
  • Man-made, e.g., Counties, US Census Bureau
    Territories
  • Defined by environmental characteristics, e.g.,
    Eco-regions, Historical iso-contours
  • Spatial Constraints and Error Metric
  • Defined by applications

10
Proposed Approach
Feature Analysis and Decision Making
Data Preprocessing
11
Input Information Extraction and Representation
2D or 3D
12
Input Information Extraction and Representation
13
Data Types and Representation Examples
  • Raster Information: GeoImage Object
  • Boundary Information: Shape Object
  • Tabular Information: Table Object
  • Neighborhood Information: NBH Object

14
Raster Information File Formats
  • USGS Digital Elevation Data (DEM) Files
  • Header file with georeferencing information
  • Floating point values, 30 m spatial resolution,
    IL coverage, published in 2002
  • TIFF Files
  • Georeferencing information comes from one of:
  • One or more standardized files distributed
    along with the TIFF image data as .tfw and/or .txt
    files
  • Metadata encoded in the image file using
    private TIFF tags
  • An extension of the TIFF format called GeoTIFF
  • Forest labels, 1 km spatial resolution:
  • Forest Cover Types: 29 labels, USA coverage,
    published in 2000
  • Forest Fragmentation Index Map of North America:
    8 labels, USA coverage, published in 1993
  • Land use labels, 1 km spatial resolution,
    worldwide coverage, published in 2001
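A minimal sketch of how the georeferencing coefficients in a .tfw file could be applied, assuming the common six-line world file layout (A, D, B, E, C, F, one value per line). The names `WorldFile`, `parse_world_file` and `pixel_to_map`, and the sample coefficients, are illustrative and do not come from the slides.

```python
from typing import NamedTuple, Tuple


class WorldFile(NamedTuple):
    """Affine coefficients in the order they appear in a world file."""
    a: float  # pixel size in x (map units per column)
    d: float  # rotation term contributing to y per column
    b: float  # rotation term contributing to x per row
    e: float  # pixel size in y (map units per row, usually negative)
    c: float  # x of the center of the upper-left pixel
    f: float  # y of the center of the upper-left pixel


def parse_world_file(text: str) -> WorldFile:
    """Parse the six whitespace-separated numbers of a world file."""
    values = [float(token) for token in text.split()]
    if len(values) != 6:
        raise ValueError("a world file must contain exactly six numbers")
    return WorldFile(*values)


def pixel_to_map(wf: WorldFile, col: int, row: int) -> Tuple[float, float]:
    """Map a pixel (col, row) to map coordinates with the affine transform."""
    x = wf.a * col + wf.b * row + wf.c
    y = wf.d * col + wf.e * row + wf.f
    return x, y


# Hypothetical 30 m raster with no rotation (values are made up).
sample = "30.0\n0.0\n0.0\n-30.0\n500000.0\n4400000.0\n"
wf = parse_world_file(sample)
print(pixel_to_map(wf, 0, 0))
print(pixel_to_map(wf, 10, 20))
```

GeoTIFF files carry equivalent information in tags inside the image file, so a reader for that variant would obtain the same six coefficients from a different source.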

15
Boundary Information File Formats
  • Computational Tradeoffs Between Vector
    Information Retrieval and Data Storage
  • US Census Bureau TIGER Files
  • Elaboration of the chain file structure (CFS)
  • Used record files: 1, 2, I, S, P
  • Environmental Systems Research Institute (ESRI)
    Shapefiles
  • Location list data structure (LLS)
  • .shp, .shx, .dbf files
  • TIGER to ESRI Shapefiles
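As an illustration of the .shp side of the format, the sketch below reads the fixed 100-byte main file header described in the published ESRI Shapefile specification (file code 9994, file length in 16-bit words, version, shape type, bounding box). The function name `parse_shp_header` and the synthetic header bytes are assumptions made for this example.

```python
import struct


def parse_shp_header(header: bytes) -> dict:
    """Decode the fixed 100-byte header at the start of a .shp file."""
    if len(header) < 100:
        raise ValueError("a .shp header is 100 bytes long")
    # File code and file length are big-endian; the rest is little-endian.
    (file_code,) = struct.unpack(">i", header[0:4])
    (length_words,) = struct.unpack(">i", header[24:28])
    version, shape_type = struct.unpack("<ii", header[28:36])
    xmin, ymin, xmax, ymax = struct.unpack("<4d", header[36:68])
    if file_code != 9994:
        raise ValueError("not a shapefile: unexpected file code")
    return {
        "file_length_bytes": length_words * 2,  # length is stored in 16-bit words
        "version": version,
        "shape_type": shape_type,  # 5 means Polygon
        "bbox": (xmin, ymin, xmax, ymax),
    }


# Synthetic header for an empty polygon file (values are made up).
header = (
    struct.pack(">i", 9994)
    + bytes(20)                        # five unused integers
    + struct.pack(">i", 50)            # 50 words = 100 bytes
    + struct.pack("<ii", 1000, 5)      # version 1000, shape type Polygon
    + struct.pack("<4d", -91.5, 37.0, -87.5, 42.5)
    + bytes(32)                        # Z and M ranges, unused here
)
info = parse_shp_header(header)
print(info)
```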

16
Point Information File Formats
  • FBI Crime Reports
  • United States Crimes Database, years 1994-1998,
    US states, reports per county, published in 2001
  • United States Crimes Database, years 1998-2000,
    IL state, reports per county, published in 2002
  • Entries
  • Theme_Keyword: crime, arrests, murder, forcible
    rape, rape, robbery, aggravated assault, assault,
    burglary, larceny, motor vehicle theft, theft,
    arson
  • Challenges
  • Multiple Files
  • Varying notation
  • Association with geographical boundary
    information

17
Data Size
  • Data-size-driven operations:
  • Sub-setting
  • Sub-sampling
  • Cropping
  • Zooming
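The raster operations listed above can be sketched on a raster stored as a list of rows. This is only an illustration: the function names are hypothetical, sub-sampling is shown as keeping every `step`-th pixel, and zooming is shown as nearest-neighbor pixel replication, which may differ from what the tool in the slides does.

```python
from typing import List

Raster = List[List[int]]


def crop(raster: Raster, row0: int, col0: int, height: int, width: int) -> Raster:
    """Cut out a height x width window whose upper-left pixel is (row0, col0)."""
    return [row[col0:col0 + width] for row in raster[row0:row0 + height]]


def subsample(raster: Raster, step: int) -> Raster:
    """Keep every step-th row and column to reduce the data size."""
    return [row[::step] for row in raster[::step]]


def zoom(raster: Raster, factor: int) -> Raster:
    """Enlarge by an integer factor by repeating each pixel (nearest neighbor)."""
    out: Raster = []
    for row in raster:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


# Small synthetic raster: value = 10 * row + column.
raster = [[10 * r + c for c in range(6)] for r in range(4)]
print(crop(raster, 1, 2, 2, 3))
print(subsample(raster, 2))
print(zoom([[1, 2]], 2))
```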

18
Formation of Vector Data
  • Iso-contour extraction from historical maps
  • Segmentation and clustering of raster data into
    homogeneous regions

20
Georeferencing Data Sets and Raster Information
Extraction
21
Georeferencing Data Sets and Raster Information
Extraction
22
Why Registration and Georeferencing?
24
Georeferencing Based on Data Types
  • Raster and Raster
  • Vector and Vector
  • Raster and Vector

25
Georeferencing Based on Coordinate Systems
26
Image Registration Without Georeferencing
Information
  • Registration: the act of correct alignment
  • Registration Steps:
  • Determine locations of salient features in
    multiple data sets (spatial correspondence)
  • Select registration transformation (geometric
    transformation)
  • Evaluate accuracy with a metric
  • Is my registration correct?

27
Image Registration Computation
  • Extract features from multiple images:
    select feature space
  • Search for a finite set of parameters for a
    transformation function (also called deformation
    or warping function): select transformation
    function
  • Select similarity metric
  • Choose a search technique that can reduce the
    computational cost: search strategy
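The four choices above (feature space, transformation function, similarity metric, search strategy) can be illustrated with a deliberately simple sketch. It assumes raw pixel values as the feature space, integer translation as the transformation, mean squared difference over the overlap as the similarity metric, and exhaustive search as the strategy. None of these choices are stated in the slides, and all names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]


def mean_sq_diff(ref: Image, moving: Image, dr: int, dc: int) -> float:
    """Similarity metric: mean squared difference where the images overlap
    after `moving` is shifted down by dr rows and right by dc columns."""
    total, count = 0.0, 0
    for r, row in enumerate(ref):
        for c, value in enumerate(row):
            mr, mc = r - dr, c - dc
            if 0 <= mr < len(moving) and 0 <= mc < len(moving[0]):
                total += (value - moving[mr][mc]) ** 2
                count += 1
    return total / count if count else float("inf")


def register_translation(ref: Image, moving: Image, max_shift: int) -> Tuple[int, int, float]:
    """Search strategy: try every integer shift within +/- max_shift."""
    best = (float("inf"), 0, 0)
    for dr in range(-max_shift, max_shift + 1):
        for dc in range(-max_shift, max_shift + 1):
            score = mean_sq_diff(ref, moving, dr, dc)
            if score < best[0]:
                best = (score, dr, dc)
    return best[1], best[2], best[0]


def make_shifted(ref: Image, dr: int, dc: int) -> Image:
    """Test helper: content of ref moved up by dr and left by dc, zero filled."""
    rows, cols = len(ref), len(ref[0])
    return [[ref[r + dr][c + dc] if r + dr < rows and c + dc < cols else 0
             for c in range(cols)] for r in range(rows)]


# Synthetic reference image and a misaligned copy of it.
ref = [[6 * r * r + c ** 3 for c in range(6)] for r in range(6)]
moving = make_shifted(ref, 1, 2)
print(register_translation(ref, moving, 2))
```

Exhaustive search is only practical for a handful of parameters; richer transformations (affine, polynomial warps) are the reason a cheaper search strategy is needed.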

29
Raster Information Extraction: Continuous Variable
Elevation Statistics Per County: Sample Mean, Standard Deviation, Skew, Kurtosis
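A minimal sketch of per-county statistics for a continuous raster variable, assuming the county boundaries have already been rasterized into a label grid aligned with the elevation grid. The function names and the sample grids are hypothetical. Population (biased) moments are used and kurtosis is reported without subtracting 3; the slides do not say which convention was used.

```python
import math
from typing import Dict, Hashable, List


def region_statistics(values: List[float]) -> Dict[str, float]:
    """Mean, standard deviation, skew and kurtosis from central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    std = math.sqrt(m2)
    return {
        "mean": mean,
        "std": std,
        "skew": m3 / std ** 3 if std > 0 else 0.0,
        "kurtosis": m4 / m2 ** 2 if m2 > 0 else 0.0,
    }


def stats_per_region(elevation: List[List[float]],
                     labels: List[List[Hashable]]) -> Dict[Hashable, Dict[str, float]]:
    """Group raster values by region label, then summarize each group."""
    groups: Dict[Hashable, List[float]] = {}
    for value_row, label_row in zip(elevation, labels):
        for value, label in zip(value_row, label_row):
            groups.setdefault(label, []).append(value)
    return {label: region_statistics(vals) for label, vals in groups.items()}


# Made-up elevation grid (meters) and county label grid.
elevation = [[100.0, 110.0, 200.0],
             [120.0, 130.0, 220.0]]
labels = [["A", "A", "B"],
          ["A", "A", "B"]]
stats = stats_per_region(elevation, labels)
print(stats["A"])
```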
30
Raster Information Extraction: Categorical Variable
Frequency of Occurrence
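For a categorical raster such as forest cover labels, frequency of occurrence per region could be computed as below. This is a sketch under the same assumption of a label grid aligned with the raster; `label_frequencies` and the sample grids are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List


def label_frequencies(cover: List[List[str]],
                      regions: List[List[Hashable]]) -> Dict[Hashable, Dict[str, float]]:
    """Fraction of pixels of each cover label inside each region."""
    counts: Dict[Hashable, Counter] = {}
    for cover_row, region_row in zip(cover, regions):
        for label, region in zip(cover_row, region_row):
            counts.setdefault(region, Counter())[label] += 1
    return {
        region: {label: n / sum(counter.values()) for label, n in counter.items()}
        for region, counter in counts.items()
    }


# Made-up cover labels and region identifiers.
cover = [["oak_hickory", "oak_hickory", "crop"],
         ["oak_hickory", "crop", "crop"]]
regions = [[1, 1, 2],
           [1, 1, 2]]
freq = label_frequencies(cover, regions)
print(freq)
```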
31
Feature Driven Boundary Aggregation and Evaluation
32
Feature Driven Boundary Aggregation and Evaluation
33
Aggregation of Territories
  • Functionality: Aggregate territories based on
    similarity of attributes
  • Example: if auto theft in County 1 is similar to
    auto theft in County 2, then aggregate Counties 1
    and 2
  • Aggregation Constraints
  • Segmentation: aggregation with neighborhood
    constraint
  • Clustering: aggregation without neighborhood
    constraint
  • Hierarchy of results
  • Desired Number of Aggregations
  • Maximum Internal Dissimilarity of Aggregations
  • Number of Layers in Hierarchy
  • Resulting Labels Saved in ESRI Shape Files
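The difference between the two aggregation modes can be sketched with a simple agglomerative merge. The example assumes a single numeric feature per territory, similarity measured as the difference of aggregation means, and the desired number of aggregations as the exit criterion. The function `aggregate`, the merge rule and the data are illustrative and are not the implementation behind the slides.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def aggregate(features: Dict[str, float],
              neighbors: Set[FrozenSet[str]],
              target: int,
              constrained: bool) -> List[Set[str]]:
    """Merge the two most similar aggregations until `target` remain.
    constrained=True  -> only adjacent aggregations may merge (segmentation)
    constrained=False -> any pair may merge (clustering)"""
    groups: List[Set[str]] = [{name} for name in sorted(features)]

    def mean(group: Set[str]) -> float:
        return sum(features[t] for t in group) / len(group)

    def adjacent(g1: Set[str], g2: Set[str]) -> bool:
        return any(frozenset((a, b)) in neighbors for a in g1 for b in g2)

    while len(groups) > target:
        candidates = [
            (abs(mean(g1) - mean(g2)), i, j)
            for (i, g1), (j, g2) in combinations(enumerate(groups), 2)
            if not constrained or adjacent(g1, g2)
        ]
        if not candidates:
            break  # nothing left that is allowed to merge
        _, i, j = min(candidates)
        groups[i] = groups[i] | groups[j]
        del groups[j]  # safe because i < j
    return groups


# Four made-up counties in a row: A-B-C-D, with one feature value each.
features = {"A": 10.0, "B": 50.0, "C": 12.0, "D": 52.0}
neighbors = {frozenset(p) for p in [("A", "B"), ("B", "C"), ("C", "D")]}
clustered = aggregate(features, neighbors, target=2, constrained=False)
segmented = aggregate(features, neighbors, target=2, constrained=True)
print(clustered)
print(segmented)
```

Without the neighborhood constraint the similar but non-adjacent counties A and C end up together; with it, every aggregation stays spatially contiguous at the cost of a less homogeneous result.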

34
Territorial Aggregation Input Parameters
  • Tools: Main Frame Menu/ImLabels/SegGeoPts and
    ClustGeoPts
  • Input Parameters
  • Input .shp or .shx filename: ESRI shape files,
    including the .nbr file.
  • Select features: any combination of numerical
    features in the .dbf file.
  • Select weights: any numerical feature for
    weighting territories during aggregation.

[Figure: output aggregation results, stored in a dBASE (.dbf) file; panels: IL Counties, Statistics]
35
Spatially Unconstrained Boundary Aggregation
  • Hierarchical clustering of crime data with the
    exit criterion being the number of clusters and
    the clustered feature being auto theft in 2000
    leads to six aggregations.

[Figure panels: Tabular Display, Geographical Display; legend: Boundaries, Boundary Aggregations]
36
Spatially Constrained Boundary Aggregation
  • Hierarchical segmentation and hierarchical
    clustering of the oak hickory feature with the
    exit criterion of 18 county aggregations

[Figure panels: With Spatial Constraint, Without Spatial Constraint; legend: Boundaries, Boundary Aggregations]
37
Boundary Aggregation With Hierarchical Output
  • Hierarchical segmentation of extracted forest
    statistics (oak hickory occurrence) with two
    output partitions.

[Figure panels: 43 aggregations, 21 aggregations; legend: Boundaries, Boundary Aggregations]
38
Visualization of Aggregation Results
  • Functionality: Visualization of Labels and
    Hierarchy of Labels
  • Data to Visualize: Labels in dBASE (.dbf) file,
    Hierarchy of Labels in dBASE (.dbf) file
  • Input Parameters: None (ShowResults and
    ShowGeoResults)

[Figure: IL Counties]
39
Error Evaluations of New Territorial Partitions
  • Error evaluation of partitions obtained by
    clustering and segmentation of mean elevation
    feature per Illinois county with Variance error
    metric

40
Territorial Partition Error Evaluation
  • Error Metrics
  • Variance
  • Normalized Variance
  • City Block
  • Normalized City Block
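The slides name the metrics without defining them, so the sketch below rests on assumed definitions: "Variance" is taken as the mean squared deviation of each territory's feature from its aggregation mean, "City Block" as the mean absolute deviation, and the normalized variants divide by the same metric computed for a single aggregation covering the whole area. All names and data are hypothetical.

```python
from typing import Dict, List, Set


def partition_error(features: Dict[str, float],
                    partition: List[Set[str]],
                    metric: str = "variance") -> float:
    """Average deviation of territories from their aggregation mean.
    metric="variance"   -> squared deviations (assumed definition)
    metric="city_block" -> absolute deviations (assumed definition)"""
    power = {"variance": 2, "city_block": 1}[metric]
    total, count = 0.0, 0
    for group in partition:
        values = [features[t] for t in group]
        mean = sum(values) / len(values)
        total += sum(abs(v - mean) ** power for v in values)
        count += len(values)
    return total / count


def normalized_error(features: Dict[str, float],
                     partition: List[Set[str]],
                     metric: str = "variance") -> float:
    """Error relative to the trivial partition with one aggregation (assumption)."""
    baseline = partition_error(features, [set(features)], metric)
    return partition_error(features, partition, metric) / baseline if baseline else 0.0


# Made-up feature values and two candidate partitions.
features = {"A": 10.0, "B": 50.0, "C": 12.0, "D": 52.0}
clustered = [{"A", "C"}, {"B", "D"}]
segmented = [{"A", "B", "C"}, {"D"}]
for name, part in [("clustered", clustered), ("segmented", segmented)]:
    print(name,
          partition_error(features, part, "variance"),
          partition_error(features, part, "city_block"),
          normalized_error(features, part, "variance"))
```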

41
Geographical Error Evaluations and Decision Making
  • Geographical error evaluation of partitions
    obtained by clustering and segmentation of mean
    elevation feature per Illinois county with
    Variance error metric

[Figure: error maps per partition index; panels: Eval0, Eval1, Eval2, Eval3]
43
Example of Feature Analysis
  • Crime Data Over Multiple Boundary Definitions
  • Evaluation Parameters
  • Features: Auto Theft, years 1999, 2002
  • Error Evaluation: using Variance
  • Number of aggregations: 50
  • Spatial Aspect
  • Boundary type
  • Spatial constraints on aggregations (contiguity?)
  • Optimal number of aggregations
  • Temporal Aspect
  • Temporal changes of crime report features
  • Optimality Metric
  • Feature weighting
  • Cost/evaluation function

44
Example Results: Auto Theft
[Figure panels: Clustering00, Clustering99, Clustering Error, Segmentation00, Segmentation99, Segmentation Error]
45
Example Results: Auto Theft
  • Crime data evaluations
  • AutoTheft 00, 99
  • Seg
  • Evaluation Results
  • Evaluation Metric: Variance
  • Number of Evaluations: 2
  • Eval 0: Seg_0
  • Number of Aggregations: 54
  • Error: 5051.924242424242
  • Eval 1: Seg_0
  • Number of Aggregations: 54
  • Error: 420688.6572463768
  • Crime data evaluations
  • AutoTheft 00, 99
  • Clust
  • Evaluation Results
  • Evaluation Metric: Variance
  • Number of Evaluations: 2
  • Eval 0: Clust_0
  • Number of Aggregations: 37
  • Error: 5575.483333333334
  • Eval 1: Clust_0
  • Number of Aggregations: 36
  • Error: 4387.848695652174

46
Decision Making
  • Which global partition minimizes a chosen error
    metric?
  • Which partition minimizes a chosen error metric
    at a selected boundary definition?
  • What is the geographical error distribution given
    a territorial partition?

47
Additional Interesting Problems
  • Feature Selection
  • High-Dimensional Data Visualization
  • Data Fusion

48
Feature Selection: Problem Formulation
  • General Formulation
  • Given a set of candidate features, select the
    best subset in a classification problem.
  • Specific Formulation with the Focus on
    Multispectral and Hyperspectral Image Analysis
  • Given a set of spectral bands and ground
    measurements, select the subset of bands with
    the best predictive accuracy for the ground
    measurements.

49
Hyperspectral Band Selection
  • Spectral band selection problem with ground
    measurements
  • Soil conductivity: continuous variable
  • Grass label: categorical variable
  • Input Data Sets
  • Regional Data Assembly Centers Sensor (RDACS),
    model hyperspectral (H-3), which is a 120-channel
    prism-grating, push-broom sensor developed by
    NASA, plus ground soil conductivity from the
    Agriculture department, UIUC
  • AVIRIS sensor developed by NASA, plus Gramma
    grass labels from UCSD

50
Input Hyperspectral Data
RDACS Sensor Data
Ground Measurements e.g., Soil Conductivity
51
Input Hyperspectral Data
AVIRIS Sensor
[Figure panels: three bands; white marks the spatial locations of valid labels (grass categories)]
52
Hyperspectral Band Selection
  • Objectives
  • What is the optimal number of selected bands?
  • Which bands should form the optimal subset?
  • What methods should be used for band selection?
  • Challenges
  • Computational complexity
  • The number of band combinations to test with each
    supervised method grows exponentially (2^nb - 1
    non-empty subsets), where nb is the number of
    bands
  • No free lunch (NFL) theorem

53
Proposed Approach
  • Use multiple unsupervised methods to rank the
    spectral bands
  • Test those band choices with multiple supervised
    methods (Wrapper Method)
  • Select those band combinations that minimize
    classification error given a particular set of
    ground measurements
  • Select the pair of unsupervised and supervised
    methods that minimizes classification error
54
Cross Validation
  • How to measure accuracy?
  • Testing on the same data set the model was
    trained on is cheating.
  • Overfitting: the model becomes a lookup table and
    cannot deal with new examples
  • Use n-fold cross-validation
  • Split data set into n subsets
  • Train n models, with one subset held out for
    each
  • Find average error rate over the holdout sets
    with predictions made by the appropriate model.
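The n-fold procedure above can be sketched as follows. The fold assignment (sample i goes to fold i mod n), the absolute-error measure and the trivial "predict the training mean" model are assumptions chosen to keep the example short.

```python
from typing import Callable, List, Sequence


def n_fold_indices(n_samples: int, n_folds: int) -> List[List[int]]:
    """Split sample indices into n_folds disjoint holdout sets."""
    return [[i for i in range(n_samples) if i % n_folds == f]
            for f in range(n_folds)]


def cross_validate(xs: Sequence[float], ys: Sequence[float], n_folds: int,
                   train: Callable, predict: Callable) -> float:
    """Train one model per fold and average the holdout errors."""
    fold_errors = []
    for holdout in n_fold_indices(len(xs), n_folds):
        held = set(holdout)
        train_x = [x for i, x in enumerate(xs) if i not in held]
        train_y = [y for i, y in enumerate(ys) if i not in held]
        model = train(train_x, train_y)  # never sees the holdout set
        error = sum(abs(predict(model, xs[i]) - ys[i]) for i in holdout) / len(holdout)
        fold_errors.append(error)
    return sum(fold_errors) / len(fold_errors)


def train_mean(xs, ys) -> float:
    """Trivial model: remember the mean of the training targets."""
    return sum(ys) / len(ys)


def predict_mean(model: float, x: float) -> float:
    """Predict the stored mean regardless of the input."""
    return model


# Made-up data set with six samples, evaluated with three folds.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
cv_error = cross_validate(xs, ys, 3, train_mean, predict_mean)
print(cv_error)
```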

55
Analysis of Continuous Variable Soil Conductivity
Unsupervised Ranking
56
Analysis of Continuous Variable Soil Conductivity
  • Linear Regression
  • Prediction is a linear combination of the inputs
    (Xe)
  • Learning becomes finding the coefficients (Find
    that satisfies g(T)).
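As a small worked case of linear regression, the sketch below fits a single input plus an intercept with the closed-form least-squares solution. The variable names and the data (one band value predicting a ground measurement) are made up; with many spectral bands the same idea applies with one coefficient per band.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(coefficients: Tuple[float, float], x: float) -> float:
    """Prediction is a linear combination of the inputs (1 and x)."""
    intercept, slope = coefficients
    return intercept + slope * x


# Made-up band values and ground measurements lying exactly on a line.
band_values = [0.0, 1.0, 2.0, 3.0]
measurements = [1.0, 3.0, 5.0, 7.0]
coefficients = fit_line(band_values, measurements)
print(coefficients, predict(coefficients, 4.0))
```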

57
Analysis of Continuous Variable Soil Conductivity
Supervised Linear Regression Based Classification
59
Data Fusion of GIS Data
  • Map Mosaicking Challenges: Heterogeneous Data
    Sets
  • Multiple Data Sets with Different:
  • Data Types: BYTE, SHORT, INT, LONG, FLOAT, DOUBLE
  • Spatial Resolutions: horizontal and vertical
    resolutions
  • Spectral Resolutions: number of bands;
    overlapping aligned or misaligned wavelengths
  • Geographic Projections: datums (3D globe model),
    2D projections

60
Map Mosaicking Task
  • Task: Automatic Data Fusion that Resolves
    Dissimilar Projections, Spatial Resolutions, Data
    Types, and Number of Bands
  • Example of a Simplified Task:
    Puzzle-Piece Problem
  • Identical projections, resolutions, data types,
    number of bands
  • Different geographic locations
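The simplified puzzle-piece task can be sketched as placing tiles into one output grid using only their geographic position. The sketch assumes single-band tiles, a shared square cell size, and that each tile is described by the map coordinates of its upper-left corner; `mosaic` and the tile values are hypothetical.

```python
from typing import List, Optional, Tuple

Grid = List[List[Optional[int]]]
Tile = Tuple[float, float, List[List[int]]]  # (x_min, y_max, pixel rows)


def mosaic(tiles: List[Tile], cell_size: float, nodata: Optional[int] = None) -> Grid:
    """Place tiles with identical projection and resolution into one grid."""
    x0 = min(x for x, _, _ in tiles)                                # west edge
    y0 = max(y for _, y, _ in tiles)                                # north edge
    x1 = max(x + len(g[0]) * cell_size for x, _, g in tiles)        # east edge
    y1 = min(y - len(g) * cell_size for _, y, g in tiles)           # south edge
    n_cols = round((x1 - x0) / cell_size)
    n_rows = round((y0 - y1) / cell_size)
    out: Grid = [[nodata] * n_cols for _ in range(n_rows)]
    for x, y, grid in tiles:
        col_off = round((x - x0) / cell_size)
        row_off = round((y0 - y) / cell_size)
        for r, row in enumerate(grid):
            for c, value in enumerate(row):
                out[row_off + r][col_off + c] = value
    return out


# Two made-up 2x2 tiles with 10-unit cells, side by side.
west = (0.0, 20.0, [[1, 2], [3, 4]])
east = (20.0, 20.0, [[5, 6], [7, 8]])
combined = mosaic([west, east], cell_size=10.0)
print(combined)
```

The full task described above would first have to bring every tile to a common projection, resolution, data type and band count before this placement step applies.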

61
ArcMap vs I2K Reprojection Results
  • Tentative Results
  • Resampling Method: Highly Accurate
  • On-the-fly projection not quite as good
  • I2K in between

62
(ArcMap on-the-fly reprojection shown)
63
Table of Results
64
Summary
  • Applications of GIS tools
  • Remote Sensing
  • Agriculture
  • Hydrology
  • Water Quality Survey
  • Atmospheric Science
  • Military
  • Socio-Economics
  • Interested? Useful? Let us know.
  • Email: pbajcsy_at_ncsa.uiuc.edu
  • Reading: http://alg.ncsa.uiuc.edu/do/documents
  • Teaching:
    http://cee.uiuc.edu/people/kumar1/cee498HI/lectures.htm

65
Documentation
66
References
  • Journal Papers
  • Peter Bajcsy and Peter Groves, Methodology For
    Hyperspectral Band Selection, Photogrammetric
    Engineering and Remote Sensing journal, accepted
    June 2003.
  • Conferences
  • Peter Groves and Peter Bajcsy, Methodology for
    Hyperspectral Band and Classification Model
    Selection, IEEE Workshop on Advances in
    Techniques for Analysis of Remotely Sensed Data,
    Washington DC, October 27, 2003.
  • Peter Bajcsy and Tyler Jeffrey Alumbaugh,
    Georeferencing Maps With Contours, Proceedings
    of the 7th World Multiconference on Systemics,
    Cybernetics and Informatics (SCI 2003), Orlando,
    Florida, July 27-30, 2003.
  • Peter Bajcsy, Automatic Extraction Of
    Isocontours From Historical Maps, Proceedings of
    the 7th World Multiconference on Systemics,
    Cybernetics and Informatics (SCI 2003), Orlando,
    Florida, July 27-30, 2003.
  • ALG Technical Reports
  • Tyler Alumbaugh, and Peter Bajcsy,
    Georeferencing Maps with Contours in I2K,
    Technical Report NCSA-ALG-02-0001, October 2002
  • Peter Groves, Sunayana Saha and Peter Bajcsy,
    Boundary Information Storage, Retrieval,
    Georeferencing and Visualization, Technical
    Report NCSA-ALG-03-0001, February 2003
  • Peter Bajcsy, Peter Groves, Sunayana Saha, Tyler
    Alumbaugh, and David Tcheng, A System for
    Territorial Partitioning Based on GIS Raster and
    Vector Data, Technical Report NCSA-ALG-03-0002,
    February 2003