Spatial Dependency Modeling Using Spatial AutoRegression

1
Spatial Dependency Modeling Using Spatial
Auto-Regression
  • Mete Celik 1,3, Baris M. Kazar 4, Shashi Shekhar
    1,3,
  • Daniel Boley 1, David J. Lilja 1,2
  • 1 CSE Department, University of Minnesota, Twin
    Cities
  • 2 ECE Department, University of Minnesota, Twin
    Cities
  • 3 Army High Performance Computing Research Center
  • 4 Oracle USA

2
Outline of Today's Talk
  • Motivation & Background
  • Problem Definition
  • Related Work & Contributions
  • Proposed Approach
  • Experimental Evaluation
  • Conclusion & Future Work

3
Motivation
  • Widespread use of spatial databases
  • Mining spatial patterns
  • The 1855 Asiatic cholera in London [Griffith]
  • Fair lending [NYT, R. Nader]: correlation of bank
    locations with loan activity in poor neighborhoods
  • Retail outlets [NYT, Walmart, McDonald's, etc.]:
    determining locations of stores by relating
    neighborhood maps with customer databases
  • Crime hot spot analysis [NYT, NIJ CML]: explaining
    clusters of sexual assaults by locating addresses
    of sex offenders
  • Ecology [Uygar]: explaining the location of bird
    nests based on structural environmental variables


4
Spatial Auto-correlation (SA)
  • Randomly distributed data (no SA): spatial
    distribution satisfying the assumptions of classical
    data

[Figures: pixel property with independent identical distribution; random nest locations]
  • Cluster-distributed data: spatial distribution
    NOT satisfying the assumptions of classical data

[Figures: pixel property with spatial auto-correlation; clustered nest locations]
5
Execution Trace
  • Given
  • Spatial framework
  • Attributes

[Figure: space with its 4-neighborhood, the binary W, and the row-normalized W; the 6th row is labeled on both matrices]
  • W allows other neighborhood definitions
  • distance based
  • 8-neighbors
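
The construction of W on this slide can be illustrated with a short sketch. It assumes a 4 x 4 grid numbered row by row (the grid size and the function names are illustrative choices, not taken from the talk) and builds the binary 4-neighbor W, then row-normalizes it.

```python
import numpy as np

def four_neighbor_W(rows, cols):
    """Binary 4-neighbor adjacency matrix for a rows x cols grid.

    Cells are numbered row by row; W[i, j] = 1 when cells i and j
    share an edge (north, south, east, or west neighbor).
    """
    n = rows * cols
    W = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    W[i, rr * cols + cc] = 1.0
    return W

def row_normalize(W):
    """Divide each row by its sum so that every non-empty row sums to 1."""
    sums = W.sum(axis=1, keepdims=True)
    return np.divide(W, sums, out=np.zeros_like(W), where=sums > 0)

# Assumed example: a 4 x 4 grid (16 sites).
W_bin = four_neighbor_W(4, 4)
W_row = row_normalize(W_bin)

# The 6th row (index 5) describes an interior cell with four neighbors.
print(W_bin[5])
print(W_row[5])
```

An 8-neighbor or distance-based W would only change the inner neighbor test; the row normalization step stays the same.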

6
SDM Provides Better Model!
  • Linear Regression → SAR
  • The spatial auto-regression (SAR) model has higher
    accuracy and removes the IID assumption of linear
    regression
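
As a rough illustration of the difference, the standard SAR form y = ρWy + xβ + ε adds a spatial lag term ρWy to the linear regression y = xβ + ε. The sketch below generates data from that form by solving (I - ρW)y = xβ + ε. The ring-shaped neighborhood, the parameter values, and the function name are assumptions made only for this example.

```python
import numpy as np

def simulate_sar(W, x, beta, rho, rng):
    """Draw y from y = rho*W*y + x*beta + eps with eps ~ N(0, I).

    Rearranged as (I - rho*W) y = x*beta + eps and solved directly.
    """
    n = W.shape[0]
    eps = rng.standard_normal(n)
    y = np.linalg.solve(np.eye(n) - rho * W, x @ beta + eps)
    return y, eps

# Assumed toy neighborhood: 6 sites on a ring, each with two neighbors,
# already row-normalized (each row sums to 1).
n = 6
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5

rng = np.random.default_rng(0)
x = np.column_stack([np.ones(n), rng.standard_normal(n)])
beta = np.array([1.0, 2.0])
rho = 0.6

y, eps = simulate_sar(W, x, beta, rho, rng)

# With rho = 0 the spatial lag vanishes and the model is plain linear regression.
y_lin, eps_lin = simulate_sar(W, x, beta, 0.0, rng)
```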

7
Data Structures in SAR Model
  • Vectors: y, β, ε
  • Matrices: W, x
  • W is a large matrix

8
Computational Challenge
  • Maximum-likelihood estimation: minimizing the
    negative log-likelihood function
  • Solving the SAR model
  • ρ = 0 → least squares problem
  • β = 0, ε = 0 → eigenvalue problem
  • General case → computationally expensive due to
    the log-det term in the ML function

[Equation labels: log-det term (Theorem 1); SSE term]
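
A minimal sketch of where the cost comes from, assuming the standard Gaussian SAR log-likelihood ln L = ln|I - ρW| - (n/2) ln(2πσ²) - SSE/(2σ²), with SSE = ||y - ρWy - xβ||². The exact log-det below uses all eigenvalues of W, which is the step the approximations in this talk try to avoid. The function names and the ring-shaped W are illustrative assumptions.

```python
import numpy as np

def exact_logdet(W, rho):
    """ln|I - rho*W| = sum_i ln(1 - rho*lambda_i), using every eigenvalue of W."""
    lam = np.linalg.eigvals(W).astype(complex)
    return float(np.sum(np.log(1.0 - rho * lam)).real)

def sar_neg_loglik(rho, beta, sigma2, W, x, y):
    """Negative Gaussian log-likelihood of the SAR model (to be minimized)."""
    n = y.shape[0]
    resid = y - rho * (W @ y) - x @ beta      # residual of the SAR equation
    sse = float(resid @ resid)                # SSE term
    loglik = (exact_logdet(W, rho)            # log-det term
              - 0.5 * n * np.log(2.0 * np.pi * sigma2)
              - sse / (2.0 * sigma2))
    return -loglik

# Assumed toy neighborhood: 8 sites on a ring, row-normalized.
n = 8
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5

print(exact_logdet(W, 0.7))
```

For ρ = 0 the log-det term is zero and only the SSE term depends on β, which is the least squares special case listed on the slide.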
9
Outline
  • Motivation & Background
  • Problem Definition
  • Related Work & Contributions
  • Proposed Approach
  • Experimental Evaluation
  • Conclusion & Future Work

10
Problem Statement
  • Given
  • A spatial framework S consisting of sites s_1, …,
    s_q for an underlying geographic space G
  • A collection of explanatory functions f_xk: S → ℝ^k,
    k = 1, …, K. ℝ^k is the range of possible values
    for the explanatory functions
  • A dependent function f_y: ℝ → ℝ^y
  • A family F (SAR equation) of learning model
    functions mapping ℝ^1 × … × ℝ^k → ℝ^y
  • A neighborhood relationship (4- and 8-neighbor)
    on the spatial framework
  • Find
  • The SAR parameter ρ and the regression
    coefficient vector β with a desired precision to
    save log-det computations.

11
Problem Statement (Cont'd)
  • Objective
  • Algebraic error ranking of approximate SAR model
    solutions.
  • Constraints
  • S is a multi-dimensional Euclidean space.
  • The values of the explanatory variables x and the
    dependent function (observed variable) y may not
    be independent with respect to those of nearby
    spatial sites, i.e., spatial autocorrelation
    exists.
  • The domains of x and y are the real numbers.
  • The SAR parameter ρ varies in the range [0, 1).
  • The error is normally distributed with unit
    standard deviation and zero mean, i.e.,
    ε ~ N(0, σ²I), IID.
  • The neighborhood matrix W exhibits sparsity.

12
Related Work
13
Contributions
  • A new approximate SAR model solution: the
    Gauss-Lanczos approximation method
  • Key idea: do not find all of the eigenvalues of W
  • Error ranking of approximate SAR model solutions

14
Outline
  • Motivation & Background
  • Problem Definition
  • Related Work & Contributions
  • Proposed Approach
  • Experimental Evaluation
  • Conclusion & Future Work

15
Gauss-Lanczos Approximation
  • The log-det is approximated by transforming the
    eigenvalue problem into a quadratic form.
  • Gauss-type quadrature rules are then applied
    using the Lanczos procedure.

16
How does GL Method Work?
  • GL (Algorithm 3.2) is repeated m times (m = 400
    in our experiments).
  • Parameter r varies between 5 and 8 in our
    experiments.
  • For large problem sizes, m and r have little
    effect on the quality of the solution.
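
The following is a minimal sketch of a Gauss-Lanczos style estimate, not a reproduction of Algorithm 3.2. It assumes a symmetric W with eigenvalues in [-1, 1], uses ln|I - ρW| = trace(ln(I - ρW)), averages the quadratic form uᵀ ln(A) u over m random Gaussian start vectors, and evaluates each quadratic form with an r-step Lanczos run (a Gauss quadrature rule). The helper names, the Gaussian start vectors, and the 6 x 6 test grid are all assumptions.

```python
import numpy as np

def grid_adjacency(rows, cols):
    """Binary 4-neighbor adjacency of a rows x cols grid (row-major numbering)."""
    def path(k):
        return np.diag(np.ones(k - 1), 1) + np.diag(np.ones(k - 1), -1)
    return np.kron(np.eye(rows), path(cols)) + np.kron(path(rows), np.eye(cols))

def lanczos_tridiag(A, q0, r):
    """Run up to r Lanczos steps on symmetric A from unit vector q0.

    Returns the tridiagonal matrix T; no reorthogonalization is done,
    which is acceptable only because r is small here.
    """
    alphas, betas = [], []
    q_prev = np.zeros_like(q0)
    q = q0.copy()
    beta = 0.0
    for _ in range(r):
        w = A @ q - beta * q_prev
        alpha = q @ w
        w = w - alpha * q
        alphas.append(alpha)
        beta = np.linalg.norm(w)
        if beta < 1e-12:          # invariant subspace found: stop early
            break
        betas.append(beta)
        q_prev, q = q, w / beta
    k = len(alphas)
    off = betas[:k - 1]
    return np.diag(alphas) + np.diag(off, 1) + np.diag(off, -1)

def gl_logdet(W_sym, rho, m=400, r=8, seed=0):
    """Estimate ln|I - rho*W_sym| without computing all eigenvalues of W_sym."""
    rng = np.random.default_rng(seed)
    n = W_sym.shape[0]
    A = np.eye(n) - rho * W_sym
    total = 0.0
    for _ in range(m):
        u = rng.standard_normal(n)            # random initial Lanczos vector
        norm2 = u @ u
        T = lanczos_tridiag(A, u / np.sqrt(norm2), r)
        theta, S = np.linalg.eigh(T)          # quadrature nodes and weights
        total += norm2 * np.sum(S[0, :] ** 2 * np.log(theta))
    return total / m

# Assumed test problem: 6 x 6 grid, symmetrized W = D^(-1/2) C D^(-1/2).
C = grid_adjacency(6, 6)
d = C.sum(axis=1)
W_sym = C / np.sqrt(np.outer(d, d))

approx = gl_logdet(W_sym, rho=0.5, m=400, r=8, seed=1)
exact = np.linalg.slogdet(np.eye(36) - 0.5 * W_sym)[1]
print(approx, exact)
```

Because the start vectors are random, the estimate changes with the seed and with m, while r controls the number of quadrature nodes per start vector.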

17
Taylor Series Approximation
  • Log-det term expressed as a Taylor series
  • The trace is the sum of the eigenvalues; W is the
    symmetrized neighborhood matrix
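
For reference, the usual Taylor expansion behind this approach is ln|I - ρW| = -Σ_{k≥1} ρ^k trace(W^k) / k, valid when the eigenvalues of ρW lie inside the unit circle. The sketch below truncates the series after q terms and computes the traces exactly by dense matrix powers, which is only practical for small examples. The function name, q, and the ring-shaped W are assumptions.

```python
import numpy as np

def taylor_logdet(W, rho, q=30):
    """Truncated Taylor series for ln|I - rho*W| using exact traces of W^k."""
    n = W.shape[0]
    Wk = np.eye(n)
    total = 0.0
    for k in range(1, q + 1):
        Wk = Wk @ W                       # W^k
        total -= rho ** k * np.trace(Wk) / k
    return total

# Assumed toy neighborhood: 8 sites on a ring (symmetric, eigenvalues in [-1, 1]).
n = 8
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5

approx = taylor_logdet(W, rho=0.5, q=60)
exact = np.linalg.slogdet(np.eye(n) - 0.5 * W)[1]
print(approx, exact)
```

The series converges more slowly as ρ approaches 1, so more terms are needed when spatial autocorrelation is strong.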

18
Chebyshev Polynomial Approximation
  • Log-det term expressed in Chebyshev polynomials
  • The trace is the sum of the eigenvalues, the T's
    are matrix polynomials, and the c's are Chebyshev
    polynomial coefficients
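
One common way to write this approximation, assumed here, is ln|I - ρW| ≈ Σ_{j=0..q} c_j(ρ) trace(T_j(W)) - (n/2) c_0(ρ), where T_0 = I, T_1 = W, T_{j+1} = 2 W T_j - T_{j-1}, and the c_j are the Chebyshev coefficients of ln(1 - ρx) on [-1, 1]. The sketch below requires a symmetric W with eigenvalues in [-1, 1]; the function name, q, and the ring-shaped W are illustrative.

```python
import numpy as np

def chebyshev_logdet(W, rho, q=20):
    """Chebyshev approximation of ln|I - rho*W| with polynomials up to degree q."""
    n = W.shape[0]
    N = q + 1
    theta = np.pi * (np.arange(N) + 0.5) / N        # Chebyshev node angles
    fvals = np.log(1.0 - rho * np.cos(theta))       # ln(1 - rho*x) at the nodes
    c = np.array([2.0 / N * np.sum(fvals * np.cos(j * theta)) for j in range(N)])

    T_prev, T_curr = np.eye(n), W.copy()            # T_0(W), T_1(W)
    total = 0.5 * c[0] * n                          # c_0*trace(T_0) - (n/2)*c_0
    if N > 1:
        total += c[1] * np.trace(T_curr)
    for j in range(2, N):
        T_prev, T_curr = T_curr, 2.0 * (W @ T_curr) - T_prev
        total += c[j] * np.trace(T_curr)
    return total

# Assumed toy neighborhood: 8 sites on a ring (symmetric, eigenvalues in [-1, 1]).
n = 8
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5

approx = chebyshev_logdet(W, rho=0.5, q=20)
exact = np.linalg.slogdet(np.eye(n) - 0.5 * W)[1]
print(approx, exact)
```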

19
Outline
  • Motivation & Background
  • Problem Definition
  • Related Work & Contributions
  • Proposed Approach
  • Experimental Evaluation
  • Conclusion & Future Work

20
Experiment Design
21
Exact and Approximate Values of Log-det
  • GL gives a better approximation as spatial
    autocorrelation increases

22
Absolute Relative Error of Approximations
  • The absolute relative error of the approximation
    goes down as spatial autocorrelation increases
    (GL mean error: 0.9; GL max error: 1.78)

23
Conclusions
  • GL is slightly more expensive than Taylor series
    and Chebyshev polynomials.
  • GL gives better approximations when spatial
    autocorrelation is high and the problem size is
    large.
  • GL quality depends on the number of iterations,
    the initial Lanczos vector, and the random
    number generator.
  • No need to compute all eigenvalues.

24
Acknowledgments
  • AHPCRC
  • Minnesota Supercomputing Institute (MSI)
  • Spatial Database Group Members
  • ARCTiC Labs Group Members
  • Dr. Dan Boley
  • Dr. Sanjay Chawla
  • Dr. Vipin Kumar
  • Dr. James LeSage
  • Dr. Kelley Pace
  • Dr. Pen-Chung Yew

THANK YOU VERY MUCH Q/A