Access Pattern Analysis, Ideas and Alternative Approaches - PowerPoint PPT Presentation

Provided by: MOH545


1
Access Pattern Analysis, Ideas and Alternative
Approaches
Crimestat Performance Tuning
  • Pradeep Mohan

2
Outline
  • Overview - Crimestat
  • Motivation
  • Background
  • Datasets Description
  • Distance Calculation
  • Types of Distance Requests
  • Function Categories
  • Access Pattern Analysis
  • K-Nearest Neighbor Analysis
  • Hotspot Analysis
  • K Means
  • Hierarchical Clustering
  • Access Patterns - Other Modules
  • Journey To Crime
  • Crime Travel Demand
  • Problem Definition
  • Proposed Approach
  • Voronoi Diagram

3
Overview - Crimestat
  • A multi-threaded Windows application for crime
    mapping and analysis.
  • Main modules of interest
  • Distance Analysis (K-Nearest Neighbor Analysis)
  • Hotspot Analysis (Nearest Neighbor Hierarchical
    Clustering and K Means)
  • Journey to Crime (Bayesian Journey to Crime)
  • Space-Time Analysis (Knox Index)
  • Crime Travel Demand (Network Assignment)
  • Datasets
  • Multiple point datasets (e.g., criminal arrest
    records and crime incident records, each with
    location and time)
  • Traffic Analysis Zones (TAZ) and road network
    datasets (for Journey to Crime and Crime Travel
    Demand).
  • Different distance metrics computed between
    different point sets (e.g., Euclidean, spherical,
    Manhattan and network).

4
Motivation
How are crime incidents clustered together?
Courtesy ESRI.
Hot Spot Analysis
What are the predicted trips of a serial
offender?
Journey to Crime Estimation
5
Background - Datasets Description
Centroid of a TAZ
Courtesy Ned Levine and Associates
6
Background - Datasets Description
Courtesy Ned Levine and Associates
7
Background - Datasets Description
Primary Pointset
Courtesy Ned Levine and Associates
8
Background - Datasets Description
Secondary Pointset
Courtesy Ned Levine and Associates
9
Background - Distance Calculation
  • A set of N x K points in Euclidean space (or on a
    network)
  • Distance between all pairs of points (What is
    the problem?)
  • Naive computation takes O(n²) (Why is it hard?)
  1. Do all functions require these distances?
  2. Can calculated distances be re-used?
  3. How do we store them? (in a file)
  4. How do we efficiently search for them?
  5. Is there a single algorithm to calculate all
    these distances?
  6. Can they be calculated on the fly?
  1. Euclidean Distance
  2. Manhattan Distance
  3. Spherical Distance
  4. Network Distance

Calculations in a single-threaded application
A Distance Cell
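A minimal Python sketch of the naive all-pairs computation for three of the four metrics listed above. It assumes projected (x, y) coordinates for the Euclidean and Manhattan cases, (longitude, latitude) in degrees for the spherical case, and a mean Earth radius in miles; the function names are hypothetical, not CrimeStat's. Network distance is left out because it needs a shortest-path search over the road graph.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius for spherical distances


def euclidean(p, q):
    """Straight-line distance between projected (x, y) coordinates."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def manhattan(p, q):
    """Sum of absolute coordinate differences (right-angle street travel)."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def spherical(p, q, radius=EARTH_RADIUS_MILES):
    """Great-circle (haversine) distance; p and q are (lon, lat) in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


def distance_matrix(points, metric=euclidean):
    """Naive all-pairs matrix: n*(n-1)/2 metric calls, n*n stored values."""
    n = len(points)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = metric(points[i], points[j])  # symmetric: compute once
    return d
```

For example, `distance_matrix([(0, 0), (3, 4)])[0][1]` is 5.0. Filling the whole matrix this way is the O(n²) cost in time and storage that the slide asks about.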
10
Background - Types of Distance Requests
  • Given a distance d, find all pairs of points
    separated by at most d (over the whole dataset!)
  • Given a point p, find its k-order nearest
    neighbors (e.g., p6 is the 1st-order and p2 the
    2nd-order neighbor).
  • Given a zone Z (a polygon), find the set P of
    incident points that fall inside it.
  • Given a set of points P and another set Q, find
    all-pairs Euclidean distances.
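The first request type (all pairs within distance d over the whole dataset) does not have to touch every pair. The sketch below uses uniform-grid bucketing, a standard technique swapped in here for illustration rather than something the slides propose: with cells of side d, a qualifying partner can only lie in the same cell or one of the eight adjacent cells. `pairs_within` is a hypothetical name.

```python
import math
from collections import defaultdict


def pairs_within(points, d):
    """Return all index pairs (i, j), i < j, whose Euclidean distance is <= d.

    Uniform-grid bucketing (an assumed technique): with cell side d, any
    partner within d is in the same cell or one of the 8 neighbouring cells,
    so pairs in distant cells are never evaluated.
    """
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        grid[(math.floor(x / d), math.floor(y / d))].append(idx)

    result = []
    for (cx, cy), members in grid.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                # .get() does not insert empty cells into the defaultdict
                for j in grid.get((cx + dx, cy + dy), ()):
                    for i in members:
                        if i < j and math.dist(points[i], points[j]) <= d:
                            result.append((i, j))  # i < j: each pair reported once
    return sorted(result)
```

With strongly clustered crime data, a dense cell still costs quadratic work inside that cell, so the savings depend on how the points are distributed.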

11
Background - Function Categories
  • K-Nearest Neighbor Analysis based modules
  • Nearest Neighbor Analysis (K order)
  • Ripley's K Statistic (simulation)
  • Point to Point allocation
  • Point to Zone allocation
  • Hot Spot Analysis based modules
  • K Means clustering
  • Spatio-Temporal Analysis of Crime
  • Anselin's Moran
  • Nearest Neighbor Hierarchical clustering
  • Risk Adjusted Nearest Neighbor Hierarchical
    Clustering
  • Space Time Analysis
  • Crime Trip Estimation
  • Travel Demand Modeling

12
Access Pattern Analysis
13
K-Nearest Neighbor Analysis
  • Input
  • A set of N incident locations
  • K (order of NN)
  • Output
  • 1st-order nearest neighbor for all N points
  • K-order nearest neighbors for all N points
  • Method
  • 1. For every pair of points (pi, pk) ∈ P, compute the
    distance.
  • 2. For each point, quicksort all its distances to
    get the nearest neighbor.
  • 3. For K order, select the top K neighbors of each
    point.
  • 4. Compute statistics: mean random distance,
    Nearest Neighbor Index, etc.
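A brute-force sketch of the method above, assuming Euclidean coordinates and a known study-area size. The Nearest Neighbor Index follows the Clark and Evans form (observed mean 1st-order distance divided by the mean random distance 0.5 * sqrt(A / N)); treat it as an illustration rather than CrimeStat's exact output, which may apply further corrections. Python's built-in sort stands in for quicksort, and the function names are hypothetical.

```python
import math


def k_order_nn(points, k):
    """For each point, the distances to its 1st..k-th nearest neighbours.

    Mirrors the slide: compute every distance from a point, sort them,
    keep the top k. O(n^2 log n) overall.
    """
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[:k])
    return result


def nearest_neighbor_index(points, area):
    """Observed mean 1st-order NN distance over the mean distance expected
    under complete spatial randomness, 0.5 * sqrt(area / n).
    Values below 1 suggest clustering, above 1 dispersion."""
    n = len(points)
    observed = sum(d[0] for d in k_order_nn(points, 1)) / n
    expected_random = 0.5 * math.sqrt(area / n)
    return observed / expected_random
```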

14
Access Patterns
  • All K-order distances are calculated.
  • Such computations are performed on the whole dataset.
  • A distance matrix is calculated: O(n.n)

[Figure: distances d1, d2, d3, d4, ..., dk from a point to its neighbors]
15
Assigning Point to Point, Point to Zone
(polygon or grid)
  • Point to Point Assignment
  • Input: a set P of N points and another set Q of M
    points.
  • Output: an assignment of each point in P to a
    point of Q.
  • Method: proximity calculation from every point in
    P to every point in Q.
  • Point to Polygon Assignment
  • Input: a set P of N points and another set Q of M
    polygons.
  • Output: an assignment of each point in P to a
    polygon of Q.
  • Method: a point-in-polygon test, M x N times.
A pre-computed distance matrix is currently used
for distances.
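A sketch of both assignments with hypothetical function names: point-to-point picks the nearest Q point by Euclidean distance, and point-to-polygon uses an even-odd ray-casting test (an assumed choice; the slide only says "point in polygon"). Both loops are N x M, matching the slide. Points lying exactly on a polygon boundary may go either way with this test.

```python
import math


def point_in_polygon(pt, polygon):
    """Even-odd ray casting: count edge crossings of a ray cast towards +x."""
    x, y = pt
    inside = False
    n = len(polygon)
    for k in range(n):
        (x1, y1), (x2, y2) = polygon[k], polygon[(k + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_to_points(P, Q):
    """Point to point: index of the nearest point of Q for each point of P."""
    return [min(range(len(Q)), key=lambda j: math.dist(p, Q[j])) for p in P]


def assign_to_polygons(P, polygons):
    """Point to polygon: index of the first containing polygon, or None."""
    return [next((j for j, poly in enumerate(polygons) if point_in_polygon(p, poly)), None)
            for p in P]
```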
16
Access Pattern
  • The distance from every Pi to every Si is
    calculated and ordered.
  • A distance matrix is used.
[Figure: primary point Pi, secondary points Si, and distances d1 to d6 between them]
17

[Figure: incident point Pi (in P) and TAZ centroid]
Courtesy Ned Levine and Associates
18
Ripley's K Statistic
  • Input
  • A set of N points.
  • Output
  • L(t), a measure of second-order clustering
  • Method
  • Draw a circle around each point.
  • Collect all points within the radius.
  • Increase the radius and repeat the above operations.
  • Repeat for 100 increments of the radius up to the
    maximum distance.

A random point set is generated, so the distance matrix
needs to be recalculated.
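A sketch of the computation described above, assuming Euclidean coordinates, a known study area, no edge correction, and Besag's transform L(t) = sqrt(K(t) / π). Sorting the pair distances once lets all 100 radii share a single pass instead of redrawing circles; for each simulated random point set this sorted list would have to be rebuilt, which is the recomputation the slide points out. The function name is hypothetical.

```python
import math


def ripley_l(points, area, increments=100):
    """Ripley's K and L(t) = sqrt(K(t) / pi), without edge correction.

    K(t) = area / n^2 * (number of ordered pairs i != j with d_ij <= t),
    evaluated at `increments` equally spaced radii up to the maximum
    inter-point distance.
    """
    n = len(points)
    dists = sorted(math.dist(points[i], points[j])
                   for i in range(n) for j in range(i + 1, n))
    max_d = dists[-1]
    result = []
    count = 0
    for step in range(1, increments + 1):
        t = max_d * (step / increments)  # last radius equals max_d exactly
        while count < len(dists) and dists[count] <= t:
            count += 1  # sorted list: each radius continues from the previous count
        k_t = area * 2 * count / n ** 2  # x2: each unordered pair counts twice
        result.append((t, math.sqrt(k_t / math.pi)))
    return result
```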
19
(No Transcript)
20
Hot Spot Analysis Modules
21
Mode and Fuzzy Mode
  • Input
  • A set of N Points.
  • A Radius R. (Fuzzy Mode)
  • Output
  • Mode: frequency of incidents at each point
  • Fuzzy Mode: frequency of incidents within a radius
    of a point
  • Method
  • Mode: a frequency count of the number of incidents
    at each point (for N points).
  • Fuzzy Mode: a frequency count of incidents within
    a radius around a particular point.
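A minimal sketch of both counts, assuming incidents are (x, y) tuples and that "at each point" means identical coordinates. The fuzzy-mode version checks every incident against every distinct location, so it is O(n²) unless a spatial index is used. Function names are hypothetical.

```python
import math
from collections import Counter


def mode_counts(points):
    """Mode: number of incidents recorded at each exact location."""
    return Counter(points)


def fuzzy_mode_counts(points, radius):
    """Fuzzy mode: for each distinct location, the incidents within `radius`
    of it, including those at the location itself."""
    locations = list(dict.fromkeys(points))  # distinct locations, first-seen order
    return {loc: sum(1 for p in points if math.dist(loc, p) <= radius)
            for loc in locations}
```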

22
Access Patterns - K Means
  • Computing initial seeds
  • Using a secondary set of points as seeds
  • Overlaying a grid; the cell with the highest count
    is a seed.
  • The grid approach is expensive: O(gridsize x k),
    where k is the number of clusters.
  • If a grid size doesn't produce a cluster, another
    is tried (worst case).
  • Distances from all n points to the k clusters are
    measured until convergence: O(k x n x
    iterations till convergence)
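A sketch of the assignment-and-update loop (Lloyd's algorithm), assuming the initial seeds are already chosen, for example from the secondary point set; grid-based seeding is not shown. `k_means` and `max_iter` are hypothetical names. Each pass computes all n x k point-to-centre distances, which is where the O(k x n x iterations) cost comes from.

```python
import math


def k_means(points, seeds, max_iter=100):
    """Lloyd's iterations starting from the given seed centres."""
    centres = [tuple(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        # n x k distance evaluations per iteration
        new_labels = [min(range(len(centres)), key=lambda c: math.dist(p, centres[c]))
                      for p in points]
        if new_labels == labels:  # converged: no point changed cluster
            break
        labels = new_labels
        for c in range(len(centres)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centres[c] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return centres, labels
```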

23
Nearest Neighbor Hierarchical Clustering (NNH)
  • Input
  • A set of N points (same file)
  • A search distance, d (random or fixed)
  • No. of simulations, k (order)
  • Minimum number of points per cluster
  • Output
  • 1. All k-order clusters
  • Conditions
  • Distance between pairs of points > d
  • Cluster size > minimum number of points
  • Method
  • Compute all-pairs Euclidean distances.
  • Prune based on the distance threshold.
  • Computation Saving
  • Distances have already been calculated.

N = 1349, fixed distance (dt) = 5 miles, pair
count = 231742 (so many distances evaluated)
Courtesy Ned Levine and Associates
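A simplified, single-linkage style sketch of first-order clustering, not CrimeStat's full NNH procedure (which also builds higher-order clusters and can use a random threshold). It assumes Euclidean coordinates: pairs no farther apart than the threshold are linked with a union-find structure, farther pairs are pruned, and linked groups smaller than the minimum size are dropped. The all-pairs loop is what makes the quoted pair count so large; the function name is hypothetical.

```python
import math


def first_order_clusters(points, threshold, min_points):
    """Link every pair within `threshold`, then keep linked groups that have
    at least `min_points` members. Returns sorted lists of point indices."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):  # the all-pairs pass
            if math.dist(points[i], points[j]) <= threshold:
                parent[find(i)] = find(j)  # pairs beyond the threshold are pruned

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(g for g in groups.values() if len(g) >= min_points)
```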
24
Access Patterns
25
Risk Adjusted Nearest Neighbor Hierarchical
Clustering (NNH)
  • Use of a baseline variable (e.g., census blocks).
  • Interpolate to a grid whose size is based on the
    primary file (say size N).
  • Determine absolute densities (of the secondary file)
    as points per grid cell.
  • Proceed as in NNH.

Calculating the grid parameters is O(N.N), where N is
the primary point set size.
26
Spatio-Temporal Analysis of Crime (STAC)
  • The area is divided into a grid.
  • Circles are drawn on the grid.
  • Circles are pruned based on the number of points.
  • Intersecting circles are merged.
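A simplified STAC-style sketch under stated assumptions: circle centres sit on the nodes of a square grid with a hypothetical `spacing`, every circle has the same radius, circles holding fewer than `min_points` incidents are pruned, and circles whose discs intersect (centre distance at most twice the radius) are merged transitively. CrimeStat's actual STAC search and merge rules may differ.

```python
import math


def stac_sketch(points, spacing, radius, min_points):
    """Scan circles on grid nodes, prune sparse ones, merge intersecting ones.
    Returns the sorted point indices covered by each merged group."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    nodes = [(min(xs) + i * spacing, min(ys) + j * spacing)
             for i in range(int((max(xs) - min(xs)) / spacing) + 1)
             for j in range(int((max(ys) - min(ys)) / spacing) + 1)]

    # 1) count the points in each circle; prune circles below the threshold
    circles = []
    for c in nodes:
        inside = frozenset(k for k, p in enumerate(points) if math.dist(c, p) <= radius)
        if len(inside) >= min_points:
            circles.append((c, inside))

    # 2) merge circles whose discs intersect (centre distance <= 2 * radius)
    groups = []
    for centre, inside in circles:
        hits = [g for g in groups
                if any(math.dist(centre, c) <= 2 * radius for c in g["centres"])]
        merged = {"centres": [centre], "points": set(inside)}
        for g in hits:
            merged["centres"] += g["centres"]
            merged["points"] |= g["points"]
        groups = [g for g in groups if not any(g is h for h in hits)]
        groups.append(merged)
    return sorted(sorted(g["points"]) for g in groups)
```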

27
Access Patterns - Other Modules
  • Other Modules
  • Knox Index
  • Mantel Index

The distance matrix is re-computed for each
simulation (simulated point set).
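The Knox index counts pairs of events that are close in both space and time. The sketch below assumes a Monte Carlo test that permutes event times while keeping locations fixed; under that assumption the spatially close pairs are computed once and reused by every simulation instead of rebuilding a distance matrix per run. If a simulation instead generates new locations, this reuse does not apply. The thresholds, simulation count and function names are hypothetical.

```python
import math
import random


def knox_statistic(close_in_space, times, time_window):
    """Knox count: spatially close pairs that are also close in time."""
    return sum(1 for i, j in close_in_space if abs(times[i] - times[j]) <= time_window)


def knox_test(points, times, space_window, time_window, simulations=99, seed=0):
    """Monte Carlo Knox test that shuffles event times between locations."""
    n = len(points)
    close = [(i, j) for i in range(n) for j in range(i + 1, n)
             if math.dist(points[i], points[j]) <= space_window]  # computed once
    observed = knox_statistic(close, times, time_window)
    rng = random.Random(seed)
    shuffled = list(times)
    exceed = 0
    for _ in range(simulations):
        rng.shuffle(shuffled)  # only times move; the spatial pairs are reused
        if knox_statistic(close, shuffled, time_window) >= observed:
            exceed += 1
    p_value = (exceed + 1) / (simulations + 1)  # Monte Carlo pseudo p-value
    return observed, p_value
```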
28
Journey To Crime
29
Journey to Crime
  • Input
  • A list of incidents committed by a serial
    offender.
  • A travel decay function.
  • A reference grid.
  • Origins and destinations of offenders (case 2).
  • Output
  • The origin of the crimes (the offender's home)
  • Crime trips (case 2)
  • Observation
  • O(number of grid cells x number of incidents)
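A sketch of the grid scoring behind the O(grid cells x incidents) observation: each reference-grid cell's score is the sum of a travel-decay function of its distance to every incident, and the highest-scoring cell is taken as the estimated origin. The negative-exponential decay and its parameters are hypothetical placeholders, and the Bayesian (case 2) origin-destination step is not shown.

```python
import math


def journey_to_crime(incidents, grid_cells, decay):
    """Score every grid cell by summing decay(distance) over all incidents.
    Returns (best cell, scores); cost is O(cells x incidents)."""
    scores = {cell: sum(decay(math.dist(cell, inc)) for inc in incidents)
              for cell in grid_cells}
    return max(scores, key=scores.get), scores


def negative_exponential(d, a=1.0, b=0.5):
    """Hypothetical decay function: a * exp(-b * d)."""
    return a * math.exp(-b * d)
```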

30
Courtesy Ned Levine and Associates
31
Crime Travel Demand
  • Given the origins and destinations of offenders,
    generate trips.
  • Assign origins and destinations to zones (e.g.,
    TAZ, census blocks).
  • Predict trips based on various demand models.
  • All data points of the primary and secondary files
    are accessed.
  • Distance computations are O(N.N).

32
Network Assignment
  • Given
  • A set of Crime Trips.
  • A Transportation Network
  • Find
  • The actual route on the transportation network
  • Constraints
  • Routes are weighted by both distance and time

Computation: an expensive join between the Euclidean
crime trips and the network, based on the constraints.
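A sketch of routing one crime trip over the network with Dijkstra's algorithm, assuming each directed edge stores (length, travel time) and that the combined cost is a weighted sum `alpha * length + beta * time`. The weighting scheme and names are assumptions for illustration; repeating this search for every trip is part of what makes the join expensive.

```python
import heapq


def shortest_route(graph, origin, destination, alpha=1.0, beta=1.0):
    """Dijkstra over graph[node] = [(neighbour, length, travel_time), ...].

    Edge cost = alpha * length + beta * travel_time (hypothetical weights).
    Returns (cost, node path), or (inf, []) if the destination is unreachable.
    """
    best = {origin: 0.0}
    previous = {}
    heap = [(0.0, origin)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == destination:
            path = [node]
            while node in previous:  # walk back to the origin
                node = previous[node]
                path.append(node)
            return cost, path[::-1]
        if cost > best[node]:
            continue  # stale heap entry superseded by a cheaper one
        for nxt, length, time in graph.get(node, ()):
            new_cost = cost + alpha * length + beta * time
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                previous[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    return float("inf"), []
```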
33
Courtesy Ned Levine and Associates
34
Courtesy Ned Levine and Associates
35
Courtesy Ned Levine and Associates
36
Problem Definition
  • Given
  • A set of N primary points
  • A set of M secondary points
  • TAZs or census blocks
  • A transportation network
  • User-defined parameters
  • Request for a particular task based on spatial
    proximity.
  • Find
  • Proximity measure in terms of distance.
  • Objective
  • Define a suitable data structure for storage of
    input data.
  • Define a suitable Hierarchical Index.
  • Define a suitable Join (and Hierarchical Join
    Index) between different spatial sets.
  • Define an appropriate storage representation on
    disk.
  • Constraints
  • Find all requested proximity measures at a
    lower computational cost.

37
Related Work
  • Naïve approach to distance calculation
  • Lazy approach to distance calculation.
  • R Tree based Index
  • Hierarchical Join Index
  • Hierarchical Voronoi based Index
  • Useful only for Euclidean space.
  • Networks also need to be stored and accessed
    separately.
  • Costly joins need to be computed.

R Tree Index Structure
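A minimal sketch of the lazy approach, assuming Euclidean coordinates: a distance is computed only on first request and then served from a cache keyed by the unordered pair. The class name and counter are hypothetical. Compared with the naive approach, the saving depends on how many pairs a module actually touches.

```python
import math


class LazyDistances:
    """Compute a pairwise distance on first request, then reuse it."""

    def __init__(self, points):
        self.points = points
        self.cache = {}
        self.computed = 0  # number of distances actually evaluated

    def get(self, i, j):
        key = (i, j) if i < j else (j, i)  # d(i, j) == d(j, i): store once
        if key not in self.cache:
            self.cache[key] = math.dist(self.points[i], self.points[j])
            self.computed += 1
        return self.cache[key]
```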
38
Proposed Approach - Voronoi Diagram
  1. Based on the spatial proximity of points.
  2. O(n log n) to calculate the diagram.
  3. Distances to nearest neighbors are stored during
    construction.
  4. A hierarchical index based on the Voronoi diagram
    is to be constructed.
  5. Voronoi joins for Euclidean space and networks.

39
Proposed Approach - Point to Point and Point to
Polygon Assignment (TAZ Approximation)
[Figure: incident points (in P) and TAZ centroids]
40
K-Order Nearest Neighbor Calculation
  • For every edge in the Voronoi diagram:
  • The sites split by the edge are known (from the
    Voronoi computation).
  • There are O(N) edges (for N points).
  • Traversing an edge gives the distance between a
    pair of sites, which is stored.
  • Points are kept in an ordered bucket.
  • For each point, quicksort all its distances.
  • Also store the neighbor connected by each
    distance.

[Figure: query point, all its k-order nearest neighbors, median center of a polygon (TAZ), and incident points (belonging to set P)]
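A small-scale sketch of the neighbor buckets described above. Instead of an O(n log n) Voronoi construction, it finds Delaunay edges by brute force (keep each triangle whose circumcircle contains no other point), which is O(n^4) and meant only for illustration; each Delaunay edge joins the two sites split by one Voronoi edge. It assumes points in general position (no four cocircular). The nearest neighbor of every point is guaranteed to be one of its Voronoi neighbors, so the first entry of each sorted bucket is the 1st-order nearest neighbor; higher orders are not guaranteed to appear in this one-ring list and would need a wider search. All names are hypothetical.

```python
import itertools
import math


def _in_circumcircle(a, b, c, p):
    """True if p lies strictly inside the circumcircle of triangle abc."""
    ax, ay = a[0] - p[0], a[1] - p[1]
    bx, by = b[0] - p[0], b[1] - p[1]
    cx, cy = c[0] - p[0], c[1] - p[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    orient = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return det * orient > 0  # det's sign flips with the triangle's orientation


def delaunay_edges(points):
    """Brute-force Delaunay edges (i < j) from all empty-circumcircle triangles."""
    edges = set()
    for i, j, k in itertools.combinations(range(len(points)), 3):
        a, b, c = points[i], points[j], points[k]
        if (b[0] - a[0]) * (c[1] - a[1]) == (b[1] - a[1]) * (c[0] - a[0]):
            continue  # collinear triple: not a triangle
        if not any(_in_circumcircle(a, b, c, points[m])
                   for m in range(len(points)) if m not in (i, j, k)):
            edges.update({(i, j), (i, k), (j, k)})
    return edges


def voronoi_neighbor_lists(points):
    """Store each edge's distance at both endpoints, then sort every bucket."""
    buckets = {i: [] for i in range(len(points))}
    for i, j in delaunay_edges(points):
        d = math.dist(points[i], points[j])
        buckets[i].append((d, j))
        buckets[j].append((d, i))
    for lst in buckets.values():
        lst.sort()  # (distance, neighbour) pairs, nearest first
    return buckets
```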
41
Proposed Approach - NNH
  • Use of an existing algorithm, AMOEBA, by
    Estivill-Castro and Lee [2].
  • O(n log n), where n is the number of points.
  • Uses the Delaunay triangulation.

42
Proposed Approach - Network Assignment
  • Use of a Network Voronoi Diagram (Graf and
    Winter [1]).
  • Computation of the Voronoi diagram of the
    Euclidean crime trips (origin and destination
    points).
  • Computing a join between the two.
  • These computations might be expensive.
  • Euclidean Voronoi diagrams have already been
    calculated and stored.
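A sketch of a node-based network Voronoi diagram using multi-source Dijkstra: every generator (for example, a trip origin snapped to a network node) starts at distance 0, and each node is claimed by the generator whose search reaches it first. This is an assumed, simplified construction (edges are not split at their midpoints), not necessarily Graf and Winter's method, and the join with the Euclidean Voronoi diagram is not shown. The graph format and names are hypothetical.

```python
import heapq


def network_voronoi(graph, generators):
    """Assign every reachable node to its nearest generator by network distance.

    graph[node] = [(neighbour, edge_length), ...]. Returns (owner, dist).
    """
    dist = {}
    owner = {}
    heap = [(0.0, g, g) for g in generators]  # (distance, node, generator)
    heapq.heapify(heap)
    while heap:
        d, node, gen = heapq.heappop(heap)
        if node in dist:
            continue  # already settled by a closer generator
        dist[node], owner[node] = d, gen
        for nxt, length in graph.get(node, ()):
            if nxt not in dist:
                heapq.heappush(heap, (d + length, nxt, gen))
    return owner, dist
```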

43
Proposed Approach - Spatial Indexing
Hierarchical Voronoi Indexing [3]
  • Efficient paging mechanism
  • Spatial proximity is very useful.
  • Extensible even to networks.

44
Challenges
  • When to compute the voronoi diagrams?
  • Repeated computations might be costly.
  • Need to store the computed distances, voronoi
    partitions, intermediate results in a file.
  • Need for a hierarchical index for efficient access
    to points and Voronoi seeds.
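One possible on-disk layout for the stored distances, sketched under assumptions: only the n(n-1)/2 upper-triangle values are written, as little-endian 64-bit floats in row-major pair order, so any pair's offset can be computed directly and read with a single seek. The slides do not specify a file format; the layout and function names here are hypothetical.

```python
import math
import struct

RECORD = struct.Struct("<d")  # one little-endian float64 per distance


def condensed_index(i, j, n):
    """Position of pair (i, j), i != j, in row-major upper-triangle order."""
    if i > j:
        i, j = j, i
    return i * n - i * (i + 1) // 2 + (j - i - 1)


def write_distances(path, points):
    """Write the n*(n-1)/2 pairwise Euclidean distances once, in condensed order."""
    n = len(points)
    with open(path, "wb") as f:
        for i in range(n):
            for j in range(i + 1, n):
                f.write(RECORD.pack(math.dist(points[i], points[j])))


def read_distance(path, i, j, n):
    """Fetch one stored distance with a single seek, without loading the file."""
    with open(path, "rb") as f:
        f.seek(condensed_index(i, j, n) * RECORD.size)
        return RECORD.unpack(f.read(RECORD.size))[0]
```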

45
References
  • 1. Graf, M., and Winter, S. (2003).
    Netzwerk-Voronoi-Diagramme. Österreichische
    Zeitschrift für Vermessung und Geoinformation,
    91(3), 166-174. (Network Voronoi Diagrams,
    English translation)
  • 2. Estivill-Castro, V., and Lee, I. (2000). AMOEBA:
    Hierarchical Clustering Based on Spatial
    Proximity Using Delaunay Diagram. Proceedings of
    the 9th International Symposium on Spatial Data
    Handling (SDH 2000), Beijing, China.
  • 3. Gold, C., and Angel, P. (2006). Voronoi
    Hierarchies. Lecture Notes in Computer Science,
    pp. 99-111. Springer-Verlag Berlin Heidelberg.

46
Thank You