ANALYSIS OF POINT PATTERNS

1
CHAPTER IV
  • ANALYSIS OF POINT PATTERNS

2
OUTLINE (Last Week): GENERAL CONCEPTS IN SPATIAL
DATA ANALYSIS
  • 3.1. Introduction
  • 3.2. Visualizing Spatial Data
  • 3.3. Exploring Spatial Data
  • 3.3.1. Distinction between visualizing and
    exploring spatial data
  • 3.3.2. Distinction between exploring and
    modeling spatial data
  • 3.4. Modeling Spatial Data
  • 3.5. Practical Problems of Spatial Data Analysis
  • 3.6. Computers and Spatial Data Analysis
  • 3.6.1. Methods of coupling GIS and spatial
    data analysis

3
OUTLINE: ANALYSIS OF POINT PATTERNS
  • 4.1. Introduction
  • 4.2. Case Studies
  • 4.3. Visualizing Spatial Point Patterns
  • 4.4. Exploring Spatial Point Patterns
  • 4.4.1. Quadrat Methods
  • 4.4.2. Kernel Estimation
  • 4.4.3. Nearest Neighbor Distance
  • 4.4.4. The K Function

4
4.1. Introduction
  • This chapter investigates methods for the
    analysis of a set of point locations, which is
    often referred to as a point pattern.
  • A spatial point process is any stochastic
    mechanism that generates a countable set of
    events (s_i) in a plane.

5
Basic Definitions
  • Event: The location of an observed occurrence of
    the spatial phenomenon, differentiated from other
    arbitrary locations in the study region.
  • Mapped point pattern: All relevant events in a
    study area R have been recorded.
  • Point: Arbitrary locations, or locations other
    than events.
  • Sampled point pattern: Events are recorded from a
    sample of different areas.

6
4.1. Introduction
  • Objectives
  • To determine if there is a tendency for points to
    exhibit a systematic pattern (i.e. some form of
    regularity or clustering)
  • If there is a systematic pattern, then to examine
    at what spatial scale this pattern occurs and
    whether particular clusters are associated with
    proximity to particular sources of some factors.
  • To estimate how the intensity of points varies
    across the study region
  • To seek models to account for observed point
    patterns

7
4.1. Introduction
  • Analysis Approach
  • Events may have attributes which can be used to
    distinguish types but it is the location
    pattern that is analyzed
  • Patterns in event locations are the focus
  • Stochastic aspect is where events are likely to
    occur
  • Does a pattern exhibit clustering or regularity?
  • Over what spatial scales do patterns exist?

8
E.g. Such methods are relevant to the study of
patterns of occurrence of:
  • Diseases
  • Crime types
  • Earthquake epicenters
  • Plant distributions
  • Etc.
  • A point pattern is a simple example of spatial
    data, since the data contain only the
    coordinates of events. However, this does not
    mean that the analysis is any easier than for
    other spatial data types. In fact, from a
    statistical perspective, point patterns can in
    some ways be mathematically more complex to
    handle.

9
Usually data in point pattern analysis comprise
  • Locations (coordinates)
  • Attributes (tree types, crime type, date of
    disease notification, etc.)
  • A point pattern is a data set consisting of a
    series of point locations (s_1, s_2, ...) in some
    study region R at which events of interest have
    occurred.

10
Basic Assumptions
  • 1. The data represent a complete set of events in
    the study region R, which is called a mapped point
    pattern, i.e. all relevant events that occurred in
    R have been recorded.
  • Remark: Some point pattern analyses are
    directed towards extracting limited information
    about a point process by recording events in a
    sample of different areas of the whole region,
    which is called a sampled point pattern.
  • E.g. Field studies in forestry, ecology or
    biology, where complete enumeration is not
    feasible.

11
Basic Assumptions
  • 2. The study region R might be of any arbitrary
    shape. However, some of the methods can be applied
    only to regions which are square or rectangular.
  • 3. In order to eliminate edge effects, a suitable
    guard area is left between the perimeter of the
    original study region and the sub-region within
    which analysis is performed.
  • 4. In all cases, the final area selected for
    study is assumed to be in some sense
    representative of any larger region from which it
    has been selected.

12
From a statistical point of view, a spatial point
pattern can be thought of in terms of the number of
events occurring in arbitrary sub-regions or areas, A,
of the whole study region R.
A spatial point process is defined by

$$\{\, Y(A),\ A \subseteq R \,\}$$

  • Where
  • Y(A) is the number of events occurring in the
    area A.

13
First-Order Properties of Point Patterns
  • First-order properties are described in terms of
    the intensity, λ(s), of the process, which is the
    mean number of events per unit area at the point
    s.
  • Mathematically, λ(s) is defined by

$$\lambda(s) = \lim_{ds \to 0} \left\{ \frac{E(Y(ds))}{ds} \right\}$$

where ds is a small region around the point s and,
in the denominator, the area of this region.
14
For a stationary process λ(s) is constant over R,
and is written simply as λ.
  • Then

$$E(Y(A)) = \lambda a$$

where a is the area of A.
15
Second-Order Properties of Point Patterns
  • Second-order properties relate to spatial
    dependence and involve the relationship between
    numbers of events in pairs of areas in R. This
    can be formally defined as the second-order
    intensity, γ(s_i, s_j), of the process.

Mathematically, γ(s_i, s_j) is defined by

$$\gamma(s_i, s_j) = \lim_{ds_i,\, ds_j \to 0} \left\{ \frac{E\big(Y(ds_i)\, Y(ds_j)\big)}{ds_i\, ds_j} \right\}$$
16
For a stationary process
$\gamma(s_i, s_j) = \gamma(s_i - s_j) = \gamma(\mathbf{h})$,
i.e. the second-order intensity
depends on the vector difference $\mathbf{h}$ (direction
and distance) between s_i and s_j, not on their
absolute locations.
For an isotropic process $\gamma(s_i, s_j) = \gamma(h)$,
i.e. the dependence is purely a function of the
length, h, of the vector $\mathbf{h}$ and not of its
orientation. In other words, dependence is purely
a function of the distance between s_i and s_j, not
the direction.
17
4.2. Case Studies
  • The following cases will be of concern when
    studying point patterns.
  • The locations of craters in a volcanic field in
    Uganda
  • The locations of granite tors in Bodmin Moor
  • The locations of redwood seedlings in a forest
  • The locations of centers of biological cells in a
    section of tissue
  • The locations of the homes of juvenile offenders
    on a Cardiff estate
  • Locations of theft from property offences in
    Oklahoma City
  • Locations of cases of cancer of the larynx and
    lung in part of Lancashire
  • Locations of Burkitt's lymphoma in an area of
    Uganda

18
1. The locations of craters in a volcanic field
in Uganda
  • The data set involves the locations of centers of
    craters of 120 volcanoes in the Bunyaruguru
    volcanic field in west Uganda. A map of the
    distribution shows a broad regional trend in a
    north-easterly direction, representing elongation
    along a major fault.

19
The purposes of studying this case
  • To obtain a smooth map of such broad regional
    variation.
  • To explore and model the distribution of
    craters at a smaller scale.
  • To answer the following questions
  • Is the distribution random within the study
    region?
  • Is there evidence of clustering or regularity?
  • To test the following hypothesis
  • It is expected that rift faults would guide
    volcanic activity to the surface, along fractures
    or lines of weakness. The hypothesis is to test
    whether this holds true.

20
2. The locations of granite tors in Bodmin Moor
  • There are 35 locations of granite tors and on a
    large scale there is clear spatial patterning.
  • The purposes of studying this case
  • To detect any evidence of departures from
    randomness at smaller scales.
  • To find if the regularity in the distribution
    is valid for only small distances.
  • To determine if the spatial distribution shows
    other patterning at slightly longer distances.

21
3. The locations of redwood seedlings in a forest
  • There are 62 redwood seedlings distributed in a
    square region of 23 m².
  • The purposes of studying this case
  • To see some evidence of clustering around
    existing parent trees.

22
4. The locations of centers of biological cells
in a section of tissue
  • There are centers of 42 biological cells in a
    section of tissue.
  • The purposes of studying this case
  • To know whether there is evidence for departures
    from randomness in such data.
  • To answer the following question
  • Are such cells clustered or regular?

23
5. The locations of the homes of juvenile
offenders on a Cardiff estate
  • The data were recorded in 1971. The purposes of
    studying this case
  • To know whether the distribution of homes of
    juvenile offenders exhibits some regularity
    or clustering.
  • To explore the locations of homes of juvenile
    offenders

24
6. Locations of theft from property offences in
Oklahoma City
  • The data are taken from research done on crime in
    Oklahoma City in the late 1970s and comprise two
    distinct categories of events. One set refers to
    offences committed by whites, the other by
    blacks.

25
  • The purposes of studying this case
  • To see if the spatial patterns of the events
    differ
  • To investigate if the two sub-groups have
    different activity places
  • To answer the following questions
  • Do the crimes committed by different groups
    display different spatial patterns?
  • Are those for one group clustered or aggregated
    in some way, while those for the other group are
    more random?

26
7. Locations of cases of cancer of the larynx and
lung in part of Lancashire
  • The data are for a part of Lancashire in the U.K.
    and were collected over the 10-year period
    1974-83. Lung cancer is quite a common disease
    and there are 917 cases in the study area.
    Larynx cancer is rare and only 57 cases were
    notified during the study period.
  • The purposes of studying this case
  • To investigate the concern of residents living
    near the site of an old industrial waste
    incinerator that their health had been affected by
    exposure to the by-products of the incineration
    process.

27
8. Locations of Burkitt's lymphoma in an area of
Uganda
  • The data comprise information on 188 cases of
    Burkitt's lymphoma (a cancer usually affecting
    the jaw and abdomen, primarily in children) in
    the West Nile district of Uganda for the time
    period of 1961-75.
  • The purposes of studying this case
  • To assess evidence for space-time clustering in
    order to answer the following question
  • Are the cases that are near each other in
    geographic space also near each other in time?
    If so, this might be evidence in support of the
    hypothesis that suggests an infective etiology
    for the disease.

28
4.3. Visualizing Spatial Point Patterns
  • Point patterns are visualized by means of a dot
    map. This gives an initial impression of the
    shape of the study region and any obvious pattern
    present in the distribution of events.
  • Remark: Intuitive ideas about what constitutes
    a random pattern can be misleading. Generally
    it is hard to come to any conclusion purely on
    the basis of a visual analysis.

29
4.3. Visualizing Spatial Point Patterns
30
Figure 4.1. Craters in Uganda
Figure 4.2. Tors on Bodmin Moor
No conclusions are possible from visual inspection
alone.
31
4.3. Visualizing Spatial Point Patterns
  • Visualization Issues
  • Is there an underlying population distribution
    from which events arise in a region?
  • If population varies we would expect events to
    cluster in areas of high population.
  • Are they more or less clustered than we would
    expect on the basis of population alone?
  • Event symbols can be drawn with a size inversely
    proportional to the population density at the
    event location, and the map examined for gaps.

32
4.4. Exploring Spatial Point Patterns
  • The methods of exploration of point patterns are
    divided into two groups
  • Methods concerned with investigating the
    first-order effects
  • Quadrat methods
  • Kernel estimation
  • Methods concerned with investigating the
    second-order effects
  • Nearest neighbor distances
  • The K function

33
4.4.1. Quadrat Methods
  • A simple way of summarizing the pattern in the
    locations of events in some region R is to
    partition R into sub-regions of equal area, or
    quadrats, and to use the counts of the number of
    events in each of the quadrats to summarize the
    spatial pattern (i.e. creating a 2-D histogram
    or frequency distribution of the observed event
    occurrences).
  • How?
  • Impose a regular grid over R
  • Count the number of events falling into each
    grid cell
  • Convert this into an intensity measure by
    dividing by the area of each grid cell
  • Observe the behaviour of intensity over R.
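
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's own procedure: it assumes R is the rectangle [0, width] x [0, height], and the function name `quadrat_intensity` and the sample coordinates are hypothetical.

```python
def quadrat_intensity(events, width, height, nx, ny):
    """Return an ny-by-nx grid of intensity estimates (events per unit area).

    Assumes the study region R is the rectangle [0, width] x [0, height].
    """
    cell_w, cell_h = width / nx, height / ny
    counts = [[0] * nx for _ in range(ny)]
    for x, y in events:
        # Clamp so events lying exactly on the upper edges fall in the last cell.
        col = min(int(x / cell_w), nx - 1)
        row = min(int(y / cell_h), ny - 1)
        counts[row][col] += 1
    cell_area = cell_w * cell_h
    # Count per quadrat divided by quadrat area gives the intensity estimate.
    return [[c / cell_area for c in row] for row in counts]


# Hypothetical events in a 4 x 4 region, summarized on a 2 x 2 grid.
events = [(0.5, 0.5), (1.5, 0.5), (1.6, 0.4), (3.2, 3.9), (0.1, 2.2)]
grid = quadrat_intensity(events, width=4.0, height=4.0, nx=2, ny=2)
print(grid)
```

Each quadrat here has area 4, so a quadrat containing three events gets an estimated intensity of 0.75 events per unit area.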

34
4.4.1. Quadrat Methods
  • Impose a regular grid over R

35
4.4.1. Quadrat Methods
2. Count the number of events falling into each
grid cell
3. Convert this into an intensity measure
by dividing by the area of each grid cell
36
4.4.1. Quadrat Methods
4. Observe the behaviour of intensity over R.
37
The intensity of the process, λ(s), is estimated
within each quadrat by the event count divided by
the quadrat area.
  • Alternatively, the quadrats may be randomly
    scattered in R and all events within each quadrat
    counted to give a crude estimate of how intensity
    varies over R.

38
Problem of Quadrat Methods
  • Basic problem: Although the method gives a global
    idea of sub-regions with high or low intensity, it
    throws away much of the spatial detail in the
    observed pattern. As quadrats are made smaller
    to retain more spatial information, the
    variability of the quadrat counts increases.

E.g. The variance-to-mean ratio (or index of
dispersion) varies depending on the size and
hence the number of quadrats
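
As a rough illustration of how the index of dispersion depends on quadrat size, the sketch below computes it for the same 16 events counted on a coarse and a fine grid. The counts are invented for the example, and `variance_mean_ratio` is a hypothetical helper name. For a completely random pattern the counts are approximately Poisson, so the ratio should be close to 1; values above 1 point towards clustering and values below 1 towards regularity.

```python
def variance_mean_ratio(counts):
    """Index of dispersion: sample variance of quadrat counts over their mean."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return variance / mean


# Hypothetical counts of the same 16 events at two quadrat sizes.
# The fine 4 x 4 grid (listed row by row) aggregates to the coarse 2 x 2 grid.
fine = [4, 3, 1, 0,
        2, 0, 0, 0,
        1, 1, 1, 0,
        1, 0, 1, 1]
coarse = [9, 1, 3, 3]

vmr_coarse = variance_mean_ratio(coarse)
vmr_fine = variance_mean_ratio(fine)
print(vmr_coarse, vmr_fine)
```

With these made-up counts the ratio is 3.0 on the coarse grid and about 1.33 on the fine grid, so the apparent strength of clustering changes with the quadrat size even though the events are the same.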
39
Problems of the Moving Window Approach
Solution: Counts per unit area in a moving window
can be used instead of fixed quadrats. A suitable
window is defined and moved over a fine grid of
locations in R. The intensity at each grid point
is estimated from the event count per unit area
of the window centered at that point. This
produces a spatially smoother estimate of the
way in which λ(s) is varying. However, this
approach has two problems:
  1. No account is taken of the relative location of
    events within the particular window
  2. It is difficult to decide the size of the window

40
4.4.1. Quadrat Methods
  • A window is moved over a grid of points in R.
  • What should be the size of the window?

41
4.4.2. Kernel Estimation
  • It was originally developed to obtain a smooth
    estimate of a univariate or multivariate
    probability density from an observed sample of
    observations (i.e. a smooth histogram). Estimating
    the intensity of a spatial point pattern is very
    like estimating a bivariate probability density.
  • If s represents a general location in R and s_1,
    ..., s_n are the locations of n observed events,
    then the intensity, λ(s), at s is estimated by

$$\hat{\lambda}_\tau(s) = \frac{1}{\delta_\tau(s)} \sum_{i=1}^{n} \frac{1}{\tau^2}\, k\!\left(\frac{s - s_i}{\tau}\right)$$

where k( ) is the kernel, τ is the bandwidth, and
δ_τ(s) is the edge correction factor.
42
  • Kernel: It is a suitably chosen bivariate
    probability density function, which is symmetric
    about the origin.

Bandwidth: It determines the amount of smoothing.
It is the radius of a disc centered on s within
which points s_i will contribute significantly to
$\hat{\lambda}_\tau(s)$. Note that τ > 0.
Edge correction factor: It is the volume under
the scaled kernel centered on s which lies inside
R.
43
  • For any chosen kernel and bandwidth, values of
    $\hat{\lambda}_\tau(s)$ can be estimated at locations
    on a suitably chosen fine grid over R to provide a
    useful visual indication of the variation in the
    intensity over the study region.
  • Most of the time, for any reasonable choice of
    probability distribution for k( ), the kernel
    estimate $\hat{\lambda}_\tau(s)$ will be very similar
    for a given bandwidth τ. A typical choice of k( )
    might be the quadratic kernel

$$k(u) = \begin{cases} \dfrac{3}{\pi}\,(1 - u^{T}u)^{2} & \text{for } u^{T}u \le 1 \\ 0 & \text{otherwise} \end{cases}$$

44
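When the above kernel is used, ignoring the edge
correction factor, $\hat{\lambda}_\tau(s)$ takes the
following form

$$\hat{\lambda}_\tau(s) = \sum_{h_i \le \tau} \frac{3}{\pi \tau^{2}} \left(1 - \frac{h_i^{2}}{\tau^{2}}\right)^{2}$$

where h_i is the distance between the point s and the
observed event location s_i.
Remark: The summation is only over those values of
h_i which do not exceed τ.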
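
The quadratic kernel estimate without edge correction, in which each event within distance τ of s contributes 3/(πτ²) · (1 − h_i²/τ²)², can be sketched as follows. The function name `quadratic_kernel_intensity` and the sample events are hypothetical, and the edge correction factor is deliberately omitted.

```python
import math


def quadratic_kernel_intensity(s, events, tau):
    """Quadratic-kernel intensity estimate at location s, no edge correction."""
    sx, sy = s
    total = 0.0
    for ex, ey in events:
        h = math.hypot(ex - sx, ey - sy)  # distance h_i from s to event s_i
        if h <= tau:                      # only events inside the bandwidth count
            total += 3.0 / (math.pi * tau ** 2) * (1.0 - h ** 2 / tau ** 2) ** 2
    return total


events = [(0.0, 0.0), (0.5, 0.0), (3.0, 0.0)]
at_origin = quadratic_kernel_intensity((0.0, 0.0), events, tau=1.0)
print(at_origin)
```

With τ = 1 the event at the origin contributes 3/π, the event at distance 0.5 contributes 3/π × 0.5625, and the event at distance 3 contributes nothing. Evaluating the function over a fine grid of s values would give the intensity surface described above.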
45
Figure 4.3. Kernel estimation of a point pattern
The region of influence within which observed
events contribute to $\hat{\lambda}_\tau(s)$ is determined by
the circle with radius τ centered on s.
46
Figure 4.4. Slice through a quadratic kernel
  • From a visual point of view, kernel estimation
    can be thought of as a 3-D floating function
    visiting each point s on a fine grid of locations
    in R. Distances to each observed event s_i lying
    in the region of influence are measured and
    contribute to the intensity estimate according to
    how close they are to s.

47
The kernel function visits each s point. Events
within the bandwidth contribute to the intensity
based on weighting of kernel at that distance
48
The effect of bandwidth on kernel estimate
  1. For large τ, $\hat{\lambda}_\tau(s)$ will appear flat
    and local features will be obscured.
  2. If τ is small then $\hat{\lambda}_\tau(s)$ tends to
    become a collection of spikes centered on the s_i.

Changing the bandwidth allows you to look at the
variation in intensity at different scales. For
exploratory purposes it is useful to test various
bandwidths to examine the change in intensity at
different scales
49
The effect of bandwidth on kernel estimate
50
Figure 4.5. Kernel estimates of intensity of
volcanic craters (τ = (a) 100, (b) 220, (c) 500)
51
A rough choice for τ has been suggested as

$$\tau = 0.68\, n^{-0.2}$$

  • for estimating the intensity, when R is the unit
    square and n is the number of observed events in
    R.
  • In order to avoid too much smoothing and not to
    obscure details in dense areas, local adjustment
    of the bandwidth may be applied, which is called
    adaptive kernel estimation. In this method τ
    is replaced by τ(s_i), which is some function of
    the presence of events in the neighborhood of s_i.
    Ignoring the edge effects, the estimate will be

$$\hat{\lambda}_\tau(s) = \sum_{i=1}^{n} \frac{1}{\tau^{2}(s_i)}\, k\!\left(\frac{s - s_i}{\tau(s_i)}\right)$$

52
One practical method for specifying τ(s_i) is
  1. Perform non-adaptive kernel estimation with some
    reasonable bandwidth τ_0 and obtain a pilot
    estimate $\tilde{\lambda}(s)$.
  2. Compute the geometric mean, $\tilde{\lambda}_g$, of
    the pilot estimates at each s_i (nth root of their
    product).
  3. Formulate the adaptive bandwidths as

$$\tau(s_i) = \tau_0 \left( \frac{\tilde{\lambda}_g}{\tilde{\lambda}(s_i)} \right)^{\alpha}$$

53
  • Where α is the sensitivity parameter and
    0 ≤ α ≤ 1
  • If α = 0 → No local adjustment of τ
  • If α = 1 → Maximum local adjustment
  • The choice of α = 0.5 is found to be reasonable
    in practice.
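
The three-step recipe for adaptive bandwidths can be sketched as below, assuming the adaptive bandwidth takes the form τ(s_i) = τ_0 · (geometric mean of pilot estimates / pilot estimate at s_i)^α. The pilot values are invented for illustration and `adaptive_bandwidths` is a hypothetical name; in practice the pilot estimates would come from a fixed-bandwidth kernel estimate evaluated at each event.

```python
import math


def adaptive_bandwidths(pilot, tau0, alpha=0.5):
    """Return one bandwidth per event from pilot intensity estimates.

    pilot: strictly positive pilot estimates at each event s_i
    tau0:  the fixed bandwidth used for the pilot estimate
    alpha: sensitivity parameter, 0 <= alpha <= 1
    """
    # Geometric mean computed via logs to avoid overflow in a long product.
    log_mean = sum(math.log(p) for p in pilot) / len(pilot)
    geo_mean = math.exp(log_mean)
    return [tau0 * (geo_mean / p) ** alpha for p in pilot]


pilot = [1.0, 4.0]  # hypothetical pilot estimates: a sparse and a dense location
bands = adaptive_bandwidths(pilot, tau0=2.0, alpha=0.5)
print(bands)
```

The event in the dense area (pilot estimate 4.0) receives a smaller bandwidth than τ_0 and the event in the sparse area a larger one, which is the intended behaviour: less smoothing where events are plentiful and more where they are scarce. Setting alpha to 0 returns τ_0 for every event.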

55
4.4.3. Nearest Neighbor Distance
  • This method is designed for investigating the
    second order properties of the spatial point
    process and focuses on the relationship between
    inter-event distances. In this method the
    nearest neighbor event-event distance (W) and the
    nearest neighbor point-event distance (X) will
    constitute the basic area of interest.
  • W: The distance between a randomly selected event
    in the study region and the nearest neighboring
    event.
  • X: The distance between a randomly selected point
    in the study region and the nearest neighboring
    event.
  • W → Mapped point pattern
  • X → Sampled point pattern

56
4.4.3. Nearest Neighbor Distance
57
Remark: This method only provides information
about inter-event interactions at a small
physical scale, since by definition it uses only
small inter-event distances.
  • A simple way of summarizing the pattern is to
    estimate the empirical cumulative probability
    distribution function ($\hat{G}(w)$ for W or
    $\hat{F}(x)$ for X).

  • for W

$$\hat{G}(w) = \frac{\#(w_i \le w)}{n}$$

  • for X

$$\hat{F}(x) = \frac{\#(x_i \le x)}{m}$$

where # means "number of", n is the total number of
events in R, and m is the total number of sampled points.
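
The two empirical distribution functions can be sketched directly from their definitions: the proportion of event-event nearest neighbor distances w_i that are at most w, and the proportion of point-event distances x_i that are at most x. No edge correction is applied, and the function names and sample coordinates are hypothetical.

```python
import math


def nearest_distance(p, events, skip_index=None):
    """Distance from p to its nearest event, optionally skipping one event."""
    return min(math.hypot(p[0] - e[0], p[1] - e[1])
               for j, e in enumerate(events) if j != skip_index)


def g_hat(events, w):
    """Proportion of events whose nearest other event is within distance w."""
    dists = [nearest_distance(e, events, skip_index=i)
             for i, e in enumerate(events)]
    return sum(1 for d in dists if d <= w) / len(events)


def f_hat(sample_points, events, x):
    """Proportion of sampled points whose nearest event is within distance x."""
    dists = [nearest_distance(p, events) for p in sample_points]
    return sum(1 for d in dists if d <= x) / len(sample_points)


events = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
sample_points = [(0.0, 0.5), (3.0, 0.0)]
g1 = g_hat(events, 1.0)
f1 = f_hat(sample_points, events, 1.0)
print(g1, f1)
```

Here the event-event nearest neighbor distances are 1, 1 and 4, so two thirds of them are at most 1; the point-event distances are 0.5 and 2, so half of them are at most 1. Evaluating both functions over a range of distances gives the curves that are plotted and interpreted in the slides.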
58
The resulting $\hat{G}(w)$ or $\hat{F}(x)$ is plotted
against values of w or x. The plot is then examined
in a purely exploratory way for evidence of
inter-event interaction.
Figure 4.6. A typical function of G
59
Interpretations for the plots of $\hat{G}(w)$ or $\hat{F}(x)$
  • If the distribution function ($\hat{G}(w)$ or
    $\hat{F}(x)$) climbs very steeply in the early part of
    its range before flattening out, then the
    indication would be a high observed probability of
    short as opposed to long nearest neighbor
    distances, which suggests clustering.
  • If the distribution function ($\hat{G}(w)$ or
    $\hat{F}(x)$) climbs very steeply in the later part of
    its range, then the suggestion might be one of
    inter-event regularity.

60
A late, sharply rising function could indicate a
regular pattern (repulsion).
An early, sharply rising function could indicate
clustering (inter-event interaction).
61
Note that the function climbs rapidly at distances
between 50 and 150 m. This implies that there are
relatively many short event-event distances
(i.e. it gives an impression of local
clustering in the data).
Figure 4.6. Nearest neighbor distribution
function for volcanic craters
62
Another alternative would be to plot $\hat{G}(w)$
against $\hat{F}(x)$.
  • If there is no interaction then these two
    distributions should be very similar and a roughly
    straight line is expected in the plot.
  • In the case of positive interaction or
    clustering, the point-event distances (x_i) will
    tend to be large relative to event-event
    distances (w_i). Hence $\hat{G}$ will have higher
    values than $\hat{F}$. The reverse holds for a
    regular pattern.

64
Corrections for Edge Effects
For events near the boundary, the nearest event may
be located outside R, so the distance to the nearest
event is unknown. If the nearest neighbor is
taken to be the closest event within the study
area, expected nearest neighbor distances will be
greater for events located near the boundary than
for events located near the center of the study
region. Thus estimates based on nearest neighbor
statistics will be biased unless some edge
correction is applied.
65
There are several ways of handling edge effects
such as
  • 1. The problem can be overcome by constructing a
    guard area inside the perimeter of R. The
    nearest neighbor distances are not used for
    events within the guard area. But events in the
    guard area are allowed as neighbors of any event
    from the rest of R.
  • 2. Another approach to the problem, called
    toroidal edge correction, can be employed when the
    study region is rectangular. The study region is
    regarded as the central region of a 3 × 3 grid of
    rectangular regions, each identical to the study
    region, i.e. the top of the study region is
    assumed to be joined to the bottom and the left to
    the right. Events in the copies are allowed to be
    neighbors of any events (points) which are
    selected in the study region.
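
Toroidal edge correction does not require the eight copies to be built explicitly. A common shortcut, sketched here under the assumption that R is the rectangle [0, width] x [0, height], is to wrap each coordinate difference around the region, which gives the same result as taking the minimum distance over the 3 × 3 grid of copies. The function name `toroidal_distance` is hypothetical.

```python
import math


def toroidal_distance(p, q, width, height):
    """Shortest distance between p and q when opposite edges are joined."""
    dx = abs(p[0] - q[0])
    dy = abs(p[1] - q[1])
    # Going "around" the torus is shorter when the direct gap exceeds half the side.
    dx = min(dx, width - dx)
    dy = min(dy, height - dy)
    return math.hypot(dx, dy)


# Two events close to opposite edges of a 10 x 10 region.
wrapped = toroidal_distance((0.5, 5.0), (9.5, 5.0), 10.0, 10.0)
interior = toroidal_distance((4.0, 5.0), (6.0, 5.0), 10.0, 10.0)
print(wrapped, interior)
```

The two events near opposite edges are 9 units apart in the plane but only 1 unit apart on the torus, while distances between interior events are unchanged.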

66
  • With edge correction, $\hat{G}(w)$ can be
    approximately estimated as

$$\hat{G}(w) = \frac{\#(b_i > w,\ w_i \le w)}{\#(b_i > w)}$$

  • Where
  • b_i is the distance from event i to the nearest
    point on the boundary of R. This effectively
    ignores w_i values for events close to the
    boundary.
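
A sketch of this boundary-distance correction follows: only events whose distance b_i to the boundary exceeds w are counted, and among those the proportion with nearest neighbor distance w_i at most w is reported. It assumes a rectangular region [0, width] x [0, height], so that b_i is simply the distance to the nearest side; the function name and sample events are hypothetical.

```python
import math


def g_hat_edge_corrected(events, w, width, height):
    """Edge-corrected G estimate at distance w for a rectangular region."""
    n = len(events)
    # Nearest neighbor event-event distance w_i for every event.
    nn = [min(math.hypot(events[i][0] - events[j][0],
                         events[i][1] - events[j][1])
              for j in range(n) if j != i)
          for i in range(n)]
    # Distance b_i from each event to the nearest side of the rectangle.
    b = [min(x, width - x, y, height - y) for x, y in events]
    eligible = [i for i in range(n) if b[i] > w]
    if not eligible:
        return None  # no event is far enough from the boundary at this w
    return sum(1 for i in eligible if nn[i] <= w) / len(eligible)


events = [(5, 5), (5, 6), (0.5, 5), (9, 9), (3, 3)]
g2 = g_hat_edge_corrected(events, w=2, width=10, height=10)
print(g2)
```

At w = 2 only three of the five events lie more than 2 units from the boundary, and two of those have a neighbor within 2 units, giving 2/3. The uncorrected estimate for the same data would be 2/5.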

68
4.4.4. The K function
  • The nearest neighbor distance method uses
    distances only to the closest events and therefore
    only considers the smallest scales of pattern.
    Information on larger scales of pattern is
    ignored.
  • An alternative approach is to use an estimate of
    the reduced second moment measure or K function
    of the observed process, which provides a more
    effective summary of spatial dependence over a
    wider range of scales.

69
Properties of the K function
  1. The K function represents information at various
    scales of pattern.
  2. It involves use of precise location of events and
    includes all event-event distances, not just
    nearest neighbor distances.
  3. The theoretical form of K(h) is known for
    various possible spatial point pattern models, so
    it can be used not only to explore spatial
    dependence but also to suggest specific models to
    represent it and to estimate the parameters of
    such models.

70
4.4.4. The K function
  • Remark: When examining spatial dependence over
    small scales in R, an implicit assumption is
    made that the process is isotropic over such
    scales.
  • However, second order properties are not
    necessarily constant over the considered scale
    and may be confused with first order effects.
  • E.g. If it is clear that there is large scale
    variation in the intensity of a given point
    pattern over the whole of R, this is truly a first
    order effect, not a result of spatial dependence.
    In this case it is convenient to study second
    order effects over scales in R small enough for
    the assumption of isotropy to hold.
  • If there is no variation in the intensity, it is
    appropriate to study the second order effects
    over larger scales in the study region.

71
4.4.4. The K function
  • The K function relates to the second order
    properties of an isotropic process. However, if
    it is used in a situation where there are large
    scale first order effects, then any spatial
    dependence it may indicate could be due to first
    order effects rather than to interaction effects.
    In such a case, it is better to examine smaller
    sub regions of R, since isotropy can reasonably
    be assumed to hold.

72
4.4.4. The K function
  • The K function is defined by
  • λK(h) = E(#(events within distance h of an
    arbitrary event))
  • Where
  • # means "number of"
  • E( ) is the expectation operator
  • λ is the intensity (mean number of events per
    unit area)

73
4.4.4. The K function
74
4.4.4. The K function
  • The practical value of K(h) as a summary measure
    of second order effects is that it is feasible to
    obtain a direct estimate of it, $\hat{K}(h)$, from
    an observed point pattern.
  • How?
  • If A is the area of R, then the expected number
    of events in R is λA.
  • The expected number of pairs of events a distance
    at most h apart is λ²AK(h).
  • If d_ij is the distance between the ith and jth
    observed events in R and I_h(d_ij) is an indicator
    function which is 1 if d_ij ≤ h and 0
    otherwise, then the observed number of pairs is
    $\sum_{i \ne j} I_h(d_{ij})$, and a suitable
    estimate of K(h) is

$$\hat{K}(h) = \frac{1}{\lambda^{2} A} \sum_{i \ne j} I_h(d_{ij})$$

75
4.4.4. The K function
  • The summation above excludes pairs of events for
    which the second event is outside R. Therefore,
    the above equation should be corrected for edge
    effects.
  • Consider a circle centered on event i, passing
    through the point j, and let w_ij be the
    proportion of the circumference of this circle
    which lies within R. Then w_ij is effectively the
    conditional probability that an event is observed
    in R, given that it is a distance d_ij from the
    ith event. Thus the edge corrected estimator for
    K(h) is

$$\hat{K}(h) = \frac{1}{\lambda^{2} A} \sum_{i \ne j} \frac{I_h(d_{ij})}{w_{ij}}$$

  • When the unknown λ is replaced by its estimate,
    $\hat{\lambda} = n/A$, this becomes

$$\hat{K}(h) = \frac{A}{n^{2}} \sum_{i \ne j} \frac{I_h(d_{ij})}{w_{ij}}$$
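
As a minimal sketch, the estimator can be coded with λ replaced by n/A and with every edge weight w_ij set to 1, that is, with no edge correction. This simplification is an assumption made to keep the example short; computing the proportion of each circle's circumference inside R is not shown. The function name `k_hat` and the sample events are hypothetical.

```python
import math


def k_hat(events, h, area):
    """Uncorrected K estimate: (A / n^2) * number of ordered pairs within h."""
    n = len(events)
    pairs = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = math.hypot(events[i][0] - events[j][0],
                           events[i][1] - events[j][1])
            if d <= h:  # indicator I_h(d_ij)
                pairs += 1
    return area * pairs / n ** 2


events = [(0, 0), (1, 0), (0, 1), (5, 5)]
k1 = k_hat(events, h=1.0, area=100.0)
k15 = k_hat(events, h=1.5, area=100.0)
print(k1, k15)
```

Each unordered pair is counted twice because the sum runs over ordered pairs (i, j) with i not equal to j. Without edge correction the estimate tends to be too low for larger h, since neighbors that would fall outside R are missed.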

76
Graphical Representation of the K function
  • Imagine that an event is visited and that around
    it is constructed a set of concentric circles at
    a fine spacing. The cumulative number of events
    within each of these distance bands is counted.
    Every other event is similarly visited and the
    cumulative number of events within distance bands
    up to radius h around all the events becomes the
    estimate of K(h) when scaled by A/n².

Figure 4.7. Estimation of K Function
77
Graphical Representation of the K function
  • Assume that there are 62 events in a 100 m² study
    area. It is required to estimate K(h) for h =
    0.4 m.
  • K(0.4) = (58/62) / (62/100) ≈ 1.509

Table 4.1. Counts of events within 0.4 m (Total
of events in each circle = 58)
Figure 4.8. Estimating K Function for h = 0.4 m
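
The arithmetic in this example can be written out as a worked equation. The sketch assumes that the total of 58 is the sum, over all 62 events, of the number of other events within 0.4 m of each, and that no edge correction is applied. Dividing the mean count per event (58/62) by the estimated intensity (62/100) is then the same as scaling the total count by A/n²:

```latex
\hat{K}(0.4)
  = \frac{58/62}{62/100}
  = \frac{A}{n^{2}} \sum_{i \ne j} I_{0.4}(d_{ij})
  = \frac{100}{62^{2}} \times 58
  = \frac{5800}{3844}
  \approx 1.509
```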
78
Comparison for randomness
Random occurrence of the events implies that
an event at any point in R is independent of
other events and equally likely over the whole of
R. Hence for a random process the expected
number of events within a distance h of a
randomly chosen event would be λπh².
  • The K function for a random process should
    therefore satisfy
  • λK(h) = E(#(events within distance h
    of an arbitrary event)) = λπh²
  • λK(h) = λπh² ⇒ K(h) = πh² for a random process
  • If the point pattern has regularity then K(h) <
    πh²
  • If the point pattern has clustering then K(h) >
    πh²

79
  • For the observed data, the estimated $\hat{K}(h)$
    is compared with πh². One way of doing this is to
    plot $\hat{L}(h)$ against h, where

$$\hat{L}(h) = \sqrt{\frac{\hat{K}(h)}{\pi}} - h$$

  • In this plot peaks in positive values tend to
    indicate clustering and troughs of negative
    values indicate regularity at corresponding
    scales of distance h in each case.
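
The square root transformation, L(h) = sqrt(K(h)/π) − h, can be sketched as a one-line function. The value 1.5088 used below is the approximate estimate (58/62)/(62/100) from the 62-event worked example in the slides; the function name `l_hat` is hypothetical.

```python
import math


def l_hat(k_value, h):
    """Square root transformation of a K estimate: sqrt(K / pi) - h."""
    return math.sqrt(k_value / math.pi) - h


# For a completely random pattern K(h) = pi * h^2, so L(h) is zero.
l_random = l_hat(math.pi * 0.4 ** 2, 0.4)

# Approximate K estimate at h = 0.4 m from the 62-event worked example.
l_example = l_hat(1.5088, 0.4)
print(l_random, l_example)
```

The random-pattern benchmark πh² at h = 0.4 is about 0.503, well below the estimate of about 1.509, so the transformed value is positive (about 0.29). On the interpretation given above, that points towards clustering at this scale, although a single value of h is not conclusive on its own.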

An alternative to the square root transformation
is to use a logarithmic transformation, plotting
I(h) against h. In this plot again peaks
indicate clustering and troughs indicate
regularity at corresponding scales of distance h
in each case.
80
E.g. Explore the juvenile offenders on a
Cardiff estate. Visually, some form of clustering
is observed in the northern part. There are
peaks at h = 10 and h = 20 m, suggesting
clustering at these scales.
Figure 4.9. (a) Juvenile offenders in Cardiff and
(b) associated L function