Title: Geography 38:286 Computer Cartography
1Geography 38:286 Computer Cartography
- Topic 5
- Choropleth Mapping
- Chapter 7 Dent
2Mapping Techniques: Lecture Format
- Description
- Definition
- Types/variations
- Data characteristics
- Type of data
- Raw or Derived
- Spatial characteristics
- Discrete or Continuous
- Design Considerations
- Projection
- Legend
- Symbology
- Classification
- Colour scheme
- Scale
- Other
3What is Choropleth Mapping?
- Uses distinct
- Colour
- Shade
- Texture
- . . . to represent differences in value from one area to another
- Areal units are typically administrative
- Can also be natural
- Also called enumeration area mapping
4Two Types of Choropleth Maps
- 1. Conventional or Simple
- Areal units grouped into classes
- Minimum four
- Maximum 6 to 8
- Five most common
- By far the more common method
5Two Types of Choropleth Maps
- 2. Classless, Unclassed, or Tonal
- Each area assigned a unique colour, shade or texture pattern
- Directly proportional to value
- No grouping into classes
- Difficult to detect spatial patterns
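As a rough illustration of the classless approach, the sketch below assigns each areal unit a grey level that varies linearly with its value. The function name `unclassed_shades`, the sample values, and the choice of min-max scaling to the range 0 to 1 are assumptions for illustration; the slides do not prescribe a particular scaling.

```python
def unclassed_shades(values):
    """Map each value to a grey level in [0, 1] by linear (min-max) scaling.

    Assumption: the lowest value gets the lightest shade (0.0) and the
    highest value gets the darkest shade (1.0). No classes are formed.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All units share one value, so they all get the same shade.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical values for four areal units.
shades = unclassed_shades([10.0, 20.0, 40.0, 50.0])
print(shades)
```

Because every distinct value gets its own shade, the reader has to compare many similar tones, which is one reason spatial patterns can be hard to detect.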
6Three Types of Spatial Data
- Choice of mapping technique is often determined by type of spatial data
- While there are two general categories, in this course we consider three
- discrete
- areally discrete
- continuous
7Discrete Data
- Data values occur at
- a point
- a line
- or polygon
- No data occurs between features
- Phenomenon is absent, so there is nothing to measure
- Ex. hog barns, hydro lines, water quality
8Areally Discrete Data
- Special type of discrete data
- Represent aggregate values of discrete areas
- Values may be
- Totals
- Averages
- Rates/Proportions
- Represent entire area but do not necessarily
occur across entire area
9Continuous
- Values occur continuously across area of interest
- BUT are measured/sampled at specific locations (points)
- Values vary continuously
- Sometimes represented at points
- Includes most naturally occurring phenomena
- Precipitation, elevation, temperature
10Spatial Characteristics of Data
- Choropleth technique can be used when data are aggregated by discrete areal units
- Each areal unit has one value
- Assumed constant across the area
- So, data are lost
- Not appropriate for mapping
- discrete point or line data
- continuously distributed data
11Two Aspatial Data Types
- Certain maps require specific aspatial data characteristics
- Ex. Totals can't be used, must use a proportion
- May necessitate some preliminary data processing
- Convert totals to an appropriate rate/proportion
12Aspatial Data Types
- Totals or Raw Values
- Actual values of areal unit
- total population
- number of farms
- number of people with Ukrainian as first language
- Misleading since size of areal units may vary significantly
- Consequently, raw values are not used
13Aspatial Data Types
- Derived Data
- Data expressed as a rate or proportion
- Typically normalized by either population or land area
- Wheat as a percentage of total cropland
- Population per square kilometre
- Elderly as a percentage of total population
- Crime as a percentage of . . . ?
- Curlers as a . . . ?
- ArcGIS can do this for you, but be careful
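The conversion from totals to derived values can be sketched as follows. The unit names, counts, and areas are hypothetical, and `density` and `percentage` are illustrative helper names rather than functions from ArcGIS or any other package.

```python
def density(total, area_km2):
    """Total normalized by land area (e.g. population per square kilometre)."""
    return total / area_km2


def percentage(part, whole):
    """Part expressed as a percentage of a whole (e.g. elderly as % of population)."""
    return 100.0 * part / whole


# Hypothetical areal units: (name, population, elderly population, land area in km^2).
units = [
    ("Unit A", 50000, 10000, 100.0),
    ("Unit B", 50000, 5000, 2500.0),
]

derived = {
    name: (density(pop, area), percentage(old, pop))
    for name, pop, old, area in units
}
print(derived)
```

Both hypothetical units have the same total population, yet their densities differ by a factor of 25. That difference is invisible on a map of raw totals.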
14Two Types of Areal Units
- Natural Areal Units
- Areas corresponding with naturally occurring discrete phenomena
- Drainage basins
- Ecoregions
- Land cover/Land use areas
- Soil boundaries
- Boundaries are finite and inherently associated with the phenomenon concerned
15Two Types of Areal Units
- Artificial Areal Units
- Areas created to spatially organize the Earth's surface
- political, administrative regions, census areas
- Used for the purposes of collecting and analyzing data
- Boundaries are often arbitrary
16Modifiable Areal Unit Problem
- Since arbitrary boundaries are . . . well, arbitrary, any artificial regionalization may be equally valid
- However, resulting areal units could yield very different aggregate data
- What are the implications with regard to
- choropleth mapping?
- regionalization of space?
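A toy sketch of the problem: the same six small cells are aggregated under two equally arbitrary zonings, and the zone means come out very differently. The cell values and zone sizes are hypothetical, and the one-dimensional layout is a simplification chosen only to keep the example short.

```python
# Hypothetical values for six adjacent small cells, in spatial order.
cells = [2.0, 4.0, 6.0, 8.0, 10.0, 30.0]


def aggregate(values, zone_sizes):
    """Return the mean of each zone, where zones are consecutive runs of cells."""
    means, start = [], 0
    for size in zone_sizes:
        chunk = values[start:start + size]
        means.append(sum(chunk) / len(chunk))
        start += size
    return means


zoning_a = aggregate(cells, [3, 3])  # zones: cells 1-3 and cells 4-6
zoning_b = aggregate(cells, [5, 1])  # zones: cells 1-5 and cell 6
print(zoning_a, zoning_b)
```

Under the first zoning the two areas have means of 4 and 16; under the second, 6 and 30. A choropleth map of either set would be "correct", but the two maps would tell different stories.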
17Modifiable Areal Unit Problem
18Considerations: Scale
- Scale must be large enough that the smallest areal unit is visible
- This dictates
- the geographic extent shown on the map
- or size of the final map composition
- or hierarchical level represented (next slide)
19Considerations: Number & Size of Areal Units
- Areal units are often nested
- Data reported at multiple levels
- Provinces, CD, CSD, CMA, FED, CT, EA
- Ecozones, regions, provinces, districts
- We may be able to choose at what level data are
mapped
20Number & Size of Areal Units
- This has implications in terms of
- level of aggregation/loss of data
- choice of symbolization
- perceived accuracy/appearance
- fewer/large areas appear coarse/inaccurate
- more/small areas appear finer/more accurate
21Number & Size of Areal Units
- Choice usually dependent on
- purpose of the map
- acceptable level of generalization
- data availability
- scale, geographic extent, size of final map
22Considerations: Classification Technique
- Significantly impacts message
- More than one version can be presented, but this is not common
- Should use the most appropriate method, not the one that produces a desired effect
- Statement indicating technique should be included
23Data Classification
- Method of cartographic abstraction
- Inevitable loss of information
- Purpose is to
- Reduce observations to manageable size
- Identify groupings of observations
- Reveal information otherwise obscured
- Regionalization is a means of spatial
classification
24Classification Schemes
- Two schemes for categorizing classification methods
- First identifies four types
- Exogenous
- Boundaries not related to data array
- But related to theme
- Often based on established critical or standard values
- poverty line
- age groups
- soil salinity
25Classification Schemes
- Arbitrary
- Boundaries also unrelated to data array
- Used for convenience
- Usually round numbers
- Less than 10, 10 to 30, greater than 30
26Classification Schemes
- Idiographic
- Boundaries are based on qualities of the data array
- Naturally occurring groups
- Serial
- Boundaries based on mathematical or statistical characteristics
- standard deviations
- equal intervals
- arithmetic and geometric progressions
27Classification Schemes
- Second classification based on resulting intervals
- Constant Intervals
- Analogous to passing a series of planes through a 3D model, each plane an equal distance apart
- Variable Intervals
- Planes are unequal distances apart
- Either can be used to accentuate or mask outliers
28Equal Steps
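- Each class represents equal proportion of the range of data values
- Procedure
- Data range: $R = H - L$ (highest value minus lowest value)
- Interval: $I = R / n$ ($n$ = number of classes)
- Class boundaries are then determined by
- $L,\ L + (1 \times I),\ L + (2 \times I),\ \ldots,\ L + (n \times I)$
- $L$ is the lower boundary; $L + (n \times I) = H$ is the upper boundary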
29Equal Steps
Five Classes
1: 0.02 - 4.76
2: 4.77 - 9.51
3: 9.52 - 14.26
4: 14.27 - 19.01
5: 19.02 - 23.77
$R = 23.77 - 0.02 = 23.75$
$I = 23.75 / 5 = 4.75$
Class boundaries are 0.02, 4.77, 9.52, 14.27, 19.02, 23.77
Note: No overlapping classes
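The equal-steps procedure can be sketched in a few lines. The function name `equal_step_boundaries` is illustrative, and rounding to two decimals is an assumption made to match the precision of the slide's data.

```python
def equal_step_boundaries(low, high, n_classes):
    """Return the n_classes + 1 class boundaries for equal steps.

    R = high - low is the data range and I = R / n_classes is the interval.
    Boundaries are low, low + I, low + 2I, ..., low + n_classes * I.
    Rounding to 2 decimals is an assumption matching the example data.
    """
    interval = (high - low) / n_classes
    return [round(low + k * interval, 2) for k in range(n_classes + 1)]


# Values from the slide example: lowest 0.02, highest 23.77, five classes.
boundaries = equal_step_boundaries(0.02, 23.77, 5)
print(boundaries)
```

To avoid overlapping classes, each class after the first starts at a boundary and the preceding class ends one unit of precision below it (e.g. 4.76, then 4.77).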
30Equal Steps
- Most appropriate when
- Frequency distribution is rectangular/even
- Areal units of equal size
- Neither is a common occurrence
- Accentuates outliers when distribution is not
rectangular
31Equal Steps
(Figure: equal-step classes, class interval 4.75)
32Standard Deviations
- Limits based on mean and standard deviation
- Normally 1, 2, and 3 SDs above and below the mean
- Each class equal proportion of total deviation
- constant interval scheme
- Used when data display a normal distribution
- Common distribution
33Standard Deviations
- Procedure
- mean value: $m = \sum x_i / n$
- $SD = \left[ \sum (x_i - m)^2 / (n - 1) \right]^{1/2}$
- Class boundaries are then determined by
- $m + (1 \times SD)$ and $m - (1 \times SD)$
- Usually 6 classes
- 3 above mean
- and 3 below mean
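A minimal sketch of the standard deviation procedure, using Python's standard `statistics` module (`stdev` uses the n - 1 denominator, as in the formula above). The data values are hypothetical, and `sd_boundaries` is an illustrative name.

```python
from statistics import mean, stdev


def sd_boundaries(values, max_sd=3):
    """Return boundaries at m - 3 SD, ..., m, ..., m + 3 SD (7 boundaries, 6 classes)."""
    m = mean(values)
    sd = stdev(values)  # sample standard deviation, n - 1 denominator
    return [m + k * sd for k in range(-max_sd, max_sd + 1)]


# Hypothetical data array (not the values behind the slide example).
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
bounds = sd_boundaries(values)
print(bounds)
```

The middle boundary is the mean itself, so three classes fall above it and three below. With skewed data some of the lower classes may contain no observations at all.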
34Standard Deviations
Mean = 82.01 / 12 = 6.83
SD = 7.5
Six Classes
< 1 SD: 0 - 6.82
> 1 SD: 6.83 - 14.32
> 2 SD: 14.33 - 21.15
> 3 SD: 21.16 - 27.99
Note: No overlapping classes
35Standard Deviations
- Creates a more even looking distribution
- Even when distribution is skewed
- Masks outliers
- May present interpretation issues
- E.g. does map reader understand SD?
36Standard Deviations
(Figure: classes labelled -1 SD, 1 SD, 2 SD, 3 SD)
37Geometric Intervals
- Mathematically defined class limits based on arithmetic or geometric properties of data
- Used when distribution approximates geometric progression
- Class intervals are progressively smaller or larger toward one end of the distribution, so a variable interval technique
- Less common
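The slides give no formula for this method. As one possible illustration, the sketch below builds class limits in a geometric progression, where each limit is the previous one multiplied by a constant ratio. The function name, the requirement that the lowest value be positive, and the sample range are all assumptions.

```python
def geometric_boundaries(low, high, n_classes):
    """Class limits low, low*r, low*r**2, ..., high with a constant ratio r.

    Assumption: low must be positive, since a ratio cannot be formed from zero.
    """
    if low <= 0 or high <= low:
        raise ValueError("need 0 < low < high")
    ratio = (high / low) ** (1.0 / n_classes)
    return [low * ratio ** k for k in range(n_classes + 1)]


# Hypothetical data range from 1 to 16 split into four classes.
limits = geometric_boundaries(1.0, 16.0, 4)
print(limits)
```

Here the class widths grow toward the high end of the range, which is why this counts as a variable interval technique.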
38Quantiles
- Boundaries are selected such that same number of EAs occurs in each class
- However, intervals not constant
39Quantiles
- Procedure
- arrange all values in ascending order
- determine number of obs in each class ($K$) by $K$ = number of obs / number of classes
- Starting at the lowest value, place an equal number of observations in each class
- Class limit is mean value between adjacent observations in different classes
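The quantile procedure can be sketched as below. The data values are hypothetical, `quantile_limits` is an illustrative name, and the sketch assumes the number of observations divides evenly by the number of classes.

```python
def quantile_limits(values, n_classes):
    """Return (K, limits): K observations per class and the interior class limits.

    Each limit is the mean of the two adjacent observations that fall in
    different classes. Assumes len(values) is divisible by n_classes.
    """
    ordered = sorted(values)
    k = len(ordered) // n_classes
    if k * n_classes != len(ordered):
        raise ValueError("observations do not divide evenly into classes")
    limits = []
    for c in range(1, n_classes):
        below = ordered[c * k - 1]  # last observation of the lower class
        above = ordered[c * k]      # first observation of the upper class
        limits.append((below + above) / 2)
    return k, limits


# Hypothetical array of 12 observations split into 4 classes.
values = [1, 2, 3, 5, 6, 7, 11, 12, 13, 20, 30, 40]
k, limits = quantile_limits(values, 4)
print(k, limits)
```

The gaps between the limits are unequal even though every class holds three observations, which illustrates why quantile intervals are not constant.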
40Quantiles
$K$ = 12 obs / 4 classes = 3 obs/class
Four Classes
1: 0.02 - 1.60
2: 1.61 - 4.19
3: 4.20 - 11.01
4: 11.02 - 23.77
Class boundaries are mean values on either side of class limits
Still, no overlapping classes
41Quantiles
- Produces an even looking map
- A sense of diversity when there is little
- Masks outliers
42Quantiles
(Figure: 3 obs in each class)
43Natural Breaks (Manual)
- Based on visual inspection of data using
- Histogram
- Cumulative percent curve
- Class limits identified where natural groupings or breaks occur
- Number of classes determined by number of natural breaks
- Subjective technique, but can be effective
- The manual scheme in ArcMap
44Natural Breaks
45Jenks Optimization Method
- An iterative technique
- Based on a measure called the goodness of variance fit (GVF)
- Maximizes
- between class heterogeneity
- and within class homogeneity
- You pick number of classes
- The way ArcMap calculates natural breaks
46Jenks Optimization Method
(Figure: natural breaks, 5 classes)
47Jenks Optimization Method
- $GVF = (SDAM - SDCM) / SDAM$
- Where
- SDAM = sum of squared deviations from array mean
- SDCM = sum of squared deviations from class means
- When SDCM is lowest, GVF will be closest to 1
- This is the best set of five classes
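The GVF measure can be computed directly from its definition. The sketch below only evaluates GVF for a given grouping; it does not implement the iterative Jenks search. The class groupings are hypothetical, and `sum_sq_dev` and `gvf` are illustrative names.

```python
def sum_sq_dev(values):
    """Sum of squared deviations of the values from their own mean."""
    m = sum(values) / len(values)
    return sum((m - x) ** 2 for x in values)


def gvf(classes):
    """Goodness of variance fit for a list of classes (each a list of values).

    GVF = (SDAM - SDCM) / SDAM, where SDAM uses the mean of the whole array
    and SDCM sums the squared deviations from each class's own mean.
    """
    all_values = [x for cls in classes for x in cls]
    sdam = sum_sq_dev(all_values)
    sdcm = sum(sum_sq_dev(cls) for cls in classes)
    return (sdam - sdcm) / sdam


# Two hypothetical groupings of the same four values.
tight = gvf([[1.0, 2.0], [10.0, 11.0]])  # similar values grouped together
loose = gvf([[1.0, 10.0], [2.0, 11.0]])  # dissimilar values grouped together
print(round(tight, 3), round(loose, 3))
```

Grouping similar values together drives SDCM toward zero and GVF toward 1, while mixing dissimilar values leaves GVF near 0. An optimization method would compare GVF across candidate groupings and keep the highest.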
48Jenks Optimization Method
$SDAM = \sum_i (\bar{x} - x_i)^2$
where $\bar{x}$ = the array mean (6.83) and $x_i$ = each data value
49Jenks Optimization Method
$SDCM = \sum_c \sum_i (z_c - x_i)^2$
where $z_c$ = the class mean and $x_i$ = each data value in class $c$
50Jenks Optimization Method
$GVF = (616.95 - 5.42) / 616.95 = 0.99$
51Jenks Optimization Method
52Jenks Optimization Method
Quantiles with 4 classes:
$GVF = (616.95 - 99.87) / 616.95 = 0.84$
53Considerations: Legend Design
- Significant interpretation error can occur when range graded class boundaries are used
- Range grading refers to use of classes with continuous intervals
- E.g. 1-10, 11-20, 21-30, 31-40, . . .
- In reality, continuous range of data may not exist
54Legend Design
(Figure panels: Effect of range grading; Non-continuous data)
55Legend Design/Ancillary Data
- Legend boxes should be
- 2/3 as tall as they are wide
- not square boxes
- can also use irregular shapes
- Appropriate ancillary data include
- histogram or cumulative curve
- indication of classification technique
56Considerations: Symbol Selection
- Symbols indicate relative change in value
- Achieved by varying symbol
- Arrangement
- Texture
- Orientation
- Colour saturation/chroma
- Colour value/intensity
- Colour hue
57Considerations: Map Projection
- Relative proportion of map area represented by different symbols affects interpretation
- Consequently, an equivalent (equal-area) projection is most appropriate
- More important when mapping at smaller scale (i.e. large geographic areas)
58MIDTERM TO HERE Questions?