Title: Geocoding
1 Geocoding Data Collection with GPS
2 Summary
- Introduction to Geocoding
- Geocoding Concepts and Definitions
- Relationship to other Census Processes
- Approaches to Data Collection
- NSO Benefits and Concluding Remarks
- Introduction to GPS
- How GPS Works
- Sources of Error and Accuracy
- Selecting a GPS
- Advantages and Disadvantages
3 Introduction
- Many NSOs have a specialized coding scheme and understand geocoding as a dynamic process
- Clarification within the statistical community
- Expansion and discussion on components and methods within the process of geocoding
- The purpose of this section is to introduce geocoding concepts relevant for census mapping and the different approaches to related data collection.
4 Geocoding
- Definitions
- Conceptual/Operational
- Geocoding vs Georeferencing
- Census Hierarchies
- Coding Scheme
- Data Collection Methods
- Direct Collection
- Matching Approach
- Benefits for NSOs
5
- Geocoding can be broadly defined as the assignment of a code to a geographic location. Usually, however, geocoding refers to the more specific assignment of geographic coordinates (latitude, longitude) to an individual address.
- Reference: UN Report of the Expert Group Meeting on Contemporary Practices in Census Mapping and Use of Geographical Information Systems (2007)
6 Definition of Geocoding
- Conceptual: 2 situations
- The more general process of assigning geographic codes to features in a digital database.
- A GIS function that determines a point location based on an address. Such point locations can generally be expected to be relatively precise (e.g. +/-2 m) and to be based on the use of GPS technology.
- Operational
- Geocoding is the computer-oriented process that converts information about a unit from which statistical information is collected into a set of coordinates describing the geographic position of that unit.
7 (cont.)
- Operational Elements
- Collecting precise data at the level of point locations (or a very low geographic level such as a city block) and assigning codes for use in dissemination.
- Coding the centroid, building corners, or building point-of-entry coordinates for a unit such as a block of land, building or dwelling.
- Coordinates must contain latitude and longitude or standardized x and y points for gridded interpolation. A Z ("zed") coordinate may represent altitude or elevation.
- Codes cover each geographic unit and have a combinational relationship to distinguish different units (Enumeration Areas/Blocks).
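One way to code the centroid of a unit such as a block of land is the area-weighted (shoelace) formula. The sketch below is illustrative only: it assumes the block outline is a simple, non-self-intersecting polygon in planar x/y coordinates (for example a projected grid), and the function name `polygon_centroid` and the sample block are hypothetical.

```python
def polygon_centroid(vertices):
    """Return the (x, y) centroid of a simple polygon.

    `vertices` is a list of (x, y) tuples in planar coordinates,
    listed in order around the outline (either direction).
    """
    n = len(vertices)
    if n < 3:
        raise ValueError("a polygon needs at least three vertices")
    area2 = 0.0  # twice the signed area of the polygon
    cx = 0.0
    cy = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the outline
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    return cx / (3.0 * area2), cy / (3.0 * area2)


# Hypothetical rectangular block, 40 by 20 units.
block = [(0.0, 0.0), (40.0, 0.0), (40.0, 20.0), (0.0, 20.0)]
print(polygon_centroid(block))  # (20.0, 10.0)
```

For latitude/longitude outlines covering a small area this is usually a reasonable approximation, but that is an assumption to verify for the data at hand.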
8 Georeferencing vs Geocoding
- Georeferencing
- Aligning geographic data to a known coordinate system so it can be analyzed, viewed, and queried with other geographic data
- Geocoding
- The process of assigning geographic codes to features in a digital database (including the GIS operation for converting street addresses into spatial data that can be displayed as features on a map)
9 Relationship to Other Census Processes
- Movement into a fully GIS-based approach to census mapping
- Generation of high-quality maps for use in the collection phase
- Reduction of work required for updating maps for future censuses
- Aggregation of records into customized units to satisfy users' requirements
10 Census Enumeration and the Geocoding System
- Delineation irrespective of the existence of addresses
- Ability to apply a geocode to any geographic areal unit
- Flexible coding scheme
- Ability to incorporate future administrative divisions
- Pre-enumeration geocoding is critical
- Links between GIS boundaries and tabular census data
11 Census Hierarchies
Define census geographic hierarchy
Develop geographic coding scheme
Development of an administrative and census units listing
12 Census Hierarchies: some principles
- Internal political boundaries
- Areal unit aggregation
- Resolution suitable to NSO needs and user demands
- Considers available datasets for continuous development
- The smaller the area defined by the geocode, the more flexible the results for subsequent users
13 Example of Administrative Hierarchy
country
region
province
district
sub-district
urban locality
rural locality
ward
Enumeration area
Enumeration area
14 Illustration of a Nested Admin. Hierarchy
Provinces
Districts
Localities
Enumeration areas
15 Hierarchical Coding Scheme: operational considerations
- Geographic units are numbered at each level of the administrative hierarchy (with gaps between the numbers to allow for changes)
- For example, at the province level, units may be numbered 5, 10, 15 and so on. A similar scheme would be used for lower-level administrative units and for enumeration areas.
- Since there are often, for example, more districts in a province than provinces in a country, more digits may be required at lower levels
- The unique identifier for the EA (the smallest-level unit) is the concatenation of the identifiers of the admin. units into which it falls
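The gapped numbering described above (5, 10, 15 and so on) can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the function names, the default step of 5 and the two-digit width are assumptions chosen to match the province example.

```python
def gapped_codes(count, step=5, width=2):
    """Number `count` units as step, 2*step, ... zero-padded to `width` digits."""
    codes = [f"{(i + 1) * step:0{width}d}" for i in range(count)]
    if any(len(code) > width for code in codes):
        raise ValueError("not enough digits for this many units")
    return codes


def code_between(lower, upper):
    """Pick an unused code between two existing codes, e.g. for a new unit."""
    lo, hi = int(lower), int(upper)
    if hi - lo < 2:
        raise ValueError("no free code left in this gap")
    return f"{(lo + hi) // 2:0{len(lower)}d}"


print(gapped_codes(4))           # ['05', '10', '15', '20']
print(code_between("05", "10"))  # '07'
```

The gaps are what allow a province created after the census geography was set up to receive a code that still sorts between its neighbours.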
16 Example of a Coding Scheme
A small country could use the following coding scheme:
Province: 2 digits
District: 3 digits
Locality: 4 digits
EA: 4 digits
An EA code of 10 025 0105 0073 means that enumeration area number 73 is located in province 10, district 25 and locality 105. The unique code is stored in the database as a long integer or as a 13-character string variable.
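The 2-3-4-4 digit scheme can be sketched as a pair of helper functions that build and split the 13-character code. The names `build_ea_code`, `parse_ea_code` and `WIDTHS` are hypothetical; only the digit widths and the sample code come from the example above.

```python
# Digit widths per level, following the example scheme (2 + 3 + 4 + 4 = 13).
WIDTHS = (("province", 2), ("district", 3), ("locality", 4), ("ea", 4))


def build_ea_code(province, district, locality, ea):
    """Concatenate the zero-padded identifiers into one 13-character code."""
    values = {"province": province, "district": district,
              "locality": locality, "ea": ea}
    parts = []
    for name, width in WIDTHS:
        value = values[name]
        if not 0 <= value < 10 ** width:
            raise ValueError(f"{name} {value} does not fit in {width} digits")
        parts.append(f"{value:0{width}d}")
    return "".join(parts)


def parse_ea_code(code):
    """Split a 13-character code back into its component identifiers."""
    total = sum(width for _, width in WIDTHS)
    if len(code) != total or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected a {total}-digit code, got {code!r}")
    result = {}
    position = 0
    for name, width in WIDTHS:
        result[name] = int(code[position:position + width])
        position += width
    return result


code = build_ea_code(10, 25, 105, 73)
print(code)                 # 1002501050073
print(parse_ea_code(code))  # {'province': 10, 'district': 25, 'locality': 105, 'ea': 73}
```

Storing the code as a string keeps leading zeros (a province numbered 05, for instance), whereas the integer form drops them; which form to use is a design choice for the NSO.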
17 Example of a Coding Scheme (cont.)
- The variable type needs to be the same in the census database and the geographic database.
- The integer variable has the advantage that subsets of records can be selected easily (SQL).
- Example of query:
- SELECT ID > 1203501550000 AND ID < 1203501560000
- Will find all EAs within locality number 155 in the database or on the digital map.
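The range query above can be tried end to end with an in-memory SQLite database. This is a sketch under assumptions: the table name `ea`, the sample codes and the helper `eas_in_locality` are invented for illustration, and the bounds are derived from the 2-3-4-4 digit scheme of the coding example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ea (id INTEGER PRIMARY KEY)")
# Hypothetical EA codes in province 12, district 035, localities 154 to 156.
codes = [1203501540012, 1203501550001, 1203501550002, 1203501560001]
conn.executemany("INSERT INTO ea (id) VALUES (?)", [(c,) for c in codes])


def eas_in_locality(conn, province, district, locality):
    """Return all EA codes that fall inside one locality, in ascending order."""
    # The locality prefix followed by four zeros is the lower bound.
    lower = province * 10**11 + district * 10**8 + locality * 10**4
    upper = lower + 10**4  # first code of the next locality number
    rows = conn.execute(
        "SELECT id FROM ea WHERE id > ? AND id < ? ORDER BY id",
        (lower, upper),
    )
    return [row[0] for row in rows]


print(eas_in_locality(conn, 12, 35, 155))  # [1203501550001, 1203501550002]
```

Note that 13-digit codes need a 64-bit integer type; SQLite's INTEGER is wide enough, but that should be checked for other database or GIS software.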
18 (cont.)
- Special coding conventions need to be developed in cases where admin. and reporting units are not hierarchical
- In any case, the administrative unit identifiers should be defined and used with complete consistency, since they are the link between GIS boundaries and the tabular census data.
- Maintenance: NSOs should maintain a Master List of EAs and admin. units and their respective codes, and report any changes made to the Master List to the GIS and census databases.
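A simple consistency check between the Master List and the two databases can be expressed with set operations. The sketch below is only one possible approach; the function name `compare_codes` and the sample codes are hypothetical.

```python
def compare_codes(master, gis, census):
    """Report codes that are out of step between the Master List and the databases."""
    master, gis, census = set(master), set(gis), set(census)
    return {
        "missing_from_gis": sorted(master - gis),
        "missing_from_census": sorted(master - census),
        "not_in_master": sorted((gis | census) - master),
    }


# Hypothetical codes: one EA was added to the Master List but not yet to the GIS,
# and the census database still holds a code that the Master List no longer has.
master_list = ["1002501050073", "1002501050074", "1002501050075"]
gis_codes = ["1002501050073", "1002501050074"]
census_codes = ["1002501050073", "1002501050074", "1002501050075", "1002501050099"]

report = compare_codes(master_list, gis_codes, census_codes)
print(report["missing_from_gis"])  # ['1002501050075']
print(report["not_in_master"])     # ['1002501050099']
```

Running such a check after every change to the Master List is one way to catch broken links between boundaries and tabular data early.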
19 Census Hierarchies
Given Country
Country
Province
District
Locality
Enumeration Areas
Blocks
Building
Dwelling
20 Coding Scheme
21 Geocoding Classifications
- Disaggregation into Spatial Entities or Civil Divisions and Compatibility
1st Region Province
2nd District Municipality
3rd Town/Village
4th Dwelling
- Resultant geocoded units are placed within a set of latitude and longitude boundaries
22 Data Collection Methods
- Two main methods
- Direct Collection Approach
- Matching Approach
23 Direct Collection Approach
- Digitizing from available topographic maps
- Direct collection using field techniques (e.g. GPS)
Global Positioning System (GPS)
Digitizing from a topographic map
Areas, Street, Dwelling
24 Matching approach
- Using an address locator database and street network database in a GIS
- Joining an address database to an existing spatial database for the area of interest
[Figure: street network diagram with labels: Street Network, Street Segment, Nodes, Main Street, First Avenue, Second Avenue, Left of Street (address range 1 to 99, address number 51), Right of Street (address range 2 to 100, address number 32)]
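Address matching against a street network is commonly done by linear interpolation along a street segment whose two sides carry address ranges, such as 1 to 99 on the left and 2 to 100 on the right. The sketch below illustrates that idea only; it is not the algorithm of any particular GIS package, the function name and segment coordinates are hypothetical, and it places the point on the centreline without offsetting it to the correct side of the street.

```python
def interpolate_address(number, low, high, start, end):
    """Estimate the (x, y) position of a house number along a street segment.

    `low` and `high` are the address range on one side of the street;
    `start` and `end` are the (x, y) coordinates of the segment's nodes.
    """
    if not low <= number <= high:
        raise ValueError("address number outside the segment's range")
    if number % 2 != low % 2:
        raise ValueError("address number is on the other side of the street")
    # Fraction of the way along the segment, from the low end of the range.
    fraction = 0.5 if high == low else (number - low) / (high - low)
    x = start[0] + fraction * (end[0] - start[0])
    y = start[1] + fraction * (end[1] - start[1])
    return x, y


# Hypothetical segment of Main Street between two nodes, 98 units long.
first_avenue_node = (0.0, 0.0)
second_avenue_node = (98.0, 0.0)

left = interpolate_address(51, 1, 99, first_avenue_node, second_avenue_node)
right = interpolate_address(32, 2, 100, first_avenue_node, second_avenue_node)
print(round(left[0], 1), round(right[0], 1))
```

The result is only as good as the assumption that house numbers are evenly spaced along the segment, which is one reason a rational, consistent addressing system matters for this approach.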
25 Data Maintenance
- Cleaning addresses
- Retaining only the key address elements
- Establishing a matchcode (indicator of which address elements will determine the geocode)
- Eliminating extraneous characters
- Standardizing spelling
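The cleaning steps listed above can be sketched as a small normalization routine. Everything specific here is an assumption made for illustration: the abbreviation table, the choice of house number plus street name as the matchcode, and the restriction to unaccented Latin letters would all need adapting to the local addressing system.

```python
import re

# Hypothetical spelling table; a real one would reflect local street types.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "av": "avenue", "rd": "road"}


def clean_address(raw):
    """Lower-case, strip extraneous characters and standardize spelling."""
    text = raw.lower()
    # Replace anything that is not a letter, digit or space with a space.
    # (Accented letters would also be replaced; this is a simplification.)
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = [ABBREVIATIONS.get(token, token) for token in text.split()]
    return " ".join(tokens)


def matchcode(raw):
    """Keep only the key elements: (house number, street name)."""
    tokens = clean_address(raw).split()
    number = next((t for t in tokens if t.isdigit()), "")
    street = " ".join(t for t in tokens if not t.isdigit())
    return number, street


print(clean_address("51, Main St."))  # 51 main street
print(matchcode("  51 MAIN STREET "))  # ('51', 'main street')
```

Two differently written versions of the same address then produce the same matchcode, which is what the matching step relies on.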
26 Staff Expertise Recommendations
Task/condition | Direct collection | Matching
Existence of digital base map for country | Highly desirable | Highly desirable
Statistical staff with expertise in use of GPS | Essential | Not essential
Acquisition of large numbers of GPS receivers | Essential | Not essential
Geo-referenced list of addresses or equivalent | Not essential | Essential
Excellent address matching algorithms | Not essential | Essential
Existence of a rational, consistent, and locally-recognized addressing system for housing units | Highly desirable | Essential
27 Geocoding Benefits for National Statistical Offices
- Improved map creation for the field
- Customizable map outputs for specified regional activities
- Coding techniques are transparent and transferable
- Lays the groundwork for future statistical activities and coding schemes
28 Concluding Remarks
- Technologies are accessible and allow delineation irrespective of the existence of addresses
- Many methods and technologies are available to support accurate geocoding frameworks
- A geocoding system adds value for GIS-based spatial analysis of statistical data
29 Global Positioning Systems (GPS)
- Technology has revolutionized field mapping in recent years
- Prices of GPS receivers have dropped
- GPS methods have been integrated in many applications
- User groups are widespread (utilities management, surveying and navigation). GPS has contributed to improving field research in areas such as biology, forestry, geology, epidemiology and population studies
30 Global Positioning Systems (cont.)
- GPS has become a major tool in census cartographic applications
- Preparation and updating of enumeration area (EA) maps for census activities
- Location of point features such as service facilities or village centers
- Coordinates can be downloaded or entered manually into a digital mapping system or GIS, and can be combined with existing, georeferenced information
31 How GPS Works
- GPS receivers collect the signals transmitted from a constellation of 24 or more satellites (nominally 21 active satellites and three spares). The system is called NAVSTAR and is maintained by the U.S. Department of Defense.
- The satellites circle the Earth in six orbital planes at an altitude of approximately 20,000 km. At any given time, five to eight GPS satellites are within the field of view of a user on the Earth's surface.
- The position on the Earth's surface is determined by measuring the distance from several satellites.
32 The global positioning system (GPS)
33 The global positioning system (cont.)
- GPS satellites circle the Earth twice a day
- The satellite signal: three kinds of coded information essential for determining a position
- The receiver:
- 1. Calculates the distance to the first satellite whose signal it is able to catch.
- 2. Calculates the distance to a second satellite for which it is able to catch a signal.
- 3. Repeats the operation mentioned under point 2 with a third satellite.
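The three distance measurements can be turned into a position by trilateration. The sketch below is a deliberately simplified, flat two-dimensional version with made-up coordinates; a real receiver works in three dimensions and also has to solve for its own clock error, so this only illustrates the geometry. The function name `trilaterate` is hypothetical.

```python
import math


def trilaterate(p1, r1, p2, r2, p3, r3):
    """Find the (x, y) point at distances r1, r2, r3 from points p1, p2, p3.

    Subtracting the circle equations pairwise leaves two linear equations
    A*x + B*y = C and D*x + E*y = F, which are solved with Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    A = 2 * (x2 - x1)
    B = 2 * (y2 - y1)
    C = r1**2 - r2**2 - x1**2 + x2**2 - y1**2 + y2**2
    D = 2 * (x3 - x2)
    E = 2 * (y3 - y2)
    F = r2**2 - r3**2 - x2**2 + x3**2 - y2**2 + y3**2
    det = A * E - B * D
    if det == 0:
        raise ValueError("the three reference points lie on one line")
    return (C * E - B * F) / det, (A * F - C * D) / det


# Made-up reference points and a known true position used to derive distances.
true_position = (3.0, 4.0)
references = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
distances = [math.dist(true_position, ref) for ref in references]

x, y = trilaterate(references[0], distances[0],
                   references[1], distances[1],
                   references[2], distances[2])
print(round(x, 6), round(y, 6))  # approximately 3 and 4
```

With noisy distances the three circles no longer meet in a single point, which is why additional satellites and error modelling improve the fix.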
34 How GPS determines a location's coordinates
[Figure: panels a, b and c]
35 Sources of GPS signal errors
- Good or bad visibility of satellites due to obstacles
- Signal multipath
- Sources of error outside the user's control
- Atmospheric delays
- Receiver clock errors
- Orbital errors
36 Differential GPS
37 GPS Accuracy
- Inexpensive GPS receivers
- Within 15 to 100 meters for civilian applications.
- Differential GPS reduces error further
- Accuracy of about 3-10 m can be achieved with quite affordable hardware and shorter observation times.
- More expensive systems and longer data collection for each coordinate reading can yield sub-meter accuracy.
38 Problems with GPS
- In dense urban settings, the accuracy of standard GPS (errors of 15 m up to 100 m) may not be sufficient
- Differential GPS can be used for cross-checking GPS readings with other data sources:
- published maps
- aerial photographs
- sketch maps produced during fieldwork
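Cross-checking a GPS reading against a coordinate taken from another source comes down to measuring the distance between two latitude/longitude pairs. The sketch below uses the haversine formula on a spherical Earth, which is an approximation; the function names and the 15 m default tolerance are assumptions chosen to echo the accuracy figures quoted above.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; a spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_tolerance(reading, reference, tolerance_m=15.0):
    """True if a GPS reading lies within `tolerance_m` of a reference coordinate."""
    return haversine_m(*reading, *reference) <= tolerance_m


# Hypothetical reading and map-derived reference, about 11 m apart.
print(within_tolerance((0.0, 0.0), (0.0001, 0.0)))  # True
```

Readings that fall outside the tolerance are candidates for a repeat measurement or a check against the map source itself.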
39 Selecting a GPS Unit
- Commercially available GPS receivers vary in price and capabilities
- Technical specifications determine the accuracy with which positions can be determined
- The more powerful a receiver, the more expensive it will be
- In many mapping applications, the accuracy of standard systems is quite sufficient
- Receivers also vary in terms of user-friendliness and tracking capabilities, which are useful in navigation
40 Summary: Advantages and Disadvantages of GPS
- Advantages
- Fairly inexpensive, easy-to-use field data collection
- Modern units require very little training for proper use
- Collected data can be read directly into GIS databases, minimizing intermediate data entry or data conversion steps
- Worldwide availability
- Sufficient accuracy for many census mapping applications; high accuracy achievable with differential correction
41 Summary: Advantages and Disadvantages of GPS
- Disadvantages
- Signal may be obstructed in dense urban or wooded areas
- Standard GPS accuracy may be insufficient and require differential techniques
- Differential GPS is more expensive, requires more time in field data collection and more complex post-processing to obtain more accurate information
- A very large number of GPS units may be required for only a short period of data collection.
42 Where's your Datum?
43 Geocoding Classifications (cont.)
- Initial creation of Civil Divisions through digitizing or segmentation/pixel-based approaches
- Low to zero levels of sampling through the accurate placing of coded units, but flexible enough to include changes
- Appropriate detail that fits with the boundaries of a geographic area for a given country