Title: Data Quality
1 Data Quality
- GIGO: garbage in, garbage out
- 'Cos it's in the computer, don't mean it's right
"It's not the things you don't know that matter, it's the things you know that aren't so." (Will Rogers, Famous Okie GI specialist)
"But there are also unknown unknowns, the ones we don't know we don't know." (Donald Rumsfeld)
2 Murphy's Laws of Mapmaking
- Cardinal Postulates
  - area desired by the user has not yet been mapped
  - if mapped, area straddles zone boundaries, or at least map sheets
  - if on one sheet, sheet is scheduled for update next year; last update was 1901
- Corollary for GIS
  - area desired by user is still in paper-map form
  - if in GIS, recorded with X-Y coordinates and straddles zone boundaries, or at least map tiles
  - if on one tile, projection unknown and no information on date of creation and/or last update
- Conclusion: GIS is not a panacea!
3 Horwood's Short Laws on Data
- Dr. Edgar Horwood, founder of the Urban and Regional Information Systems Association (URISA) and Professor of Civil Engineering and Urban Planning at the University of Washington, was a pioneer of computer mapping in the early 1960s.
- Good data are the data you already have.
- Bad data drives out good.
- The data you have for the present crisis was collected to relate to the previous one.
- The respectability of existing data grows with elapsed time and distance from the source of the data.
- Data can be moved from one office to another but cannot be created or destroyed.
- If you have the right data, you have the wrong problem, and vice versa.
- The important thing is not what you do but how you measure it.
- In complex systems there is no relationship between the information gathered and the decision made.
- The acquisition of knowledge from experience is an exception.
- Knowledge grows at half the rate at which academic courses proliferate.
For more information, go to http://urisa.org/prev/GIS_Hall_of_Fame/halloffame.htm
4 Data Quality: How good is your data?
- Scale
  - ratio of distance on a map to the equivalent distance on the earth's surface
  - Primarily an output issue: at what scale do I wish to display?
- Precision or Resolution
  - the exactness of measurement or description
  - Determined by input: can output at lower (but not higher) resolution
- Accuracy
  - the degree of correspondence between data and the real world
  - Fundamentally controlled by the quality of the input
- Lineage
  - The original sources for the data and the processing steps it has undergone
- Currency
  - the degree to which data represents the world at the present moment in time
- Documentation or Metadata
  - data about data: recording all of the above
- Standards
  - Common or agreed-to ways of doing things
  - Data built to standards is more valuable since it's more easily shareable
5 Scale
- ratio of distance on a map to the equivalent distance on the earth's surface
  - Large scale: large detail, small area covered (1:200 or 1:2,400)
  - Small scale: small detail, large area (1:250,000)
  - A given object (e.g. land parcel) appears larger on a large scale map
- scale can never be constant everywhere on a map because of map projection
  - problem is worst for small scale maps and certain projections (e.g. Mercator)
  - can be true from a single point to everywhere
  - can be true along a line, or a set of lines
  - on large scale maps, adjustments are often made to achieve close to true scale everywhere (e.g. State Plane and UTM systems)
- scale representation
  - Verbal: "one inch equals one statute mile" (good for interpretation)
  - Representative fraction (RF): 1:63,360 (good for measurement); smaller fraction = smaller scale, so 1:2,000,000 is smaller than 1:2,000
  - Scale bar (good if enlarged/reduced)
  - use them all on a map!
6 Scale Examples
- Common Scales
  - 1:200 (1"=16.7ft)
  - 1:2,000 (1"=56 yards; 1cm=20m)
  - 1:20,000 (5cm=1km)
  - 1:24,000 (1"=2,000ft)
  - 1:25,000 (1cm=.25km)
  - 1:50,000 (2cm=1km)
  - 1:62,500 (1.6cm=1km; 1"=.986mi)
  - 1:63,360 (1"=1 mile; 1cm=.634km)
  - 1:100,000 (1"=1.58mi; 1cm=1km)
  - 1:500,000 (1"=7.9mi; 1cm=5km)
  - 1:1,000,000 (1"=15.8mi; 1cm=10km)
  - 1:7,500,000 (1"=118mi; 1cm=75km)
- Large versus Small
  - large: above 1:12,500
  - medium: 1:13,000 - 1:126,720
  - small: 1:130,000 - 1:1,000,000
  - very small: below 1:1,000,000
  - (really, relative to what's available for a given area; Maling 1989)
- Map sheet examples
  - 1:24,000: 7.5 minute USGS Quads (17 by 22 inches; 6 by 8 miles)
  - 1:7,500,000: US wall map (26 by 16 inches)
  - 1:20,000,000: US on 8.5 x 11
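The equivalents in the Common Scales list all follow from the scale denominator. A minimal sketch in Python, assuming a hypothetical helper `scale_equivalents` (not part of the slides) and the usual conversion constants of 63,360 inches per mile and 100,000 cm per km:

```python
INCHES_PER_MILE = 63_360
CM_PER_KM = 100_000


def scale_equivalents(denominator: float) -> tuple[float, float]:
    """Return (miles per map inch, km per map cm) for a scale of 1:denominator."""
    if denominator <= 0:
        raise ValueError("scale denominator must be positive")
    return denominator / INCHES_PER_MILE, denominator / CM_PER_KM


# Print a small table for a few of the scales listed above.
for denom in (24_000, 63_360, 100_000, 1_000_000, 7_500_000):
    miles, km = scale_equivalents(denom)
    print(f"1:{denom:,}  1 in = {miles:.2f} mi  1 cm = {km:.2f} km")
```

Working from the denominator rather than memorized equivalents makes slips in a hand-built table easy to catch.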
7 Scale, Resolution and Accuracy in GIS Systems
- On paper maps, scale is hard to change, thus it generally determines resolution and accuracy, and consistent decisions are made for these.
- A GIS is scale independent since output can be produced at any scale, irrespective of the characteristics of the input data (at least in theory)
- in practice, an implicit range of scales or maximum scale for anticipated output should be chosen and used to determine
  - what features to show
    - manholes only on large scale maps
  - how features will be represented
    - manhole a polygon at 1:50; cities a point at 1:1,000,000
  - appropriate levels for accuracy and precision
    - Larger scale generally requires greater resolution
    - Larger scale necessitates a higher level of accuracy
- GIS also helps with the generalization problem implicit in paper maps
  - A road drawn with a 0.5 mm wide line (the smallest for decent visibility)
    - At 1:24,000 implies the road is 12 meters (about 39 feet) wide
    - At 1:250,000 implies the road is 125 meters (about 410 feet) wide
  - At least in a GIS you can store the true road width, but be careful with plots!
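The road-width figures above come from multiplying the drawn line width by the scale denominator. A minimal sketch, assuming hypothetical helper names and an illustrative true road width of 10 m (neither is from the slides):

```python
MM_PER_M = 1000.0
FT_PER_M = 3.28084


def implied_ground_width_m(line_width_mm: float, scale_denominator: float) -> float:
    """Ground width (meters) implied by a line of the given width on a 1:N map."""
    return line_width_mm * scale_denominator / MM_PER_M


def exaggeration_factor(line_width_mm: float, scale_denominator: float,
                        true_width_m: float) -> float:
    """How many times wider than reality the plotted feature appears."""
    return implied_ground_width_m(line_width_mm, scale_denominator) / true_width_m


for denom in (24_000, 250_000):
    width_m = implied_ground_width_m(0.5, denom)
    print(f"1:{denom:,}: {width_m:.0f} m ({width_m * FT_PER_M:.0f} ft), "
          f"{exaggeration_factor(0.5, denom, 10.0):.1f}x a 10 m road")
```

Storing the true width as an attribute and computing the plotted width at output time keeps the exaggeration visible instead of baking it into the data.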
8 Precision or Resolution: it's not the same as scale or accuracy!
- Precision: the exactness of measurement or description
  - the size of the smallest feature which can be displayed, recognized, or described
  - Can apply to space, time (e.g. daily versus annual), or attribute (Douglas fir v. conifer)
  - for raster data, it is the size of the pixel (resolution)
    - e.g. for NTGISC digital orthos it is 1.6 ft (half meter)
  - raster data can be resampled by combining adjacent cells
    - this decreases resolution but saves storage
    - e.g. 1.6 ft to 3.2 ft (1/4 storage) to 6.4 ft (1/16 storage)
- resolution and scale
  - generally, increasing to larger scale allows features to be observed better and requires higher resolution
  - but, because of the human eye's ability to recognize patterns, features in a lower resolution data set can sometimes be observed better by decreasing the scale (6.4 ft resolution shown at 1:400 rather than 1:200)
- resolution and positional accuracy
  - you can see a feature (resolution), but it may not be in the right place (accuracy)
  - higher accuracy generally costs much more to obtain than higher resolution
  - accuracy cannot be greater (but may be much less) than resolution (e.g. if pixel size is one meter, then best accuracy possible is one meter)
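The storage savings from resampling follow from the square of the cell-size ratio. A minimal sketch, assuming hypothetical helpers `storage_fraction` and `aggregate_2x2` and simple block averaging as the resampling rule (the slides do not say which rule is used):

```python
def storage_fraction(original_cell: float, resampled_cell: float) -> float:
    """Fraction of the original storage needed after resampling to a coarser cell."""
    return (original_cell / resampled_cell) ** 2


def aggregate_2x2(grid: list[list[float]]) -> list[list[float]]:
    """Halve the resolution by averaging each 2x2 block of adjacent cells."""
    rows, cols = len(grid), len(grid[0])
    if rows % 2 or cols % 2:
        raise ValueError("grid dimensions must be even")
    return [
        [
            (grid[r][c] + grid[r][c + 1] + grid[r + 1][c] + grid[r + 1][c + 1]) / 4
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


print(storage_fraction(1.6, 3.2))   # 0.25
print(storage_fraction(1.6, 6.4))   # 0.0625

grid = [[1, 1, 2, 2],
        [1, 1, 2, 2],
        [3, 3, 4, 4],
        [3, 3, 4, 4]]
print(aggregate_2x2(grid))          # [[1.0, 2.0], [3.0, 4.0]]
```

The aggregation is one-way: the coarse grid cannot be turned back into the fine one, which is why output is possible at lower but not higher resolution than the input.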
9 Accuracy rests on at least four legs, not one!
- Positional Accuracy (sometimes called Quantitative accuracy)
  - Spatial
    - horizontal accuracy: distance from true location
    - vertical accuracy: difference from true height
  - Temporal
    - Difference from actual time and/or date
- Attribute Accuracy or Consistency: the validity concept in experimental design/statistical inference
  - a feature is what the GIS/map purports it to be
  - a railroad is a railroad, and not a road
  - A soil sample agrees with the type mapped
- Completeness: the reliability concept from experimental design/statistical inference
  - Are all instances of a feature the GIS/map claims to include, in fact, there?
  - Partially a function of the criteria for including features: when does a road become a track?
  - Simply put, how much data is missing?
- Logical Consistency: the presence of contradictory relationships in the database
  - Non-Spatial
    - Some crimes recorded at place of occurrence, others at place where report taken
    - Data for one country is for 2000, for another it's for 2001
    - Annual data series not taken on same day/month etc. (sometimes called lineage error)
10 Sources of Error
Error is the inverse of accuracy. It is a discrepancy between the coded and actual values.
- Sources
  - Inherent instability of the phenomenon itself
    - E.g. random variation of most phenomena (e.g. leaf size)
  - Measurement
    - E.g. surveyor or instrument error
  - Model used to represent data
    - E.g. choice of spheroid, or classification systems
  - Data encoding and entry
    - E.g. keying or digitizing errors
  - Data processing
    - E.g. single versus double precision algorithms used
  - Propagation or cascading from one data set to another
    - E.g. using inaccurate layer as source for another layer
- Example for Positional Accuracy
  - choice of spheroid and datum
  - choice of map projection and its parameters
  - accuracy of measured locations (surveying) of features on earth
  - media stability (stretching, folding, wrinkling of maps, photos)
  - human drafting, digitizing or interpretation error
  - resolution and/or accuracy of drafting/digitizing equipment
    - Thinnest visible line: 0.1-0.2 millimeters
    - At scale of 1:20,000: about 6.6 to 13.1 feet
    - (20,000 x 0.2 = 4,000mm = 4m = about 13.1 feet)
  - registration accuracy of tics
  - machine precision: coordinate rounding error in storage and manipulation
  - other unknown
11 Measurement of Positional Accuracy
- usually measured by root mean square error (RMSE): the square root of the average squared errors
- Usually expressed as a probability that no more than P% of points will be further than S distance from their true location.
- Loosely, we say that the RMSE tells us how far recorded points in the GIS are from their true location on the ground, on average.
- More correctly, based on the normal distribution of errors, 68% of points will be RMSE distance or less from their true location, and 95% will be no more than twice this distance, providing the errors are random and not systematic (i.e. the mean of the errors is zero)
  - e.g. for NTGISC digital orthos RMSE is 3.2 feet (one meter)
  - for USGS Digital Ortho Quads RMSE spec. is approx. 33 feet or 10 meters (but in reality much better)
- with GPS, height is in practice 2 or 3 times less accurate than horizontal at high precision (officially the spec is 1.5, but data collection errors affect vertical the most)
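The RMSE definition above can be tried on a small set of check-point errors. A minimal sketch; the error values are made up for illustration, and `rmse` and `share_within` are hypothetical helper names:

```python
import math


def rmse(errors: list[float]) -> float:
    """Root mean square error: square root of the average squared error."""
    if not errors:
        raise ValueError("need at least one error value")
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def share_within(errors: list[float], distance: float) -> float:
    """Fraction of points whose absolute error is no more than `distance`."""
    return sum(1 for e in errors if abs(e) <= distance) / len(errors)


# Hypothetical signed errors (meters) at ten check points.
errors_m = [0.4, -0.9, 1.2, -0.3, 0.8, -1.1, 0.2, 0.6, -0.7, 1.5]

r = rmse(errors_m)
print(f"RMSE = {r:.3f} m")
print(f"within 1 x RMSE: {share_within(errors_m, r):.0%}")
print(f"within 2 x RMSE: {share_within(errors_m, 2 * r):.0%}")
```

With only ten points the shares will not match 68% and 95% closely; those figures assume many points with normally distributed, zero-mean errors.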
12 Positional Accuracy
13 National Map Accuracy Standards 1941/47
- established in 1941 by the US Bureau of the Budget (now OMB) for use with US Geological Survey maps (Maling, 1989, p. 146)
- horizontal accuracy: not more than 10% of tested, well defined points shall be more than the following distances from their true location
  - 1:62,500: 1/50th of an inch (.02)
  - 1:24,000: 1/40th of an inch (amended to 1/50th, .02, in 1947)
  - 1:12,000: 1/30th of an inch (.033)
- Thus, on maps with a scale of 1:63,360 (1"=1 mile), 90% of points should be within 105.6 feet ((63,360 x .02)/12) of their true location.
- on USGS quads with a scale of 1:24,000 (1"=2,000ft), 90% of points should be within 40 feet ((24,000 x .02)/12) of their true location.
- on a map with a scale of 1:12,000 (1"=1,000ft), 90% of points should be within 33 feet (1,000 x .033), approx. 10 meters
  - gives rise to the loose, but often used, statement that the NMAS is 10 meters
- Inadequate for the computer age
  - how many points? how to select them?
  - how to determine their true location?
  - what about attribute completeness?
  - Unfortunately, the new standard doesn't address all these issues either
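The tolerance arithmetic on this slide (scale denominator times map tolerance in inches, divided by 12) can be written out directly. A minimal sketch, assuming hypothetical helper names and made-up test errors; the 10% threshold is the one stated above:

```python
def nmas_tolerance_ft(scale_denominator: float, tolerance_in: float) -> float:
    """Ground tolerance in feet for a map tolerance given in inches."""
    return scale_denominator * tolerance_in / 12.0


def passes_nmas(errors_ft: list[float], scale_denominator: float,
                tolerance_in: float) -> bool:
    """True if no more than 10% of tested points exceed the ground tolerance."""
    limit = nmas_tolerance_ft(scale_denominator, tolerance_in)
    exceeding = sum(1 for e in errors_ft if abs(e) > limit)
    return exceeding / len(errors_ft) <= 0.10


print(round(nmas_tolerance_ft(63_360, 0.02), 1))    # 105.6
print(round(nmas_tolerance_ft(24_000, 0.02), 1))    # 40.0
print(round(nmas_tolerance_ft(12_000, 1 / 30), 1))  # 33.3

# Hypothetical errors (feet) at ten tested points on a 1:24,000 quad.
print(passes_nmas([10] * 9 + [50], 24_000, 0.02))       # True: 1 of 10 exceeds 40 ft
print(passes_nmas([10] * 8 + [50, 60], 24_000, 0.02))   # False: 2 of 10 exceed 40 ft
```

The sketch also shows the weakness the slide points out: the result depends entirely on which points are tested and how their true locations are known.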
14 National Standard for Spatial Data Accuracy (NSSDA), 1998
- Geospatial Positioning Accuracy Standard (FGDC-STD-007)
  - Part 3, National Standard for Spatial Data Accuracy, FGDC-STD-007.3-1998
- replacement for National Map Accuracy Standard of 1941/47
- specifies a statistic and testing methodology for positional (horizontal and vertical) accuracy of maps and digital data
- no single threshold metric to achieve (as with old Standard), but users are encouraged to establish thresholds for specific applications
- accuracy reported in ground units (not map units as in 1941 standard: 1/30th inch)
- testing method compares data set point coordinate values with coordinate values from a higher accuracy source for readily visible or recoverable ground points
- although it uses points, principles apply to all geospatial data including point, vector and raster objects
- other standards for data content will adopt NSSDA for particular spatial objects
- copies of the standard available at http://www.fgdc.gov
- Accuracy Standard has 7 parts, of which parts 4-7 apply to specific data types
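The NSSDA testing method (compare data set coordinates with a higher accuracy source) reduces to a short calculation. A minimal sketch under stated assumptions: the function name and coordinates are hypothetical, and the 1.7308 multiplier is the factor the standard gives for horizontal accuracy at the 95% confidence level when x and y errors are of similar size. The slides do not quote it, so check it against FGDC-STD-007.3-1998 before relying on it.

```python
import math

NSSDA_HORIZONTAL_FACTOR = 1.7308  # 95% level, assumes RMSE_x is close to RMSE_y


def nssda_horizontal_accuracy(dataset_pts: list[tuple[float, float]],
                              reference_pts: list[tuple[float, float]]) -> float:
    """Horizontal accuracy at 95% confidence, in the ground units of the inputs."""
    if not dataset_pts or len(dataset_pts) != len(reference_pts):
        raise ValueError("need matching, non-empty point lists")
    squared = [
        (xd - xr) ** 2 + (yd - yr) ** 2
        for (xd, yd), (xr, yr) in zip(dataset_pts, reference_pts)
    ]
    rmse_r = math.sqrt(sum(squared) / len(squared))
    return NSSDA_HORIZONTAL_FACTOR * rmse_r


# Hypothetical check points (meters): data set versus higher accuracy survey.
data = [(100.0, 200.0), (150.5, 260.0), (300.0, 410.8)]
survey = [(100.6, 200.0), (150.5, 259.2), (300.0, 410.0)]
print(f"Tested {nssda_horizontal_accuracy(data, survey):.2f} m horizontal accuracy at 95% confidence level")
```

The standard asks for at least 20 well-distributed check points; three are used here only to keep the example short.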
15 GPS and Positional Accuracy
- Global Positioning System satellite positioning with WAAS (Wide Area Augmentation System) adjustment gives positional accuracy within about 3 meters (10 ft).
- This is more accurate than most printed maps and nautical charts!
- It is also more accurate than most digital maps and charts, since these often derive from paper maps and surveys conducted prior to GPS
- Your integrated GPS/digital chart can show you nicely heading down the center of a channel, but positional inaccuracy in the chart can leave you grounded!
16 Summary: Resolution, Scale, Accuracy and Storage, illustrating the relationship
Largest (maximum) scale for given pixel size. Storage is for a USGS 7.5 minute quad area (in Texas, a USGS quad is about 7 mi x 8.5 mi = 60 sq. miles; 16 quads for Dallas County). Source: GPS Technology Corporation
17 Examples of Accuracy
- Go to quality_graphics.ppt
18 Lineage
- identifies the original sources from which the data was derived
- details the processing steps through which the data has gone to reach its current form
- Both impact its accuracy
- Both should be in the metadata, and are required by the Content Standard for Metadata (see below)
- Michael Goodchild (the guru of GIS) advocates
  - Measurement-based GIS, in which how data were collected and how measurements were made are a part of the record (as in surveying)
  - Coordinate-based GIS is the current approach, and it tracks none of this.
  - (see Shi, Fisher and Goodchild, Spatial Data Quality. London: Taylor and Francis, 2002)
19 Currency: Is my data up-to-date?
- data is always relative to a specific point in time, which must be documented.
- there are important applications for historical data (e.g. analyzing trends), so don't necessarily trash old data
- current data requires a specific plan for on-going maintenance
  - may be continuous, or at pre-defined points in time.
  - otherwise, data becomes outdated very quickly
- currency is not really an independent quality dimension; it is simply a factor contributing to lack of accuracy regarding
  - consistency: some GIS features do not match those in the real world today
  - completeness: some real world features are missing from the GIS database
Many organizations spend substantial amounts acquiring a data set without giving any thought to how it will be maintained.
20 Standards: common agreed-to ways of doing things
- May exist for
  - Data itself, including process (the way it's produced) and product (the outcome)
    - Utilities Data Content Standard, FGDC-STD-010-2000
  - Accuracy of data
    - Geospatial Positioning Accuracy Standard, Part 3, National Standard for Spatial Data Accuracy, FGDC-STD-007.3-1998
  - Documentation about the data (metadata)
    - Content Standard for Digital Geospatial Metadata (version 2.0), FGDC-STD-001-1998
  - Transfer of data and its documentation
    - Spatial Data Transfer Standard (SDTS), FGDC-STD-002
  - Symbology and presentation
    - Digital Geologic Map Symbolization
- May address
  - Content (what is recorded)
  - Format (how it's recorded: file format, .tif, shapefile, etc.)
- May be a product of
  - An organization's internal actions: private or organization standards
  - An external government body (Federal Geographic Data Committee) or third sector body (Open GIS Consortium): public or de jure standards
  - Laissez-faire market-place forces leading to one dominant approach, e.g. the Wintel standard: industry or de facto standards
http://www.fgdc.gov/standards/standards.html
21 Who Sets Public Standards?
- Federal Geographic Data Committee
  - Sets standards for geospatial data which all federal agencies are required to follow
  - Has representatives from most federal agencies
  - National Institute of Standards and Technology (NIST) sets federal gov. standards for other things (e.g. IT in general)
- national standards bodies
  - American National Standards Institute (ANSI)
    - has the US's single vote at ISO
  - United States InterNational Committee for Information Technology Standards (INCITS) handles IT standards for ANSI
    - Several FGDC standards have been submitted for approval
  - Most countries in the world have their equivalent to ANSI
- international standards bodies
  - ISO (International Organization for Standardization)
- other assorted vendor groups, professional associations, trade associations, and consortia
  - Open GIS Consortium (OGC) is the main player in GIS
22 The Process for Setting de jure Standards!
Source: URISA News, Issue 197, Sept/Oct. 2003
Go to the following web site for an excellent overview of the standard-making process: http://www.fgdc.gov/publications/documents/standards/geospatial_standards_part1.html
23 Adopting Standards: What you should do
- Data quality is achieved by adoption and use of standards: do it!
- Common ways of doing things are essential for using and sharing data internally and externally
- only federal agencies are required to use FGDC standards; it's optional for any others (e.g. state, local)
  - power of feds often results in adoption by everybody, although there are some noted failures (e.g. the OSI, GOSIP, POSIX standards in computing in the 1980s failed and were withdrawn)
- FGDC or ISO standards provide an excellent starting point for local standards, and should be adopted unless there are compelling reasons otherwise
- Standards for metadata (documenting your data) are the most important and should be first priority.
  - Content Standard for Digital Geospatial Metadata (version 2.0), FGDC-STD-001-1998
  - ISO Document 19115, Geographic Information: Metadata (content), and 19139, Geographic Information: Metadata: Implementation Specification (format for storing ISO 19115 metadata in XML)
  - If not one of these standards for metadata, adopt some standard!
24 Content Standards for Digital Geospatial Metadata: What and Why?
- Metadata describes the content, quality, format, source and other characteristics of data.
- Allows you and others to
  - Locate data (find, discover)
  - Evaluate data (quality, restrictions, reputation)
  - Extract (order, download, pay)
  - Employ (apply, use)
  - and automate this process.
25 Main Sections of the US Federal Content Standard for Digital Geospatial Metadata
- Identification
  - Title? Area covered? Themes? Currency? Restrictions?
- Data Quality (5 aspects)
  - Positional and Attribute Accuracy? Completeness? Logical Consistency? Lineage?
- Spatial Data Organization
  - Indirect? Vector? Raster? Type of elements? Number?
- Spatial Reference
  - Projection? Grid system? Datum? Coordinate system?
- Entity and Attribute Information
  - Features? Attributes? Attribute values?
- Distribution
  - Distributor? Formats? Media? Online? Price?
- Metadata Reference
  - Metadata currency? Responsible party?
- For more info, go to http://www.fgdc.gov/metadata/contstan.html
By law (Executive Order 12906, 1994), all federal agencies must document their data according to the Content Standard for Digital Geospatial Metadata (version 2.0), FGDC-STD-001-1998
26 Traditional Minimum Documentation Requirements for Maps/GIS
- geodetic datum name (e.g. NAD27), which implies
  - ellipsoid/spheroid name (earth model), e.g. Clarke 1866
  - point of origin (ties ellipsoid to earth), e.g. Meades Ranch
  - required for all GIS data bases and maps
- projection name and its parameters and its measurement units
  - (see terrestrial lecture for exact details)
  - Required for all maps since 2-D by nature
  - Required for GIS if data is in X-Y projected form
- Source information
  - accuracy standard(s) to which built
  - author/publisher/creator name and/or data source
  - date(s) of data collection/update, and of map/GIS creation
- Cartographers demand all maps have
  - north arrow
  - map scale
  - graticule indication
    - at least four latitude/longitude tic marks, with values in degrees
    - at least four X-Y tic marks, with values and units of measurement (feet, meters, etc.)
(If GIS data is in lat/long, must know datum. If GIS data is in XY, must know datum and projection info.)
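The closing rule (lat/long data needs a datum; projected X-Y data needs datum and projection info) can be turned into a simple completeness check. A minimal sketch; the `MapDocumentation` record, its field names, and `missing_items` are hypothetical and are not taken from any metadata standard:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MapDocumentation:
    """Hypothetical minimum documentation record for a map or GIS layer."""
    datum: Optional[str] = None
    projection: Optional[str] = None
    projection_units: Optional[str] = None
    accuracy_standard: Optional[str] = None
    source: Optional[str] = None
    collection_date: Optional[str] = None


def missing_items(doc: MapDocumentation, projected: bool) -> list[str]:
    """List the required fields that are empty; projection fields count only for X-Y data."""
    required = ["datum", "accuracy_standard", "source", "collection_date"]
    if projected:
        required += ["projection", "projection_units"]
    return [name for name in required if not getattr(doc, name)]


doc = MapDocumentation(datum="NAD27", source="USGS", collection_date="1992")
print(missing_items(doc, projected=True))
# ['accuracy_standard', 'projection', 'projection_units']
print(missing_items(doc, projected=False))
# ['accuracy_standard']
```

A check like this can run whenever a layer is loaded, so that undocumented data is flagged before it is combined with anything else.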
27 Texas Standards
http://www.dir.state.tx.us/tgic/pubs/pubs.htm
- Standards for digital spatial data (raster and vector) for State agencies in Texas were established in 1992
  - http://www.dir.state.tx.us/tgic/pubs/gis-standards.htm
  - Currently (2004), being reviewed by the Texas Geographic Information Council (TGIC) for possible update
  - Apply to map scales of 1:24,000 and smaller (e.g., 1:100,000; 1:250,000).
  - Cover a variety of issues including data layers, datum, projections, accuracy, metadata, etc.
- Two major planning reports on GIS in state gov. in Texas are
  - Digital Texas: 2002 Biennial Report on Geographic Information Systems Technology
    - http://www.dir.state.tx.us/tgic/pubs/gift99-small.pdf
  - Geographic Information Framework for Texas (1999)
    - http://www.dir.state.tx.us/tgic/pubs/digtex-lowres.pdf
28 Importance of Standards
- Great Baltimore Fire of 1904: fire engines from different regions responded, only to be found useless since they had different hose coupling sizes that did not fit Baltimore hydrants. The fire burned over 30 hours and resulted in the destruction of 1,526 buildings covering 17 city blocks.
- Fall River, MA fire of 1923: the town was saved when over 20 neighboring fire departments responded, since they had standardized on hydrant and hose coupling sizes.
- 9/11: response in NY and DC severely hampered by
  - incompatibilities between GIS data sets, and lack of data
  - Also, incompatibilities between communications systems
- The most important standard?
  - Railroad track gauge: adopted by US, UK, Canada, and much of Europe.
  - South America still hampered by differing railroad gauges between countries.
29 The Best Time to Adopt a Standard?
Now?
Now?
Before!
30 Appendix
- FGDC Standards
  - (status as of March 2004)
  - For latest, go to http://www.fgdc.gov/standards/standards.html
31 FGDC Metadata Standards
- Metadata
  - Content Standard for Digital Geospatial Metadata (version 2.0), FGDC-STD-001-1998
  - Content Standard for Digital Geospatial Metadata, Part 1: Biological Data Profile, FGDC-STD-001.1-1999
  - Metadata Profile for Shoreline Data (FGDC-STD-001.2-2001)
  - Content Standard for Digital Geospatial Metadata: extension for remote sensing data (FGDC-STD-0012-2002)
  - Encoding Standard for Geospatial Metadata (Draft)
  - Metadata Profile for Cultural and Demographic Data (dropped)
Current thrust is to integrate FGDC Metadata standards (and other FGDC standards eventually) into International Organization for Standardization (ISO) standards.
32 FGDC Data Accuracy Standard
- Geospatial Positioning Accuracy Standard (FGDC-STD-007)
  - Part 1, Reporting Methodology, FGDC-STD-007.1-1998
  - Part 2, Geodetic Control Networks, FGDC-STD-007.2-1998
  - Part 3, National Standard for Spatial Data Accuracy, FGDC-STD-007.3-1998
  - Part 4, Architecture, Engineering, Construction, and Facilities Management (FGDC-STD-007.4-2002)
  - Part 5, Standard for Hydrographic Surveys and Nautical Charts (Review)
- An umbrella incorporating several accuracy standards.
- Part 3 is the general standard.
  - It essentially updates the National Map Accuracy Standard of 1941/47
33 FGDC Data Content Standards
- Facility ID Data Standard (Review)
- Address Content Standard (Review)
- US National Grid (FGDC-STD-0011-2001)
- Earth Cover Classification System (Draft)
- Geologic Data Model (Draft)
- Governmental Unit Boundary Data Content Standard (Draft)
- Biological Nomenclature and Taxonomy Data Standard (Draft)
- National Hydrography Framework Geospatial Data Content Standard (proposal)
- Environmental Hazards Geospatial Data Content Standard (dropped)
- NSDI Framework Data layers (under Review; see next slide)
- Cadastral Data Content Standard, FGDC-STD-003
- Classification of Wetlands and Deep Water Habitats, FGDC-STD-004
- Vegetation Classification Standard, FGDC-STD-005
- Soils Geographic Data Standard, FGDC-STD-006
- Content Standard for Digital Orthoimagery (FGDC-STD-008-1999)
- Content Standard for Remote Sensing Swath Data (FGDC-STD-009-1999)
- Utilities Data Content Standard (FGDC-STD-010-2000)
- NSDI Framework Transportation Identification Standard (Review)
- Hydrographic Data Content Standard for Coastal and Inland Waterways (Review)
- Content Standard for Framework Land Elevation Data (Review)
34 FGDC Framework Data Standards
- establish data content requirements for the seven layers of geospatial data that comprise the National Spatial Data Infrastructure (NSDI), the base layers needed for any geographic area
  - Geodetic control
  - Elevation
  - Orthoimagery
  - Hydrography (water)
  - Transportation
  - Cadastral (land ownership)
  - Governmental unit boundaries
- Goals are to
  - Facilitate and promote exchange of framework layers between producers, consumers, and vendors through a common content and way of describing that content
  - Lower the cost of data for everyone
- For each layer, specifies an integrated application schema in Unified Modeling Language (UML) including feature types, attribute types, attribute domain, feature relationships, spatial representation, data organization, and metadata
- no standard specified for data format, but an appendix describes a possible implementation using the Geography Markup Language (GML) Version 3.0, developed through the Open GIS Consortium, Inc. (OGC).
35 FGDC Data Transfer Standards
- Spatial Data Transfer Standard (SDTS), FGDC-STD-002
  - SDTS, Part 1: Logical Specification (FIPS PUB 173-1, July 1994)
  - SDTS, Part 2: Spatial Features (FIPS PUB 173-1, July 1994)
  - SDTS, Part 3: ISO 8211 Encoding (FIPS PUB 173-1, July 1994)
  - SDTS, Part 4: Topological Vector Encoding (FIPS PUB 173-1, July 1994)
  - SDTS, Part 5: Raster Profile and Extensions (FGDC-STD-002.5, 2000)
  - SDTS, Part 6: Point Profile (FGDC-STD-002.6, 2000)
  - SDTS, Part 7: Computer-Aided Design and Drafting (CADD) Profile (FGDC-STD-002.7, 2000)
- One of the first of the FGDC standards (along with metadata).
- Intended to facilitate transfers between different GIS systems.
- Competitive pressures plus internal weaknesses hindered adoption.
36 FGDC Data Symbology and Presentation Standards
- Digital Geologic Map Symbolization (Review)