Title: Data Storage and Editing
1Data Storage and Editing
- (Entity and attribute) DeMers Chapter 6
- http://www.iupui.edu/jeswilso/g438/lecture5/
2Introduction
- Any analysis performed must be based on good data, correctly organized and in the proper format.
- In raster systems, we may need to display each coverage to isolate illogical or out-of-place grid cells as we compare them to the input document.
- In vector systems, we may have to build topology after the initial data input to pinpoint any digitizing errors.
- To check entity-attribute agreement, we may need to output sample portions of our map for comparison against the original input material.
3Storage of GIS Databases
- Raster: Attribute values for grid cells are the primary data stored in the computer. The values make up the actual grid, and the positions of grid cells are catalogued relative to the order in which they appear; e.g., if you store the origin of the grid, the cell size, and the number of rows and columns, all you need is the cell values.
- Vector: It is common for GISs to store vector entities and their associated attributes in separate files (the reason for using an RDBMS). For example, in the ArcView shapefile format, entities are stored in one file, attributes in another, and projection information in a third; an Arc/Info coverage uses a workspace, an entity directory, and an info directory.
- Tiling: storage of individual sections (tiles) in predefined subsections. The purpose is to reduce the volume of data needed for analysis of any particular section. Tiles may follow, e.g., quad boundaries, a TR grid, etc.
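The raster storage idea above can be sketched in a few lines of Python. This is only an illustration, not the file format of any particular GIS: the class name `RasterGrid`, the upper-left origin convention, and the sample values are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RasterGrid:
    """Hypothetical minimal raster: a small header plus a flat list of cell values."""
    x_origin: float    # x of the upper-left corner (assumed convention)
    y_origin: float    # y of the upper-left corner (assumed convention)
    cell_size: float
    n_rows: int
    n_cols: int
    values: List[int]  # stored row by row, top row first

    def value_at(self, row: int, col: int) -> int:
        # The position in the list follows from the storage order alone.
        return self.values[row * self.n_cols + col]

    def cell_center(self, row: int, col: int) -> Tuple[float, float]:
        # No coordinates are stored per cell; they are derived from the header.
        x = self.x_origin + (col + 0.5) * self.cell_size
        y = self.y_origin - (row + 0.5) * self.cell_size
        return (x, y)


grid = RasterGrid(x_origin=1000.0, y_origin=2000.0, cell_size=10.0,
                  n_rows=2, n_cols=3, values=[1, 1, 2, 3, 3, 2])
print(grid.value_at(1, 0), grid.cell_center(1, 0))
```

Only the six cell values and five header numbers are stored; every cell position is computed on demand.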
4The Importance of Editing the GIS Database
- Most errors result from improper input.
- Generally, at least some errors will always occur and require editing, e.g., pushing the wrong digitizer button (vertex instead of node), pressing the wrong key when entering attribute information, and position errors in digitizing (shaky hand).
- 3 general types of error:
- Entity error (position error): primarily associated with the vector model (missing entities, incorrectly placed entities, disordered entities).
- Attribute error: occurs in both vector and raster models (typing errors, misspellings, etc.).
- Entity-attribute agreement error (a.k.a. logical consistency): correctly typed codes attached to the wrong entities.
5Accuracy
- The degree to which information on a map or in a digital database matches true or accepted values.
- An issue pertaining to the quality of data and the number of errors contained in a data set or map.
- It is possible to consider horizontal and vertical accuracy with respect to geographic position.
- Attribute accuracy, as well as conceptual and logical accuracy, can also be considered.
- The level of accuracy required for particular applications varies greatly. Highly accurate data can be very difficult and costly to produce and compile.
- e.g., mapping standards employed by the United States Geological Survey (USGS): "requirements for meeting horizontal accuracy as 90 per cent of all measurable points must be within 1/30th of an inch for maps at a scale of 1:20,000 or larger, and 1/50th of an inch for maps at scales smaller than 1:20,000."
6Accuracy Standards for Various Scale Maps
- 1:1,200  3.33 feet
- 1:2,400  6.67 feet
- 1:4,800  13.33 feet
- 1:10,000  27.78 feet
- 1:12,000  33.33 feet
- 1:24,000  40.00 feet
- 1:63,360  105.60 feet
- 1:100,000  166.67 feet
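The ground distances listed above follow from the map tolerance quoted on the Accuracy slide (1/30 inch at scales of 1:20,000 or larger, 1/50 inch at smaller scales). A minimal sketch of that arithmetic, with a function name chosen here purely for illustration:

```python
def horizontal_tolerance_feet(scale_denominator: int) -> float:
    """Ground distance, in feet, that corresponds to the allowed error on the map."""
    # 1/30 inch for scales of 1:20,000 or larger, 1/50 inch for smaller scales.
    map_inches = 1 / 30 if scale_denominator <= 20_000 else 1 / 50
    # Map inches times the scale denominator gives ground inches; 12 inches per foot.
    return scale_denominator * map_inches / 12


for denom in (1_200, 2_400, 4_800, 10_000, 12_000, 24_000, 63_360, 100_000):
    print(f"1:{denom:,}  {horizontal_tolerance_feet(denom):.2f} feet")
```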
8Precision
- Refers to the level of measurement and exactness of description in a GIS database (e.g., number of decimal places).
- Precise locational data may measure position to a fraction of a unit, e.g., to the millimeter.
- Precise attribute information may specify the characteristics of features in great detail.
- It is important to realize, however, that precise data, no matter how carefully measured, may be inaccurate.
- The level of precision required for particular applications varies greatly. Engineering projects such as road construction require very precise information measured to the millimeter. Demographic analyses of marketing or electoral trends can often make do with less, say to the closest zip code or precinct boundary.
9Why be concerned about error? - The Problems of
Propagation and Cascading
- The discussion to this point has focused on errors that may be present in single sets of data.
- "Doing" GIS usually involves comparisons of many sets of data. If errors exist in one or all of the data layers, the solution to the GIS problem generated from them may itself be erroneous.
- Inaccuracy, imprecision, and error may be compounded in GISs that employ many data sources.
10DIGITIZATION-continue
[Figure: a coverage with tics numbered 1 to 4 and its geographic features]
11Error Propagation and Cascading
- Occurs when one error leads to another.
- Means that erroneous, imprecise, and inaccurate information will skew a GIS solution when information is combined.
- DeMers: "error prone data will lead to error prone analysis".
- e.g., a map registration point has been mis-digitized in one coverage and is then used to register a second coverage.
- Result: the second coverage will propagate the first mistake.
- In this way, a single error may lead to others and spread until it corrupts data throughout the entire GIS project.
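The registration example can be made concrete with a toy calculation. The sketch below assumes registration by simple translation onto a single tic, which is a deliberate simplification; the coordinates and the size of the digitizing error are made up.

```python
def register_by_tic(points, own_tic, reference_tic):
    """Translate points so that own_tic lands on reference_tic (translation only)."""
    dx = reference_tic[0] - own_tic[0]
    dy = reference_tic[1] - own_tic[1]
    return [(x + dx, y + dy) for x, y in points]


true_tic = (100.0, 200.0)
misdigitized_tic = (102.0, 199.0)  # hypothetical error of (+2, -1) in the first coverage

second_coverage = [(0.0, 0.0), (10.0, 5.0)]  # local coordinates, own tic at (0, 0)
correct = register_by_tic(second_coverage, (0.0, 0.0), true_tic)
shifted = register_by_tic(second_coverage, (0.0, 0.0), misdigitized_tic)

# Every point of the second coverage inherits the error made in the first one.
errors = [(sx - cx, sy - cy) for (sx, sy), (cx, cy) in zip(shifted, correct)]
print(errors)  # [(2.0, -1.0), (2.0, -1.0)]
```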
12Entity Errors: Vector
- Six categories identified by DeMers/ESRI:
- All entities that should have been entered are present.
- No extra entities have been entered.
- Entities are in the right place and are of the correct shape and size.
- Entities that are supposed to be connected to each other are connected.
- All polygons have a single label point which identifies them.
- All entities are within the identified outside boundary.
13Nodes and Vertices
- Specific types of entity errors in vector GIS can involve points, lines, polygons, nodes, vertices, and label points.
- Nodes: denote the ends of lines or the point where a polygon closes on itself.
- Vertices: denote a change of direction within a line.
- points -> lines -> polys
- Nodes are used to show specific topological relationships, e.g.:
- intersection of roads or streams
- intersection between a stream and a lake
- Node errors include pseudo nodes and dangle nodes.
14Pseudo nodes
- Occur where a line connects with itself or with another line.
- A line connects with itself to form a polygon, a.k.a. an island pseudo node (fig. 6.1a, p. 161).
- Also occur where two lines meet rather than cross (fig. 6.1b).
- Pseudo nodes are not necessarily errors, but indicate the potential location of errors.
- e.g., a pseudo node in the middle of a line representing a road can be used to separate the road into two different speed limit zones.
- Others may indicate an error (pushed the wrong button when digitizing, placed the cursor at the wrong location).
15Digitization errors- Pseudo node (Diamond)
Pseudo node: connects two and only two arcs
Pseudo node: not necessarily a serious error
[Figure labels: pseudo node, error]
16Dangle nodes
- A single node connected to a single line.
- Again, not necessarily an error, but may be.
- Can result from three possible mistakes (fig. 6.2, p. 162):
- Failure to close a polygon
- Undershoot
- Overshoot
- Sometimes result from incorrect placement, sometimes from the fuzzy tolerance and snapping distance settings.
- One method of general error detection is comparing the digitized map to the original document at equivalent scales; this is good for broad-scale, obvious errors, not for finer-scale errors.
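Pseudo nodes and dangle nodes can both be recognized by counting how many arc ends meet at a node: one end suggests a dangle, two ends a pseudo node. The following is a rough sketch of that idea, not the algorithm used by any particular GIS; it assumes nodes have already been snapped so that coincident nodes have identical coordinates, and all names are illustrative.

```python
from collections import Counter
from typing import Dict, List, Tuple

Node = Tuple[float, float]
Arc = Tuple[Node, Node]  # (from-node, to-node); intermediate vertices are omitted


def classify_nodes(arcs: List[Arc]) -> Dict[Node, str]:
    """Label each node by the number of arc ends that meet there."""
    degree = Counter()
    for start, end in arcs:
        degree[start] += 1
        degree[end] += 1  # a closed loop counts both of its ends at the same node
    labels = {}
    for node, count in degree.items():
        if count == 1:
            labels[node] = "dangle"
        elif count == 2:
            labels[node] = "pseudo"
        else:
            labels[node] = "normal"
    return labels


arcs = [
    ((0, 0), (1, 0)),  # starts at a free end
    ((1, 0), (2, 0)),  # joins the previous arc end to end
    ((2, 0), (2, 1)),
    ((2, 0), (3, 0)),
    ((5, 5), (5, 5)),  # island: a line closing on itself
]
for node, label in classify_nodes(arcs).items():
    print(node, label)
```

As the slides note, the labels only flag candidates: a dangle at the end of a road or a pseudo node at a speed limit change is legitimate.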
17DIGITIZATION
- For linear features such as rivers, roads, and railways, it is important to digitize each section separately (with a start node and an end node for each specified section) or to use a Route later.
[Figure label: node]
18Digitization errors - Dangle Error (square)
[Figure labels: overshoot, closed polygon, undershoot, natural feature, acceptable dangle node (e.g., end of a road)]
19Label point and sliver errors
- Polygon label point errors (points -> lines -> polys)
- The label point is used to associate a polygon with its attributes.
- If the label point is missing, or there is more than one, this indicates an error, e.g., fig. 6.4, p. 163.
- Sliver polygon errors
- Commonly result from the incorrect practice of double digitizing.
- Can also result from overlay or merging operations which join coverages from different sources.
- Can be removed manually or by dissolving polygons smaller than a certain area, and/or detected by comparing the intended number of polys with the actual number (Fig. 6.5, p. 164).
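The area test for slivers can be sketched as follows. This is an illustrative example only: polygon area is computed with the shoelace formula, the threshold `min_area` is a value the analyst would have to choose, and the sample polygons are invented.

```python
def polygon_area(ring):
    """Shoelace formula; ring is a list of (x, y) vertices, first vertex not repeated."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def find_slivers(polygons, min_area):
    """Return the ids of polygons whose area is below the chosen threshold."""
    return [pid for pid, ring in polygons.items() if polygon_area(ring) < min_area]


polygons = {
    "field": [(0, 0), (100, 0), (100, 100), (0, 100)],
    "sliver": [(100, 0), (100.2, 50), (100, 100)],  # thin gap from double digitizing
}
expected_count = 1  # the number of polygons intended on the source map
print(find_slivers(polygons, min_area=50.0))
print(len(polygons) - expected_count, "unexpected polygon(s)")
```

A small area alone does not prove a polygon is a sliver, so flagged polygons would still be reviewed before being dissolved.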
20Digitization errors-Labels
- Missing labels or too many labels
[Figure labels: too many labels, missing labels]
21Sliver polygon errors
22How to correct digitization errors?
- List digitization errors using the commands Nodeerrors and Labelerrors.
- Use ARCEDIT to edit the coverage, then select what to edit with the edit feature (ef) command, e.g., ef label, ef node, ef arc.
- Use a series of commands such as nodesnap, arcsnap, reshape, split, add, delete, move, copy, rotate, extend, and unsplit.
- For labels, use Createlabels.
23Topology
- Topology is the process of projecting complex surfaces to simple ones.
- Topology is a procedure for explicitly defining spatial relationships connecting adjacent features (e.g., arcs, nodes, polygons, and points).
- Different types of spatial relationships are expressed as lists of features, e.g.:
- An area is defined by the arcs comprising its border.
- An arc is defined by a set of points (X,Y).
24Topology-Main Concepts
- The three major topological concepts are:
- Connectivity: arcs are connected to each other at nodes.
- Contiguity/Adjacency: arcs have direction and left and right sides.
- Area definition: arcs connected to surround an area define a polygon (area).
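The three concepts can be illustrated with a small arc table. The sketch below is not the internal layout of any specific GIS; the field names, the use of 0 for the outside area, and the two-polygon example (polygons 101 and 102 sharing arc 3) are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

OUTSIDE = 0  # assumed id for the area outside all polygons


@dataclass(frozen=True)
class Arc:
    arc_id: int
    from_node: int
    to_node: int
    left_poly: int   # polygon on the left when travelling from_node -> to_node
    right_poly: int  # polygon on the right


arcs = [
    Arc(1, 1, 2, left_poly=OUTSIDE, right_poly=101),  # outer boundary of polygon 101
    Arc(2, 1, 2, left_poly=102, right_poly=OUTSIDE),  # outer boundary of polygon 102
    Arc(3, 1, 2, left_poly=101, right_poly=102),      # shared boundary
]


def arcs_at_nodes(arcs):
    """Connectivity: which arcs meet at each node."""
    result = defaultdict(list)
    for a in arcs:
        result[a.from_node].append(a.arc_id)
        result[a.to_node].append(a.arc_id)
    return dict(result)


def neighbours(arcs):
    """Adjacency: polygon pairs that lie on either side of a common arc."""
    pairs = set()
    for a in arcs:
        if OUTSIDE not in (a.left_poly, a.right_poly):
            pairs.add(tuple(sorted((a.left_poly, a.right_poly))))
    return pairs


def polygon_boundaries(arcs):
    """Area definition: the arcs that surround each polygon."""
    result = defaultdict(list)
    for a in arcs:
        for poly in (a.left_poly, a.right_poly):
            if poly != OUTSIDE:
                result[poly].append(a.arc_id)
    return dict(result)


print(arcs_at_nodes(arcs))
print(neighbours(arcs))
print(polygon_boundaries(arcs))
```

The shared boundary is stored once, as arc 3, and both polygons refer to it.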
25Spatial Relationships(Topology)
[Figure panels: area definition, connectivity, adjacency]
26Polygon Topology
27Advantages of Topology
- Check for digitization errors (overshoot, undershoot, unclosed polygon, missing labels, too many labels).
- Store data more efficiently (eliminate data redundancy; normalization).
- Make spatial analysis faster.
28Topology
- Topological data structures dominate GIS software.
- Topology allows automated error detection and elimination.
- Rarely are maps topologically clean when digitized or imported.
- A GIS has to be able to build topology from unconnected arcs.
- Nodes that are close together are snapped.
- Slivers due to double digitizing and overlay are eliminated.
29Creating topology in Arc/Info
- After digitization and correction of digitization errors, topology can be built.
- The command BUILD is used for point, line, or polygon coverages.
- The command CLEAN is used for line and polygon coverages.
- CLEAN never creates topology for a point coverage.
- BUILD never detects intersections of arcs and polygons.
30Topology commands
- C:\ARC CLEAN in-cov out-cov dangle-length fuzzy-tol
- C:\ARC CLEAN road1 road2 3.4
- C:\ARC BUILD in-cov POLY/LINE/POINT
- C:\ARC BUILD cities POINT
- For features that have no intersections, such as contours, BUILD with the LINE option can be used.
- For features that have intersections, such as roads and lots, it is better to first use CLEAN and then use BUILD.
31Tables created by topology
- Arc Attribute Table (AAT)
- Polygon Attribute Table (PAT)
- Point Attribute Table (PAT): area and perimeter are 0
- Route Attribute Table (RAT)
- Feature Attribute Table (FAT)
- Node Attribute Table (NAT)
32Hint for topology
- Make a copy of the original data before starting to build topology.
- Adopt a consistent strategy for naming coverages.
- For example, names of raw coverages start with R, e.g., Rroads and Rlanduse.
- Keep coverage names to 8 characters or fewer and without an extension (8.3).
33Coordinate Transformation
- The tablet coordinates must be converted to real-world (map) coordinates.
- The commands used for coordinate transformation are:
- CREATE or GENERATE: used to create a master coverage.
- The (X,Y) values of the tic file (Tic.dbf) must be set to map coordinates.
- TRANSFORM: used to transform the coverage.
34Coordinate Transformation-continue
- Latitude (φ) and longitude (λ) must be converted to decimal degrees (DD), e.g., latitude 13 deg 45 min 55 sec = 13 + 45/60 + 55/3600 DD.
- Project the decimal degrees to plane coordinates, e.g., UTM.
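The conversion to decimal degrees is simple arithmetic: minutes are divided by 60 and seconds by 3600. A minimal sketch, in which the function name, the hemisphere argument, and the sample value of 13 deg 45 min 55 sec are illustrative choices:

```python
def dms_to_decimal_degrees(degrees, minutes=0.0, seconds=0.0, hemisphere="N"):
    """Convert degrees, minutes, seconds to decimal degrees.

    South latitudes and west longitudes are returned as negative values.
    """
    value = abs(degrees) + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


print(round(dms_to_decimal_degrees(13, 45, 55), 6))  # 13.765278
```

Projecting the decimal degrees to a plane system such as UTM is a separate step that is not shown here.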
[Figure: digitizer coordinates running from (0,0) to (5,8), matched to map coordinates running from (0,0) to (50,80)]
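The relationship in the figure can be sketched as a per-axis scale and offset derived from two corner tics. This is a deliberately reduced illustration and not what the TRANSFORM command itself does: it assumes the tablet and map axes are parallel (no rotation or skew), and the function names are made up.

```python
def make_axis_transform(dig_min, dig_max, map_min, map_max):
    """Linear mapping of one axis from digitizer units to map units."""
    scale = (map_max - map_min) / (dig_max - dig_min)
    return lambda v: map_min + (v - dig_min) * scale


def make_tablet_to_map(dig_ll, dig_ur, map_ll, map_ur):
    """Build an (x, y) converter from lower-left and upper-right tic pairs."""
    to_x = make_axis_transform(dig_ll[0], dig_ur[0], map_ll[0], map_ur[0])
    to_y = make_axis_transform(dig_ll[1], dig_ur[1], map_ll[1], map_ur[1])
    return lambda x, y: (to_x(x), to_y(y))


# Corner values taken from the figure: digitizer (0,0)-(5,8), map (0,0)-(50,80).
tablet_to_map = make_tablet_to_map((0, 0), (5, 8), (0, 0), (50, 80))
print(tablet_to_map(2.5, 4))  # (25.0, 40.0)
```

In practice more tics are digitized than the minimum needed, so that the fit can be checked against the residual error at each tic.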
35Generate
- GENERATE can create a coverage from raw coordinates (Id, X, Y), e.g., from GPS.
- Create a file of tic coordinates, e.g., Tic1, which is ASCII with (TICID, X, Y).
- Create a file of polygon coordinates, e.g., poly1.
- GENERATE INPUT Tic1 TICS
- GENERATE INPUT Poly1 POLYS Quit
36Attribute Errors: Raster and Vector
- Attribute errors are generally more difficult to detect.
- Types include:
- Missing attributes: perhaps the only kind of attribute error traceable without comparison to source material; e.g., plot all polygons and color them according to a certain attribute; if the color is missing, the attribute is missing.
- Incorrect attribute values or text: more difficult to detect. One method is to plot all polygons and color them according to a certain attribute; if only one polygon has a certain attribute value and there should be others, it may stick out. In general, detection involves direct comparison with source material.
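Both checks have a tabular counterpart to the plotting approach described above: list the records with an empty attribute, and list the attribute values that occur only once. The sketch below assumes attributes are held as plain dictionaries keyed by polygon id; the field name `landuse` and the sample records are invented.

```python
from collections import Counter


def missing_attributes(records, field):
    """Ids of records whose attribute is absent or empty."""
    return [rid for rid, attrs in records.items() if attrs.get(field) in (None, "")]


def rare_values(records, field, max_count=1):
    """Attribute values that occur at most max_count times and so may be typos."""
    counts = Counter(
        attrs[field] for attrs in records.values()
        if attrs.get(field) not in (None, "")
    )
    return sorted(value for value, count in counts.items() if count <= max_count)


parcels = {
    1: {"landuse": "forest"},
    2: {"landuse": "forest"},
    3: {"landuse": "forrest"},  # misspelling that occurs only once
    4: {"landuse": ""},         # attribute never entered
    5: {"landuse": "urban"},
    6: {"landuse": "urban"},
}
print(missing_attributes(parcels, "landuse"))  # [4]
print(rare_values(parcels, "landuse"))         # ['forrest']
```

A rare value is only a hint, since a legitimate class may occur once; confirming it still requires comparison with the source material.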
37Dealing With Projection Changes
- Often, regardless of input method, separate GIS data inputs for a project will be based on different projection systems.
- It is necessary to transform all data to a common system before use in integrated modeling (examples in ArcView).
- Joining Adjacent Coverages: Edge Matching (Union)
- Joining two adjacent coverages (usually of the same theme) together to produce a single data set that covers a broader region; edge matching is also done in raster systems.
38Conflation and Rubber Sheeting
- Conflation and Rubber Sheeting: refers to the registration (georeferencing) of two maps (vector or raster) in a non-linear way (overlaying two maps).
- Used to make maps from different sources spatially correspond with one another. Most often used with raster data, using ground control points (GCPs). Conflation and rubber sheeting are synonymous terms according to DeMers (Figure 6.1, p. 174).
- Rubber sheeting addresses the need to georeference the internal objects themselves, not just the map corners.
- Templating ("cookie cutting")
- If you have multiple coverages of different extents, the template is used to "cookie cut" them all to the same extent.
39Exercise
- Characteristics of data storage in raster and vector
- 3 general types of error in spatial databases
- Accuracy vs. precision
- Error propagation and cascading of error in GIS
- Types of errors in vector GIS
- Types of errors in attribute data
- The concept of topology: what is it, what types of relationships are stored for point, line, and poly features, and why do we need it in GIS?