Title: Data Structures
1 Data Structures
2 Learning objectives
- In this section you will learn about:
  - important geographical data structures and their relevance,
  - the importance of topology and its utility for spatial analysis,
  - the relevance and implementation of thematic information (attributes),
  - thematic hierarchies and their robust and efficient representation,
  - basic concepts of relational databases,
  - advantages and disadvantages of vector and raster data structures.
3 Outline
- Introduction
- Levels of creating a model
- Coding the basic data models for input to the computer
- Data organisation in vector data structures
- Object oriented data structures
- Data organisation in raster data structures
- Images
- Compact storage of raster data
- Storage of vector and raster data in DBMS
- Attributes
- Attributes and topological consistency
- Database management systems (DBMS)
- Vector versus raster data structures
- Data exchange, standardization
- Summary
4 Introduction
- Data models are independent of a specific implementation in a GIS. Analogue maps are also based on models of the area.
- Digital coding of the information proceeds in several stages: from the data model through data structures and data types down to their binary representation.
- We need efficient data structures that
  - represent the selected data models completely and unambiguously,
  - are robust,
  - efficiently support the desired analyses and
  - use storage space economically.
5 Levels of creating a model
- A view of reality (conceptual model)
- Human conceptualization leading to an analogue abstraction (analogue model)
- A formalization of the analogue abstraction without any conventions or restrictions of implementation (spatial data model)
- A representation of the data model that reflects how the data are recorded in the computer (database model)
- A file structure (physical computational model)
- Accepted axioms and rules for handling the data
- Accepted rules and procedures for displaying and presenting spatial data (graphical model)
6 Coding the basic data models for input to the computer
- Discrete primitives of geographical data are used to create entities as well as continuous fields
- Representation of position, relationships and attributes of different types
- Points, lines and polygons are the spatial primitives in a vector model; pixels are the primitives in raster models.
- Spatial relationships between entities are defined by topology
7 Data organisation in vector data structures
- Vector-based geographical databases compose a complex theme from several layers, each of which combines a certain class of phenomena.
- E.g. hydrological map: rivers, lakes, observation sites, land use, etc., each in a separate layer
- A layer (coverage) consists of entities of one type, with relationships and different attributes
- Layers are handled independently from each other
- E.g. the data structure does not force a gauge to be on the river bank
- Vector GIS use implicit relations (tables) for storage
- Software packages use different structures.
8 Points
- The position of a point is defined by a single pair of coordinates (X, Y)
- Additional information: type of point, attributes
- A layer of point entities can be created from a simple table
- E.g. event theme in ArcView
9 Lines
- Sequence of (X, Y) coordinate pairs and connecting straight lines or curves
10 Networks
- Information about connectivity with other line entities, to represent street networks, utilities, rivers
- Topological information in the data structures → connectivity tables
- Topological terms: node and arc
- Arc-node topology
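The connectivity tables mentioned above can be sketched as follows. This is a minimal illustration, not the structure of any particular GIS package; the arc and node IDs and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical arc table: arc ID -> (from-node, to-node).
arcs = {
    "a1": ("n1", "n2"),
    "a2": ("n2", "n3"),
    "a3": ("n2", "n4"),
}


def build_connectivity(arc_table):
    """Return a table node -> sorted list of arcs that start or end there."""
    node_arcs = defaultdict(list)
    for arc_id, (from_node, to_node) in arc_table.items():
        node_arcs[from_node].append(arc_id)
        node_arcs[to_node].append(arc_id)
    return {node: sorted(ids) for node, ids in node_arcs.items()}


def neighbours(arc_table, node):
    """Nodes that can be reached from `node` along a single arc."""
    result = set()
    for from_node, to_node in arc_table.values():
        if from_node == node:
            result.add(to_node)
        elif to_node == node:
            result.add(from_node)
    return sorted(result)


connectivity = build_connectivity(arcs)
print(connectivity["n2"])      # arcs meeting at node n2
print(neighbours(arcs, "n2"))  # nodes adjacent to n2
```

Because connectivity is stored explicitly, a network query only needs the tables and never has to compare coordinates.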
11 Polygons
- Shape, neighbours, hierarchy
- Simple polygons: sequence of x,y coordinate pairs
- Border lines between polygons are digitized and stored twice. Errors: gaps, overlaps
- No information on neighbourhood.
- Islands are only represented graphically.
- Difficult to validate
12 Polygons
- Polygons with point dictionaries
- Points with ID in a list (table)
- Polygons as ordered lists of pointers into the point list
- Borders between neighbouring polygons are unique.
- No information on neighbourhood
- No islands
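A point dictionary can be sketched in a few lines. The coordinates, polygon names and helper functions below are hypothetical and serve only to show why shared borders stay unique: both polygons refer to the same point IDs.

```python
# Hypothetical point dictionary: point ID -> (x, y).
points = {
    1: (0.0, 0.0), 2: (1.0, 0.0), 3: (1.0, 1.0),
    4: (0.0, 1.0), 5: (2.0, 0.0), 6: (2.0, 1.0),
}

# Each polygon is an ordered list of point IDs (pointers into the dictionary).
polygons = {"A": [1, 2, 3, 4], "B": [2, 5, 6, 3]}


def ring(polygon_id):
    """Resolve the point IDs of a polygon into coordinates."""
    return [points[pid] for pid in polygons[polygon_id]]


def area(polygon_id):
    """Polygon area by the shoelace formula; the ring is closed implicitly."""
    coords = ring(polygon_id)
    total = 0.0
    for (x1, y1), (x2, y2) in zip(coords, coords[1:] + coords[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Points 2 and 3 form the common border. Moving point 3 once changes both
# polygons consistently, so no gap or overlap can appear along the border.
points[3] = (1.0, 1.5)
shared = sorted(set(polygons["A"]) & set(polygons["B"]))
```

Note that the structure still does not say which polygons are neighbours; that has to be derived by comparing the ID lists, as in the last line.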
13 Polygons
- Polygons by arc-node topology
- Underlying principle: free of redundancy
14 Storage without redundancy
- Sequential transformation of coordinates
- No change of topology
- E.g. plans of the Vienna metro
15 Triangulation of continuous fields
- Node list and triangle list
- Optimal TIN by Delaunay criteria
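A node list, a triangle list and a check of the Delaunay criterion can be sketched as below. The criterion used here is the usual empty-circumcircle condition: no node may lie inside the circumcircle of a triangle it does not belong to. The node coordinates and function names are hypothetical, and triangles are assumed to list their nodes counter-clockwise.

```python
# Node list: node ID -> (x, y). Elevation is omitted because the Delaunay
# criterion only depends on the position in the plane.
nodes = {0: (0.0, 0.0), 1: (4.0, 0.0), 2: (2.0, 1.0), 3: (2.0, -1.0)}


def in_circumcircle(a, b, c, d):
    """True if d lies strictly inside the circumcircle of the
    counter-clockwise triangle a, b, c (3x3 determinant test)."""
    rows = []
    for px, py in (a, b, c):
        dx, dy = px - d[0], py - d[1]
        rows.append((dx, dy, dx * dx + dy * dy))
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = rows
    det = (a1 * (b2 * c3 - b3 * c2)
           - a2 * (b1 * c3 - b3 * c1)
           + a3 * (b1 * c2 - b2 * c1))
    return det > 0


def is_delaunay(node_list, triangle_list):
    """Check the empty-circumcircle condition for every triangle."""
    for tri in triangle_list:
        a, b, c = (node_list[i] for i in tri)
        for idx, p in node_list.items():
            if idx not in tri and in_circumcircle(a, b, c, p):
                return False
    return True


# Two triangle lists over the same four nodes.
long_diagonal = [(0, 1, 2), (0, 3, 1)]   # split along the long diagonal 0-1
short_diagonal = [(0, 3, 2), (3, 1, 2)]  # split along the short diagonal 2-3
```

For this kite-shaped set of nodes, only the split along the short diagonal satisfies the criterion; the other one produces long, thin triangles.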
16 Triangulation of continuous fields
- Efficient TIN models select nodes in order to represent the surface with a minimum number of points → dense points where the relief of the surface varies rapidly, sparse points in flat areas
- Consideration of breaklines
17 Object oriented data structures
- Object oriented systems encapsulate data objects together with the methods applicable to them. Objects are accessed only through the methods defined for them
- Data structures get a defined behaviour
- E.g. a hydrant should delete itself when the last pipe connecting it to the network is removed.
- Inheritance
- Structural object orientation: capability to create composed objects (Arc Hydro)
- Behavioural object orientation: behaviour of data types with specifically defined functions and procedures
- CASE tools (Computer Aided Software Engineering) for design and implementation
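The hydrant example above can be sketched as two small classes. This is an illustration of encapsulated behaviour only; the class and method names are hypothetical and do not correspond to any geodatabase API.

```python
class Network:
    """Holds the hydrants and records which hydrant each pipe is attached to."""

    def __init__(self):
        self.hydrants = {}
        self.pipes = {}

    def add_hydrant(self, hydrant_id):
        hydrant = Hydrant(hydrant_id, self)
        self.hydrants[hydrant_id] = hydrant
        return hydrant

    def add_pipe(self, pipe_id, hydrant):
        self.pipes[pipe_id] = hydrant
        hydrant.attach(pipe_id)

    def remove_pipe(self, pipe_id):
        hydrant = self.pipes.pop(pipe_id)
        hydrant.detach(pipe_id)


class Hydrant:
    """Data (attached pipes) and behaviour (self-deletion) live together."""

    def __init__(self, hydrant_id, network):
        self.hydrant_id = hydrant_id
        self._network = network
        self._pipes = set()

    def attach(self, pipe_id):
        self._pipes.add(pipe_id)

    def detach(self, pipe_id):
        self._pipes.discard(pipe_id)
        if not self._pipes:
            # Defined behaviour: a hydrant without pipes removes itself.
            del self._network.hydrants[self.hydrant_id]


net = Network()
h1 = net.add_hydrant("h1")
net.add_pipe("p1", h1)
net.add_pipe("p2", h1)
net.remove_pipe("p1")
still_there = "h1" in net.hydrants   # one pipe is left
net.remove_pipe("p2")
gone = "h1" not in net.hydrants      # last pipe removed
```

The caller never deletes the hydrant explicitly; the rule is part of the object, so it cannot be forgotten by an application program.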
18 Object oriented data structures
- UML (Unified Modeling Language) diagram of a part of a geodatabase
19 Object oriented data structures
20 Data organisation in raster data structures
- A raster resembles a photo
- 3 ways to interpret a pixel:
  - classification: a range of values is allocated to certain objects (gray pixels are roads, blue pixels are water surfaces, ...),
  - measurement: the value gives the intensity of a colour, a concentration, etc.,
  - relative height over a reference height.
21 Raster data structure
- Position is represented by discrete cells
- Types of raster maps:
  - nominal data like land use (forest, grassland, farmland, ...),
  - continuous values: concentration, light intensity,
  - relative measures like elevation.
22 Raster data structure
- Entities also exist in the raster model
- Cell size determines resolution: cell size max. 50 % of the smallest recognized object
23 Raster data structure
- Topology is described implicitly by the raster
- Cell raster, point raster
24 Images
- Pictures can be used as map displays (e.g. satellite image, orthophoto) or as attribute information (e.g. pictures of measuring instruments linked to the measuring points on a map, or photos of houses in the information system of a real estate agent).
25 Storage of images
- Similar to raster maps, with some specific properties
- Pixel (from "picture element")
- Economical storage
- 1, 8, 24 or 32 bits for coding a colour value
- Number of bits per pixel: colour depth
- Monochrome, grayscale, RGB, CMY, CMYK
- Use of a lookup table
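The idea of a lookup table can be sketched as follows: the image stores small index values, and the colours themselves are stored once in a palette. The palette entries and the tiny image are made up for illustration.

```python
# Hypothetical palette (lookup table): pixel value -> (R, G, B).
palette = [
    (255, 255, 255),  # 0: background
    (128, 128, 128),  # 1: road
    (0, 0, 255),      # 2: water
    (0, 128, 0),      # 3: forest
]

# The image itself only stores indices into the palette.
indexed_image = [
    [0, 1, 1],
    [2, 2, 3],
]


def to_rgb(image, lut):
    """Expand an indexed image to full RGB triples via the lookup table."""
    return [[lut[value] for value in row] for row in image]


def pixel_bits(image, bits_per_pixel):
    """Bits needed for the pixel values alone at a given colour depth."""
    return sum(len(row) for row in image) * bits_per_pixel


rgb_image = to_rgb(indexed_image, palette)
bits_indexed = pixel_bits(indexed_image, 8)   # 8-bit indices
bits_rgb = pixel_bits(indexed_image, 24)      # 24-bit RGB
```

For real images with many pixels the palette size is negligible, so an 8-bit indexed image needs roughly a third of the storage of a 24-bit RGB image.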
26 Georeferencing of images
- Most image formats are usable, even purely graphic formats (e.g. TIFF, JPEG, PCX, GIF, BMP, etc.)
- If the image is oriented along the coordinate axes: world file, header
- Georeferencing can be complex
- Distortions, e.g. in images from airborne sensors
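A world file holds six numbers that define an affine mapping from pixel position to map coordinates. The sketch below assumes the common line order A, D, B, E, C, F, where (C, F) is the map position of the centre of the upper-left pixel; the numeric values and function names are invented for illustration.

```python
def parse_world_file(text):
    """Read the six world-file values in their file order A, D, B, E, C, F."""
    a, d, b, e, c, f = (float(line) for line in text.split())
    return a, d, b, e, c, f


def pixel_to_world(params, col, row):
    """Map a pixel (col, row) to map coordinates (x, y)."""
    a, d, b, e, c, f = params
    x = a * col + b * row + c
    y = d * col + e * row + f
    return x, y


# Hypothetical world file: 10 m pixels, no rotation, y decreasing downwards.
world_file = """10.0
0.0
0.0
-10.0
500005.0
4999995.0
"""

params = parse_world_file(world_file)
upper_left = pixel_to_world(params, 0, 0)
other = pixel_to_world(params, 3, 2)
```

With the rotation terms B and D equal to zero, the image is aligned with the coordinate axes, which is the simple case named in the list above.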
27 Georeferencing of images
- Depending on the orientation of the coordinate system, objects equal in nature are represented differently in the raster model
- If the distortions of the image are small (flat terrain) → georeferencing by affine projection with > 4 ground control points (polynomial transformation).
- Pixels are re-computed by interpolation or areal averaging, according to the type of variable → loss of information
28 Compact storage of raster data
- Storing full matrices is not economical
- Position, resolution and dimension of a raster are defined in a header
29 Storage without loss of information
- Chain codes
- City-block, chequer-board
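A chain code stores a boundary as a start position plus a sequence of direction codes. The sketch below assumes four directions (city-block moves) and a direction numbering chosen only for this example; other conventions, including eight directions, are possible.

```python
# Assumed numbering for this example: 0 = east, 1 = north, 2 = west, 3 = south.
STEPS = {0: (1, 0), 1: (0, 1), 2: (-1, 0), 3: (0, -1)}


def decode_chain(start, codes):
    """Rebuild the list of positions from a start position and direction codes."""
    x, y = start
    path = [(x, y)]
    for code in codes:
        dx, dy = STEPS[code]
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


def encode_chain(path):
    """Turn a list of positions (unit steps apart) into direction codes."""
    lookup = {step: code for code, step in STEPS.items()}
    return [lookup[(x2 - x1, y2 - y1)]
            for (x1, y1), (x2, y2) in zip(path, path[1:])]


# Boundary of a region two cells wide and one cell high, starting at (0, 0).
codes = [0, 0, 1, 2, 2, 3]
boundary = decode_chain((0, 0), codes)
```

Decoding followed by encoding returns the original codes, so no information is lost.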
30 Storage without loss of information
- Run-length encoding
- Block codes are an extension of run-length encoding in 2 dimensions
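Run-length encoding of a single raster row can be sketched as below. Each run is stored as a (value, count) pair; the function names and the sample row are illustrative.

```python
from itertools import groupby


def rle_encode(row):
    """Collapse consecutive equal cell values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(row)]


def rle_decode(runs):
    """Expand (value, count) pairs back into the full row."""
    return [value for value, count in runs for _ in range(count)]


row = [0, 0, 0, 1, 1, 0, 0, 0, 0, 2]
runs = rle_encode(row)
```

The saving depends on the data: long homogeneous runs (e.g. classified land use) compress well, while a row in which every cell differs from its neighbour becomes longer than the original.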
31 Storage without loss of information
- Quadtrees
- Hierarchical data structure
- Parts can be addressed and read directly
- Read only for the required resolution
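A region quadtree can be sketched with nested lists. The sketch assumes a square grid whose side is a power of two and integer cell values; a homogeneous block becomes a single leaf, any other block is split into four children ordered NW, NE, SW, SE. All names are illustrative.

```python
def build_quadtree(grid, row=0, col=0, size=None):
    """Leaf = cell value of a homogeneous block; node = [NW, NE, SW, SE]."""
    if size is None:
        size = len(grid)
    first = grid[row][col]
    if all(grid[r][c] == first
           for r in range(row, row + size)
           for c in range(col, col + size)):
        return first
    half = size // 2
    return [
        build_quadtree(grid, row, col, half),
        build_quadtree(grid, row, col + half, half),
        build_quadtree(grid, row + half, col, half),
        build_quadtree(grid, row + half, col + half, half),
    ]


def query(tree, row, col, size):
    """Descend directly to the leaf that covers cell (row, col)."""
    while isinstance(tree, list):
        size //= 2
        quadrant = (2 if row >= size else 0) + (1 if col >= size else 0)
        row, col = row % size, col % size
        tree = tree[quadrant]
    return tree


def count_leaves(tree):
    if isinstance(tree, list):
        return sum(count_leaves(child) for child in tree)
    return 1


grid = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
tree = build_quadtree(grid)
```

Here three of the four quadrants are homogeneous, so 16 cells are stored as 7 leaves, and a query only follows one branch per level.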
32 Storage without loss of information
- Two-dimensional orderings
- Schemes for the order of pixels, to avoid reading excessive data if an arbitrary part of the image is needed
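One well-known two-dimensional ordering is the Morton (Z-order) scheme, in which the bits of row and column are interleaved. It is used here as an example only; the source does not say which ordering it has in mind. The function names are illustrative.

```python
def morton_index(row, col, bits=16):
    """Interleave the bits of col (even positions) and row (odd positions)."""
    index = 0
    for bit in range(bits):
        index |= ((col >> bit) & 1) << (2 * bit)
        index |= ((row >> bit) & 1) << (2 * bit + 1)
    return index


def row_major_index(row, col, n_cols):
    """Conventional row-by-row storage position, for comparison."""
    return row * n_cols + col


# Storage positions of the lower-right 2 x 2 block of a 4 x 4 raster.
block = [(2, 2), (2, 3), (3, 2), (3, 3)]
morton_positions = sorted(morton_index(r, c) for r, c in block)
row_major_positions = sorted(row_major_index(r, c, 4) for r, c in block)
```

In Morton order the block occupies one contiguous stretch of storage, whereas in row-major order it is scattered over two separate stretches, so more data has to be skipped or read.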
33 Compression of raster data with loss of information
- Wavelet approximation; JPEG is not appropriate for GIS
- ECW (Enhanced Compressed Wavelet)
- MrSID (Multi-resolution Seamless Image Database)
34 Storing vector and raster data in DBMS
- Efficient access to large volumes of data with complex relationships
- B-trees
- R-trees
- Use of DBMS
- Geo-relational model
- Hybrid concept: geometry data in a vendor-specific binary format, attributes in an RDBMS (INFO, ORACLE, INGRES, INFORMIX, MS ACCESS)
- Storage of attribute data independently from spatial data
- Extension, updating and deletion of attribute data do not influence spatial data
- Commercial RDBMS ensure use of the latest developments and standardisation
- Use of a standard query language like SQL
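The hybrid (geo-relational) concept can be sketched with Python's built-in sqlite3 module standing in for the RDBMS. The geometry is kept outside the database (a dictionary here, in place of a vendor-specific binary file) and is linked to the attribute table only through a shared ID. Table, column and owner names are hypothetical.

```python
import sqlite3

# Geometry store outside the RDBMS: feature ID -> list of (x, y) vertices.
geometry = {
    1: [(0, 0), (10, 0), (10, 10), (0, 10)],
    2: [(10, 0), (20, 0), (20, 10), (10, 10)],
}
geometry_before = {fid: list(coords) for fid, coords in geometry.items()}

# Attribute table inside the RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parcel (id INTEGER PRIMARY KEY, land_use TEXT, owner TEXT)"
)
conn.executemany(
    "INSERT INTO parcel VALUES (?, ?, ?)",
    [(1, "forest", "Smith"), (2, "farmland", "Jones")],
)

# Updating an attribute does not touch the geometry store.
conn.execute("UPDATE parcel SET owner = ? WHERE id = ?", ("Miller", 2))

# Thematic query in SQL, then fetch the matching geometry by ID.
rows = conn.execute(
    "SELECT id FROM parcel WHERE land_use = ?", ("farmland",)
).fetchall()
selected = {fid: geometry[fid] for (fid,) in rows}
owner = conn.execute("SELECT owner FROM parcel WHERE id = 2").fetchone()[0]
```

The price of this separation is that the link between the two stores rests entirely on the ID, so the application has to keep the IDs consistent.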
35 Attribute information
- Geoinformation is based on two main elements:
  - geometry and topology (question: Where?) and
  - thematic information (attributes) (question: What?).
- Approach via thematic maps
- Analogy of transparencies
36 Attribute information in a raster model
- Geometry and topology are determined by the definition of the raster (origin, resolution)
- Attribute information is an additional dimension
- Spatial query, thematic query, and combined query
37 Thematic information in vector models
- A theme is assigned to each topological object (node, edge, polygon) by one or more attributes (often tied to a label point)
- M:N relationship between different thematic layers → resolve into n units with 1:M relationships
38 Thematic hierarchy
- Hierarchy: a catchment of order 1 (counted from the confluence) consists completely and uniquely of catchments of order 2, etc.
- Similarly for administrative units: Province (Land), County (Bezirk), Community (Gemeinde)
39 Thematic hierarchy
- Appropriate data model:
  - simple query of information on each level,
  - consistent information.
- Layer structure: separate, technically independent layers for each level
- Tree structure: the hierarchy is modelled by building objects from the smallest units (communities → counties → provinces)
- The decision which approach is implemented depends on the problem and application
40 Thematic hierarchy
- Layer structure: each layer is a self-contained dataset (e.g. shapefile), technically independent from the other levels → quick, direct query, possibly inconsistent
- Tree structure: information on objects at a higher level must be aggregated on demand → more compute-intensive, but always consistent
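The tree structure can be sketched as a parent table plus values that are stored only for the smallest units. Values for higher levels are aggregated on demand. Unit names and population figures are invented for illustration.

```python
# Hypothetical hierarchy: each unit points to its parent unit.
parent = {
    "community_a": "county_1",
    "community_b": "county_1",
    "community_c": "county_2",
    "county_1": "province_x",
    "county_2": "province_x",
}

# Values are stored only for the smallest units (communities).
population = {"community_a": 1200, "community_b": 800, "community_c": 500}


def aggregate(unit):
    """Sum the values of all smallest units that belong to `unit`."""
    if unit in population:
        return population[unit]
    return sum(aggregate(child) for child, p in parent.items() if p == unit)


county_before = aggregate("county_1")
province_before = aggregate("province_x")

# A single update at the lowest level is reflected on every level above.
population["community_c"] = 700
province_after = aggregate("province_x")
```

Each query walks the tree again, which is the extra computing cost named above; in return there is no stored county or province total that could get out of date.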
41 Thematic attributes
- Thematic attributes quantitatively classify objects, e.g. a land parcel is attributed by ID, area, price, owner, address, etc.
- Logically organized in tables
- A key field uniquely relates attributes and topological objects
- Obligatory and optional attributes
- Computed attributes (area, length)
- Hierarchical inheritance of attributes in object-oriented systems
42 Thematic information and topological consistency
- Consistency of data is one of the most important criteria of an information system. It must be guaranteed when new data are added as well as after updates
43 Thematic information and topological consistency
- Topology rules in ArcGIS 8.3
- Polygons
44 Thematic information and topological consistency
- Topology rules in ArcGIS 8.3
- Lines
45 Thematic information and topological consistency
- Topology rules in ArcGIS 8.3
- Points
46 Database management systems (DBMS)
- GIS (esp. vector GIS) use DBMS for long-term storage
- Strict separation of data and application programs,
- individual view of the data for different users,
- queries, updates and changes only through well-defined interfaces, together with validation of users' access rights as well as consistency of data
- Relational database model: all data in table format, relationships only implicitly by values
47 Normal forms
- Stable data structures should follow some principles, esp. non-redundancy
- 1st normal form, if
  - links between data are made by logical pointers, not by physical address,
  - each table has a primary key,
  - each column (field) has a unique name.
48 Normal forms
- 2nd normal form, if
  - the 1st normal form holds and
  - each column is fully functionally dependent on a key field.
- E.g. delete the column Bezirk (county) and create a second table
49 Normal forms
- 3rd normal form, if
  - the 1st and 2nd normal forms hold and
  - no column depends transitively on a key field.
- Transitive dependencies must be resolved into 2 tables
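Resolving a transitive dependency into two tables can be sketched as follows. The example table is hypothetical and is not the table shown on the slides: it assumes that each parcel lies in one community and each community in one county, so the county depends on the key (parcel ID) only via the community.

```python
# Flat table with a transitive dependency: parcel_id -> community -> county.
flat = [
    {"parcel_id": 1, "community": "community_a", "county": "county_1"},
    {"parcel_id": 2, "community": "community_a", "county": "county_1"},
    {"parcel_id": 3, "community": "community_b", "county": "county_2"},
]

# Table 1: only columns that depend directly on the key parcel_id.
parcels = [
    {"parcel_id": r["parcel_id"], "community": r["community"]} for r in flat
]

# Table 2: the dependent column moves to a table keyed by community.
communities = {r["community"]: r["county"] for r in flat}


def county_of(parcel_id):
    """Recover the county of a parcel by following the two tables."""
    community = next(
        r["community"] for r in parcels if r["parcel_id"] == parcel_id
    )
    return communities[community]
```

After the split, the county of a community is stored exactly once, so it cannot be recorded differently for two parcels of the same community.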
50 Relational algebra
- Elementary operations of relational algebra:
  - Union is equivalent to union in set theory (all records from 2 equivalent tables)
  - Difference of 2 relations A and B is composed of those records in A which are not in B
  - Projection is a selection of columns of a table
  - Selection is a subset of records which meet given logical conditions
  - Cartesian product of 2 relations results in a table where each record of A is related to each record in B. Important special case: JOIN links tables by common values
- Standardised query language: SQL (Structured Query Language)
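The elementary operations listed above can be sketched in plain Python, with a relation represented as a list of records (dictionaries). This is a didactic sketch, not how a DBMS implements them; the tables and function names are hypothetical, and the product assumes that the two relations have no column names in common.

```python
def union(a, b):
    """All records of a plus those records of b not already present."""
    return a + [r for r in b if r not in a]


def difference(a, b):
    """Records of a that do not occur in b."""
    return [r for r in a if r not in b]


def project(relation, columns):
    """Keep only the given columns and drop duplicate records."""
    result = []
    for record in relation:
        reduced = {c: record[c] for c in columns}
        if reduced not in result:
            result.append(reduced)
    return result


def select(relation, predicate):
    """Records that meet a logical condition."""
    return [r for r in relation if predicate(r)]


def product(a, b):
    """Every record of a combined with every record of b."""
    return [{**ra, **rb} for ra in a for rb in b]


def join(a, b, left_key, right_key):
    """JOIN as a selection on the Cartesian product: keep matching values."""
    return select(product(a, b), lambda r: r[left_key] == r[right_key])


gauges = [
    {"gauge": "g1", "river_id": 1},
    {"gauge": "g2", "river_id": 2},
    {"gauge": "g3", "river_id": 1},
]
rivers = [{"id": 1, "name": "Danube"}, {"id": 2, "name": "Inn"}]

joined = join(gauges, rivers, "river_id", "id")
```

In SQL the same join would be written with a `JOIN ... ON` clause or a `WHERE` condition on the two key columns.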
51 Vector versus raster data structures
- The decision depends on the classes of represented objects
- Linear phenomena are better handled in vector models
- The raster model has advantages with areal data
- If high positional accuracy is important, rasters need too much storage
- Applications define the criteria
- Coordinate transformation is easy in vector models
- Coordinate transformation is more difficult for raster models, because an input pixel generally does not map to a single output pixel → irreversible process
52 Vector versus raster data structures
53 Vector versus raster data structures
54 Data exchange, standardization
- De facto standards for the exchange of geometry and attribute information
- Exchange of topology is not so common
- Metadata
- National, European and international standards
- OpenGIS Consortium (OGC): interoperability of GIS
55 Summary
- Data structures should represent spatial phenomena completely and clearly and support efficient analysis
- Discrete primitives for entities and continuous fields
- Topology: relationships between entities. Arc-node topology supports connectivity and the definition of areas
- Vector GIS generally implement a layer concept with basic elements of points, lines, networks or polygons
- TIN is a vector data structure for continuous fields
- Object oriented data structures encapsulate data objects together with methods for their behaviour.
56 Summary
- Raster data structures: generally a grid of square cells, aligned with the coordinate axes
- Images are special raster data sets with high resolution
- Thematic attributes describe what an object is (semantics)
- Thematic hierarchies can be modelled by a layer or a tree structure
- DBMS for robust storage of geo-data
- Adherence to national and international standards for the exchange of geo-data between different systems and institutions