1
Data Structures
  • Josef Fürst

2
Learning objectives
  • In this section you will learn about:
  • Important geographical data structures and
    their relevance,
  • The importance of topology and its utility for
    spatial analysis,
  • The relevance and implementation of thematic
    information (attributes),
  • Recognition of thematic hierarchies and their
    robust and efficient representation,
  • Basic concepts of relational databases,
  • Advantages and disadvantages of vector and raster
    data structures.

3
Outline
  • Introduction
  • Levels of creating a model
  • Coding the basic data models for input to the
    computer
  • Data organisation in vector data structures
  • Object oriented data structures
  • Data organisation in raster data structures
  • Images
  • Compact storage of raster data
  • Storage of vector and raster data in DBMS
  • Attributes
  • Attributes and topological consistency
  • Database management systems (DBMS)
  • Vector versus raster data structures
  • Data exchange, standardization
  • Summary

4
Introduction
  • Data models are independent of a specific
    implementation in a GIS. Analogue maps are also
    based on models of the area.
  • Digital coding of the information proceeds in
    several stages: from the data model through data
    structures and data types down to their binary
    representation
  • We need efficient data structures that
  • represent the selected data models completely and
    unambiguously,
  • are robust,
  • efficiently support the desired analyses and
  • use storage space economically.

5
Levels of creating a model
  • A view of reality: the conceptual model.
  • Human conceptualization leading to an analogue
    abstraction (analogue model).
  • A formalization of the analogue abstraction
    without any conventions or restrictions of
    implementation (spatial data model)
  • A representation of the data model that reflects
    how the data are recorded in the computer
    (database model)
  • A file structure (physical computational model)
  • Accepted axioms and rules for handling the data
  • Accepted rules and procedures for displaying and
    presenting spatial data (graphical model)

6
Coding the basic data models for input to the
computer
  • Discrete primitives of geographical data are used
    to create entities as well as continuous fields
  • Representation of position, relationships and
    attributes of different types
  • Points, lines and polygons are the spatial
    primitives in a vector model, pixels in raster
    models.
  • Spatial relationships between entities are
    defined by topology

7
Data organisation in vector data structures
  • Vector-based geographical databases compose a
    complex theme from several layers, each of which
    combines a certain class of phenomena.
  • E.g. hydrological map: rivers, lakes, observation
    sites, land use, etc., each in a separate layer
  • A layer (coverage) consists of entities of one
    type, with relationships and different attributes
  • Layers are handled independently from each other
  • E.g. data structure does not force a gauge to be
    on the river bank
  • Vector-GIS use implicit relations (tables) for
    storage
  • Software packages use different structures.

8
Points
Data organisation in vector data structures...
  • Position of points is defined by a single pair of
    coordinates (X, Y)
  • Additional info: type of point, attributes
  • A layer of point entities can be created from a
    simple table
  • E.g. event theme in ArcView
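
A minimal sketch of how a point layer might be built from a simple table. The table contents, column names and the function read_point_layer are hypothetical and only illustrate the idea of one coordinate pair plus additional info per point.

```python
import csv
import io

# Hypothetical table of observation sites: one row per point entity.
TABLE = """id,x,y,type
1,512300.0,5338200.0,gauge
2,514750.5,5340010.0,rain_gauge
"""

def read_point_layer(text):
    """Turn a simple table into a list of point entities."""
    layer = []
    for row in csv.DictReader(io.StringIO(text)):
        layer.append({
            "id": int(row["id"]),
            "xy": (float(row["x"]), float(row["y"])),  # single coordinate pair
            "type": row["type"],                       # additional info
        })
    return layer

points = read_point_layer(TABLE)
```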

9
Lines
Data organisation in vector data structures...
  • Sequence of (X, Y) coordinate pairs and
    connecting straight lines or curves

10
Networks
Data organisation in vector data structures...
  • Information about connectivity with other line
    entities to represent street networks, utilities,
    rivers
  • Topological information in the data structures →
    connectivity tables
  • Topological terms: node and arc
  • arc-node topology
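
The connectivity table mentioned above could be sketched as follows. The arc and node IDs are made up, and the two helper functions are assumptions for illustration, not the structure of any particular GIS package.

```python
from collections import defaultdict

# Hypothetical arc table: arc id -> (from_node, to_node).
ARCS = {
    "a1": ("n1", "n2"),
    "a2": ("n2", "n3"),
    "a3": ("n2", "n4"),
}

def connectivity_table(arcs):
    """Map each node to the sorted list of arcs that start or end there."""
    table = defaultdict(list)
    for arc_id, (from_node, to_node) in arcs.items():
        table[from_node].append(arc_id)
        table[to_node].append(arc_id)
    return {node: sorted(ids) for node, ids in table.items()}

def connected_arcs(arcs, arc_id):
    """Return the arcs that share a node with the given arc."""
    table = connectivity_table(arcs)
    neighbours = set()
    for node in arcs[arc_id]:
        neighbours.update(table[node])
    neighbours.discard(arc_id)
    return sorted(neighbours)
```

Connectivity is answered from the node IDs alone; no coordinates are compared.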

11
Polygons
Data organisation in vector data structures...
  • Shape, neighbours, hierarchy
  • Simple polygons: sequence of (X, Y) coordinate pairs
  • Border lines between polygons are digitized and
    stored twice. Errors: gaps, overlaps
  • No information on neighbourhood.
  • Islands are only represented graphically.
  • Difficult to validate

12
Polygons
Data organisation in vector data structures...
  • Polygons with point dictionaries
  • Points with ID in a list (table)
  • Polygons as ordered list of pointers into list
  • Borders between neighbouring polygons are unique.
  • No information on neighbourhood
  • No islands
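
A small sketch of a point dictionary, assuming two hypothetical unit squares A and B that share one border. All names and coordinates are invented for illustration.

```python
# Point dictionary: each vertex is stored exactly once under an ID.
POINTS = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (1.0, 1.0), 4: (0.0, 1.0),
          5: (2.0, 0.0), 6: (2.0, 1.0)}

# Polygons are ordered lists of point IDs (the closing vertex is implied).
POLYGONS = {"A": [1, 2, 3, 4], "B": [2, 5, 6, 3]}

def ring(polygon_id):
    """Resolve the ID list of a polygon into coordinates."""
    return [POINTS[pid] for pid in POLYGONS[polygon_id]]

def shared_points(p, q):
    """Point IDs used by both polygons (found by search, not stored)."""
    return sorted(set(POLYGONS[p]) & set(POLYGONS[q]))
```

Moving point 2 in POINTS moves the border of both polygons at once, so no gaps or overlaps can arise. Neighbourhood is still not stored explicitly; shared_points has to compare the ID lists.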

13
Polygons
Data organisation in vector data structures...
  • Polygons by arc-node topology
  • Underlying principle: free of redundancy
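
One way to picture polygons by arc-node topology is an arc table that records the polygon on each side of every arc. The sketch below assumes two hypothetical polygons A and B; the field names and the table layout are assumptions, not a specific vendor format.

```python
# Hypothetical arc table: each border is stored once, together with the
# polygons on its left and right side (None = outside world).
ARC_TABLE = {
    "a1": {"from": "n1", "to": "n2", "left": "A", "right": "B"},
    "a2": {"from": "n2", "to": "n1", "left": "A", "right": None},
    "a3": {"from": "n1", "to": "n2", "left": "B", "right": None},
}

def boundary(arcs, polygon):
    """Arcs that bound the given polygon."""
    return sorted(arc_id for arc_id, rec in arcs.items()
                  if polygon in (rec["left"], rec["right"]))

def neighbours(arcs, polygon):
    """Polygons on the other side of the polygon's arcs."""
    result = set()
    for rec in arcs.values():
        if rec["left"] == polygon and rec["right"] is not None:
            result.add(rec["right"])
        elif rec["right"] == polygon and rec["left"] is not None:
            result.add(rec["left"])
    return sorted(result)
```

The shared border a1 appears only once, and neighbourhood can be read directly from the table.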

14
Storage without redundancy
Data organisation in vector data structures...
  • Sequential transformation of coordinates
  • No change of topology
  • E.g. plans of Vienna metro

15
Triangulation of continuous fields
Data organisation in vector data structures...
  • Node list and triangle list
  • Optimal TIN by Delaunay criteria
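
The Delaunay criterion says that no node may lie inside the circumcircle of any triangle. A minimal check, assuming a hypothetical node list and two alternative triangle lists for the same four nodes, might look like this (it tests a given TIN, it does not construct one):

```python
def ccw(a, b, c):
    """Twice the signed area of triangle abc; positive if counter-clockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def in_circumcircle(a, b, c, p):
    """True if p lies strictly inside the circumcircle of triangle abc."""
    if ccw(a, b, c) < 0:
        b, c = c, b  # enforce counter-clockwise orientation
    rows = []
    for q in (a, b, c):
        dx, dy = q[0] - p[0], q[1] - p[1]
        rows.append((dx, dy, dx * dx + dy * dy))
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = rows
    det = (a1 * (b2 * c3 - b3 * c2)
           - a2 * (b1 * c3 - b3 * c1)
           + a3 * (b1 * c2 - b2 * c1))
    return det > 0

def is_delaunay(nodes, triangles):
    """Check every triangle against every node that is not one of its corners."""
    for tri in triangles:
        a, b, c = (nodes[i] for i in tri)
        for i, p in enumerate(nodes):
            if i not in tri and in_circumcircle(a, b, c, p):
                return False
    return True

# Node list (a flat rhombus) and two triangle lists over the same nodes.
NODES = [(-3, 0), (3, 0), (0, 1), (0, -1)]
SHORT_DIAGONAL = [(0, 3, 2), (1, 2, 3)]
LONG_DIAGONAL = [(0, 1, 2), (0, 3, 1)]
```

Splitting the rhombus along its short diagonal satisfies the criterion; splitting along the long diagonal yields thin triangles that violate it.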

16
Triangulation of continuous fields
Data organisation in vector data structures...
  • Efficient TIN models select nodes in order to
    represent the surface with a minimum number of
    points → dense points where the relief of the
    surface varies rapidly, sparse points in flat areas
  • Consideration of breaklines

17
Object oriented data structures
  • Object oriented systems encapsulate data objects
    together with methods applicable to them. Access
    to objects is only done by the methods defined
    for them
  • Data structures get a defined behaviour
  • E.g. a hydrant should delete itself when the last
    pipe connecting it to the network is removed.
  • Inheritance
  • Structural object orientation: capability to
    create composed objects (Arc Hydro)
  • Behavioural object orientation: behaviour of data
    types with specifically defined functions and
    procedures
  • CASE-Tools (Computer Aided Software Engineering)
    for design and implementation
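
The hydrant example can be sketched as a class whose data are only changed through its methods. The classes Network and Hydrant and their method names are hypothetical and do not reflect any real geodatabase API.

```python
class Network:
    """Container that owns the hydrants of a pipe network."""

    def __init__(self):
        self.hydrants = {}

    def add_hydrant(self, name):
        hydrant = Hydrant(name, self)
        self.hydrants[name] = hydrant
        return hydrant


class Hydrant:
    """Data object with defined behaviour: access only via its methods."""

    def __init__(self, name, network):
        self.name = name
        self._network = network
        self._pipes = set()

    def connect(self, pipe_id):
        self._pipes.add(pipe_id)

    def disconnect(self, pipe_id):
        self._pipes.discard(pipe_id)
        if not self._pipes:
            # Last pipe removed: the hydrant deletes itself from the network.
            self._network.hydrants.pop(self.name, None)
```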

18
Object oriented data structures
  • UML (unified modeling language) diagram of a part
    of a geodatabase

19
Object oriented data structures
  • Arc Hydro Framework

20
Data organisation in raster data structures
  • A raster resembles a photo
  • Three ways to interpret a pixel:
  • Classification: a range of values is allocated to
    certain objects (gray pixels are roads, blue
    pixels are water surfaces, ...).
  • Measure: the value is the intensity of a colour,
    a concentration, etc.
  • Relative height above a reference height.

21
Raster data structure
Data organisation in raster data structures
  • Position is represented by discrete cells
  • Types of raster maps
  • Nominal data like land use (forest, grassland,
    farmland, ...)
  • Continuous values, concentration, light intensity
  • relative measures like elevation.

22
Raster data structure
Data organisation in raster data structures
  • Entities also in raster model
  • Cell size determines resolution: cell size max.
    50 % of the smallest recognized object

23
Raster data structure
Data organisation in raster data structures
  • Topology described implicitly by raster
  • Cell raster vs. point raster

24
Images
  • Pictures can be used as map displays (e.g.
    satellite image, orthophoto) or also as attribute
    information (e.g. pictures of the measuring
    instruments linked to the measuring points on a
    measuring point map, photo of the houses in the
    information system of a real estate agent).

25
Storage of images
Images
  • Similar to raster maps, with some specific
    properties
  • Pixel (from picture element)
  • Economical storage
  • 1, 8, 24 or 32 bit for coding of a colour value
  • Number of bits per pixel: colour depth
  • monochrome, grayscale, RGB, CMY, CMYK
  • Use of a lookup table
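
A lookup table lets an image store a small index per pixel instead of a full colour value. The palette, the tiny image and the helper names below are invented for illustration.

```python
# Hypothetical palette (lookup table): pixel value -> (R, G, B).
PALETTE = {0: (255, 255, 255), 1: (128, 128, 128), 2: (0, 0, 255)}

# Image stored as palette indices, 3 columns by 2 rows.
IMAGE = [[0, 1, 1],
         [2, 2, 0]]

def to_rgb(image, palette):
    """Expand palette indices into full colour values."""
    return [[palette[value] for value in row] for row in image]

def storage_bits(width, height, bits_per_pixel):
    """Uncompressed size of the pixel matrix in bits."""
    return width * height * bits_per_pixel
```

With 8 bits per pixel the matrix needs a third of the space of a 24-bit RGB image; the palette itself is stored once.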

26
Georeferencing of images
Images
  • Most image formats, even purely graphic formats,
    are usable (e.g. TIFF, JPEG, PCX, GIF, BMP, etc.)
  • If the image is oriented along the coordinate
    axes: world file, header
  • Georeferencing can be complex
  • distortions, e.g. in images from airborne sensors

27
Georeferencing of images
Images
  • Depending on the orientation of the coordinate
    system, objects equal in nature are represented
    differently in the raster model
  • If distortions of the image are small (flat terrain)
    → georeferencing by affine transformation with > 4
    ground control points (polynomial
    transformation).
  • Pixels are re-computed by interpolation or areal
    averaging, according to the type of variable →
    loss of information

28
Compact storage of raster data
  • Storing full matrices is not economical
  • Position, resolution and dimension of a raster
    are defined in a header

29
Storage without loss of information
Compact storage
  • Chain codes
  • Cityblock, chequer-board

30
Storage without loss of information
Compact storage
  • Run-length encoding
  • Block codes are an extension of run-length
    encoding in 2 dimensions
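
Run-length encoding of a single raster row can be sketched as below. The function names are illustrative; the round trip shows that no information is lost.

```python
from itertools import groupby

def rle_encode(row):
    """Encode a raster row as (value, run length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(row)]

def rle_decode(runs):
    """Restore the original row from (value, run length) pairs."""
    return [value for value, count in runs for _ in range(count)]
```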

31
Storage without loss of information
Compact storage
  • Quadtrees
  • Hierarchical data structure
  • Parts can be addressed and read directly
  • Read only for required resolution
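
A region quadtree can be built by splitting a raster into four quadrants until each block is uniform. This sketch assumes a square grid whose side length is a power of two; the nested-list representation and the function names are choices made for illustration.

```python
def build_quadtree(grid, row=0, col=0, size=None):
    """Return a leaf value for a uniform block, else [NW, NE, SW, SE]."""
    if size is None:
        size = len(grid)
    first = grid[row][col]
    uniform = all(grid[r][c] == first
                  for r in range(row, row + size)
                  for c in range(col, col + size))
    if uniform:
        return first
    half = size // 2
    return [
        build_quadtree(grid, row, col, half),                # NW
        build_quadtree(grid, row, col + half, half),         # NE
        build_quadtree(grid, row + half, col, half),         # SW
        build_quadtree(grid, row + half, col + half, half),  # SE
    ]

def count_leaves(node):
    """Number of stored blocks in the tree."""
    if isinstance(node, list):
        return sum(count_leaves(child) for child in node)
    return 1

GRID = [[1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 1]]
```

For GRID, three quadrants collapse into single leaves, so 7 blocks are stored instead of 16 cells. Stopping the descent early gives a coarser resolution.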

32
Storage without loss of information
Compact storage
  • Two-dimensional orderings
  • Schemes for order of pixels, to avoid reading
    excessive data if an arbitrary part of the image
    is needed
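
One common two-dimensional ordering is the Morton (Z-order) curve, used here as an example; the slide does not name a specific scheme. It interleaves the bits of row and column so that cells close together in the image tend to be close together in storage.

```python
def morton_index(row, col, bits=16):
    """Interleave the bits of col (even positions) and row (odd positions)."""
    index = 0
    for i in range(bits):
        index |= ((col >> i) & 1) << (2 * i)
        index |= ((row >> i) & 1) << (2 * i + 1)
    return index

def morton_order(n):
    """All cells of an n x n raster, sorted along the Z-order curve."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    return sorted(cells, key=lambda rc: morton_index(rc[0], rc[1]))
```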

33
Compression of raster data with loss of
information
Compact storage
  • Wavelet approximation; JPEG is not appropriate
    for GIS
  • ECW (Enhanced Compressed Wavelet)
  • MrSID (Multi-resolution Seamless Image Database)

34
Storing vector and raster data in DBMS
  • Efficient access to large volumes of data with
    complex relationships
  • B-Trees
  • R-Trees
  • Use of DBMS
  • Geo-relational model
  • Hybrid concept: geometry data in vendor-specific
    binary format, attributes in RDBMS (INFO, ORACLE,
    INGRES, INFORMIX, MS ACCESS)
  • Storage of attribute data independently from
    spatial data
  • Extension, updating and deletion of attribute
    data do not influence spatial data
  • Commercial RDBMS ensure use of latest
    developments and standardisation
  • Use of standard query language like SQL
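
The hybrid concept can be imitated with Python's built-in sqlite3 module: attributes live in a relational table, geometry lives elsewhere and is linked by a shared ID. The dictionary GEOMETRY is a stand-in for a vendor-specific binary file, and the table and column names are hypothetical.

```python
import sqlite3

# Stand-in for the geometry store outside the RDBMS, keyed by feature ID.
GEOMETRY = {
    101: [(0, 0), (1, 0), (1, 1)],
    102: [(1, 0), (2, 0), (2, 1)],
}

# Attribute table in a relational database (in memory for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parcel (id INTEGER PRIMARY KEY, land_use TEXT, area REAL)")
conn.executemany(
    "INSERT INTO parcel VALUES (?, ?, ?)",
    [(101, "forest", 0.5), (102, "grassland", 0.5)])

def geometries_by_land_use(land_use):
    """Thematic query in SQL, then fetch the geometry via the common ID."""
    rows = conn.execute(
        "SELECT id FROM parcel WHERE land_use = ? ORDER BY id", (land_use,))
    return {pid: GEOMETRY[pid] for (pid,) in rows}
```

Updating or deleting attribute rows changes nothing in GEOMETRY, which is the independence the slide describes.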

35
Attribute information
  • Geoinformation is based on two main elements
  • Geometry and topology (question: Where?) and
  • Thematic information (attributes) (question:
    What?).
  • Approach via thematic maps
  • Analogy of transparencies

36
Attribute information in a raster model
Attribute information
  • Geometry and topology determined by definition of
    the raster (origin, resolution)
  • Attribute information is additional dimension
  • Spatial query, thematic query, and combined query

37
Thematic information in vector models
Attribute information
  • A theme is assigned to each topological object
    (node, edge, polygon) by one or more attributes
    (often tied to a label point)
  • M:N relationship between different thematic
    layers → resolve into n units with 1:M
    relationship

38
Thematic hierarchy
Attribute information
  • Hierarchy: a catchment of order 1 (counted from
    the confluence) consists completely and uniquely
    of catchments of order 2, etc.
  • Similarly for administrative units: Province (Land),
    County (Bezirk), Community (Gemeinde)

39
Thematic hierarchy
Attribute information
  • Appropriate data model
  • Simple query of information on each level
  • Consistent information
  • Layer structure: separate, technically
    independent layers for each level
  • Tree structure: the hierarchy is modelled by
    building objects from the smallest units
    (communities → counties → provinces)
  • The decision which approach is implemented
    depends on problem and application

40
Thematic hierarchy
Attribute information
  • Layer structure: each layer is a self-contained
    dataset (e.g. shapefile), technically independent
    from other levels → quick, direct query, possibly
    inconsistent
  • Tree structure: information on objects at a higher
    level must be aggregated on demand → more
    compute-intensive, but always consistent
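
The tree structure can be sketched with values stored only at the smallest units and a parent link per unit. The unit names and population figures are invented, and aggregate is a hypothetical helper.

```python
# Hypothetical data: population is stored only for communities.
POPULATION = {"c1": 1200, "c2": 800, "c3": 500}

# Parent link of every unit (community -> county -> province).
PARENT = {
    "c1": "countyA", "c2": "countyA", "c3": "countyB",
    "countyA": "province1", "countyB": "province1",
}

def aggregate(unit):
    """Sum the stored values of all smallest units below the given unit."""
    if unit in POPULATION:
        return POPULATION[unit]
    return sum(aggregate(child)
               for child, parent in PARENT.items() if parent == unit)
```

Every query on a higher level walks the tree again, which costs time, but a changed community value is reflected at all levels without a separate update.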

41
Thematic attributes
Attribute information
  • Thematic attributes quantitatively classify
    objects, e.g. a land parcel is attributed by ID,
    area, price, owner, address, etc.
  • Logically organized in tables
  • A key-field uniquely relates attributes and
    topological objects
  • Obligatory and optional attributes
  • Computed attributes (area, length)
  • Hierarchical inheritance of attributes in
    object-oriented systems

42
Thematic information and topological consistency
  • Consistency of data is one of the most important
    criteria of an information system. It must be
    ensured when new data are added as well as
    after updates

43
Thematic information and topological consistency
  • Topology-rules in ArcGIS 8.3
  • Polygons

44
Thematic information and topological consistency
  • Topology-rules in ArcGIS 8.3
  • Lines

45
Thematic information and topological consistency
  • Topology-rules in ArcGIS 8.3
  • Points

46
Database management systems (DBMS)
  • GIS (esp. vector-GIS) use DBMS for long term
    storage
  • Strict separation of data and application
    programs,
  • individual view of data for different users,
  • Queries, updates, changes only by well-defined
    interfaces, together with validation of users'
    access rights as well as consistency of data
  • Relational database model: all data in table
    format, relationships only implicitly by values

47
Normal forms
DBMS
  • Stable data structures should follow some
    principles, esp. non-redundancy
  • 1st normal form, if
  • links between data are made by logical pointers,
    not by physical address,
  • each table has a primary key,
  • each column (field) has a unique name.

48
Normal forms
DBMS
  • 2nd normal form, if
  • 1st normal form and
  • Each column is fully functionally dependent on a
    key field
  • Delete column Bezirk (county) and create a second
    table

49
Normal forms
DBMS
  • 3rd normal form, if
  • 1st and 2nd normal form and
  • No column depends transitively on a key field
  • Must be resolved into 2 tables

50
Relational algebra
DBMS
  • Elementary operations of relational algebra
  • Union is equivalent to union in set theory (all
    records from 2 equivalent tables)
  • Difference of 2 relations A and B is composed of
    those records in A which are not in B
  • Projection is a selection of columns of a table
  • Selection is a subset of records which meet given
    logical conditions
  • Cartesian product of 2 relations results in a
    table where each record of A is related to each
    record in B. Important special case: JOIN links
    tables by common values
  • Standardised query language: SQL (Structured
    Query Language)
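
The elementary operations can be tried out on small tables represented as lists of records. This is only a sketch of the semantics with invented tables A, B and C; a real DBMS would express the same operations in SQL.

```python
# Hypothetical relations: each record is a dict, each table a list of records.
A = [{"id": 1, "county": "X"}, {"id": 2, "county": "Y"}]
B = [{"id": 2, "county": "Y"}, {"id": 3, "county": "Z"}]
C = [{"county": "X", "province": "P"}]

def union(r, s):
    """All records of two tables with the same columns, without duplicates."""
    return list(r) + [row for row in s if row not in r]

def difference(r, s):
    """Records of r that are not in s."""
    return [row for row in r if row not in s]

def projection(r, columns):
    """Selection of columns; duplicate records are dropped."""
    result = []
    for row in r:
        reduced = {c: row[c] for c in columns}
        if reduced not in result:
            result.append(reduced)
    return result

def selection(r, predicate):
    """Records that meet a logical condition."""
    return [row for row in r if predicate(row)]

def join(r, s, key):
    """Cartesian product restricted to pairs with a common key value."""
    return [{**x, **y} for x in r for y in s if x[key] == y[key]]
```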

51
Vector versus raster data structures
  • Decision depends on classes of represented
    objects
  • Linear phenomena are better handled in vector
    models
  • Raster model has advantages with areal data
  • If high positional accuracy is important, rasters
    need too much storage
  • Applications define the criteria
  • Coordinate transformation is easy in vector
    models
  • Coordinate transformation is more difficult for
    raster models, because an input pixel generally
    does not map to a single output pixel →
    irreversible process

52
Vector versus raster data structures

53
Vector versus raster data structures

54
Data exchange, standardization
  • De-facto standards for exchange of geometry and
    attribute information
  • Topology is not so common
  • Metadata
  • National, European and international standards
  • OpenGIS Consortium (OGC): interoperability of
    GIS

55
Summary
  • Data structures should represent spatial
    phenomena completely and clearly and support
    efficient analysis
  • Discrete primitives for entities and continuous
    fields
  • Topology: relationship between entities. Arc-node
    topology supports connectivity, definition of
    areas and contiguity
  • Vector-GIS generally implement a layer concept
    with basic elements of points, lines, networks or
    polygons
  • TIN is a vector data structure for continuous
    fields
  • Object oriented data structures encapsulate data
    objects together with methods for their behaviour.

56
Summary
  • Raster data structures: generally a grid of
    square cells, aligned with the coordinate axes
  • Images are special raster data sets with high
    resolution
  • Thematic attributes describe what an object is
    (semantics)
  • Thematic hierarchies can be modelled by a layer
    or a tree structure
  • DBMS for robust storage of geo-data
  • Adherence to national and international standards
    for exchange of geo-data between different
    systems and institutions