Database Design and Decisions GISC 6383 Management and Implementation of GIS - PowerPoint PPT Presentation

About This Presentation
Title:

Database Design and Decisions GISC 6383 Management and Implementation of GIS

Description:

... communication among designers, developers, and users ... in a completely rigorous and unambiguous fashion on which both user and GIS analyst agree ... – PowerPoint PPT presentation

Number of Views:496
Avg rating:3.0/5.0
Slides: 60
Provided by: ronbr
Learn more at: http://www.utdallas.edu
Category:

less

Transcript and Presenter's Notes

Title: Database Design and Decisions GISC 6383 Management and Implementation of GIS


1
Database Designand DecisionsGISC 6383
Management and Implementation of GIS
  • Note some figures in this document are adapted
    from
  • State of New York GIS Development Guides
  • ftp//ftp.sara.nysed.gov/pub/gis/sara.zip
  • chapter 3 conceptual design
  • chapter 6 database planning and design
  • chapter 7 database construction

2
Data Base Design Process steps
Reality
(Review from earlier lecture.)
Conceptual
  • Conceptually Model USERS view
  • (Tomlinson Chapter 6,7)
  • 2. Logically Model the Data
  • Define data entities and their relationships
  • Identify representation of entities and match to
    spatial data model
  • (Tomlinson Chapter 8, 9 thru p.136)
  • 3. Physically Model the Data
  • Create physical database design for selected
    software (Oracle, ArcGIS, etc)
  • 4. Design Process for obtaining and converting
    Data from source

ESRI Steps for Building Geodatabase 1. Model the
users View of the data 2. Define objects and
relationships 3. Select geographic
representation. 4. Match to geodatabase
elements 5. Organize geodatabase
structure (Zeiler, p. 18)
Database Schema in UML Diagram (using Visio)

Geodatabase Generation and Population
Physical
3
  • We covered Conceptual Modeling/Needs Assessment
    in the lecture on Implementation Steps
  • Tonight we focus on Logical Design and Physical
    Design
  • These are the guts of database design/data
    modeling

Loosely corresponds to Tomlinsons Chapter 8
Create a data design and Chapter 9 thru p. 136
4
Objectives of Database Design
  • Satisfies and supports organizations objectives
  • Contains all data but no redundant data
  • Minimizes redundancy across the organization
  • Allows for different users to access same data
  • Consistent and flexible data retrieval and
    analysis
  • Accommodates different views of the same data
  • Based on user and application needs
  • Increases likelihood of users developing
    applications
  • Appropriately represents organizes geographic
    features
  • Maintains the data so its currency is assured
  • Secures data by distinguishing applications (
    users) which
  • create data (add records for new entities)
  • update data (maintain modify existing data
    records)
  • read data (use but cant modify in any way)
  • delete data (remove records from database)
  • (CRUD)

5
Logical Data Modeling
6
Why build a model?
  • We build models of systems in order to break them
    up into well-defined subsystems, because it helps
    us to overcome the difficulties in comprehending
    such systems in their entirety.
  • As the complexity of systems increase, so does
    the difficulty of comprehending them and the
    importance of good modeling techniques to assist
    us in managing them.
  • Good models with well-defined semantics are
    essential for communication among project teams
    and to assure architectural soundness.
  • In other words, developing a model for an
    industrial-strength software system prior to its
    construction or renovation is as essential as
    having a blueprint for a large building.

Luis X. B. Mourão http//cplus.about.com
7
Logical Data Modeling
  • It is an entirely data-driven process
  • It encourages comprehensive understanding of
    business information requirements
  • It enables effective communication among
    designers, developers, and users throughout the
    design process
  • It forms the basis for designing correct,
    consistent, sharable, and flexible databases
    using any database technology
  • Correct
  • an accurate and faithful representation of the
    way information is used in the business
  • Consistent
  • no contradictions in the way information objects
    are named, defined, related, documented
  • Sharable
  • accessible by multiple applications and users to
    meet varying access requirements
  • Flexible
  • easily updateable when implementation or
    business changes

8
Logical Data Modeling in practice
  • purpose of data model is to ensure that data has
    been identified in a completely rigorous and
    unambiguous fashion on which both user and GIS
    analyst agree
  • logical data models define entities (the unit
    about which we collect informatione.g. people,
    companies), the attributes of entities (the
    information collectedage, salary), and the
    relationships between entities (companies pay
    salaries to people)
  • developed through use of
  • entity-relationship diagrams which show
    relationships among all data throughout the
    organization
  • data dictionaries (structured lists) which
    document each entity, its attributes, and its
    relationships

9
Entity Relationship (E-R) Diagrams and Logical
Data Modeling
  • Entity (noun)
  • objects or things to be included in the database
  • person, place thing or concept about which you
    wish to record info.
  • for example employee, company, citizen, street
  • Attribute (adjective) (but they are often
    nounse.g. a name!!)
  • characteristics or measurements to be recorded
    for the entities
  • fact or nondecomposable piece of information
    describing an entity
  • for example age, dob, owner, street type
  • Relationships association between entities
    (verb)
  • employee---works for----company
    company--------has-------employees
  • transformers ----are mounted on----poles
    land parcel---has----owner
  • Cardinality of Relationships (adverb)
  • one to one
  • one to many
  • many to many
  • Business Rules (attribute domains and validation
    rules)
  • requirements that attributes or relationships
    must meet
  • specifications which preserve the integrity of
    the logical data model by governing the values
    attributes may assume or the cardinality
    relationships may take on
  • For example of kids is an integer between 0
    and 24 poles have 0 to 3 transformers

Country ----has--- Capital city (one capital
city per country) Company (one) lt----Work for----
---has------gt (many) Employees Parcels lt
----have----gt owner s (parcels gt1 owner ownersgt1
parcel)
10
Conventional E-R Diagram
GIS E-R Diagram
11
Entity or Attribute? owner as example
  • Perennial problem in modeling
  • Do you think of it as a number or text
    gtgtgtattribute
  • Does it have attributes of its own gtgtgtentity
  • Owner has name, billing address,
  • Does it have a relationship with other things
    gtgtgtentity
  • Owner receives tax bills
  • Does it repeat elsewhere gtgtgtentity
  • Owner could own many parcels
  • Is it important by itself gtgtgtentity
  • An owner IS important
  • If in doubt, make it an entity, Can change
  • later in physical model
  • Some rules for attributes
  • Primitives no meaning in itself
  • e.g 50
  • All values of same kind
  • e.g all integers 4, 6 not 4, SIX
  • Describe entity not another attribute
  • e.g. owner name describes owner of parcel, not
    the parcel
  • Never a list
  • e.g. owner John Smith, Bill Jones
  • Never repeated
  • e.g. owner1, owner2, owner3
  • Undecomposable or composite?
  • e.g. 117 West Plano Road
  • Exceptions are always made!!!
  • e.g. in 911 when speed matters

12
...data dictionaries (structured lists) document
each entity, its attributes, and its relationships
When using UML techniques to create
Entity-Relationship diagrams, entities,
attributes and relationships can all be
documented within the diagram. Data
dictionaries become part of the
metadata. Recording measurement units (meters,
feet) for attributes is critical and
problemattic. Can include in --name
length_in_feet (clumsy) --metadata (metadata
gets lost, forgotten, or separated from data)
Entities which have spatial expression need
additional conceptualization see following
slides
13
Spatial Data Modeling
  • Spatial data differs in two key ways, and these
    must be incorporated
  • entities have a corresponding spatial expression.
    In ESRI terms,
  • Objects entities without spatial expression
    (e.g. owner)
  • Features entities with spatial expression (e.g.
    parcel)
  • An entity, when given spatial properties, becomes
    a GIS spatial data layer
  • relationships may have a spatial expression also

14
in essence, you need to decide how to represent
each entity spatially
  • Note that
  • one entity could be visualized through another
  • footprint on a lot map
  • might be different at different scales
  • airport a point at one scale and polygon at
    another
  • might be different for two applications
  • street as line for routing
  • street as polygon for pavement management
  • attributes might be displayed graphically as
    annotation
  • Point or a point symbol
  • Line or line types
  • Areas or polygons
  • Surfaces or surface drapes
  • Raster format such as scanned paper documents
  • Images such as photos, satellite or clip art
  • or plain old Alpha or numeric

15
..identifying entities spatial
representation (examples)
  • Property associated
  • Legal parcel
  • Assessor parcel
  • Parcel boundary
  • plat map
  • Parcel photograph
  • Owner
  • Address
  • Land value
  • Street associated
  • Street
  • Street segment
  • Intersection
  • Traffic light
  • Traffic analysis zone
  • Bus route
  • Bus stop
  • polygon
  • polygon
  • line string
  • raster
  • image
  • alphanumeric
  • alphanumeric
  • numeric
  • line string
  • line segment
  • node
  • point
  • polygon
  • route
  • point

pixels
in essence, you need to decide how to represent
each entity spatially
abcdef123
110210.67
(And these will need to be supported by the GIS
software you select)
16
Additionally, you will need to identify the
classic spatial data properties for each layer
  • Scale ranges at which data is required to be
    displayed
  • minimum resolution required to support intended
    applications
  • minimum accuracy required to support intended
    applications
  • Projection(s) in which data will be
  • stored and
  • used
  • (may not be the same)

17
identifying spatial relationships, and how they
will be implemented
street segments link to street network
cities common border
lot within floodplain
house nearest fire_hydrant
Valve and gauge same manhole
Possible ER symbol (not standardized)
How GIS system might implement the relationship
18
 
Contains --15 entities --16 relations Attributes
not included for simplicity
Corrected from original source
Example E-R Diagram with Spatial Concepts for
Urban Application
19
E-R Diagram
  • Final E-R diagram should be verified with users
    for
  • required entities
  • appropriate spatial representation of entities
  • required attributes
  • appropriate spatial relations/operations
  • Once verified, the E-R diagram becomes the basis
    for the physical data base design.

20
Creating E-R diagrams Unified Modeling Language
  • E-R diagrams can be drawn by hand or with any
    drawing package
  • CASE (computer aided software engineering) tools
    emerged in the 1980s and early 90s) to aid in the
    process (e.g. TIs Composer)
  • Each used their own proprietary language and
    symbolization
  • UML (Unified Modeling Language) developed in
    mid/late 1990s to provide standardized modeling
    language based on object-oriented concepts.
  • Initiated at Rational Software Corporation in
    1994/95 by merging of Grady Boochs (Booch
    Model), Jim Rumbaughs (OMT--Object Modeling
    Technique) and Ivar Jacobsons (OOSE--Object-Orien
    ted Software Engineering) method
  • Existence of UML standard allows data base
    vendors to support automated conversion of
    conceptual data models to physical data base
    designs
  • ArcGISs support use of MS Visio2000/02/03
    (enterprise edition 2000, professional 02/03
    for full support) see Zeiler, p. 19-20
  • Sample geodatabase schemas (templates) available
    for different industries
  • Other UML based products Rational Rose, Paradigm
    Plus, Oracle2000, ERwin (from Computer
    Associates)

21
Physical Data Modeling
  • Creating physical database design for the
    selected database software
  • Oracle, SQL Server, ArcGIS, Intergraph, etc
  • We will focus on physical database design for an
    ESRI Geodatabase

Loosely corresponds to Tomlinsons Chapter 8
Create a Data Design
22
physical data base design involves
  • For entities and attributes
  • Representing (mapping) all entities and their
    attributes into one or more relational tables in
    a selected RDBMS and determining keys for forming
    relationships
  • For each spatial entity, selecting an
    appropriate
  • GIS data type e.g. polygon, line, point,
    surface
  • GIS data set format for storage e.g.
    geodatabase, coverage, grid, tin, image, etc
  • Spatial Reference System
  • coordinate system (geographic or projected, etc.)
  • Spatial domain
  • Precision (or resolution)
  • Selecting an appropriate Data Type for each field
    (attribute)
  • e.g. for ESRI string, short integer, long
    integer, float, double, blob.
  • For relationships (associations)
  • for regular (non-spatial) relationships,
    identifying which of the RDBMSs normal query
    structures or relational operators will handle
    the relationship
  • if wont do it, develop specs for a custom
    application
  • See Appendix for discussion of relational
    operators
  • for spatial relations, identify which
    capabilities of software will handle the desired
    operation (e.g. nearest neighbor identification)
  • if wont do it, develop specs for a custom
    application
  • Note Computer people talk about mapping
    entities to tables rather than representing.
    Mapping in this sense does not mean producing
    cartographic output!

23
Conceptual E/R diagram
Physical Database design
(primary key)
(foreign key)
(Primary keyforeign key)
(one)
(many)
Example of mapping a relationship for RDBMS
attribute tables --the relation contains is
mapped to a table join --the building table has
a foreign or secondary key, (parcel ID)
to permit a join with the primary key (also
parcel ID) in parcel table The spatial
data structure is shown as two generic layers
24
Example of Spatial Database Mapping for ArcInfo
coverage
spatial expression represented as an ArcInfo
polygon coverage
Attribute List of Entity "Parcel"
Parcel
Attributes represented in Oracle Tables
ARC/INFO Spatial Database Structure (coverage)
Previous Values
Parcel
INFO
Parcel
ARC
AAT
TIC
BND
ETC.
PAT
What is wrong with this design?
PATPolygon Attribute Table
A key field for parcels is formed by
concatenating subdivision, block lot s
25
Example of Database Mapping for geodatabase
  • Design decisions
  • why distribmains and transmains as separate
    feature classes?
  • why not valve with gate and hydrant
    subclasses? (--different attributes)
  • why prodwell1 and prodwell2?

26
ESRI-related DB Design Decisions
27
ESRI DB Design Decision Overview
  • Type for spatial data layer
  • Vector (point, line or polygon), raster, tin ?
  • Format for spatial data
  • Coverage, shapefile or geodatabase ?
  • Geodatabase Design Decisions
  • Feature Datasets
  • Stand-alone feature classes or feature data sets
    (fds)?
  • Spatial Reference system for each feature data
    set
  • Coordinate system lat/long or projected? which
    projection? what parameters?
  • Spatial Domain (extent) and Precision ?
  • Feature Classes (tables)
  • Subtypes or separate feature classes?
  • roads feature class with road_type subtype, or
    separate freeway, arterial, street feature
    classes?
  • Attributes Types and Validation
  • Type (string, long integer, short integer, etc.)
    ?
  • Validation Rule through application of Attribute
    Domain ?
  • Domains and Defaults
  • Relationships and Associations between feature
    classes
  • Implement Geometric networks and/or topology
    rules?

28
Spatial Data Types
  • Overall, ArcGIS 9 supports at least four
    representations of geographic data.
  • Vector data for representing features
  • CAD, Coverages, shapefiles, geodatabases
  • Raster data for images, and surfaces.
  • Image data in .bmp, .tiff, .jpeg, .sid, ERDAS
    formats
  • Raster data in discrete or continuous GRIDS
    (ESRIs native file format for raster)
  • Discrete grids can have an attribute data table,
    continuous do not
  • Raster data in a Geodatabase (as of ArcGIS 9)
  • Triangulated Irregular Networks (TINS) for
    surfaces.
  • Although TINS are a vector format, as of ArcGIS
    9.1, they are not yet supported by the Personal
    Geodatabase and must be stored in coverage
    workspaces or SDE geodatabase
  • Tabular data (sometimes called Event tables).
  • List of X,Y coordinates for points (such as may
    be output from GPS)
  • Locators for finding a geographic position from
    an address.

A decision must be made as to the spatial data
type for each layer.
29
Spatial File Formats--example
Tracts feature class table
Features (rows)
Feature type
Feature ID (key field)
Secondary or Foreign key
30
Geodatabase Design Decisionsexample
Texas geodatabase Dallas County feature data
set feature classes All feature
classes extent within a feature data datum set
must be in the projection same
spatial reference system Plano feature data
set feature class feature
class Stand-alone feature classes Each
stand-alone feature class may be in a
different spatial reference system US
Geodatabase Stand-alone feature class
Rasters and TINs can be stored within a SDE
geodatabase but not a personal geodatabase
31
Anatomy of a Geodatabase
  • Geodatabase
  • Feature datasets
  • Spatial Reference
  • Object classes and subtypes
  • Feature Classes and subtypes
  • Relationship classes
  • Geometric Network
  • Topology
  • Domains
  • Validation Rules
  • Raster Datasets (SDE Only)
  • rasters
  • TIN datasets (SDE only )
  • nodes, edges, faces
  • Geodatabases contain feature datasets, raster
    datasets, TIN datasets (planned 9.2), locators
  • Feature datasets contain various objects which
    all share a common spatial reference
  • Objects (e.g. Jane Blow, land owner) are
    instances of object classes (e.g. land owners)
    and have no spatial form.
  • Features, stored in feature classes, are spatial
    objects (e.g. land parcels) which are similar and
    have same spatial form (e.g. polygon)
  • Object (or feature) classes are tables, with
    objects (or features) in the rows of the table
  • Attributes are in the columns of the table
  • Subtypes are an alternative to multiple object
    (or feature) classes (e.g. concrete, asphalt,
    gravel road subtypes) think of subtype as the
    most significant classification variable
    (attribute) in the class table

32
Anatomy of a Geodatabase contd
  • Geodatabase
  • Feature datasets
  • Spatial Reference
  • Object classes and subtypes
  • Feature Classes and subtypes
  • Relationship classes
  • Geometric Network
  • Topology
  • Domains
  • Validation Rules
  • Raster Datasets (SDE only)
  • rasters
  • Relationship classes are tables containing
    general relationships between objects and/or
    features (e.g. between work order object class
    and roads feature class)
  • Geometric Networks models flows thru linear
    systems such as streams, sewers, raods
  • Topology models relationships among lines and
    areas (e.g. common state/county boundary)
  • Domains are sets of valid and/or default
    attribute values (e.g. road lane count default
    is 2 valid values are integers 1-12 )
  • Validation rules control feature and attribute
    integrity by applying domains. 3 types attribute
    rules (applied to attributes or subtypes e.g
    gravel road 1 or 2 lanes only) connectivity
    rules (e.g. gravel road cannot connect to
    freeway) and relationship rules (constrains
    cardinality of a relationship e.g. gravel road
    can have no more than 4 segments at an
    intersection)
  • Simple behaviors are realized thru domains and
    validation rules
  • Complex behaviors and custom objects are realized
    by extending rules with custom programming

33
Feature classes (FC), feature datasets (fds) and
subtypes
  • Spatial features (e.g. a land parcel) are grouped
    into feature classes a table with spatial data
  • Data in FC must have same topology type (all
    points, all lines, all polygons)
  • Water feature class with lakes (polygon) and
    streams (line) not permitted
  • Minimizing the number of feature classes improves
    performance
  • Use different feature classes only when
    attributes are significantly different
  • Use roads feature class rather than freeway,
    arterial, streets feature classes
  • Use subtype to differentiate freeway, arterials,
    streets (all have similar attributes)
  • Subtypes are subclasses within a feature class
    that allow you to further distinguish objects
    without creating new feature classes
  • based on a single columns values (must be
    integer or long integer)
  • Same subtype has similar attribute values and
    behaviors
  • Use where attributes are the same across all
    subtypes
  • Feature classes can be grouped into feature
    datasets (fds) or spatial folders
  • All feature classes in a fds must have the same
    spatial reference system, but may have different
    topology (can have points and lines and polygons
    in same fds)
  • Organize by thematic similarity e.g
    transportation
  • If you wish to create a geometric network, must
    be in same fds
  • If you wish to create topology, must be in same
    fds
  • If they share geometry (street forms political
    boundary), should be in same fds
  • Security (read/write permissions, etc..) applied
    at the fds not the fc level!!!!

34
Data Types for Attributes
  • For every attribute field, must select a data
    type
  • Each RDBMS stores data slightly differently
  • ESRI generic data types will translate into
    closest RDBMS equivalent
  • Values given below may differ with RDBMS used
  • ESRI Generic Data Types
  • String text field. Be sure its length (number of
    characters), absolute or what you specify, is
    sufficient to record longest data value.
  • Short Integer (or integer) whole numbers (no
    decimal point) generally
  • /-32,767 (2 bytes). OK for size of family, not
    OK for city size
  • Long Integer (or long) only supports integers to
    /- 2,147,483,647 (4 bytes)
  • Float (or single) single precision floating
    point again, be careful-- supports decimal point
    but perhaps only 6 digits long with decimal
    moveable 34 places (E34) (4 bytes)
  • Double double precision floating point the
    safest-- supports 12-15 digits with decimal
    moveable up to 308 places (E308) (8 bytes)
  • Blob binary long decimal for special programming
    applications
  • Note terminology
  • Precision the total number of digits (before
    plus after decimal)
  • Scale number of digits after decimal

35
Domains and Defaults
  • Why Use Them?
  • Data Integrity prevents entry of invalid
    (obviously wrong) data values
  • Data Efficiency choose from a set of valid
    values rather than type in each time
  • Domains define a set of legal values for a
    fields attributes
  • Range domain specifies a valid range of values
    for numerical attributes
  • A water pipe must be between 1 and 100 inches
    wide
  • Coded value domain specifies a valid set of
    values for an attributes. Can apply to any type
    of attributes
  • Parcels can only have RES or VAC land use values
  • Domains are defined as a geodatabase property
    then applied as appropriate
  • Multiple objects in the same database may use the
    same domain
  • May be applied to an entire field (attribute), or
    separately by subtype
  • Defaults are values automatically assigned when
    a feature is created
  • Of course, may be changed during data entry/edit
    process
  • Again, may be applied to an entire field
    (attribute), or separately by subtype
  • Again, the physical design process requires
    decisions about domains and defaults, and to
    what they should be applied.

36
Implementing Associations and Relationships
  • Associations (general term) may be implemented as
  • A relationship class within the GDB.
  • both classes (tables) in the relationship (i.e.
    the related tables) must be within the GDB
  • permanent part of the data model and relationship
    rules are enforced
  • Is itself a class, stored in tables, with
    properties and behaviors
  • support 11 and 1M cardinality, and many to many
    using a key class table.
  • Strictly speaking, everything else is an
    association
  • An ArcMap relate
  • Functionally similar to a relationship class
  • supports both 11 and 1M cardinality
  • Can form a relate to tables outside the GDB
  • INFO tables, Access tables, dbf tables
  • Tables in other databases via ODBC (object data
    base connectivity)
  • Local to the ArcMap document therefore is
    essentially temporary
  • An ArcMap join
  • Supports 11 cardinality only
  • Links matching objects and visualizes them as
    rows in a single table
  • Can match objects either
  • Non-spatially using key attribute fields in each
    table exists only temporarily

--Use relationship class when the referential
integrity of data is important --Joins are
simpler and require less overhead.
37
Some Special Case Relationships
  • Many to Many relationships
  • Many class A objects match many class B (parcels
    owners)
  • Implemented via an attributed relationship
  • Intermediate table is created to store the
    relationship
  • Aggregation v. Composition and Simple v.
    Complex Relationships
  • Aggregation e.g. dog has bowl, collar
    valve has valve box
  • Aggregation implemented through simple
    relationships
  • Peer to peer delete one, the other remains
    Dog dies but bowl and collar remain
  • Composition e.g. dog has feet, tail
    valve has maintenance records
  • Composition implemented through complex
    relationships
  • Enforced dependency delete one, and the other
    goes also Dog dies, feet and tail gone

38
Spatial ReferenceAll feature classes within a
feature dataset must have the same spatial
reference.
  • Coordinate System
  • Datum
  • Geographic (lat/long) or projected?
  • Projection parameters central meridian, standard
    parallels, coordinate system origin (false
    easting and northing)
  • Measurement (map) units dd (for lat/long)
    feet, meters, etc. (for projected)
  • Spatial domain
  • The allowable coordinate range for the geographic
    coordinates
  • X/Y Domain MinX, MaxX, MinY, MaxY (horizontal
    extent)
  • Domain defaults to 3 times the the actual data
    extent (100 on either side)
  • Z Domain Min, Max (vertical extent)
  • M Domain Min, Max (other parameter, e.g.
    distance from river mouth ) (can differ within
    feature data set)
  • Once created, the spatial domain for feature
    dataset/class cannot be changed.
  • Data outside domain will require a new feature
    dataset or standalone feature class.
  • Precision
  • Number of system storage units (SU) per one map
    measurement unit (MU)
  • If precision is 1 and mu 1 meter ( 1 SU per
    MU), cannot record values less than 1 meter
  • If precision is 100 and mu 1 meter (100 SUs
    per MU), can record values

    to 1/100 .01 1 cm

39
Precision and Spatial DomainWilson NC data
Precision is 15,624 Since map units are feet,
will support accuracy to 15624/12 1,302th of an
inch!!! You have 10 significant digits to work
with. Precision in essence controls were you put
the decimal.If map unit is meters and precision
is 1000, you record down to the nearest
millimeter. E.g. the map value 1,123,456.1236 is
stored as 1,123,456,124 With GRS80, world
circumference is 40,075,016 m. therefore can map
world at approx. 1.9 cm accuracy
(40,075,016100)/ 2,147,483,648
X values
793707
Domain range in x
137,438
47734
52555
32328
52555
2302168
2387052
2334496
2249613
Extent range in x
Domain range in y
745972
WilsonCity.shp Shapefile Extent (range of actual
data when fds created)
Extent range in y
41970
704002
137,438
Geodatabase Spatial Domain 100-200 wider on all
sides therefore Domain range is at least 3 times
Extent range (Exact amount depends on Precision)
47734
656268
--Geodatabase coordinates are stored as 4byte
long integer. This provides 10 significant
digits with max value of 2,147,483,648 -- map
value is multiplied by precision when stored (and
converted back when displayed), so min Y
values, for example, actually stored as
793707x156241,2400,888,268 -- the values are
also shifted when stored so that data is centered
in storage space so you only have to ensure that
max. range times precision is less than
2,147,483,648 137438x156242,147,331,300
lt 2,147,483,648 OK, otherwise reduce domain or
precision
Y values
40
Personal versus MultiUser (ArcSDE) Geodatabase
  • Personal Geodatabase.
  • Implemented as a Microsoft Access database (.mdb
    file) by using MS Jet engine which is installed
    along with ArcGIS8.
  • Microsoft Access license not needed, but its
    handy to have for attribute data development
  • Can be placed on local or network drives
  • if on network drive and a user has edit access,
    other users cant access (single user editing).
  • Intended for personal or small work-group use
  • can handle small to moderately sized datasets.
  • Max of 250,000 features per feature class
    (table)
  • maximum size is 2.0 GB
  • In general, has the full functionality of ArcSDE
    geodatabase except
  • versioning.
  • Multi-user editing
  • If a personal ggdb is deleted in ArcCatalog (or
    by Windows Explorer) , its gone.
  • One .mdb file can contain a lot of data. Be
    carefull!!

41
Personal versus MultiUser (ArcSDE) Geodatabase
  • Enterprise Multiuser ArcSDE Geodatabase
  • ArcSDE is a data access extension to ArcGIS 8 9
    that serves geodatabases to ArcGIS applications
    running on PCs connected via a TCP/IP network.
  • supports concurrent editing by multiple users.
  • Supports versioning where multiple users can
    concurrently edit different versions of a layer
    and any conflicts are resolved when versions are
    saved back to the original layer
  • significantly higher speeds for data access than
    shapefiles or personal geodatabases
  • Supports very large databases without the need to
    tile or otherwise subdivide the data
  • ArcCatalog only creates and deletes connections
    to ArcSDE geodatabases, it cant delete the
    database
  • Can be deployed on UNIX or Windows NT.
  • Many use UNIX platform for ArcSDE and DBMS, and
    XP for GIS applications
  • ArcSDE is centrally tuned and managed by a DBA.
  • Back-up and security procedures implemented in
    the DBMS apply to the GIS data.
  • Can build SQL applications to access tables in a
    remote geodatabase.
  • Requires a server with a DBMS (e.g Oracle, SQL
    Server) and ArcSDE.

42
Generating Geodatabase Schema
  • Schema is the definition of objects contained
    within a database
  • For a geodatabse, objects may include Domains,
    Tables, FeatureClasses, Relationships or
    GeometricNetworks.
  • Four solutions for schema generation/management
    in ArcGIS 9
  • Manual creation using ArcCatalog, or
  • Design from scratch in Microsoft Visio, output
    to XMI Repository, and use ArcCatalog's Case
    Tool to import from XMI
  • As above, but begin with one of ESRI existing
    Sample Data Models, and edit
  • GeodatabaseDesigner extension (free ArcScript
    downloadable from www.esri.com) which can be used
    in conjunction with first 2 or, in some cases, as
    an alternative.

XMI is a new standard for storing object models.
Supported by Visio 2002 and later
ESRI has sample data models for a variety of
areas --hydro, parcel, transportation,
utilities, etc.. --download from Web site
43
Pros and Cons
  • ArcCatalog Menu and Wizards
  • Simple and easy to use documented within
    ArcCatalog
  • Easy to introduce gross errors in the schema
    through either omission or addition
  • Visio(UML) and the ArcCatalog Case Tool
  • UML (Unified Modeling Language) is a standard for
    modeling object-oriented objects and their
    properties, thus can be applied to other
    databases
  • object inheritance significantly reduces
    duplication
  • Uses Visio's strong graphical functionality for
    an easy way to visualize the design.
  • However,
  • need to buy it
  • More complex with steep initial learning curve
  • only supports a subset of geodatabase properties
  • Geodatabase Designer (GD)
  • Fast and free to use and distribute (although
    not officially supported)
  • Supports all geodatabase properties and all
    ArcSDE RDBMS's.
  • The only bi-directional solution schema can be
    EXPORTED and IMPORTED
  • However,
  • A proprietary solution which only works with ESRI
    geodatabase
  • Only displays in html text, but can use
    Geodatabase Diagrammer (another free extension)
    to display in Visio

44
Database Normalization
  • An important step in the physical design is
    database normalization
  • Developing a table structure which
  • Reduces or eliminates redundancy
  • Makes tables easy to manage
  • Simplifies changes in the future
  • There is an entire theory of database
    normalization
  • we dont have time to go into it
  • Just present an example
  • The usual goal is to create a table structure
    which is in 3rd normal form (3NF)

45
Unnormalized (flat file)
3rd Normal form --all fields are determined by
primary key field
See Appendix II for more detail
46
Data Importing Vs. Data Loading
  • Importing
  • Creates new features within a new feature class
    or geodatabase table.
  • The features class or table cannot exist before
    importing
  • Database schema is imported at the same time
  • Often involves conversion from other formats e.g.
    coverages
  • Loading
  • Appends features into an existing feature class.
  • Existing feature class must have the same schema
    as the data sources
  • Can be accomplished with
  • Simple Data Loader (ArcCatalog)
  • Object Loader Wizard (ArcMap)

47
Conclusion
  • The outcome of these steps is
  • A rigorous design for our database
  • a database schema
  • The design of a process for obtaining the data
    elements that will populate our database schema
  • Identifying a data source and the necessary
    processing sequence for each layer
  • covered in Implementation Steps lecture
  • Next time, we will go into the lab and look at
    some of this in practice.
  • This will involve many ESRI-specific design
    decisions as outlined in
  • dbdecisions.ppt

48
Appendix I
  • DBMS Relational Operators

for regular (non-spatial) relationships (in the
ER Diagram or UML model), physical database
design involves identifying which of the RDBMSs
normal query structures or relational operators
will handle the relationship
49
RDBMS Relational Operators
  • Select (or Restrict)
  • retrieves a subset of rows from a table based on
    value(s) in a column or columns
  • Project
  • retrieves a subset of columns from a table,
    removing duplicates from the result
  • Product
  • produces the set of all rows that are the
    concatenation of a row from one relational table
    with a row from another relational table
  • (usually an intermediate step not useful
    otherwise)
  • Join
  • horizontally combines (contatenates) rows in one
    table with rows in another (or the same) table,
    including only rows which meet some selection
    criteria relating columns of the two tables
  • Combines product and select
  • Union
  • vertically combines (stacks) rows of one table
    with rows in the same or a different table
  • Intersection
  • results in rows common to two (or more)
    relational tables
  • Difference
  • results in rows that appear in one table but not
    another
  • Division
  • results in common values in one table for which
    there are other matching column values
    corresponding to every row in another table

Examples follow in the next three slides
50
Base Tables
51
Base Tables
52
Terris job function is to check if employees of
Big City X have taken required courses in the
citys employee training program. The process is
to compare courses taken with required courses.
Terri can be replaced by a RDBMS Division
relational operator!
Base Tables
The division operator identifies Fred and Karen
53
Appendix IIDatabase Normalization Detail
  • Developing a table structure which
  • Reduces or eliminates redundancy
  • Makes tables easy to mange
  • Simplifies changes in the future
  • Our goal is normally to have all tables in third
    normal form (3NF)

54
Unnormalized Data (Flat File)
  • Unnormalized form (flat file)
  • You work for the county. In this particular
    state, the county records land ownership, values
    property, and manages all elections held in the
    county.
  • Some of the information you need is shown in the
    flat file above
  • This format has many problems, some of which are
    pointed to above

55
First Normal Form (1NF)
  • First Normal Form (1NF)
  • Each field contains smallest meaningful value
  • Parcel_ad split into two variables (street_no
    street_name), thus can sort on street, then
    number
  • Owner_ad left as complex attribute cos only used
    for mailing
  • No repeating fields (owner1, owner2, etc..)
  • There is now no limit on number of owners per
    parcel
  • However, problems in that
  • Must use multiple primary key fields (parcel_id
    and owner_id) to uniquely identify a record
  • multiple repeating values when there are two (or
    more) owners Street_no, street_name, block,
    precinct,councillor, mayor, city, Owner_name,
    Owner_ad all have repeats. Wastes space, and
  • If an owners address changes, multiple records
    (rows) must be updated
  • If a parcel is sold, and the owner does not own
    any other property (e.g. Kroeger, Adams, M or
    Bertrand), information about that owner is lost

56
Second Normal Form (2NF)concept of 2NF and
problem with 1NF table
  • Second Normal Form (2NF) requires that every
    non-key field (attribute) be functionally
    dependent on the primary key
  • Functional dependency is a relationship between
    attributes such that knowing one attribute
    automatically determines the other
  • Tables with multiple fields making up the primary
    key are not 2NF
  • This usually shows up as repeating values in an
    attribute fields
  • For example, owner_ID repeats, and knowing the
    owner does not determine the councilor
  • knowing owner_ID (part of primary key) is 001,
    does not determine Councilor, which could be
    Jones or Smith.

(1NF table)
57
Second Normal Form (2NF) example of 2NF form
(2NF tables)
  • In each table, there is only one key field, and
    knowing its value determines the value of all
    other attributes
  • Satisfies criteria for 2NF
  • Far fewer repeats and duplicate editing problems
  • Note that there are still shortcomings, for
    example
  • if the mayor of city big changes, we must
    update two records

58
Third Normal Form (3NF)
  • 3NF requires that no non-key field be a fact
    about another non-key field
  • This will be violated if there is transitive
    dependency in a table
  • Transitive functional dependency occurs when the
    value in a non-key field is determined by a value
    in another non-key field
  • The value for city determines mayor, (and neither
    of these is the key field)
  • In 3NF, fields can only be attributes of the
    primary key, and not of some other field
  • Tables not in 3NF usually have repeating values
    in a non-key field (e,g mayor field in PRECINCT
    table)
  • Mayor Green is a fact about city (a non-key
    field), not about precinct (the key field)

Precinct Table (in 2NF) is split into Councilor
and Mayor tables in 3NF
59
4th and 5th Normal Form
  • Our goal is usually to have all tables in at
    least 3rd normal form
  • 4th and 5th normal forms also exist, but these
    can add disadvantages (for example, processing
    inefficiency) as well as advantages
  • For example, 5th Normal form has no duplicated
    data, but requires junction tables to link data
    and form relationships
Write a Comment
User Comments (0)
About PowerShow.com