Title: Database Design and Decisions GISC 6383 Management and Implementation of GIS
1Database Designand DecisionsGISC 6383
Management and Implementation of GIS
- Note some figures in this document are adapted
from - State of New York GIS Development Guides
- ftp//ftp.sara.nysed.gov/pub/gis/sara.zip
- chapter 3 conceptual design
- chapter 6 database planning and design
- chapter 7 database construction
2Data Base Design Process steps
Reality
(Review from earlier lecture.)
Conceptual
- Conceptually Model USERS view
- (Tomlinson Chapter 6,7)
- 2. Logically Model the Data
- Define data entities and their relationships
- Identify representation of entities and match to
spatial data model - (Tomlinson Chapter 8, 9 thru p.136)
- 3. Physically Model the Data
- Create physical database design for selected
software (Oracle, ArcGIS, etc) - 4. Design Process for obtaining and converting
Data from source
ESRI Steps for Building Geodatabase 1. Model the
users View of the data 2. Define objects and
relationships 3. Select geographic
representation. 4. Match to geodatabase
elements 5. Organize geodatabase
structure (Zeiler, p. 18)
Database Schema in UML Diagram (using Visio)
Geodatabase Generation and Population
Physical
3- We covered Conceptual Modeling/Needs Assessment
in the lecture on Implementation Steps - Tonight we focus on Logical Design and Physical
Design - These are the guts of database design/data
modeling
Loosely corresponds to Tomlinsons Chapter 8
Create a data design and Chapter 9 thru p. 136
4Objectives of Database Design
- Satisfies and supports organizations objectives
- Contains all data but no redundant data
- Minimizes redundancy across the organization
- Allows for different users to access same data
- Consistent and flexible data retrieval and
analysis - Accommodates different views of the same data
- Based on user and application needs
- Increases likelihood of users developing
applications - Appropriately represents organizes geographic
features - Maintains the data so its currency is assured
- Secures data by distinguishing applications (
users) which - create data (add records for new entities)
- update data (maintain modify existing data
records) - read data (use but cant modify in any way)
- delete data (remove records from database)
- (CRUD)
5Logical Data Modeling
6Why build a model?
- We build models of systems in order to break them
up into well-defined subsystems, because it helps
us to overcome the difficulties in comprehending
such systems in their entirety. - As the complexity of systems increase, so does
the difficulty of comprehending them and the
importance of good modeling techniques to assist
us in managing them. - Good models with well-defined semantics are
essential for communication among project teams
and to assure architectural soundness. - In other words, developing a model for an
industrial-strength software system prior to its
construction or renovation is as essential as
having a blueprint for a large building.
Luis X. B. Mourão http//cplus.about.com
7Logical Data Modeling
- It is an entirely data-driven process
- It encourages comprehensive understanding of
business information requirements - It enables effective communication among
designers, developers, and users throughout the
design process - It forms the basis for designing correct,
consistent, sharable, and flexible databases
using any database technology
- Correct
- an accurate and faithful representation of the
way information is used in the business - Consistent
- no contradictions in the way information objects
are named, defined, related, documented - Sharable
- accessible by multiple applications and users to
meet varying access requirements - Flexible
- easily updateable when implementation or
business changes
8Logical Data Modeling in practice
- purpose of data model is to ensure that data has
been identified in a completely rigorous and
unambiguous fashion on which both user and GIS
analyst agree - logical data models define entities (the unit
about which we collect informatione.g. people,
companies), the attributes of entities (the
information collectedage, salary), and the
relationships between entities (companies pay
salaries to people) - developed through use of
- entity-relationship diagrams which show
relationships among all data throughout the
organization - data dictionaries (structured lists) which
document each entity, its attributes, and its
relationships
9Entity Relationship (E-R) Diagrams and Logical
Data Modeling
- Entity (noun)
- objects or things to be included in the database
- person, place thing or concept about which you
wish to record info. - for example employee, company, citizen, street
- Attribute (adjective) (but they are often
nounse.g. a name!!) - characteristics or measurements to be recorded
for the entities - fact or nondecomposable piece of information
describing an entity - for example age, dob, owner, street type
- Relationships association between entities
(verb) - employee---works for----company
company--------has-------employees - transformers ----are mounted on----poles
land parcel---has----owner - Cardinality of Relationships (adverb)
- one to one
- one to many
- many to many
- Business Rules (attribute domains and validation
rules) - requirements that attributes or relationships
must meet - specifications which preserve the integrity of
the logical data model by governing the values
attributes may assume or the cardinality
relationships may take on - For example of kids is an integer between 0
and 24 poles have 0 to 3 transformers
Country ----has--- Capital city (one capital
city per country) Company (one) lt----Work for----
---has------gt (many) Employees Parcels lt
----have----gt owner s (parcels gt1 owner ownersgt1
parcel)
10Conventional E-R Diagram
GIS E-R Diagram
11Entity or Attribute? owner as example
- Perennial problem in modeling
- Do you think of it as a number or text
gtgtgtattribute - Does it have attributes of its own gtgtgtentity
- Owner has name, billing address,
- Does it have a relationship with other things
gtgtgtentity - Owner receives tax bills
- Does it repeat elsewhere gtgtgtentity
- Owner could own many parcels
- Is it important by itself gtgtgtentity
- An owner IS important
- If in doubt, make it an entity, Can change
- later in physical model
- Some rules for attributes
- Primitives no meaning in itself
- e.g 50
- All values of same kind
- e.g all integers 4, 6 not 4, SIX
- Describe entity not another attribute
- e.g. owner name describes owner of parcel, not
the parcel - Never a list
- e.g. owner John Smith, Bill Jones
- Never repeated
- e.g. owner1, owner2, owner3
- Undecomposable or composite?
- e.g. 117 West Plano Road
- Exceptions are always made!!!
- e.g. in 911 when speed matters
12...data dictionaries (structured lists) document
each entity, its attributes, and its relationships
When using UML techniques to create
Entity-Relationship diagrams, entities,
attributes and relationships can all be
documented within the diagram. Data
dictionaries become part of the
metadata. Recording measurement units (meters,
feet) for attributes is critical and
problemattic. Can include in --name
length_in_feet (clumsy) --metadata (metadata
gets lost, forgotten, or separated from data)
Entities which have spatial expression need
additional conceptualization see following
slides
13Spatial Data Modeling
- Spatial data differs in two key ways, and these
must be incorporated - entities have a corresponding spatial expression.
In ESRI terms, - Objects entities without spatial expression
(e.g. owner) - Features entities with spatial expression (e.g.
parcel) - An entity, when given spatial properties, becomes
a GIS spatial data layer - relationships may have a spatial expression also
14in essence, you need to decide how to represent
each entity spatially
- Note that
- one entity could be visualized through another
- footprint on a lot map
- might be different at different scales
- airport a point at one scale and polygon at
another - might be different for two applications
- street as line for routing
- street as polygon for pavement management
- attributes might be displayed graphically as
annotation
- Point or a point symbol
- Line or line types
- Areas or polygons
- Surfaces or surface drapes
- Raster format such as scanned paper documents
- Images such as photos, satellite or clip art
- or plain old Alpha or numeric
15..identifying entities spatial
representation (examples)
- Property associated
- Legal parcel
- Assessor parcel
- Parcel boundary
- plat map
- Parcel photograph
- Owner
- Address
- Land value
- Street associated
- Street
- Street segment
- Intersection
- Traffic light
- Traffic analysis zone
- Bus route
- Bus stop
- polygon
- polygon
- line string
- raster
- image
- alphanumeric
- alphanumeric
- numeric
- line string
- line segment
- node
- point
- polygon
- route
- point
pixels
in essence, you need to decide how to represent
each entity spatially
abcdef123
110210.67
(And these will need to be supported by the GIS
software you select)
16Additionally, you will need to identify the
classic spatial data properties for each layer
- Scale ranges at which data is required to be
displayed - minimum resolution required to support intended
applications - minimum accuracy required to support intended
applications - Projection(s) in which data will be
- stored and
- used
- (may not be the same)
17identifying spatial relationships, and how they
will be implemented
street segments link to street network
cities common border
lot within floodplain
house nearest fire_hydrant
Valve and gauge same manhole
Possible ER symbol (not standardized)
How GIS system might implement the relationship
18 Contains --15 entities --16 relations Attributes
not included for simplicity
Corrected from original source
Example E-R Diagram with Spatial Concepts for
Urban Application
19E-R Diagram
- Final E-R diagram should be verified with users
for - required entities
- appropriate spatial representation of entities
- required attributes
- appropriate spatial relations/operations
- Once verified, the E-R diagram becomes the basis
for the physical data base design.
20Creating E-R diagrams Unified Modeling Language
- E-R diagrams can be drawn by hand or with any
drawing package - CASE (computer aided software engineering) tools
emerged in the 1980s and early 90s) to aid in the
process (e.g. TIs Composer) - Each used their own proprietary language and
symbolization - UML (Unified Modeling Language) developed in
mid/late 1990s to provide standardized modeling
language based on object-oriented concepts. - Initiated at Rational Software Corporation in
1994/95 by merging of Grady Boochs (Booch
Model), Jim Rumbaughs (OMT--Object Modeling
Technique) and Ivar Jacobsons (OOSE--Object-Orien
ted Software Engineering) method - Existence of UML standard allows data base
vendors to support automated conversion of
conceptual data models to physical data base
designs - ArcGISs support use of MS Visio2000/02/03
(enterprise edition 2000, professional 02/03
for full support) see Zeiler, p. 19-20 - Sample geodatabase schemas (templates) available
for different industries - Other UML based products Rational Rose, Paradigm
Plus, Oracle2000, ERwin (from Computer
Associates)
21Physical Data Modeling
- Creating physical database design for the
selected database software - Oracle, SQL Server, ArcGIS, Intergraph, etc
- We will focus on physical database design for an
ESRI Geodatabase
Loosely corresponds to Tomlinsons Chapter 8
Create a Data Design
22physical data base design involves
- For entities and attributes
- Representing (mapping) all entities and their
attributes into one or more relational tables in
a selected RDBMS and determining keys for forming
relationships - For each spatial entity, selecting an
appropriate - GIS data type e.g. polygon, line, point,
surface - GIS data set format for storage e.g.
geodatabase, coverage, grid, tin, image, etc - Spatial Reference System
- coordinate system (geographic or projected, etc.)
- Spatial domain
- Precision (or resolution)
- Selecting an appropriate Data Type for each field
(attribute) - e.g. for ESRI string, short integer, long
integer, float, double, blob. - For relationships (associations)
- for regular (non-spatial) relationships,
identifying which of the RDBMSs normal query
structures or relational operators will handle
the relationship - if wont do it, develop specs for a custom
application - See Appendix for discussion of relational
operators - for spatial relations, identify which
capabilities of software will handle the desired
operation (e.g. nearest neighbor identification) - if wont do it, develop specs for a custom
application - Note Computer people talk about mapping
entities to tables rather than representing.
Mapping in this sense does not mean producing
cartographic output!
23Conceptual E/R diagram
Physical Database design
(primary key)
(foreign key)
(Primary keyforeign key)
(one)
(many)
Example of mapping a relationship for RDBMS
attribute tables --the relation contains is
mapped to a table join --the building table has
a foreign or secondary key, (parcel ID)
to permit a join with the primary key (also
parcel ID) in parcel table The spatial
data structure is shown as two generic layers
24Example of Spatial Database Mapping for ArcInfo
coverage
spatial expression represented as an ArcInfo
polygon coverage
Attribute List of Entity "Parcel"
Parcel
Attributes represented in Oracle Tables
ARC/INFO Spatial Database Structure (coverage)
Previous Values
Parcel
INFO
Parcel
ARC
AAT
TIC
BND
ETC.
PAT
What is wrong with this design?
PATPolygon Attribute Table
A key field for parcels is formed by
concatenating subdivision, block lot s
25Example of Database Mapping for geodatabase
- Design decisions
- why distribmains and transmains as separate
feature classes? - why not valve with gate and hydrant
subclasses? (--different attributes) - why prodwell1 and prodwell2?
26ESRI-related DB Design Decisions
27ESRI DB Design Decision Overview
- Type for spatial data layer
- Vector (point, line or polygon), raster, tin ?
- Format for spatial data
- Coverage, shapefile or geodatabase ?
- Geodatabase Design Decisions
- Feature Datasets
- Stand-alone feature classes or feature data sets
(fds)? - Spatial Reference system for each feature data
set - Coordinate system lat/long or projected? which
projection? what parameters? - Spatial Domain (extent) and Precision ?
- Feature Classes (tables)
- Subtypes or separate feature classes?
- roads feature class with road_type subtype, or
separate freeway, arterial, street feature
classes? - Attributes Types and Validation
- Type (string, long integer, short integer, etc.)
? - Validation Rule through application of Attribute
Domain ? - Domains and Defaults
- Relationships and Associations between feature
classes - Implement Geometric networks and/or topology
rules?
28Spatial Data Types
- Overall, ArcGIS 9 supports at least four
representations of geographic data. - Vector data for representing features
- CAD, Coverages, shapefiles, geodatabases
- Raster data for images, and surfaces.
- Image data in .bmp, .tiff, .jpeg, .sid, ERDAS
formats - Raster data in discrete or continuous GRIDS
(ESRIs native file format for raster) - Discrete grids can have an attribute data table,
continuous do not - Raster data in a Geodatabase (as of ArcGIS 9)
- Triangulated Irregular Networks (TINS) for
surfaces. - Although TINS are a vector format, as of ArcGIS
9.1, they are not yet supported by the Personal
Geodatabase and must be stored in coverage
workspaces or SDE geodatabase - Tabular data (sometimes called Event tables).
- List of X,Y coordinates for points (such as may
be output from GPS) - Locators for finding a geographic position from
an address.
A decision must be made as to the spatial data
type for each layer.
29Spatial File Formats--example
Tracts feature class table
Features (rows)
Feature type
Feature ID (key field)
Secondary or Foreign key
30Geodatabase Design Decisionsexample
Texas geodatabase Dallas County feature data
set feature classes All feature
classes extent within a feature data datum set
must be in the projection same
spatial reference system Plano feature data
set feature class feature
class Stand-alone feature classes Each
stand-alone feature class may be in a
different spatial reference system US
Geodatabase Stand-alone feature class
Rasters and TINs can be stored within a SDE
geodatabase but not a personal geodatabase
31Anatomy of a Geodatabase
- Geodatabase
- Feature datasets
- Spatial Reference
- Object classes and subtypes
- Feature Classes and subtypes
- Relationship classes
- Geometric Network
- Topology
- Domains
- Validation Rules
- Raster Datasets (SDE Only)
- rasters
- TIN datasets (SDE only )
- nodes, edges, faces
- Geodatabases contain feature datasets, raster
datasets, TIN datasets (planned 9.2), locators - Feature datasets contain various objects which
all share a common spatial reference - Objects (e.g. Jane Blow, land owner) are
instances of object classes (e.g. land owners)
and have no spatial form. - Features, stored in feature classes, are spatial
objects (e.g. land parcels) which are similar and
have same spatial form (e.g. polygon) - Object (or feature) classes are tables, with
objects (or features) in the rows of the table - Attributes are in the columns of the table
- Subtypes are an alternative to multiple object
(or feature) classes (e.g. concrete, asphalt,
gravel road subtypes) think of subtype as the
most significant classification variable
(attribute) in the class table
32Anatomy of a Geodatabase contd
- Geodatabase
- Feature datasets
- Spatial Reference
-
- Object classes and subtypes
- Feature Classes and subtypes
- Relationship classes
- Geometric Network
- Topology
-
-
- Domains
- Validation Rules
- Raster Datasets (SDE only)
- rasters
- Relationship classes are tables containing
general relationships between objects and/or
features (e.g. between work order object class
and roads feature class) - Geometric Networks models flows thru linear
systems such as streams, sewers, raods - Topology models relationships among lines and
areas (e.g. common state/county boundary) - Domains are sets of valid and/or default
attribute values (e.g. road lane count default
is 2 valid values are integers 1-12 ) - Validation rules control feature and attribute
integrity by applying domains. 3 types attribute
rules (applied to attributes or subtypes e.g
gravel road 1 or 2 lanes only) connectivity
rules (e.g. gravel road cannot connect to
freeway) and relationship rules (constrains
cardinality of a relationship e.g. gravel road
can have no more than 4 segments at an
intersection) - Simple behaviors are realized thru domains and
validation rules - Complex behaviors and custom objects are realized
by extending rules with custom programming
33Feature classes (FC), feature datasets (fds) and
subtypes
- Spatial features (e.g. a land parcel) are grouped
into feature classes a table with spatial data - Data in FC must have same topology type (all
points, all lines, all polygons) - Water feature class with lakes (polygon) and
streams (line) not permitted - Minimizing the number of feature classes improves
performance - Use different feature classes only when
attributes are significantly different - Use roads feature class rather than freeway,
arterial, streets feature classes - Use subtype to differentiate freeway, arterials,
streets (all have similar attributes) - Subtypes are subclasses within a feature class
that allow you to further distinguish objects
without creating new feature classes - based on a single columns values (must be
integer or long integer) - Same subtype has similar attribute values and
behaviors - Use where attributes are the same across all
subtypes - Feature classes can be grouped into feature
datasets (fds) or spatial folders - All feature classes in a fds must have the same
spatial reference system, but may have different
topology (can have points and lines and polygons
in same fds) - Organize by thematic similarity e.g
transportation - If you wish to create a geometric network, must
be in same fds - If you wish to create topology, must be in same
fds - If they share geometry (street forms political
boundary), should be in same fds - Security (read/write permissions, etc..) applied
at the fds not the fc level!!!!
34Data Types for Attributes
- For every attribute field, must select a data
type - Each RDBMS stores data slightly differently
- ESRI generic data types will translate into
closest RDBMS equivalent - Values given below may differ with RDBMS used
- ESRI Generic Data Types
- String text field. Be sure its length (number of
characters), absolute or what you specify, is
sufficient to record longest data value. - Short Integer (or integer) whole numbers (no
decimal point) generally - /-32,767 (2 bytes). OK for size of family, not
OK for city size - Long Integer (or long) only supports integers to
/- 2,147,483,647 (4 bytes) - Float (or single) single precision floating
point again, be careful-- supports decimal point
but perhaps only 6 digits long with decimal
moveable 34 places (E34) (4 bytes) - Double double precision floating point the
safest-- supports 12-15 digits with decimal
moveable up to 308 places (E308) (8 bytes) - Blob binary long decimal for special programming
applications - Note terminology
- Precision the total number of digits (before
plus after decimal) - Scale number of digits after decimal
35Domains and Defaults
- Why Use Them?
- Data Integrity prevents entry of invalid
(obviously wrong) data values - Data Efficiency choose from a set of valid
values rather than type in each time - Domains define a set of legal values for a
fields attributes - Range domain specifies a valid range of values
for numerical attributes - A water pipe must be between 1 and 100 inches
wide - Coded value domain specifies a valid set of
values for an attributes. Can apply to any type
of attributes - Parcels can only have RES or VAC land use values
- Domains are defined as a geodatabase property
then applied as appropriate - Multiple objects in the same database may use the
same domain - May be applied to an entire field (attribute), or
separately by subtype - Defaults are values automatically assigned when
a feature is created - Of course, may be changed during data entry/edit
process - Again, may be applied to an entire field
(attribute), or separately by subtype - Again, the physical design process requires
decisions about domains and defaults, and to
what they should be applied.
36Implementing Associations and Relationships
- Associations (general term) may be implemented as
- A relationship class within the GDB.
- both classes (tables) in the relationship (i.e.
the related tables) must be within the GDB - permanent part of the data model and relationship
rules are enforced - Is itself a class, stored in tables, with
properties and behaviors - support 11 and 1M cardinality, and many to many
using a key class table. - Strictly speaking, everything else is an
association - An ArcMap relate
- Functionally similar to a relationship class
- supports both 11 and 1M cardinality
- Can form a relate to tables outside the GDB
- INFO tables, Access tables, dbf tables
- Tables in other databases via ODBC (object data
base connectivity) - Local to the ArcMap document therefore is
essentially temporary - An ArcMap join
- Supports 11 cardinality only
- Links matching objects and visualizes them as
rows in a single table - Can match objects either
- Non-spatially using key attribute fields in each
table exists only temporarily
--Use relationship class when the referential
integrity of data is important --Joins are
simpler and require less overhead.
37Some Special Case Relationships
- Many to Many relationships
- Many class A objects match many class B (parcels
owners) - Implemented via an attributed relationship
- Intermediate table is created to store the
relationship - Aggregation v. Composition and Simple v.
Complex Relationships - Aggregation e.g. dog has bowl, collar
valve has valve box - Aggregation implemented through simple
relationships - Peer to peer delete one, the other remains
Dog dies but bowl and collar remain - Composition e.g. dog has feet, tail
valve has maintenance records - Composition implemented through complex
relationships - Enforced dependency delete one, and the other
goes also Dog dies, feet and tail gone
38Spatial ReferenceAll feature classes within a
feature dataset must have the same spatial
reference.
- Coordinate System
- Datum
- Geographic (lat/long) or projected?
- Projection parameters central meridian, standard
parallels, coordinate system origin (false
easting and northing) - Measurement (map) units dd (for lat/long)
feet, meters, etc. (for projected) - Spatial domain
- The allowable coordinate range for the geographic
coordinates - X/Y Domain MinX, MaxX, MinY, MaxY (horizontal
extent) - Domain defaults to 3 times the the actual data
extent (100 on either side) - Z Domain Min, Max (vertical extent)
- M Domain Min, Max (other parameter, e.g.
distance from river mouth ) (can differ within
feature data set) - Once created, the spatial domain for feature
dataset/class cannot be changed. - Data outside domain will require a new feature
dataset or standalone feature class. - Precision
- Number of system storage units (SU) per one map
measurement unit (MU) - If precision is 1 and mu 1 meter ( 1 SU per
MU), cannot record values less than 1 meter - If precision is 100 and mu 1 meter (100 SUs
per MU), can record values
to 1/100 .01 1 cm
39Precision and Spatial DomainWilson NC data
Precision is 15,624 Since map units are feet,
will support accuracy to 15624/12 1,302th of an
inch!!! You have 10 significant digits to work
with. Precision in essence controls were you put
the decimal.If map unit is meters and precision
is 1000, you record down to the nearest
millimeter. E.g. the map value 1,123,456.1236 is
stored as 1,123,456,124 With GRS80, world
circumference is 40,075,016 m. therefore can map
world at approx. 1.9 cm accuracy
(40,075,016100)/ 2,147,483,648
X values
793707
Domain range in x
137,438
47734
52555
32328
52555
2302168
2387052
2334496
2249613
Extent range in x
Domain range in y
745972
WilsonCity.shp Shapefile Extent (range of actual
data when fds created)
Extent range in y
41970
704002
137,438
Geodatabase Spatial Domain 100-200 wider on all
sides therefore Domain range is at least 3 times
Extent range (Exact amount depends on Precision)
47734
656268
--Geodatabase coordinates are stored as 4byte
long integer. This provides 10 significant
digits with max value of 2,147,483,648 -- map
value is multiplied by precision when stored (and
converted back when displayed), so min Y
values, for example, actually stored as
793707x156241,2400,888,268 -- the values are
also shifted when stored so that data is centered
in storage space so you only have to ensure that
max. range times precision is less than
2,147,483,648 137438x156242,147,331,300
lt 2,147,483,648 OK, otherwise reduce domain or
precision
Y values
40Personal versus MultiUser (ArcSDE) Geodatabase
- Personal Geodatabase.
- Implemented as a Microsoft Access database (.mdb
file) by using MS Jet engine which is installed
along with ArcGIS8. - Microsoft Access license not needed, but its
handy to have for attribute data development - Can be placed on local or network drives
- if on network drive and a user has edit access,
other users cant access (single user editing). - Intended for personal or small work-group use
- can handle small to moderately sized datasets.
- Max of 250,000 features per feature class
(table) - maximum size is 2.0 GB
- In general, has the full functionality of ArcSDE
geodatabase except - versioning.
- Multi-user editing
- If a personal ggdb is deleted in ArcCatalog (or
by Windows Explorer) , its gone. - One .mdb file can contain a lot of data. Be
carefull!!
41Personal versus MultiUser (ArcSDE) Geodatabase
- Enterprise Multiuser ArcSDE Geodatabase
- ArcSDE is a data access extension to ArcGIS 8 9
that serves geodatabases to ArcGIS applications
running on PCs connected via a TCP/IP network. - supports concurrent editing by multiple users.
- Supports versioning where multiple users can
concurrently edit different versions of a layer
and any conflicts are resolved when versions are
saved back to the original layer - significantly higher speeds for data access than
shapefiles or personal geodatabases - Supports very large databases without the need to
tile or otherwise subdivide the data - ArcCatalog only creates and deletes connections
to ArcSDE geodatabases, it cant delete the
database - Can be deployed on UNIX or Windows NT.
- Many use UNIX platform for ArcSDE and DBMS, and
XP for GIS applications - ArcSDE is centrally tuned and managed by a DBA.
- Back-up and security procedures implemented in
the DBMS apply to the GIS data. - Can build SQL applications to access tables in a
remote geodatabase. - Requires a server with a DBMS (e.g Oracle, SQL
Server) and ArcSDE.
42Generating Geodatabase Schema
- Schema is the definition of objects contained
within a database - For a geodatabse, objects may include Domains,
Tables, FeatureClasses, Relationships or
GeometricNetworks.
- Four solutions for schema generation/management
in ArcGIS 9 - Manual creation using ArcCatalog, or
- Design from scratch in Microsoft Visio, output
to XMI Repository, and use ArcCatalog's Case
Tool to import from XMI - As above, but begin with one of ESRI existing
Sample Data Models, and edit - GeodatabaseDesigner extension (free ArcScript
downloadable from www.esri.com) which can be used
in conjunction with first 2 or, in some cases, as
an alternative.
XMI is a new standard for storing object models.
Supported by Visio 2002 and later
ESRI has sample data models for a variety of
areas --hydro, parcel, transportation,
utilities, etc.. --download from Web site
43Pros and Cons
- ArcCatalog Menu and Wizards
- Simple and easy to use documented within
ArcCatalog - Easy to introduce gross errors in the schema
through either omission or addition - Visio(UML) and the ArcCatalog Case Tool
- UML (Unified Modeling Language) is a standard for
modeling object-oriented objects and their
properties, thus can be applied to other
databases - object inheritance significantly reduces
duplication - Uses Visio's strong graphical functionality for
an easy way to visualize the design. - However,
- need to buy it
- More complex with steep initial learning curve
- only supports a subset of geodatabase properties
- Geodatabase Designer (GD)
- Fast and free to use and distribute (although
not officially supported) - Supports all geodatabase properties and all
ArcSDE RDBMS's. - The only bi-directional solution schema can be
EXPORTED and IMPORTED - However,
- A proprietary solution which only works with ESRI
geodatabase - Only displays in html text, but can use
Geodatabase Diagrammer (another free extension)
to display in Visio
44Database Normalization
- An important step in the physical design is
database normalization - Developing a table structure which
- Reduces or eliminates redundancy
- Makes tables easy to manage
- Simplifies changes in the future
- There is an entire theory of database
normalization - we dont have time to go into it
- Just present an example
- The usual goal is to create a table structure
which is in 3rd normal form (3NF)
45Unnormalized (flat file)
3rd Normal form --all fields are determined by
primary key field
See Appendix II for more detail
46Data Importing Vs. Data Loading
- Importing
- Creates new features within a new feature class
or geodatabase table. - The features class or table cannot exist before
importing - Database schema is imported at the same time
- Often involves conversion from other formats e.g.
coverages - Loading
- Appends features into an existing feature class.
- Existing feature class must have the same schema
as the data sources - Can be accomplished with
- Simple Data Loader (ArcCatalog)
- Object Loader Wizard (ArcMap)
47Conclusion
- The outcome of these steps is
- A rigorous design for our database
- a database schema
- The design of a process for obtaining the data
elements that will populate our database schema - Identifying a data source and the necessary
processing sequence for each layer - covered in Implementation Steps lecture
- Next time, we will go into the lab and look at
some of this in practice. - This will involve many ESRI-specific design
decisions as outlined in - dbdecisions.ppt
48Appendix I
- DBMS Relational Operators
for regular (non-spatial) relationships (in the
ER Diagram or UML model), physical database
design involves identifying which of the RDBMSs
normal query structures or relational operators
will handle the relationship
49RDBMS Relational Operators
- Select (or Restrict)
- retrieves a subset of rows from a table based on
value(s) in a column or columns - Project
- retrieves a subset of columns from a table,
removing duplicates from the result - Product
- produces the set of all rows that are the
concatenation of a row from one relational table
with a row from another relational table - (usually an intermediate step not useful
otherwise) - Join
- horizontally combines (contatenates) rows in one
table with rows in another (or the same) table,
including only rows which meet some selection
criteria relating columns of the two tables - Combines product and select
- Union
- vertically combines (stacks) rows of one table
with rows in the same or a different table - Intersection
- results in rows common to two (or more)
relational tables - Difference
- results in rows that appear in one table but not
another - Division
- results in common values in one table for which
there are other matching column values
corresponding to every row in another table
Examples follow in the next three slides
50Base Tables
51Base Tables
52Terris job function is to check if employees of
Big City X have taken required courses in the
citys employee training program. The process is
to compare courses taken with required courses.
Terri can be replaced by a RDBMS Division
relational operator!
Base Tables
The division operator identifies Fred and Karen
53Appendix IIDatabase Normalization Detail
- Developing a table structure which
- Reduces or eliminates redundancy
- Makes tables easy to mange
- Simplifies changes in the future
- Our goal is normally to have all tables in third
normal form (3NF)
54Unnormalized Data (Flat File)
- Unnormalized form (flat file)
- You work for the county. In this particular
state, the county records land ownership, values
property, and manages all elections held in the
county. - Some of the information you need is shown in the
flat file above - This format has many problems, some of which are
pointed to above
55First Normal Form (1NF)
- First Normal Form (1NF)
- Each field contains smallest meaningful value
- Parcel_ad split into two variables (street_no
street_name), thus can sort on street, then
number - Owner_ad left as complex attribute cos only used
for mailing - No repeating fields (owner1, owner2, etc..)
- There is now no limit on number of owners per
parcel
- However, problems in that
- Must use multiple primary key fields (parcel_id
and owner_id) to uniquely identify a record - multiple repeating values when there are two (or
more) owners Street_no, street_name, block,
precinct,councillor, mayor, city, Owner_name,
Owner_ad all have repeats. Wastes space, and - If an owners address changes, multiple records
(rows) must be updated - If a parcel is sold, and the owner does not own
any other property (e.g. Kroeger, Adams, M or
Bertrand), information about that owner is lost
56Second Normal Form (2NF)concept of 2NF and
problem with 1NF table
- Second Normal Form (2NF) requires that every
non-key field (attribute) be functionally
dependent on the primary key - Functional dependency is a relationship between
attributes such that knowing one attribute
automatically determines the other - Tables with multiple fields making up the primary
key are not 2NF - This usually shows up as repeating values in an
attribute fields - For example, owner_ID repeats, and knowing the
owner does not determine the councilor - knowing owner_ID (part of primary key) is 001,
does not determine Councilor, which could be
Jones or Smith.
(1NF table)
57Second Normal Form (2NF) example of 2NF form
(2NF tables)
- In each table, there is only one key field, and
knowing its value determines the value of all
other attributes - Satisfies criteria for 2NF
- Far fewer repeats and duplicate editing problems
- Note that there are still shortcomings, for
example - if the mayor of city big changes, we must
update two records
58Third Normal Form (3NF)
- 3NF requires that no non-key field be a fact
about another non-key field - This will be violated if there is transitive
dependency in a table - Transitive functional dependency occurs when the
value in a non-key field is determined by a value
in another non-key field - The value for city determines mayor, (and neither
of these is the key field) - In 3NF, fields can only be attributes of the
primary key, and not of some other field - Tables not in 3NF usually have repeating values
in a non-key field (e,g mayor field in PRECINCT
table) - Mayor Green is a fact about city (a non-key
field), not about precinct (the key field)
Precinct Table (in 2NF) is split into Councilor
and Mayor tables in 3NF
594th and 5th Normal Form
- Our goal is usually to have all tables in at
least 3rd normal form - 4th and 5th normal forms also exist, but these
can add disadvantages (for example, processing
inefficiency) as well as advantages - For example, 5th Normal form has no duplicated
data, but requires junction tables to link data
and form relationships