Title: Review of Cartographic Data Types and Data Models
1Review of Cartographic Data Types and Data Models
2GIS Data Models
3Raster Versus Vector in GIS Analysis
- Fundamental element used to represent spatial features
  - Raster: pixel or grid cell.
  - Vector: x,y coordinate pair.
- Area for which data values are stored in each system
  - Raster: must store a value for each cell of the grid, which covers the entire study area.
  - Vector: stores locational data only for objects of interest within the study area.
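The storage difference described above can be sketched as follows. This is a minimal illustration, assuming a hypothetical 4 x 5 study area with two objects of interest; the names `raster` and `vector_points` and all values are made up for the example.

```python
# Hypothetical 4-row by 5-column study area (illustrative values only).
ROWS, COLS = 4, 5
NODATA = 0

# Raster: one value is stored for every cell, including the empty ones.
raster = [[NODATA] * COLS for _ in range(ROWS)]
raster[1][2] = 7   # object of interest A
raster[3][4] = 9   # object of interest B

# Vector: only the objects of interest are stored, as x,y pairs.
vector_points = {"A": (2.5, 2.5), "B": (4.5, 0.5)}

raster_values_stored = sum(len(row) for row in raster)
vector_pairs_stored = len(vector_points)
print(raster_values_stored, vector_pairs_stored)
```

The raster holds a value for all 20 cells, while the vector structure holds only the two coordinate pairs.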
4Modeling Geospatial Reality
Raster Model
Vector Model
Real World
5Coding Vector GIS
6Coding Vector GIS
Vector Mode Model of Reality
Reality
7Coding Raster GIS Data
8Coding Raster GIS Data
1 1 1 1 2 3 4 4
1 1 1 2 2 3 4 4
1 2 2 2 3 3 4 4
2 2 2 3 3 4 4 4
3 3 3 3 5 5 5 5
1 1 1 1 6 5 5 5
1 1 1 1 1 5 5 5
1 1 1 1 1 1 5 5
Raster Mode Model of Reality
Reality
9Minimum Mapping Unit
- Your MMU should influence what real-world features make it into your digital representation.
- Raster: 51% of a cell?
- Vector: is a stream wide enough to be its own polygon? Should a house be represented by its footprint as a polygon, or by a point?
10Representing Value and Location in Space Points
- Raster points
  - Location: cell position as specified by row and column position within the grid, which should be geo-referenced.
  - Value: specified by the number stored for the cell.
- Vector points
  - Location: position specified by a single x,y coordinate pair.
  - Value: stored as data values in an attribute file and tied to the point by means of a geo-code.
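A minimal sketch of the two point representations, assuming a north-up grid whose upper-left corner and cell size are known. The function names, the origin, the geo-code 101, and the attribute values are all hypothetical.

```python
def cell_to_xy(row, col, x_origin, y_origin, cell_size):
    """Return the x,y coordinate of a cell centre.

    Assumes (x_origin, y_origin) is the upper-left corner of the grid
    and that rows are counted downward from the top.
    """
    x = x_origin + (col + 0.5) * cell_size
    y = y_origin - (row + 0.5) * cell_size
    return x, y


def xy_to_cell(x, y, x_origin, y_origin, cell_size):
    """Return the (row, col) of the cell that contains the point x,y."""
    col = int((x - x_origin) // cell_size)
    row = int((y_origin - y) // cell_size)
    return row, col


# Raster point: location is the cell position, value is the cell's number.
centre = cell_to_xy(2, 3, 1000.0, 2000.0, 10.0)

# Vector point: location is one x,y pair; the value lives in an
# attribute table tied to the point by a geo-code (here 101).
points = {101: (1035.0, 1975.0)}
attributes = {101: {"type": "well"}}

print(centre, xy_to_cell(*points[101], 1000.0, 2000.0, 10.0))
```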
11Raster Encoding
Points in the World Out There
Points Encoded as Raster
Resulting Image
12Vector Encoding
Points in the World Out There
Points Encoded as Vector
Resulting Image
Point  X   Y
1      X1  Y1
2      X2  Y2
3      X3  Y3
4      X4  Y4
13Representing Value and Location in Space Lines
- Raster lines
  - Location: linear set of contiguous cells, each identified by a row and column location and each having the same data value.
  - Value: the data value of each linear feature is represented by the cell value stored for each cell.
- Vector lines
  - Location: ordered set of x,y coordinate pairs.
  - Value: a geo-code assigned to the line is tied to a geo-code in an attribute file, where the computer stores the data value or values for the line.
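As a rough illustration of the vector case, a line can be held as an ordered list of x,y pairs under a geo-code, with its data values in a separate attribute table under the same geo-code. The geo-code 17, the coordinates, and the attribute values below are invented for the example.

```python
import math

# Vector line: an ordered list of x,y pairs, identified by a geo-code.
lines = {
    17: [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)],
}

# Attribute file: data values stored against the same geo-code.
line_attributes = {
    17: {"name": "Mill Creek", "class": "stream"},
}


def line_length(coords):
    """Sum the straight-line distance between consecutive vertices."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))


length = line_length(lines[17])
print(line_attributes[17]["name"], length)
```

Because the vertices are ordered, the length is the sum of the segment lengths; reordering the pairs would describe a different line.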
14Raster Encoding
Lines in the World Out There
Lines Encoded as Raster
Resulting Image
15Vector Encoding
Lines in the World Out There
Lines Encoded as Vector
Line  X Y
1     X11 Y11, X12 Y12, ..., X1n Y1n
2     X21 Y21, X2n Y2n
3     X31 Y31, X32 Y32, ..., X3n Y3n
4     X41 Y41, X4n Y4n
17Representing Values and Location in Space Areas
- Raster areas
  - Location: region of contiguous cells, all of which have the same data value.
  - Value: the data value stored for each cell is the data value for the area; e.g., for population density, a density of 589 would be represented by assigning each cell comprising the area the value 589.
- Vector areas
  - Location: closed set of x,y coordinate pairs.
  - Value: a point within the area is tied by means of a key to an attribute file holding the value or values to be assigned to the area defined by the x,y coordinates.
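A small sketch of both area encodings. The raster part reuses the density value 589 from the slide on a made-up 3 x 3 grid; the vector part assumes a closed ring is stored with its first coordinate repeated at the end, and uses the shoelace formula (not mentioned in the slides) only to show that a closed ring encloses a measurable area.

```python
# Raster area: every cell belonging to the area carries the value 589.
density = [
    [0, 589, 589],
    [0, 589, 589],
    [0, 0, 0],
]
cells_in_area = sum(value == 589 for row in density for value in row)


def is_closed(ring):
    """A ring is closed when its last coordinate repeats its first."""
    return len(ring) >= 4 and ring[0] == ring[-1]


def ring_area(ring):
    """Area enclosed by a closed ring, using the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Vector area: a closed set of x,y coordinate pairs (hypothetical values).
ring = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (0.0, 0.0)]
print(cells_in_area, is_closed(ring), ring_area(ring))
```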
18Raster Encoding
Areas in the World Out There
Areas Encoded as Raster
Resulting Raster Image
19Capturing Vector Data
20Digitizing Polygon 1
Digitizing Polygon 2
Digitizing Polygon 3
Digitizing is fun, for a short time
21Vector Encoding
Areas in the World Out There
Areas Encoded as Vector
Area  X Y
1     X11 Y11, X1i Y1i, X11 Y11
2     X21 Y21, X2i Y2i, X21 Y21
3     X31 Y31, X3i Y3i, X31 Y31
23Representing Values and Location in Space Volume
- Raster volume
  - Location: row and column position represents position on the surface.
  - Value: cell data value represents the height of the surface at the location of the cell.
- Vector volume
  - Location: x,y coordinates position the triangles that comprise a TIN.
  - Value: z data value stored for each x,y coordinate position.
25Whether Raster or Vector
- All Layers Must Be Geo-referenced and Rectified
26Set of Layers Comprise Geodatabase: Layers Must Be Rectified
Earth
Assign Coordinate Values to Locations
27Raster Conventions
28Raster Database Conventions
- Divide the entire study area into a regular grid of cells.
- Assign one and only one data value to each cell.
- The database consists of a set of maps or layers, each of which depicts the same well-defined region or study area (e.g., Washington Township in Gloucester County).
- Each layer describes a single characteristic of each cell within the study area (e.g., land use).
- Describe multiple features with multiple layers.
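These conventions can be sketched with plain nested lists: every layer covers the same grid, and each layer holds exactly one value per cell. The layer names and cell values are hypothetical.

```python
# Each layer is a grid over the same study area, one value per cell.
layers = {
    "land_use": [[1, 1, 2],
                 [1, 2, 2]],
    "soil":     [[5, 5, 5],
                 [6, 6, 5]],
}


def shape(grid):
    """Return (rows, cols) of a rectangular grid."""
    return len(grid), len(grid[0])


def same_grid(layer_set):
    """True when every layer has the same number of rows and columns."""
    return len({shape(grid) for grid in layer_set.values()}) == 1


def describe_cell(layer_set, row, col):
    """Collect the value of one cell from every layer."""
    return {name: grid[row][col] for name, grid in layer_set.items()}


print(same_grid(layers), describe_cell(layers, 1, 0))
```

Describing a cell by several characteristics means reading the same row and column from several layers, which only works if all layers share one grid.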
29Raster Definitions
- Orientation: angle between true north and the direction defined by the columns of the raster.
30Raster Definitions
- Region: within a single layer, a set of contiguous cells that all have the same value.
- Zone: all of the regions within a layer that have the same value.
In much of the GIS literature, ArcMap's region is called a zone and ArcMap's zone is called a class.
31Raster Zone and Region
All of the forest taken together represents a
single zone.
Each individual set of contiguous forest cells
represents a single region.
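The zone and region distinction can be sketched by counting groups of contiguous cells. This example assumes that "contiguous" means sharing an edge (4-neighbour connectivity), which the slides do not specify; the grid and the codes F and W are hypothetical.

```python
def count_regions(grid, value):
    """Count groups of edge-connected cells that all equal `value`."""
    rows, cols = len(grid), len(grid[0])
    seen = set()
    regions = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != value or (r, c) in seen:
                continue
            regions += 1               # found an unvisited region
            stack = [(r, c)]
            seen.add((r, c))
            while stack:               # flood-fill the whole region
                cr, cc = stack.pop()
                for nr, nc in ((cr - 1, cc), (cr + 1, cc),
                               (cr, cc - 1), (cr, cc + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and grid[nr][nc] == value
                            and (nr, nc) not in seen):
                        seen.add((nr, nc))
                        stack.append((nr, nc))
    return regions


F, W = 1, 2  # hypothetical codes: forest, water
land_cover = [
    [F, F, W, W],
    [F, W, W, F],
    [W, W, F, F],
]

zones = {value for row in land_cover for value in row}
print(len(zones), count_regions(land_cover, F), count_regions(land_cover, W))
```

Here the forest zone is made up of two separate regions, while the water zone is a single region.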
32Cell Value Assignment Qualitative Data
- Predominant type, or majority rules: the category taking up the largest proportion of the cell determines the cell value (e.g., land use).
- Cell center value: the cell gets the value of the category at its center.
- Presence or absence: if the phenomenon is present, the cell takes its value (e.g., road).
- Precedence of types: assign the cell a value reflecting the most important category present.
- Number or proportion: the cell value is the number of items present in the cell (e.g., wells).
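Three of these assignment rules can be sketched for a single cell, assuming the share of the cell covered by each category is already known. The category names, proportions, and importance ranking are hypothetical.

```python
def majority(proportions):
    """Majority rules: the category covering the largest share wins."""
    return max(proportions, key=proportions.get)


def presence(proportions, category):
    """Presence/absence: 1 if the category occurs in the cell at all."""
    return 1 if proportions.get(category, 0) > 0 else 0


def precedence(proportions, ranking):
    """Precedence: the most important category present wins.

    `ranking` lists categories from most to least important.
    """
    for category in ranking:
        if proportions.get(category, 0) > 0:
            return category
    return None


# Share of one cell covered by each category (made-up numbers).
cell = {"forest": 0.55, "urban": 0.40, "road": 0.05}

print(majority(cell),
      presence(cell, "road"),
      precedence(cell, ["road", "urban", "forest"]))
```

The same cell receives "forest" under majority rules but "road" under precedence, which is why the assignment rule should be recorded with the layer.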
33Majority Rules Assignment
34Presence / Absence
35Precedence of Type
36Ratio Value
37Interpolated Value
38Objects
- GIS features as objects is a recent method of representing aspects of the real world in GIS.
- Example of the shift from specialty data to DBMSs that are spatially aware.
- Non-planar, temporally shifting, topologically linked, rule-based actions.
39Vector Geometry as Objects
- Parcels
  - Planar geometries with attribute information
- Parcels as objects in a cadastral carpet
  - Objects with topology rules (don't overlap, unless ...)
  - Members of regional features (zoning, municipality)
  - Composed of surveyed parts (COGO, benchmarks)
  - Keys that link to attribute tables (owner(s), assessments, plans, etc.)
40Cadastre Example
benchmark
41Attributes as Objects
- Not only can multiple sets of geospatial features interact with rules, the attributes can also be linked with one another, with their own set of rules and actions.
- Ownership record linked to GIS parcel
  - Search on multiple owners, records
  - Removal of parcel warns about orphan owner
- Functions that can be performed by a GIS analyst can be embedded in the actual database.
42GIS Models
43GIS Models Over Time
- Simple Representation
  - CAD model
- Data Analysis
  - Raster model
- Data Collection
  - Vector model
- Relational and Rules
  - Object model
44Geodatabases
45Geodatabase vs Other Formats
- Coverages and shapefiles stored geospatial and attribute data in different locations, in different formats
  - .shp (proprietary binary format)
  - .dbf (dBase database format)
- Geodatabases store both geospatial and attribute data in the same structure
46Benefits and Drawbacks
- Benefits
  - GIS data can now be handled like most other data, and stored in an RDBMS
  - Greater flexibility and functionality
  - Enterprise level of managing data
- Drawbacks
  - Speed hit
  - Even more rope to hang yourself with
47ESRI Geodatabases
- File Geodatabase
  - Introduced in 9.2, the File Geodatabase is the latest, greatest file-based format from ESRI
- Personal Geodatabase
  - Introduced in 8.x
  - Based on Microsoft Access/Jet Engine
- ArcSDE
  - Software (now part of ArcGIS core) that allows RDBMSs to act as GIS data stores
48File Geodatabase
- Latest format
- Best modern format for large datasets
- What you should be using for significant work
- Stores data on disk in several files within a
directory named geodatabase.gdb
49Personal Geodatabase
- Based on Microsoft Access
- Great for bringing outside data into ArcGIS
- Limited to 2GB
- Can become clunky and slow as the amount of data increases
- Stores data in one file called geodatabase.mdb
50ArcSDE/Enterprise database
- Most likely stored on an entirely different machine from the one you're running ArcGIS on
- Same basic functionality as other GDBs
- Concurrent users
- Managed (hopefully) by a DB administrator
51Working with Geodatabases
- At a minimum, consider it similar to a subdirectory with shapefiles
- Unlike shapefiles, you can enforce extents, storage types, projections, topology rules, connectivity rules, network-specific rules, and so on
- This additional functionality is implemented through Feature Datasets
52Feature Datasets
- A folder within the GDB, it preserves projection and extent information for data within the folder (feature classes)
- To make it useful, you must set extent and projection information
- Put some forethought into it before specifying projection and extent!
53Feature Datasets
- After creating a GDB, right click and choose New > Feature Dataset
- The dialog boxes will step you through setting the variables for the Feature Dataset
54Importance of Extent
- The geodatabase will only bother with the information within the extent
- It will throw an exception if you attempt to put something that doesn't fit in the box
- ArcGIS can preserve the difference between two points down to the molecular level
- Setting the extent allows you to control the precision at which ArcGIS handles data
- Needlessly too precise, and you'll have errors that'll never show up on the screen, but will still impact your data
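The "doesn't fit in the box" behaviour can be imitated with a plain bounding-box test. This is only an illustration of the idea, not ArcGIS code: `OutsideExtentError`, `check_within_extent`, and the extent values are hypothetical.

```python
class OutsideExtentError(ValueError):
    """Raised when a coordinate falls outside the dataset extent."""


def check_within_extent(coords, extent):
    """Return True if every x,y pair lies inside (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = extent
    for x, y in coords:
        if not (xmin <= x <= xmax and ymin <= y <= ymax):
            raise OutsideExtentError(f"({x}, {y}) is outside extent {extent}")
    return True


extent = (0.0, 0.0, 1000.0, 1000.0)  # hypothetical box

# A feature that fits is accepted.
ok = check_within_extent([(10.0, 20.0), (999.0, 1.0)], extent)

# A feature with one vertex outside the box is rejected.
try:
    check_within_extent([(10.0, 20.0), (1500.0, 1.0)], extent)
    rejected = False
except OutsideExtentError:
    rejected = True

print(ok, rejected)
```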
55Defining New Jersey
- Projection: NJ State Plane (feet)
- Extent: ?
  - Should it be tight?
  - Should it extend outside the boundaries?
56Defining New Jersey
- In this case, Arc defaults to a grid of 0.00328 feet
  - Roughly 4/100ths of an inch
  - About a hair's width
- 0.2 feet is slightly smaller than 1/4
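The "roughly 4/100ths of an inch" figure for the 0.00328-foot grid can be checked with a short unit conversion. The only inputs are the grid size from the slide and the standard definitions of 12 inches and 304.8 millimetres per foot.

```python
INCHES_PER_FOOT = 12
MM_PER_FOOT = 304.8

grid_ft = 0.00328                      # default grid size from the slide

grid_in = grid_ft * INCHES_PER_FOOT    # about 0.039 inch
grid_mm = grid_ft * MM_PER_FOOT        # about 1 millimetre

print(f"{grid_in:.5f} in, {grid_mm:.4f} mm")
```

In other words, the default grid in this case works out to about one millimetre.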
57Balancing Precision and Functionality
- Your extent should match the scale at which you are working
- Leave a little wiggle room
- Working in New Jersey? Some of NY, PA, DE should fall into your box.
- Greenland fits? Your box is a little too big.
58Additional Functionality
- In your Feature Dataset, right click and see what pops up under New >
  - Topology
  - Geometric Network
  - Network Dataset
  - Etc.
59Geodatabase as a container
- Each of these special datasets uses the GDB to store data specific to its framework
- Topology stores associated attribute tables, rules, and error information
- Network stores network edge attributes, turn tables, and driving/routing directions
60Normalization
- A normalized database is one that has little redundancy within its tables
- A record ID or some other key links to a table with those values
- Instead of storing "Modified Agricultural Wetlands" numerous times as text, store it once as text and refer to it using a key (2140)
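A minimal sketch of the lookup idea, using plain dictionaries in place of database tables. The key 2140 and its description come from the slide; the second code, the field names, and the polygon rows are invented for illustration.

```python
# Lookup table: each land-use description is stored once, against a key.
# 2140 comes from the slide; 9999 is a made-up code for illustration.
land_use_lookup = {
    2140: "Modified Agricultural Wetlands",
    9999: "Hypothetical Other Class",
}

# Normalized layer table: rows carry only the key, not the repeated text.
polygons = [
    {"polygon_id": 1, "lu_code": 2140},
    {"polygon_id": 2, "lu_code": 2140},
    {"polygon_id": 3, "lu_code": 9999},
]


def flatten(rows, lookup):
    """Return one flat table with the text written into every row."""
    return [dict(row, land_use=lookup[row["lu_code"]]) for row in rows]


flat = flatten(polygons, land_use_lookup)
print(flat[0])
```

The `flatten` step corresponds to producing a single table per layer for distribution, at the cost of repeating the text in every row.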
61Normalization
- Work in a normalized environment
- Analogs
  - Non-normalized: Excel spreadsheet
  - Normalized: well-made Access DB (lookups)
- When distributing to the public, flatten the database out to one table per layer
  - Make it a shapefile
62Geodatabase Environment
- Important to work in a GDB whenever possible
- Assured extents, projections, etc
- Quality control
- Greater number of tools at your disposal
- Export to other format (.shp) for distribution
63Going Further
64Structured Query Language
- SQL is the standardized method of interacting with a database
- Even Access allows you to use SQL
- Insert (add new records to a DBMS)
- Update (change existing records in a DBMS)
- Delete (remove records from a DBMS)
- Where (limits your results)
65Select Statements
- Most common SQL query you will encounter
- Select By Attributes has this as the foundation
- Nothing more than SELECT * FROM gis_layer WHERE ...
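A small runnable sketch of such a query, using Python's built-in `sqlite3` module with an in-memory database rather than a geodatabase. The table name `gis_layer` is taken from the slide; the columns `land_use` and `acres` and the rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical attribute table for a layer.
conn.execute(
    "CREATE TABLE gis_layer (id INTEGER PRIMARY KEY, land_use TEXT, acres REAL)"
)
conn.executemany(
    "INSERT INTO gis_layer (id, land_use, acres) VALUES (?, ?, ?)",
    [(1, "forest", 40.0), (2, "urban", 5.5), (3, "forest", 12.25)],
)

# The WHERE clause limits the result to the rows of interest.
selected = conn.execute(
    "SELECT * FROM gis_layer WHERE land_use = ? ORDER BY id", ("forest",)
).fetchall()

print(selected)
```

The `?` placeholder passes the search value separately from the SQL text, which avoids quoting problems when the value comes from user input.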
66Joins
- In ArcGIS or Access, you join two (or more) tables together using a primary key.
- If the keys match, the secondary tables are tacked on to the first.
- Again, geospatial is special, so GIS has another type of join.
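A key-based join can be sketched with `sqlite3` as well. The tables `parcels` and `owners`, their columns, and the rows are hypothetical; a LEFT JOIN is used here so that rows of the first table are kept even when no key matches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parcels (parcel_id INTEGER PRIMARY KEY, acres REAL);
    CREATE TABLE owners  (parcel_id INTEGER, owner TEXT);
    INSERT INTO parcels VALUES (1, 2.5), (2, 0.8), (3, 12.0);
    INSERT INTO owners  VALUES (1, 'Smith'), (2, 'Jones');
""")

# Where the keys match, the owner columns are tacked on to the parcel row.
joined = conn.execute("""
    SELECT p.parcel_id, p.acres, o.owner
    FROM parcels AS p
    LEFT JOIN owners AS o ON o.parcel_id = p.parcel_id
    ORDER BY p.parcel_id
""").fetchall()

print(joined)
```

Parcel 3 has no matching owner record, so its owner column comes back empty (`None`).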
67Spatial Joins
- Relationship is determined not by key, but by proximity or connectivity
- Contains/Within/Overlaps
  - One feature falls entirely within another
- Touches/Intersects/Crosses
  - One feature touches another
- Equals or Disjoint
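A "within" style spatial join can be sketched in pure Python with a ray-casting point-in-polygon test. This is an illustration of the idea rather than how any particular GIS implements it; the zone rings and well coordinates are hypothetical, and points lying exactly on a boundary are not treated specially.

```python
def point_in_polygon(x, y, ring):
    """Ray casting: count how many ring edges a ray to the right crosses.

    `ring` is a closed list of x,y pairs (first pair repeated at the end).
    """
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):                      # edge spans the ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def spatial_join(points, polygons):
    """Attach to each point the name of the polygon it falls within."""
    joined = {}
    for point_id, (x, y) in points.items():
        joined[point_id] = next(
            (name for name, ring in polygons.items()
             if point_in_polygon(x, y, ring)),
            None,
        )
    return joined


zones = {
    "Zone A": [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)],
    "Zone B": [(10, 0), (20, 0), (20, 10), (10, 10), (10, 0)],
}
wells = {"w1": (2, 2), "w2": (12, 3), "w3": (30, 30)}

print(spatial_join(wells, zones))
```

No key is shared between the wells and the zones; the relationship comes entirely from where the coordinates fall.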
68Transactions
- Geodatabase edits are either committed or rolled back
- Edits performed in a multi-user environment are integrity checked
- Atomic-level editing and revisioning
- Race condition
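The commit-or-roll-back behaviour can be demonstrated with `sqlite3`, used here as a stand-in for a geodatabase. The `parcels` table and its rows are hypothetical. The second statement in the edit deliberately violates the primary key, so the whole edit, including the earlier update, is rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parcels (id INTEGER PRIMARY KEY, owner TEXT NOT NULL)")
conn.execute("INSERT INTO parcels VALUES (1, 'Smith')")
conn.commit()

try:
    # Used as a context manager, the connection commits if the block
    # succeeds and rolls back if it raises an exception.
    with conn:
        conn.execute("UPDATE parcels SET owner = 'Jones' WHERE id = 1")
        conn.execute("INSERT INTO parcels VALUES (1, 'Duplicate')")  # fails
except sqlite3.IntegrityError:
    pass

# The update was undone together with the failed insert.
owner = conn.execute("SELECT owner FROM parcels WHERE id = 1").fetchone()[0]
row_count = conn.execute("SELECT COUNT(*) FROM parcels").fetchone()[0]
print(owner, row_count)
```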
69Versioning
- GIS tracks edits made and maintains a journal of all changes to the database
- This record keeping allows for rollbacks to any date on record
- Keep one set of records while reverting another
- Same database methodology as Wikipedia
70Data, data, everywhere
- In the Internet age, massive amounts of data are compiled, transmitted, and analyzed every second
- Understanding the storage and retrieval methods is critical
- Difference between drinking and drowning