Introduction to the Geospatial Data Content Area
Steve Morris, Head of Digital Library Initiatives, North Carolina State University Libraries
Preservation Issues Related to Digital Geospatial Data
Apr. 21, 2008
2. Outline
- Digital geospatial data types and formats
- Standards (metadata, interoperability)
- Mass market geospatial industry directions
- Not covered
  - Types of spatial analysis
  - Developing GIS services
  - Discussion of available data resources
  - Data reference interviews and data selection criteria
  - Specific approaches to data preservation
3. What is a GIS?
- A geographic information system is a system used to capture, store, manipulate, analyze, and display all types of spatially referenced geographic information about what is where on the earth's surface and how they relate to each other (Fischer and Nijkamp, 1992).
4. Local Applications Where GIS Is Used
Source: NC OneMap Data Inventory 2004
5. State and Local Government Geospatial Data Problem Scope: The North Carolina Example
- 98 of 100 North Carolina counties have GIS systems, as do many municipalities
- Over 30 state agency data producers
- Exceptional value
  - Detailed, current, accurate
- Exceptional risk
  - Inconsistent or nonexistent archiving practices
  - Complicated formats and complex objects
Source: NC OneMap
6. Carrboro, NC: Population 17,797 (2005 est.)
22 downloadable GIS data layers
10 web mapping applications
3 OGC WMS services (web services)
9 downloadable PDF map layers
7. Key Geospatial Data Types
- Vector data
- Raster data
- Tabular data
http://www.lib.ncsu.edu/gis/data.html
8. Geospatial data types: Vector data
9. Vector Data
[Figure: real-world features and their vector representation]
10. Vector Linkage to Tabular Data
- Products approximate hand-drawn maps
- Better description of individual objects
- Topology allows more spatial analyses (e.g., networks, adjacency)
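A minimal sketch of the linkage described above, assuming a hypothetical in-memory layer: each vector feature carries an ID and its vertices, while attributes live in a separate table keyed by that ID (loosely analogous to a shapefile keeping geometry in .shp and attributes in .dbf). The `Feature` class, the parcel data, and the shared-edge `adjacent` test are illustrative inventions, not any GIS package's API.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """A vector feature: an ID plus geometry stored as (x, y) vertices."""
    fid: int
    geometry_type: str  # e.g. "Point", "LineString", "Polygon"
    coords: list


def join_attributes(features, attribute_table):
    """Pair each feature with its attribute row, matched on feature ID.

    attribute_table maps fid -> dict of attribute values; features with
    no matching row get an empty dict rather than raising an error.
    """
    return [
        {"feature": f, "attributes": attribute_table.get(f.fid, {})}
        for f in features
    ]


def edges(coords):
    """Undirected boundary segments of a ring, as frozensets of endpoints."""
    return {frozenset(seg) for seg in zip(coords, coords[1:])}


def adjacent(a, b):
    """Crude adjacency test: do two polygons share an identical edge?"""
    return bool(edges(a.coords) & edges(b.coords))


# Hypothetical parcel layer: two side-by-side square parcels.
parcels = [
    Feature(1, "Polygon", [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]),
    Feature(2, "Polygon", [(10, 0), (20, 0), (20, 10), (10, 10), (10, 0)]),
]
# Hypothetical attribute table, kept separately and keyed by feature ID.
attributes = {
    1: {"owner": "Smith", "acres": 0.25},
    2: {"owner": "Jones", "acres": 0.30},
}

for row in join_attributes(parcels, attributes):
    print(row["feature"].fid, row["attributes"]["owner"])
print("adjacent:", adjacent(parcels[0], parcels[1]))
```

Real topology (as in ArcInfo coverages or geodatabase topology rules) also handles partially shared edges and snapping tolerances; this sketch only detects segments that match exactly.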
11. Individual data layers are overlaid on top of one another to create customized maps.
12. Time series vector data: Parcel Boundary Changes, 2001-2004, North Raleigh, NC
13. NC OneMap Initial Data Layers Produced by Cities and Counties
Source: NC OneMap Data Inventory 2004
14. County Street Centerline Specifics
Source: NC OneMap Data Inventory 2004
15. County Cadastral Specifics
16. Some Common Vector GIS Formats
- ArcInfo Coverages (ESRI)
- ESRI Export file (.e00)
- Shapefiles (ESRI)
- MapInfo MID/MIF
- TIGER files
- Spatial Data Transfer Standard (SDTS)
- Digital Line Graphs (DLG)
- Many more
18. Vector Data Standards Issues
- No widely adopted, open standard for geospatial vector data
  - SDTS was intended as an open exchange standard but is difficult to implement and not widely supported
  - Geography Markup Language (GML) is not a format but a language for defining industry-specific application schemas that adhere to specific profiles
- Shapefile is widely supported and openly documented (though proprietary)
  - Functions as the de facto lingua franca of vector data
  - Lacks some functionality (topology, annotation, ...)
- Vector data conversions are complex and lossy
19. Geospatial data types: Raster data
Downtown Pittsboro, NC: 10-meter SPOT imagery
20. Geospatial data types: Raster data
Downtown Pittsboro, NC: 1-meter DOQQ
21. Geospatial data types: Raster data
Downtown Pittsboro, NC: 2-foot county orthophoto
22. Geospatial data types: Raster data
Downtown Pittsboro, NC: 6-inch county orthophoto
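The four Pittsboro views show the same place at progressively finer cell sizes. A back-of-envelope sketch of what that means for storage, assuming a hypothetical 1 km by 1 km area, 3 bytes per cell (8-bit RGB) for every product, and no compression; real SPOT, DOQQ, and county products differ in band count and format.

```python
import math

FOOT_M = 0.3048  # metres per international foot


def pixels_for_area(width_m, height_m, cell_m):
    """Raster cells needed to cover a width x height area at a given cell size."""
    return math.ceil(width_m / cell_m) * math.ceil(height_m / cell_m)


def uncompressed_mb(pixel_count, bytes_per_pixel=3):
    """Uncompressed size in MB (10**6 bytes); default assumes 8-bit RGB."""
    return pixel_count * bytes_per_pixel / 1e6


resolutions = {
    "10 m SPOT": 10.0,
    "1 m DOQQ": 1.0,
    "2 ft orthophoto": 2 * FOOT_M,
    "6 in orthophoto": 0.5 * FOOT_M,
}

for label, cell in resolutions.items():
    n = pixels_for_area(1000, 1000, cell)  # a hypothetical 1 km x 1 km area
    print(f"{label:>16}: {n:>12,} cells, ~{uncompressed_mb(n):,.1f} MB")
```

Halving the cell size quadruples the cell count, so the 6-inch imagery needs roughly (1 / 0.1524)², or about 43, times as many cells as the 1-meter DOQQ for the same area, before any compression such as MrSID or JPEG 2000.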
23. Raster Data
[Figure: real-world features and their raster representation]
24. Raster Linkage to Attribute Data
- Simple data structure of grid cells
- All types of features share one data structure
- Simple to analyze several layers at once
- Advantage: frequent data reacquisition
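Because every raster layer shares the same grid-cell structure, analyzing several layers at once can be a cell-by-cell operation. A minimal sketch, assuming two hypothetical layers (land-cover class codes and slope in percent) that are already co-registered (same extent, cell size, and projection); the class code and threshold are made up for illustration.

```python
def overlay(layer_a, layer_b, rule):
    """Combine two co-registered rasters cell by cell with rule(a, b) -> value."""
    if len(layer_a) != len(layer_b) or any(
        len(ra) != len(rb) for ra, rb in zip(layer_a, layer_b)
    ):
        raise ValueError("rasters must share the same grid dimensions")
    return [
        [rule(a, b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(layer_a, layer_b)
    ]


FOREST = 4  # hypothetical land-cover class code

land_cover = [
    [4, 4, 1],
    [4, 2, 1],
    [3, 4, 4],
]
slope_pct = [
    [5, 12, 3],
    [8, 2, 15],
    [1, 9, 20],
]

# 1 where a cell is forest on gentle slope (< 10%), else 0.
suitable = overlay(land_cover, slope_pct, lambda lc, s: int(lc == FOREST and s < 10))
for row in suitable:
    print(row)
```

The same pattern extends to any number of aligned layers; the hard part in practice is getting layers onto one grid in the first place.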
25. Geospatial Data Types: Raster to Vector
Source: NCCGIA
26. Geospatial data types: Raster data
27. Time series ortho imagery: Vicinity of Raleigh-Durham International Airport, 1993-2002
28. County Digital Orthophotography Specifics
Source: NC OneMap Data Inventory 2004
29. Image Re-processing: Example of the 1993 Digital Orthophoto Quarter Quadrangles
Versions of the imagery:
- USGS: JPEG, UTM, unclipped
- USGS: BIP, UTM, unclipped
- Und. Systems: BMP, State Plane (f)
- NCSU Libraries: MrSID, UTM, unclipped
- NCSU Libraries: MrSID, UTM, county mosaic
- NCDOT: TIFF, State Plane (m), clipped
- NCDOT: JPEG, State Plane (m), clipped
- NCDOT: JPEG thumbnail, clipped
- NCDOT: MrSID, State Plane (m), county mosaics
Processing steps: reprojecting, image conversion, retiling (clipping, mosaics), resampling
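Resampling, one of the processing steps listed above, recomputes cell values onto a new grid. A minimal nearest-neighbour sketch that only changes cell size (no reprojection), assuming square cells, a grid origin at the top-left corner, and an output extent truncated to whole cells; it is an illustration of the technique, not how any particular tool implements it.

```python
def resample_nearest(grid, src_cell, dst_cell):
    """Nearest-neighbour resample of a 2-D grid to a new square cell size.

    Each output cell takes the value of the source cell containing the
    output cell's centre. The output extent is truncated to whole cells.
    """
    rows, cols = len(grid), len(grid[0])
    out_rows = int(rows * src_cell // dst_cell)
    out_cols = int(cols * src_cell // dst_cell)
    out = []
    for r in range(out_rows):
        y = (r + 0.5) * dst_cell  # centre of output row, in map units
        src_r = min(int(y // src_cell), rows - 1)
        row = []
        for c in range(out_cols):
            x = (c + 0.5) * dst_cell  # centre of output column
            src_c = min(int(x // src_cell), cols - 1)
            row.append(grid[src_r][src_c])
        out.append(row)
    return out


coarse = [[1, 2], [3, 4]]  # hypothetical 2 m grid
for row in resample_nearest(coarse, 2.0, 1.0):  # resample to 1 m cells
    print(row)
```

Nearest neighbour never invents new values, which matters for categorical rasters such as land cover; bilinear or cubic methods (offered by tools such as GDAL) smooth continuous imagery but create values that were not in the source, one reason each re-processed derivative is a distinct object from a preservation standpoint.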
30. Project Status
Increasing Commercial Options for High-Resolution Satellite Imagery
31. Some Common Raster GIS Formats
A couple of key acronyms:
- DOQQ (Digital Orthophoto Quarter Quadrangle): nationwide orthophoto series, typically at one-meter resolution
- DRG (Digital Raster Graphic): scanned image of a U.S. Geological Survey (USGS) standard series topographic map, including all map collar information
Formats:
- TIFF/GeoTIFF
- BIP/BIL/BSQ
- JPEG
- JPEG 2000
- MrSID
- ESRI Grid
- Many more
32. Geospatial data types: Tabular data (with vector)
33. Geospatial data types: Spatial database
Geodatabase Availability in NC Local Government Agencies
- Local agencies, especially municipalities, are increasingly turning to the ESRI Geodatabase format to manage geospatial data.
- According to the 2003 Local Government GIS Data Inventory, 10.0% of all county framework data and 32.7% of all municipal framework data were managed in that format.
34. Inside the Geodatabase
- Feature Datasets: contain feature classes (vector data)
- Topology: rules to ensure data integrity
- Geometric Network: rules to manage connectivity
- Tabular Data: attributes of spatial data
- Relationship Class: links geographic features to tabular data
- Metadata: XML format, for each dataset
- Survey Data: coordinate system, measurements, etc.
- Raster Datasets
Slide from Amanda Henley, UNC-CH
35. Geospatial data types: Cartographic
The counterpart to the map is not just the dataset but also models, symbolization, classification, annotation, etc.
36. Geospatial data types: Cartographic
- GIS Software
  - Software project file (.mxd, .apr, ...)
  - Data layer file (.avl, .lyr, ...)
- PDF map exports
- Web Services-based representations
37. Other Data Issues That I Don't Have Time to Go Into
- Coordinate Systems and Projections
  - The world is not flat, but maps are: there are various ways to describe the earth's surface as a two-dimensional place
- Vertical and Horizontal Datums
  - Establishing starting points for describing the earth's surface
- Tiling Schemes
  - Method of data organization (e.g., county, state, tax map grid, river basin, hydrologic unit)
- Rights Issues
  - Public domain vs. commercial
  - Varied interpretations of public records law
  - Ambiguous rights with web services
  - GeoDRM
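To make "describing the earth's surface as a two-dimensional place" concrete, here is a sketch of one projection: the spherical Mercator used by many web map APIs (EPSG:3857), which treats the earth as a sphere with the WGS 84 semi-major axis as its radius. The Raleigh coordinates are approximate and only illustrative; this is not a substitute for a full projection library.

```python
import math

R = 6378137.0  # WGS 84 semi-major axis, used as the sphere radius by Web Mercator


def web_mercator(lon_deg, lat_deg):
    """Forward spherical Mercator (EPSG:3857): degrees -> metres."""
    if abs(lat_deg) >= 90:
        raise ValueError("Mercator cannot represent the poles")
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    x = R * lam
    y = R * math.log(math.tan(math.pi / 4 + phi / 2))
    return x, y


def inverse_web_mercator(x, y):
    """Inverse spherical Mercator: metres -> (lon, lat) in degrees."""
    lon = math.degrees(x / R)
    lat = math.degrees(2 * math.atan(math.exp(y / R)) - math.pi / 2)
    return lon, lat


# Approximate location of downtown Raleigh, NC (illustrative only).
x, y = web_mercator(-78.64, 35.78)
print(round(x), round(y))
print(inverse_web_mercator(x, y))
```

Mercator's scale distortion grows with latitude (by a factor of 1/cos of the latitude), one reason the local and state producers in these slides work in State Plane or UTM rather than a web display projection.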
38. Versioning and Updating
- Orthophotos
  - County digital orthophotos reflown every 2-7 years
  - Statewide digital orthophoto plan: every 5 years (alternating black-and-white and color infrared)
- Vector Data
  - State agency vector data: some static, some periodically updated, relatively few continuously updated
  - County/City/COG vector data: many data layers continuously or periodically updated
  - Old versions supplanted, existing only on relatively inaccessible backups
39. Geospatial Metadata Standards
- Federal Geographic Data Committee (FGDC) Content Standard for Digital Geospatial Metadata (CSDGM)
  - Version 1: 1994; Version 2: 1998
  - Mandated for use by federal agencies from 1995
  - Widespread state government use, spotty local agency use
  - Widespread tool availability from the late 1990s
  - 334 elements: descriptive, technical, administrative
- Next-generation standard
  - ISO 19115: Geographic information - Metadata
  - ISO 19139: XML schema implementation
  - North American Profile of ISO 19115, as implemented under 19139: near finalization
  - Industry and vendor profiles (ESRI, NBII, ...)
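Processing incoming CSDGM records usually starts with pulling out a few key elements. A minimal sketch using Python's standard XML parser on a hypothetical, heavily trimmed CSDGM record; the element names (`idinfo`, `citeinfo`, `spdom/bounding`, `westbc` and so on) follow the CSDGM XML short names, while the record content is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical FGDC CSDGM record, trimmed to a few identification elements.
SAMPLE = """<metadata>
  <idinfo>
    <citation><citeinfo>
      <origin>Example County GIS</origin>
      <pubdate>2007</pubdate>
      <title>Street Centerlines</title>
    </citeinfo></citation>
    <spdom><bounding>
      <westbc>-79.10</westbc><eastbc>-79.00</eastbc>
      <northbc>35.95</northbc><southbc>35.88</southbc>
    </bounding></spdom>
  </idinfo>
</metadata>"""


def summarize_fgdc(xml_text):
    """Pull title, originator, date, and bounding box from a CSDGM record.

    Missing elements come back as None instead of raising, since incoming
    metadata is often incomplete.
    """
    root = ET.fromstring(xml_text)
    cite = root.find("idinfo/citation/citeinfo")
    summary = {
        key: cite.findtext(key) if cite is not None else None
        for key in ("title", "origin", "pubdate")
    }
    bounding = root.find("idinfo/spdom/bounding")
    bbox = None
    if bounding is not None:
        values = [bounding.findtext(t) for t in ("westbc", "southbc", "eastbc", "northbc")]
        if all(v is not None for v in values):
            bbox = tuple(float(v) for v in values)
    summary["bbox"] = bbox  # (west, south, east, north) in decimal degrees
    return summary


print(summarize_fgdc(SAMPLE))
```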
40. Data/Metadata Workflow
- Data
  - Orthophoto work contracted out to commercial firms
  - Some vector data contracted out (notably parcels)
  - Most other vector data produced in-house
  - Early, middle, late, and late-late stage products
- Metadata
  - Metadata published by producer, with NC Metadata Outreach Program support
  - Metadata published to NC NSDI clearinghouse, Geospatial One-Stop, and NC OneMap
41. NC Local Government Metadata Availability
42. Metadata Availability
43. Preservation Metadata Issues
- FGDC Metadata
  - Many flavors; incoming metadata needs processing
  - Cross-walk elements to PREMIS, MODS?
- Metadata wrapper/Content packaging
  - METS (Metadata Encoding and Transmission Standard) vs. other industry solutions
  - Need a geospatial industry solution for the METS-like problem
  - GeoDRM a likely trigger: wrapper to enforce licensing (MPEG 21 references in OGIS Web Services 3)
44. Geospatial Data Discovery
- National Spatial Data Infrastructure (NSDI) Clearinghouse development from 1995
  - From the mid-1990s: metasearch-centered approach, using the geo profile of Z39.50
  - Early to mid-2000s: shift to a harvest-based catalog approach, development of Geospatial One-Stop (GOS)
  - Harvest protocols supported: Z39.50 (modified profile), OAI-PMH, Web Accessible Folder (WAF)
  - Direct search/browse at producer or state clearinghouse sites still prominent
  - Integration with Google Earth, etc.
- Metadata problems
  - Absent or incomplete, asynchronous with the data
  - Inconsistently structured (no encoding standard until 19139)
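Of the harvest protocols listed above, OAI-PMH is the simplest to sketch. The example below builds ListRecords requests and reads the resumptionToken that pages through a large result set. The endpoint URL is hypothetical, and `oai_dc` is used because every OAI-PMH repository must support it; any FGDC-specific `metadataPrefix` would be provider-dependent and is not assumed here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request.

    resumptionToken is an exclusive argument in OAI-PMH: when continuing a
    harvest, it is sent with the verb and nothing else.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date  # incremental harvest since this date
    return f"{base_url}?{urlencode(params)}"


def next_token(response_xml):
    """Return the resumptionToken from a ListRecords response, or None when done."""
    root = ET.fromstring(response_xml)
    el = root.find(".//oai:resumptionToken", OAI_NS)
    if el is None or not (el.text or "").strip():
        return None
    return el.text.strip()


BASE = "https://example.org/oai"  # hypothetical clearinghouse endpoint
print(list_records_url(BASE, from_date="2008-01-01"))
```

A harvester would loop: fetch the first URL, store the records, and keep requesting `list_records_url(BASE, resumption_token=token)` until `next_token` returns None.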
45. Data Sources
Moving from international to federal, state, and local sources:
- Coverage area: high to low
- Accuracy: low to high
- Scale: low (1:500,000) to high (1:24,000)
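The scale figures above translate directly into ground distances, which is why accuracy tracks scale. A small sketch of that arithmetic; the 0.5 mm line width is an assumed example of a drafted line, not a figure from the slides.

```python
CM_PER_M = 100.0
INCHES_PER_FOOT = 12.0


def ground_distance_m(map_cm, scale_denominator):
    """Ground distance (metres) represented by map_cm on a 1:scale map."""
    return map_cm * scale_denominator / CM_PER_M


def ground_distance_ft(map_inches, scale_denominator):
    """Ground distance (feet) represented by map_inches on a 1:scale map."""
    return map_inches * scale_denominator / INCHES_PER_FOOT


for denom in (24_000, 500_000):
    # An assumed 0.5 mm drafted line (0.05 cm) covers this much ground.
    print(f"1:{denom:,}: 1 cm = {ground_distance_m(1, denom):,.0f} m, "
          f"0.5 mm line = {ground_distance_m(0.05, denom):,.0f} m wide")
```

At 1:24,000, one map inch is 2,000 feet on the ground; at 1:500,000, the width of a single drawn line already spans about 250 meters.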
46. Choosing the Right Data
- What do you want to do with the data?
  - mapping, search, analysis, geocode
- What specific geographic features will you need?
  - major highways vs. detailed streets
- What is the geographic extent of your area of interest?
  - local, regional, state, national, international
- What attributes of those features will you need?
  - unique IDs, names, address ranges
47. Additional Factors in Choosing Data
- Source: federal, state, local, international, other
- Age: 1-2 years old vs. 3-7 vs. 8 or more
- Data accuracy and scale: positional and attribute
- File size: How much free space do you have?
- Metadata availability
- File/Image Format
- Projection and Datum
- Use Restrictions
- How Soon?
Free, Fast, and Accurate: Pick Two
48. Geospatial Web Services
- Image services
  - Deliver an image resulting from a query against the underlying data
  - Limited opportunity for analysis
  - OGC Web Map Service Specification (WMS), from 2000: widely deployed
- Feature services
  - Stream actual feature data; greater opportunity for data analysis
  - OGC Web Feature Service Specification (WFS), from 2002: not as widely deployed
- Other
  - OGC Web Coverage Services (raster)
  - Geocoding services
  - ArcXML, etc.: commercial web service specs
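The image-versus-feature distinction shows up directly in the request URLs. A sketch that builds a WMS 1.1.1 GetMap request (the server returns a rendered picture) and a WFS 1.0.0 GetFeature request (the server returns the features, as GML by default). The endpoints and layer names are hypothetical, and the bounding box is only an approximate North Carolina extent in longitude/latitude.

```python
from urllib.parse import urlencode


def wms_getmap_url(base_url, layers, bbox, width, height,
                   srs="EPSG:4326", image_format="image/png"):
    """WMS 1.1.1 GetMap: the server renders the layers and returns a picture."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",  # empty: server's default style for each layer
        "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),  # minx,miny,maxx,maxy
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return f"{base_url}?{urlencode(params, safe=',:/')}"


def wfs_getfeature_url(base_url, type_name, bbox=None):
    """WFS 1.0.0 GetFeature: the server returns the features themselves."""
    params = {
        "SERVICE": "WFS",
        "VERSION": "1.0.0",
        "REQUEST": "GetFeature",
        "TYPENAME": type_name,
    }
    if bbox is not None:
        params["BBOX"] = ",".join(str(v) for v in bbox)
    return f"{base_url}?{urlencode(params, safe=',:/')}"


nc_bbox = (-84.33, 33.84, -75.46, 36.59)  # approximate NC extent, lon/lat
print(wms_getmap_url("https://example.org/wms", ["county_boundaries"], nc_bbox, 800, 300))
print(wfs_getfeature_url("https://example.org/wfs", "roads", nc_bbox))
```

Note that WMS 1.3.0 renamed `SRS` to `CRS` and uses latitude/longitude axis order for EPSG:4326, so these parameters are version-specific.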
49. NC OneMap Cascading WMS Services
50. NC OneMap State Govt. Vector Data
51. NC OneMap State Govt. Ortho Images
52. NC OneMap County and City Data
53. NC OneMap Multi-County requests
54. Concordance of layer naming, attribute naming, classification, and symbolization comes from community development of best practices, not from the WMS spec itself
[Figure: NC OneMap multi-county requests, showing county boundaries]
55. WMS services accessed through desktop GIS (ArcGIS)
56. Services Metadata: WMS Capabilities File
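The capabilities file is the service-level metadata a client reads before requesting maps. A sketch that lists the requestable layers from a hypothetical, heavily trimmed WMS capabilities document; in WMS, a Layer with a Name can be requested, while a Layer with only a Title is a grouping. The namespace stripping is an assumption meant to let the same code read both 1.1.1 (no namespace) and 1.3.0 documents.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical WMS 1.1.1 capabilities document.
SAMPLE_CAPS = """<WMT_MS_Capabilities version="1.1.1">
  <Capability>
    <Layer>
      <Title>Example WMS</Title>
      <SRS>EPSG:4326</SRS>
      <Layer queryable="1"><Name>county_boundaries</Name><Title>County Boundaries</Title></Layer>
      <Layer queryable="0"><Name>orthos_2006</Name><Title>2006 Orthophotos</Title></Layer>
    </Layer>
  </Capability>
</WMT_MS_Capabilities>"""


def _local(tag):
    """Strip any '{namespace}' prefix so 1.1.1 and 1.3.0 documents both work."""
    return tag.rsplit("}", 1)[-1]


def _child_text(el, name):
    """Text of the first direct child with the given local name, else None."""
    for child in el:
        if _local(child.tag) == name:
            return (child.text or "").strip()
    return None


def requestable_layers(caps_xml):
    """List (name, title) for every Layer that has a Name."""
    root = ET.fromstring(caps_xml)
    layers = []
    for el in root.iter():
        if _local(el.tag) == "Layer":
            name = _child_text(el, "Name")
            if name:
                layers.append((name, _child_text(el, "Title")))
    return layers


print(requestable_layers(SAMPLE_CAPS))
```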
57. New Mapping Environments
- Online Mapping APIs or Environments
  - Google Maps
  - Yahoo Maps
  - MSN Virtual Earth
  - OpenLayers
  - More
- Desktop Client Systems
  - Google Earth: KML
  - NASA WorldWind
  - More
- Also a multitude of systems that build on other systems
58. Changes in the Domain: Mashups, Google Earth, Map APIs, and More
- Huge new audience for geospatial content/services
- Massive crossover of mainstream IT to geospatial, spurring open source activities
- Rapid development of lightweight interoperability specifications
- "Good enough" approaches to data (formats, quality, standards)
59. Lightweight Spec Example: GeoRSS
- Encode locations in RSS feeds
- Describe information in an interoperable manner so that applications can request, aggregate, share and map geographically tagged feeds
- GeoRSS Flavors
  - GeoRSS Simple
  - GeoRSS GML (GML Application Profile)
  - W3C
  - Micro
- Varied industry adoption
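A sketch of how lightweight GeoRSS Simple is in practice: a location is one extra element on an RSS item. The feed below is hypothetical; the parser assumes the GeoRSS namespace `http://www.georss.org/georss` and reads `georss:point`, whose content is "latitude longitude" separated by whitespace.

```python
import xml.etree.ElementTree as ET

GEORSS = {"georss": "http://www.georss.org/georss"}

# Hypothetical RSS 2.0 feed using GeoRSS Simple.
SAMPLE_FEED = """<rss version="2.0" xmlns:georss="http://www.georss.org/georss">
  <channel>
    <title>Example feed</title>
    <item><title>Tagged item</title><georss:point>35.78 -78.64</georss:point></item>
    <item><title>Untagged item</title></item>
  </channel>
</rss>"""


def geotagged_items(feed_xml):
    """Return (title, lat, lon) for items carrying a GeoRSS Simple point.

    georss:point holds "latitude longitude", the reverse of the lon,lat
    order used by KML coordinates.
    """
    root = ET.fromstring(feed_xml)
    found = []
    for item in root.iter("item"):
        point = item.findtext("georss:point", namespaces=GEORSS)
        if point:
            lat, lon = (float(v) for v in point.split())
            found.append((item.findtext("title"), lat, lon))
    return found


print(geotagged_items(SAMPLE_FEED))
```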
60. Changes in the Domain: New Information Ecosystem of Static, Tiled Map Data
- Web mashup/AJAX interactions with existing systems spur creation of intermediate content layers, e.g., tiling and caching of web map services
- Ongoing development of a tiling services spec creates a new preservation opportunity
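Tile caches turn dynamic map services into static files with predictable addresses, which is what makes them a preservation opportunity. A sketch of one widely used addressing scheme, the XYZ scheme popularized by Google Maps and OpenStreetMap (spherical Mercator, 2^zoom tiles per side, y counted from the north). This is an assumption for illustration: the tiling spec mentioned on the slide may organize tiles differently (the OSGeo TMS convention, for example, counts y from the south).

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """XYZ tile (x, y) containing a WGS 84 point at a zoom level.

    2**zoom tiles per side, x from the west, y from the north.
    """
    n = 2 ** zoom
    lat_rad = math.radians(lat)
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp points on the east/south edge into the last tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_bounds(x, y, zoom):
    """(west, south, east, north) in degrees for tile x, y at zoom."""
    n = 2 ** zoom

    def lat_of(row):
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * row / n))))

    return x / n * 360.0 - 180.0, lat_of(y + 1), (x + 1) / n * 360.0 - 180.0, lat_of(y)


# A cached tile could then live at a stable path such as f"{zoom}/{x}/{y}.png".
x, y = lonlat_to_tile(-78.64, 35.78, 12)
print(x, y, tile_bounds(x, y, 12))
```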
61. Changes in the Domain: More Place-based (versus Spatial) Data
- Mobile, LBS, and social networking applications drive demand for place-based data
- Long-term cultural heritage value in non-overhead imagery, which is more descriptive of place and function
[Figure examples: oblique imagery, Street View images, tax department photos, DOT videologs]
62. Relevant Organizations
- International
  - Open Geospatial Consortium (OGC): a non-profit, international, voluntary consensus standards organization that is leading the development of standards for geospatial and location-based services. Coordinates with ISO.
- National
  - Federal Geographic Data Committee (FGDC): coordinates the development of the National Spatial Data Infrastructure (NSDI). Participates in OGC and applies (profiles) OGC specs to the U.S. environment.
- Open Source
  - Open Source Geospatial Foundation (OSGeo): new; not a standards organization (its focus is open software), but acts as a coordinator and incubator for grassroots interoperability efforts.
63. Preservation Points of Engagement with the Open Geospatial Consortium (OGC)
- GML for archiving
- GeoDRM: adding preservation use cases
- Content packaging: industry solution?
- Decision support systems supporting past views of data
- Content transfer
- Persistent identifiers
OGC Data Preservation Working Group formed in Dec. 2006
64. Spatial Metaphor for Repository Search
- Beginning to see map-based interfaces using map APIs (Google Maps, Yahoo Maps) on top of repository software such as DSpace
- Gazetteer protocol work (UCSB, etc.) going back several years
- Text mining for place names (MetaCarta, EDINA)
- Many other applications
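At its core, a map-based repository interface turns the current map window into a bounding-box query against each record's footprint. A sketch with hypothetical records and footprints in longitude/latitude; the "smallest footprint first" ranking is an assumed heuristic, and boxes crossing the 180th meridian are not handled.

```python
def intersects(a, b):
    """True if two (west, south, east, north) boxes overlap or touch.

    Assumes neither box crosses the 180th meridian.
    """
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def area(box):
    """Box area in square degrees (only used for ranking)."""
    return (box[2] - box[0]) * (box[3] - box[1])


def map_search(records, viewport):
    """Records whose footprint overlaps the map viewport, most specific first."""
    hits = [r for r in records if intersects(r["bbox"], viewport)]
    # Hypothetical ranking rule: smaller footprints are treated as more relevant.
    return sorted(hits, key=lambda r: area(r["bbox"]))


# Hypothetical repository records with footprints in lon/lat degrees.
records = [
    {"id": "rec-1", "title": "Raleigh-area orthophoto mosaic", "bbox": (-78.98, 35.52, -78.25, 36.08)},
    {"id": "rec-2", "title": "Coastal shoreline survey", "bbox": (-76.0, 34.5, -75.4, 36.6)},
    {"id": "rec-3", "title": "Statewide road centerlines", "bbox": (-84.33, 33.84, -75.46, 36.59)},
]
viewport = (-78.70, 35.75, -78.60, 35.82)  # e.g., the current map window

print([r["id"] for r in map_search(records, viewport)])
```

This only works as well as the bounding boxes in the underlying metadata, which ties the interface back to the metadata availability issues raised earlier.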
68. Questions?
Contact: Steve Morris, Head of Digital Library Initiatives, NCSU Libraries
Steven_Morris_at_ncsu.edu
Phone: (919) 515-1361
http://www.lib.ncsu.edu/ncgdap