Title: North Carolina Geospatial Data Archiving Project/NDIIPP: Collection and preservation of at-risk digita
1North Carolina Geospatial Data Archiving
Project/NDIIPP
Collection and preservation of at-risk digital geospatial data
Partners
- NCSU Libraries: Project Lead Steve Morris
- NC Center for Geographic Information Analysis: Project
Lead Zsolt Nagy
NSDI Partnership Community Meeting
March 1, 2006
2Outline
- Risks to Digital Geospatial Data
- Overview of NC Geospatial Data Archiving Project
and NDIIPP
- Preservation Challenges and Possible Solutions
- Points of Engagement with Spatial Data
Infrastructure and Industry
3Risks to Digital Geospatial Data
.shp
.mif
.gml
.e00
.dwg
.dgn
.bsb
.bil
.sid
4Risks to Digital Geospatial Data
- Producer focus on current data
- Archiving data does not guarantee permanent
access
- Future support of data formats in question
- Need to migrate formats or allow for emulation
- Data failure
- Bit rot, media failure
- Preservation metadata requirements
- Descriptive, administrative, technical, DRM
- Shift to streaming data for access
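The "bit rot, media failure" risk above is commonly addressed with periodic fixity checks. A minimal sketch, assuming SHA-256 checksums recorded in a simple path-to-digest manifest; the function names `sha256_of`, `build_manifest`, and `failed_fixity` are hypothetical and do not describe the project's actual tooling:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (by relative path) to its digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def failed_fixity(root: Path, manifest: dict[str, str]) -> list[str]:
    """List files that are missing or whose digest no longer matches."""
    failures = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(rel_path)
    return failures
```

Run `build_manifest` at ingest and `failed_fixity` on a schedule; any path it returns would need to be restored from a replica.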
5Time series: vector data
Parcel Boundary Changes 2001-2004, North Raleigh, NC
Temporal data to support business needs in
- Real estate analysis
- Land use change analysis
- Economic planning
6Time series: Ortho imagery
Vicinity of Raleigh-Durham International Airport, 1993-2002
Even static orthophotos are at risk.
7Today's geospatial data as tomorrow's cultural
heritage
Future uses of data are difficult to anticipate
(as with Sanborn Maps).
8NC Geospatial Data Archiving Project
- Partnership between university library (NCSU) and
state agency (NCCGIA), with Library of Congress
under the National Digital Information
Infrastructure and Preservation Program (NDIIPP)
- One of 8 initial NDIIPP partnerships (only state
project)
- Focus on state and local geospatial content in
North Carolina (state demonstration)
- Tied to NC OneMap initiative, which provides for
seamless access to data, metadata, and
inventories
- Objective: engage existing state/federal
geospatial data infrastructures in preservation
9Targeted Content
- Resource Types
- GIS data (vector, etc.)
- Digital orthophotography
- Digital maps
- Tabular data (e.g. assessment data)
- Content Producers
- Mostly state, local, regional agencies
- Some university, not-for-profit, commercial
- Selected local federal projects
10Work plan in a Nutshell
- Work from existing data inventories
- NC OneMap Data Sharing Agreements as the
blanket, individual agreements as the quilt
- Partnership work with existing geospatial data
infrastructures (state and federal)
- Technical approach
- Metadata: FGDC, METS, PREMIS?, GeoDRM?
- Repository-independent: DSpace initially
- Web services consumption for archival development
(in future?)
11NCGDAP Philosophy of Engagement
Provide feedback to producer organizations / inform
state geospatial infrastructure
Take the data as is, in the manner in which it
can be obtained
Wrangle and archive data
Note the "Project" in North Carolina Geospatial
Data Archiving Project: the process, the
learning experience, and the engagement with
industry and infrastructure are more important
than the archive.
What is the long term solution?
12Big Technical Challenges
- Format migration paths
- Management of data versions over time
- Preservation metadata
- Harnessing geospatial web services
- Preserving cartographic representation
- Keeping content repository-agnostic
- Preserving geodatabases
- More
13Vector Data Format Issues
- Vector data much more complicated than image data
- Archiving vs. Permanent access
- An open pile of XML might make an archive, but
if using it requires a team of programmers to do
digital archaeology then it does not provide
permanent access
- Piles of XML need to be widely understood piles
- GML: need widely accepted application schemas
(like OSMM?)
- The Geodatabase conundrum
- Export feature classes, and lose topology,
annotation, relationships, etc.
- or use the Geodatabase as the primary archival
platform (some are now thinking this way)
14Managing Time-versioned Content
Continuously updated data
- Frequency of snapshots?
- Different for various framework layers?
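One way to make the snapshot-frequency question concrete is a per-layer policy table. The sketch below is illustrative only: the layer names and intervals in `SNAPSHOT_INTERVAL_DAYS` and the function `snapshot_due` are hypothetical, not decisions made by the project.

```python
from datetime import date, timedelta

# Hypothetical snapshot intervals per framework layer, in days.
SNAPSHOT_INTERVAL_DAYS = {
    "parcels": 90,              # changes continuously
    "street_centerlines": 180,
    "orthoimagery": 365,        # typically reflown on a longer cycle
}


def snapshot_due(layer: str, last_snapshot: date, today: date) -> bool:
    """Return True when the layer's interval has elapsed since the last snapshot."""
    interval = timedelta(days=SNAPSHOT_INTERVAL_DAYS[layer])
    return today - last_snapshot >= interval
```

Keeping the policy as data rather than logic would let each framework layer community set its own interval without code changes.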
15Metadata Availability Limited at Local Level
February 2005
16Harnessing Geospatial Web Services
- Image atlases from WMS services?
- Capturing cartographic representation?
- Recording records from decision-making processes?
- Later data transfer via WFS/GML? Other?
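An image atlas from a WMS service could be harvested by splitting an extent into a grid and issuing one GetMap request per cell. A minimal sketch, assuming WMS 1.1.1 GetMap parameters; the helper names `getmap_url` and `atlas_bboxes`, the endpoint `https://example.org/wms`, and the layer name are hypothetical:

```python
from urllib.parse import urlencode


def getmap_url(base_url, layer, bbox, width=1024, height=1024,
               srs="EPSG:4326", image_format="image/png"):
    """Build a WMS 1.1.1 GetMap URL for one bounding box (minx, miny, maxx, maxy)."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the server's default style
        "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return f"{base_url}?{urlencode(params)}"


def atlas_bboxes(extent, cols, rows):
    """Split an extent into a cols x rows grid of bounding boxes, row by row."""
    minx, miny, maxx, maxy = extent
    dx = (maxx - minx) / cols
    dy = (maxy - miny) / rows
    return [
        (minx + c * dx, miny + r * dy, minx + (c + 1) * dx, miny + (r + 1) * dy)
        for r in range(rows)
        for c in range(cols)
    ]


# Example: URLs for a 2 x 2 atlas over a one-degree extent.
urls = [
    getmap_url("https://example.org/wms", "orthoimagery", box)
    for box in atlas_bboxes((-79.0, 35.0, -78.0, 36.0), cols=2, rows=2)
]
```

Images captured this way record the rendered cartographic representation, not the underlying data, so they complement a data archive and do not replace it.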
17Web mash-ups and the New Mainstream Geospatial
Web Services
- How does temporal data fit into emerging WMS
caching and tiling schemes?
- Capture of tiles and caches for archive?
18Preserving Cartographic Representation
Counterpart to the map is not just the dataset
but also models, symbolization, classification,
annotation, etc.
19Needed: Efficient Content Replication
- Content replication also needed for
- Disaster preparedness
- State and federal data improvement projects
- Aggregation by regional geospatial web service
providers
- WFS, e.g. efficiency in complete content
transfer?
- Need rsync-like function, informed by rights
management, inventory processes, metadata
management, data update cycles
- Archiving delta files vs. complete replication:
need to avoid requiring digital archaeology in
the future
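The comparison step behind an rsync-like function can be sketched at the file level: given checksums from two data update cycles, work out what to transfer. This is an assumption-laden illustration, not rsync's rolling-checksum algorithm; `manifest_delta` and the path-to-checksum dictionaries are hypothetical.

```python
def manifest_delta(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two path -> checksum manifests and report what differs."""
    old_paths, new_paths = set(old), set(new)
    return {
        "added": sorted(new_paths - old_paths),
        "changed": sorted(p for p in old_paths & new_paths if old[p] != new[p]),
        "removed": sorted(old_paths - new_paths),
    }


# Example: two update cycles of a hypothetical county dataset.
previous = {"parcels.shp": "aa", "roads.shp": "bb", "zoning.shp": "cc"}
current = {"parcels.shp": "aa", "roads.shp": "b2", "streams.shp": "dd"}
delta = manifest_delta(previous, current)
```

Transferring only the added and changed files saves bandwidth. Archiving only the delta, however, means a future user must replay every delta to rebuild a given state, which is the digital archaeology the slide warns against.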
20Points of Engagement with the Open Geospatial
Consortium (OGC)
- GML for archiving (PDF/A version of GML?)
- GeoDRM
- Adding preservation use cases
- Content Packaging
- Will there be an industry solution?
- Web Map Context Documents
- Can we save data state as well as application
state?
- Content Replication
- Is this a layer in the overall architecture?
- Persistent Identifiers
21Points of Engagement with Spatial Data
Infrastructure
- Framework data communities
- Snapshot frequency, naming schemes,
classification, GML application schemas, format
strategies
- Metadata standards and outreach
- Persistent identifiers, versioning, feedback on
metadata quality
- Content replication/transfer
- For data improvement projects, disaster
preparedness, aggregation by regional service
providers, and archives
- Where does archiving and preservation fit into
the NSDI, GOS, etc?
22Points of Engagement with Industry
- Software vendors
- Better support for temporal data management
- Tools for retrospective data conversion
- Web mashup and open source communities
- WMS caching schemes
- Standard tiling schemes with temporal component?
- Data vendors
- Cultivate market for older data (scaled pricing?)
- Tech transfer on archiving practices?
23Project Status
Cultivating a market for older data.
24Project Status
Cultivating tools for retrospective conversion.
25Expected Project Outcomes
- Demonstration archive
- Outreach activity planting seeds
- International, national, state, local, commercial
- Learning experience, informing
- Spatial data infrastructure
- Commercial vendors (data/software/consulting)
- Repository software communities
- Metadata practice (both GIS and preservation)
- Rights management developments
- Data and interoperability standards
26Project Status
- Storage system and backup deployed
- DSpace deployed
- FGDC Metadata workflow finalized
- Ingest workflow near finalization
- Content migration workflow plan near finalization
- Regional site visits planned for coming months
- Wide range of outreach/collaboration: FGDC, ESRI,
EDINA (JISC), USGS, OGC, TRB, etc.
- Pilot project: georegistering digital archival
geologic maps
27Questions?
Contact: Steve Morris, Head, Digital Library
Initiatives, NCSU Libraries
Steven_Morris_at_ncsu.edu
Web site: http://www.lib.ncsu.edu/ncgdap/