Title: Collection and Preservation of At-Risk Digital Geospatial Data: NDIIPP Project Update on the NC Geospatial Data Archiving Project (NCGDAP) Steven P. Morris North Carolina State University Libraries
1. Collection and Preservation of At-Risk Digital Geospatial Data: NDIIPP Project Update on the NC Geospatial Data Archiving Project (NCGDAP)
Steven P. Morris, North Carolina State University Libraries
DLF Fall Forum NDIIPP Roundtable
November 8, 2006
2. NC Geospatial Data Archiving Project
- Partnership between university library (NCSU) and state agency (NCCGIA)
- Focus on state and local geospatial data in North Carolina (state demonstration)
- Tied to NC OneMap initiative, which provides for seamless access to data, metadata, and inventories
- Objective: engage existing state/federal geospatial data infrastructures in preservation
- Project approaches: technical and social
- $520,000 over 3 years
Serve as catalyst for discussion within industry
3. Risks to State/Local Geospatial Data
- Producer focus on current data
  - Data overwrite as common practice
- Future support of data formats in question
  - No open, supported format for vector data
- Shift to web services-based access
  - Data becoming more ephemeral
- Inadequate or nonexistent metadata
  - Impedes discovery and use
- Increasing use of spatial databases for data management
  - The whole is greater than the sum of the parts
4. Different Ways to Approach Preservation
- Technical solutions: How do we archive acquired content over the long term?
- Build a data repository, not as an end in itself but as a catalyst for discussion within the data community
- Develop a repository ingest workflow: create technical points of engagement with the digital preservation community
5. Different Ways to Approach Preservation
- Cultural/organizational solutions: How do we make the data more preservable, and more prone to be archived, from point of production?
- Engage data producer community and spatial data infrastructure through outreach and engagement: influence practice
- Sell the problem to software vendors and standards development
- Find overlap with more compelling business problems: disaster preparedness, business continuity, road building, etc.
- Start a discussion about roles at the local, state, and federal level
6. NCGDAP Technical Approach
- Receive data as is: variety of distribution methods
- Migration of some at-risk formats
- Metadata remediation, normalization, and synchronization
- Distilling complex objects into repository ingest items (not easy)
- Using DSpace for demonstration purposes (keeping repository platform at arm's length)
- In development: use METS record as dormant item "brain" within the repository
Some unsustainable activities for learning experience
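The idea of a METS record that travels with each ingest item can be sketched roughly as follows. This is a minimal illustration, not the project's actual METS profile: it uses only Python's standard library, and the item identifier, file IDs, and file names are hypothetical. It records each file with an MD5 checksum in a fileSec and points to it from a single structMap division.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(tag):
    """Return a METS-namespaced tag name in ElementTree's {ns}tag form."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(item_id, files):
    """Build a bare-bones METS tree for one ingest item.

    files is a list of (file_id, relative_path, md5_hex) tuples.
    All identifiers here are illustrative assumptions.
    """
    root = ET.Element(q("mets"), {"OBJID": item_id})
    # fileSec must precede structMap in the METS schema's element order.
    file_sec = ET.SubElement(root, q("fileSec"))
    group = ET.SubElement(file_sec, q("fileGrp"), {"USE": "original"})
    struct_map = ET.SubElement(root, q("structMap"), {"TYPE": "physical"})
    div = ET.SubElement(struct_map, q("div"), {"LABEL": item_id})
    for file_id, path, md5_hex in files:
        file_el = ET.SubElement(
            group,
            q("file"),
            {"ID": file_id, "CHECKSUM": md5_hex, "CHECKSUMTYPE": "MD5"},
        )
        ET.SubElement(
            file_el,
            q("FLocat"),
            {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": path},
        )
        # Each file is referenced from the structural map by its ID.
        ET.SubElement(div, q("fptr"), {"FILEID": file_id})
    return root
```

Calling `ET.tostring(build_mets(...), encoding="unicode")` yields the XML text. A fuller record would also wrap descriptive metadata (for example FGDC) in a dmdSec, which is omitted here.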
7. Building Data Bundles: The Zip Codes Example
8. Where is the Dataset?
9. Here's One!
- Files
  - Multi-file dataset
  - Georeferencing
  - Metadata file
  - Symbolization file
  - Additional documentation
  - License
  - Disclaimer
  - More
- Metadata
  - FGDC
  - Acquisition metadata
  - Transfer metadata
  - Ingest metadata
  - Archive rights
  - Archive processes
  - Collection metadata
  - Series metadata
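One way to picture the bundle described above is as a manifest that gathers every file belonging to a dataset, tags each with its role, and records a checksum. The sketch below is an illustration only: the extension-to-role mapping follows common shapefile conventions and is an assumption, not the project's documented rules, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path

# Assumed mapping from file extension to the role a file plays in a bundle.
ROLE_BY_SUFFIX = {
    ".shp": "data",
    ".shx": "data",
    ".dbf": "data",
    ".prj": "georeferencing",
    ".xml": "metadata",
    ".lyr": "symbolization",
    ".txt": "documentation",
    ".pdf": "documentation",
}


def classify(filename):
    """Return the assumed bundle role for a file, or 'other' if unknown."""
    return ROLE_BY_SUFFIX.get(Path(filename).suffix.lower(), "other")


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory, dataset_name):
    """List the files in directory whose base name matches dataset_name.

    A file such as 'zipcodes.shp.xml' counts as part of 'zipcodes'.
    """
    manifest = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.name.split(".")[0] == dataset_name:
            manifest.append(
                {
                    "file": path.name,
                    "role": classify(path.name),
                    "md5": md5_of(path),
                }
            )
    return manifest
```

For example, `build_manifest("incoming/county_a", "zipcodes")` (a hypothetical path) would return one entry per component file. Licenses, disclaimers, and other documents that do not share the dataset's base name would need to be attached by some other rule.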
10. Hub-and-Spoke Metadata Workflow
11. Hub-and-Spoke Metadata Workflow
12. Metadata: Going Beyond a Passive Role
- Feedback to the NC OneMap Metadata Outreach Program vis-à-vis metadata quality problems encountered in repository ingest
- Engage standards body (Open Geospatial Consortium, OGC) in discussions about:
  - content packaging standards for geospatial data
  - better practices for time-versioned data
  - persistent identifier schemes
  - contributing archive use cases to GeoDRM
- Meetings with major software vendor development teams
13. Social Issues: Changing Industry Thinking
- Is the geospatial industry temporally impaired?
  - Lack of access to older data
  - Lack of tool/model support for temporal analysis
  - Metadata: poor support for changing data
  - Education: building class projects around available data (i.e., not temporal)
- Increased interest now in temporal applications?
  - Increased demand for temporal data?
  - Improved tool support: ArcGIS 9.2 animation tools, Geodatabase History, etc.
IMPORTANT: Gathering business cases for using older data
14. Social Issues: Content Exchange Networks
- Solving the present-day problems of data sharing is a prerequisite to solving the problem of long-term access
- Leveraging more compelling business problems: disaster preparedness and business continuity needs can put the data in motion (siphon off to the archive)
- Geospatial data: large data volumes, frequent data update, complex datasets, ambiguous rights
- Content exchange network technical challenges
  - Rights management
  - Large-scale transfers on network
  - Content packaging (MPEG-21 DIDL, XFDU, METS, ...)
15. Content Issues: Frequency of Capture Survey
- Survey objective
  - Document current practices for obtaining archival snapshots of county/municipal geospatial vector data layers
  - Seek guidance about frequency of capture
- Survey topics
  - General questions about data archiving practice
  - Specific questions about parcels, street centerlines, jurisdictional boundaries, and zoning
- Survey subjects
  - All 100 counties and 25 municipalities; 58% response rate
  - Survey conducted September 2006
Added benefit: survey socialized the preservation issue
16. Project Status
Content Issues: What About Commercial Data?
Cultivating a commercial market for older data. Part of permanent access is marketing, advertising, and putting older data into the path of the user.
17. New Challenges: Platial vs. Spatial Imagery
- Mobile, LBS, and social networking applications drive demand for place-based data
- Example sources
  - Oblique imagery
  - Street-view imagery (e.g., A9.com)
  - Transportation Dept. videologs
- Long-term cultural heritage value in non-overhead imagery: more descriptive of place and function
Emerging Tricorder applications
18. New Challenges: Ajax Applications, Google Earth and All That
- Emerging online environments are increasingly used to make decisions; how are these decisions documented?
- Web mashup/AJAX interactions with existing systems spur creation of intermediate content layers, e.g., tiling and caching of WMS services
- Formulation of a standard tiling scheme may create a new preservation opportunity (temporal axis on caches?)
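The "temporal axis on caches" question can be made concrete with a small sketch. The slides do not name a tiling scheme, so this example assumes the widely used Web Mercator XYZ convention, and the layer name and cache-key layout are hypothetical. The point is only that adding a capture date to the tile address turns a cache into a series of dated snapshots.

```python
import math
from datetime import date


def lonlat_to_tile(lon, lat, zoom):
    """Convert longitude/latitude (degrees) to XYZ tile indices.

    Assumes the common Web Mercator tiling convention, where zoom level z
    divides the world into 2**z by 2**z tiles with the origin at top left.
    """
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that points on the eastern or southern edge stay in range.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return x, y


def cache_key(layer, capture_date, zoom, x, y):
    """Build a cache path that includes the capture date as an extra axis.

    The path layout is an illustrative assumption, not a standard.
    """
    return f"{layer}/{capture_date.isoformat()}/{zoom}/{x}/{y}.png"


if __name__ == "__main__":
    x, y = lonlat_to_tile(-78.64, 35.78, 12)  # a point near Raleigh, NC
    print(cache_key("parcels", date(2006, 11, 8), 12, x, y))
```

With a key like this, the same tile location captured on different dates is stored side by side instead of being overwritten, which is the preservation opportunity the slide points to.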
19. Working with the NDIIPP Network
- Partners meetings provide opportunities to cross-fertilize with other efforts
- Cross-fertilization examples
  - Maturing thinking about metadata transformations (inspiration from the UIUC/OCLC hub and spoke model)
  - Stanford work with METS/PREMIS/FGDC informs NCGDAP metadata strategy
  - NDIIPP-wide discussions about mutual use of tools in ingest workflow (JHOVE, ClamAV, noid, MD5, etc.)
  - Discussions about repository exchange issues
- Affiliation with a national preservation effort helps get traction in attracting additional partners
20. Working with New Partners
- State Archives now an informal member of the NCGDAP project
- Collaboration with NARA
- Working with the Open Geospatial Consortium on standards issues
- Associate Partnership with JISC-funded UK-wide project
- Site visits with ESRI (major software vendor) development groups
- Participation in a variety of content exchange network activities
- More
21. Next Steps
- Working with NARA and the OGC Interoperability Institute to develop an OGC Data Preservation Working Group charter
- Evaluating results for the frequency of capture survey
- Stepping up data acquisition and repository ingest
- Evaluating initial data acquisition efforts (time factors, content variety, technical/legal barriers)
- Partnership with content exchange network activities
- Ramping up partnerships with broader (non-geospatial) data repository efforts
22. Questions?
Contact: Steve Morris, Head, Digital Library Initiatives, NCSU Libraries
ph: (919) 515-1361
Steven_Morris_at_ncsu.edu
http://www.lib.ncsu.edu/ncgdap