Title: Microdata Management Toolkit: Tools to facilitate the archiving and dissemination of surveys
Session E2 - Thursday, 26 May: Tools for Preservation, Integration and Assessment
Preserving and improving access to large and complex household surveys
- A PDF for Data?
- Metadata Editor / Nesstar Publisher 3.5
- CD Builder
- Guidelines for Archiving and Dissemination
Background
- Sponsored by World Bank / International Household Survey Network (presented earlier this week)
  - Created in September 2004
  - International organizations actively sponsoring household surveys
  - Marrakech Action Plan for Statistics
  - http://www.surveynetwork.org
- Surveys are often under-used: limited access for users leads to poor return on investment, limited impact on the ground, and difficulties in policy making
- Common obstacles: quality, technical capacity, legal/political issues
- Common problems:
  - Accessibility, timeliness, coherence
  - Lack of metadata / documentation / data
  - Poorly organized archives
- To address the technical issues, new tools and guidelines are needed → Microdata Management Toolkit
Toolkit Requirements
- User-friendly software suite and guidelines to archive and disseminate microdata
- Facilitate metadata exchange: compliant with common XML specifications (DDI, Dublin Core)
- Facilitate archiving: keep metadata and data together, address common quality control issues
- Facilitate dissemination: simple to redistribute on CD/DVD and the web; answer producer/depositor needs (subsetting, anonymization, quality control)
- Works with common data formats (SPSS, SAS, Stata, Statistica, CSPro/IMPS/ISSA)
- Multilingual support
- Free or inexpensive
- Availability of technical support and training
- Accompanied by guidelines and a training program
- Supported by national, international and research communities
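To make "metadata exchange compliant with Dublin Core" concrete, here is a minimal sketch of how a survey document could be described as a simple Dublin Core record with Python's standard xml.etree.ElementTree. The namespace URI and the 15 element names come from the Dublin Core Metadata Element Set 1.1; the wrapper element `metadata`, the function `dublin_core_record` and the sample values are hypothetical and are not part of the toolkit.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialize elements with the "dc:" prefix

# The 15 elements of the Dublin Core Metadata Element Set (DCMES 1.1).
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

def dublin_core_record(fields):
    """Build a simple Dublin Core record from {element: value or [values]}.

    The wrapper element name "metadata" is a hypothetical choice.
    """
    root = ET.Element("metadata")
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        # Repeatable elements (e.g. several languages) are passed as a list.
        values = value if isinstance(value, list) else [value]
        for v in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = v
    return ET.tostring(root, encoding="unicode")

# Hypothetical description of a survey questionnaire (a documentation resource).
print(dublin_core_record({
    "title": "Household Survey 2004 - Questionnaire",
    "type": "Text",
    "language": ["en", "fr"],
}))
```

Restricting keys to the DCMES element names catches typos early, which matters when records are exchanged between archives.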
Core file format - A PDF for Data?
- How can we carry the information around?
- Looking at documents → PDF
- Can we do the same for data?
- Yes, a Nesstar file holds both data and metadata!
- Partner with Nesstar Ltd to develop new tools
  - Why: strong tool for metadata management, available today, community acceptance, technical support, past experience
- Development agreement
  - Enhance the existing Publisher software and make it available as a stand-alone product
  - Open binary file format (not a black box) and availability of an API
  - Free data reader (like PDF) that allows users to access the data and metadata and convert them to their favorite format
  - Special licensing agreement for developing countries
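The "single file holds data and metadata" idea can be illustrated with a ZIP container that bundles a CSV data table and an XML metadata document. This is only an analogy written with Python's standard library: it is NOT the Nesstar file format (which is a binary format), and the function names and the member names `data.csv` and `metadata.xml` are hypothetical.

```python
import csv
import io
import zipfile

def write_bundle(target, header, rows, metadata_xml):
    """Store a data table (as CSV) and its XML metadata in one ZIP container.

    `target` may be a file path or a binary file-like object.
    """
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("data.csv", buf.getvalue())      # hypothetical member name
        zf.writestr("metadata.xml", metadata_xml)    # hypothetical member name

def read_bundle(source):
    """Return (header, rows, metadata_xml); data values come back as strings."""
    with zipfile.ZipFile(source) as zf:
        data_text = zf.read("data.csv").decode("utf-8")
        metadata_xml = zf.read("metadata.xml").decode("utf-8")
    reader = csv.reader(io.StringIO(data_text, newline=""))
    header = next(reader)
    return header, list(reader), metadata_xml

# Usage: round-trip a tiny hypothetical dataset through an in-memory bundle.
archive = io.BytesIO()
write_bundle(archive, ["hhid", "area"], [[1, "urban"], [2, "rural"]], "<codeBook/>")
print(read_bundle(archive))
```

Because data and documentation travel in one file, a user who receives the bundle cannot end up with data that has lost its codebook, which is the property the slide compares to PDF.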
Toolkit Components
- Archiving: Metadata Editor (World Bank / Nesstar Ltd.)
  - To compile survey data, documentation and metadata in a standard format (Nesstar/DDI). Free data reader for users.
  - Built on Nesstar Publisher
- Dissemination: CD Builder (World Bank / Mark Diggory)
  - To facilitate the publication of survey data, documentation and metadata on CD-ROM and on the web (transforms DDI into HTML-based navigation)
  - Based on the Eclipse Platform, open source
- Guidelines: Handbook (World Bank / ICPSR)
  - To provide data producers with information on policies and legal aspects of data dissemination, guidelines to document datasets, and recommendations for setting up a data archive
The Toolkit Process
1. Import data and compile metadata
2. Import metadata and prepare CD-ROM
3. Generate HTML-based CD-ROM
What is the Nesstar Publisher?
- Advanced data management program
- DDI/DC metadata authoring tool
- Import/export to common data formats
- Stand-alone or with Nesstar Server
- http://www.nesstar.com
- Easy editing/creation of DDI-documented datasets; no need to know XML
- Full DDI import and export for single-file, single-language studies
- Templates that let your organization standardize its use of the DDI
- Default texts in templates
- Local controlled vocabularies
- Documentation work can be shared among several people
- A Category Repository that lets you share categories within a dataset and between datasets
- Variable groups
- Easy setting of weights
- Frequency and summary statistics output, with options for each variable
- Import and export to the most common statistical formats
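The weight setting and the frequency/summary output listed above can be illustrated with a small sketch in plain Python. It assumes one weight per observation and uses the population form of the weighted variance; the slides do not describe how Nesstar Publisher computes its statistics, and the function names are hypothetical.

```python
import math
from collections import defaultdict

def weighted_frequencies(values, weights=None):
    """Return {category: (unweighted count, weighted percent)} for a categorical variable."""
    weights = weights or [1.0] * len(values)
    if len(weights) != len(values):
        raise ValueError("one weight per observation is required")
    counts = defaultdict(int)
    totals = defaultdict(float)
    for v, w in zip(values, weights):
        counts[v] += 1
        totals[v] += w
    grand = sum(totals.values())
    return {v: (counts[v], 100.0 * totals[v] / grand) for v in sorted(totals)}

def weighted_summary(values, weights=None):
    """Return weighted mean, weighted standard deviation (population form), min and max."""
    weights = weights or [1.0] * len(values)
    if len(weights) != len(values):
        raise ValueError("one weight per observation is required")
    wsum = sum(weights)
    mean = sum(w * x for x, w in zip(values, weights)) / wsum
    var = sum(w * (x - mean) ** 2 for x, w in zip(values, weights)) / wsum
    return {"mean": mean, "sd": math.sqrt(var), "min": min(values), "max": max(values)}

# Usage with hypothetical sampling weights.
area = ["urban", "rural", "urban", "urban"]
weight = [1.5, 3.0, 1.5, 2.0]
print(weighted_frequencies(area, weight))
print(weighted_summary([2.0, 4.0, 6.0], [1.0, 1.0, 2.0]))
```

Reporting the unweighted count next to the weighted percentage keeps small cells visible even when their weights are large.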
What is the Metadata Editor?
- Nesstar Publisher 3.0
  - A tool to prepare and publish surveys to a Nesstar Server
  - Sold as a component of the Nesstar Software Suite
  - Multiple components (editor, hierarchy, cube, resources)
- → New model for version 3.5
  - All components integrated under one interface
  - A study is stored in a single Nesstar file
  - Enhanced and new functionalities: quality control, computed variables, recodes, anonymization, subsetting
  - Availability of a free Nesstar Data Reader
  - Produces DDI / Dublin Core (DC) XML documents
  - Available as a stand-alone software package
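To give a feel for what "producing DDI XML" means, the sketch below builds a minimal DDI 2.x-style codebook with a study title and variable/category descriptions, using Python's xml.etree.ElementTree. The element names (`codeBook`, `stdyDscr`, `citation`, `titlStmt`, `titl`, `dataDscr`, `var`, `labl`, `catgry`, `catValu`) follow the DDI Codebook 2.x element set; the function name, the `V1`-style IDs and the sample variable are hypothetical, no namespace or schema declaration is emitted, and the output has not been validated against a DDI schema.

```python
import xml.etree.ElementTree as ET

def ddi_codebook(title, variables):
    """Build a minimal DDI 2.x-style codebook (no namespace, not schema-validated).

    `variables` is a list of (name, label, {code: category_label}) tuples.
    """
    cb = ET.Element("codeBook")
    citation = ET.SubElement(ET.SubElement(cb, "stdyDscr"), "citation")
    ET.SubElement(ET.SubElement(citation, "titlStmt"), "titl").text = title
    data_dscr = ET.SubElement(cb, "dataDscr")
    for i, (name, label, categories) in enumerate(variables, start=1):
        var = ET.SubElement(data_dscr, "var", ID=f"V{i}", name=name)  # hypothetical ID scheme
        ET.SubElement(var, "labl").text = label
        for code, cat_label in categories.items():
            cat = ET.SubElement(var, "catgry")
            ET.SubElement(cat, "catValu").text = str(code)
            ET.SubElement(cat, "labl").text = cat_label
    return ET.tostring(cb, encoding="unicode")

# Usage with one hypothetical categorical variable.
print(ddi_codebook("Household Survey 2004",
                   [("urban", "Area of residence", {1: "Urban", 2: "Rural"})]))
```

The point of an editor such as the one described above is that users fill in these elements through forms and templates instead of writing the XML by hand.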
Editor key features (1)
Template-driven metadata editor lets users decide which DDI/DC elements to use.
All surveys are stored as projects in a single tree hierarchy
Editor key features (2)
Easy-to-use interface for editing document, survey, file and variable metadata
Editor key features (3)
Data import preserves the existing dictionary and generates summary statistics
DDI and Dublin Core metadata import/export
Manage variable groups
Editor key features (4)
Description of a dataset's primary keys and hierarchy, and validation of dataset relationships
Support for survey documentation as Dublin Core resources
Automatic randomization of primary key variables
And more
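One way to picture "automatic randomization of primary key variables" is to replace the household identifier with random codes while applying the same mapping to every file that references it, so the hierarchy between household and person records survives. This is a sketch with Python's random module; the record layout, the key name `hhid` and the function name are hypothetical, and the slides do not say how the Editor performs this step.

```python
import random

def randomize_keys(households, persons, key="hhid", seed=None):
    """Replace a household key with random codes in both record lists.

    Every occurrence of the same original key gets the same new code, so
    person records still point to their household. Raises KeyError if a
    person references a household that is not in `households`.
    """
    rng = random.Random(seed)
    originals = sorted({h[key] for h in households})
    # A random permutation of 1..n: the new codes do not follow the original order.
    new_codes = rng.sample(range(1, len(originals) + 1), len(originals))
    mapping = dict(zip(originals, new_codes))
    new_households = [{**h, key: mapping[h[key]]} for h in households]
    new_persons = [{**p, key: mapping[p[key]]} for p in persons]
    return new_households, new_persons, mapping  # keep `mapping` confidential or discard it

# Usage with hypothetical household and person files.
hh = [{"hhid": 101, "area": "urban"}, {"hhid": 205, "area": "rural"}]
pp = [{"hhid": 101, "pid": 1}, {"hhid": 101, "pid": 2}, {"hhid": 205, "pid": 1}]
new_hh, new_pp, _ = randomize_keys(hh, pp, seed=7)
print(new_hh, new_pp)
```

The KeyError doubles as a basic relationship check: a person record that points to a missing household is reported instead of silently receiving a new code.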
Data Reader
- Free software
- PDF philosophy
- Access to survey metadata
- Access to data (no need for specialized software)
- Export to common formats
- Single file holds data and metadata
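The reader's "export to common formats" step can be sketched as a CSV export that optionally swaps numeric codes for their category labels, a choice users face whenever they leave a labelled statistical format. Python's csv module is used; the function name and the dictionary layout for value labels are hypothetical, and the slide does not list which formats the Data Reader actually offers.

```python
import csv
import io

def export_labelled_csv(header, rows, value_labels):
    """Return rows as CSV text, replacing codes with their value labels.

    `value_labels` maps variable name -> {code: label}; variables without
    labels, and codes without a label, are written unchanged.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        writer.writerow(
            value_labels.get(col, {}).get(val, val) for col, val in zip(header, row)
        )
    return out.getvalue()

# Usage: "area" is coded 1/2 with hypothetical labels; "hhid" has none.
print(export_labelled_csv(["hhid", "area"], [[1, 1], [2, 2]],
                          {"area": {1: "Urban", 2: "Rural"}}))
```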
What is the CD Builder?
- Purpose: publish survey metadata, documents and data on a CD-ROM (or website)
- Transforms DDI into an HTML-based interface
- User can customize the layout (branding) and content of the CD (single or multiple surveys)
- Open source application
- Built on the Eclipse Framework
- Based on DDI / Dublin Core
- Integrates with the Metadata Editor
- Easy to use
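The CD Builder's central step, turning DDI into HTML pages, can be approximated in a few lines: parse the codebook and emit one table row per variable. This sketch uses Python's xml.etree.ElementTree and html.escape and assumes un-namespaced DDI 2.x-style `var` and `labl` elements; it is not the CD Builder's code, which the slides describe only as an Eclipse-based application, and the function name is hypothetical.

```python
import html
import xml.etree.ElementTree as ET

def ddi_variables_to_html(ddi_xml):
    """Render the variables of a DDI 2.x-style codebook as an HTML table.

    Expects un-namespaced <var name="..."><labl>...</labl></var> elements.
    """
    root = ET.fromstring(ddi_xml)
    rows = []
    for var in root.iter("var"):
        name = html.escape(var.get("name", ""))
        labl = var.find("labl")
        # Escape text so labels containing &, < or > stay valid HTML.
        label = html.escape(labl.text or "") if labl is not None else ""
        rows.append(f"<tr><td>{name}</td><td>{label}</td></tr>")
    return ("<table>\n<tr><th>Variable</th><th>Label</th></tr>\n"
            + "\n".join(rows) + "\n</table>")

# Usage with a tiny hypothetical codebook.
print(ddi_variables_to_html(
    '<codeBook><dataDscr>'
    '<var name="urban"><labl>Area of residence</labl></var>'
    '<var name="hhsize"><labl>Household size</labl></var>'
    '</dataDscr></codeBook>'))
```

Generating static HTML, rather than relying on a server, is what allows the same output to be burned on a CD-ROM or copied to a website.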
CD Builder Process
1. Create a new CD-ROM project
2. Add a survey to the project and select its type and branding
   - Selecting a survey consists of opening the DDI-XML or Nesstar file
   - The survey branding determines the overall look and feel of the CD
   - The survey type determines the default metadata content
3. Click the Save button to generate the HTML interface
4. After a few minutes, your CD project is ready for publishing!
Key Features
- Content of CD pages is fully customizable
- A CD-ROM project can hold several surveys
- Branding customization
- Can be published to the web
- Multilingual support
- Automatic updates
- And more
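Branding customization and multi-survey projects can be illustrated with Python's string.Template: one page template receives an organization's branding values, and an index page links to one sub-folder per survey. The template, the branding keys (`cd_title`, `accent`, `logo`, `organization`), the function names and the folder layout are hypothetical; the slides do not say how the CD Builder stores or applies branding.

```python
import html
from string import Template

# Hypothetical page template; $-placeholders are filled from the branding dict.
PAGE = Template("""<html><head><title>$cd_title - $page_title</title>
<style>h1 { color: $accent; }</style></head>
<body><img src="$logo" alt="$organization logo">
<h1>$page_title</h1>
$content
</body></html>""")

def render_page(branding, page_title, content):
    """Apply one branding (dict with cd_title, accent, logo, organization) to a page."""
    return PAGE.substitute(branding, page_title=page_title, content=content)

def render_index(branding, surveys):
    """Build an index page linking to one sub-folder per survey in the project."""
    items = "".join(
        f'<li><a href="{html.escape(sid)}/index.html">{html.escape(title)}</a></li>'
        for sid, title in surveys
    )
    return render_page(branding, "Surveys", f"<ul>{items}</ul>")

# Usage with a hypothetical branding and two surveys.
branding = {"cd_title": "National Survey Archive", "accent": "#004b87",
            "logo": "logo.png", "organization": "NSO"}
print(render_index(branding, [("hh2004", "Household Survey 2004"),
                              ("lfs2004", "Labour Force Survey 2004")]))
```

Keeping branding in one dictionary means a change of logo or colour is a single edit followed by regenerating the pages.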
Sample output
Handbook
- Handbook on the Documentation, Dissemination, and Preservation of Microdata
  - Part I: Policy, legal and ethical issues and recommendations; benefits and costs of microdata dissemination
  - Part II: Technical guidelines for documenting, disseminating and preserving a dataset
  - Part III: Setting up a central data archive
Benefits and Users (1)
- What will the toolkit improve?
  - Documentation: based on standards, guidelines and validation
  - Preservation: data and metadata stay together; CD archiving
  - Cataloguing: facilitates metadata exchange
  - Dissemination: CD, DVD, web
  - Quality: validation procedures, use of a common language, adoption of best practices
Benefits and Users (2)
- Potential users?
  - Survey producers at national level: preservation, dissemination, harmonized framework
  - International survey sponsors
  - Data archives
- Who will benefit?
  - Data producers
  - National and international survey sponsors
  - Survey data repositories
  - Data analysts
  - Policy makers and the population
  - DDI community
Status and Availability
- Publisher 3.5
  - Beta version available
  - Nesstar commercial release during the summer
- CD Builder
  - Beta version available
  - Public release expected in September (open source)
- Guidelines
  - Draft completed
  - Review over the summer
Next?
- Distribution, training and adoption of the toolkit
- User acceptance tests and pilot sites
- Release of open source components (SourceForge, DDI)
- Future developments
  - Translations into other languages
  - Plug-ins for Publisher and/or Reader (open source)
  - Availability of an API library
  - Basic analytical functionalities (tabulation, graphs, etc.)
  - Evaluation of disclosure risks / anonymization procedures
  - Embed documents in the archive file (?)
  - Plan for DDI 3.0 support
  - Bug fixes / enhancements / new features (based on user feedback)
  - And more, based on feedback from users and the DDI open source community
- Integration of other tools
  - Argus: confidentiality
  - CSPro: production
  - Virtual Data Center (VDC): web-based dissemination
- Strong collaboration and participation of the community
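The planned "evaluation of disclosure risks" can be illustrated with one common, simple measure: flagging records whose combination of quasi-identifiers (for example age, sex and region) is shared by fewer than k records in the file, i.e. a k-anonymity check. This is a swapped-in illustration of a standard technique, not the toolkit's or Argus's method; the function name, the field names and the default k=3 are hypothetical.

```python
from collections import Counter

def risky_records(records, quasi_identifiers, k=3):
    """Return indices of records whose quasi-identifier combination occurs
    fewer than k times in the file (records that violate k-anonymity)."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    freq = Counter(keys)
    return [i for i, key in enumerate(keys) if freq[key] < k]

# Usage: three identical profiles and one unique (hence risky) profile.
people = [{"age": 30, "sex": "F", "region": "North"}] * 3 + \
         [{"age": 85, "sex": "M", "region": "South"}]
print(risky_records(people, ["age", "sex", "region"], k=3))
```

Flagged records would then be candidates for recoding (for example age bands) or suppression before a public-use file is released.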
Thank you!