Title: The Data Documentation Initiative (DDI): Metadata Standards to Support Access, Sharing and Preservation of Data
1. The Data Documentation Initiative (DDI): Metadata Standards to Support Access, Sharing and Preservation of Data
- Wendy Thomas
- Minnesota Population Center
- TCRG Presentation
- 8 April 2009
2. Metadata provides support for
- Survey and data collection preparation
- Data collection
- Data processing
- Analysis
- Data discovery and access
- Replication
- Repurposing (secondary data use)
3. Metadata
- Metadata is essential information for research and reuse of data
- The further data gets from its source, the greater the importance of the metadata
- Content is critical
- Structure is becoming increasingly important in a networked world
4. Why Standards?
- Standards provide structure for
  - Accurate transfer of content between systems
  - Increased automation of ingest, reducing costs
  - Interoperability between systems and software
  - Structural base for discovery and comparison
5. Example: Dublin Core
- Print card catalogs
  - Static
  - Stationary
- Standalone databases
  - Proprietary structure
  - Little cross-site searching
- WorldCat and Google
  - Standardized content
  - Cross-site searching
6. Interacting Standards for Data
- Dublin Core
- ISO/IEC 11179
- ISO 19118 Geography
- Statistical Packages
- METS
- PREMIS
- SDMX
- DDI
Dublin Core:
- Citation structure
- Coverage
  - Temporal
  - Topical
  - Spatial
- Location specific information
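A minimal sketch of how the Dublin Core items above (citation structure plus temporal, topical and spatial coverage) might be carried in a record. It uses Python's standard `xml.etree.ElementTree` and the Dublin Core element set namespace. The `record` wrapper element, the helper `build_dc_record`, and every field value are illustrative assumptions, not material from the presentation.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace


def build_dc_record(fields):
    """Build a flat record from (element name, value) pairs (hypothetical helper)."""
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("record")  # assumed wrapper element, not defined by Dublin Core
    for name, value in fields:
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


# Illustrative values only; not taken from a real catalog record.
example = build_dc_record([
    ("title", "Example Household Survey"),   # citation
    ("creator", "Example Research Center"),  # citation
    ("date", "2009"),                        # citation
    ("subject", "Demography"),               # topical coverage
    ("coverage", "Minnesota"),               # spatial coverage
    ("coverage", "2000-2005"),               # temporal coverage
])

print(ET.tostring(example, encoding="unicode"))
```

Because every element comes from the same small, shared vocabulary, a harvester can search such records across sites without knowing anything about the system that produced them.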
7. Interacting Standards for Data
ISO/IEC 11179:
- Structure and content of a data element as the building block of information
- Supports registry functions
- Provides
  - Object
  - Property
  - Representation
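The object / property / representation breakdown can be pictured as a small data structure. The sketch below is only an illustration of that idea in Python: the class names `DataElement` and `Registry`, the `concept` key, and the sample values are assumptions made for the example and are not the standard's own model or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    object_class: str    # the thing being described, e.g. "Person"
    prop: str            # the characteristic of interest, e.g. "Age"
    representation: str  # how values are expressed, e.g. "Years (integer)"

    @property
    def concept(self):
        # Object plus property, independent of any representation.
        return f"{self.object_class}.{self.prop}"


class Registry:
    """Hypothetical in-memory registry keyed by object and property."""

    def __init__(self):
        self._by_concept = {}

    def register(self, element):
        self._by_concept.setdefault(element.concept, []).append(element)

    def representations(self, concept):
        return [e.representation for e in self._by_concept.get(concept, [])]


registry = Registry()
registry.register(DataElement("Person", "Age", "Years (integer)"))
registry.register(DataElement("Person", "Age", "Five-year age group (code)"))
print(registry.representations("Person.Age"))
```

Keeping the representation separate from the object and property is what lets a registry report that two differently coded variables measure the same thing.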
8. Interacting Standards for Data
ISO 19118 Geography:
- US FGDC and MN standard
- Focus is on describing spatial objects and their attributes
9. Interacting Standards for Data
Statistical Packages:
- Proprietary standards
- Content is generally limited to
  - Variable name
  - Variable label
  - Data type and structure
  - Category labels
- Translation tools used to transport content
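To make the scope of statistical-package metadata concrete, here is a rough Python sketch of the four items listed above held in one structure. The `Variable` class, its field names, and the sample variable are hypothetical; real packages each store this information in their own proprietary formats.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class Variable:
    name: str        # variable name
    label: str       # variable label
    data_type: str   # data type and structure
    categories: dict = field(default_factory=dict)  # code -> category label

    def decode(self, value):
        # Return the category label for a code, or the raw value if unlabeled.
        return self.categories.get(value, value)


# Illustrative variable; not drawn from any real data file.
sex = Variable("SEX", "Sex of respondent", "numeric", {1: "Male", 2: "Female"})

print(sex.decode(2))
print(asdict(sex))  # a neutral form that a translation tool could pass along
```

Note what is absent: nothing here records the question asked, the universe, or how the variable was derived, which is the gap the fuller standards address.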
10. Interacting Standards for Data
METS:
- Digital Library Federation
- Consistent outer wrapper for digital objects of all types
- Contains a profile providing the structural information for the contained object
11. Interacting Standards for Data
PREMIS:
- Preservation information for digital objects
12. Interacting Standards for Data
SDMX:
- Developed for statistical tables
- Supports well structured, well defined data, particularly time-series data
- Contains both metadata and data
- Supports transfer of data between systems
13. Interacting Standards for Data
DDI:
- Version 3.0 covers life-cycle of data and metadata
  - Data collection
  - Processing
  - Management
  - Reuse or repurposing
- Support for registries
- Grouping and comparison
14. Metadata Coverage
- Packaging
- Citation
- Geographic Coverage
- Temporal Coverage
- Topical Coverage
- Structure information
- Physical storage description
- Variable (name, label, categories, format)
- Source information
- Methodology
- Detailed description of data
- Processing
- Relationships
- Life-cycle events
- Management information
- Dublin Core
- ISO/IEC 11179
- ISO 19118
- Statistical Packages
- METS
- PREMIS
- SDMX
- DDI
22. DDI: Full content coverage for survey and administrative data
- Conceptual coverage
- Methodology
- Data Collection
- Processing (cleaning, paradata)
- Recoding and derivations
- Variable and tabular content
- Internal relationships
- Physical storage
- Data management
23. Plus: Relationships between studies
- Comparison by design
  - Study series can inherit from earlier metadata
  - Capture changes only
- Data integration
  - Mapping of codes between source and target
  - Capture comparison information
- Comparison of abstract content models
- Publication of reusable materials (code schemes, concept schemes, geographic structure, etc.)
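Two of the ideas on this slide, inheriting metadata while capturing only the changes, and mapping codes between a source and a target, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not DDI syntax: the study fields, the code values, and the `recode` helper are all hypothetical.

```python
from collections import ChainMap

# Hypothetical metadata for the first wave of a study series.
wave1 = {
    "title": "Example Panel Study, Wave 1",
    "universe": "Adults aged 18 and over",
    "mode": "Face-to-face interview",
}

# Wave 2 records only what changed; everything else is inherited from wave 1.
wave2_changes = {
    "title": "Example Panel Study, Wave 2",
    "mode": "Telephone interview",
}
wave2 = ChainMap(wave2_changes, wave1)  # lookups fall through to wave1

# Hypothetical code map from a four-category source scheme
# to a two-category target scheme.
marital_map = {1: "M", 2: "M", 3: "S", 4: "S"}


def recode(values, code_map):
    """Map source codes to target codes; unmapped codes become None."""
    return [code_map.get(v) for v in values]


print(wave2["universe"])           # inherited from wave 1
print(wave2["mode"])               # overridden in wave 2
print(recode([1, 3, 9], marital_map))
```

Storing the map itself as metadata, rather than only the recoded data, is what preserves the comparison information for later users.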
24. Why bother?
- Researcher perspective
  - Improved data mining between and across systems
  - Increased explicit and implicit comparison
  - Interoperability
  - Improved access to detailed metadata
  - Ability to reuse rather than repeat metadata content
25. Why bother?
- Data Collector
  - Support for internal consistency
  - Early capture of a broad range of metadata
  - Interoperability
  - Reuse of metadata through inheritance
  - Retention of explicit relationships between data collection and the resulting data files
26. Why bother?
- Knowledge-based organization
  - Interoperability
  - Supports consistent use of concepts, questions, variables, etc. throughout the organization
  - Supports implicit comparison through reuse of content
  - Supports explicit comparison by mapping content between studies and to standard content
  - Captures metadata/knowledge at point of creation
27. Why bother?
- Data Manager
  - Interoperability
  - Flexibility in data storage
  - Reuse of content
  - Strong data typing
28. DDI does not replace good content
- DDI structures metadata to leverage content
  - Collection and processing
  - Discovery and access
  - Analysis and repurposing
  - Registries
  - Comparison
- DDI is not a software application
  - Supports and informs software applications
- DDI is a neutral archival structure
  - Preserving content and relationships
29. INDEPTH/DSS Example
- 38 Demographic Surveillance Sites in 19 countries spanning Africa, South Asia, Central America and Oceania
- Diverse yet similar health research portfolios
- Data management goals
  - Standardize and harmonize data collection tools
  - Cross-site comparability of information
  - Sharing data effectively and efficiently
30. Reasons for choosing DDI
- It will be ideal to describe our data for the purposes of the Data Repository
- It has really powerful features that will enable us to standardise several facets of our work.
- I originally underestimated the usefulness DDI will have as a means to harmonised data collection between sites.
- Ability to expand comparison and harmonization with additional groups such as AIDS research team
31. Future DDI Developments
- Controlled vocabularies to improve machine actionability
- Data collection methodology and process expansion for more depth and detail
- Qualitative data
- Increased comparison coverage
- Tools
32. Contacts
- DDI Alliance
  - http://www.ddialliance.org
- Link to DDI Technical Specification
  - http://www.ddialliance.org/ddi3/index.html
- DDI Users Group Sign-up
  - http://www.ddialliance.org/DDI/codebook/listserv.html
- Wendy Thomas, Chair, DDI Technical Implementation Committee
  - wlt_at_pop.umn.edu