Title: Putting DDI 3.0 to Work for You
1Putting DDI 3.0 to Work for You!
- Sanda Ionescu, Documentation Specialist, ICPSR
- Mary Vardigan, DDI Alliance Director
- IASSIST Conference, Stanford University, May 27, 2008
2Today's Schedule
- 9:00-9:15 Brief DDI History and Intro
- 9:15-9:30 Life Cycle: Early Stages
- 9:30-10:45 Life Cycle Exercise
- 10:45-11:00 Break
- 11:00-11:50 Life Cycle: Archive and Beyond
- 11:50-12:00 Questions and Answers
3First Half of Morning
- We will be moving through the data life cycle of a real study and will document it as we go.
- We will use a tool to produce markup for seven life cycle stages.
- Sanda will guide us through the exercise and Mary will go step by step onscreen.
- End result is DDI documentation deposited into an archive.
4Second Half
- Once our sample data and documentation are deposited, we review the changes made by the archive.
- Then we discuss DDI 3.0 in the archival context and why it makes sense to use it.
- Finally, assuming we have convinced you, we discuss how to move to DDI 3.0!
5DDI History
- Effort began in 1995 when ICPSR convened a small international group at IASSIST in Quebec City.
- Standard began as SGML, then converted to Web-friendly XML.
- 2000: DDI Version 1.0 published as a DTD, mainly document- and codebook-centric.
6DDI History
- 2003: DDI Version 2.0 published with extended scope including aggregate data coverage and geography.
- Versions 1.0 through 2.1 (latest published) are backwards compatible and based on the same structure.
7DDI History
- February 2003: Formation of the DDI Alliance, a self-sustaining membership organization whose members have a voice in the development of the DDI specification.
- http://www.ddialliance.org/
8DDI History
- Version 3.0
- 2004-2006: Planning and Development
- November 2006: Internal Review
- February 2007: Public Review
- July 2007: Candidate Draft Release
- April 2008: Proof of Concept and Vote
- April 28, 2008: Official Publication of DDI 3.0
- http://www.ddialliance.org/ddi3/index.html
9DDI 3.0 Features
- Full implementation of XML Schemas
- Emphasis on metadata reuse
- Modular structure
- Use of schemes
10DDI 3.0 Features: Modular structure
- Allows increased flexibility in using the specification.
- Main modules:
Instance
Study Unit
Resource Package
Group
Conceptual Components
Data Collection
Logical Product
Physical Instance
Physical Data Product
Comparative
Archive
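As a rough illustration of the modular structure, the sketch below assembles a skeleton instance with Python's standard xml.etree.ElementTree. The element names are simplified stand-ins derived from the module names on this slide, not the actual DDI 3.0 schema element names or namespaces, and the nesting shown is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified stand-ins for module names from the slide (not schema-valid DDI 3.0).
STUDY_MODULES = [
    "ConceptualComponents",
    "DataCollection",
    "LogicalProduct",
    "PhysicalDataProduct",
    "PhysicalInstance",
    "Archive",
]


def build_skeleton(study_id: str) -> ET.Element:
    """Return an Instance wrapper holding one StudyUnit with empty module containers."""
    instance = ET.Element("Instance")
    study = ET.SubElement(instance, "StudyUnit", {"id": study_id})
    for name in STUDY_MODULES:
        ET.SubElement(study, name)
    return instance


# "study-9413" is a hypothetical identifier used only for this example.
skeleton = build_skeleton("study-9413")
print(ET.tostring(skeleton, encoding="unicode"))
```

Each container can then be filled in or left empty independently, which is the flexibility the modular design is meant to provide.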
11DDI 3.0 Features: Use of Schemes
- Facilitates reuse of information
- Categories
- Codes
- NCubes
- Physical Structures
- Record Layouts
- Organizations
- Concepts
- Universes
- Geographic Locations
- Geographic Structures
- Questions
- Interviewer Instructions
- Variables
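The reuse idea behind schemes can be sketched as "define once, reference by identifier". In the minimal Python sketch below, the class names, identifiers, question wording, and category labels are all hypothetical and only illustrate the pattern; they are not DDI 3.0 element names.

```python
from dataclasses import dataclass, field


@dataclass
class CategoryScheme:
    """A named collection of categories, defined once and referenced by ID."""
    scheme_id: str
    categories: dict = field(default_factory=dict)  # category ID -> label


@dataclass
class Question:
    question_id: str
    text: str
    category_scheme_ref: str  # points at a CategoryScheme instead of copying labels


# Hypothetical example content.
yes_no = CategoryScheme("cs_yes_no", {"c1": "Yes", "c2": "No"})
schemes = {yes_no.scheme_id: yes_no}

questions = [
    Question("q1", "Were you born in the United States?", "cs_yes_no"),
    Question("q2", "Are you currently employed?", "cs_yes_no"),
]


def labels_for(question: Question) -> list:
    """Resolve the reference to obtain the response labels for a question."""
    return list(schemes[question.category_scheme_ref].categories.values())


for q in questions:
    print(q.question_id, labels_for(q))
```

Because both questions point at the same scheme, a correction to a label is made in one place and picked up everywhere it is referenced.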
12DDI 3.0 Features
- Machine-actionable
- Grouping and comparison features
- Registries now possible
- Versioning clarified
- Multi-lingual support
13DDI 3.0 Features
- Compatibility with other metadata standards
- MARC, DC, but also
- SDMX (Statistical Data and Metadata Exchange)
- ISO 11179 (Metadata Registries)
- FGDC (Digital Geospatial Metadata)
- ISO 19115 (Geographic Information Metadata)
- PREMIS, METS forthcoming
- Life cycle orientation
14Life Cycle Orientation
- DDI 3.0 documents all stages in the life cycle of a data collection: pre-production, production, post-production, secondary use, new research effort.
15DDI 3.0 Use Cases
- Documenting an on-going, original research project.
- Documenting secondary use of data.
- Creating concept/question/variable banks.
- Generating multiple delivery formats for data dissemination/discovery.
- Metadata mining for comparison, etc.
16DDI 3.0 to Document an On-going Research Project
- DDI 3.0 can be used to document a research project in real time, from its inception (study proposal, design) through data collection, processing, and initial data production.
17Research Staff
[Diagram: Research staff (Principal Investigator, Collaborators) record each stage in DDI 3.0: Purpose, Concepts, Universe, Geography, People/Orgs; Funding, Revisions; Questions, Instrument; Data Collection, Data Processing; Variables, Physical Stores. Other elements shown: Submitted Proposal, Data, Publication, Archive/Repository.]
18DDI 3.0 to Document an On-going Research Project
- Advantages:
- Richer, contextual information made available and preserved.
- Increased accuracy, as life cycle stages are documented at the source.
- No loss of information as study progresses through its life cycle.
- Changes in documentation preserved through versioning.
- Ultimately gives data analysts more information to understand and assess data quality.
19DDI 3.0 to Document an On-going Research Project
- Use case exercise:
- Academic environment.
- Faculty member/researcher initiates an original, independent research project.
- Small-scale effort.
- No use of computer-assisted interviewing software.
- Resulting data and documentation to be deposited to a data center/archive.
- Archive provides incentives and support for documenting all activities in DDI as they happen.
20DDI 3.0 to Document an On-going Research Project
- Incentives for entering documentation at the source:
- Information is easy to enter: use of a data entry tool hides the complexities of XML code.
- Underlying DDI structure provides prompts and pre-organizes information.
- DDI may also serve as a management/diagnostic tool to assist in data processing and cleaning operations, or revising the documentation.
- Real-time entries and standardized content ensure high-quality documentation that facilitates primary data analysis and preparing reports.
21DDI 3.0 to Document an On-going Research Project
- Use case exercise:
- Based on a real study in the ICPSR archive (ICPSR study No. 9413, Survey of Three Generations of Mexican Americans, 1981-1982).
- Study documentation is laid out sequentially according to the life cycle.
- http://www.icpsr.umich.edu/DDI/ddi3/workshop
- Data entry tool provides a user-friendly interface and is projected to produce DDI 3.0 output; it follows the life cycle, but may also be used retrospectively.
22Life Cycle Stages: Study Proposal
WHO? (Principal Investigator)
WHO? (Co-authors)
WHEN? (November 1st, 1979)
Research Question(s), Hypotheses, Population, Geographic Area, Provisional Title
23Life Cycle Stages: Study Proposal Input
http://www.icpsr.umich.edu/DDI/ddi3/workshop/DDI3_TransformerTOOL/DDIv2dot2.html
24Life Cycle Stages: Study Proposal DDI 3.0 Output
DDI
Archive: Individual
Life Cycle Event: Responsibility, Date
Study Unit: Creator(s), Title, Purpose, Universe Ref., Spatial Coverage
Conceptual Component: Universe, Geographic Structure
Source information: WHO? (Principal Investigator), WHO? (Co-authors), WHEN?, (Provisional Title), Research Question(s), Hypotheses, Population, Geographic Area
http://www.icpsr.umich.edu/DDI/ddi3/workshop/files/iassist_stdyprop.pdf
25Life Cycle Stages: Study Funding
WHO? Funding Agency
WHEN? (June 1st, 1980)
Proposal
Grant 5-R01-AG-01573
26Life Cycle Stages: Study Funding Input
http://www.icpsr.umich.edu/DDI/ddi3/workshop/DDI3_TransformerTOOL/DDIv2dot2.html
27Life Cycle Stages: Study Funding DDI 3.0 Output
DDI
Archive: Organization
Study Unit: Funding Agency, Grant Number
Life Cycle Event: Responsibility, Date
Source information: WHO? Funding Agency, Proposal
http://www.icpsr.umich.edu/DDI/ddi3/workshop/files/iassist_stdyfunding.pdf
28Life Cycle Stages: Defining Concepts
WHO?
WHEN? (July 1st, 1980)
Question/Concept Bank
Research Questions
Study Concepts
29Life Cycle Stages: Defining Concepts Input
http://www.icpsr.umich.edu/DDI/ddi3/workshop/DDI3_TransformerTOOL/DDIv2dot2.html
30Life Cycle Stages: Defining Concepts DDI 3.0 Output
DDI
Life Cycle Event: Responsibility, Date
DDI Concept Scheme (Ref.)
Source information: Question/Concept Bank, Research Questions, Study Concepts
http://www.icpsr.umich.edu/DDI/ddi3/workshop/files/iassist_concepts.pdf
31Life Cycle Stages: Questionnaire Design
WHO?
WHEN? (July 25, 1980)
Question/Concept Bank
Study Concepts
Questions, Responses
32Life Cycle Stages: Questionnaire Design Input
http://www.icpsr.umich.edu/DDI/ddi3/workshop/DDI3_TransformerTOOL/DDIv2dot2.html
33Life Cycle Stages: Questionnaire Design DDI 3.0 Output
DDI
Life Cycle Event: Responsibility, Date
DDI Question Scheme (Ref.)
Logical Product: Category Scheme(s), Code Schemes
Source information: Question/Concept Bank, Study Concepts, Questions, Responses
http://www.icpsr.umich.edu/DDI/ddi3/workshop/files/iasssist_questions.pdf
34Life Cycle Stages: Questionnaire Translation
WHO?
WHEN? (September 1st, 1980)
Original Language Questions, Responses
Translated Questions, Responses
35Life Cycle Stages: Questionnaire Translation Input
http://www.icpsr.umich.edu/DDI/ddi3/workshop/DDI3_TransformerTOOL/DDIv2dot2.html
36Life Cycle Stages: Questionnaire Translation DDI 3.0 Output
DDI
Life Cycle Event: Responsibility, Date
DDI Question Scheme (Bilingual Version)
Logical Product: Category Scheme(s) (Bilingual Version)
Source information: Original Language Questions, Responses; Translated Questions, Responses
http://www.icpsr.umich.edu/DDI/ddi3/workshop/files/iassist_transl_qstns.pdf
37Life Cycle Stages: Data Collection
WHO?
WHO?
(1981-1982)
REPORT
SAMPLE
(October 15, 1980 - April 1st, 1981)
38Life Cycle Stages: Data Collection Input
http://www.icpsr.umich.edu/DDI/ddi3/workshop/DDI3_TransformerTOOL/DDIv2dot2.html
39Life Cycle Stages: Data Collection DDI 3.0 Output
DDI
Life Cycle Events: Responsibility, Dates
Data Collection: Responsibility, Date, Sampling, Mode of Collection, Note
http://www.icpsr.umich.edu/DDI/ddi3/workshop/files/iassist_datacoll.pdf
40Life Cycle Stages: Data Production
WHO?
WHEN? (1983)
QA
DATA
41Life Cycle Stages: Data Production Input
http://www.icpsr.umich.edu/DDI/ddi3/workshop/DDI3_TransformerTOOL/DDIv2dot2.html
42Life Cycle Stages: Data Production DDI 3.0 Output
DDI
Life Cycle Event: Responsibility, Date
Data Collection (Processing Operations)
Logical Product: Variable Scheme, Additional Code/Category Schemes, Missing Data
Physical Data Product: Record Structure, Variables Locations
Physical Instance: (Processing Checks), Number of Cases, Number of Records
Source information: QA, DATA
http://www.icpsr.umich.edu/DDI/ddi3/workshop/files/iassist_dataprod.pdf
44Life Cycle Stages: Data Cleaning and Processing
DDI as diagnostic/management tool
- The presence of standardized documentation facilitates data processing.
- DDI documentation can be used as a project dashboard to identify problems and keep track of operations.
- Queries can address:
- Data errors: missing values, out-of-range values (incorrect computation or recode logic), inconsistent or undocumented codes
- Missing documentation: question text, description
- Editing errors: missing labels, misspelled variable names
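The kinds of queries listed above can be sketched in a few lines of Python. The metadata structure, field names, and sample values below are hypothetical; a real project would read them from its DDI documentation and data file.

```python
# Hypothetical variable-level metadata, as it might be extracted from documentation.
variables = {
    "AGE": {"label": "Age of respondent", "question": "How old are you?",
            "valid_range": (18, 99), "codes": {}},
    "SEX": {"label": "", "question": "",
            "valid_range": None, "codes": {1: "Male", 2: "Female"}},
}

# Hypothetical data records.
records = [
    {"AGE": 34, "SEX": 1},
    {"AGE": 150, "SEX": 2},  # out-of-range age
    {"AGE": 52, "SEX": 9},   # code not present in the documentation
]


def check_documentation(variables):
    """Report variables with missing labels or question text."""
    problems = []
    for name, meta in variables.items():
        if not meta["label"]:
            problems.append((name, "missing label"))
        if not meta["question"]:
            problems.append((name, "missing question text"))
    return problems


def check_data(variables, records):
    """Report out-of-range values and codes absent from the documentation."""
    problems = []
    for row_number, record in enumerate(records, start=1):
        for name, value in record.items():
            meta = variables[name]
            if meta["valid_range"] is not None:
                low, high = meta["valid_range"]
                if not low <= value <= high:
                    problems.append((row_number, name, "out-of-range value"))
            if meta["codes"] and value not in meta["codes"]:
                problems.append((row_number, name, "undocumented code"))
    return problems


print(check_documentation(variables))
print(check_data(variables, records))
```

The same pattern extends to other checks, such as comparing variable names against an expected list to catch misspellings.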
45Life Cycle Stages: Deposit to Archive
- At the time of deposit, both the research process and the data are already documented in DDI.
- Advantages:
- The presence of standardized information facilitates archival processing, enabling procedure streamlining and automation.
- Richer, more accurate information made available for preservation, archival processing and dissemination enhances data discovery and secondary analysis.
46Life Cycle Stages: Deposit to Archive
- Richer, more accurate information. Examples:
- Original/working title preserved (may be found in early reports, published prior to any title changes).
- Authors' affiliation and position at the time of research.
- Responsible agencies and dates made available for all life cycle events.
- Parallel/associated research efforts and publications accurately documented.
47Life Cycle Stages: Deposit to Archive
- Richer, more accurate information. Examples:
- Presence of concepts represents an important added value for data discovery, appraisal, and further analysis.
- Documented source of concepts and questions (original or re-used) is relevant for secondary, and particularly comparative, analysis efforts.
- For bi- or multilingual studies, multiple language versions of descriptive elements are made available side-by-side, facilitating comparison, analysis and/or filtered retrieval of specific language(s).
- http://www.icpsr.umich.edu/cocoon/DDI3/workshop/9413_CR3_2_DataProd.xml?displayvarshighlight-tokenno
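Filtered retrieval by language can be sketched with the standard xml:lang attribute. The structure below is a simplified, hypothetical one (element names and question wording are made up for illustration), not actual DDI 3.0 markup.

```python
import xml.etree.ElementTree as ET

# ElementTree stores xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def add_question(scheme, question_id, texts):
    """Attach one question with a Text child per language code."""
    question = ET.SubElement(scheme, "Question", {"id": question_id})
    for lang, wording in texts.items():
        ET.SubElement(question, "Text", {XML_LANG: lang}).text = wording
    return question


def texts_in(scheme, lang):
    """Return {question id: wording} for a single language."""
    return {
        question.get("id"): text.text
        for question in scheme.iter("Question")
        for text in question.findall("Text")
        if text.get(XML_LANG) == lang
    }


# Hypothetical bilingual content.
scheme = ET.Element("QuestionScheme")
add_question(scheme, "q1", {
    "en": "In what country were you born?",
    "es": "¿En qué país nació usted?",
})

print(texts_in(scheme, "en"))
print(texts_in(scheme, "es"))
```

Keeping both language versions under the same question identifier is what makes side-by-side display and per-language filtering straightforward.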
48Life Cycle Stages: Deposit to Archive
- Use of DDI throughout the study life cycle prevents loss of information.
- Preservation of successive versions allows early-bound information retrieval.
- To meet specific goals and needs, the archive may create its own version(s) of the documentation, but will also preserve the originally deposited version.
- The DDI format enables easy, automated navigation among all existing versions.
49Life Cycle Stages: Archival Processing, Data and Documentation
- The archive becomes the maintaining agency and creates its own instance:
- The archive is described as organization, as owner/maintainer of collection, and specified as (new) publisher and/or distributor, with appropriate date(s).
- Original archive (depositor to present archive) referenced in the archive module.
- Reference may also be included to originally deposited DDI that is preserved and also made accessible.
50Life Cycle Stages: Archival Processing, Data and Documentation
- The archive edits or adds information and populates new DDI fields to support archival operations:
- Edits title to conform to archive's standards (ICPSR adds study date).
- Updates authors' affiliation according to current position, and adds/updates contact information (telephone, e-mail, current address, etc.).
- Adds subject headings and keywords to assist data discovery (searches at study level).
51Life Cycle Stages: Archival Processing, Data and Documentation
- The archive edits or adds information:
- Adds study abstract, integrating purpose with description of data collection and the final data product.
- Adds structured methodological information, enabling more granular, targeted searches (e.g., temporal coverage, analysis unit(s) covered, kind of data, data source).
- http://www.icpsr.umich.edu/cocoon/DDI3/workshop/9413_CR3_2_ARCHIVE.xml?highlight-tokenyes
52Life Cycle Stages: Archival Processing, Data and Documentation
- The archive documents any in-house, post-production processing as well as resulting changes in the data:
- New data file identification, to reflect archive location.
- Description of processing checks performed by archive.
- Description of added variables (archive-specific, indexes, recodes, etc.) if appropriate.
- Variable- and category-level statistics may be calculated and added to the DDI documentation to enhance variable descriptions.
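Category-level statistics of the kind mentioned above can be computed directly from the data and the code list. The sketch below uses hypothetical codes, labels, and values; how the results are written back into the documentation is left out.

```python
from collections import Counter

# Hypothetical code list and data values for one variable.
categories = {1: "Yes", 2: "No", 9: "No answer"}
values = [1, 2, 1, 1, 9, 2, 1]


def category_statistics(values, categories):
    """Return per-category frequency and percentage, in code order."""
    counts = Counter(values)
    total = len(values)  # assumes at least one value
    return [
        {
            "code": code,
            "label": label,
            "frequency": counts.get(code, 0),
            "percent": round(100 * counts.get(code, 0) / total, 1),
        }
        for code, label in sorted(categories.items())
    ]


statistics = category_statistics(values, categories)
for row in statistics:
    print(row)
```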
53Life Cycle Stages: Archival Processing, Data and Documentation
- The archive adds an itemized description of the entire distribution package associated with a study, including archive-specific information like availability, access conditions/restrictions, and collection completeness, as well as item-level identification, URI, format, medium, etc.
- http://www.icpsr.umich.edu/cocoon/DDI3/workshop/9413_CR3_2_ARCHIVE.xml?highlight-tokenyes
54Integrating DDI 3 into Archives
- What is in it for us?
- Standardized study descriptions provide for integration and consistency between collection catalog and documentation products.
- Standardized documentation supports automated generation of multiple delivery formats, including PDF and HTML.
55Integrating DDI 3 into Archives
- What is in it for us?
- DDI 3 enables the creation of an expanded scientific record covering the full life cycle, including instrument documentation.
- DDI 3 supports streamlining and increased automation of archival operations.
- DDI 3 instances can carry data inline.
- DDI 3 has improved functionality for complex/hierarchical files.
56Integrating DDI 3 into Archives
Improved functionality for complex/hierarchical files. Example:
https://www.icpsr.umich.edu/DDI/ddi3/workshop/
57Integrating DDI 3 into Archives
- What is in it for us?
- DDI 3 facilitates grouping and comparison from the highest level to the lowest:
- Mechanism to organize series information, showing only what changes over time.
- Variable harmonization and comparison.
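The idea of recording only what changes within a series can be sketched as group-level defaults plus per-wave overrides. The titles, field names, and values below are hypothetical, and the lookup is a simplification rather than the DDI 3.0 grouping mechanism itself.

```python
from collections import ChainMap

# Hypothetical series-level (group) metadata shared by every wave.
group_metadata = {
    "title": "Example Panel Survey",
    "universe": "Adults aged 18 and over",
    "mode": "Face-to-face interview",
}

# Each wave records only what differs from the group.
wave_overrides = {
    "wave1": {},
    "wave2": {"mode": "Telephone interview"},
}


def resolved_metadata(wave: str) -> dict:
    """Combine group defaults with the wave's own overrides (overrides win)."""
    return dict(ChainMap(wave_overrides[wave], group_metadata))


print(resolved_metadata("wave1"))
print(resolved_metadata("wave2"))
```

Looking at the overrides alone shows exactly what changed between waves, which is the information a comparison needs.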
58Integrating DDI 3 into Archives
- What is in it for us?
- Modular structure and use of schemes allow creation of meta-resources, offering additional functionality:
- Question/concept/variable banks
- Geography databases
- Organizations/Individuals registries
59Integrating DDI 3 into Archives
- What is in it for us?
- Concept/question/variable banks:
- Metadata reuse
- Cross-study variable/question/concept searches and analyses
- Cross-study comparisons
- Track questions/variables over time
- Register an organization's official measures
60Integrating DDI 3 into Archives
Concept/question/variable banks
63Integrating DDI 3 into Archives
- Geography databases/registries:
- Automatically match locations with appropriate geographic level
- Keep track of historical changes
- Information always accurate and up-to-date
- Facilitate data entry
64Integrating DDI 3 into Archives
- Organizations/Individuals registries:
- Keep track of historical changes (names, affiliations, contact information, etc.)
- Information always accurate and up-to-date
- Facilitate data entry
65Integrating DDI 3 into Archives
- What is in it for us?
- Preservation:
- Life cycle orientation of documentation means that a chain of custody is provided to meet preservation requirements.
- Archives can use the life cycle events to track data processing activities (data transformation).
- The structure of DDI 3.0 integrates well with FEDORA (Flexible Extensible Digital Object Repository Architecture), a digital repository management system used by many archives.
- Separate instances can be created to follow the OAIS model: SIP, AIP, DIP.
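A chain of custody built from life cycle events can be sketched as a list of dated, attributed records sorted chronologically. The dates below follow the workshop example; the class, field names, and responsibility values are placeholders chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LifecycleEvent:
    """One documented step: what happened, who was responsible, and when."""
    event_type: str
    responsibility: str
    event_date: date


# Dates follow the workshop example; responsibility values are placeholders.
events = [
    LifecycleEvent("Study funding", "funding agency", date(1980, 6, 1)),
    LifecycleEvent("Study proposal", "principal investigator", date(1979, 11, 1)),
    LifecycleEvent("Questionnaire design", "research staff", date(1980, 7, 25)),
    LifecycleEvent("Defining concepts", "research staff", date(1980, 7, 1)),
]


def chain_of_custody(events):
    """Order events chronologically so the documentation trail reads end to end."""
    return sorted(events, key=lambda event: event.event_date)


for event in chain_of_custody(events):
    print(event.event_date.isoformat(), event.event_type, "-", event.responsibility)
```

An archive could append its own processing events to the same list, extending the trail past the point of deposit.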
66Integrating DDI 3 into Archives
- Information sharing:
- Use of DDI 3 facilitates information sharing and collaborative projects among archives.
- Example: the SRO-ICPSR Data Documentation and Dissemination project implements a common, DDI 3.0 compliant database model to allow a smooth data transfer between the two organizations.
67Integrating DDI 3 into Archives: SRO-ICPSR collaboration project
[Diagram: ICPSR and SRO; sources: SAS/SPSS/Stata files, DDI 3.0, Blaise output, DDI 2.x, other; common relational database model for data documentation, compliant with DDI 3.0; client applications, web applications, ICPSR variable-level search.]
ICPSR projects will be able to use documentation generated by SRO projects.
68Archives: Moving the collection to DDI 3.0
- Catalog records:
- Archive standard -> map to DDI 3.0
- Dublin Core -> map to DDI 3.0
- DDI 2.x -> map to DDI 3.0
- Conversion by simple programming script or XSLT.
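A conversion by simple script might look like the sketch below, which renames catalog fields according to a mapping table. The target element names are placeholders chosen for illustration, not the published ICPSR or DDI Alliance mapping, and the "rights" value is made up; a real conversion would follow the mapping documents for the archive's own records.

```python
import xml.etree.ElementTree as ET

# Placeholder mapping: source catalog field -> target element name.
# These targets are illustrative, not an official Dublin Core to DDI 3.0 mapping.
FIELD_MAP = {
    "title": "Title",
    "creator": "Creator",
    "description": "Abstract",
    "coverage": "Coverage",
}


def convert_record(record: dict) -> ET.Element:
    """Build a target element tree from a flat catalog record; note unmapped fields."""
    study = ET.Element("StudyUnit")
    unmapped = []
    for field_name, value in record.items():
        target = FIELD_MAP.get(field_name)
        if target is None:
            unmapped.append(field_name)
            continue
        ET.SubElement(study, target).text = value
    if unmapped:
        study.append(ET.Comment("unmapped fields: " + ", ".join(sorted(unmapped))))
    return study


record = {
    "title": "Survey of Three Generations of Mexican Americans, 1981-1982",
    "coverage": "1981-1982",
    "rights": "See archive terms of use",  # hypothetical value with no mapping
}
converted = convert_record(record)
print(ET.tostring(converted, encoding="unicode"))
```

Flagging unmapped fields instead of silently dropping them makes it easier to spot gaps in the mapping before a whole collection is converted.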
69Archives: Moving the collection to DDI 3.0
- Catalog record conversions
- Examples:
- ICPSR -> DDI 2.1 -> DDI 3.0
- http://www.icpsr.umich.edu/DDI/ddi3/workshop/files/Template_DDI2_toDDI3_Mapping_S.pdf
- Dublin Core -> DDI 2.1 -> DDI 3.0
- http://www.icpsr.umich.edu/DDI/ddi3/workshop/files/Dublin_Core_DDI2_toDDI3_20Mapping.pdf
- ICPSR Stylesheet DDI 2.1 -> DDI 3.0
- http://www.icpsr.umich.edu/DDI/ddi3/workshop/
70Archives: Moving the collection to DDI 3.0
- Legacy studies
- Tools
- Stats to DDI 3.0
- DDI 3.0 editor
- XML editor
- DDI 2.x codebooks
- Tools
- DDI 2.x to DDI 3.0 converter
- (may be stylesheet, or simple script, based on
DDI 2.x to 3.0 mapping)
71Resources
- DDI 3.0 Proof of Concept: Use Cases and Implementations
- http://www.ddialliance.org/DDI/ddi3/use-cases.html
- DDI Tools
- http://tools.ddialliance.org/
- Workshop materials
- http://www.icpsr.umich.edu/DDI/ddi3/workshop
72Contact Information
- Sanda Ionescu: sandai_at_umich.edu
- Mary Vardigan: vardigan_at_umich.edu
- Matthew Richardson: matvey_at_umich.edu
- DDI users list
- http://www.ddialliance.org/codebook/listserv.html
73Questions?
74The End.