Trials and Tribulations of creating DDI Codebooks at the University of Guelph

Provided by: amichell8
1
Trials and Tribulations of creating DDI Codebooks
at the University of Guelph
  • A. Michelle Edwards and Carol Perry,
  • Data Resource Centre,
  • University of Guelph
  • Guelph, Ontario

2
Current Search Function
3
Search Results
4
Current Documentation
5
Identifying Variables
6
Rationale for Change
  • 522 datasets to date.
  • No comprehensive metadata search function.
  • No current variable search within dataset.
  • Limits researchers' autonomy.

7
XML tags
  • Started with approximately 30 tags
  • As of June 5, 2002
  • 101 tags
  • 59 are filled
  • Information contained inside tags

8
Codebook Templates
  • Used Maddie to develop initial template.
  • Edited the template to add tags as required.
  • Filled in fields common to all codebooks.

9
Codebook Templates
  • Statistics Canada data
  • ICPSR data
  • B2020 data format

10
Statistics Canada Codebook
11
Differences between Codebook Templates
  • Authoring entity
  • Distributor (DLI vs. ICPSR)
  • Licenses
  • Other material: ICPSR abstract link
  • B2020
  • No direct link to database
  • No variables

12
How do we move our information from an HTML
readme file to an XML file?
13
Readme to XML
  • Document Description
  • Study Description
  • Data Files Description

14
Readme to XML
  • Currently we copy and paste information from the
    Readme (HTML) file into the XML codebook.
  • Alternatively, a script extracts metadata from the
    HTML and places it into the XML.
  • Both approaches take about the same amount of time.
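
The scripted route on this slide could look roughly like the sketch below. It is an illustration only, not the script used at Guelph: the readme layout (a `<title>` plus `<p>` paragraphs), the sample paragraph text, and the names `ReadmeParser` and `readme_to_codebook` are assumptions. The three top-level sections mirror the Document Description, Study Description, and Data Files Description of the previous slide.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class ReadmeParser(HTMLParser):
    """Collects the text of <title> and of every <p> in a readme page."""

    def __init__(self):
        super().__init__()
        self.current = None      # tag whose text is currently being collected
        self.title = ""
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "p"):
            self.current = tag
            if tag == "p":
                self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == self.current:
            self.current = None

    def handle_data(self, data):
        if self.current == "title":
            self.title += data
        elif self.current == "p":
            self.paragraphs[-1] += data


def readme_to_codebook(html_text):
    """Build a skeletal DDI codebook from a readme (hypothetical layout)."""
    parser = ReadmeParser()
    parser.feed(html_text)

    code_book = ET.Element("codeBook")
    ET.SubElement(code_book, "docDscr")           # Document Description
    stdy = ET.SubElement(code_book, "stdyDscr")   # Study Description
    ET.SubElement(code_book, "fileDscr")          # Data Files Description

    titl_stmt = ET.SubElement(ET.SubElement(stdy, "citation"), "titlStmt")
    ET.SubElement(titl_stmt, "titl").text = parser.title.strip()
    if parser.paragraphs:
        # Assumption: the first paragraph of the readme serves as the abstract.
        stdy_info = ET.SubElement(stdy, "stdyInfo")
        ET.SubElement(stdy_info, "abstract").text = parser.paragraphs[0].strip()
    return code_book


# Made-up readme; only the study title comes from the presentation.
readme = ("<html><head><title>Sun Exposure Survey 1996</title></head>"
          "<body><p>Readme text describing the study.</p></body></html>")
print(ET.tostring(readme_to_codebook(readme), encoding="unicode"))
```

Fields common to all codebooks would still come from the template; a script like this only fills in what differs per study.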

15
Variable Information
16
Variable Information
  • Sources of Variable information
  • Variable names, labels, and position from the SAS
    program.
  • Frequencies for each variable value from SAS
    output.

17
Variable Information
  • Sources of Variable information
  • Literal questions from questionnaires if
    available.

18
Variable Information
  • Script
  • Looks into the SAS program and pulls out the
    variable names, labels, and positions.
  • Looks into a SAS output file for frequencies and
    variable value labels.
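
As a rough sketch of the first step, the snippet below pulls names, column positions, and labels out of a SAS program with regular expressions. The sample program, its variables, and the function names are made up for illustration, and the parser assumes plain column input (`NAME start-end`) and single-quoted labels, which is only one of the layouts a real SAS program can use.

```python
import re

# Hypothetical SAS program; variable names and labels are invented.
SAS_PROGRAM = """
data survey;
  infile 'survey.dat';
  input ID 1-4 AGE 5-6 SEX 7;
  label ID  = 'Respondent identifier'
        AGE = 'Age in years'
        SEX = 'Sex of respondent';
run;
"""


def statement_body(program, keyword):
    """Return the text between `keyword` and the next semicolon, or ''."""
    pattern = r"\b" + re.escape(keyword) + r"\b(.*?);"
    match = re.search(pattern, program, re.IGNORECASE | re.DOTALL)
    return match.group(1) if match else ""


def parse_variables(program):
    """Extract name, column position, and label for each variable."""
    variables = {}
    # Column input only: NAME [$] start[-end]
    column_spec = r"(\w+)\s+\$?\s*(\d+)(?:\s*-\s*(\d+))?"
    for name, start, end in re.findall(column_spec, statement_body(program, "input")):
        variables[name] = {"start": int(start), "end": int(end or start), "label": ""}
    # Labels: NAME = 'text'
    label_spec = r"(\w+)\s*=\s*'([^']*)'"
    for name, label in re.findall(label_spec, statement_body(program, "label")):
        if name in variables:
            variables[name]["label"] = label
    return variables


for name, info in parse_variables(SAS_PROGRAM).items():
    print(name, info["start"], info["end"], info["label"])
```

A parser this simple breaks as soon as a program uses list input, formatted input, or a different quoting style, which is why the script depends on SAS programs being written consistently.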

19
Variable Information
  • Script
  • If a questionnaire is available, seeks out the
    questions and matches them with variables.
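
One way the matching step might work is sketched below. It assumes, purely for illustration, that each question in the questionnaire text starts with the variable name followed by a period, parenthesis, or colon (for example `Q1.`); the sample questions and the function name `match_questions` are invented.

```python
import re


def match_questions(questionnaire_text, variable_names):
    """Map variable name -> literal question text for lines that start with the name."""
    wanted = {name.upper() for name in variable_names}
    matches = {}
    for line in questionnaire_text.splitlines():
        found = re.match(r"\s*(\w+)[.):]\s+(.*\S)", line)
        if found and found.group(1).upper() in wanted:
            matches[found.group(1).upper()] = found.group(2)
    return matches


# Made-up questionnaire text.
questionnaire = """Q1. How many hours did you spend outdoors yesterday?
Q2. Did you use sunscreen?
Thank you for taking part."""

variables = ["q1", "q2", "q3"]
result = match_questions(questionnaire, variables)
unmatched = [name.upper() for name in variables if name.upper() not in result]
print(result)
print("No question found for:", unmatched)
```

Reporting the unmatched variables makes it easier to see where the questionnaire and the data file disagree.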

20
Variable Information
  • Problems with Script
  • SAS programs must be consistent in their format.
  • Matching SAS output and questionnaires to
    variables.

21
SAS to XML
  • SAS 8.2 - XML engine and ODS XML.
  • Can create XML SAS output.
  • Variable names, labels, value labels, and
    frequencies.
  • Variable positions with the input statement and
    Proc Print → XML.

22
SAS to XML Frequency Output
23
SAS to XML Proc Print output
24
SAS to XML
25
SAS to XML
  • Advantages
  • SAS programs do not need to be consistent.
  • Use one program from start to finish: SAS.
  • Still in development.

26
XML to Viewable Document
  • Saxon to render our XML documents to HTML using
    XSL Stylesheets.
  • XSL pulls out information from the XML document
    and displays it with HTML tags.
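
The idea of pulling information out of the XML and wrapping it in HTML tags can be illustrated without an XSLT processor. The sketch below is a stand-in, not the Saxon/XSL pipeline described on this slide: it uses Python's `xml.etree.ElementTree` to do by hand what a stylesheet template would do. The sample variables and the function name `render_variables` are assumptions.

```python
import xml.etree.ElementTree as ET
from html import escape

# Small DDI-style codebook; the variables are invented for illustration.
CODEBOOK = """<codeBook>
  <stdyDscr><citation><titlStmt>
    <titl>Sun Exposure Survey 1996</titl>
  </titlStmt></citation></stdyDscr>
  <dataDscr>
    <var name="AGE"><labl>Age in years</labl></var>
    <var name="SEX"><labl>Sex of respondent</labl></var>
  </dataDscr>
</codeBook>"""


def render_variables(codebook_xml):
    """Return an HTML page listing each variable name and label."""
    root = ET.fromstring(codebook_xml)
    title = root.findtext("stdyDscr/citation/titlStmt/titl", default="").strip()
    rows = []
    for var in root.iter("var"):
        name = escape(var.get("name", ""))
        label = escape(var.findtext("labl", default=""))
        rows.append("<tr><td>%s</td><td>%s</td></tr>" % (name, label))
    return "<html><body><h1>%s</h1><table>%s</table></body></html>" % (
        escape(title), "".join(rows))


html_page = render_variables(CODEBOOK)
print(html_page)
```

An XSL stylesheet expresses the same selection and wrapping declaratively, so one stylesheet per component (abstract, variables, and so on) can be reused across codebooks.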

27
XSL Templates
  • One set for each:
  • Statistics Canada
  • ICPSR
  • B2020
  • Initial templates from University of Virginia
    samples.

28
XSL Templates
  • Abstract
  • Study Info
  • Methodology and File Dimensions
  • Questions
  • Variables and Frequencies
  • Other Documents

29
XSL Stylesheets
30
Search
  • Uses SAS IntrNet to call and run the UNIX SGREP
    search.
  • Creates an XML file with results.
  • Calls Saxon to render the file with the Variable
    XSL Stylesheet.
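
The shape of this search step, taking a term, finding matching variables, and writing the hits to an XML file for rendering, can be sketched as follows. This is a simplified stand-in that uses a plain substring match in Python rather than SAS IntrNet and SGREP; the `results` wrapper element, the sample variables, and the function name `search_variables` are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

# Invented variables in a DDI-style data description.
CODEBOOK = """<codeBook><dataDscr>
  <var name="AGE"><labl>Age in years</labl></var>
  <var name="SUNSCRN"><labl>Used sunscreen</labl></var>
  <var name="HAT"><labl>Wore a hat in the sun</labl></var>
</dataDscr></codeBook>"""


def search_variables(codebook_xml, term):
    """Return a <results> element holding copies of the matching <var> elements."""
    root = ET.fromstring(codebook_xml)
    results = ET.Element("results", {"term": term})  # hypothetical wrapper element
    needle = term.lower()
    for var in root.iter("var"):
        name = var.get("name", "")
        label = var.findtext("labl", default="")
        if needle in name.lower() or needle in label.lower():
            results.append(copy.deepcopy(var))
    return results


hits = search_variables(CODEBOOK, "sun")
print(ET.tostring(hits, encoding="unicode"))
```

Because the result is itself XML, the same variable stylesheet that renders a full codebook can render the search hits.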

31
Final Product
  • Frames to put it all together.
  • Links to each component (abstract, etc.).
  • Returns the rendered HTML on the fly.

32
Final Product
33
Final Product
  • Sun Exposure Survey 1996
  • http://tdr.uoguelph.ca/DATA/WWWDOCS/XML/SES2/ses96cbk.html

34
Finished Product
  • 522 datasets to date.
  • 35 Completed DDI-compliant codebooks.
  • Fall completion?

35
Final Product
36
Final Product
37
Final Product
38
Final Product
39
Final Product
40
Final Product
41
Final Product