Title: Kirsten Barber, Computer Resource Specialist, EC/CSE, SDSU
1. May 11, 2001, NPACI DICE Group Meeting, San Diego Supercomputer Center
Kirsten Barber, Computer Resource Specialist, EC/CSE, SDSU
Lindsay Stocks, NPACI REU Student Programmer, SDSU
2.
- An online statistical tool for analysis of sociological survey data
- For use with custom datasets
- Designed for categorical data
- Functions
  - Frequency table
  - Cross tabulation table
  - Rules analysis table
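As a rough illustration of the first two functions listed above, the following Java sketch counts categorical responses into a frequency table and a cross tabulation. The class and method names are hypothetical and are not taken from SWB, and the responses are held in memory here rather than in a database.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of SWB: tabulates categorical responses held in memory.
final class Tabulate {

    // Frequency table: category value -> number of respondents who gave it.
    static Map<String, Integer> frequency(List<String> responses) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String r : responses) {
            counts.merge(r, 1, Integer::sum);
        }
        return counts;
    }

    // Cross tabulation: row category -> (column category -> count).
    // rows.get(i) and cols.get(i) are the answers of respondent i to two questions.
    static Map<String, Map<String, Integer>> crossTab(List<String> rows, List<String> cols) {
        if (rows.size() != cols.size()) {
            throw new IllegalArgumentException("both questions need one answer per respondent");
        }
        Map<String, Map<String, Integer>> table = new TreeMap<>();
        for (int i = 0; i < rows.size(); i++) {
            table.computeIfAbsent(rows.get(i), k -> new TreeMap<>())
                 .merge(cols.get(i), 1, Integer::sum);
        }
        return table;
    }

    public static void main(String[] args) {
        List<String> q1 = List.of("Agree", "Neutral", "Agree", "Disagree");
        List<String> q2 = List.of("Yes", "No", "No", "Yes");
        System.out.println(frequency(q1));
        System.out.println(crossTab(q1, q2));
    }
}
```

Sorted maps are used only so that the printed tables come out in a stable order; any map type would do for the counting itself.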
3. Differences Between SWB V.1 and SWB V.2
SWB Version 1
SWB Version 2
4. SWB Architecture: XML Related
[Architecture diagram. Components: Client, SWB Interface, XML Parser, XML Processor, Oracle]
Labeled flows in the diagram:
- Function Query
- XML Survey Document
- XML Output
- XML Output with Table Format Stylesheet
- Table Querying SQL statements (via JDBC)
- Table output in XML format
- XML Tree
- SQL statements for table creation and the insertion of all question information (via JDBC)
5.
- Developed by the Inter-University Consortium for Political and Social Research at the University of Michigan
- Development began in May 1995
- Version 1 of the DDI DTD released in March 2000
- Following a meeting of the DDI Working Group in November 2000, work started on version 1.01 in early 2001
6.
- Overview of the DDI DTD
  - Document Description
  - Study Description
  - Data Files Description
  - Variables Description
  - Other Study-Related Materials
- SWB main focus is on the Variables Description
7.
- Challenges working with the DDI
  - All tags in the Variables Description portion of the DTD are optional
  - Flexibility of the DDI DTD allows for extensive personal judgement in tag usage
- Our Solution
  - Place restrictions on tags by making them mandatory for our application. For a listing of SWB mandatory tags and their usage, see http://edcenterdev.sdsu.edu/SOURCE/ddi.html
8. Example of a DDI compliant XML document
<?xml version="1.0" ?>
<codebook>
  <docDscr>
    <guide>Sample Data From SWB</guide>
  </docDscr>
  <stdyDscr>
    <stdyInfo>
      <abstract>This file contains the listing of sample questions from SWB that is a mix of test and 2825_numOnly</abstract>
    </stdyInfo>
  </stdyDscr>
  <dataDscr>
    <var name="SPANKING" format="String" ID="q1">
      <valrng><range min="0" max="9" /></valrng>
      <txt>Favor Spanking to Discipline Child</txt>
      <catgry><catValu>1</catValu><txt>Agree</txt></catgry>
      <catgry><catValu>2</catValu><txt>Neutral</txt></catgry>
      <catgry><catValu>3</catValu><txt>Disagree</txt></catgry>
    </var>
  </dataDscr>
</codebook>
9. Processing of XML Documents
- Parsing
  - Using Sun's XML Parser for Java
  - Parses the XML document into a tree structure
- Processing
  - EdCenter team developed a driver that recursively traverses the tree, pulling out pertinent info on each question, and returns this information as a linked list of questions
  - EdCenter's Create class generates the Oracle tables necessary for the survey and inserts all question information into the tables
  - EdCenter Oracle table design for the SWB includes n+2 tables, where n is the number of questions in a survey
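The recursive traversal described above can be sketched with the standard JAXP DOM API, used here as a stand-in for Sun's parser. This is not the EdCenter driver: the QuestionWalker and Question names are hypothetical, and only three fields (the ID and name attributes and the question text) are collected, on the assumption that each question is a var element as in the DDI example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical sketch of a tree-walking driver; not the EdCenter implementation.
final class QuestionWalker {

    // Minimal question record; a real driver would presumably keep more fields.
    static final class Question {
        final String id;
        final String name;
        final String text;

        Question(String id, String name, String text) {
            this.id = id;
            this.name = name;
            this.text = text;
        }
    }

    // Parses the document into a DOM tree and returns the questions as a linked list.
    static List<Question> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<Question> questions = new LinkedList<>();
            walk(doc.getDocumentElement(), questions);
            return questions;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse survey document", e);
        }
    }

    // Depth-first walk: record each <var> element, recurse into everything else.
    private static void walk(Node node, List<Question> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE && "var".equals(node.getNodeName())) {
            Element var = (Element) node;
            out.add(new Question(var.getAttribute("ID"),
                                 var.getAttribute("name"),
                                 directChildText(var, "txt")));
            return;
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), out);
        }
    }

    // Text of the first direct child element with the given tag, or "" if absent.
    private static String directChildText(Element parent, String tag) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node c = children.item(i);
            if (c.getNodeType() == Node.ELEMENT_NODE && tag.equals(c.getNodeName())) {
                return c.getTextContent().trim();
            }
        }
        return "";
    }
}
```

Only direct children are searched for the question text so that the txt elements nested inside each catgry are not mistaken for it.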
10. Oracle Table Structure
- One set of tables for each survey that is uploaded to the SWB
- Total of n+2 tables for each survey
Question Description Table: Short Description, Long Description, Minimum Response Value, Maximum Response Value, Minimum Exclusive Response Value, Maximum Exclusive Response Value, Question ID, Variable Type (categorical/numeric)
n Value Description Tables: Response Value, Response Description
Response Table: n columns, associated with the Question IDs in the Question Description Table (q1, q2, q3, ..., qn)
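To make the table count concrete, the sketch below builds the CREATE TABLE statements for one survey: one question description table, n value description tables, and one response table. All table names, column names, and column types are assumptions chosen for illustration; the slides do not give the actual EdCenter schema.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: generates DDL for the table layout described on this slide.
// Table names, column names, and types are illustrative assumptions only.
final class SurveySchema {

    static List<String> createStatements(String survey, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("a survey needs at least one question");
        }
        // The survey name is concatenated into SQL, so accept only a plain identifier.
        if (!survey.matches("[A-Za-z][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("survey name must be a simple identifier");
        }
        List<String> ddl = new ArrayList<>();

        // One question description table.
        ddl.add("CREATE TABLE " + survey + "_QUESTIONS ("
                + "QUESTION_ID VARCHAR2(16) PRIMARY KEY, "
                + "SHORT_DESC VARCHAR2(64), LONG_DESC VARCHAR2(512), "
                + "MIN_VALUE NUMBER, MAX_VALUE NUMBER, "
                + "MIN_EXCL_VALUE NUMBER, MAX_EXCL_VALUE NUMBER, "
                + "VAR_TYPE VARCHAR2(16))");

        // n value description tables, one per question.
        for (int i = 1; i <= n; i++) {
            ddl.add("CREATE TABLE " + survey + "_VALUES_Q" + i
                    + " (RESPONSE_VALUE NUMBER PRIMARY KEY, RESPONSE_DESC VARCHAR2(256))");
        }

        // One response table with a column per question (Q1 .. Qn).
        StringBuilder cols = new StringBuilder();
        for (int i = 1; i <= n; i++) {
            if (i > 1) {
                cols.append(", ");
            }
            cols.append("Q").append(i).append(" NUMBER");
        }
        ddl.add("CREATE TABLE " + survey + "_RESPONSES (" + cols + ")");
        return ddl;
    }

    public static void main(String[] args) {
        // Prints the n + 2 statements for a three-question survey.
        createStatements("DEMO", 3).forEach(System.out::println);
    }
}
```

Each statement could then be executed through a JDBC Statement, which matches the "via JDBC" note in the architecture slide.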
11. Categorical vs. Numeric Variables
- SWB is unable to handle numeric variables. All numeric variables in a survey must be converted to categorical.
- EdCenter team decided to deal with this conversion during the data upload process for new datasets.
- SWB provides several options for this conversion
  - SWB will automatically split all numeric variables into 10 equal sized categories
  - SWB will split a numeric variable into a user defined number of equal sized categories
  - SWB will split a numeric variable into user defined, unevenly sized categories (e.g. 0-18, 19-25, 26-45, 45-55, 55-100)
  - SWB will ignore numeric variables altogether
http://edcenterdev.sdsu.edu/SOURCE/login/login.html
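The conversion options above can be sketched as two small functions. The names are hypothetical, and "equal sized" is assumed here to mean equal width over the variable's range (it could also be read as equal counts per category); this is not the SWB implementation.

```java
// Hypothetical sketch of numeric-to-categorical conversion; not the SWB code.
final class Binning {

    // Equal-width option: splits [min, max] into k categories and returns the
    // 0-based index of the category that contains value.
    static int equalWidthBin(double value, double min, double max, int k) {
        if (k < 1 || !(max > min)) {
            throw new IllegalArgumentException("need k >= 1 and max > min");
        }
        if (value < min || value > max) {
            throw new IllegalArgumentException("value outside [min, max]");
        }
        int idx = (int) ((value - min) / (max - min) * k);
        return Math.min(idx, k - 1); // value == max falls in the last category
    }

    // User-defined option: upperBounds holds the inclusive upper bound of each
    // category in ascending order; returns the index of the first category
    // whose upper bound is >= value.
    static int customBin(double value, double[] upperBounds) {
        for (int i = 0; i < upperBounds.length; i++) {
            if (value <= upperBounds[i]) {
                return i;
            }
        }
        throw new IllegalArgumentException("value above the last category");
    }

    public static void main(String[] args) {
        // Default option from the slide: 10 equal sized categories.
        System.out.println(equalWidthBin(55, 0, 100, 10));
        // Uneven categories with upper bounds 18, 25, 45, 55, 100.
        System.out.println(customBin(19, new double[] {18, 25, 45, 55, 100}));
    }
}
```

Representing the uneven categories by their upper bounds sidesteps the question of which category a shared boundary value belongs to: it always goes to the lower one.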
12. XML Output from Oracle
- API: oracle.xml.sql.query.OracleXMLQuery
  - http://technet.oracle.com/docs/tech/xml/oracle_xsu/doc_library/oracle/xml/sql/query/OracleXMLQuery.html
- In order to produce XML output from Oracle, the SWB team had to create SQL functions that organize the tables so that they can be easily converted to an XML document. This document can either be displayed to the screen or stylized to create a table.
- Output Tags: The tags are determined by the SQL query, and vary with each type of table.
13. XML Output: Current Uses
- Display XML: The XML output does not conform to any DTD; it is organized for ease of creation of tables.
<?xml version="1.0" ?>
<ROWSET>
  <ROW num="1">
    <DESCRIP>Missing Data</DESCRIP>
    <AMOUNT>979</AMOUNT>
    <AMOUNT_PERCENT>33.71</AMOUNT_PERCENT>
  </ROW>
</ROWSET>
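14. XML Output: Current Uses
- Display Table: Uses the same XML output but associates it with an XSLT stylesheet, using the OracleXMLQuery API.
import java.io.FileInputStream;
import oracle.xml.parser.v2.DOMParser;
import oracle.xml.parser.v2.XMLDocument;
import oracle.xml.parser.v2.XMLDocumentFragment;
import oracle.xml.parser.v2.XSLProcessor;
import oracle.xml.parser.v2.XSLStylesheet;
import oracle.xml.sql.query.OracleXMLQuery;

// Parse the XSLT stylesheet file into a DOM document
DOMParser parser = new DOMParser();
parser.setPreserveWhitespace(true);
FileInputStream xslstream = new FileInputStream(stylesheet);
parser.parse(xslstream);
XMLDocument xsldoc = parser.getDocument();
xslstream.close();

// Run the SQL query and fetch the result as an XML DOM
OracleXMLQuery qry = new OracleXMLQuery(conn, query);
XMLDocument domDoc = (XMLDocument) qry.getXMLDOM();

// Apply the stylesheet to the query result
XSLStylesheet xsl = new XSLStylesheet(xsldoc, null);
XSLProcessor processor = new XSLProcessor();
XMLDocumentFragment result = processor.processXSL(xsl, domDoc);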
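The snippet above depends on Oracle's XML parser and XSU classes. For readers without those libraries, roughly the same stylesheet step can be sketched with the standard javax.xml.transform API, which is a swapped-in technique and not what SWB uses. The stylesheet below is a hypothetical example written for the ROWSET output shown on the previous slide; the actual SWB stylesheets are not given in the slides.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch using the standard JAXP transformer instead of Oracle's XSLProcessor.
final class RowsetToTable {

    // Illustrative stylesheet: one HTML table row per ROW element.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"html\" indent=\"no\"/>"
      + "<xsl:template match=\"/ROWSET\">"
      + "<table><xsl:apply-templates select=\"ROW\"/></table>"
      + "</xsl:template>"
      + "<xsl:template match=\"ROW\">"
      + "<tr><td><xsl:value-of select=\"DESCRIP\"/></td>"
      + "<td><xsl:value-of select=\"AMOUNT\"/></td>"
      + "<td><xsl:value-of select=\"AMOUNT_PERCENT\"/></td></tr>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet text to the XML text and returns the transformed output.
    static String transform(String xml, String xsl) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("stylesheet could not be applied", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<ROWSET><ROW num=\"1\"><DESCRIP>Missing Data</DESCRIP>"
                   + "<AMOUNT>979</AMOUNT><AMOUNT_PERCENT>33.71</AMOUNT_PERCENT></ROW></ROWSET>";
        System.out.println(transform(xml, XSL));
    }
}
```

In an Oracle setup, the XML string passed to transform would come from the query result rather than from a literal.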