Title: Next on OPRAH
1- Next on OPRAH
- Bringing Data Out of the Closet
OLA SuperConference, Friday, 1 February 2002
Walter Giesbrecht, Data Librarian, York University
Jeff Moon, Head, Documents Unit, Queen's University
2Not this Data
3 but these kinds!
5Let's take a look at Data and Statistical
Analysis: have you ever seen the movie Twins?
6Think of Arnie as the Data continuum
Raw Survey Data
Tables, Charts, Graphs
A number
(from books, journals, the web, etc...)
French Mother Tongue (1996) in Ontario
Employment levels by occupation class
Annual inflation rate from 1914 to present
Coded responses of surveyed individuals
Aggregate Data
Microdata
7Aggregate Data
Canada - Employment Telecommunication Equipment
Industry
479,285
A Number
Tables, Charts, Graphs
Time Series
8Sources of Aggregate Data
- Statistics Canada is generally the first stop for
Canadian Data
- The Canada Year Book (print)
- The Daily (web)
- Canadian Social Trends (web/print)
- CANSIM / E-Stat (web): time series
- Canadian Statistics (web)
- Beyond 20/20 Files: multidimensional tables
9Survey Data (microdata)
variables
respondents
Statistical analysis software (e.g., SPSS, SAS) is used to generate
meaningful results.
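A minimal Python sketch of the idea, not the workflow from the talk (which names SPSS and SAS): microdata is one record per respondent and one field per variable, and aggregate data is what you get when you count it up. The records, variable names, and codes below are invented for illustration.

```python
from collections import Counter

# Hypothetical microdata: one dict per respondent, one key per variable.
# Variable names and values are made up, not taken from any real survey.
respondents = [
    {"id": 1, "age_group": "15-17", "province": "ON"},
    {"id": 2, "age_group": "20-24", "province": "ON"},
    {"id": 3, "age_group": "15-17", "province": "QC"},
    {"id": 4, "age_group": "20-24", "province": "BC"},
    {"id": 5, "age_group": "15-17", "province": "ON"},
]

def aggregate(records, variable):
    """Collapse microdata into aggregate counts for one variable."""
    return Counter(r[variable] for r in records)

counts = aggregate(respondents, "age_group")
print(counts)
```

The same records can be re-aggregated by any other variable (for example `aggregate(respondents, "province")`), which is the flexibility a published table does not offer.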
10Sources of Survey Data
- Once again, Statistics Canada is generally the
first stop for Canadian Data
- The Data Liberation Initiative (DLI) provides
access to hundreds of publicly released survey
data files.
- Polling Companies (Environics, CROP, etc.)
produce microdata files as well.
- For US & International data, the
Inter-university Consortium for Political and
Social Research (ICPSR)
11Survey Data
Aggregate Data
Postcard
Camera
Fixed
Flexible
12 Think of Danny as the Statistical Analysis
continuum
Counts
Percentages
Averages
Standard Deviations
Tests of Significance
Descriptive Statistics
Inferential Statistics
13Aggregate / Descriptive
Microdata / Inferential
Data continuum
A number
Tables, Charts, Graphs
Raw Survey Data
Statistical Analysis continuum
Significance testing
Percentages
Counts
Standard Deviations
Averages
14To review
Data: Aggregate, Survey Data (Microdata)
Statistical Analysis: Counts, Percentages,
Averages, Standard Deviations, Cross-tabulations,
t-tests, Regression, etc.
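To make the descriptive/inferential split concrete, here is a small Python sketch using only the standard library. The two lists of weekly hours online are made-up numbers, and Welch's t statistic is just one choice of significance test picked for illustration.

```python
import math
import statistics

# Hypothetical responses: weekly hours online for two age groups.
# These numbers are invented purely to illustrate the calculations.
teens = [10, 12, 9, 14, 11]
young_adults = [6, 8, 7, 5, 9]

def describe(values):
    """Descriptive statistics: count, average, sample standard deviation."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
    }

def welch_t(a, b):
    """Inferential step: Welch's t statistic for a difference in means."""
    standard_error = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return (statistics.mean(a) - statistics.mean(b)) / standard_error

teen_summary = describe(teens)
adult_summary = describe(young_adults)
t_statistic = welch_t(teens, young_adults)
print(teen_summary, adult_summary, round(t_statistic, 2))
```

The t statistic still has to be compared against a t distribution to get a significance level; a statistical package reports that step for you.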
15Reference Question Example
How many of you have had a patron arrive at the
Reference Desk with a newspaper article reporting
Statistics Canada data?
16Globe & Mail, Dec 17, 2001, p. A15
71% of 15- to 17-year-olds use online chat
rooms, double the proportion of the only slightly
older 20- to 24-year-olds.
17First, note that the article says: Statistics
Canada, in a study released last week ... So
where do you go from here?
18First: Let's try
http://www.statcan.ca/start.html
19Which leads you to the following
20Which leads, in turn, to
Canadian Social Trends, Winter 2001
Here is the statistic quoted in the Globe
and here is the source
21So how do we check out this source?
General Social Survey, 2000
DLI Web Site (or Local Data Centre)
http://www.statcan.ca/english/Dli/dli.htm
23Documentation
and Data
24So, going to your campus Data Centre
http://library.queensu.ca/webdoc/ssdc/key.htm
26AGEGR5 less than or equal to 3
30Results
31Canadian Social Trends vs. our cross-tab?
32Reply from Statistics Canada
The difference in the numbers is because I used
the variable H19 while your client is using the
variable H20. H19 asked respondents who had used
the Internet in the last year, if they had ever
used the Internet to connect to an ONLINE CHAT
SERVICE. H20 asked respondents how often they
used the Internet to connect to an online chat
service in the last month.
An errata will be issued for the table appearing
in CST because the table does not show
percentages for those who used the Net in the
last month but for those who used the Net in the
last year.
So let's try again with H19
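The effect of picking H19 versus H20 can be sketched in Python. The variable names AGEGR5, H19, and H20 come from the slides, but every record and every code value below (1 = yes, 2 = no for H19; 1 to 4 = some use in the last month, 5 = none for H20) is an assumption made for illustration, not the actual General Social Survey codebook.

```python
# Hypothetical records. Variable names follow the slides; the code values
# are assumed for this sketch and are NOT the real GSS coding.
records = [
    {"AGEGR5": 1, "H19": 1, "H20": 1},
    {"AGEGR5": 1, "H19": 1, "H20": 5},
    {"AGEGR5": 1, "H19": 1, "H20": 2},
    {"AGEGR5": 1, "H19": 2, "H20": 5},
    {"AGEGR5": 2, "H19": 1, "H20": 3},
    {"AGEGR5": 2, "H19": 2, "H20": 5},
    {"AGEGR5": 2, "H19": 2, "H20": 5},
    {"AGEGR5": 2, "H19": 1, "H20": 5},
    {"AGEGR5": 4, "H19": 1, "H20": 1},  # dropped by the AGEGR5 <= 3 filter
]

def percent_yes(rows, variable, yes_codes, max_age_group=3):
    """Percentage of each age group whose answer falls in yes_codes."""
    tallies = {}
    for row in rows:
        if row["AGEGR5"] > max_age_group:
            continue  # same selection as on the slide: AGEGR5 <= 3
        total, yes = tallies.get(row["AGEGR5"], (0, 0))
        tallies[row["AGEGR5"]] = (total + 1, yes + (row[variable] in yes_codes))
    return {group: 100 * yes / total for group, (total, yes) in tallies.items()}

ever_used = percent_yes(records, "H19", {1})                 # "ever used" question
used_last_month = percent_yes(records, "H20", {1, 2, 3, 4})  # "last month" question
print(ever_used, used_last_month)
```

Same respondents, same age filter, different question: the two tables disagree only because they tabulate different variables.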
33So we need
35The numbers match!
AND you'll note the table now says last 12
months
36Original Table
Dec 2001
Revised
Jan 2002
37So: we can use survey files to verify published
results.
But: we can also use survey files to expand on
published results and explore new avenues of
research.
- For example:
- What is the influence of gender, education, or
income on Internet use?
- Are there differences between provinces? Between
URBAN and RURAL dwellers?
- Or any number of other dimensions: any question
asked in the survey.
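Exploring a new dimension is just a cross-tab on a different pair of variables. A rough Python sketch follows; the `dwelling` and `uses_internet` variables and all six records are hypothetical stand-ins for whatever the survey file actually contains.

```python
from collections import defaultdict

# Hypothetical microdata; variable names and values are invented.
survey = [
    {"dwelling": "urban", "uses_internet": "yes"},
    {"dwelling": "urban", "uses_internet": "yes"},
    {"dwelling": "urban", "uses_internet": "yes"},
    {"dwelling": "urban", "uses_internet": "no"},
    {"dwelling": "rural", "uses_internet": "yes"},
    {"dwelling": "rural", "uses_internet": "no"},
]

def crosstab(rows, row_var, col_var):
    """Count respondents in each (row_var, col_var) cell."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[row_var]][row[col_var]] += 1
    return {key: dict(cells) for key, cells in table.items()}

def row_percentages(table):
    """Convert each row of counts into percentages of that row's total."""
    result = {}
    for key, cells in table.items():
        total = sum(cells.values())
        result[key] = {col: 100 * n / total for col, n in cells.items()}
    return result

counts_table = crosstab(survey, "dwelling", "uses_internet")
percent_table = row_percentages(counts_table)
print(percent_table)
```

Swapping `"dwelling"` for any other variable in the file gives a table the published article never printed.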
38Survey Data
Aggregate Data
Postcard
Camera
Fixed
Flexible
39Sources of Aggregate Data
- print
  - e.g., Canada Year Book, STC print publications
- CD-ROM
  - e.g., 1996 Census Profiles, LFHR, other DSP
products
- Web-based
  - The Daily
  - Canadian Statistics
  - PDF versions of print publications
  - Beyond 20/20 Files: multidimensional tables
  - CANSIM / E-Stat: time series
40Beyond 20/20: what is it?
- Used to display multidimensional data, i.e., more
than 3 dimensions or characteristics at once
- e.g., age, sex (usually 3!), geography, date,
etc. ...
- allows user to customize the display of the data
- very useful for aggregate data, less so for
microdata
41Beyond 20/20: what is it used for/in?
- used in an increasing number of STC products,
- many CD-ROM DSP products,
- e.g., LFHR, ITC, Profiles, Nation Series,
Dimensions, etc.
- one of the available formats on E-Stat
43CANSIM
- acronym for CANadian Socio-Economic Information
Management System
- time-series data
- available
  - direct from STC ()
  - via E-Stat (free to registered institutions)
  - via DLI (from UofT)
44CANSIM II via E-Stat
55Dealing with data really isn't that hard ...
56Don't be afraid to ask for help!