Title: Surfing the education wave with official statistics
- Sharleen Forbes
- Statistics New Zealand
- School of Government, Victoria University
2 To cover
- The role of a National Statistics Office in education: why surf at all?
- Prioritising: what can we afford and where should we invest?
- Current initiatives
  - Community groups
  - Schools
  - Tertiary education
- Playing with official statistics
  - Examples for classroom use
- Where to in the future?
  - Providing more sets of real data
  - New ways of visualising data
3 Role of Statistics New Zealand
- Lead the state sector in production of official statistics (official statistics system responsibility)
- Employ a large number of statisticians
- Not funded specifically for education (promote, partner or facilitate rather than provide)
- Need to provide easily understood statistics (Public Good requirement)
- Should target informal / second-chance education (NSO Workshop, ICOTS 6, Singapore)
- Focus on official statistics
4 Differences between official and other statistics
5 Prioritising: what can we afford and where should we invest?
- Need to balance external demands with internal training needs
- Limited funds (need to pick the wave: where can we make a difference?)
6 Current initiatives: community groups
- State sector (Official Statistics System)
  - Certificate of Official Statistics (Level 4)
  - School of Government and ANZSOG courses
  - Workshops and seminars
- Journalists
  - JTO compulsory statistics unit(s)
  - Statistics prize
- Small businesses
  - GoStats!
- Maori communities
  - Pilot projects
7 Current initiatives: schools
- Resources to support the new curriculum
- Schools Corner on the Statistics New Zealand website (http://www.stats.govt.nz/schools-corner)
- CensusAtSchool
  - Joint funder (http://www.censusatschool.org.nz)
- Dataset provision
  - Census
  - Official Statistics Surveys
  - Synthetic Unit Record Files (SURFs)
8 Current initiatives: tertiary education
- Network of Academics in Official Statistics
  - To provide training and research
  - Undergraduate student prizes (1000)
- Official Statistics Research Fund
- Partnerships with researchers
  - Vice-Chancellors' agreement
  - Confidentialised Unit Record Files (CURFs)
- Half-time Professor of Official Statistics
  - School of Government, Victoria University
9 Playing with official statistics: examples
1. Census data
2. Official Statistics Survey data
3. Specially constructed data sets
   - Confidentialised Unit Record Files (CURFs)
   - Synthesised Unit Record Files (SURFs)
10 The statistical investigation (PPDAC) cycle (creators: Wild and Pfannkuch, Auckland University, 1999)
- Problem: statement of the research questions
- Plan: procedures used to carry out the study
- Data: the data collection process
- Analysis: summaries and analyses of the data to answer the questions posed
- Conclusion: conclusions about what has been learned
11 1. Census data example
- Problem (question): Is Hamilton greener than Wellington?
- Plan / Data: Use 2006 Census data on the way people travel to work to indicate how green a city is (www.stats.govt.nz/census/).
13 Definitions / (re)classifications
- How many and what classes of green shall we have?
- We have defined green-ness by mode of travel to work.
- Let's have only three classes of green-ness:
  - Not green: driving a private or company vehicle
  - Green: passenger in a private vehicle, or using public transport
  - Very green: walking, biking or working at home
- Omit other categories.
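A minimal Python sketch of this reclassification, assuming Census counts have already been tabulated by mode of travel to work for one city. The mode labels in `GREEN_CLASS` and the helper `greenness_proportions` are hypothetical illustrations, not Statistics NZ category names or tools.

```python
from collections import Counter

# Hypothetical mode-of-travel labels; the real 2006 Census categories may be worded differently.
GREEN_CLASS = {
    "drove private vehicle": "Not green",
    "drove company vehicle": "Not green",
    "passenger in private vehicle": "Green",
    "public bus": "Green",
    "train": "Green",
    "walked or jogged": "Very green",
    "bicycle": "Very green",
    "worked at home": "Very green",
}
CLASSES = ("Not green", "Green", "Very green")


def greenness_proportions(counts_by_mode):
    """Collapse counts by mode of travel into three green-ness classes.

    Modes missing from GREEN_CLASS (e.g. 'other', 'not stated') are omitted,
    so the proportions are of classified commuters only.
    """
    totals = Counter()
    for mode, count in counts_by_mode.items():
        green_class = GREEN_CLASS.get(mode)
        if green_class is not None:
            totals[green_class] += count
    classified = sum(totals.values())
    if classified == 0:
        raise ValueError("no counts fall into a green-ness class")
    return {cls: totals[cls] / classified for cls in CLASSES}
```

Running this once for Hamilton and once for Wellington and comparing the "Very green" (or "Green" plus "Very green") shares answers the question. Because omitted categories drop out of the denominator, the choice of what to omit is itself worth discussing with students.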
14 More analysis
15 Conclusion (and classroom questions)
- Conclusion
  - Wellington is greener than Hamilton.
- Questions
  - Is mode of travel to work a good indicator of green-ness?
  - What other variables might affect mode of travel?
  - Should we use more than one indicator?
16 Official Statistics Survey data
- Problem (questions)
  - a) Are fewer people unemployed now than in previous years?
  - b) Are you less likely to be unemployed if you have a high level of education?
- Plan / Data
  - Analyse time series data on national unemployment rates
  - Statistics New Zealand's Household Labour Force Survey (www.stats.govt.nz)
17 Analysis - Question a): time series plots
18 Conclusions (and classroom questions)
- Conclusions
  - Unemployment has been lower since 2004 than in previous years.
  - Since 2004 unemployment has stayed at roughly the same level (about 4%).
  - Seasonality is not marked.
- Questions
  - What caused the peaks in unemployment (1991-93 and 1999)?
  - What do the small peaks in 2004-2007 reflect?
  - Should we answer a count question (number unemployed) with a rate (percent unemployed in the labour force)?
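The count-versus-rate question can be made concrete with a small calculation. This is a sketch assuming the usual definition of the unemployment rate (unemployed as a percentage of the labour force, which is the employed plus the unemployed); the figures are toy numbers, not Household Labour Force Survey data.

```python
def unemployment_rate(unemployed, employed):
    """Unemployed people as a percentage of the labour force.

    The labour force is the employed plus the unemployed; people not in the
    labour force (for example, retirees not seeking work) are excluded.
    """
    labour_force = employed + unemployed
    if labour_force <= 0:
        raise ValueError("labour force must be positive")
    return 100 * unemployed / labour_force


# Toy figures only (not HLFS data): the count of unemployed rises,
# but the labour force grows faster, so the rate falls.
earlier = unemployment_rate(unemployed=100, employed=1900)  # 5.0
later = unemployment_rate(unemployed=105, employed=2395)    # about 4.2
```

So "fewer people unemployed" (a count) and "a lower unemployment rate" can give different answers when the labour force is growing.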
19 Analysis - Question b): time series plots
20 Conclusions (and classroom questions)
- Conclusions
  - The pattern over time is similar for all qualification groups.
  - The unemployment rate is always highest for workers with no educational qualifications.
- Questions
  - Which group appears to be the most disadvantaged when unemployment is high?
  - What appears to be different between the qualification groups in recent years, compared with past years?
21 Another sample survey example: a simple look at seasonality
- Problem (question)
  - Is there an annual pattern in retail sales?
- Plan / Data
  - Check for seasonality in a quarterly summary time series of monthly retail trade sales (in dollars)
  - Statistics New Zealand's Retail Trade Survey (www.stats.govt.nz)
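One simple way to check for seasonality in quarterly data is a classical additive decomposition: estimate the trend with a centred (2x4) moving average, then average each quarter's deviation from that trend. The sketch below assumes a plain list of quarterly sales values; it is a teaching illustration and not the method used to produce official seasonally adjusted series.

```python
def centred_moving_average(series):
    """2x4 centred moving average for quarterly data (None at the two ends)."""
    trend = [None] * len(series)
    for t in range(2, len(series) - 2):
        trend[t] = (0.5 * series[t - 2] + series[t - 1] + series[t]
                    + series[t + 1] + 0.5 * series[t + 2]) / 4
    return trend


def seasonal_effects(series, first_quarter=1):
    """Average additive seasonal effect for quarters 1-4, adjusted to sum to zero.

    first_quarter is the quarter (1-4) of series[0]; quarter 4 is October-December.
    """
    if len(series) < 8:
        raise ValueError("need at least two years of quarterly data")
    trend = centred_moving_average(series)
    deviations = {q: [] for q in range(1, 5)}
    for t, (value, level) in enumerate(zip(series, trend)):
        if level is not None:
            quarter = (first_quarter - 1 + t) % 4 + 1
            deviations[quarter].append(value - level)
    raw = {q: sum(d) / len(d) for q, d in deviations.items()}
    centre = sum(raw.values()) / 4
    return {q: raw[q] - centre for q in raw}
```

A clearly positive effect for quarter 4 would match a December peak. Under this additive model, subtracting the quarterly effects from the original series gives a rough seasonally adjusted series to compare with the trend.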
22 Analysis: time series plot
23 Conclusions (and classroom questions)
- Conclusions
  - Annual seasonality: a peak every December / January
  - Rising trend over time, with a plateau in the last 3 quarters
- Questions
  - What components of retail trade would contribute most to the December peaks?
  - What does it mean when the seasonally adjusted and trend lines lie virtually on top of each other?
  - Easter fell in the March rather than the June quarter in 2008. Is there any evidence that this affected the pattern of retail sales?
24 3. Specially constructed data sets: confidentialised datasets (e.g. 2004 Income Survey)
25 SURFing classroom examples (SURF creator: Pauline Stuart, Statistics NZ)
- Using 2004 Income Survey SURF data
- Data available on CD or downloaded from Schools Corner on the Statistics New Zealand website (www.stats.govt.nz/schoolscorner/)
- The dataset has 200 records and seven variables:
  - gender (male, female)
  - highest educational qualification (none, school, vocational, degree)
  - marital status (married, never, previously, other)
  - ethnic group (European, Maori, Other)
  - age (15-45)
  - hours worked weekly (0-79)
  - weekly income (0-2000)
26 Example
- Background
  - In this example we let the SURF dataset represent a company's employees.
  - Every employee creates the same administration costs regardless of how many hours are worked.
  - The company is concerned that its staff administration costs are too high.
- Problem (questions)
  - Do most employees work a normal (40-hour) week?
  - What variables are related to the number of hours worked?
27 Specific questions for secondary school classrooms
1. What proportion of employees work at least 40 hours per week? (Summary)
2. Are these proportions different for males and females? (Comparison)
3. Do males tend to work more hours per week than females? (Comparison)
4. What is the relationship between hours worked and income? (Relationship between two measurement variables)
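Questions 1-3 reduce to a proportion and a comparison of means. A minimal sketch, assuming each SURF record has been read into a dictionary with hypothetical keys `gender` and `hours` (the column names in the actual SURF file may differ); the records below are toy values, not SURF data.

```python
from statistics import mean


def proportion_working_at_least(records, threshold=40, gender=None):
    """Questions 1 and 2: share of employees working at least `threshold` hours.

    If gender is given, only records of that gender are counted.
    """
    selected = [r for r in records if gender is None or r["gender"] == gender]
    if not selected:
        raise ValueError("no matching records")
    return sum(r["hours"] >= threshold for r in selected) / len(selected)


def mean_hours_by_gender(records):
    """Question 3: mean weekly hours worked for each gender."""
    groups = {}
    for r in records:
        groups.setdefault(r["gender"], []).append(r["hours"])
    return {g: mean(hours) for g, hours in groups.items()}


# Toy records for illustration only (not SURF values).
employees = [
    {"gender": "male", "hours": 45},
    {"gender": "male", "hours": 38},
    {"gender": "female", "hours": 20},
    {"gender": "female", "hours": 40},
]
```

Note that "at least 40 hours" uses `>=`, so employees working exactly 40 hours are counted; whether they should be is a question the clumping at 40 hours makes worth asking.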
28 Plan / Data (a): take a random sample of 35 from the SURF
Table: Sample summary statistics
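A sketch of step (a): draw a simple random sample of 35 without replacement and summarise hours by gender. It assumes the same hypothetical `gender` and `hours` keys as before. Quartiles come from `statistics.quantiles` with its default method, so the inter-quartile range may differ slightly from what SPSS or a spreadsheet reports.

```python
import random
from statistics import mean, median, quantiles, stdev


def sample_summary(records, n=35, seed=None):
    """Draw a simple random sample (without replacement) and summarise hours by gender."""
    rng = random.Random(seed)
    sample = rng.sample(records, n)
    summary = {}
    for gender in sorted({r["gender"] for r in sample}):
        hours = [r["hours"] for r in sample if r["gender"] == gender]
        enough = len(hours) >= 2  # stdev and quantiles need at least two values
        q1, _, q3 = quantiles(hours, n=4) if enough else (None, None, None)
        summary[gender] = {
            "n": len(hours),
            "mean": mean(hours),
            "median": median(hours),
            "sd": stdev(hours) if enough else None,
            "iqr": q3 - q1 if enough else None,
        }
    return summary
```

Each student using a different `seed` (or none) gets a different sample, which sets up the resampling comparisons in the next step.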
29 Conclusions (and classroom questions)
- Conclusions
  - Only half of all employees work 40 hours or more.
  - On average (mean), males work longer hours than females.
  - The hours females work vary more (standard deviation, inter-quartile range) than the hours males work.
- Questions
  - Are samples of size 17 and 18 large enough? (Beware of categorical data.)
  - What does it indicate when the mean and the median are different?
30 Plan / Data (b): resample
- Compare summary statistics between students' samples
- Combine students' samples and create new summary statistics
- Take another sample (of 35, say) and compare (or combine) summary statistics
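When students only report summaries, their samples can still be combined: the overall mean is the n-weighted mean, and the overall standard deviation adds the within-sample and between-sample variation. A sketch assuming each student reports (n, mean, sd); note that samples drawn from the same 200 SURF records may overlap, so the combined figures describe the pooled records, not a new simple random sample.

```python
from math import sqrt


def combine_summaries(summaries):
    """Combine (n, mean, sd) from several samples into one overall (n, mean, sd).

    Total sum of squares = within-sample part (n - 1) * sd**2
    plus between-sample part n * (mean - grand_mean)**2.
    """
    total_n = sum(n for n, _, _ in summaries)
    if total_n < 2:
        raise ValueError("need at least two observations in total")
    grand_mean = sum(n * m for n, m, _ in summaries) / total_n
    ss = sum((n - 1) * s ** 2 + n * (m - grand_mean) ** 2 for n, m, s in summaries)
    return total_n, grand_mean, sqrt(ss / (total_n - 1))
```

This gives the same mean and standard deviation as pooling the raw values, which students can check by hand on a small example.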
31 Plan / Data (c): use all the SURF data
- How do sample statistics compare with the total SURF?
- Would a graph be easier to interpret than the table?
32 Analysis
- Graphs of SURF data
33 Conclusions (and classroom questions)
- Conclusions
  - Use tables for reference, graphs to tell a story.
  - Females bimodal? Peaks at 5-25 hours (part-time) and 35-50 hours (full-time)?
  - Males tri-modal? A small peak at 10-15 hours (part-time), a large one at 35-55 hours (full-time) and a small one at 60-75 hours (maybe managers)?
  - The proportions of males and females working 40 hours or more are different: about half of the males do, but only about a quarter of the females do.
- Questions
  - What is the clumping at 40 hours?
  - Given the size of the SURF, do you think the above patterns will be similar if other SURFs are taken?
34 Analysis - Question 4: relationship between hours worked and income
35 Conclusion (and classroom questions)
- Conclusion
  - Income increases as the number of hours worked increases.
- Questions
  - What is the estimated income for someone who doesn't work?
  - What extra income (on average) is expected for each extra hour worked per week?
  - Is the (regression) line a good fit to the data?
36 Other factors related to hours worked? (Sex / highest qualification / ethnicity, etc.)
Example from a first-year university course (creator: John Harraway, Otago University)
- Plan / Data
  - Recategorise highest qualification:
    - Secondary = None OR Secondary (n = 105): S
    - Tertiary = Vocational OR Tertiary (n = 95): T
  - Do a linear regression in SPSS (equivalent to a two-sample t-test for a difference in means, assuming equal variances)
37 Analysis: SPSS regression output
- Weekly Income = 414 + 344 × Tertiary (Tertiary = 1 for a tertiary qualification, 0 otherwise)
- 95% confidence interval for the increase in income if you have a tertiary qualification: 257 to 431; t = 7.8, p < 0.001
- R² = 0.24 (only about a quarter of the variation in the points is explained by the best-fitting line)
38 Conclusion (and classroom question)
- Conclusion
  - Income is higher on average (by $344 a week) if you have a tertiary qualification.
- Question
  - Is qualification a good explanator of income earned?
39 Are there multiple factors related to income?
- Problem (question): Are both qualification and hours worked related to income?
- Plan / Data: Do a multiple regression (main effects model, no interaction terms) in SPSS using the SURF data
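Outside SPSS, the main effects model Income = b0 + b1 × Hours + b2 × Tertiary can be fitted by least squares. This stdlib-only sketch solves the normal equations with Gaussian elimination; that is adequate for small teaching datasets, although statistical packages use more numerically stable methods. The function names are illustrative.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear predictors?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_main_effects(hours, tertiary, income):
    """Least squares for Income = b0 + b1*Hours + b2*Tertiary via the normal equations."""
    rows = [[1.0, h, t] for h, t in zip(hours, tertiary)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, income)) for i in range(3)]
    return solve(xtx, xty)  # [b0, b1, b2]
```

Comparing b1 and b2 with the coefficients from the two single-predictor models shows how each estimate changes once the other variable is adjusted for.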
40 Analysis: scatterplot of income by hours worked and qualification (S = secondary, T = tertiary)
41 SPSS regression output (values extracted and rounded for all 3 models)
42 Conclusions
- Weekly Income = -19 + 15 × Hours Worked + 183 × Tertiary
- Conclusions
  - Both hours worked and highest qualification are related to weekly income earned.
  - The mean increase in income per hour worked is reduced (from 17 to 15) when tertiary qualification is also considered.
  - The mean increase in income for having a tertiary qualification is also reduced (from 344 to 183) when adjusted for the number of hours worked.
  - The 95% confidence interval for the intercept (income when no hours are worked) still contains zero.
43 Classroom questions
- Questions
  - Is there any interaction between hours worked and qualification?
  - Which of the above models fits the data best?
  - Are there any outliers?
  - What does a scatterplot of the residuals (vertical distances of the points from the fitted line) indicate?
44 More resampling
- Use the SURF as a sample from the CURF population
- Bootstrapping
  - Take repeated samples with replacement (of the same size as the original, n = 200).
- Jack-knifing
  - Take repeated samples, dropping one value from the original sample each time (n = 199).
- Calculate the mean and standard deviation of the sample means
- Compare the summary statistics with the CURF (or the full 2004 Income Survey).
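A sketch of both resampling schemes for the mean, assuming `data` is a list of one SURF variable (for example weekly income). The standard deviation of bootstrap means estimates the standard error directly. Leave-one-out means vary much less, so the jackknife standard error rescales their spread by a factor of (n - 1) / n on the sum of squares; for the mean this reproduces s / sqrt(n) exactly.

```python
import random
from statistics import mean, stdev


def bootstrap_means(data, replicates=1000, seed=None):
    """Means of resamples drawn with replacement, each the same size as data."""
    rng = random.Random(seed)
    n = len(data)
    return [mean(rng.choices(data, k=n)) for _ in range(replicates)]


def jackknife_means(data):
    """Means of the n leave-one-out samples (each of size n - 1)."""
    return [mean(data[:i] + data[i + 1:]) for i in range(len(data))]


def mean_and_sd(replicate_means):
    """Mean and standard deviation of a set of sample means."""
    return mean(replicate_means), stdev(replicate_means)


def jackknife_se(replicate_means):
    """Jackknife standard error: sqrt((n - 1) / n * sum of squared deviations)."""
    n = len(replicate_means)
    centre = mean(replicate_means)
    return ((n - 1) / n * sum((m - centre) ** 2 for m in replicate_means)) ** 0.5
```

Comparing these estimates with the corresponding figures from the CURF (or the full survey) shows students how well a 200-record file stands in for the population.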
45 Where to from here?
- Continue and develop partnerships (academics, teachers, community groups)
- More CURFs and SURFs (official launch 1 September 2008: 2001 Savings Survey SURF, www.stats.govt.nz/schools-corner)
- Increased free access to data for postgraduate students
- Data visualisation (dynamic graphs)
- More cross-discipline outputs
46 Animated population pyramids (creator: Martin Ralphs, Statistics NZ)
47 Economic structure population pyramid (Office for National Statistics, UK)
48 Gapminder (www.gapminder.org): geography, history, demography, econometrics (creator: Hans Rosling)
49 Questions and comments
- What are your ideas for the future?
- Contact: sharleen.forbes@stats.govt.nz
- Thank you.