Title: Philip Clarke and Denise Silva
Development of Small Area Estimation at ONS
- Philip Clarke and Denise Silva
Outline
- Small Area Estimation Problem
- History and current provision
- Developments in progress
- Wider research
- Consultancy service
1. Small Area Estimation Problem
- Official statistics provide an indispensable element in the information system of a democratic society (Fundamental Principles of Official Statistics, UNSD)
- Sample surveys are used to provide estimates for target parameters at population (or national) level and also for subpopulations or domains of study
- However, implementation in a small area context is challenging
Small Area Estimation Problem
- In small areas/domains, sample sizes are usually not large enough to provide reliable estimates using classical design-based methods.
- The small area estimation problem refers to SMALL SAMPLE SIZES (or no sample in the domain or area of interest)
- In sample surveys, an area or domain is regarded as small if the sample within the area is not large enough to provide direct survey estimates of adequate precision
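To illustrate why a small domain sample gives an imprecise direct estimate, the sketch below computes the standard error and coefficient of variation of a sample mean under simple random sampling, ignoring finite population corrections and design effects. The function name and the income mean and standard deviation are illustrative assumptions, not figures from any ONS survey.

```python
import math

def direct_se(sd: float, n: int) -> float:
    """Standard error of a sample mean under simple random sampling (no fpc)."""
    if n <= 0:
        raise ValueError("no sample in the domain: no direct estimate is available")
    return sd / math.sqrt(n)

# Hypothetical household income distribution: mean 500, standard deviation 300.
mean, sd = 500.0, 300.0
for n in (2500, 100, 9):
    se = direct_se(sd, n)
    # The coefficient of variation expresses the standard error as a % of the mean.
    print(f"n={n:5d}  SE={se:7.2f}  CV={100 * se / mean:5.1f}%")
```

With these illustrative inputs the standard error grows from 6 at n = 2500 to 100 at n = 9, which is the loss of precision the small area problem refers to.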
2. History
- Small Area Estimation in the UK began as a research project in the late 1990s.
- In response to calls for locally focussed information in many different areas
  - Environmental
  - Business
  - Social, e.g. health, housing, deprivation, unemployment.
- Also calls for more general domain estimation
  - e.g. cross classifications by age/sex, occupation.
- Initial experimental studies on mental health estimation for DoH.
Developing alternative methodology
- Purpose
  - To enable production of reliable estimates of characteristics of interest for small areas or domains based on very small or no sample.
  - To assess the quality (precision) of estimates.
- Several years of research and development (since 1995)
  - UK (ONS, UoS, IoE)
  - Finland (Statistics Finland, UoJ)
- The EURAREA project
  - Research programme funded by Eurostat to enhance techniques to meet European needs (2001-2004)
Basis of Approach: Relax the Survey Restriction
- Borrow strength by removing the isolation of depending solely on the survey and solely on respondents in a given area.
- Widen the class of respondents for a given area by pooling together similar areas.
  - e.g. by Region: consider all respondents within a region when estimating for any of its areas.
- Widen the class of respondents by taking past period respondents into account.
- Take advantage of other related data sources which are not sample survey based, known as auxiliary data.
  - e.g. administrative data or census data which are available for all areas/domains.
Model based estimation
- All approaches detailed are based on an implicit or explicit model.
- The approach currently adopted uses auxiliary data together with data from all areas.
  - Borrows strength nationally.
  - Uses an explicit statistical model to represent the relationship between the survey variable of interest and auxiliary data.
  - Dependent variable is the survey variable of interest.
  - Independent variables are certain auxiliary data variables known as covariates.
  - Model fitted using sample data and assumed to apply generally.
  - Model then used to obtain area/domain estimates.
Outline of a model structure
- Suppose the variable of interest, Y, in an area j is linearly related to a variable X, which is available from a non-survey source, in approximately the following manner: $Y_j = \alpha + \beta X_j$
- This is a deterministic structure, so we need to add some random variability
- Obtain $Y_j = \alpha + \beta X_j + u_j$
- $u_j$ represents random area differences from the deterministic value.
- $\sigma_u^2$ represents variability between areas.
Model fitting
- Fit the model using direct survey estimates for each area.
- This introduces additional sampling variability.
  - Unit level sampling variability
  - giving rise to additional area level sampling variability
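As a hedged sketch of how the sampling variability enters, the area level model can be written for the direct estimate rather than the true area value. The notation here is assumed for illustration: $\alpha$ and $\beta$ are the intercept and slope of the linear relationship with $X_j$, $u_j$ is the random area term, $e_{ij}$ is a unit level sampling error with variance $\sigma_e^2$, and $n_j$ is the sample size in area j. The variance expression additionally assumes independent units within an area, which a clustered design would violate.

```latex
\hat{Y}_j^{\mathrm{direct}} = \alpha + \beta X_j + u_j + \bar{e}_j,
\qquad
\bar{e}_j = \frac{1}{n_j} \sum_{i=1}^{n_j} e_{ij},
\qquad
\operatorname{Var}(\bar{e}_j) = \frac{\sigma_e^2}{n_j}
```

Under these assumptions the area level sampling variance shrinks as $n_j$ grows, so areas with very small samples contribute the noisiest direct estimates to the fit.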
Estimating from the model
- Once the model is fitted, estimate for area j using the parameter estimates
- A mean squared error estimate is obtained for each area estimate
- Modelling success is measured by obtaining estimates with high precision, i.e. low mean squared errors.
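The estimation step can be sketched as follows, as a simplification only: an ordinary least squares line is fitted to the direct estimates of the sampled areas, and the fitted line is then applied to the covariate of every area, including those with no sample. Ordinary least squares ignores the random area term and the unequal sampling variances, so it stands in for the actual model fit. All names and numbers are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = alpha + beta * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    return alpha, beta

def synthetic_estimates(alpha, beta, covariate_by_area):
    """Apply the fitted fixed effects to the covariate of every area."""
    return {area: alpha + beta * x for area, x in covariate_by_area.items()}

# Hypothetical inputs: direct estimates exist only for sampled areas A, B, C,
# while the auxiliary covariate is known for all areas, including unsampled D.
direct = {"A": 11.0, "B": 15.5, "C": 18.5}
covariate = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0}

sampled = sorted(direct)
alpha_hat, beta_hat = fit_line([covariate[a] for a in sampled],
                               [direct[a] for a in sampled])
estimates = synthetic_estimates(alpha_hat, beta_hat, covariate)
print(alpha_hat, beta_hat, estimates["D"])
```

The point of the sketch is that area D receives an estimate even though it contributed no survey data, because its covariate value is known.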
Current provision
- SAEP: a generic methodology for application to variables from household based surveys.
  - Mean household income based on the Family Resources Survey published as Experimental Statistics for wards in 1998/99 and 2001/02, and for middle layer super output areas in 2004/05
- Specialised methodology for labour market estimation of unemployment from the Labour Force Survey.
  - Unemployment levels and rates routinely published quarterly as National Statistics for Local Authority Districts in Great Britain.
SAEP methodology and income estimation
- SAEP methodology is
  - derived from the outlined model-based approach,
- BUT is
  - based on a unit (household)/area multilevel model
  - borrows strength across areas using multivariate area level auxiliary data (covariates)
  - can model a transformation of the variable of interest if required
  - adapted for estimating at ward/middle layer super output area (MSOA) level from customary ONS clustered design household sample surveys
  - estimator is synthetic: it depends only on fixed effect model terms.
Application to income estimation: Respondent Variable
- Income value for each household sampled in Family Resources Survey (FRS).
  - 3,000 wards in England and Wales with sample in 2001/02,
  - 23,000 total responding households.
- But not a random sample.
  - Clustered design with primary sampling units as postcode sectors,
  - 1,500 sampled postcode sectors.
Coping with design clustering
- Samples are random samples of postcode sectors
  - So random terms are around postcode sectors, indexed by j
- Estimation is required for geographically distinct wards or middle layer super output areas
  - So covariates are for these areas, indexed by d
- For estimation, covariates must be known for all areas, not just sampled areas.
SAEP model and estimator structure for income estimation
- Multilevel structure gives rise to a unit level random term replacing area sampling variability
- Logarithmic transformation of income taken because of positive skewness of income distribution
- Model
- Estimator
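The model and estimator formulas are not reproduced in this text. As a hedged sketch only, a two level structure consistent with the description (household i, in postcode sector j, resident in ward or MSOA d) could be written as below. The symbols $y_{ijd}$, $\mathbf{x}_d$, $\boldsymbol{\beta}$, $u_j$ and $e_{ij}$ are assumed notation, and the normality assumptions are the conventional ones for such models rather than something stated in the source.

```latex
% Assumed model on the log scale: area covariates, sector random term, household random term
\log(y_{ijd}) = \mathbf{x}_d^{\top} \boldsymbol{\beta} + u_j + e_{ij},
\qquad u_j \sim N(0, \sigma_u^2), \quad e_{ij} \sim N(0, \sigma_e^2)

% Assumed synthetic estimator on the log scale: fixed effect terms only
\hat{\mu}_d = \mathbf{x}_d^{\top} \hat{\boldsymbol{\beta}}
```

Because the random terms are indexed by postcode sector j while estimates are needed for areas d, the estimator under this sketch uses only the fixed effects, which matches the description of the SAEP estimator as synthetic.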
SAEP model fitting procedure
- Create a dataset containing
  - Variable of interest from individual household responses to survey.
  - Values of a large number of administrative and census variables for the household's area of residence which we believe could impact on the variable of interest, e.g. census variables, DWP social benefit claimant rates, council tax band proportions
SAEP model fitting procedure (cont.)
- Starting with a null model, fit covariates in a stepwise manner in order of significance using specialised multilevel software, e.g. MLwiN or SAS PROC MIXED.
- In this way select a set of significant covariates and fit an accepted model.
- Use diagnostic techniques to check the model against its assumptions, e.g. randomness of residuals, unbiasedness of predictions.
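The stepwise idea can be sketched in plain Python under two loudly stated simplifications: ordinary least squares replaces the multilevel fit that MLwiN or SAS PROC MIXED would perform, and a proportional drop in residual sum of squares (the hypothetical `min_improvement` threshold) replaces a formal significance test. The data are synthetic and deliberately constructed so that the covariates are mutually orthogonal.

```python
def ols_rss(columns, y):
    """Residual sum of squares of y regressed on an intercept plus the given columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    # Augmented normal equations [X'X | X'y].
    a = [[sum(row[i] * row[j] for row in design) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(design, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(k):
            if r != c:
                factor = a[r][c] / a[c][c]
                a[r] = [v - factor * w for v, w in zip(a[r], a[c])]
    coef = [a[i][k] / a[i][i] for i in range(k)]
    return sum((yi - sum(b * v for b, v in zip(coef, row))) ** 2
               for row, yi in zip(design, y))

def forward_select(candidates, y, min_improvement=0.05):
    """Start from the null model and add the best covariate until gains are small."""
    selected = []
    rss = ols_rss([], y)
    while True:
        remaining = [name for name in candidates if name not in selected]
        if not remaining:
            break
        trials = {name: ols_rss([candidates[s] for s in selected] + [candidates[name]], y)
                  for name in remaining}
        best = min(trials, key=trials.get)
        if rss == 0 or (rss - trials[best]) / rss < min_improvement:
            break
        selected.append(best)
        rss = trials[best]
    return selected

# Synthetic data: y depends on x1 and x2 but not on the 'noise' covariate.
x1 = [1, 1, 1, 1, -1, -1, -1, -1]
x2 = [1, 1, -1, -1, 1, 1, -1, -1]
noise = [1, -1, 1, -1, 1, -1, 1, -1]
y = [10 + 3 * a + 2 * b + 0.5 * a * b for a, b in zip(x1, x2)]
selected = forward_select({"x1": x1, "x2": x2, "noise": noise}, y)
print(selected)
```

In this toy case the procedure picks up the two covariates that drive y and stops before adding the irrelevant one; a real application would use the multilevel likelihood and the diagnostic checks listed above.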
Converting to raw income scale
- Need to make allowance for
  - mean(log) ≠ log(mean)
- Area estimate
- Confidence interval
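A minimal sketch of the back-transformation issue, assuming log income is normally distributed: the mean on the raw scale is then exp(mean of logs + half the log-scale variance), not simply exp(mean of logs). The function names, the example incomes and the way the interval is formed (transforming the endpoints of a log-scale interval) are illustrative assumptions, not the published ONS formulas.

```python
import math

def back_transform_mean(log_mean, log_variance):
    """Raw-scale mean implied by a normal distribution on the log scale."""
    return math.exp(log_mean + 0.5 * log_variance)

def back_transform_interval(log_estimate, log_se, log_variance, z=1.96):
    """Transform the endpoints of a log-scale interval, with the same adjustment."""
    shift = 0.5 * log_variance
    lower = math.exp(log_estimate - z * log_se + shift)
    upper = math.exp(log_estimate + z * log_se + shift)
    return lower, upper

# Hypothetical incomes showing that the mean of logs is below the log of the mean.
incomes = [200.0, 400.0, 1600.0]
mean_of_logs = sum(math.log(v) for v in incomes) / len(incomes)
log_of_mean = math.log(sum(incomes) / len(incomes))
print(mean_of_logs < log_of_mean)

# Hypothetical log-scale area estimate, its standard error and total variance.
estimate = back_transform_mean(6.0, 0.5)
interval = back_transform_interval(6.0, 0.1, 0.5)
print(estimate, interval)
```

Exponentiating the log-scale estimate without the variance adjustment would return something closer to a median than a mean, which is why the allowance is needed.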
Actual model for ward estimation of income in 2001/02
- pmanprof: proportion of people professional and managerial
- phhtyp7: proportion of two person h/h with non-dependent children
- pjsa: proportion of people claiming job seekers allowance
- enggh: proportion of council tax band GH dwellings for England
- walegh: proportion of council tax band GH dwellings for Wales
Income estimation outputs
- Estimates obtained of sufficient precision for publication and acceptable to user community.
- Placed on Neighbourhood Statistics website together with user guides and technical documentation.
- Accredited as Experimental Statistics
Estimation of unemployment at local authority level
- BACKGROUND
  - Unemployment is a key indicator and is used for policy making and resource allocation
  - Official UK measure of unemployment follows the International Labour Organisation (ILO) definition
  - ILO unemployment is estimated via the Labour Force Survey (national level)
  - Sample sizes in the LFS are small at local level
Features of Labour Force Survey
- A rolling longitudinal survey
  - Roughly 60,000 households surveyed each quarter
  - Each household remains in sample scope for 5 quarters, known as waves 1 to 5
  - Then household drops out
- Waves 1 and 5 respondents for last four quarters used to obtain an annual local labour force survey dataset of about 90,000 independent households.
- Unclustered survey design giving a sample in each LAD.
Features of unemployment modelling
- Unclustered LFS design means
  - direct estimates are available for each LAD, though small sample sizes mean they have low precision, hence a need for more precise model-based estimates
  - no separate sampling and estimation areas as for SAEP modelling with clustered surveys
  - availability of estimated random area terms in LAD estimation.
- Availability of a highly correlated covariate: number of claimants of unemployment benefit/job seekers allowance
  - Eliminates need for model fitting to a range of possible covariates on each occasion.
Specification of model
- Modelling is by six age/sex groups in each LAD.
  - age groups: 16 to 24, 25 to 49, 50 and over
- Survey data variable
  - proportion of responders in each age/sex group in each LAD who are unemployed (logit transformed)
- Covariate data
  - Proportion of people claiming job seekers allowance in particular age/sex group in LAD (logit transformed)
  - Proportion of people claiming job seekers allowance in LAD as a whole (logit transformed)
- Dummy variables
  - Indicator of sex/age group
  - Indicator of Government Office Region for LAD.
  - Indicator of Supergroup of National Statistics Area Classification for LAD.
- The response is the LFS proportion of individual respondents who were unemployed in age/sex group i and LAD d
- The covariates form a vector of area by age/sex class level covariates
- Random term specified at LAD level, d
Estimator from model
- The model-based estimator of proportion unemployed in each age/sex group of each LAD is then obtained from the fitted model
- Note the use of the estimated random area term in the estimator, as it is now available for each LAD.
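A hedged sketch of how such an estimator could be evaluated: the linear predictor is formed on the logit scale from the covariates, the fitted coefficients and the estimated LAD random term, and is then mapped back to a proportion. The function names, coefficient values, claimant proportions and area effect are all hypothetical, and the dummy variables are omitted for brevity.

```python
import math

def logit(p):
    """Log odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def expit(z):
    """Inverse of the logit transformation."""
    return 1.0 / (1.0 + math.exp(-z))

def model_based_proportion(covariates, coefficients, area_effect):
    """Fixed effects plus the estimated LAD random term, mapped back to a proportion."""
    linear = sum(b * x for b, x in zip(coefficients, covariates)) + area_effect
    return expit(linear)

# Hypothetical inputs for one age/sex group in one LAD:
# intercept, logit claimant proportion in the group, logit claimant proportion in the LAD.
covariates = [1.0, logit(0.04), logit(0.03)]
coefficients = [0.2, 0.6, 0.3]   # made-up fitted values
area_effect = 0.05               # made-up estimated random term for this LAD

p_hat = model_based_proportion(covariates, coefficients, area_effect)
print(round(p_hat, 4))
```

Setting `area_effect` to zero would give the purely synthetic counterpart, so the sketch also shows what including the LAD term changes.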
Determination of LAD unemployment level
- Model has estimated a proportion for each age/sex group.
- This is converted into an estimate of unemployment level at each LAD by
  - multiplying each proportion estimate by the LFS estimate of population unsampled
  - adding those sampled and found unemployed
  - summing the age/sex group estimates
Determination of LAD unemployment rate
- The estimate of the unemployment rate is obtained by dividing the estimate of level by the sum of the direct estimate of employment and the model-based estimate of unemployment
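The level and rate arithmetic can be sketched as below. It assumes the rate denominator is the economically active population, taken as the direct estimate of employment plus the model-based unemployment level. The function names and all figures are hypothetical, and only two groups are shown instead of six to keep the example short.

```python
def unemployment_level(groups):
    """Sum over age/sex groups of (proportion x unsampled population) + sampled unemployed."""
    return sum(g["p_hat"] * g["unsampled_pop"] + g["sampled_unemployed"]
               for g in groups)

def unemployment_rate(level, direct_employment):
    """Unemployed as a share of the economically active (employed + unemployed)."""
    return level / (direct_employment + level)

# Hypothetical LAD with two age/sex groups.
groups = [
    {"p_hat": 0.10, "unsampled_pop": 10000, "sampled_unemployed": 5},
    {"p_hat": 0.04, "unsampled_pop": 20000, "sampled_unemployed": 3},
]
level = unemployment_level(groups)
rate = unemployment_rate(level, direct_employment=43392)
print(round(level), round(rate, 3))
```

Here the two groups contribute about 1,005 and 803 unemployed, giving a level of about 1,808 and a rate of about 4% against the assumed employment figure.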
Unemployment estimates publication
- The standard errors of the model-based estimates were found to be smaller than the corresponding direct standard errors in each LAD.
- Model-based estimates have been accredited as National Statistics and are now published quarterly in Labour Market Statistics releases.
3. Developments in progress
- Labour Market area
  - Unemployment estimation at Parliamentary constituency level
    - Issue here is to ensure consistency with LAD estimates at comparable areas
    - Model developed and estimates likely to become available in the coming year
  - Consistent estimation of all three labour market states: employed, not economically active, unemployed
    - Multivariate model currently in development
- Income estimation
  - Estimation at local authority level
    - Clustered survey design entails a modification of the SAEP framework to cater for it
    - Currently in development
  - Estimation of poverty/households below threshold
    - Currently being developed for local authority level
4. Wider research activities
- In conjunction with academic partners
  - Estimation of change over time
    - Current work is confined to single point-in-time estimation, but users would like an indication of progress over time, particularly in relation to funding
  - Estimation of poverty using M-quantile modelling
    - Research using FRS data by Nikos Tzavidis
  - Models incorporating spatial relationships
    - Preliminary investigation of spatial relationship in unemployment model in conjunction with Ayoub Saei at Southampton University
    - Link with work at Imperial College by Nicky Best and Virgilio Gomez-Rubio
5. Methodology Consultancy Service
- ONS is currently establishing a methodology consultancy service
  - To undertake and support statistical work by other government departments and public sector organisations.
  - Resource for assessment/quality improvement
- Currently working with the Health and Safety Executive on small area estimation of incidence of work related illness at local authority level.