Title: Philip Clarke and Denise Silva
1Development of Small Area Estimation at ONS
- Philip Clarke and Denise Silva
2Outline
- Small Area Estimation Problem
- History and current provision
- Development in progress
- Wider research
- Consultancy service
31. Small Area Estimation Problem
- Official statistics provide an indispensable
element in the information system of a democratic
society (Fundamental Principles of Official
Statistics, UNSD).
- Sample surveys are used to provide estimates for
target parameters at population (or national)
level and also for subpopulations or domains of
study.
- However, implementation in a small area context is
challenging.
4Small Area Estimation Problem
- In small areas/domains, sample sizes are usually
not large enough to provide reliable estimates
using classical design-based methods.
- The small area estimation problem refers to SMALL
SAMPLE SIZES (or none at all) in the domain or
area of interest.
52. History
- Small area estimation in the UK began as a research
project in the late 1990s.
- In response to calls for locally focussed
information in many different areas:
- Environmental
- Business
- Social, e.g. health, housing, deprivation,
unemployment.
- Also calls for more general domain estimation
- e.g. cross-classifications by age/sex,
occupation.
- Initial experimental studies on mental health
estimation for DoH.
6Developing alternative methodology
- Purpose
- To enable production of reliable estimates of
characteristics of interest for small areas or
domains based on very small or no sample.
- To assess the quality (precision) of estimates.
- Several years of research and development (since
1995)
- Partnership work with universities and Statistics
Finland
- The EURAREA project
- Research programme funded by Eurostat to enhance
techniques to meet European needs
(2001-2004)
7Basis of Approach: Relax the Survey Restriction
- Borrow strength by removing the isolation of
depending solely on the survey and solely on
respondents in a given area.
- Widen the class of respondents for a given area
by pooling together similar areas.
- Widen the class of respondents by taking past
period respondents into account.
- Take advantage of other related data sources
which are not sample survey based.
- Known as auxiliary data.
- e.g. administrative data or census data which
are available for all areas/domains.
8Model based estimation
- All approaches detailed are based on an implicit
or explicit model.
- Using auxiliary data together with survey data from
all areas is the approach currently adopted in
the UK.
- Borrows strength nationally.
- Uses an explicit statistical model to represent
the relationship between the survey variable of
interest and auxiliary data.
- Dependent variable is the survey variable of
interest.
- Independent variables are certain auxiliary data
variables known as covariates.
- Model fitted using sample data and assumed to
apply generally.
- Model then used to obtain area/domain
estimates.
9Outline of a model structure
- Suppose the variable of interest, Y, in an area j is
linearly related to a single covariate X.
- A possible model structure is given by
- $\bar{Y}_j = \alpha + \beta X_j$
- where $\bar{Y}_j$ is the mean of Y in area j.
- This is a deterministic structure, so we need to
add some random variability.
10
- Obtain $\bar{Y}_j = \alpha + \beta X_j + u_j$
- $u_j$ represents random area differences from the
deterministic value.
- $\sigma_u^2$ represents variability between areas.
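The step from a deterministic line to a model with random area differences can be illustrated with a small simulation. This is only a sketch: the parameter values, the normal distribution for the area terms and the function name `simulate_area_means` are assumptions made for illustration and are not taken from the slides.

```python
import numpy as np

def simulate_area_means(x, alpha, beta, sigma_u, rng):
    """Simulate area means as a straight line in x plus a random area term u_j."""
    x = np.asarray(x, dtype=float)
    # One random area difference per area (assumed normal with sd sigma_u)
    u = rng.normal(loc=0.0, scale=sigma_u, size=x.shape)
    return alpha + beta * x + u, u

rng = np.random.default_rng(seed=0)
x = np.linspace(0.0, 1.0, 200)          # hypothetical covariate, one value per area
y_bar, u = simulate_area_means(x, alpha=2.0, beta=3.0, sigma_u=0.5, rng=rng)
deterministic = 2.0 + 3.0 * x           # the line without random variability
```

The gap between `y_bar` and `deterministic` is exactly the simulated area term, and its spread reflects the between-area variability.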
11Model fitting
- Fit the model using direct survey estimates for
each area.
- This introduces additional sampling variability.
- Unit level sampling variability
- giving rise to additional area level sampling
variability
14Estimating from the model
- Once the model is fitted, estimate for area j by
using parameter estimates
- Estimate of mean squared error given by
- Modelling success measured by obtaining estimates
with high precision based on low mean squared
errors.
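As a rough illustration of fitting on direct estimates and then estimating for every area, the sketch below uses ordinary least squares on simulated data. It is a simplification under stated assumptions: a real fit would account for the area and sampling variance components, the data are synthetic, and the function names are hypothetical. It does not reproduce the estimator or the mean squared error formula from the slides.

```python
import numpy as np

def fit_area_level_ols(x_sampled, direct_estimates):
    """Least squares fit of direct area estimates on a single covariate."""
    design = np.column_stack([np.ones_like(x_sampled), x_sampled])
    coef, *_ = np.linalg.lstsq(design, direct_estimates, rcond=None)
    return coef  # [intercept estimate, slope estimate]

def synthetic_estimates(coef, x_all):
    """Apply the fitted line to every area, sampled or not."""
    return coef[0] + coef[1] * np.asarray(x_all, dtype=float)

rng = np.random.default_rng(1)
x_all = rng.uniform(0.0, 1.0, size=500)                 # covariate known for all areas
sampled = rng.choice(500, size=150, replace=False)      # areas with survey sample
true_means = 10.0 + 4.0 * x_all + rng.normal(0.0, 0.3, size=500)
direct = true_means[sampled] + rng.normal(0.0, 1.0, size=150)  # noisy direct estimates

coef = fit_area_level_ols(x_all[sampled], direct)
est = synthetic_estimates(coef, x_all)
```

The point of the example is that `est` has a value for all 500 areas even though only 150 contributed survey data, because the covariate is available everywhere.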
15Current provision
- SAEP: a generic methodology for application to
variables from household-based surveys.
- Mean household income based on Family Resources
Survey published as Experimental Statistics for
wards in 1998/99 and 2001/02, and for middle layer
super output areas in 2004/05.
- Specialised methodology for labour market
estimation of unemployment from Labour Force
Survey.
- Unemployment levels and rates routinely published
quarterly as National Statistics for Local
Authority Districts in Great Britain.
16SAEP methodology and income estimation
- SAEP methodology is
- derived from the outlined model-based approach,
- BUT is
- based on a unit (household)/area multilevel
model
- borrows strength across areas using multivariate
area level auxiliary data (covariates)
- can model a transformation of the variable of interest
if required
- adapted for estimating at ward/middle layer super
output area (MSOA) level from customary ONS clustered
design household sample surveys
17Application to income estimation - Response
Variable
- Income value for each household sampled in Family
Resources Survey (FRS).
- 3,300 MSOAs in England and Wales with sample in
2004/05,
- 21,500 total responding households.
- But not a simple random sample.
- Clustered design with primary sampling units as
postcode sectors,
- 1,500 sampled postcode sectors.
18Coping with design clustering
- Samples are random samples of postcode sectors
- So random terms are around postcode sectors,
indexed by j
- Estimation is required for geographically
distinct wards or middle layer super output
areas
- So covariates are for these areas, indexed by d
- For estimation, covariates must be known for all
areas, not just sampled areas.
19SAEP model and estimator structure for income
estimation
- Multilevel structure gives rise to unit level
random term replacing area sampling
variability
- Logarithmic transformation of income taken
because of positive skewness of income
distribution
- Model
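The model equation itself is not reproduced in this transcript. A plausible form, consistent with the description (household i in postcode sector j, area-level covariates indexed by d, log-transformed income), might be written as below. The symbols $\alpha$, $\beta$, $X_d$, $u_j$, $e_{ijd}$ and the normality assumptions are illustrative assumptions, not taken from the slides.

```latex
\ln(y_{ijd}) = \alpha + \beta^{T} X_d + u_j + e_{ijd},
\qquad u_j \sim N(0, \sigma_u^2),
\qquad e_{ijd} \sim N(0, \sigma_e^2)
```

Here $u_j$ stands for the postcode sector random term and $e_{ijd}$ for the unit (household) level random term.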
20SAEP model fitting procedure
- Create a dataset containing
- Variable of interest from individual household
responses to survey.
- Values of a large number of administrative and
census variables for the particular household's
area of residence which we believe could impact
on the variable of interest, e.g. census variables, DWP
social benefit claimant rates, council tax band
proportions
21SAEP model fitting procedure (cont.)
- Starting with a null model, fit covariates in a
stepwise manner in order of significance using
specialised multilevel software, e.g. MLwiN or
SAS PROC MIXED.
- In this way select a set of significant
covariates and fit an accepted model.
- Use diagnostic techniques to check the model
against assumptions, e.g. randomness of residuals,
unbiasedness of predictions.
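The stepwise idea can be sketched in a few lines. Note the assumptions: this uses ordinary least squares rather than a multilevel model, a t-statistic threshold of 3 as the significance rule, and simulated data. It is not the MLwiN or SAS PROC MIXED procedure, and the function names are hypothetical.

```python
import numpy as np

def ols_t_stats(design, y):
    """Return coefficients and their t-statistics from a least squares fit."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    dof = design.shape[0] - design.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return coef, coef / np.sqrt(np.diag(cov))

def forward_stepwise(covariates, y, t_threshold=3.0):
    """Start from the null model; add the most significant covariate each round."""
    n, k = covariates.shape
    selected = []
    while len(selected) < k:
        best_j, best_t = None, t_threshold
        for j in range(k):
            if j in selected:
                continue
            cols = selected + [j]
            design = np.column_stack([np.ones(n), covariates[:, cols]])
            _, t = ols_t_stats(design, y)
            # The candidate's t-statistic is the last entry (it is the last column)
            if abs(t[-1]) > best_t:
                best_j, best_t = j, abs(t[-1])
        if best_j is None:      # no remaining covariate passes the threshold
            break
        selected.append(best_j)
    return selected

rng = np.random.default_rng(2)
covariates = rng.normal(size=(400, 5))   # five hypothetical candidate covariates
y = 1.0 + 2.0 * covariates[:, 0] - 1.5 * covariates[:, 3] + rng.normal(scale=0.5, size=400)
selected = forward_stepwise(covariates, y)
```

In this simulated example only covariates 0 and 3 truly affect the response, so they should be the first two picked up; diagnostics on residuals would follow as a separate step.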
23Estimator and mean squared error
- Estimator on log income scale
- A synthetic estimator is used omitting the random
area terms
- Mean squared error
25Converting to raw income scale
- Need to make allowance for
- mean(log) ≠ log(mean)
- Area estimate
- Confidence interval
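Why an allowance is needed can be seen numerically. The sketch below assumes incomes are exactly log-normal and applies the standard log-normal mean correction exp(mu + sigma^2 / 2). The slide's own area estimate and confidence interval formulas are not reproduced in this transcript, so this is an illustration of the principle rather than the SAEP formula; all values are simulated.

```python
import numpy as np

def lognormal_mean(mu_log, sigma2_log):
    """Mean on the raw scale of a log-normal variable with given log-scale mean and variance."""
    return np.exp(mu_log + 0.5 * sigma2_log)

rng = np.random.default_rng(3)
mu_log, sigma2_log = 6.0, 0.49           # hypothetical log-scale parameters
incomes = rng.lognormal(mean=mu_log, sigma=np.sqrt(sigma2_log), size=200_000)

log_incomes = np.log(incomes)
naive = np.exp(log_incomes.mean())       # exp(mean(log)): underestimates the mean
corrected = lognormal_mean(log_incomes.mean(), log_incomes.var())
sample_mean = incomes.mean()
```

Simply exponentiating the mean of the logs gives the geometric mean, which sits below the arithmetic mean for a positively skewed distribution; the corrected value lands close to the sample mean.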
26Actual model for ward estimation of income in
2004/05
- phrpman: proportion of household reference
persons aged 16-74 who are in professional or
managerial occupations.
- lnphrpecac: logit of
proportion of household reference persons aged
16-74 who are economically active.
- lnphhtype1: logit of proportion of one person
households.
- engegh: proportion of council tax
band GH dwellings for England.
- pcgeo: proportion of people aged 60 and over claiming
pension credit (guarantee element only).
28Income estimation outputs
- Estimates obtained of sufficient precision for
publication and acceptable to user community.
- Accredited as Experimental Statistics
- Placed on Neighbourhood Statistics website
together with user guides and technical
documentation.
29Estimation of unemployment at local authority
level
- BACKGROUND
- Unemployment is a key indicator and is used for
policy making and resource allocation
- Official UK measure of unemployment follows the
International Labour Organisation (ILO)
definition
- ILO unemployment is estimated via the Labour
Force Survey (national level)
- Small (local) sample sizes in the LFS for some
areas
30Features of Labour Force Survey
- A rotating panel survey
- Roughly 60,000 households surveyed each quarter
- Each household remains in sample for 5 quarters
(waves 1 to 5) then drops out
- Waves 1 and 5 respondents for last four quarters
used to obtain an annual local labour force
survey dataset of about 90,000 independent
households.
- Unclustered survey design giving a sample in
each LAD.
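A back-of-envelope check of the annual dataset size, assuming the five waves are of equal size (an assumption; the slide gives only rounded figures), lands in the same region as the quoted "about 90,000". The small helper also shows why the households are independent: within any four-quarter window a household can contribute its wave 1 or its wave 5 interview, but never both.

```python
households_per_quarter = 60_000   # rounded figure from the slide
waves = 5

per_wave = households_per_quarter // waves      # households in each wave per quarter
annual = per_wave * 2 * 4                       # waves 1 and 5, over four quarters

def waves_in_window(entry_quarter, window):
    """Which of a household's wave 1 / wave 5 interviews fall inside the window."""
    interviews = {1: entry_quarter, 5: entry_quarter + 4}
    return [wave for wave, quarter in interviews.items() if quarter in window]

window = range(1, 5)                            # four consecutive quarters
max_appearances = max(len(waves_in_window(entry, window)) for entry in range(-3, 5))
```

Under these assumptions `annual` comes to 96,000, which is broadly consistent with the rounded slide figure once the "roughly 60,000" starting point is taken into account.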
31Features of unemployment modelling
- Unclustered LFS design means
- direct estimates available for each LAD
- availability of estimated random area terms in
LAD estimation
- However
- low precision of direct survey estimates due to
small sample sizes
- need for better precision model-based estimates
- Availability of a highly correlated covariate:
number of claimants of unemployment benefit/job
seekers allowance
- Eliminates need for model fitting to a range of
possible covariates on each occasion.
32 The small area estimation model
- A LOGISTIC multilevel model by local authority
(d) and six age/sex classes (i). It models the
probability $p_{di}$ that an individual is
unemployed.
- Response variable: proportion of unemployed
individuals in LFS in age/sex class of local
authority (logit transformed).
- Covariate data
- Benefit data: the logit of the claimant
proportion of job seekers allowance in each
age/sex class within each local authority and
also for overall age/sex classes
- The age/sex class: male/female for age groups (16
to 24, 25 to 49, 50 and over)
- Geographical region: the 12 government office
regions (GOR)
- ONS area classification: 7 categories under the
National Statistics Area Classification for Local
Authorities
33
- The model used to link $p_{di}$ with the auxiliary
data is a Binomial linear mixed model with a
logistic link function:
- $\text{logit}(p_{di}) = x_{di}^{T}\beta + u_d$
- where $u_d$ is the area random effect.
34Estimator from model
- The model-based estimator of the proportion
unemployed in each age/sex group of each LAD is
then given, after fitting the model, by
- Note the use of the estimated area random effect term in the estimator,
as it is now available for each LAD.
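A minimal sketch of how a fitted logistic mixed model turns covariates and an estimated area effect into a proportion is given below. The coefficient values, the two-element covariate vector and the function names are hypothetical; only the logistic link and the inclusion of the area random effect come from the slides.

```python
import numpy as np

def expit(z):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    return np.log(p / (1.0 - p))

def predicted_proportion(x_di, beta_hat, u_hat_d):
    """Estimated proportion: linear predictor plus estimated area effect, mapped back through expit."""
    return expit(x_di @ beta_hat + u_hat_d)

beta_hat = np.array([-0.2, 0.9])            # hypothetical: intercept, slope on logit claimant proportion
x_di = np.array([1.0, logit(0.04)])         # hypothetical claimant proportion of 4%
p_without_area_effect = predicted_proportion(x_di, beta_hat, 0.0)
p_with_area_effect = predicted_proportion(x_di, beta_hat, 0.15)
```

Setting the area term to zero gives a purely synthetic prediction; a positive estimated area effect shifts the proportion upwards for that LAD.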
35Model-based estimate for Unemployment
- Model has estimated a proportion at each age/sex
group
- This is converted into an estimate of
unemployment level at each LAD by
- multiplying each proportion estimate by the LFS
estimate of population unsampled
- adding those sampled and found unemployed
- summing the age/sex group estimates
- Final estimator for unemployment level for area d
is the resulting sum over the 6 age-sex groups
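The three conversion steps can be sketched as a single function. The argument names and the example figures are hypothetical, and the final estimator formula from the slide is not reproduced in this transcript, so this follows only the verbal description.

```python
import numpy as np

def unemployment_level(p_hat, population_estimate, n_sampled, sampled_unemployed):
    """Sum over age/sex groups of: sampled unemployed + p_hat * unsampled population."""
    p_hat = np.asarray(p_hat, dtype=float)
    unsampled = np.asarray(population_estimate, dtype=float) - np.asarray(n_sampled, dtype=float)
    return float(np.sum(np.asarray(sampled_unemployed, dtype=float) + p_hat * unsampled))

# Hypothetical LAD with six age/sex groups
p_hat = [0.12, 0.05, 0.03, 0.10, 0.04, 0.02]
population_estimate = [4000, 15000, 12000, 3800, 15500, 13000]
n_sampled = [12, 40, 35, 10, 42, 38]
sampled_unemployed = [2, 2, 1, 1, 2, 1]

level = unemployment_level(p_hat, population_estimate, n_sampled, sampled_unemployed)
```

Sampled individuals contribute their observed status directly, so the model is only used to predict for the unsampled part of each group.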
36LAD Estimation of unemployment rate
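- The estimate of unemployment rate is obtained
using the model-based estimate of unemployment level
and the direct estimate of employment
- $\text{rate}_d = \hat{U}_d / (\hat{U}_d + \hat{E}_d)$
- where $\hat{U}_d$ is the model-based estimate of unemployment
and $\hat{E}_d$ is the direct survey estimate of employment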
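As a small worked example, the rate is the unemployed as a share of the economically active (unemployed plus employed), which is the usual ILO convention. The function name and the figures are hypothetical.

```python
def unemployment_rate(unemployed_model_based, employed_direct):
    """Unemployed as a share of the economically active (unemployed + employed)."""
    economically_active = unemployed_model_based + employed_direct
    return unemployed_model_based / economically_active

# Hypothetical LAD: 3,000 unemployed (model-based), 57,000 employed (direct estimate)
rate = unemployment_rate(3000.0, 57000.0)
```

Because the numerator is model-based and the employment figure is a direct estimate, the precision of the rate depends on both sources.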
37Precision of Estimates
- The mean squared error (MSE) for the unemployment
level estimates in LAD d is given by several
components
- G1 and G2 come from the uncertainty in estimating
the coefficients and u in the model
- G3 arises because we have estimated the
variance of u
- G4 is necessary because the model estimates
actual values rather than means
- G5 is the additional variance component due to the
estimation of population size in each
LAD
38Unemployment estimates publication
- The standard errors of the model-based estimates
were found to be smaller than the corresponding direct
standard errors in each LAD.
- Model-based estimates have been accredited as
National Statistics and are now published quarterly
in Labour Market statistics releases.
- (http://www.statistics.gov.uk/StatBase/Product.asp?vlnk=14160)
393. Developments in progress
- Labour Market area
- Consistent estimation of all three labour market
states: employed, not economically active,
unemployed
- Currently Local Authority labour market
estimates are
- Model-based estimates for unemployment
- Direct survey estimates for economic
inactivity and employment figures
- Now developing a multivariate model to estimate
concurrently the number of unemployed, employed and
economically inactive people by local authority
40 Compositional data
- The proportions of individuals classified in each
category are
- Proportions bounded between 0 and 1 and subject
to a unity-sum constraint.
- Multinomial Logistic model to relate labour
market probabilities with auxiliary data for all
categories is therefore defined with only 2
equations.
41Multinomial Logistic Model
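The reason three labour market states need only two equations can be shown with a short sketch. It assumes "employed" is taken as the reference state (an assumption; the slides' equations are not reproduced in this transcript), and the linear predictor values are hypothetical.

```python
import numpy as np

def multinomial_probabilities(eta_unemployed, eta_inactive):
    """Three state probabilities from two linear predictors; the reference state has predictor 0."""
    exponentials = np.exp(np.array([0.0, eta_unemployed, eta_inactive]))
    return exponentials / exponentials.sum()   # [employed, unemployed, inactive]

# Hypothetical linear predictors for one age/sex group in one area
probs = multinomial_probabilities(eta_unemployed=-2.5, eta_inactive=-1.0)
log_odds_unemployed = np.log(probs[1] / probs[0])
```

The probabilities are automatically bounded between 0 and 1 and sum to one, and each linear predictor is recovered as the log-odds of its state against the reference state.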
43The Model
- Relates the probabilities of labour market states
to the following predictors
- Age/sex group, geographical region and ONS area
classification
- Benefit data: claimant proportions (JSA) and
incapacity benefit
- Other variables will be tested (e.g. income
support)
44Model-based estimate for all Labour Market States
- Model estimates a proportion for each labour
market state at each age/sex group
- Final estimator for a labour market state j for
area d is a sum over the 6 age-sex groups,
computed for all labour market states
45Development stage of multinomial model
- Current stage
- Development of SAS programs to calculate
precision of the multinomial estimates based on
methodology proposed by Saei (2006)
- Model selection and test of other covariates
- Model cross validation including several time
periods
- Up to now
- Implementation of the multinomial model indicates
that plausible estimates can be obtained for all
labour market states when simultaneously modelled
46 Developments in progress (cont.)
- Labour Market area
- Unemployment estimation at Parliamentary
constituency level
- Non-nested geography but with certain matching
areas
- Issue here is to ensure consistency with local
authority estimates at comparable areas
- Model developed and estimates likely to become
available in the coming year
47Developments in progress (cont.)
- Income estimation
- Estimation at local authority level
- Clustered survey design entails a modification of
the SAEP framework to cater for it
- Currently in development
- Estimation of poverty: proportion of households
below threshold
- Currently being developed for MSOA/local
authority level
484. Wider research activities
- In conjunction with academic partners
- Estimation of change over time
- Current work is confined to single point-in-time
estimation but users would like an indication of
progress over time, particularly in relation to
funding
- Estimation of poverty using M-quantile modelling
- Research using FRS data by Nikos Tzavidis
- Models incorporating spatial relationships
- Preliminary investigation of spatial relationships
in the unemployment model in conjunction with Ayoub
Saei at Southampton University
- Link with work at Imperial College by Nicky Best
and Virgilio Gomez-Rubio
495. Methodology Consultancy Service
- ONS is currently establishing a methodology
consultancy service
- To undertake and support statistical work by
other government departments and public sector
organisations.
- Resource for assessment/quality improvement
- Currently working with Health and Safety
Executive on small area estimation of incidence
of work-related illness at local authority level.
50References
- Small Area Estimation Project Report. Model-Based
Small Area Estimation Series No. 2, ONS, January
2003.
- Developments in small area estimation in UK with
focus in current research. Clarke, P., Mcgrath,
K., Chandra, H., Tzavidis, N. (2007). IASS
Satellite Meeting on Small Area Estimation, Pisa.
- Model Based Estimates of Income for Middle Layer
Super Output Areas 2004/05 Technical Report, ONS,
September 2007.
- http://neighbourhood.statistics.gov.uk/HTMLDocs/images/Technical Report 2004_05 v2 - Final_tcm97-53513.pdf
- http://neighbourhood.statistics.gov.uk/dissemination/MetadataDownloadPDF.do?downloadId=21704
- Development of improved estimation methods for
local area unemployment levels and rates. Labour
Market Trends, vol. 111, no. 1.
- www.statistics.gov.uk/cci/article.asp?id=372
- Summary publication accompanying the publication
of the 2003 unemployment estimates, November 2004.
- http://www.statistics.gov.uk/downloads/theme_labour/ALALFS/AnnexA.pdf