Philip Clarke and Denise Silva - PowerPoint PPT Presentation

1
Development of Small Area Estimation at ONS
  • Philip Clarke and Denise Silva

2
Outline
  • Small Area Estimation Problem
  • History and current provision
  • Development in progress
  • Wider research
  • Consultancy service

3
1. Small Area Estimation Problem
  • Official statistics provide an indispensable
    element in the information system of a democratic
    society (Fundamental Principles of Official
    Statistics, UNSD)
  • Sample surveys are used to provide estimates for
    target parameters at population (or national)
    level and also for subpopulations or domains of
    study
  • However, implementation in a small area context
    is challenging

4
Small Area Estimation Problem
  • In small areas/domains sample sizes are usually
    not large enough to provide reliable estimates
    using classical design-based methods.
  • The small area estimation problem refers to SMALL
    SAMPLE SIZES (or no sample in the domain or area
    of interest)
  • In sample surveys an area or domain is regarded
    as small if the sample within the area is not
    large enough to provide direct survey estimates
    of adequate precision
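
To make the precision problem concrete, here is a minimal sketch assuming simple random sampling, where the standard error of a sample mean is the standard deviation divided by the square root of the sample size. The standard deviation of 300 and the sample sizes are illustrative assumptions, not ONS figures.

```python
import math


def direct_standard_error(sd: float, n: int) -> float:
    """Standard error of a sample mean under simple random sampling."""
    if n <= 0:
        # No sample in the area: a direct estimate cannot be formed at all.
        raise ValueError("no sample in the area")
    return sd / math.sqrt(n)


# Illustrative unit-level standard deviation (an assumption).
sd = 300.0
for n in (2500, 100, 4):
    print(n, direct_standard_error(sd, n))
```

The standard error grows from 6 at the largest sample size to 150 at a sample of four, which is the sense in which an area is "small".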

5
2. History
  • Small Area Estimation in the UK began as a
    research project in the late 1990s.
  • In response to calls for locally focussed
    information in many different areas
  • Environmental
  • Business
  • Social, e.g. health, housing, deprivation,
    unemployment.
  • Also calls for more general domain estimation
  • e.g. cross classifications by age/sex,
    occupation.
  • Initial experimental studies on mental health
    estimation for DoH.

6
Developing alternative methodology
  • Purpose
  • To enable production of reliable estimates of
    characteristics of interest for small areas or
    domains based on very small or no sample.
  • To assess the quality (precision) of estimates.
  • Several years of research and development (since
    1995)
  • UK (ONS, UoS, IoE), Finland (Statistics
    Finland, UoJ)
  • The EURAREA project
  • Research programme funded by Eurostat to enhance
    techniques to meet European needs (2001 to
    2004)

7
Basis of Approach: Relax the Survey Restriction
  • Borrow strength by removing the isolation of
    depending solely on the survey and solely on
    respondents in a given area.
  • Widen the class of respondents for a given area
    by pooling together similar areas.
  • e.g. by Region - consider all respondents within
    a region when estimating for any of its areas.
  • Widen the class of respondents by taking past
    period respondents into account.
  • Take advantage of other related data sources
    that are not sample-survey based, known as
    auxiliary data.
  • e.g. Administrative data or census data which
    are available for all areas/domains.
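
The pooling idea can be sketched as follows. The regions, areas and values are invented purely for illustration, and the function names are hypothetical.

```python
from statistics import mean

# Hypothetical survey responses: (region, area, value). All values invented.
responses = [
    ("North", "A1", 410.0), ("North", "A1", 390.0),
    ("North", "A2", 450.0),
    ("North", "A3", 430.0), ("North", "A3", 420.0), ("North", "A3", 400.0),
]


def direct_mean(area):
    """Mean over respondents in this area only; None if the area has no sample."""
    values = [v for _, a, v in responses if a == area]
    return mean(values) if values else None


def pooled_mean(region):
    """Mean over all respondents in the region, usable for any of its areas."""
    return mean(v for r, _, v in responses if r == region)


print(direct_mean("A2"))     # rests on a single respondent
print(direct_mean("A4"))     # None: no sample in the area
print(pooled_mean("North"))  # rests on all six respondents in the region
```

Pooling buys a larger effective sample at the cost of assuming the areas in a region are alike, which is why auxiliary data are then used to distinguish between them.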

8
Model based estimation
  • All approaches detailed are based on an implicit
    or explicit model.
  • Using auxiliary data together with data from all
    areas is the approach currently adopted.
  • Borrows strength nationally.
  • Uses an explicit statistical model to represent
    the relationship between the survey variable of
    interest and auxiliary data.
  • Dependent variable is survey variable of
    interest.
  • Independent variables are certain auxiliary data
    variables known as covariates.
  • Model fitted using sample data and assumed to
    apply generally.
  • Model then used to obtain area/domain estimates.

9
Outline of a model structure
  • Suppose the variable of interest, Y, in an area j
    is linearly related to a variable X, which is
    available from a non-survey source, in
    approximately the following manner
  • This is a deterministic structure, so we need to
    add some random variability

10
  • Obtain
  • u_j represents the random difference of area j
    from the deterministic value.
  • The variance of the u_j represents variability
    between areas.
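
The equations on slides 9 and 10 are missing from this transcript. One reading consistent with the surrounding bullets, assuming a simple linear area-level model, is given below. The symbols for the intercept, slope and area variance are chosen here for illustration and are not taken from the slides.

```latex
% Slide 9: deterministic linear relationship (assumed form)
Y_j \approx \alpha + \beta X_j

% Slide 10: the same relationship with a random area term added
Y_j = \alpha + \beta X_j + u_j,
\qquad \mathrm{E}(u_j) = 0,
\qquad \mathrm{Var}(u_j) = \sigma_u^2
```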

11
Model fitting
  • Fit the model using direct survey estimates for
    each area.
  • This introduces additional sampling variability.
  • Unit level sampling variability
  • giving rise to additional area level sampling
    variability
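
The symbols on this slide are also missing. As a hedged sketch, suppose the linear area model of slides 9 and 10 is fitted to the direct estimate from n_j sampled units in area j, and that unit-level errors have a common variance under simple random sampling within the area. Under those assumptions the two sources of variability could be written as:

```latex
% Unit level: response of sampled unit i in area j (assumed form)
y_{ij} = \alpha + \beta X_j + u_j + e_{ij},
\qquad \mathrm{Var}(e_{ij}) = \sigma_e^2

% Area level: the direct estimate averages the n_j sampled units
\bar{y}_j = \alpha + \beta X_j + u_j + \bar{e}_j,
\qquad \mathrm{Var}(\bar{e}_j) = \sigma_e^2 / n_j
```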

12
Estimating from the model
  • Once the model is fitted, estimate for area j by
    using parameter estimates
  • Estimate of mean squared error given by
  • Modelling success measured by obtaining estimates
    with high precision based on low mean squared
    errors.
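
The estimator and mean squared error formulas on this slide are not in the transcript. The simulation below is only a sketch of the general idea: it fits a straight line to direct estimates by ordinary least squares (a simplification, not a mixed-model fit) and compares errors against a simulated truth, which is only possible because the data are generated. All parameter values are assumptions.

```python
import random

rng = random.Random(0)

# Simulation settings (all illustrative assumptions).
ALPHA, BETA = 5.0, 2.0      # true regression parameters
SIGMA_U = 1.0               # between-area standard deviation
SIGMA_E = 10.0              # unit-level standard deviation
N_AREAS, N_PER_AREA = 200, 4

xs, truths, directs = [], [], []
for _ in range(N_AREAS):
    x = rng.uniform(0.0, 10.0)
    truth = ALPHA + BETA * x + rng.gauss(0.0, SIGMA_U)
    sample = [truth + rng.gauss(0.0, SIGMA_E) for _ in range(N_PER_AREA)]
    xs.append(x)
    truths.append(truth)
    directs.append(sum(sample) / N_PER_AREA)  # direct survey estimate

# Least-squares fit of the direct estimates on the covariate.
x_bar = sum(xs) / N_AREAS
d_bar = sum(directs) / N_AREAS
beta_hat = (sum((x - x_bar) * (d - d_bar) for x, d in zip(xs, directs))
            / sum((x - x_bar) ** 2 for x in xs))
alpha_hat = d_bar - beta_hat * x_bar

# Model-based estimate for every area, from the fitted parameters only.
model_estimates = [alpha_hat + beta_hat * x for x in xs]


def mean_squared_error(estimates, targets):
    """Average squared difference between estimates and the simulated truth."""
    return sum((e - t) ** 2 for e, t in zip(estimates, targets)) / len(targets)


mse_direct = mean_squared_error(directs, truths)
mse_model = mean_squared_error(model_estimates, truths)
print(round(mse_direct, 2), round(mse_model, 2))
```

With these settings the direct estimates carry a sampling variance of about 25 per area, while the model-based estimates are limited mainly by the between-area variance, so the second figure printed should be much smaller than the first.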

13
Current provision
  • SAEP: a generic methodology for application to
    variables from household-based surveys.
  • Mean household income based on the Family
    Resources Survey published as Experimental
    Statistics for wards in 1998/99 and 2001/02, and
    for middle layer super output areas in 2004/05
  • Specialised methodology for labour market
    estimation of unemployment from Labour Force
    Survey.
  • Unemployment levels and rates routinely published
    quarterly as National Statistics for Local
    Authority Districts in Great Britain.

14
SAEP methodology and income estimation
  • SAEP methodology is derived from the outlined
    model-based approach, BUT it
  • is based on a unit (household)/area multilevel
    model
  • borrows strength across areas using multivariate
    area level auxiliary data (covariates)
  • can model a transformation of the variable of
    interest if required
  • is adapted for estimating at ward/middle layer
    super output area (MSOA) level from customary ONS
    clustered design household sample surveys
  • uses a synthetic estimator, which depends only on
    fixed effect model terms.

15
Application to income estimation: Respondent Variable
  • Income value for each household sampled in Family
    Resources Survey (FRS).
  • 3,000 wards in England and Wales with sample in
    2001/02,
  • 23,000 total responding households.
  • But not a random sample.
  • Clustered design with primary sampling units as
    postcode sectors,
  • 1,500 sampled postcode sectors.

16
Coping with design clustering
  • Samples are random samples of postcode sectors
  • So random terms are around postcode sectors,
    indexed by j
  • Estimation is required for geographically
    distinct wards or middle layer super output
    areas
  • So covariates are for these areas, indexed by d
  • For estimation, covariates must be known for all
    areas not just sampled areas.

17
SAEP model and estimator structure for income estimation
  • Multilevel structure gives rise to unit level
    random term replacing area sampling
    variability
  • Logarithmic transformation of income taken
    because of positive skewness of income
    distribution
  • Model
  • Estimator

18
SAEP model fitting procedure
  • Create a dataset containing
  • Variable of interest from individual household
    responses to survey.
  • Values of a large number of administrative and
    census variables for the household's area of
    residence that we believe could affect the
    variable of interest, e.g. census variables, DWP
    social benefit claimant rates, council tax band
    proportions

19
SAEP model fitting procedure (cont.)
  • Starting with a null model, fit covariates in a
    stepwise manner in order of significance using
    specialised multilevel software, e.g. MLwiN or
    SAS PROC MIXED.
  • In this way select a set of significant
    covariates and fit an accepted model.
  • Use diagnostic techniques to check the model
    against its assumptions, e.g. randomness of
    residuals, unbiasedness of predictions.
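
The stepwise idea can be sketched in a few lines. This is not the MLwiN or PROC MIXED procedure: it assumes ordinary least squares instead of a multilevel model, and it uses the relative drop in residual sum of squares with an arbitrary 10% threshold as a stand-in for a significance test. The data and the names `ols_rss` and `forward_select` are hypothetical.

```python
import random


def ols_rss(columns, y):
    """Residual sum of squares from regressing y on an intercept plus columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations: (X'X | X'y).
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[pivot] = A[pivot], A[k]
        for r in range(p):
            if r != k:
                f = A[r][k] / A[k][k]
                A[r] = [a - f * b for a, b in zip(A[r], A[k])]
    coef = [A[r][p] / A[r][r] for r in range(p)]
    return sum((y[i] - sum(c * v for c, v in zip(coef, X[i]))) ** 2
               for i in range(n))


def forward_select(candidates, y, min_gain=0.10):
    """Start from the null model; add the covariate with the largest RSS drop
    until the best relative drop falls below min_gain."""
    chosen = []
    rss = ols_rss([], y)
    while len(chosen) < len(candidates):
        trials = {name: ols_rss([candidates[c] for c in chosen + [name]], y)
                  for name in candidates if name not in chosen}
        best = min(trials, key=trials.get)
        if (rss - trials[best]) / rss < min_gain:
            break
        chosen.append(best)
        rss = trials[best]
    return chosen


# Simulated data: y depends on x1 and x2 but not on "noise".
rng = random.Random(42)
n = 200
candidates = {
    "x1": [rng.gauss(0, 1) for _ in range(n)],
    "x2": [rng.gauss(0, 1) for _ in range(n)],
    "noise": [rng.gauss(0, 1) for _ in range(n)],
}
y = [1.0 + 2.0 * a - 1.0 * b + rng.gauss(0, 0.5)
     for a, b in zip(candidates["x1"], candidates["x2"])]

selected = forward_select(candidates, y)
print(selected)
```

In practice the stopping rule would be a formal test on the multilevel fit, and the diagnostics named on the slide would follow the selection.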

20
Converting to raw income scale
  • Need to make allowance for
  • mean(log) ≠ log(mean)
  • Area estimate
  • Confidence interval
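
The need for an allowance can be seen numerically. The sketch below assumes log-normally distributed incomes with invented parameters and applies the standard log-normal mean correction, exp(mean of logs + variance of logs / 2). This is a textbook device; the slide's own area estimate and confidence interval formulas are not in the transcript and are not reproduced here.

```python
import math
import random

rng = random.Random(1)

# Simulated incomes: log-normal with assumed parameters.
MU, SIGMA = 6.0, 0.8
incomes = [math.exp(rng.gauss(MU, SIGMA)) for _ in range(20000)]
n = len(incomes)

mean_income = sum(incomes) / n
logs = [math.log(v) for v in incomes]
mean_log = sum(logs) / n
var_log = sum((v - mean_log) ** 2 for v in logs) / (n - 1)

# Exponentiating the mean of the logs gives the geometric mean,
# which lies below the arithmetic mean for a skewed distribution.
naive = math.exp(mean_log)

# Log-normal correction: add half the variance on the log scale.
adjusted = math.exp(mean_log + var_log / 2.0)

print(round(mean_income, 1), round(naive, 1), round(adjusted, 1))
```

The naive back-transform understates the mean, while the adjusted value should land close to it when the log-normal assumption holds.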

21
Actual model for ward estimation of income in 2001/02
  • pmanprof: proportion of people professional and
    managerial
  • phhtyp7: proportion of two person h/h with
    non-dependent children
  • pjsa: proportion of people claiming job seekers
    allowance
  • enggh: proportion of council tax band GH
    dwellings for England
  • walegh: proportion of council tax band GH
    dwellings for Wales

22
Income estimation outputs
  • Estimates obtained of sufficient precision for
    publication and acceptable to user community.
  • Placed on Neighbourhood Statistics website
    together with user guides and technical
    documentation.
  • Accredited as Experimental Statistics

23
Estimation of unemployment at local authority level
  • BACKGROUND
  • Unemployment is a key indicator and is used for
    policy making and resource allocation
  • Official UK measure of unemployment follows the
    International Labour Organisation (ILO)
    definition
  • ILO unemployment is estimated via the Labour
    Force Survey (national level)
  • LFS sample sizes are small at local level

24
Features of Labour Force Survey
  • A rolling longitudinal survey
  • Roughly 60,000 households surveyed each quarter
  • Each household remains in sample scope for 5
    quarters, known as waves 1 to 5
  • Then household drops out
  • Waves 1 and 5 respondents for last four quarters
    used to obtain an annual local labour force
    survey dataset of about 90,000 independent
    households.
  • Unclustered survey design giving a sample in
    each LAD.
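
The independence of the annual dataset follows from the rotation pattern, which can be checked with a short sketch. It assumes five equal-sized waves per quarter and labels the last four quarters 0 to 3; both are simplifications made here.

```python
QUARTERS = range(4)      # the last four quarters, labelled 0..3
WAVES_USED = (1, 5)

# A household in wave w at quarter q entered the survey at quarter q - (w - 1).
entry_cohorts = [q - (w - 1) for q in QUARTERS for w in WAVES_USED]

# Every (quarter, wave) pair maps to a different entry cohort,
# so no household is counted twice.
distinct = len(set(entry_cohorts)) == len(entry_cohorts)
print(distinct)                          # True

households_per_quarter = 60_000
per_wave = households_per_quarter // 5   # assumes five equal-sized waves
annual_total = len(entry_cohorts) * per_wave
print(annual_total)                      # 96000
```

Eight distinct cohorts of roughly 12,000 households give a figure of the same order as the slide's "about 90,000".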

25
Features of unemployment modelling
  • Unclustered LFS design means
  • direct estimates are available for each LAD, but
    small sample sizes give them low precision, hence
    the need for more precise model-based estimates
  • no separate sampling and estimation areas as for
    SAEP modelling with clustered surveys
  • Availability of estimated random area terms in
    LAD estimation.
  • Availability of a highly correlated covariate:
    the number of claimants of unemployment
    benefit/job seekers allowance
  • Eliminates need for model fitting to a range of
    possible covariates on each occasion.

26
Specification of model
  • Modelling is by six age/sex groups in each LAD.
  • age groups: 16 to 24, 25 to 49, 50 and over
  • Survey data variable
  • proportion of responders in each age/sex group in
    each LAD who are unemployed. (logit transformed)
  • Covariate data
  • Proportion of people claiming job seekers
    allowance in particular age/sex group in LAD
    (logit transformed)
  • Proportion of people claiming job seekers
    allowance in LAD as whole. (logit transformed)
  • dummy variables
  • Indicator of sex/age group
  • Indicator of Government Office Regions for LAD.
  • Indicator of Supergroup of National Statistics
    Area Classification for LAD.

27
  • The response is the LFS proportion of individual
    respondents who were unemployed in age/sex group
    i and LAD d
  • The covariates form a vector at area by age/sex
    class level
  • Random term specified at LAD level, d
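
The model equation itself is missing from this transcript. A hedged reconstruction from slides 26 and 27 follows, assuming a linear mixed model on the logit scale with a single LAD-level random term. The notation is chosen here for illustration.

```latex
% p_{id}: LFS proportion unemployed in age/sex group i of LAD d
% x_{id}: covariate vector; u_d: LAD-level random term; e_{id}: residual
\operatorname{logit}(p_{id})
  = \log\frac{p_{id}}{1 - p_{id}}
  = \mathbf{x}_{id}^{\top} \boldsymbol{\beta} + u_d + e_{id}
```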

28
Estimator from model
  • The model-based estimator of proportion
    unemployed in each age/sex group of each LAD is
    then given after fitting model by
  • Note the use of the estimated random area term
    in the estimator, as it is now available for
    each LAD.
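
The estimator formula is missing from the transcript. A minimal sketch of the back-transformation it implies, assuming the prediction is the inverse logit of the fixed-effect linear predictor plus the estimated LAD term, is shown below. The coefficient values and the function name `predicted_proportion` are hypothetical.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def expit(z: float) -> float:
    """Inverse of the logit transformation."""
    return 1.0 / (1.0 + math.exp(-z))


def predicted_proportion(x, beta, u_d):
    """Back-transform the linear predictor plus the LAD random term."""
    linear = sum(b * v for b, v in zip(beta, x))
    return expit(linear + u_d)


# Hypothetical inputs: an intercept and the logit of a 3% claimant proportion.
x = [1.0, logit(0.03)]
beta = [0.2, 0.9]                                # invented coefficients
print(predicted_proportion(x, beta, u_d=0.0))    # fixed effects only
print(predicted_proportion(x, beta, u_d=0.1))    # with a positive LAD term
```

Setting the LAD term to zero gives a purely synthetic prediction, so the sketch also shows how this estimator differs from the SAEP income estimator.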

29
Determination of LAD unemployment level
  • Model has estimated a proportion at each age/sex
    group.
  • This is converted into an estimate of
    unemployment level at each LAD by
  • multiplying each proportion estimate by the LFS
    estimate of the unsampled population
  • adding those sampled and found unemployed
  • summing the age/sex group estimates
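
The three steps can be sketched as a short function. Only two age/sex groups are shown for brevity (the model uses six), and every number and field name is invented for illustration.

```python
def unemployment_level(groups):
    """Sum over age/sex groups of the model-based unemployed among the
    unsampled population plus those sampled and found unemployed."""
    total = 0.0
    for g in groups:
        predicted_unsampled = g["model_proportion"] * g["population_unsampled"]
        total += predicted_unsampled + g["sampled_unemployed"]
    return total


# Hypothetical LAD with two age/sex groups.
groups = [
    {"model_proportion": 0.10, "population_unsampled": 1000, "sampled_unemployed": 3},
    {"model_proportion": 0.05, "population_unsampled": 2000, "sampled_unemployed": 2},
]
level = unemployment_level(groups)
print(level)
```

Here each group contributes about 100 predicted unemployed people from the unsampled population, plus 3 and 2 observed in the sample, giving a level of about 205.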

30
Determination of LAD unemployment rate
  • The unemployment rate is obtained by dividing the
    estimate of level by the sum of the direct
    estimate of employment and the model-based
    estimate of unemployment
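
An unemployment rate is the unemployed as a share of the economically active, that is employed plus unemployed. The sketch below assumes the denominator combines a direct estimate of employment with the model-based unemployment level. The figures and the function name are hypothetical.

```python
def unemployment_rate(unemployed_model: float, employed_direct: float) -> float:
    """Unemployed as a share of the economically active (employed + unemployed)."""
    active = employed_direct + unemployed_model
    if active <= 0:
        raise ValueError("economically active total must be positive")
    return unemployed_model / active


# Hypothetical LAD: 205 unemployed (model-based), 3,895 employed (direct).
rate = unemployment_rate(205.0, 3895.0)
print(rate)  # 0.05
```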

31
Unemployment estimates publication
  • The standard errors of the model-based estimates
    were found to be smaller than the corresponding
    direct standard errors in each LAD.
  • Model-based estimates have been accredited as
    National Statistics and are now published
    quarterly in Labour Market statistics releases.

32
3. Developments in progress
  • Labour Market area
  • Unemployment estimation at Parliamentary
    constituency level
  • Issue here is to ensure consistency with LAD
    estimates at comparable areas
  • Model developed and estimates likely to become
    available in the coming year
  • Consistent estimation of all three labour market
    states - employed, not economically active,
    unemployed
  • Multivariate model currently in development

33
  • Income estimation
  • Estimation at local authority level
  • Clustered survey design requires a modification
    of the SAEP framework
  • Currently in development
  • Estimation of poverty/households below threshold
  • Currently being developed for local authority
    level

34
4. Wider research activities
  • In conjunction with academic partners
  • Estimation of change over time
  • Current work is confined to single point-in-time
    estimation, but users would like an indication of
    progress over time, particularly in relation to
    funding
  • Estimation of poverty using M-quantile modelling
  • Research using FRS data by Nikos Tzavidis
  • Models incorporating spatial relationships
  • Preliminary investigation of spatial relationship
    in unemployment model in conjunction with Ayoub
    Saei at Southampton University
  • Link with work at Imperial College by Nicky Best
    and Virgilio Gomez-Rubio

35
5. Methodology Consultancy Service
  • ONS is currently establishing a methodology
    consultancy service
  • To undertake and support statistical work by
    other government departments and public sector
    organisations.
  • Resource for assessment/quality improvement
  • Currently working with Health and Safety
    Executive on small area estimation of incidence
    of work related illness at local authority level.