MoSeS Starts for the Promised Land

Alternative Futures ASAP Research Cluster Seminar, 16th November 2005

1
MoSeS Starts for the Promised Land
  • Andy Turner
  • Outline
    • Introduction
    • Population Modelling Progress
    • Next Steps
    • Feedback

2
Introduction

3
  • A religious story?
  • Lost in the Desert?
  • Our heading?
  • The Promised Land
  • SIM-UK
  • GeoSIM

4
Modelling and Simulation for e-Social Science
(MoSeS)
  • Mark Birkin, Martin Clarke, Phil Rees, Andy
    Turner, Belinda Wu
  • (School of Geography)
  • Haibo Chen
  • (Institute for Transport Studies)
  • Justin Keen
  • (Institute for Health Sciences)
  • John Hodrien, Paul Townend, Jie Xu
  • (School of Computing)

5
MoSeS is a Node of the National Centre for
e-Social Science
  • http://www.ncess.ac.uk
  • NCeSS aims to investigate, promote and support
    the use of eScience in social science research

6
eScience
  • Based on Grid Computing and collaboration
  • What is Grid Computing?
  • Many definitions
  • A move towards ubiquitous computing
  • A service/protocol for sharing Information
    Technology (IT) resource over the Internet
  • Computer scientists are building the next
    generation of computational infrastructure
  • "The Grid intends to make access to computing
    power, scientific data repositories and
    experimental facilities as easy as the Web makes
    access to information." (Tony Blair, 2002)

7
eScience
  • Grid Computing Environments and The Grid
  • Enhance capabilities for IT resource sharing for
    research
  • Is about providing easy and secure access to
    massive computational resources, software and
    data, promoting collaborative working in virtual
    organisations
  • e-Social Science is eScience geared towards
    applications specific to social science,
    including a major part of geography

8
MoSeS Aims and Objectives
  • Raise awareness of eScience and eResearch
  • Develop practical geographical e-Social Science
    applications demonstrating the potential of Grid
    Computing
  • Model the UK human population at individual and
    higher organisational levels
  • households, communities, regions
  • disparate and/or geographically diffuse
    organisations and society
  • service orientated government
  • Develop and package a suite of modelling tools
    that allows specific research and policy
    questions to be addressed, with demonstrator
    applications for
    • Health
    • Business
    • Transport

9
MoSeS Initial Tasks
  • Develop methods to generate individual human
    population data for the UK from 2001 UK human
    population census data
  • Develop a Toy Model
  • Develop a dynamic agent-based microsimulation
    modelling toolkit and apply it to simulate
    change in the UK
  • Develop applications for
    • Health
    • Business
    • Transport

10
MoSeS Challenges
  • Grid enabling the data and tools
  • Visualisation
  • Google Earth
  • Computer Games
  • Collaboration
  • Retaining a problem focus
  • Design and Development

11
MoSeS Current Parallel Developments
  • Belinda Wu is working on the applications
    beginning with a Toy Model for Leeds
  • Paul Townend is working on Grid Enabling
  • Andy Turner is focussing on the population
    modelling
  • The MoSeS team are meeting regularly and plan a
    launch some time next year when we hope to have
    something impressive to show off to NCeSS
    colleagues and invited guests from the eScience
    community, government and business

12
MoSeS Human Population Model
  • Current focus on the contemporary situation
    looking forwards over the next 25 years
  • Data are wanted primarily for individuals
    grouped into households
  • Need to develop a method to synthesise and enrich
    data since available census and social survey
    data is not sufficient in coverage and detail
  • A method was outlined in the proposal
  • This is being implemented and results are being
    tested

13
Population Modelling Method
  • To select a fitting set of individual records
    from the 2001 UK Population Census 3% Individual
    Sample of Anonymised Records (ISAR) to represent
    the individuals for regions given by 2001 UK
    Population Census Area Statistics (CAS)
  • Initial focus is for regions called Output Areas
  • Smallest Census Output Areas
  • Typically about 300 people, 100 households
  • Begin with Leeds and scale up to the UK

14
Combination
  • Given the population (p) of an Output Area we
    want to select a sub-sample of this size from the
    n = 1843525 records in the ISAR
  • The general formula for finding the number of
    permutations of size p taken from n objects is
    $P(n, p) = \frac{n!}{(n - p)!}$
  • Approximately $n^p$ (since p is much smaller
    than n)
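
To get a feel for the size of this number, the count of permutations can be evaluated on a log scale. The sketch below is illustrative only: it assumes an Output Area population of p = 300 (the "typically about 300 people" figure) and uses the log-gamma function to avoid building enormous integers. The function and variable names are hypothetical.

```python
import math

def log10_permutations(n: int, p: int) -> float:
    """Return log10 of n! / (n - p)! using log-gamma (lgamma(k + 1) = ln k!)."""
    return (math.lgamma(n + 1) - math.lgamma(n - p + 1)) / math.log(10)

n = 1843525  # number of ISAR records, as quoted on the slide
p = 300      # assumed typical Output Area population

exact_log10 = log10_permutations(n, p)   # log10 of the exact permutation count
approx_log10 = p * math.log10(n)         # log10 of the approximation n^p

print(f"log10 of n!/(n-p)! = {exact_log10:.2f}")
print(f"log10 of n^p       = {approx_log10:.2f}")
```

Under these assumptions both values come out close to 1880, so the number of ordered selections for a single Output Area has roughly 1880 digits, and the approximation n^p differs from the exact count by only a small factor.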

15
Computation
  • Number of potential solutions too great to find
    the best fitting solution by a brute force
    search?
  • Probably, yes, even using all the computational
    power of The Grid
  • Interestingly the number of potential solutions
    is even greater for larger regions than Output
    Areas (although there are fewer of them)
  • Fortunately we are only interested in specific
    types of solution and can constrain our search
  • For some criteria hard constraints are
    appropriate and for other variables optimisation
    is the key within these constraints

16
Constraints
  • What can we constrain to?
  • There are limits
  • The more detailed the constraint criteria, the
    less likely they can be met
  • The ISAR is only a 3% sample
  • Specific CAS tabulations
  • The aggregations of variables are bespoke
  • Beware of errors, especially those
    systematically introduced by disclosure control
    measures
  • Census data are estimates and contain an unknown
    level of error
  • What is most important to get right?
  • Age/Gender profile
  • Number of Household Reference People
  • Household Composition
  • Social Class
  • Health status etc

17
Getting to Grips with ISAR and CAS data
  • 2001 UK Census data is unusual (like most census
    data)
  • Details are lost by aggregation and accuracy is
    deliberately worsened via the application of
    disclosure control measures
  • This is done for confidentiality reasons and as
    users we are forced to appreciate this
  • On the one hand this generates jobs, on the other
    hand, it renders census data almost useless for
    supporting certain applications
  • Details on UK Census data including ISAR and CAS
    are available via
  • http://www.statistics.gov.uk/census/
  • Usefully, 2001 CAS tables that do not currently
    exist can be commissioned
  • There is an application procedure for gaining
    access to Controlled Access Microdata Sample
    (CAMS) records from the 2001 Census
  • The data is supposedly better
  • It will be hard for us to use due to the way it
    is controlled

18
CAS
  • Themed Tables
    • 6 cross tabulations
    • E.g. CT001, Theme Table On All Dependent
      Children, 348 cells
  • Univariate Tables
    • 43 tabulations
    • E.g. UV003, Sex, 3 cells
  • Key Statistics Tables
    • 31 tabulations
    • E.g. KS001, Usually Resident Population,
      6 cells
  • Standard offerings
    • 53 cross tabulations
    • E.g. CS001, Age/Sex/Resident Type, 250 cells

19
Constraint and Optimisation using Key Statistics
  • As a first step we have constrained by age and
    ensured that we have the correct number of
    household reference people
  • Makes it easier to construct households for Toy
    Model
  • Our fitness function is a simple Sum of Squared
    Errors (SSE) for a number of aggregate variables
  • Measure of the difference between aggregate
    counts from the ISAR records and the published
    and aggregated CAS Key Statistics
  • Initial focus on health and household composition
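
A minimal sketch of such a Sum of Squared Errors fitness function follows. It is not the MoSeS implementation: the record layout (one dictionary of 0/1 indicators per selected ISAR record), the function names and the target counts are all made up for illustration. Only the two variable names are taken from the optimisation variables used in this work.

```python
from typing import Dict, List

Record = Dict[str, int]  # hypothetical layout: variable name -> 0/1 indicator

def aggregate_counts(records: List[Record], variables: List[str]) -> Dict[str, int]:
    """Sum each indicator variable over the selected records."""
    return {v: sum(r[v] for r in records) for v in variables}

def sum_of_squared_errors(records: List[Record], targets: Dict[str, int]) -> int:
    """SSE between aggregated record counts and published aggregate counts."""
    counts = aggregate_counts(records, list(targets))
    return sum((counts[v] - targets[v]) ** 2 for v in targets)

# Three selected records and invented target counts for one area.
selected = [
    {"peopleWhoseGeneralHealthWasGood": 1, "peopleWithLimitingLongTermIllness": 0},
    {"peopleWhoseGeneralHealthWasGood": 1, "peopleWithLimitingLongTermIllness": 0},
    {"peopleWhoseGeneralHealthWasGood": 0, "peopleWithLimitingLongTermIllness": 1},
]
targets = {
    "peopleWhoseGeneralHealthWasGood": 3,
    "peopleWithLimitingLongTermIllness": 1,
}

error = sum_of_squared_errors(selected, targets)  # (2 - 3)^2 + (1 - 1)^2 = 1
print(error)
```

A lower value means the selected records reproduce the published aggregates more closely; zero would be a perfect match on the chosen variables.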

20
Optimisation Variables
  • Health variables
    • peopleWhoseGeneralHealthWasGood
    • peopleWhoseGeneralHealthWasFairlyGood
    • peopleWhoseGeneralHealthWasNotGood
    • peopleWithLimitingLongTermIllness
    • peopleWithoutLimitingLongTermIllness (Derived)
  • Household Composition variables
    • oneFamilyAndNoChildren (Derived)
    • marriedOrCohabitingCoupleWithChildren (Derived)
    • loneParentHouseholdsWithChildren (Derived)
  • (Derived) means calculated from other variables

21
Optimisation and Goodness of Fit
  • Initially for each Output Area in Leeds we
    generated 10000 possibly different solutions and
    picked the best one
  • Now we are using a genetic algorithm to assist in
    finding a better solution
  • More strategic
  • Constraints form genes
  • Effectively each genetic bit string is an ordered
    boolean array for the ISAR
  • AGE0 and HRP order
  • Currently the genetic algorithm works by
    breeding, mutation and survival of the fittest
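
The breeding, mutation and survival loop can be sketched as below. This is a toy illustration under stated assumptions, not the MoSeS code: it stores each solution as lists of selected record indices per age band rather than as a boolean array over the whole ISAR, treats the age band quota as the only hard constraint (the Household Reference Person constraint is left out), and uses synthetic records. Every name, quota, target and parameter value here is hypothetical.

```python
import random
from typing import Dict, List

Record = Dict[str, int]          # e.g. {"age_band": 0, "good_health": 1, "llti": 0}
Solution = Dict[int, List[int]]  # age band (one "gene") -> selected record indices

def sse(sol: Solution, records: List[Record], targets: Dict[str, int]) -> int:
    """Sum of squared errors between selected counts and target counts."""
    chosen = [i for idxs in sol.values() for i in idxs]
    return sum((sum(records[i][v] for i in chosen) - t) ** 2
               for v, t in targets.items())

def random_solution(pool, quotas, rng) -> Solution:
    """Pick exactly quotas[band] distinct records from each age band."""
    return {band: rng.sample(pool[band], k) for band, k in quotas.items()}

def breed(a: Solution, b: Solution, rng) -> Solution:
    """Crossover: copy each age-band gene whole from one parent or the other."""
    return {band: list(rng.choice((a[band], b[band]))) for band in a}

def mutate(sol: Solution, pool, rng) -> None:
    """Swap one selected record for an unused record of the same age band."""
    bands = [b for b, idxs in sol.items() if idxs]
    if not bands:
        return
    band = rng.choice(bands)
    unused = [i for i in pool[band] if i not in sol[band]]
    if unused:
        sol[band][rng.randrange(len(sol[band]))] = rng.choice(unused)

def evolve(records, pool, quotas, targets, size=40, generations=50, seed=0):
    rng = random.Random(seed)
    population = [random_solution(pool, quotas, rng) for _ in range(size)]
    for _ in range(generations):
        # Survival of the fittest: keep the better half unchanged.
        population.sort(key=lambda s: sse(s, records, targets))
        survivors = population[: size // 2]
        children = []
        while len(survivors) + len(children) < size:
            child = breed(rng.choice(survivors), rng.choice(survivors), rng)
            mutate(child, pool, rng)
            children.append(child)
        population = survivors + children
    return min(population, key=lambda s: sse(s, records, targets))

# Synthetic stand-in for the ISAR: 300 records in three age bands.
data_rng = random.Random(1)
records = [{"age_band": i % 3,
            "good_health": data_rng.randint(0, 1),
            "llti": data_rng.randint(0, 1)} for i in range(300)]
pool = {b: [i for i, r in enumerate(records) if r["age_band"] == b] for b in range(3)}
quotas = {0: 10, 1: 12, 2: 8}            # hard constraint: people per age band
targets = {"good_health": 20, "llti": 7}  # invented aggregate counts to match

best = evolve(records, pool, quotas, targets)
print(sse(best, records, targets))
```

Because crossover and mutation only ever exchange records within the same age band, every solution in every generation satisfies the age constraint, and the search effort goes entirely into improving the fit on the optimisation variables. Keeping the survivors unchanged means the best fitness never gets worse from one generation to the next.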

22
Next Steps 1
  • Constraints
  • Additional constraint by gender
  • Should improve household formation
  • Need to use Standard CAS cross tabulations
  • Problems due to confidentiality
  • Perhaps need to consider larger regions than
    Output Areas
  • Begin investigating what other constraints
    are possible
  • Leeds
  • UK
  • Identify problem Output Areas
  • Optimisation
  • Use more optimisation variables
  • Experiment with the genetic algorithm

23
Next Steps 2
  • Testing
  • Examine results
  • Mapping
  • Optimised variables
  • Exogenous variables
  • Grid Enabling
  • Data
  • Provenance
  • Toy Model
  • Publication

24
MoSeS Recap
  • We are developing a dynamic geographic
    microsimulation of the UK
  • A model comprising individual people who
    occupy the UK environment and move about it
    through time, interacting in numerous ways
  • Each individual will have family, household and
    social networks and reasonably complex
    characteristics and behaviour
  • The idea is to build a platform for simulating
    change in the UK for ASAP

25
Thank you!
  • Any feedback or questions?
  • Please email
  • A.G.D.Turner_at_leeds.ac.uk
  • http://www.ncess.ac.uk

26
Acknowledgements
  • Thanks to all involved in the production of the
    maps that I grabbed off the internet for the
    start of this presentation