1. Creating a Synthesized U.S. Agent Database for Agent-Based Modeling
ISDS Conference, October 12, 2007
Bill Wheaton
RTI International is a trade name of Research Triangle Institute
3040 Cornwallis Road, P.O. Box 12194
Research Triangle Park, North Carolina, USA 27709
Phone: 919-541-6158
E-mail: wdw@rti.org
Fax: 919-541-8830
2. Acknowledgments
- This work was funded under the Models of Infectious Disease Agent Study (MIDAS) by the National Institute of General Medical Sciences (NIGMS)
- RTI wishes to thank Irene Eckstrand and the MIDAS Steering Committee for funding and support.
- Prior Research and Techniques
  - Beckman, Richard J., Baggerly, Keith A., McKay, Michael D., Creating Synthetic Baseline Populations, Transportation Research, Vol. 30, No. 6, pp. 415-429, 1996.
  - Norman, Paul, Putting Iterative Proportional Fitting on the Researcher's Desk. Working Paper 99/03, School of Geography, University of Leeds.
  - TranSims: Transportation Analysis Simulation System.
  - http://transims.tsasa.lanl.gov
3. Microsimulation/Agent-based Models
- "Microsimulation methodologies aim at building large-scale data sets on the attributes of individuals or households and at analyzing policy impacts on these micro-units through the simulation of economic, demographic and social processes."
- "If we do not have a micro data base on individuals and households then there is a necessity to simulate one."
  -- Ballas, D., Clarke, G., Turton, I. Exploring Microsimulation Methodologies for the Estimation of Household Attributes, paper presented at the 4th International Conference on GeoComputation, Mary Washington College, VA, 25-28 July 1999.
- Thus the Idea: Produce a national, geospatially-explicit synthetic population for the United States.
4. Micro (Individual) vs. Macro (Aggregate) Data
- Macro/Aggregate Data
  - Census counts by geographic area
  - State, County, Census Tract, Block Group
  - Does not provide information on household structure
- Micro/Individual Data
  - Individual or Household-level data
  - Household structure maintained
5. Creating a Synthetic Population: Data Inputs and Techniques
- Block-group Level Demographics
  - SF3 (2000 decennial census)
- Public Use Microdata Sample (PUMS)
  - Actual Census long-form records (from U.S. Bureau of the Census, 2000)
  - Household and individual level data
  - Family structure maintained
  - 5% Sample within Public Use Microdata Areas (PUMAs)
  - PUMAs contain about 100,000 persons
- Household Locations
  - Randomly generated within block groups
- Iterative Proportional Fitting (IPF)
  - Uses conditional probabilities to fill out a synthetic population that matches SF3 counts based on PUMS microdata samples.
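As an illustration of the IPF step named above, here is a minimal two-dimensional sketch. It is not the MIDAS or TranSims code: the function name `ipf`, the choice of two attributes, and every number in the example tables are made up for demonstration. The seed table stands in for a PUMS cross-tabulation and the target margins stand in for SF3 block-group counts. The sketch assumes all seed cells are positive and that the two sets of targets share the same total.

```python
import numpy as np

def ipf(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Rescale a 2-D seed table until its margins match the targets.

    Assumes a strictly positive seed and targets with equal totals.
    """
    table = np.asarray(seed, dtype=float).copy()
    row_targets = np.asarray(row_targets, dtype=float)
    col_targets = np.asarray(col_targets, dtype=float)
    for _ in range(max_iter):
        # Row step: scale each row so that it sums to its target.
        table *= (row_targets / table.sum(axis=1))[:, None]
        # Column step: scale each column so that it sums to its target.
        table *= (col_targets / table.sum(axis=0))[None, :]
        # Columns now match exactly; stop once the rows also match.
        if np.abs(table.sum(axis=1) - row_targets).max() < tol:
            break
    return table

# Hypothetical seed: sample households cross-tabulated by
# household size (rows: 1, 2, 3+) and vehicles available (columns: 0, 1, 2+).
seed = np.array([[30.0, 50.0, 20.0],
                 [10.0, 60.0, 80.0],
                 [5.0, 40.0, 105.0]])

# Hypothetical block-group margins (both sum to 500 households).
row_targets = np.array([120.0, 200.0, 180.0])
col_targets = np.array([60.0, 190.0, 250.0])

fitted = ipf(seed, row_targets, col_targets)
print(fitted.round(1))
```

The fitted cells keep the association between the two attributes found in the seed while reproducing both sets of marginal counts, which is the property the slide describes.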
6. Geographical Context
- Counties
- Census Tracts
- Block Groups
- Public Use Microdata Areas (PUMAs)
- Households
- Clone particular records of the 5% PUMS sample (red outlines) to match census counts at block group level (black outlines)
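The cloning idea can be sketched as a weighted draw with replacement from the sample records. This is only an assumed illustration: the record fields, the weights, the block-group identifier, and the function `clone_households` are hypothetical, and the actual selection procedure used for the database is not described on this slide.

```python
import random

def clone_households(pums_records, weights, block_group_id, n_households, seed=0):
    """Draw n_households sample records with replacement and tag each copy.

    `weights` are relative selection weights (for example, derived from an
    IPF step); the original records are left unmodified.
    """
    rng = random.Random(seed)
    drawn = rng.choices(pums_records, weights=weights, k=n_households)
    return [
        dict(record, synth_id=f"{block_group_id}-{i}", block_group=block_group_id)
        for i, record in enumerate(drawn)
    ]

# Hypothetical sample records and weights.
pums_records = [
    {"serial": "A", "persons": 1, "vehicles": 0},
    {"serial": "B", "persons": 4, "vehicles": 2},
    {"serial": "C", "persons": 2, "vehicles": 1},
]
weights = [0.2, 0.5, 0.3]

# Hypothetical block group that needs 10 households.
synthetic = clone_households(pums_records, weights, "BG-001", 10)
for household in synthetic[:3]:
    print(household)
```

Because whole household records are copied, the persons attached to each sampled household travel with it, which is how family structure can be preserved.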
7. Transims Population Generator
- Transims: A transportation modeling package developed at Los Alamos National Lab
- Became the basis for EpiSims infectious disease modeling software
- Included development of code that uses IPF to generate a synthetic population
- Details in Beckman, Richard J., Creating Synthetic Baseline Populations, Transportation Research, Vol. 30, No. 6, pp. 415-429, 1996
8. IPF Attributes for MIDAS
- Works on HOUSEHOLD Attributes
- The MIDAS Synthetic Population Uses:
  - Persons
  - Population < 18
  - Workers in Family
  - Vehicles Available
  - Household Income
- Other household attributes could be used in future
9. Example Household and Persons
[Figure: a randomly selected synthetic household, showing its household attributes and its persons]
10. Results
- Households
  - 105,480,101 generated vs. 104,926,825 in census
  - X,Y locations
  - Household attributes
- Persons
  - 273,624,650 generated vs. 281,421,906 in census (Group Quarters persons subsequently added)
  - Individual attributes (age, sex, etc.)
- Family Structures Maintained
- Closely Matches Census Counts
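To put the totals above in relative terms, the short calculation below expresses each generated count as a percent difference from the census count. The helper name `pct_diff` is introduced here for illustration; the four counts are the ones on this slide.

```python
def pct_diff(generated, census):
    """Percent difference of the generated count relative to the census count."""
    return 100.0 * (generated - census) / census

households = pct_diff(105_480_101, 104_926_825)
persons = pct_diff(273_624_650, 281_421_906)

print(f"Households: {households:+.2f}%")  # +0.53%
print(f"Persons:    {persons:+.2f}%")     # -2.77%
```

Households are over-generated by about half a percent, while the person shortfall of roughly 2.8% is consistent with the note that Group Quarters persons were added afterward.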
11. Schools, Workplace Assignments
- Assign school-aged children to schools
  - We have locations of schools by grade and capacity for U.S.
  - Developed method of assigning school-aged children in synthetic population to schools
  - Assignments are generated for particular geographic area when needed
- Workers assigned to workplaces based on STP64 commuting patterns
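The slide does not say how children are matched to schools. One plausible approach, shown here purely as an assumption, is a greedy rule: send each child to the nearest school that serves the child's grade and still has capacity. The field names, coordinates, and the function `assign_schools` are all hypothetical.

```python
import math

def assign_schools(children, schools):
    """Greedy sketch: nearest school serving the grade with seats remaining.

    Returns {child_id: school_id or None}. This is an illustrative rule,
    not the assignment method used for the synthesized database.
    """
    remaining = {s["id"]: s["capacity"] for s in schools}
    assignments = {}
    for child in children:
        candidates = [
            s for s in schools
            if s["low_grade"] <= child["grade"] <= s["high_grade"]
            and remaining[s["id"]] > 0
        ]
        if not candidates:
            assignments[child["id"]] = None  # no eligible school with room
            continue
        nearest = min(
            candidates,
            key=lambda s: math.hypot(s["x"] - child["x"], s["y"] - child["y"]),
        )
        remaining[nearest["id"]] -= 1
        assignments[child["id"]] = nearest["id"]
    return assignments

# Hypothetical schools and children on an arbitrary planar grid.
schools = [
    {"id": "S1", "x": 0, "y": 0, "low_grade": 0, "high_grade": 5, "capacity": 1},
    {"id": "S2", "x": 10, "y": 0, "low_grade": 0, "high_grade": 5, "capacity": 5},
    {"id": "S3", "x": 1, "y": 1, "low_grade": 6, "high_grade": 8, "capacity": 5},
]
children = [
    {"id": "c1", "x": 1, "y": 0, "grade": 3},
    {"id": "c2", "x": 2, "y": 0, "grade": 4},
    {"id": "c3", "x": 0, "y": 0, "grade": 7},
    {"id": "c4", "x": 0, "y": 0, "grade": 11},
]

assignments = assign_schools(children, schools)
print(assignments)
```

In this toy run the second child is pushed to the farther school because the nearer one is already full, which shows how capacity interacts with distance under a greedy rule. Results depend on the order in which children are processed.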
12. Group Quarters
- Persons in Group Quarters accounted for 2.8% of U.S. population in 2000
- Group Quarters
  - Institutional
  - Non-Institutional
- For synthesized agent database, created Group Quarters locations (nursing homes, prisons, military bases) and synthesized agents to occupy them
13. Limitations & Gaps
- Ethnicity, Race and Other Personal Characteristics Not Used
- Group Quarters
  - In the works, currently not included
- Spatial Locations Could be Enhanced
- Geographic Limitations
  - Counties, Tracts, and BGs with few people are less accurate
14. Potential Usefulness to Syndromic Surveillance
- An aid in modeling/predicting effects of epidemics identified by syndromic surveillance
- Assign synthesized persons an "infected" code based on demographic characteristics and geographic location of actual patients (seeds)
- Use those agents as the seeds in agent-based modeling
- Run different mitigation scenarios through the model to predict different outcomes and to inform policy decisions
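The seeding step could look something like the sketch below, which flags one synthetic agent per observed patient by matching block group, sex, and approximate age. The matching rule, the `age_tolerance` parameter, the field names, and the function `seed_infections` are assumptions made for illustration; the slides do not specify the matching criteria.

```python
def seed_infections(agents, patients, age_tolerance=5):
    """Flag one not-yet-used agent per patient, matched on location and demographics.

    Returns new agent dicts with an added boolean `infected` field.
    Patients with no matching agent are simply skipped in this sketch.
    """
    infected_ids = set()
    for patient in patients:
        for agent in agents:
            if (
                agent["id"] not in infected_ids
                and agent["block_group"] == patient["block_group"]
                and agent["sex"] == patient["sex"]
                and abs(agent["age"] - patient["age"]) <= age_tolerance
            ):
                infected_ids.add(agent["id"])
                break  # one seed agent per patient
    return [dict(agent, infected=(agent["id"] in infected_ids)) for agent in agents]

# Hypothetical synthetic agents and de-identified patient records.
agents = [
    {"id": 1, "age": 34, "sex": "F", "block_group": "BG-001"},
    {"id": 2, "age": 36, "sex": "F", "block_group": "BG-001"},
    {"id": 3, "age": 70, "sex": "M", "block_group": "BG-002"},
    {"id": 4, "age": 8, "sex": "M", "block_group": "BG-001"},
]
patients = [
    {"age": 35, "sex": "F", "block_group": "BG-001"},
    {"age": 33, "sex": "F", "block_group": "BG-001"},
    {"age": 68, "sex": "M", "block_group": "BG-002"},
]

seeded = seed_infections(agents, patients)
seeds = [agent["id"] for agent in seeded if agent["infected"]]
print(seeds)
```

The flagged agents would then serve as the initial infected set for an agent-based model run under each mitigation scenario.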
15. Acknowledgments
- Production Team: Bernadette Chasteen, Justine Allpress, Michael Bacon, Jamie Cajka
- Support Team: Doug Roberts, Gana, Diglio Simoni, Phil Cooley, Diane Wagener
16. Conclusion