Managing Metadata for Statistical Models

Provided by: andreww59

1
Managing Metadata for Statistical Models
  • Andrew Westlake
  • Survey Statistical Computing
  • Imperial College London
  • WWW.SASC.CO.UK

2
Introduction
  • OPUS project
  • Optimising the use of Partial information in
    Urban and regional Systems
  • Strong Transport bias in partners, but aims are
    generic
  • Data Integration through Statistical Models
  • Describe main concepts
  • Meta-data Management
  • Use existing ideas (cf. MetaNet)
  • Extensions into representation of Statistical
    Models
  • Structure and Concepts
  • Including construction process
  • Support for users of results from the models
  • Provide information about Provenance and
    Reliability of results
  • Explore tools to deliver this information
    effectively

3
OPUS - objectives
  • Data Integration: the Holy Grail
  • Can we bring together information from multiple
    datasets
  • Can we combine what we already know with evidence
    from new datasets
  • In ways that are
  • Coherent
  • Formalised
  • Transparent
  • Central role of the Statistical Model
  • $\text{Model} + \text{Knowledge}_i + \text{Evidence}_i \rightarrow \text{Model} + \text{Knowledge}_{i+1}$
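
The update from one knowledge state to the next can be illustrated with a minimal sketch. It assumes a conjugate Beta-Binomial model for a single proportion, which is chosen here only because the update is exact and short; the slides do not say which models OPUS uses. The class `BetaKnowledge`, the function `update`, and the survey counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaKnowledge:
    """Knowledge about a proportion, held as a Beta(alpha, beta) distribution."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def update(knowledge: BetaKnowledge, successes: int, trials: int) -> BetaKnowledge:
    """Combine Knowledge_i with Evidence_i (binomial counts) to give Knowledge_{i+1}."""
    return BetaKnowledge(knowledge.alpha + successes,
                         knowledge.beta + (trials - successes))


# Hypothetical evidence from two sources about the same proportion.
k0 = BetaKnowledge(1.0, 1.0)                  # vague starting knowledge
k1 = update(k0, successes=30, trials=100)     # e.g. a household survey
k2 = update(k1, successes=12, trials=50)      # e.g. an on-board survey
print(k0, k1, k2, sep="\n")
```

In this sketch the order in which the two datasets arrive does not change the final state, which is one way of reading "coherent" in the list above.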

4
Examples of Multiple Data Sources
  • UK Crime Statistics
  • Household Surveys: British Crime Survey
  • Police Statistics: Reported Crime
  • Different selection mechanisms for who responds
    and what is reported
  • Transport for London
  • Household Surveys (LATS)
  • On-Board surveys (RODS, BODS)
  • Road-side counting and interviews
  • Automatic sensors: ticket gates, road loops
  • In-car tracking
  • Census
  • Different sources give different partial but
    overlapping views of the same underlying system

5
Model-centred approach
  • Formulate Statistical Model using domain
    understanding
  • Yields Likelihood for data observations
  • Optimise Likelihood to reduce Uncertainty about
    model Parameters
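
As a rough illustration of these three steps, the sketch below assumes a very simple model (independent Poisson counts with one unknown rate), writes down its log-likelihood, and maximises it numerically. The counts are invented, and `poisson_log_likelihood` and `maximise` are hypothetical helper names; a golden-section search stands in for whatever optimiser a real fitting application would use.

```python
import math


def poisson_log_likelihood(rate: float, counts: list[int]) -> float:
    """Log-likelihood of independent Poisson counts at a given rate (rate > 0)."""
    return sum(c * math.log(rate) - rate - math.lgamma(c + 1) for c in counts)


def maximise(f, lo: float, hi: float, tol: float = 1e-8) -> float:
    """Golden-section search for the maximum of a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d          # the maximum lies in [a, d]
        else:
            a = c          # the maximum lies in [c, b]
    return (a + b) / 2


counts = [4, 7, 5, 6, 3, 5]   # hypothetical road-side counts
rate_hat = maximise(lambda r: poisson_log_likelihood(r, counts), 0.01, 50.0)

# Remaining uncertainty, approximated from the curvature at the maximum:
# the second derivative is -sum(counts) / rate^2, so se = rate / sqrt(sum(counts)).
std_error = rate_hat / math.sqrt(sum(counts))
print(rate_hat, std_error)
```

For this model the maximum can also be found analytically (it is the sample mean), which makes the numerical result easy to check.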

6
Information about Statistical Models
  • Provenance and Reliability of Results from Models
  • Alongside substantive results/estimates
  • Taken from meta-data
  • Why is it needed?
  • Because model results depend on the model and the
    fit
  • Aimed at subsequent users of results
  • Provide confidence and understanding
  • Different needs for different user skills
  • Storage
  • Structure, implemented as XML Documents
  • Presentation
  • Guided or ad hoc exploration
  • Specialised requirements in some domains

7
OPUS Meta-data Components
  • Multiple Statistical Models of any system
  • Focus on different sub-systems
  • Different levels of abstraction
  • Functional Form of Model Specification
  • Variables, Parameters
  • Derivations, constraints and stochastic
    relationships
  • Fitting Steps
  • Links to datasets: how are Data variables linked
    to Model ones
  • Methods used and outcomes
  • Knowledge States
  • Knowledge (uncertainty distributions) for
    Parameters
  • Each Fit produces a new State
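
One way to picture how these components relate is the sketch below. It is not the StatModel schema: every class and field name is hypothetical, and the relationships are held as plain strings rather than MathML. It only shows a specification, fitting steps, and knowledge states, with each recorded fit adding a new state.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModelSpecification:
    name: str
    variables: list[str]
    parameters: list[str]
    relationships: list[str]   # derivations, constraints, stochastic relationships


@dataclass
class KnowledgeState:
    state_id: str
    parameter_knowledge: dict[str, str]   # parameter -> uncertainty distribution


@dataclass
class FittingStep:
    dataset: str
    variable_links: dict[str, str]        # data variable -> model variable
    method: str
    prior_state: Optional[str]            # state the fit started from, if any
    resulting_state: str


@dataclass
class StatModelRecord:
    specification: ModelSpecification
    states: dict[str, KnowledgeState] = field(default_factory=dict)
    fits: list[FittingStep] = field(default_factory=list)

    def record_fit(self, step: FittingStep, new_state: KnowledgeState) -> None:
        """Store a fitting step together with the new state it produced."""
        if step.prior_state is not None and step.prior_state not in self.states:
            raise ValueError(f"unknown prior state: {step.prior_state}")
        if new_state.state_id != step.resulting_state or new_state.state_id in self.states:
            raise ValueError("each fit must produce a new, matching state")
        self.states[new_state.state_id] = new_state
        self.fits.append(step)


spec = ModelSpecification("TripRate", ["y"], ["rate"], ["y ~ Poisson(rate)"])
record = StatModelRecord(spec)
record.states["s0"] = KnowledgeState("s0", {"rate": "Gamma(1, 1)"})
step = FittingStep("LATS", {"ntrips": "y"}, "MCMC", prior_state="s0", resulting_state="s1")
record.record_fit(step, KnowledgeState("s1", {"rate": "empirical distribution"}))
print(list(record.states))
```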

8
Modelling in Opus
[Diagram. Labels recovered from the slide: System under study; Many Modellers;
Multiple Models (Specifications, Processes, Knowledge); Capture; StatModel;
Presentation; Dissemination System; Many Users]

9
StatModel Design
  • UML model of components and structure
  • Use hyperModel from XMLmodeling.com
  • Includes Profile for XML Schema
  • Generate XML Schemas
  • Control entry in XML editors, e.g. XML-Spy, for
    model instances
  • Basis for presentation design
  • XSLT for fixed presentation
  • Generic for whole models
  • Specific for special contexts
  • Web components for dynamic presentation
  • E.g. Model exploration through Influence Diagram
  • Cf. Statistical Presentations in Nesstar
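
The idea of a fixed presentation generated from a model instance can be sketched as follows. The XML element and attribute names are invented for illustration and are not taken from the StatModel schema, and plain Python with the standard library's `xml.etree.ElementTree` is used here in place of an XSLT stylesheet.

```python
import xml.etree.ElementTree as ET

# Hypothetical StatModel instance; element and attribute names are invented.
DOC = """
<StatModel name="TripRate">
  <Parameter id="rate" description="mean trips per household per day"/>
  <Parameter id="p" description="proportion of trips by public transport"/>
  <Fit id="fit1" dataset="LATS" method="MCMC" producesState="s1"/>
</StatModel>
"""


def render(xml_text: str) -> str:
    """Produce a fixed, plain-text listing of a model instance."""
    root = ET.fromstring(xml_text.strip())
    lines = [f"Model: {root.get('name')}"]
    for param in root.findall("Parameter"):
        lines.append(f"  Parameter {param.get('id')}: {param.get('description')}")
    for fit in root.findall("Fit"):
        lines.append(
            f"  Fit {fit.get('id')}: {fit.get('method')} on {fit.get('dataset')}"
            f" -> state {fit.get('producesState')}"
        )
    return "\n".join(lines)


listing = render(DOC)
print(listing)
```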

10
Main components of Model

11
Model Fit
  • Creates a State
  • May be based on a State
  • Use of Data
  • Links to Datasets
  • Mapping for Variables
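
A variable mapping of this kind might look like the sketch below, which renames dataset columns to the names used in the model before fitting. The function `apply_mapping`, the column names, and the sample row are all hypothetical.

```python
def apply_mapping(rows: list[dict], mapping: dict[str, str]) -> list[dict]:
    """Rename data variables to model variables, dropping unmapped columns."""
    for row in rows:
        missing = [data_var for data_var in mapping if data_var not in row]
        if missing:
            raise KeyError(f"data variables not found in dataset: {missing}")
    return [
        {model_var: row[data_var] for data_var, model_var in mapping.items()}
        for row in rows
    ]


# Hypothetical survey rows and the link from data variables to model variables.
survey_rows = [{"hh_id": 1, "ntrips": 4, "borough": "Camden"}]
variable_links = {"ntrips": "y", "hh_id": "household"}
model_rows = apply_mapping(survey_rows, variable_links)
print(model_rows)
```

Keeping the mapping as data rather than code means it can be stored in the Fit meta-data alongside the link to the dataset.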

12
Model State
  • A Knowledge Distribution for each Parameter not
    defined through a relationship
  • Bayesian Posterior knowledge from MCMC is always
    an Empirical Distribution
  • Dependencies possible through multivariate
    distributions
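
An empirical distribution is just the stored sample, so summaries for users have to be computed from it. The sketch below assumes the posterior draws for one parameter are available as a list of numbers; the helper names `summarise` and `ecdf` are hypothetical, and the draws are a made-up sequence rather than real MCMC output.

```python
import statistics


def summarise(samples: list[float]) -> dict[str, float]:
    """Mean, median and 5th/95th percentiles of an empirical distribution."""
    cuts = statistics.quantiles(samples, n=20)   # 19 cut points at 5% steps
    return {
        "mean": statistics.fmean(samples),
        "median": statistics.median(samples),
        "lower": cuts[0],     # 5th percentile
        "upper": cuts[-1],    # 95th percentile
    }


def ecdf(samples: list[float], x: float) -> float:
    """Empirical probability that the parameter is at most x."""
    return sum(1 for s in samples if s <= x) / len(samples)


# Stand-in for posterior draws of one parameter (not real MCMC output).
draws = [float(i) for i in range(1, 101)]
summary = summarise(draws)
print(summary, ecdf(draws, 50.0))
```

Dependencies between parameters would need the draws to be stored jointly, one row per MCMC iteration, rather than as separate lists.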

13
Creating StatModel Instances
  • Not main focus of project
  • Focus on structure and presentation
  • Mostly by hand
  • Use XML-Spy
  • Formulator for MathML expressions
  • Template files
  • XML Forms applications for some parts
  • E.g. Altova Authentic, MS InfoPath
  • Done by expert in discussion with modeller
  • Ideally
  • Statistical Model Design application creates
    Model Specification
  • Fitting applications read specification and
    create Fit and State meta-data

14
How? Metadata Capture
[Diagram. Labels recovered from the slide: Modeller; StatModel UML;
StatModel XML Documents; Validating XML Editor, e.g. XML Spy; Create;
Structure; Rules; StatModel XSD schema; Semantics; Specialised XML Editor,
e.g. Authentic; View/Edit; Rules and Layout; Export; Style sheet;
Modelling Application, e.g. WinBUGS (Design, Fit, Results); Import;
Other Application, e.g. MLwiN; Concepts, Semantics and Structure; Exchange]

15
Presentation Examples
  • London 2
  • Stylesheet listings, mathematics
  • Model Diagram, Process diagram
  • WP08
  • Model Sequence
  • WP11
  • Understanding complex WinBUGS
  • Show Doodle and Script in WinBUGS
  • Documentation in StatModel
  • Influence Diagram in StatModel

16
Conclusions
  • Propose structure for storing information about
    Statistical Models
  • Seems to work well for us
  • Refinement and application outside Opus needed
  • For end users, so must address presentation
  • Some basic tools demonstrated
  • Specialised solutions for application domain
    usually needed
  • Meta-data capture is a difficult issue
  • Integration into modelling applications
  • Encourage modellers to document and explain their
    choices
  • Much still to do
  • www.opus-project.org www.sasc.co.uk

17
Acknowledgements
  • Rajesh Krishnan, Imperial College London
  • Implementation of the web application
  • Miles Logie, Minnerva
  • Saikumar Chalisani, ETH Zürich
  • Contribution to initial ideas about StatModel
  • Software Used
  • hyperModel - XMLModeling.com, David Carlson
  • UML modelling for XML Schema
  • Formulator - www.hermitech.ic.zt.ua
  • MathML editor, integrates with XML Spy
  • XML Spy - www.altova.com
  • XML Editor and associated applications
  • JGraph - www.jgraph.com
  • Java Graph Visualization and Layout

18
End