Provided by: PCS89
Title: National Data Network


1
National Data Network
  • Qld Discussion
  • 29 March 2004

2
The problem
  • Researchers and Policy analysts want easy and
    unrestricted access to data.
  • Agencies increasingly want to share/integrate
    data with each other
  • Administrative data sources are under-utilised
    for research/policy development
  • Custodians cannot just give data away. They have
    legal obligations, customer expectations, and
    budget restrictions which must be met. So, they
    need to be able to expose data in a way that
    allows them to be confident that their
    obligations will be honoured.

3
The problem (cont.)
  • Data sources are not well documented, which
    makes it difficult for users to know whether
    the data are fit for purpose
  • Agencies lack infrastructure for managing data.
    They need user-friendly tools to manage
    metadata, design data collections and support
    data access.

4
The National Data Network
  • Shared facilities and protocols which are well
    understood by a large body of custodians and
    analysts/researchers.
  • Streamlined and consistent data access
    approaches
  • Trusted facilities for protecting secrecy and
    confidentiality
  • Will provide a range of services to both people
    and applications

5
The National Data Network
  • Data owners will remain in control of their data.
  • The network will exist as:
    • a collection of network nodes. Each (major)
      custodian will operate its own node, on which
      its own data is stored
    • a hub which maintains a catalog of sources and
      services (and network administration services)
  • It will serve multiple sectors and jurisdictions
    (e.g. health, transport, spatial, statistics)

6
Data Network
[Diagram: custodian Nodes, each holding data and offering Services, connected to the hub at www.nationaldatanetwork.org, which holds the Catalog.]
7
Data Network Service Framework
  • Owner/ Custodian Services: Design, Data Capture,
    Process, Publish
  • Researcher/ Analyst Services: Search, Acquire,
    Analyse, Report
  • Link/ Integrate (between custodians and
    researchers)
  • Network Administration Services: Registration,
    Audit, Planning
8
Data Network Services
  Stages: Design, Data Capture, Process, Publish,
  Search, Acquire, Analyse, Report, Link/ Integrate
  • Auto coding
  • Assisted coding
  • Character recognition
  • Collection Control
  • Respondent Management
  • Confidentialise
  • Document
  • Expose
  • Define Access Policy
  • Archive
  • Acquire test data
  • Qualify for access
  • Extract
  • Subscribe
  • Document data source issues
  • Document Findings
  • Expose Findings
  • Supervised Analysis
  • Analyse
  • Graph
  • Tabulate
  • Data Item Definitions
  • Classifications
  • Standard Questions
  • Sample selection
  • Form Design
  • Edit
  • Seasonal Adjustment
  • Estimation
  • Imputation
  • Aggregate
  • Search
  • Request
  • Match
  • Link
9
Getting Started..
  • Start with development of services at the
    interface between custodian and researcher
    (Publish, Search, Link/ Integrate)
  • Develop base standards for documenting data
    sources and access rules

10
Getting Started..
  • Support 4 access classes:
    • Unrestricted (user can freely take the data and
      do anything with it)
    • Approval Required (researcher must apply for
      access and agree to/ meet conditions specified
      by the data owner/custodian, e.g. payment or
      signing an undertaking)
    • Remote Analysis Only (researcher cannot acquire
      the data but, on giving the required
      undertakings, can submit programs to analyse
      it; output is subject to vetting by the data
      owner)
    • Specification Only (researcher cannot acquire
      the data but can submit specifications for
      interrogations/ tabulations; the scope of the
      specifications may be restricted)
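The four access classes amount to a small policy decision for each request. The Python sketch below encodes them; the action names and the single `conditions_met` flag are hypothetical simplifications, and because the slide does not say whether Specification Only access needs an undertaking, it is treated here as unconditional.

```python
from enum import Enum


class AccessClass(Enum):
    """The four access classes listed on the slide."""
    UNRESTRICTED = "unrestricted"
    APPROVAL_REQUIRED = "approval_required"
    REMOTE_ANALYSIS_ONLY = "remote_analysis_only"
    SPECIFICATION_ONLY = "specification_only"


def permitted_actions(access: AccessClass, conditions_met: bool) -> set:
    """Return what a researcher may do with a source (hypothetical action names).

    `conditions_met` stands in for whatever the custodian requires,
    such as approval, payment or a signed undertaking.
    """
    if access is AccessClass.UNRESTRICTED:
        return {"download"}            # take the data and do anything with it
    if access is AccessClass.SPECIFICATION_ONLY:
        # Only specifications for tabulations; the custodian may limit their scope.
        return {"submit_specification"}
    if not conditions_met:
        return set()                   # application/undertaking still outstanding
    if access is AccessClass.APPROVAL_REQUIRED:
        return {"download"}
    # REMOTE_ANALYSIS_ONLY: programs run against the data; output is vetted.
    return {"submit_program"}
```

In a real network each custodian would attach its own conditions to a source when registering it; the boolean is only a placeholder for that check.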

11
Getting Started..
  • Support 3 linkage models:
    • User linkage (users do their own linking with
      their own or Data Network facilities; typically
      applies where privacy is not an issue, e.g.
      aggregate data, or where consent has been given)
    • Blind (identifying information from each
      dataset is provided to an independent linking
      unit, which passes a link key back to each
      custodian; custodians then supply datasets with
      link keys to the researcher)
    • Trusted (applies when an agency is trusted to
      do the linking)
  • NDN should support confidentialisation of data
    so that, where required, a linked dataset can be
    confidentialised before it is given to the
    researcher
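The Blind model separates the party that sees identities from the parties that see content. A minimal Python sketch follows, assuming exact matching on identifying fields (real linkage usually needs probabilistic matching) and random link keys; all function names are hypothetical.

```python
import uuid


def assign_link_keys(identifiers_by_custodian):
    """Independent linking unit: map each custodian's record ids to link keys.

    Input: custodian -> {record_id: identifying_info (hashable, e.g. a tuple)}.
    Records with identical identifying info get the same key (exact match assumed).
    """
    key_for_identity = {}
    keys_by_custodian = {}
    for custodian, records in identifiers_by_custodian.items():
        keys_by_custodian[custodian] = {}
        for record_id, identity in records.items():
            if identity not in key_for_identity:
                key_for_identity[identity] = uuid.uuid4().hex
            keys_by_custodian[custodian][record_id] = key_for_identity[identity]
    # Only record_id -> link key goes back; the unit never sees content data.
    return keys_by_custodian


def release_for_research(content, link_keys):
    """Custodian side: replace internal record ids with link keys (no identifiers)."""
    return {link_keys[record_id]: fields for record_id, fields in content.items()}


def link(*datasets):
    """Researcher side: join released datasets on keys present in all of them."""
    common = set.intersection(*(set(d) for d in datasets))
    merged = {}
    for key in common:
        row = {}
        for d in datasets:
            row.update(d[key])
        merged[key] = row
    return merged
```

The linking unit handles only identifying fields, the custodians handle only their own content, and the researcher receives content joined on opaque keys; any confidentialisation of the linked result would happen before that last step.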

12
How the Data Network will work
  • Governing Body endorses protocols and standards,
    acquires funds and commissions work
  • Administering Body administers the network:
    promotion, audit, registration, performance
    monitoring
  • Custodians document and expose data sources by
    registering them
  • Registration includes documenting access rules
  • Service Providers can register and provide
    services (which comply with standards)
  • Researchers/ Analysts agree to comply with
    access rules and provide feedback
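Registration and search can be pictured as a hub-side catalog that holds only metadata and access rules, while the data itself stays on each custodian's node. This Python sketch is hypothetical: the field names, the keyword search and any node URLs passed to it are illustrative assumptions, not an NDN specification.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata a custodian registers with the hub; the data stays on its node."""
    name: str
    custodian: str
    node_url: str        # where the source or service actually lives
    kind: str            # "source" or "service"
    access_rules: str    # registration includes documenting access rules
    keywords: set = field(default_factory=set)


class Catalog:
    """Hub-side catalog of registered sources and services."""

    def __init__(self) -> None:
        self._entries = {}

    def register(self, entry: CatalogEntry) -> None:
        # Refuse registrations that do not document access rules.
        if not entry.access_rules:
            raise ValueError(f"{entry.name}: access rules must be documented")
        self._entries[entry.name] = entry

    def search(self, term: str) -> list:
        """Case-insensitive match on the entry name or an exact keyword."""
        term = term.lower()
        return [
            e for e in self._entries.values()
            if term in e.name.lower() or term in {k.lower() for k in e.keywords}
        ]
```

Keeping only metadata at the hub is what lets data owners "remain in control of their data": a search result points the researcher to a node and its access rules, not to a copy of the data.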

13
Data Network resource registration
[Diagram: ABS and Centrelink resources (Census BCPs, CURFs, RADL, ANZSIC Coder, Data Definitions) are registered with www.nationaldatanetwork.org ("Register sources and services").]
14
Data Network - search
[Diagram: a Researcher searches the Catalog at www.nationaldatanetwork.org for Data Definitions and Data Sets registered by ABS (Census BCPs, CURFs, RADL, ANZSIC Coder) and Centrelink.]
15
Data Network undertaking process
[Diagram: undertaking process between a Researcher, the Catalog at www.nationaldatanetwork.org and custodians (ABS, Centrelink); steps: 1. submit undertaking, 2. undertaking lodged, 3. undertaking accepted, 4. ...]
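The undertaking steps (submit, lodged, accepted) behave like a simple state machine. The Python sketch below models only those three steps; the initial DRAFT state and the rule that access opens once the undertaking is accepted are assumptions, and the fourth step is left out because its content is not recoverable.

```python
from enum import Enum


class UndertakingState(Enum):
    DRAFT = 0       # assumed starting state, not on the slide
    SUBMITTED = 1   # 1. researcher submits undertaking
    LODGED = 2      # 2. undertaking lodged
    ACCEPTED = 3    # 3. undertaking accepted


class Undertaking:
    """Tracks one researcher's undertaking for one data source (hypothetical)."""

    def __init__(self, researcher: str, source: str) -> None:
        self.researcher = researcher
        self.source = source
        self.state = UndertakingState.DRAFT

    def advance(self) -> UndertakingState:
        """Move to the next step; steps cannot be skipped or repeated."""
        if self.state is UndertakingState.ACCEPTED:
            raise ValueError("undertaking already accepted")
        self.state = UndertakingState(self.state.value + 1)
        return self.state

    @property
    def access_permitted(self) -> bool:
        # Assumption: access is granted only after acceptance.
        return self.state is UndertakingState.ACCEPTED
```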
16
Data Network Access via download
[Diagram: a Researcher downloads Data Sets located through the Catalog at www.nationaldatanetwork.org; resources shown: ABS (Census BCPs, CURFs, RADL, ANZSIC Coder), Centrelink, Data Definitions.]
17
Data Network Access via RADL
[Diagram: a Researcher accesses data through a RADL session, located via the Catalog at www.nationaldatanetwork.org; resources shown: ABS (Census BCPs, CURFs, RADL, ANZSIC Coder), Centrelink, Data Definitions, Data Sets.]
18
Getting Started..
  • We have started by working with ARACY and CSIRO,
    holding meetings/ roundtables with major
    custodians, researchers, privacy commissioners
    and others. We are now starting to engage more
    with the States.
  • Need to progress on three fronts:
    • governance, protocols, priorities, resources
    • data source development
    • infrastructure development

19
Getting Started..
  • Governance, protocols, priorities:
    • Establish an Interim Governing Board (ABS,
      ARACY, AIHW, DOHA, user rep, State government
      rep...) and a broader member network which can
      be consulted and kept informed
    • Agree on principles and priorities

20
Getting Started..
  • Data sources:
    • Pick a couple of exemplar projects, identify the
      range of relevant data sources and work with
      custodians to expose them using the Data Network
      infrastructure
    • A range of regional data sources (e.g. IRDB,
      Healthwhiz?)
    • ABS CURFs (via RADL and download)

21
Getting Started..
  • Infrastructure:
    • Form an infrastructure development consortium
      (this has started with ABS, CSIRO and Geoscience
      Australia)
    • Develop a demonstration version of the NDN system

22
Data Source Documentation Standards
  • Need a rich metadata schema and an agreed
    minimum documentation standard
  • There is a plethora of partial solutions and
    standards
  • Is a single schema possible in theory but too
    cumbersome in practice?
  • Start with schemas for some common data object
    types? (e.g. Time Series, Classification, Unit
    Record Dataset...)
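One way to combine "an agreed minimum documentation standard" with per-type schemas is a shared base record extended for each common data object type. In this Python sketch the field names are hypothetical, chosen only to illustrate the structure, and "minimum standard" is reduced to "no required field left empty".

```python
from dataclasses import dataclass, fields


@dataclass
class DataObject:
    """Hypothetical minimum documentation shared by every object type."""
    title: str
    custodian: str
    description: str
    access_rules: str


@dataclass
class TimeSeries(DataObject):
    frequency: str          # e.g. "monthly"
    unit_of_measure: str


@dataclass
class Classification(DataObject):
    version: str
    levels: int


@dataclass
class UnitRecordDataset(DataObject):
    unit: str               # what one record represents, e.g. "person"
    record_count: int


def missing_documentation(obj: DataObject) -> list:
    """Names of fields left empty, i.e. below the minimum standard."""
    return [f.name for f in fields(obj) if getattr(obj, f.name) in ("", None)]
```

The shared base keeps search and access rules uniform across the network, while each subtype adds only what its object type needs, which avoids a single all-encompassing schema.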

23
NDN services as an internal as well as an external
resource
  • NDN software can be installed inside or outside
    an agency's firewall, so NDN services can be used
    privately as part of the agency's infrastructure
    for managing its own data holdings.
  • This may improve the value proposition for some
    custodians, as well as make it easier to meet
    standards and maintain quality.

24
Open Source Software
  • Ideally, NDN software will be developed as
    Open Source code.
  • Benefits: portability; fewer barriers to
    adoption (no licence fees required); transparency
    (anyone can see the source code); and we can
    avoid starting from scratch by building on top of
    existing Open Source products (e.g. Zope, Napster)

25
Aim to have demonstrable system in 2004
  • National Data Network Website established
  • Minimum of 3 nodes established with a reasonable
    range of data
  • Search service which can locate data on the nodes
  • Access services working (3 classes)
  • Linking Services working

26
Data Network demonstration version
  Stages: Design, Data Capture, Process, Publish,
  Search, Acquire, Analyse, Report, Link/ Integrate
  • Auto coding
  • Assisted coding
  • Character recognition
  • Collection Control
  • Respondent Management
  • Confidentialise
  • Document
  • Expose
  • Define Access Policy
  • Archive
  • Acquire test data
  • Qualify for access
  • Extract
  • Subscribe
  • Document data source issues
  • Document Findings
  • Expose Findings
  • Supervised Analysis
  • Analyse
  • Graph
  • Tabulate
  • Data Item Definitions
  • Classifications
  • Standard Questions
  • Sample selection
  • Form Design
  • Edit
  • Seasonal Adjustment
  • Estimation
  • Imputation
  • Aggregate
  • RADL
  • Search
  • Request
  • Match
  • Link

27
Service description: Assisted Coding
  • A consistent approach to coding according to
    standard classifications. For example, the
    standard industry classification is ANZSIC. The
    network will provide:
    • A human interface: a web page where you can
      key in an industry description (e.g. "fishing
      trawler operation") and get back a code (0922)
    • An application interface which allows an
      application to invoke a coding service that
      returns an ANZSIC code (or a set of codes)
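The application interface could be as simple as a function that takes a free-text description and returns candidate codes. The Python sketch below ranks codes by word overlap with an index of described activities; this matching method is a stand-in assumption rather than the actual coder's algorithm, and apart from 0922 (from the slide) any codes used with it are placeholders.

```python
import re


def tokens(text: str) -> set:
    """Lower-case alphabetic words in a description."""
    return set(re.findall(r"[a-z]+", text.lower()))


def suggest_codes(description: str, index: dict, limit: int = 3) -> list:
    """Return up to `limit` codes, best word overlap first.

    `index` maps an indexed activity description to a classification code.
    """
    query = tokens(description)
    scores = {}
    for entry, code in index.items():
        overlap = len(query & tokens(entry))
        if overlap:
            scores[code] = max(scores.get(code, 0), overlap)
    # Highest overlap first; ties broken by code for a stable order.
    return sorted(scores, key=lambda c: (-scores[c], c))[:limit]
```

Returning a ranked list rather than a single code covers both cases on the slide: an application can take the first code, or present the set to a person for assisted coding.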

28
Service description: Remote Analysis
  • The user does not get a copy of the data but can
    submit programs against it
  • Programs written in SAS or SPSS
  • Programs subjected to automated and human vetting
  • Program outputs also checked to ensure that they
    do not disclose information which should be
    protected
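Automated vetting might pair a screen of submitted programs with a check on their outputs. In this Python sketch the blocked statements (SAS `PROC PRINT` and SPSS `LIST`, both of which list individual cases) and the minimum cell count of 5 are hypothetical rules chosen for illustration; a custodian would set its own.

```python
import re
from typing import Dict, List, Optional

# Hypothetical screening rules: statements that print individual records.
BLOCKED_STATEMENTS = [r"\bproc\s+print\b", r"^\s*list\b"]


def screen_program(source: str) -> List[str]:
    """Return the blocked patterns found; empty means pass on to human review."""
    return [p for p in BLOCKED_STATEMENTS
            if re.search(p, source, flags=re.IGNORECASE | re.MULTILINE)]


def suppress_small_cells(table: Dict[str, int],
                         threshold: int = 5) -> Dict[str, Optional[int]]:
    """Blank out (None) any count below `threshold` before output is released."""
    return {cell: (n if n >= threshold else None) for cell, n in table.items()}
```

Small-cell suppression on its own can sometimes be undone from row or column totals, so automated checks like these would complement, not replace, the human vetting described above.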

29
Principles
  • Who can use the network?
  • What data will the network serve?
  • What conditions can custodians impose?
  • What conditions should the network impose?
  • How visible/open should the network be?
  • Who can be a node?
  • Architecture/design principles