Title: The SDMX Registry Model
- April 2, 2009
- Arofan Gregory
- Open Data Foundation
Background
- SDMX provides a number of standards and guidelines which support the standard exchange of statistics
  - Standard models/XML/EDIFACT formats for data
  - Standard models/XML formats for metadata
  - Standard architecture based on a set of registry services
  - Guidelines for the use of standard statistical concepts across domain boundaries
  - Framework for establishing domain standards within each statistical domain
SDMX Registries
- This talk focuses on the SDMX Registry Services
  - These are key to fully automating statistical discovery and exchange
  - They are the primary means of enhancing visibility and discovery of data and metadata within statistical communities
  - They are designed to provide a connection point between SDMX and other related standards
Existing Problems
- Duplication of effort
  - There is a lot of duplicative work within statistics, because there is little awareness of other data collection within specific areas
  - This is wasteful
- Even with a large amount of public statistical data available on the Internet
  - It is difficult to find good data with good metadata
  - This impacts end-users (researchers, students, journalists) more than policy makers with dedicated access to the data
- Using existing data can be difficult
  - Too many formats; too much emphasis on Web-site presentation (as opposed to download)
  - Too little metadata for existing data sets
  - Difficult or impossible to combine data from different sources
  - Access to data sources is difficult or impossible (not even the documentation is accessible)
  - Understanding concepts and definitions can be challenging, which impacts comparability of data
The Case for Infrastructure Support
- New standards allow for broader visibility and re-use of data and metadata
  - Produces greater transparency
  - Produces higher quality and efficiency in data access through automation
- Domains cannot be governed by individual organizations
  - The mission of most organizations is too narrow (even international ones)
  - This is the role of governments, supra-national initiatives, and public-private consortia
- Most public data is paid for by taxpayers
  - But they are the least well served for their investment
Emerging Solutions
- Web-services technology can deal with many of the generic problems inherent in distributing data sources and applications around the Internet
- Standards such as DDI, SDMX, and ISO/IEC 11179 provide specific models and formats for use within the domains of statistics and research
- SDMX provides a powerful registry model for establishing a research infrastructure
  - Designed to integrate with/support use of many other related standards (DDI, ISO 11179, METS, XBRL, etc.)
  - SDMX registry tools are available free and as open source today
How Do the SDMX Registry Services Work?
- An SDMX Registry (that is, an implementation of the standard registry services) provides a number of things to applications
  - A repository of metadata about the structures and concepts of data and metadata sets
  - A repository of information about who provides what data and metadata to whom
    - Helps to manage data across a broad network
  - A registry of available data and metadata sets in standard formats
    - Lists all information to find and use standard data and metadata throughout a community network
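To make the three roles above concrete, here is a minimal in-memory sketch. It assumes a greatly simplified model: the class and method names (`DataStructure`, `ProvisionAgreement`, `Registration`, `Registry.find_data`) are hypothetical and only loosely borrow SDMX vocabulary; they are not the registry's actual interfaces or message formats. The structure ID and URL in the usage lines are invented.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataStructure:
    """Structural metadata: the concepts (dimensions) a data set is organized by."""
    structure_id: str
    dimensions: tuple[str, ...]


@dataclass(frozen=True)
class ProvisionAgreement:
    """Provisioning metadata: which provider supplies data for which structure."""
    provider: str
    structure_id: str


@dataclass(frozen=True)
class Registration:
    """An entry announcing that a data set is available at a URL."""
    agreement: ProvisionAgreement
    url: str


@dataclass
class Registry:
    structures: dict[str, DataStructure] = field(default_factory=dict)
    agreements: list[ProvisionAgreement] = field(default_factory=list)
    registrations: list[Registration] = field(default_factory=list)

    def submit_structure(self, structure: DataStructure) -> None:
        self.structures[structure.structure_id] = structure

    def submit_agreement(self, agreement: ProvisionAgreement) -> None:
        # A provider may only agree to supply data for a known structure.
        if agreement.structure_id not in self.structures:
            raise KeyError(f"unknown structure {agreement.structure_id!r}")
        self.agreements.append(agreement)

    def register(self, agreement: ProvisionAgreement, url: str) -> Registration:
        # Only data covered by an existing provision agreement can be registered.
        if agreement not in self.agreements:
            raise ValueError("no provision agreement for this provider/structure")
        registration = Registration(agreement, url)
        self.registrations.append(registration)
        return registration

    def find_data(self, structure_id: str) -> list[str]:
        """Return the URLs of all registered data sets for one structure."""
        return [r.url for r in self.registrations
                if r.agreement.structure_id == structure_id]


# Hypothetical usage: one structure, one provider, one registered data set.
registry = Registry()
registry.submit_structure(DataStructure("EXT_DEBT", ("COUNTRY", "INSTRUMENT", "TIME")))
imf = ProvisionAgreement("IMF", "EXT_DEBT")
registry.submit_agreement(imf)
registry.register(imf, "https://example.org/imf/ext_debt.xml")
print(registry.find_data("EXT_DEBT"))
```

Keeping provisioning separate from registration means a data set can only be announced for a provider/structure pair agreed in advance, which is one way a registry can help manage who provides what to whom.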
SDMX Registry/Repository
Diagram: SDMX Registry Interfaces
- REGISTRY: Data Set/Metadata Set (interfaces: Register, Query)
  - Indexes data and metadata
- REPOSITORY: Provisioning Metadata (interfaces: Submit, Query)
  - Describes data and metadata sources and reporting processes
- REPOSITORY: Structural Metadata (interfaces: Submit, Query)
  - Describes data and metadata structures
SDMX Registry/Repository
Diagram: SDMX Registry Interfaces
- REGISTRY: Data Set/Metadata Set (interfaces: Register, Query)
  - Indexes data and metadata
- Subscription/Notification
  - Applications can subscribe to notification of new or changed objects
- REPOSITORY: Provisioning Metadata (interfaces: Submit, Query)
- REPOSITORY: Structural Metadata (interfaces: Submit, Query)
  - Describes data and metadata structures
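The subscription/notification interface can be illustrated with a small observer-style sketch, assuming an in-process callback stands in for whatever delivery channel a real registry uses. `NotifyingRegistry`, `Event`, and the object identifiers are hypothetical names for illustration only.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    action: str       # "added" or "changed"
    object_type: str  # e.g. "registration" or "structure"
    object_id: str


class NotifyingRegistry:
    """Hypothetical registry that notifies subscribers of new or changed objects."""

    def __init__(self) -> None:
        self._objects: dict[tuple[str, str], str] = {}  # (type, id) -> content
        self._subscribers: list[tuple[str, Callable[[Event], None]]] = []

    def subscribe(self, object_type: str, callback: Callable[[Event], None]) -> None:
        # Subscribers choose which kind of object they want to hear about.
        self._subscribers.append((object_type, callback))

    def submit(self, object_type: str, object_id: str, content: str) -> None:
        key = (object_type, object_id)
        if self._objects.get(key) == content:
            return  # unchanged resubmission: nothing to announce
        action = "changed" if key in self._objects else "added"
        self._objects[key] = content
        event = Event(action, object_type, object_id)
        for wanted_type, callback in self._subscribers:
            if wanted_type == object_type:
                callback(event)


received: list[Event] = []
registry = NotifyingRegistry()
registry.subscribe("registration", received.append)
registry.submit("registration", "IMF-EXT_DEBT-2009Q1", "https://example.org/imf/q1.xml")
registry.submit("registration", "IMF-EXT_DEBT-2009Q1", "https://example.org/imf/q1-rev.xml")
registry.submit("structure", "EXT_DEBT", "COUNTRY,INSTRUMENT,TIME")  # no subscriber for this type
print([(e.action, e.object_id) for e in received])
```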
Deploying SDMX Registry Services Within Domains
- It is anticipated that each organization leading a statistical domain will deploy a set of registry services to support exchanges within that domain
  - This is also possible within national statistical systems and individual organizations
- It is possible to have generic, public registries as well
  - This model has not been widely explored
- SDMX-type registries within research domains also make sense
  - To supplement existing data archives and RDCs
  - Lowers the cost of development of research infrastructure significantly
  - Huge increase in visibility of and access to data and sourcing information
The Old JEDH (Joint External Debt Hub) Site
Diagram: BIS, IMF, OECD and World Bank each supply data in various formats to the JEDH website (3-month production cycle)
JEDH with SDMX
Diagram:
- BIS, IMF, OECD and World Bank each publish their data as SDMX-ML
- Information about the data is registered in the SDMX Registry
- The SDMX Agent discovers data and URLs through the registry and retrieves the data from the sites
- The SDMX-ML is loaded into the JEDH DB (debtor database)
- Data is provided in real time to the JEDH Site
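The agent's cycle in this diagram (discover data URLs in the registry, retrieve SDMX-ML from each provider's site, load it into the JEDH database) can be sketched as follows. This is a hedged illustration, not the JEDH implementation: the registry is reduced to a provider-to-URL dictionary, the Web is simulated by an in-memory dictionary so the example runs offline, the XML is a simplified, namespace-free stand-in for SDMX-ML, and the table and column names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET
from collections.abc import Callable

# Hypothetical registry contents: data provider -> registered data set URL.
REGISTRY = {
    "BIS": "https://example.org/bis/debt.xml",
    "IMF": "https://example.org/imf/debt.xml",
}

# Simplified stand-ins for SDMX-ML data messages (real ones carry headers and namespaces).
FAKE_WEB = {
    "https://example.org/bis/debt.xml":
        '<DataSet><Series COUNTRY="AR"><Obs TIME_PERIOD="2008" OBS_VALUE="10.5"/></Series></DataSet>',
    "https://example.org/imf/debt.xml":
        '<DataSet><Series COUNTRY="BR"><Obs TIME_PERIOD="2008" OBS_VALUE="7.25"/>'
        '<Obs TIME_PERIOD="2009" OBS_VALUE="8.0"/></Series></DataSet>',
}


def run_agent(registry: dict[str, str], fetch: Callable[[str], str],
              db: sqlite3.Connection) -> int:
    """Discover data URLs in the registry, retrieve each data set, load it."""
    db.execute("CREATE TABLE IF NOT EXISTS debt "
               "(provider TEXT, country TEXT, period TEXT, value REAL)")
    loaded = 0
    for provider, url in registry.items():       # discover data and URLs
        root = ET.fromstring(fetch(url))         # retrieve from the provider's site
        for series in root.iter("Series"):
            country = series.get("COUNTRY")
            for obs in series.iter("Obs"):       # load observations into the DB
                db.execute("INSERT INTO debt VALUES (?, ?, ?, ?)",
                           (provider, country, obs.get("TIME_PERIOD"),
                            float(obs.get("OBS_VALUE"))))
                loaded += 1
    db.commit()
    return loaded


db = sqlite3.connect(":memory:")
print(run_agent(REGISTRY, FAKE_WEB.__getitem__, db))
print(db.execute("SELECT provider, country, period, value FROM debt ORDER BY value").fetchall())
```

Passing `fetch` in as a parameter keeps the discovery/load logic independent of how data is actually transported, so a real HTTP client could replace the dictionary lookup.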
SDMX in Action Prototype System
Diagram: Flow of FAO CountrySTAT-RegionSTAT Implementation, linking CountrySTAT, the National Publication Server(s), the FAO SDMX Registry, the Regional Publication Server and RegionSTAT in steps 1, 2, 3a, 3b and 4
Slide courtesy of the FAO
Prototype System Explanation
- (1) CountryStat National Publication Server
  - The web site is published from the files in CountryStat
- (2) SDMX Publication
  - The new CountryStat files are converted to SDMX-ML data sets and made web accessible on the CountryStat web site
  - These files are registered in the FAO SDMX Registry
- RegionStat Regional Publication Server
  - (3a) Queries the registry for new registrations; the registry responds with registration details, including the URL of the new data sets
  - (3b) Retrieves the new data sets from the CountryStat web site
  - (4) Converts the SDMX-ML files to an internal format and integrates the new data sets with existing RegionStat data sets
  - Re-publishes the RegionStat web site
Slide courtesy of the FAO
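The RegionStat side of this flow (query the registry for new registrations, retrieve, convert, integrate, re-publish) amounts to incremental polling: RegionStat remembers which registrations it has already processed and asks only for newer ones. The sketch below assumes the registry numbers registrations sequentially and that fetching returns already-parsed data; `FaoRegistry`, `RegionStat.poll`, and the sequence-number scheme are hypothetical, and the SDMX-ML conversion step is reduced to a dictionary merge.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class Registration:
    seq: int      # increasing number assigned by the registry (assumption)
    country: str
    url: str


class FaoRegistry:
    """Hypothetical stand-in for the registry's 'new registrations' query."""

    def __init__(self) -> None:
        self._registrations: list[Registration] = []

    def register(self, country: str, url: str) -> None:
        self._registrations.append(Registration(len(self._registrations) + 1, country, url))

    def registrations_since(self, seq: int) -> list[Registration]:
        return [r for r in self._registrations if r.seq > seq]


class RegionStat:
    def __init__(self, registry: FaoRegistry,
                 fetch: Callable[[str], dict[str, float]]) -> None:
        self.registry = registry
        self.fetch = fetch      # URL -> already-parsed data set (period -> value)
        self.last_seen = 0      # highest registration number already processed
        self.tables: dict[str, dict[str, float]] = {}
        self.publications = 0

    def poll(self) -> int:
        new = self.registry.registrations_since(self.last_seen)  # query the registry
        for reg in new:
            data = self.fetch(reg.url)                           # retrieve from CountryStat
            # Integrate with existing data for that country.
            self.tables.setdefault(reg.country, {}).update(data)
            self.last_seen = reg.seq
        if new:
            self.publications += 1                               # re-publish the site
        return len(new)


# Hypothetical CountryStat web site, keyed by data set URL.
site = {
    "https://example.org/countrystat/gh-2008.xml": {"2008": 1.0},
    "https://example.org/countrystat/gh-2009.xml": {"2009": 1.5},
}
registry = FaoRegistry()
region = RegionStat(registry, site.__getitem__)
registry.register("GH", "https://example.org/countrystat/gh-2008.xml")
print(region.poll())
registry.register("GH", "https://example.org/countrystat/gh-2009.xml")
print(region.poll())
print(region.poll())
print(region.tables)
```

Each poll picks up only registrations made since the previous one, so the second call processes a single new data set and the third finds nothing to do.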
Federation of SDMX Registries
- SDMX uses a selective approach to replication of resources found inside domain SDMX registries
  - Each domain registry can become a recognized user in other domain registries
  - Subscription/notification can drive real-time replication of registry metadata around the network
- With a coordinated hub registry, a more formal registry network could be established
  - This would require no extension to existing technologies
  - This would require a major feat of organization (!)
- This is a very light federation mechanism
  - Other, more intensive schemes have failed in other technology domains (UDDI, etc.)
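A rough sketch of this lightweight, subscription-driven replication follows. It assumes each registry identifies objects with a prefix-based ID so that a following registry can select what it replicates; `DomainRegistry`, `follow`, and the ID scheme are hypothetical. The sketch does not forward replicated objects onward, which sidesteps the loop handling a real multi-registry network would need.

```python
from collections.abc import Callable


class DomainRegistry:
    """Hypothetical domain registry with a minimal subscription mechanism."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.objects: dict[str, str] = {}  # object id -> content
        self._subscribers: list[tuple[str, Callable[[str, str], None]]] = []

    def subscribe(self, id_prefix: str, callback: Callable[[str, str], None]) -> None:
        self._subscribers.append((id_prefix, callback))

    def submit(self, object_id: str, content: str) -> None:
        self.objects[object_id] = content
        for prefix, callback in self._subscribers:
            if object_id.startswith(prefix):  # selective: only matching objects
                callback(object_id, content)

    def follow(self, other: "DomainRegistry", id_prefix: str) -> None:
        """Act as a recognized user of `other`, replicating matching objects here."""
        # Copy what already exists, then stay current through notifications.
        for object_id, content in other.objects.items():
            if object_id.startswith(id_prefix):
                self.objects[object_id] = content
        other.subscribe(id_prefix, self._replicate)

    def _replicate(self, object_id: str, content: str) -> None:
        # Store locally only; no onward notification, so no replication loops.
        self.objects[object_id] = content


stats = DomainRegistry("statistics")
research = DomainRegistry("research")
stats.submit("CONCEPT:COUNTRY", "Country of residence")
stats.submit("DRAFT:X", "internal work item")
research.follow(stats, "CONCEPT:")
stats.submit("CONCEPT:TIME", "Time period")
print(sorted(research.objects))
```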
SDMX Registries and Other Standards
- The SDMX Registry Services are designed to support related standards
  - SDMX reference metadata reports can provide links to metadata and data in other standard formats
  - Allows for indexing of needed metadata fields from other standards within the SDMX registry natively
  - Can provide access to native non-SDMX formatted XML resources (DDI, Dublin Core, METS, XBRL, etc.)
- Benefits include
  - Clarifying data and metadata ownership issues
  - Making sourcing transparent by linking aggregates to source data/metadata
  - Providing capabilities which are typically not available today to support comparison (integration with ISO/IEC 11179 metadata registries for dealing with terminology issues, etc.)
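Indexing selected fields from other standards lets the registry answer searches without parsing the external resources themselves. The sketch below assumes each reference metadata report carries a link, a format label, and a few copied-in fields; `ReferenceMetadataReport`, `search`, the field names, and the URLs are hypothetical and do not reflect the actual SDMX reference metadata structure.

```python
from dataclasses import dataclass, field


@dataclass
class ReferenceMetadataReport:
    """Hypothetical report pointing at a resource held in another standard's format."""
    target_url: str
    target_format: str  # e.g. "DDI", "Dublin Core", "XBRL"
    indexed: dict[str, str] = field(default_factory=dict)  # fields copied in for search


def search(reports: list[ReferenceMetadataReport],
           name: str, value: str) -> list[tuple[str, str]]:
    """Find external resources whose indexed field `name` equals `value`."""
    return [(r.target_format, r.target_url) for r in reports
            if r.indexed.get(name) == value]


reports = [
    ReferenceMetadataReport("https://example.org/surveys/lfs-2008.xml", "DDI",
                            {"title": "Labour Force Survey 2008", "creator": "NSO"}),
    ReferenceMetadataReport("https://example.org/pubs/lfs-report", "Dublin Core",
                            {"title": "LFS annual report", "creator": "NSO"}),
    ReferenceMetadataReport("https://example.org/filings/acme.xbrl", "XBRL",
                            {"creator": "ACME Ltd"}),
]
print(search(reports, "creator", "NSO"))
```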
Clarification
- Not all registries are the same
  - UDDI and ebXML registries are much more generic in purpose, and compatible with SDMX
  - ISO/IEC 11179 Metadata Registries are not mechanistic web-services registries
    - They are specialized repositories of metadata around semantics, concepts and terminology
    - These are compatible with, not duplicative of, SDMX registry technology
    - ISO/IEC 11179 could be implemented as an SDMX registry (!)
ODaF Vision - Standards
Diagram: the standards in the ODaF vision
- Federated Registries (based on SDMX, ebXML, web services)
- Aggregated Data/Metadata (SDMX)
- ISO 11179 semantic definitions
- Standard classifications
- References to source data: METS Packaging, XBRL Business Reports, DDI Microdata Sets, Dublin Core Citations, ISO 19115 Geographies
- Connecting labels in the diagram: "registered", "organized using", "used in"