Title: SDMX Information Model
1SDMX Information Model
- Pedagogical Explanation
- Arofan Gregory and Chris Nelson, Metadata Technology Ltd.
- OECD SDMX Expert Group Meeting, Geneva, April 6-7, 2006
2Data Set
3We have a dataset, what do we need to know?
- Its structure
- Who reports it
- How a specific data set fits into the overall
collection framework and which organisation is
responsible for reporting which parts
- The reporting schedule
- That it has been reported
4Data Set Structure
5Data Set Structure
- Computers need structure of data
- Concepts and terms
- Code lists
- Data values
- How these fit together
6Structural Definitions
7Data Makes Sense
SA,B,1,1999-06-30 16547
8Data Set Structure
- Comprises
- Concepts that identify the observation value
- Concepts that add additional metadata about the
observation value
- Concept that is the observation value
- Any of these may be
- coded
- text
- date/time
- number
- etc.
9Data Set Structure
- Dimensions
- Attributes
- Measure
- Representation
10Data Set Structure
dimension
dimension
attribute
attribute
dimension
dimension
dimension
measure
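The roles above (dimension, measure, representation) can be sketched in code. This is a minimal illustration, not the SDMX model itself: it reads the slide-7 example as a four-part key `SA,B,1,1999-06-30` followed by the observation value `16547`, which is an interpretation. The concept names (`ADJUSTMENT`, `SERIES_TYPE`, `ITEM`, `TIME_PERIOD`, `OBS_VALUE`) are hypothetical. The slides do not name them.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Component:
    concept: str         # concept the component takes its meaning from
    role: str            # "dimension", "attribute" or "measure"
    representation: str  # "coded", "text", "date" or "number"


# Hypothetical structure definition; the concept names are assumptions.
DSD = [
    Component("ADJUSTMENT", "dimension", "coded"),
    Component("SERIES_TYPE", "dimension", "coded"),
    Component("ITEM", "dimension", "coded"),
    Component("TIME_PERIOD", "dimension", "date"),
    Component("OBS_VALUE", "measure", "number"),
]

# One parser per representation; coded values are kept as plain strings here.
PARSERS = {
    "coded": str,
    "text": str,
    "date": date.fromisoformat,
    "number": float,
}


def read_observation(key: str, value: str, dsd=DSD) -> dict:
    """Attach concepts to a comma-separated key and its observation value."""
    dimensions = [c for c in dsd if c.role == "dimension"]
    measure = next(c for c in dsd if c.role == "measure")
    parts = key.split(",")
    if len(parts) != len(dimensions):
        raise ValueError("key does not match the number of dimensions")
    observation = {
        c.concept: PARSERS[c.representation](part)
        for c, part in zip(dimensions, parts)
    }
    observation[measure.concept] = PARSERS[measure.representation](value)
    return observation


observation = read_observation("SA,B,1,1999-06-30", "16547")
```

Without the structure definition the string is only a list of tokens. With it, each token is tied to a concept and parsed according to its representation.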
11Data Structure Definition
Key
Group Key
Dimensions
Attributes
Measures
Representation
Concept
12Data Set Publishing/Reporting
- Publishing data sets and collecting data sets is
a process
- As a process it must have metadata that enables
organisations to control it
- what data is it
- who publishes it
- who collects it
- when is it published/reported
13Structure Definition
Data Flow
Data Set
can get data from multiple data providers
can provide data for many data flows using agreed
data structure
Provision Agreement
Data Provider
- The data flow is the artefact that contains
metadata about the provision of data
- In a data reporting scenario the data flow is
defined by the data collector, and there can be
many data providers reporting data for the data
flow
- A data provider may report data for many data
flows (perhaps for many organisations)
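The many-to-many relationship between data flows and data providers can be sketched as follows. This is an illustrative model only, with a provision agreement treated as the pairing of one data flow with one data provider. All identifiers (`FLOW_A`, `PROVIDER_1`, `DSD_X` and so on) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFlow:
    id: str
    structure_id: str  # the agreed data structure definition


@dataclass(frozen=True)
class DataProvider:
    id: str


@dataclass(frozen=True)
class ProvisionAgreement:
    flow: DataFlow
    provider: DataProvider


def providers_for(flow, agreements):
    """Data providers that report data for the given data flow."""
    return sorted(a.provider.id for a in agreements if a.flow == flow)


def flows_for(provider, agreements):
    """Data flows for which the given data provider reports data."""
    return sorted(a.flow.id for a in agreements if a.provider == provider)


# Hypothetical example data.
flow_a = DataFlow("FLOW_A", "DSD_X")
flow_b = DataFlow("FLOW_B", "DSD_Y")
provider_1 = DataProvider("PROVIDER_1")
provider_2 = DataProvider("PROVIDER_2")

agreements = [
    ProvisionAgreement(flow_a, provider_1),
    ProvisionAgreement(flow_a, provider_2),
    ProvisionAgreement(flow_b, provider_1),
]
```

Here `providers_for(flow_a, agreements)` lists both providers, and `flows_for(provider_1, agreements)` lists both flows.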
14Organising Data Flows
- Organisations may wish to categorise the data
flows
- For convenience
- To facilitate control
- who reports what/when (release calendar)
- who has reported
- more about these later
- To facilitate search for data (more about this
later)
15Data Reporting
Data Structure Definition
CategoryScheme
comprises subject or reporting categories
uses specific data/metadata structure
can be linked to categories in multiple category
schemes
Data Flow
Category
Data Set
conforms to business rules of the data/metadata
flow
can have child categories
publishes/reports data sets
can get data from multiple data providers
can provide data for many data flows using agreed
data structure
Provision Agreement
Data Provider
Metadata
16We have metadata, what do we need to know?
- What is the metadata for (what does it describe)
- Who reports it
- How a specific metadata set fits into the overall
collection framework and which organisation is
responsible for reporting which parts
- The reporting schedule
- That it has been reported
17Metadata: Controlling It
- What can be done for data can also be done for
metadata
- Metadata has a structure
- Metadata is reported/published
- Metadata needs to be controlled
- Metadata needs to be found
- Metadata may need to be linked to data
18What Sort of Metadata?
- Data values are limited in where they belong
- Series key (usually qualified by time)
- Data attribute values are limited in where they
belong
- Observation value
- Series key
- Group key
- Data set
- Metadata is not limited in this way
- Metadata is everywhere
- Can we learn from the data side how to describe
metadata structure definitions
19Metadata Structure Definition
- Concepts
- Hierarchies
- Representation (e.g. code list)
Provision Agreement
20Metadata Structure Definition
uses defined concepts
concept defined in
Metadata Report
Concept Scheme
Concept
takes semantic and context from
can have hierarchy
specifies to which object types the concept can
be attached
Partial Target Identifier
identifies the code list from which the value of
the (key) component must be taken when metadata
is reported
specifies the identifier components (key) of
the target object
identifies target object type of the component
Target Object Type
21Metadata Target
Data Flow
Provision Agreement
Data Provider
22ARC
Metadata_Concepts
Metadata Structure Definition
MetadataReport
Concept Scheme
Concept
Release Date
Release Status
Format and Permitted Value List
Id Provision_Agreement
Can be used to identify just the Data Provider or
just the Data Flow
Partial Target Identifier
Data Flow
Data Provider
Target Object Type
23Metadata Structure Definition Identifiers
Metadata Structure Definition ARC_DATA
Full Target Identifier
Provision_Agreement
Identifier Component
Target Object Type
Data Flow
Item Scheme
Identifier Component
Data Provider
Target Object Type
Item Scheme
24Metadata Structure Definition Metadata Report
ARC
Metadata Report
Attachment Provision_Agreement
Metadata Attribute
Release Date
Concept
Representation DateTime
Metadata Attribute
Release Status
Concept
Representation
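The ARC example on the preceding slides can be sketched as a small validation routine. This is a simplified illustration, not the SDMX metadata model. It assumes the full target identifier `Provision_Agreement` has the two identifier components Data Flow and Data Provider, and that partial target identifiers name just one of them. The slide gives no representation for Release Status, so a plain string is assumed here. All identifier values are hypothetical.

```python
from datetime import datetime

# Target identifiers and the key components each one requires.
TARGETS = {
    "PROVISION_AGREEMENT": ("DATA_FLOW", "DATA_PROVIDER"),  # full target
    "DATA_FLOW": ("DATA_FLOW",),                            # partial target
    "DATA_PROVIDER": ("DATA_PROVIDER",),                    # partial target
}

# Metadata attributes of the report and a parser for each representation.
# RELEASE_STATUS as free text is an assumption, not taken from the slides.
ATTRIBUTES = {
    "RELEASE_DATE": datetime.fromisoformat,
    "RELEASE_STATUS": str,
}


def validate_report(target_id: str, target_key: dict, values: dict) -> dict:
    """Check a metadata report against the structure and parse its values."""
    components = TARGETS[target_id]
    missing = [c for c in components if c not in target_key]
    extra = [c for c in target_key if c not in components]
    if missing or extra:
        raise ValueError(f"bad target key: missing={missing}, extra={extra}")
    unknown = sorted(set(values) - set(ATTRIBUTES))
    if unknown:
        raise ValueError(f"unknown metadata attributes: {unknown}")
    return {name: ATTRIBUTES[name](raw) for name, raw in values.items()}


report = validate_report(
    "PROVISION_AGREEMENT",
    {"DATA_FLOW": "FLOW_A", "DATA_PROVIDER": "PROVIDER_1"},
    {"RELEASE_DATE": "2006-04-06T09:00:00", "RELEASE_STATUS": "final"},
)
```

A report attached through the partial target `DATA_FLOW` would supply only the `DATA_FLOW` component. Supplying both components against that target is rejected.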
25Metadata Reporting
Metadata Structure Definition
CategoryScheme
comprises subject or reporting categories
uses specific metadata structure
can be linked to categories in multiple category
schemes
Metadata Flow
Category
Metadata Set
conforms to business rules of the metadata flow
can have child categories
can get metadata from multiple metadata providers
publishes/reports metadata sets
Constraint
can have constraints: a sub set of possibilities
defined in the Structure Definition
Provision Agreement
can provide metadata for many metadata flows
using agreed metadata structure
Data Provider
26Information Model Summary So Far
- Supports data and metadata reporting and exchange
- Data and metadata structure definitions
- Data and metadata sets
- Supports the process of reporting and exchange
- Data/metadata providers
- Data/metadata flows
- Provision agreements
27Data/Metadata Reporting/Exchange
CategoryScheme
Structure Definition
comprises subject or reporting categories
uses specific data/metadata structure
can be linked to categories in multiple category
schemes
Data Set or Metadata Set
Data or Metadata Flow
Category
conforms to business rules of the data/metadata
flow
publishes/reports data sets or metadata sets
can have child categories
can get data/metadata from multiple data/metadata
providers
Constraint
can have constraints: a sub set of possibilities
defined in the Structure Definition
can provide data/metadata for many data/metadata
flows using agreed data/metadata structure
Provision Agreement
Data Provider
28Controlling Data and Metadata
- How do we control data and metadata reporting?
- How do we find data and metadata?
- How do we share data and metadata
29SDMX Registry
- The Registry supports many of the artefacts in
the Information Model
- Holds indexes for data and metadata and where
these can be found on the web
- Data and metadata set indexes
- Stores structure definitions
- Data and metadata structures
- Code lists
- Category schemes
- Data flows
- Stores provisioning metadata
- Data providers
- Provision agreements
- The Registry is used to store structural and
provisioning definitions, to register data sets
and metadata sets, and links between them
- The Registry is a resource that can be queried by
applications to find data, metadata, and the
structural definitions supporting these
- The Registry specification defines the behaviour
of an SDMX Registry and the Registry interfaces,
which are an XML schema specification
- The Registry functions are modelled in the
Information Model, but its functionality is best
explained in the context of the schematic already
used for data and metadata (Data/Metadata
Reporting and Exchange)
30SDMX Registry/Repository
SDMX Registry Interfaces
Register
Indexes data and metadata
REGISTRY Data Set/Metadata Set
Query
Subscription/Notification
Submit
Describes data and metadata sources and reporting
processes
REPOSITORY Provisioning Metadata
Query
Submit
REPOSITORY Structural Metadata
Describes data and metadata structures
Query
31Data Set Registration
Structure Definition
- The data is registered against the provision
agreement
- The Constraint holds the indexes such as the
series keys, or the list of dimension values
Data Flow
Constraint
Keys
Data Set
Provision Agreement
Data Provider
URL, registration date etc.
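Registration as described on this slide can be sketched with a toy in-memory registry. It is an illustration only and does not reflect the actual SDMX Registry interfaces, which are defined as XML schemas. Each registration records the provision agreement, the URL of the data set, the registration date, and a constraint reduced here to a set of series keys. The URL, identifiers and keys are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Registration:
    provision_agreement: str
    url: str
    registered_on: date
    series_keys: frozenset  # the constraint, reduced to a set of series keys


class Registry:
    """Toy index of registered data sets; it holds no data itself."""

    def __init__(self):
        self._registrations = []

    def register(self, provision_agreement, url, series_keys, registered_on):
        registration = Registration(
            provision_agreement, url, registered_on, frozenset(series_keys)
        )
        self._registrations.append(registration)
        return registration

    def query(self, series_key):
        """URLs of registered data sets whose constraint holds the key."""
        return [
            r.url for r in self._registrations if series_key in r.series_keys
        ]


registry = Registry()
registry.register(
    "FLOW_A/PROVIDER_1",                          # hypothetical agreement id
    "https://example.org/flow_a/provider_1.xml",  # hypothetical URL
    ["SA,B,1", "SA,B,2"],
    date(2006, 4, 6),
)
```

A query for the series key `SA,B,1` returns the registered URL, so an application learns where the data is without the registry storing the data.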
32Data Query
CategoryScheme
Structure Definition
- The query can start anywhere and navigate to the
data
- In the registry all navigation is bi-directional.
- Category drill-down searches will start at the
Category and go via Data Flows.
- Fine grained queries can be built using
structural metadata (e.g. dimension names and
possible values)
- Fine grained searches are possible on the
Constraints
Data Flow
Category
Constraint
Data Set
Provision Agreement
Data Provider
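A category drill-down search and the reverse navigation can be sketched as follows. This is a minimal illustration in which categories may have child categories and are linked to data flows by identifier. The category and flow names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    id: str
    children: list = field(default_factory=list)
    flow_ids: list = field(default_factory=list)


def drill_down(category):
    """Data flows linked to a category or to any of its descendants."""
    flows = list(category.flow_ids)
    for child in category.children:
        flows.extend(drill_down(child))
    return flows


def categories_of(flow_id, category):
    """Reverse navigation: categories in the tree linked to a data flow."""
    found = [category.id] if flow_id in category.flow_ids else []
    for child in category.children:
        found.extend(categories_of(flow_id, child))
    return found


# Hypothetical category scheme.
trade = Category("TRADE", flow_ids=["FLOW_A"])
prices = Category("PRICES", flow_ids=["FLOW_B", "FLOW_C"])
economy = Category("ECONOMY", children=[trade, prices], flow_ids=["FLOW_D"])
```

Starting at `ECONOMY`, the drill-down reaches every data flow in the tree. Starting at a data flow, `categories_of` navigates back to the category it is linked to.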
33Metadata Set Registration
- Metadata that is reported regularly is registered
against the (Metadata) Provision Agreement
- The metadata content (the metadata set) is linked
to the object to which it relates
- This link can be stored in the registry
- e.g. a link to data set to which it relates
- a link to the data provider to which it relates
- Registry/Repository operators could use the
repository to store the metadata itself
- This is not a part of the Information Model nor
of the SDMX standards
34Metadata Query
- The indexed metadata set itself can be searched
- Links to data can be discovered and followed
- e.g. is there any metadata for a specific data
set, or part of the data set?
- If so, what sort of metadata?
- Where is the metadata (URL)?
- More on this later
35Information Model Summary So Far
- Supports data and metadata reporting and exchange
- Data and metadata structure definitions
- Data and metadata sets
- Supports the process of reporting and exchange
- Data/metadata providers
- Data/metadata flows
- Provision agreements
- Supports registration
- Data and metadata sets
- Supports query
- Categories linked to data and metadata
- Constraints for finer grained queries
36Summary Data/Metadata Reporting, Query
CategoryScheme
Structure Definition
comprises subject or reporting categories
uses specific data/metadata structure
can be linked to categories in multiple category
schemes
Data Set or Metadata Set
Data or Metadata Flow
Category
conforms to business rules of the data/metadata
flow
publishes/reports data sets or metadata sets
can have child categories
can get data/metadata from multiple data/metadata
providers
Constraint
can have constraints: a sub set of possibilities
defined in the Structure Definition
can provide data/metadata for many data/metadata
flows using agreed data/metadata structure
Provision Agreement
Data Provider
37Registry: what else?
- Link metadata to parts of a data set or data base
contents
- Query for metadata linked to data
38Registry: link metadata to data
These can be described in terms of key sets,
combined into an Attachment Constraint, linked to
a specific data set, and a metadata set
39Constraints Structure
- Supports the specification of sub sets of data or
metadata structure definitions or data and
metadata sets
- In terms of allowable key values
- In terms of allowable dimension, attribute, or
measure values
- Constraints can apply to
- Data sets (so-called cubes or cube regions)
- Entire databases
- Data flows
- Metadata sets
- Entire metadata repositories
- Metadata flows
- Data providers
- Provision agreements
- Two kinds of Constraint
- Content: this is used to define the actual or
allowable content
- Attachment: this is used to define a sub set of
data or metadata set for the purpose of attaching
metadata to it
40Constraints Structure Schematic
Sets of keys to be included in or excluded from
the scope
Constraint
AttachmentConstraint
ContentConstraint
Key Set
Sets of values to be included in or excluded from
the scope
Specification of a key
Cube Region
Key
Set of values for a concept
Identity of the Concept (e.g. Country)
Specification of a key value
Concept Values
Key Value
Concept
List of values
Values
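The cube region part of the schematic can be sketched in code. This is an illustrative reading, not the normative SDMX semantics: a cube region lists, per concept, a set of values, and is flagged as included in or excluded from the scope. The rule that a key is in scope when some included region covers it and no excluded region does is an assumption made for this sketch. `Country` comes from the slide. `ADJUSTMENT` is a hypothetical concept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptValues:
    concept: str       # identity of the concept, e.g. "Country"
    values: frozenset  # set of values for that concept


@dataclass(frozen=True)
class CubeRegion:
    members: tuple         # tuple of ConceptValues
    included: bool = True  # included in or excluded from the scope

    def covers(self, key: dict) -> bool:
        """True when every listed concept has one of the listed values."""
        return all(key.get(m.concept) in m.values for m in self.members)


def in_scope(key: dict, regions) -> bool:
    """Assumed rule: covered by an included region and by no excluded one."""
    included = any(r.covers(key) for r in regions if r.included)
    excluded = any(r.covers(key) for r in regions if not r.included)
    return included and not excluded


regions = [
    CubeRegion((ConceptValues("Country", frozenset({"FR", "DE"})),)),
    CubeRegion(
        (
            ConceptValues("Country", frozenset({"DE"})),
            ConceptValues("ADJUSTMENT", frozenset({"SA"})),
        ),
        included=False,
    ),
]
```

With these regions, French keys are in scope for any adjustment, German keys are in scope unless the adjustment is `SA`, and keys for other countries are out of scope.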
41Constraints usage
- Data source registration
- Data source can be a data set or a database
- Content Constraint is used to define the content
of a data set or database
- This supports fine grained queries
- Attaching metadata to parts of a data set or
other data source
- Target object of a metadata set is an Attachment
Constraint linked to a registered data set or
database content
42Attachment Constraint
Metadata is linked to the Constraint
Constraint is linked to the Data Set
Attachment Constraint
Registered Data Set
Registered Metadata Set
Key Sets define the sub set of the Data Set
Key Set
SA,B,1,1999-03-31
SA,B,1,1999-06-30
SA,B,1,1999-09-30
SA,B,2,1999-03-31
etc.
Key(s)
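The two links on this slide (metadata to constraint, constraint to data set) can be sketched as a lookup. This is a minimal illustration in which the key set of an attachment constraint is a plain set of key strings. The data set and metadata set identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttachmentConstraint:
    data_set: str      # registered data set the constraint is linked to
    keys: frozenset    # key set defining the sub set of the data set
    metadata_set: str  # registered metadata set linked to the constraint


def metadata_for(data_set, key, constraints):
    """Metadata sets attached to the part of a data set holding the key."""
    return [
        c.metadata_set
        for c in constraints
        if c.data_set == data_set and key in c.keys
    ]


constraints = [
    AttachmentConstraint(
        "DATASET_1",
        frozenset({"SA,B,1,1999-03-31", "SA,B,2,1999-03-31"}),
        "METADATA_SET_1",
    ),
    AttachmentConstraint(
        "DATASET_1",
        frozenset({"SA,B,2,1999-03-31"}),
        "METADATA_SET_2",
    ),
]
```

A query for the key `SA,B,2,1999-03-31` in `DATASET_1` finds both metadata sets, while `SA,B,1,1999-03-31` finds only the first.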
43Information Model Support for Data Analysis
- Viewing, comparing and analysing data in
different groupings
- Hierarchical Code Lists
- Converting data and metadata from one coding and
structure scheme to another scheme
- Structure and Code Mapping
44Hierarchical Code Lists - Example
- France is a country
- France is part of the continent of Europe
- France is a member of NATO
- France is a member of the EU
- France is a member of the G10
- When I analyse statistics I might want to see
totals by
- continent
- trading bloc
- military alliance
- financial grouping
- France will be grouped with different sets of
countries depending on the view required
- How do we express these groupings?
45Code List
Code Composition
Reference Area
6B NATO
B0 EU
B1 NAFTA
BE Belgium
BG Bulgaria
CA Canada
CH Switzerland
CZ Czech Republic
DE Germany
DK Denmark
E1 Europe
E8 North America
EE Estonia
ES Spain
FI Finland
FR France
GB United Kingdom
GR Greece
HU Hungary
JP Japan
I2 Euro 12
IT Italy
NE Netherlands
US United States
Code
G10 countries
Europe
EU countries
NATO countries
NAFTA countries
Code Association
North America
46Hierarchical Code Scheme
comprises code groups
comprises hierarchies
Code List
relates a code to a parent code
belongs to
code
Code Association
Code
parent code
Properties of the association
groups codes with the same parent
Property
Code Composition
value based hierarchy has code groups
comprises code groups
Hierarchy
level based hierarchy has formal levels
Level
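Code associations that relate a code to a parent code within a named hierarchy can be sketched as follows. This is an illustration, not the SDMX hierarchical code scheme model. The codes come from the Reference Area code list above. The hierarchy names follow the France example. The observation values are made up for the sketch.

```python
from collections import defaultdict

# (hierarchy, code, parent code): one entry per code association.
ASSOCIATIONS = [
    ("continent", "FR", "E1"),
    ("continent", "DE", "E1"),
    ("continent", "CH", "E1"),
    ("continent", "US", "E8"),
    ("continent", "CA", "E8"),
    ("military alliance", "FR", "6B"),
    ("military alliance", "DE", "6B"),
    ("military alliance", "US", "6B"),
    ("military alliance", "CA", "6B"),
    ("trading bloc", "FR", "B0"),
    ("trading bloc", "DE", "B0"),
    ("trading bloc", "US", "B1"),
    ("trading bloc", "CA", "B1"),
]


def totals_by(hierarchy, values):
    """Sum the values of child codes under each parent code of a hierarchy."""
    totals = defaultdict(int)
    for name, code, parent in ASSOCIATIONS:
        if name == hierarchy and code in values:
            totals[parent] += values[code]
    return dict(totals)


# Hypothetical observation values per country code.
values = {"FR": 10, "DE": 20, "CH": 5, "US": 40, "CA": 8}
```

The same country values roll up differently depending on the view: France is grouped with Switzerland under `E1` (Europe) but with the United States under `6B` (NATO).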
47Item Scheme Maps
- Many types of item scheme use the same
fundamental structure
- Code list
- Category scheme
- Concept scheme
- Two Item Schemes can be mapped
48Item Scheme Association
target item scheme
source item scheme
Category Scheme Map
Concept Scheme Map
Code List Map
Association Role
Item Scheme
Item Scheme
has item associations
Concept Scheme
Category Scheme
Category Scheme
Concept Scheme
Code List
Code List
Item Association
target item
source item
Item
Item
Concept
Category
Category
Concept
Code
Code
Additional metadata
Property
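A code list map, one kind of item scheme map, can be sketched as a set of item associations between a source and a target scheme. This is an illustration only. The source codes come from the Reference Area code list shown earlier. The target code list of three-letter country codes and both scheme identifiers are assumptions for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemAssociation:
    source_item: str
    target_item: str


@dataclass(frozen=True)
class CodeListMap:
    source_scheme: str
    target_scheme: str
    associations: tuple  # tuple of ItemAssociation

    def translate(self, code: str) -> str:
        """Return the target code associated with a source code."""
        for association in self.associations:
            if association.source_item == code:
                return association.target_item
        raise KeyError(f"no association for code {code!r}")


# Hypothetical map from the two-letter codes to three-letter codes.
area_map = CodeListMap(
    source_scheme="CL_REFERENCE_AREA",
    target_scheme="CL_AREA_3LETTER",
    associations=(
        ItemAssociation("FR", "FRA"),
        ItemAssociation("DE", "DEU"),
        ItemAssociation("US", "USA"),
    ),
)

converted = [area_map.translate(code) for code in ["FR", "US"]]
```

Because category schemes and concept schemes share the same item scheme structure, the same association pattern would apply to a category scheme map or a concept scheme map.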
49Structure Maps
- Structures can also be mapped
- Data structures
- Metadata structures
50Information Model Summary
- Supports data and metadata reporting and exchange
- Data and metadata structure definitions
- Data and metadata sets
- Supports the process of reporting and exchange
- Data/metadata providers
- Data/metadata flows
- Provision agreements
- Supports registration
- Data and metadata sets
- Data and metadata can be linked
- Supports query
- Categories linked to data and metadata
- Constraints for finer grained queries
- Retrieval of metadata linked to data
- Supports data analysis, comparison and conversion
- Hierarchical code schemes
- Structure, Concept, Code, Category maps
51Data/Metadata Reporting, Query, Analysis, Mapping
CategoryScheme
Structure and Item Scheme Maps
Structure Definition
Data Set or Metadata Set
Data or Metadata Flow
Category
Attachment Constraint
Content Constraint
Provision Agreement
Data Provider
Registered Data Set or Metadata Set
52Thank You