Title: Schematron Data Validation
1 Schematron Data Validation
- An open framework
- Presented By Chris Clark and Dr. Yunhao Zhang
2 Topics
- What is Schematron?
- Business rules and Schematron rules
- Schematron validation process
- Exchange Network Schematron extensions
- Implementation and Tools
- Guidance and recommendations
- Conclusion
3 Schematron
- An XML schema language
- Combines powerful validation capability with simple syntax
- Based on XSLT and XPath
- Open source implementation
- An ISO standard: ISO/IEC 19757, DSDL (Document Schema Definition Languages)
4 Schematron Advantages
- Direct mapping from business rules to validation rules.
- Easier to design and deploy.
- Moves data validation from application programming logic to a shared service with a centralized rule set (rule sharing).
- Natural language descriptions of error conditions and solutions.
- Highly extensible using XPath expression extensions.
5 Schematron and XSLT
[Diagram: the XML Doc and the XSLT Rules go into the XSLT Processor, which produces the Error Report.]
- Transform an XML document into an error report using XSLT.
- Rules are coded in style sheets.
6 Business Rules
7 Business Rules
- RuleId: A unique identifier for the business rule.
- XmlElement: The XML element the rule is imposed upon.
- Test Conditions: Conditions the data element must meet or not meet.
- Error Description: The error message when the rule is fired.
8 Schematron Rules
- A Schematron rule has three major parts:
  - The context: The element a rule applies to. (The XML element in the business rule.)
  - An assertion: A statement about an element, usually an XPath expression. (The error condition in the business rule.)
  - A result: A statement to be reported if an assertion fails (or succeeds). (The error description in the business rule.)
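The three parts can be sketched in a few lines of Python. This is only an illustration of the context / assertion / result idea, not a Schematron implementation: the `Rule` class, `run_rules`, the `Count` element, and the sample document are all hypothetical, and the standard library's limited ElementTree path syntax stands in for full XPath.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    """Hypothetical stand-in for a Schematron rule (not a real Schematron API)."""
    context: str                        # which elements the rule applies to
    test: Callable[[ET.Element], bool]  # the assertion; True means the data is valid
    message: str                        # the result reported when the assertion fails


def run_rules(xml_text: str, rules: List[Rule]) -> List[str]:
    """Evaluate every rule against every element matching its context."""
    root = ET.fromstring(xml_text)
    report = []
    for rule in rules:
        for element in root.iterfind(rule.context):
            if not rule.test(element):
                report.append(rule.message.format(value=element.text))
    return report


def is_non_negative_int(element: ET.Element) -> bool:
    """Assertion used below: the element text is made of digits only."""
    return (element.text or "").strip().isdigit()


SAMPLE = "<Submission><Count>12</Count><Count>-3</Count></Submission>"

count_rule = Rule(
    context=".//Count",
    test=is_non_negative_int,
    message="Count ({value}) must be a non-negative integer.",
)

errors = run_rules(SAMPLE, [count_rule])
print(errors)
```

Only the second `Count` element fails the assertion, so the report holds a single message that quotes the offending value.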
9 Schematron Rule Example
<rule context="aqs:ObservationDate">
  <assert test="neien:CheckDate(string(.), '', '19570101', 'Today', 0)">
    AQS23: Error: ObservationDate (<value-of select="."/>) must be in proper
    YYYYMMDD format and in the range Jan 1, 1957 through today.
  </assert>
</rule>
10 Recommended Error Format
- RuleId: The business rule Id in the business rule table.
- ErrorType: The type of error: Error, Warning, Critical, or some custom error level.
- Element Full Path: The complete path, including the element name, leading to the offending element.
- Error Description: The error description. It must contain the offending value.
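One way to carry these four fields through application code is a small record type. The sketch below is an assumption about how a consumer might model the format, not part of the QA services: `ValidationError`, `full_path`, and the sample document are hypothetical names, and the path is built by hand because the standard library's ElementTree keeps no parent pointers.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class ValidationError:
    rule_id: str       # business rule Id from the business rule table
    error_type: str    # "Error", "Warning", "Critical", or a custom level
    element_path: str  # complete path leading to the offending element
    description: str   # message that includes the offending value


def full_path(root: ET.Element, target: ET.Element) -> str:
    """Build a /Root/Child/... path by walking up a child-to-parent map."""
    parents = {child: parent for parent in root.iter() for child in parent}
    names = [target.tag]
    node = target
    while node in parents:
        node = parents[node]
        names.append(node.tag)
    return "/" + "/".join(reversed(names))


root = ET.fromstring(
    "<Submission><Monitor>"
    "<ObservationDate>1956-12-31</ObservationDate>"
    "</Monitor></Submission>"
)
bad = root.find(".//ObservationDate")

error = ValidationError(
    rule_id="AQS23",
    error_type="Error",
    element_path=full_path(root, bad),
    description=f"ObservationDate ({bad.text}) must be in YYYYMMDD format.",
)
print(error)
```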
11 Quality Assurance Services
- The Exchange Network has been using Schematron technology since 2003.
- QA services are shared web services for data validation using both schema and Schematron.
- Many data flows currently use Schematron validation: NEI, AQS, OWWQX, UIC, VERIFY.
- QA services can be accessed from all applications.
- The service description is available at https://tools.epacdxnode.net/xml/validator.wsdl.
12 Flow Data Validation Process
[Diagram: the XML Doc passes through three stages and ends in an Error Report: a well-formedness check (XML Parser), schema validation (Schema Validator, using Schemas), and rule validation (XSLT Processor, using Schematron Rules).]
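The three-stage flow can be sketched as a small pipeline. This is a simplified illustration under stated assumptions: the schema and rule stages are passed in as plain functions because the Python standard library has no XSD or XSLT processor, the stand-in stages `require_root` and `require_count` are hypothetical, and stopping at the first failing stage is a design choice of this sketch rather than something the slides specify.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List

# A stage takes the parsed document and returns a list of error messages.
Stage = Callable[[ET.Element], List[str]]


def validate(xml_text: str, schema_stage: Stage, rule_stage: Stage) -> List[str]:
    """Run well-formedness, schema, then rule checks; stop at the first failure."""
    try:
        root = ET.fromstring(xml_text)  # well-formedness check
    except ET.ParseError as exc:
        return [f"Not well-formed: {exc}"]

    for stage in (schema_stage, rule_stage):
        errors = stage(root)
        if errors:
            return errors
    return []


def require_root(root: ET.Element) -> List[str]:
    """Stand-in for schema validation: only checks the root element name."""
    if root.tag == "Submission":
        return []
    return [f"Schema: unexpected root element <{root.tag}>"]


def require_count(root: ET.Element) -> List[str]:
    """Stand-in for rule validation: requires at least one Count element."""
    if root.find(".//Count") is not None:
        return []
    return ["Rule: at least one Count element is required"]


print(validate("<Submission><Count>1</Count></Submission>", require_root, require_count))
print(validate("<Other/>", require_root, require_count))
```

Ordering the stages from cheapest to most specific means a malformed document is rejected before any schema or rule work is attempted.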
13 Exchange Network Schematron Extensions
- Table Lookup: Check whether a value is in a predefined database table.
- Date Format and Range: Verify that a date string is in the correct format and within the specified range.
- Regular Expression: Compare a value against a specified format.
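The table lookup and regular expression checks are easy to picture as ordinary functions. The sketch below is a hypothetical illustration using an in-memory SQLite table; the function names, the `state_codes` table, and its contents are invented for the example and say nothing about how the Exchange Network extensions are actually implemented.

```python
import re
import sqlite3


def in_table(conn: sqlite3.Connection, table: str, column: str, value: str) -> bool:
    """Table lookup: True if value appears in table.column.

    table and column are assumed to come from a trusted rule author;
    only value is bound as a query parameter.
    """
    query = f'SELECT 1 FROM "{table}" WHERE "{column}" = ? LIMIT 1'
    return conn.execute(query, (value,)).fetchone() is not None


def matches_format(value: str, pattern: str) -> bool:
    """Regular expression check: the whole value must match the pattern."""
    return re.fullmatch(pattern, value) is not None


# Hypothetical reference table for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE state_codes (code TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO state_codes VALUES (?)", [("NH",), ("VT",)])

print(in_table(conn, "state_codes", "code", "NH"))
print(matches_format("0330A", r"\d{5}"))
```

Exposing such checks as functions callable from a rule's test expression is what lets the rule text stay short while the lookup data lives in a database.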
14 Schematron with Extensions
[Diagram. Labels: XML Doc, Schematron Rules, Schematron Processor (XSLT Processor, XSLT, XPath Extension), Meta Schemas, FRS Registry Info, SRS Registry Info, Regular Expression, Error Report.]
15 Current Implementation
- A set of web services.
- Provides both schema validation and Schematron validation.
- Has synchronous and asynchronous modes.
- Supports table lookups against any database table.
- Can process compressed or uncompressed XML documents.
- Accessible to any node, application, or user.
- Node Client 2007 is integrated with QA services.
16 Schematron Guidance Document
- Business Rules and Requirements
- Schema and Schematron
- Schematron Rule Development
- Schematron Extensions
- Regular Expression Support
- Schematron Software Developer Kit
- Data Quality Assurance Services
- Using QA Server for Schema Validation
- Using QA Server for Schematron Validation
- Deploying New Schematron Rules
17 Conclusion
- Streamlined data validation is crucial to successful data exchange.
- Rules should be defined with schemas.
- Data validation should happen as early as possible.
- Technologies and tools are available for boosting data quality.
- Schematron is a recommended design approach.