Title: DFDL WG Session 2
1DFDL WG Session 2
- Mike Beckerle
- Ascential Software
- (Two note-takers please?)
- Room D008
- Thursday, 2004-09-23
- 11h00 Brussels (BE.CEST, UTC+2)
- 05h00 New York (US.EDT, UTC-4)
- 02h00 San Francisco (US.PDT, UTC-7)
2Abstract
- Detailed WG meeting to review particular examples.
- This meeting will assume familiarity with the ongoing work of the WG.
3DFDL-WG Session 2 Current Working Issues
- Agenda
- Charter revision discussion
- IBM WebSphere Business Integration Broker Presentation (Steve Hanson, smh_at_uk.ibm.com)
- Review current examples
- Discussion
- Identify missing examples
4Proposed Revisions to DFDL-WG Charter
- Documents
- Primer - draft (done for GGF11)
- Spec - 3 primary sections
- Language
- Transforms and their parameters
- XML-extensions
- Language Bindings/APIs
- Schedule
- Primer - update by GGF13 (official draft)
- Spec - internal WG document by GGF13, updated for GGF14, official draft for GGF15
- focus for GGF13 discussion on language
- focus for GGF14 on transforms
- GGF15 is the whole spec.
- Language Bindings/API - internal draft by GGF14, target draft for GGF16.
- focus for GGF15 discussion
5Describing non-XML data using XML Schema in IBM WebSphere Business Integration Brokers
- Steve Hanson
- WebSphere Business Integration Brokers,
- IBM Hursley, England
- Internet: smh_at_uk.ibm.com
- Phone: (44)/(0) 1962-815848
6Examples (so far)
- BasicMathInExpressions.xsd
- BasicMathInRepresentations.xsd
- Choice.xsd
- DefaultValuePropagation.xsd
- DefaultedInput.xsd
- IncludeTransform.xsd
- IncludeType.xsd
- NewDFDLType.xsd
- MultiLayer.xsd
- MultiStreams.xsd
- NewDFDLTransform.xsd
- NewExternalDFDLTransform.xsd
- NewMixedDFDLTransform.xsd
- Reference.xsd
- StaticInformation.xsd
- ValidatedInput.xsd
7Examples
8END
9Issues
- Stored length, references in general
- Choice/unions
- Expression language for discrimination
- Layered translations
- compression, encryption
- IBM data streams (F, FB, VB, VBS)
- Modularity
- How to plug in new transformations?
- Composition properties
10Use Cases
Issue coverage by use case. Issue columns: Stored-length/references, Choice, Layered Translations, Modularity.
- Variable length int vector: X
- Cobol redefines: X X
- Matrix w/dynamic size: X X
- Clever string: X X X
- Comma-separated values: ? ?
11Stored Length References
- Variable-length int vector
- Prefix is 4-byte integer length > 0
- Content follows
- Length field contains
- Number of elements in the vector?
- Number of bytes in the vector's rep?
- General issue: how can the value of one field be used in the format description of another field?
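The ambiguity raised above can be made concrete with a minimal Python sketch. It assumes big-endian 4-byte integers for both the prefix and the elements, which the slides do not specify, and the `length_unit` switch is a hypothetical name for the choice between the two readings of the stored length.

```python
import struct

ELEMENT_SIZE = 4  # assumption: each vector element is a 4-byte integer


def read_int_vector(data: bytes, length_unit: str = "elements") -> list[int]:
    """Read a 4-byte length prefix, then the vector content that follows.

    length_unit (hypothetical) selects how the stored length is interpreted:
    "elements" counts vector items, "bytes" counts bytes of the representation.
    """
    (stored,) = struct.unpack_from(">i", data, 0)
    if stored < 0:
        raise ValueError("negative stored length")
    if length_unit == "elements":
        count = stored
    elif length_unit == "bytes":
        if stored % ELEMENT_SIZE:
            raise ValueError("byte length is not a multiple of the element size")
        count = stored // ELEMENT_SIZE
    else:
        raise ValueError(f"unknown length unit: {length_unit}")
    if len(data) < 4 + ELEMENT_SIZE * count:
        raise ValueError("content is shorter than the stored length implies")
    return list(struct.unpack_from(f">{count}i", data, 4))


# Same three values, two different meanings of the prefix.
payload_elements = struct.pack(">i3i", 3, 10, 20, 30)
payload_bytes = struct.pack(">i3i", 12, 10, 20, 30)
```

The same content bytes parse correctly only when the reader and the writer agree on the unit, which is why the format description has to state it explicitly.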
12Stored Length
- 01 PATIENT-TREATMENTS.
-   05 PATIENT-NAME PIC X(30).
-   05 NUMBER-OF-TREATMENTS PIC 99 COMP-3.
-   05 TREATMENT-HISTORY OCCURS 0 TO 50 TIMES
-      DEPENDING ON NUMBER-OF-TREATMENTS.
-     10 TREATING-PHYSICIAN PIC X(30).
-     10 TREATMENT-CODE PIC 99.
13Stored Length
- <complexType name='PATIENT-TREATMENTS-type'>
-   <annotation><appinfo>
-     <dfdl:byteOrder value="bigEndian"/>
-     <dfdl:decimalFormatType value="text"/>
-     <dfdl:charset value="ebcdic-cp-us"/>
-   </appinfo></annotation>
-   <sequence>
-     <element name='PATIENT-NAME' type='string' minLength='30' maxLength="30"/>
-     <element name='NUMBER-OF-TREATMENTS' type='decimal' totalDigits='2'>
-       <annotation><appinfo>
-         <dfdl:repType value="binary"/>
-         <dfdl:decimalFormatType value="packed"/>
-       </appinfo></annotation>
-     </element>
-     <element name='TREATMENT-HISTORY' >
-       .....next slide
-     </element>
-   </sequence>
14Stored Length References
- <element name='TREATMENT-HISTORY' minOccurs='0' maxOccurs='50'>
-   <annotation><appinfo>
-     <dfdl:storedLength value='../NUMBER-OF-TREATMENTS'/>
-     <dfdl:storedLengthUnitType value='logicalItems'/>
-   </appinfo></annotation>
-   <sequence>
-     <element name='TREATING-PHYSICIAN' type='string' minLength='30' maxLength='30'/>
-     <element name='TREATMENT-CODE' type='decimal' totalDigits='2'/>
-   </sequence>
- </element>
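To illustrate what a processor has to do with this description, here is a rough Python sketch that reads one PATIENT-TREATMENTS record. It assumes the record is laid out exactly as the COBOL copybook suggests: EBCDIC text decoded with Python's `cp037` codec (registered for the `ebcdic-cp-us` charset), a 2-byte packed-decimal count, and 32-byte history entries. The function names are hypothetical and not part of any DFDL proposal.

```python
def unpack_comp3(raw: bytes) -> int:
    """Decode an IBM packed-decimal (COMP-3) field: one digit per nibble,
    with the final nibble holding the sign (0xB or 0xD means negative)."""
    nibbles = []
    for byte in raw:
        nibbles.extend((byte >> 4, byte & 0x0F))
    *digits, sign = nibbles
    if any(d > 9 for d in digits):
        raise ValueError("invalid packed-decimal digit")
    value = int("".join(str(d) for d in digits))
    return -value if sign in (0x0B, 0x0D) else value


def parse_patient_treatments(record: bytes) -> dict:
    """Use the stored NUMBER-OF-TREATMENTS value to size TREATMENT-HISTORY."""
    name = record[0:30].decode("cp037").rstrip()
    count = unpack_comp3(record[30:32])  # PIC 99 COMP-3 occupies 2 bytes
    if not 0 <= count <= 50:
        raise ValueError("NUMBER-OF-TREATMENTS outside OCCURS 0 TO 50")
    if len(record) < 32 + 32 * count:
        raise ValueError("record shorter than the stored count implies")
    treatments = []
    offset = 32
    for _ in range(count):
        physician = record[offset:offset + 30].decode("cp037").rstrip()
        code = int(record[offset + 30:offset + 32].decode("cp037"))
        treatments.append((physician, code))
        offset += 32
    return {"name": name, "treatments": treatments}


# Build a sample record with two treatments (count stored as 0x00 0x2F).
sample = (
    "DOE JOHN".ljust(30).encode("cp037")
    + bytes([0x00, 0x2F])
    + "DR SMITH".ljust(30).encode("cp037") + "07".encode("cp037")
    + "DR JONES".ljust(30).encode("cp037") + "12".encode("cp037")
)
parsed = parse_patient_treatments(sample)
```

The count is interpreted as a number of logical items, matching `storedLengthUnitType value='logicalItems'` in the description above.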
15Choice
- Calculation required to determine which of several formats applies.
- A.k.a., union, tagged union, discriminated unions, variants, tagged variants
- Cobol redefines
- Changing data formats as new versions of data standards come out.
- EDI (ANSI X12) formats
16Use Case Cobol Redefines
- 01 AS-CPST-REC.
- // Header portion. Position 1-42 are common to all variants.
-   06 AS-CPCOM.
-     09 AS-COM-STORE-TYPE PIC X.
-     09 AS-COM-STORE-NO PIC 9(05).
-     09 AS-COM-TRAN-ID PIC X(04).
-       88 TRAN-COUPON VALUE 'CP80'.
-       88 TRAN-REVENUE VALUE 'IC40' 'RA40'.
-       88 TRAN-SALES VALUE 'IC40'.
-       88 TRAN-DELIVER VALUE 'IC44'.
-       88 TRAN-RENTS VALUE 'RA40' 'RA42'.
-       88 TRAN-RENT-RETURN VALUE 'RA41'.
-     09 AS-COM-QUANTITY PIC S9(05).
-     09 AS-COM-PART-NO PIC 9(06).
- // Variant portion. Position 43-95 differ depending on AS-COM-TRAN-ID
17Use Case Cobol Redefines
- // Txn-type RA4 Rental
-   06 AS-CPREN REDEFINES AS-CPVAR.
-     09 REN-REVENUE PIC S9(07)V9(02).
-     09 FILLER PIC X(01).
-     09 REN-LOS PIC S9(03).
-     09 REN-EXT-VIEW-FEE PIC S9(07)V9(02).
-     09 REN-QOH PIC S9(05).
-     09 REN-RENT-CODE PIC X(01).
-     09 REN-COPY-NO PIC 9(03).
-     09 REN-CUST-NO PIC X(11).
-     09 REN-CRD-TYPE PIC X(03).
- // Txn-type IC4 Sales
-   06 AS-CPSLS REDEFINES AS-CPVAR.
-     09 SLS-COST PIC S9(04)V9(02).
-     09 SLS-REVENUE PIC S9(07)V9(02).
-     09 SLS-QOH PIC S9(05).
-     09 SLS-CUST-NO PIC X(11).
-     09 SLS-CRD-TYPE PIC X(03).
- // Txn-type CP8 Coupon
18Use Case Cobol Redefines
- <complexType name="AS-CPST-REC">
-   <sequence>
-     <element name="AS-CPCOM"><sequence>
-       <annotation>
-         // Header portion. Position 1-42 are common to all variants.
-       </annotation>
-       <annotation><appinfo>
-         <charset value="ebcdic-cp-us"/>
-         <repType value="binary"/>
-         <byteOrder value="bigEndian"/>
-         <decimalType value="text"/>
-       </appinfo></annotation>
-       <element name="AS-COM-STORE-TYPE" type="string" fixLength="1"/>
-       <element name="AS-COM-STORE-NO" type="decimal" unsigned="true" totalDigits="5"/>
-       <element name="AS-COM-TRAN-ID" type="string" fixLength="4"/>
-       <annotation name="tran-id-definitions">
-         88 TRAN-COUPON VALUE 'CP80'.
-         88 TRAN-REVENUE VALUE 'IC40' 'RA40'.
-         88 TRAN-SALES VALUE 'IC40'.
19Use Case Cobol Redefines
- <element name="AS-CPVAR">
-   <annotation>
-     // Variant portion. Position 43-95 differs depending on AS-COM-TRAN-ID
-   </annotation>
-   <choice>
-     <annotation><appinfo>
-       <choiceTagValueCalc
-         value=" let code = substring(../AS-CPCOM/AS-COM-TRAN-ID,0,3)
-                 let tag = if (code = "RA4") then "AS-CPREN"
-                           else if (code = "IC4") then "AS-CPSLS"
-                           else if (code = "CP8") then "AS-CPCP"
-                           else throw invalidDataFormat(code)
-                 ./choiceTag = tag
-         " />
-       <choiceTagRepCalc
-         value=" let tagRep = if (./choiceTag = "AS-CPREN") then "RA40"
-                              else if (./choiceTag = "AS-CPSLS") then "IC40"
-                              else if (./choiceTag = "AS-CPCP") then "CP80"
-                              else throw inconsistentDef(./choiceTag)
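The two calculations above can be paraphrased in Python as a pair of lookups: one from the stored transaction id to the active alternative, and one from the alternative back to a stored representation. This is only a sketch; the function and exception names are illustrative stand-ins for `choiceTagValueCalc`, `choiceTagRepCalc`, `invalidDataFormat` and `inconsistentDef`, and it assumes `substring(..., 0, 3)` means the first three characters.

```python
# Mapping tables taken from the two expressions on the slide.
TAG_BY_CODE = {"RA4": "AS-CPREN", "IC4": "AS-CPSLS", "CP8": "AS-CPCP"}
REP_BY_TAG = {"AS-CPREN": "RA40", "AS-CPSLS": "IC40", "AS-CPCP": "CP80"}


class InvalidDataFormat(ValueError):
    """Raised when the stored transaction id selects no alternative."""


class InconsistentDef(ValueError):
    """Raised when a choice tag has no stored representation."""


def choice_tag(tran_id: str) -> str:
    """Parsing direction: pick the alternative from AS-COM-TRAN-ID."""
    code = tran_id[:3]
    if code not in TAG_BY_CODE:
        raise InvalidDataFormat(code)
    return TAG_BY_CODE[code]


def choice_tag_rep(tag: str) -> str:
    """Writing direction: pick a transaction id for the active alternative."""
    if tag not in REP_BY_TAG:
        raise InconsistentDef(tag)
    return REP_BY_TAG[tag]
```

One consequence visible in the sketch: the mapping is not a round trip. A record read with transaction id `RA42` selects `AS-CPREN`, but writing that alternative back yields `RA40`.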
20Use Case Cobol Redefines
- <element name="AS-CPREN"><sequence>
-   <annotation>
-     // Txn-type RA4 Rental
-   </annotation>
-   <element name="REN-REVENUE" type="decimal" totalDigits="9" fractionDigits="2"/>
-   <element type="string" fixLength="1"/>
-   <element name="REN-LOS" type="decimal" totalDigits="3"/>
-   <element name="REN-EXT-VIEW-FEE" type="decimal" totalDigits="9" fractionDigits="2"/>
-   <element name="REN-QOH" type="decimal" totalDigits="5"/>
-   <element name="REN-RENT-CODE" type="string" fixLength="1"/>
-   <element name="REN-COPY-NO" type="decimal" unsigned="true" totalDigits="3"/>
-   <element name="REN-CUST-NO" type="string" fixLength="11"/>
-   <element name="REN-CRD-TYPE" type="string" fixLength="3"/>
- </sequence></element>
- <element name="AS-CPSLS"><sequence>
-   <annotation>
-     // Txn-type IC4 Sales
-   </annotation>
21Expressions/References
- Expression language
- ./x/y/z references the data value at runtime
- ./x/y references run-time properties of the data values
- E.g., choiceTag to set the active alternative
- E.g., length to set a variable length for a string
- E.g., arrayLength ditto for array
22Run-time properties
Conceptual model of run-time attributes available from DFDL expression-language syntax
[Class diagram. Types shown: AllTypes, SimpleType, Number, Numeric, Integer, Decimal, FloatingPoint, String, Duration, DateTime, Date, Time, Choice. Attributes shown: isNull: boolean; isValid: boolean; arrayLength, arrayLengths, selfIndex, selfIndices, length: long; choiceTag: int; years, months, days, hours, seconds, microseconds: int; era, year, month, week, day, hour, minute, second, microsecond: int.]
23Layered Translations
- Use case: Matrix with dynamic size in text file
- blank lines are ignored
- C-style comments are ignored (equiv. to whitespace)
- First line contains xdim ydim (whitespace separated)
- Subsequent lines are rows of the 2-d matrix.
- There must be exactly ydim rows
- each containing xdim numbers
- Within each row the values are whitespace separated.
- The charset is UTF-8
- Requires that we express preprocessing of the input data to handle the C-style comments and blank lines
- The preprocessing is not part of the structure of the data
24Layered Translations: Matrix w/Dynamic Size Example
- /* obsv3 ??? 2003?08?27? ?? */
- /* gbxx2. 140221 ???? 8 ?
- */
- 3 2
- /* ?????????????? */
- 1 2 3 /**/
- 4 5 /*????*/ 6
- /* ??????? */
25Layered Translations: Matrix w/Dynamic Size Example
- <dims>
-   <xdim>3</xdim>
-   <ydim>2</ydim>
- </dims>
- <ydata>
-   <xdata>1</xdata>
-   <xdata>2</xdata>
-   <xdata>3</xdata>
- </ydata>
- <ydata>
-   <xdata>4</xdata>
-   <xdata>5</xdata>
-   <xdata>6</xdata>
- </ydata>
26Layered Translations: Matrix w/Dynamic Size Example
- <element name="example2">
-   <sequence>
-     <element name="dims">
-       <sequence>
-         <element name="xdim" type="int"/>
-         <element name="ydim" type="int"/>
-       </sequence>
-     </element>
-     <!-- XSD/XML Issues: XSD has no 2-d array. Also there is no way to constrain minOccurs or maxOccurs based on the value of other elements of the XML -->
-     <element name="ydata" minOccurs="0" maxOccurs="unbounded">
-       <sequence>
-         <element name="xdata" type="double"
-           minOccurs="0" maxOccurs="unbounded"/>
-       </sequence>
-     </element>
-   </sequence>
- </element>
27Layered Translations: Matrix w/Dynamic Size Example
- Underlying transformations
- Bits to bytes
- Bytes to Characters (UTF-8 encoding)
- Removal of blank lines
- Removal of C-style comments
28Layered Translations: Matrix w/Dynamic Size Example
- The data now looks like
- 3 2
- 1 2 3
- 4 5 6
- Let b = blank, n = newline. The data really is this string of characters:
- 3b2n1b2b3bbn4b5bbb6n
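The chain of underlying transformations (bytes to characters, comment removal, blank-line removal) can be sketched as three small Python functions composed in order. This is an illustration only: the function names are hypothetical, the comment regular expression is one plausible choice, and each comment is replaced by a single space on the assumption that comments are equivalent to whitespace.

```python
import re


def bytes_to_chars(byte_stream: bytes) -> str:
    """Layer 1: decode the byte stream as UTF-8."""
    return byte_stream.decode("utf-8")


def strip_comments(chars: str) -> str:
    """Layer 2: replace every C-style /* ... */ comment with one space."""
    return re.sub(r"/\*.*?\*/", " ", chars, flags=re.DOTALL)


def strip_blank_lines(chars: str) -> str:
    """Layer 3: drop lines that contain only whitespace."""
    return "".join(
        line for line in chars.splitlines(keepends=True) if line.strip()
    )


def layered(byte_stream: bytes) -> str:
    """Compose the layers in the order listed on the slide."""
    return strip_blank_lines(strip_comments(bytes_to_chars(byte_stream)))


# A stand-in for the example file, with ASCII placeholders in the comments.
raw = (
    b"/* header */\n"
    b"/* multi\nline */\n"
    b"3 2\n"
    b"/* note */\n"
    b"1 2 3 /**/\n"
    b"4 5 /*x*/ 6\n"
    b"/* end */\n"
)
clean = layered(raw)
```

With this input, `clean` is the character string `"3 2\n1 2 3  \n4 5   6\n"`, which is the `3b2n1b2b3bbn4b5bbb6n` string above once b is read as a blank and n as a newline. The structural description of the matrix only ever sees this cleaned stream.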
29References: Matrix w/Dynamic Size Example
- DFDL wants to treat mistakes like these as invalid:
- 3 2
- 1 2
- 3 4 5 6
- (line structure doesn't match dimensions) or
- 3 2
- 1 2 3
- 4 5 6
- 7 8 9
- (too many rows)
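A short Python sketch shows the checks that such a description needs to express: the row count must equal ydim and every row must hold xdim values. It assumes the comments and blank lines have already been removed; `parse_matrix` and `rejects` are hypothetical helper names.

```python
def parse_matrix(text: str) -> list[list[float]]:
    """Parse 'xdim ydim' followed by ydim rows of xdim numbers each."""
    lines = text.splitlines()
    if not lines:
        raise ValueError("missing dimensions line")
    header = lines[0].split()
    if len(header) != 2:
        raise ValueError("first line must contain xdim and ydim")
    xdim, ydim = (int(value) for value in header)
    rows = [[float(value) for value in line.split()] for line in lines[1:]]
    # Reference from the array length to a previously parsed field.
    if len(rows) != ydim:
        raise ValueError(f"expected {ydim} rows, found {len(rows)}")
    for index, row in enumerate(rows):
        if len(row) != xdim:
            raise ValueError(
                f"row {index} has {len(row)} values, expected {xdim}"
            )
    return rows


def rejects(text: str) -> bool:
    """Return True when parse_matrix treats the text as invalid."""
    try:
        parse_matrix(text)
    except ValueError:
        return True
    return False
```

Both invalid inputs above are rejected by this sketch: the first fails the per-row check, the second fails the row-count check.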
30References: Matrix w/Dynamic Size Example
- <element name="example2">
-   <sequence>
-     <element name="dims">
-       <sequence>
-         <annotation><appinfo>
-           <dfdl:terminator value="\p{whitespace}\p{Line_Separator}"/>
-           <dfdl:separator value="\p{whitespace}"/>
-         </appinfo></annotation>
-         <element name="xdim" type="int"/>
-         <element name="ydim" type="int"/>
-       </sequence>
-     </element>
-     <element name="ydata" minOccurs="0" maxOccurs="unbounded">
-       <annotation><appinfo>
-         <dfdl:separator value="\p{whitespace}\p{Line_Separator}"/>
-       </appinfo></annotation>
-       <sequence>
-         <element name="xdata" type="double"
-           minOccurs="0" maxOccurs="unbounded">
31References: Matrix w/Dynamic Size Example
- <element name="example2">
-   <sequence>
-     <element name="dims">
-       <sequence>
-         <annotation> </annotation>
-         <element name="xdim" type="int"/>
-         <element name="ydim" type="int"/>
-       </sequence>
-     </element>
-     <element name="ydata" minOccurs="0" maxOccurs="unbounded">
-       <annotation><appinfo>
-         <dfdl:separator value="\p{whitespace}\p{Line_Separator}"/>
-         <dfdl:validation expr=" ./arrayLength = ../dims/ydim "/>
-       </appinfo></annotation>
-       <sequence>
-         <element name="xdata" type="double"
-           minOccurs="0" maxOccurs="unbounded">
32Layered Translations: Matrix w/Dynamic Size Example
- Now add in the layered transformations of the streams.
- <annotation><appinfo>
-   <container name="charStream" type="string">
-     <rep charset="UTF-8"
-       container="byteStream"> <!-- a built in container -->
-       <valueCalc exp=" bytesToChars() "/>
-     </rep>
-   </container>
-   <container name="noCommentsStream" type="string">
-     <rep container="charStream">
-       <valueCalc exp="replaceString( '...a regexp for comments...', ' ')"/>
-     </rep>
-   </container>
-   <container name="noBlankLinesStream" type="string">
-     <rep container="noCommentsStream">
-       <valueCalc exp=" replaceString( '..a regexp for blanklines..',' ')"/>
33Modularity
- Consider this example
- <xs:simpleType name="binaryInt">
-   <xs:restriction base="xs:int">
-     <xs:annotation><xs:appinfo>
-       <compositeMapping>
-         <mapping name="data-bytes"/>
-         <mapping name="bytes-int"/>
-       </compositeMapping>
-     </xs:appinfo></xs:annotation>
-   </xs:restriction>
- </xs:simpleType>
- This connects the definition of binaryInt all the way back to how bits are turned into bytes
- This over-specification limits reusability
34Modularity
- Issue: Why should binaryInt care about where the bytes come from?
- They could come from a binary file
- They could come from conversion of uuencoded text back into binary data
- They could come from decompression.
- DFDL defined types want to be parameterized by where they get their underlying data
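The parameterization idea can be illustrated in Python by passing the byte source into the type's decoder instead of hard-wiring it. This is a sketch under stated assumptions: `binary_int` and the three source functions are hypothetical names, a 4-byte big-endian layout is assumed, and zlib stands in for an unspecified compression scheme.

```python
import binascii
import struct
import zlib
from typing import Callable

# A byte source is anything that can produce the underlying bytes on demand.
ByteSource = Callable[[], bytes]


def binary_int(source: ByteSource) -> int:
    """Decode a 4-byte big-endian int without knowing where the bytes came from."""
    (value,) = struct.unpack(">i", source()[:4])
    return value


RAW = struct.pack(">i", 256)


def from_binary_file() -> bytes:
    """Stand-in for bytes read directly from a binary file."""
    return RAW


def from_uuencoded_text() -> bytes:
    """Bytes recovered by converting uuencoded text back into binary data."""
    return binascii.a2b_uu(binascii.b2a_uu(RAW))


def from_decompression() -> bytes:
    """Bytes recovered by decompressing a compressed stream."""
    return zlib.decompress(zlib.compress(RAW))
```

The same `binary_int` definition is reused with all three sources, which is the reusability that the over-specified `compositeMapping` in the previous example gives up.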
35DFDL-WG Session 2 Multilayered descriptions and references
- This session focuses on issues and potential problems with the current approach
- Representations of multilayered descriptions
- Descriptions of compressions, encryption etc
- Discussion of conditional types
- how to represent unions, clever string etc.