DFDL WG Session 2

Provided by: ogf

1
DFDL WG Session 2
  • Mike Beckerle
  • Ascential Software
  • (Two note-takers please?)
  • Room D008
  • Thursday, 2004-09-23 11h00 +02:00 Brussels (BE.CEST)
  • 05h00 New York (US.EDT, UTC-4)
  • 02h00 San Francisco (US.PDT, UTC-7)

2
Abstract
  • Detailed WG meeting to review particular
    examples.
  • This meeting will assume familiarity with the
    ongoing work of the WG.

3
DFDL-WG Session 2: Current Working Issues
  • Agenda
  • Charter revision discussion
  • IBM WebSphere Business Integration Broker
    Presentation
  • Steve Hanson smh_at_uk.ibm.com
  • Review current examples
  • Discussion
  • Identify missing examples
  • .

4
Proposed Revisions to DFDL-WG Charter
  • Documents
  • Primer - draft (done for GGF11)
  • Spec - 3 primary sections
  • Language
  • Transforms and their parameters
  • XML-extensions
  • Language Bindings/APIs
  • Schedule
  • Primer - update by GGF13 (official draft)
  • Spec - internal WG document by GGF13, updated for
    GGF14, official draft for GGF15
  • focus for GGF13 discussion on language
  • focus for GGF14 on transforms
  • GGF15 is the whole spec.
  • Language Bindings/API - internal draft by GGF14,
    target draft for GGF16.
  • focus for GGF15 discussion

5
Describing non-XML data using XML Schema in IBM
WebSphere Business Integration Brokers
  • Steve Hanson
  • WebSphere Business Integration Brokers,
  • IBM Hursley, England
  • Internet smh_at_uk.ibm.com
  • Phone (44)/(0) 1962-815848

6
Examples (so far)
  • BasicMathInExpressions.xsd
  • BasicMathInRepresentations.xsd
  • Choice.xsd
  • DefaultValuePropagation.xsd
  • DefaultedInput.xsd
  • IncludeTransform.xsd
  • IncludeType.xsd NewDFDLType.xsd
  • MultiLayer.xsd
  • MultiStreams.xsd
  • NewDFDLTransform.xsd
  • NewExternalDFDLTransform.xsd
  • NewMixedDFDLTransform.xsd
  • Reference.xsd
  • StaticInformation.xsd
  • ValidatedInput.xsd

7
Examples
  • (not powerpoint)

8
END
9
Issues
  • Stored length, references in general
  • Choice/unions
  • Expression language for discrimination
  • Layered translations
  • compression, encryption
  • IBM data streams (F, FB, VB, VBS)
  • Modularity
  • How to plug in new transformations?
  • Composition properties

10
Use Cases
Issue Coverage
Use case | Stored-length/references | Choice | Layered Translations | Modularity
Variable length int vector X
Cobol redefines X X
Matrix w/dynamic size X X
Clever string X X X
Comma-separated values ? ?
11
Stored Length References
  • Variable-length int vector
  • Prefix is 4-byte integer length > 0
  • Content follows
  • Length field contains:
  • Number of elements in the vector?
  • Number of bytes in the vector's representation?
  • General issue: how can the value of one field be
    used in the format description of another field?
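
The two readings of the length prefix can be compared with a minimal Python sketch. The function name `read_int_vector`, the `length_counts` parameter, big-endian byte order, and 4-byte vector elements are all assumptions made for illustration; the slide fixes only the 4-byte prefix.

```python
import struct


def read_int_vector(data: bytes, length_counts: str = "elements") -> list:
    """Read a vector of 4-byte ints preceded by a 4-byte length prefix.

    Assumptions (not from the slide): big-endian byte order and
    4-byte signed elements.
    """
    # The prefix is the stored length; its meaning depends on the format.
    (stored,) = struct.unpack_from(">i", data, 0)
    if length_counts == "elements":
        count = stored
    elif length_counts == "bytes":
        if stored % 4:
            raise ValueError("byte length is not a multiple of the element size")
        count = stored // 4
    else:
        raise ValueError("length_counts must be 'elements' or 'bytes'")
    # The value of one field (the prefix) drives the parse of the next.
    return list(struct.unpack_from(f">{count}i", data, 4))


# Same three values, two different conventions for the prefix.
BY_ELEMENTS = struct.pack(">i3i", 3, 10, 20, 30)
BY_BYTES = struct.pack(">i3i", 12, 10, 20, 30)
```

Both byte strings decode to the same vector only when the reader is told which convention applies, which is why the format description has to state it.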

12
Stored Length
  • 01  PATIENT-TREATMENTS.
  •        05  PATIENT-NAME                PIC X(30).
  •        05  NUMBER-OF-TREATMENTS        PIC 99 COMP-3.
  •        05  TREATMENT-HISTORY OCCURS 0 TO 50 TIMES
  •               DEPENDING ON NUMBER-OF-TREATMENTS.
  •            10  TREATING-PHYSICIAN       PIC X(30).
  •            10  TREATMENT-CODE           PIC 99.

13
Stored Length
  • <complexType name='PATIENT-TREATMENTS-type'>
  • <annotation><appinfo>
  • <dfdl:byteOrder value="bigEndian"/>
  • <dfdl:decimalFormatType value="text"/>
  • <dfdl:charset value="ebcdic-cp-us"/>
  • </appinfo></annotation>
  • <sequence>
  • <element name='PATIENT-NAME' type='string'
    minLength='30' maxLength="30"/>
  • <element name='NUMBER-OF-TREATMENTS'
    type='decimal' totalDigits='2'>
  • <annotation><appinfo>
  • <dfdl:repType value="binary"/>
  • <dfdl:decimalFormatType value="packed"/>
  • </appinfo></annotation>
  • </element>
  • <element name='TREATMENT-HISTORY' >
  • .....next slide
  • </element>
  • </sequence>

14
Stored Length References
  • <element name='TREATMENT-HISTORY'
    minOccurs='0' maxOccurs='50'>
  • <annotation><appinfo>
  • <dfdl:storedLength value='../NUMBER-OF-TREATMENTS'/>
  • <dfdl:storedLengthUnitType value='logicalItems'/>
  • </appinfo></annotation>
  • <sequence>
  • <element name='TREATING-PHYSICIAN' type='string'
    minLength='30' maxLength='30'/>
  • <element name='TREATMENT-CODE'
    type='decimal' totalDigits='2'/>
  • </sequence>
  • </element>
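
To make the stored-length reference concrete, here is a rough Python sketch of a reader for the PATIENT-TREATMENTS record. It assumes a byte layout the slides imply but do not spell out: EBCDIC text decoded with Python's `cp037` codec, NUMBER-OF-TREATMENTS as a 2-byte packed decimal (two digits plus a sign nibble), and TREATMENT-CODE as two EBCDIC digit characters. The function names and the sample record are hypothetical.

```python
def unpack_comp3(raw: bytes) -> int:
    """Decode a packed-decimal (COMP-3) field: digit nibbles, then a sign nibble."""
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    *digits, sign = nibbles
    value = int("".join(str(d) for d in digits))
    return -value if sign == 0x0D else value


def parse_patient_treatments(record: bytes) -> dict:
    """Parse one record; the repeat count comes from an earlier field."""
    name = record[0:30].decode("cp037").rstrip()
    count = unpack_comp3(record[30:32])
    if not 0 <= count <= 50:
        raise ValueError("NUMBER-OF-TREATMENTS outside 0..50")
    treatments = []
    offset = 32
    for _ in range(count):  # count is in logical items, not bytes
        physician = record[offset:offset + 30].decode("cp037").rstrip()
        code = int(record[offset + 30:offset + 32].decode("cp037"))
        treatments.append((physician, code))
        offset += 32
    return {"name": name, "treatments": treatments}


# Hypothetical sample record with two treatments.
SAMPLE_RECORD = (
    "SMITH".ljust(30).encode("cp037")
    + b"\x00\x2c"  # packed decimal +2
    + ("DR JONES".ljust(30) + "07").encode("cp037")
    + ("DR WU".ljust(30) + "42").encode("cp037")
)
```

The loop bound is the value read from NUMBER-OF-TREATMENTS, which is the relationship the `storedLength` annotation is meant to describe declaratively.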

15
Choice
  • Calculation required to determine which of
    several formats applies.
  • A.k.a., union, tagged union, discriminated
    unions, variants, tagged variants
  • Cobol redefines
  • Changing data formats as new versions of data
    standards come out.
  • EDI (ANSI X12) formats

16
Use Case: Cobol Redefines
  • 01 AS-CPST-REC.
  • // Header portion. Position 1-42 are common to all variants.
  • 06 AS-CPCOM.
  • 09 AS-COM-STORE-TYPE PIC X.
  • 09 AS-COM-STORE-NO PIC 9(05).
  • 09 AS-COM-TRAN-ID PIC X(04).
  • 88 TRAN-COUPON VALUE 'CP80'.
  • 88 TRAN-REVENUE VALUE 'IC40' 'RA40'.
  • 88 TRAN-SALES VALUE 'IC40'.
  • 88 TRAN-DELIVER VALUE 'IC44'.
  • 88 TRAN-RENTS VALUE 'RA40' 'RA42'.
  • 88 TRAN-RENT-RETURN VALUE 'RA41'.
  • 09 AS-COM-QUANTITY PIC S9(05).
  • 09 AS-COM-PART-NO PIC 9(06).
  • // Variant portion. Position 43-95 differ depending on AS-COM-TRAN-ID

17
Use Case: Cobol Redefines
  • // Txn-type RA4 Rental
  • 06 AS-CPREN REDEFINES AS-CPVAR.
  • 09 REN-REVENUE PIC S9(07)V9(02).
  • 09 FILLER PIC X(01).
  • 09 REN-LOS PIC S9(03).
  • 09 REN-EXT-VIEW-FEE PIC S9(07)V9(02).
  • 09 REN-QOH PIC S9(05).
  • 09 REN-RENT-CODE PIC X(01).
  • 09 REN-COPY-NO PIC 9(03).
  • 09 REN-CUST-NO PIC X(11).
  • 09 REN-CRD-TYPE PIC X(03).
  • // Txn-type IC4 Sales
  • 06 AS-CPSLS REDEFINES AS-CPVAR.
  • 09 SLS-COST PIC S9(04)V9(02).
  • 09 SLS-REVENUE PIC S9(07)V9(02).
  • 09 SLS-QOH PIC S9(05).
  • 09 SLS-CUST-NO PIC X(11).
  • 09 SLS-CRD-TYPE PIC X(03).
  • // Txn-type CP8 Coupon

18
Use Case: Cobol Redefines
  • <complexType name="AS-CPST-REC">
  • <sequence>
  • <element name="AS-CPCOM"><sequence>
  • <annotation>
  • // Header portion. Position 1-42 are common to all variants.
  • </annotation>
  • <annotation><appinfo>
  • <charset value="ebcdic-cp-us"/>
  • <repType value="binary"/>
  • <byteOrder value="bigEndian"/>
  • <decimalType value="text"/>
  • </appinfo></annotation>
  • <element name="AS-COM-STORE-TYPE" type="string" fixLength="1"/>
  • <element name="AS-COM-STORE-NO" type="decimal" unsigned="true" totalDigits="5"/>
  • <element name="AS-COM-TRAN-ID" type="string" fixLength="4"/>
  • <annotation name="tran-id-definitions">
  • 88 TRAN-COUPON VALUE 'CP80'.
  • 88 TRAN-REVENUE VALUE 'IC40' 'RA40'.
  • 88 TRAN-SALES VALUE 'IC40'.

19
Use Case: Cobol Redefines
  • <element name="AS-CPVAR">
  • <annotation>
  • // Variant portion. Position 43-95 differs depending on AS-COM-TRAN-ID
  • </annotation>
  • <choice>
  • <annotation><appinfo>
  • <choiceTagValueCalc
  • value=" let code = substring(../AS-CPCOM/AS-COM-TRAN-ID,0,3)
  • let tag = if (code = "RA4") then "AS-CPREN"
  • else if (code = "IC4") then "AS-CPSLS"
  • else if (code = "CP8") then "AS-CPCP"
  • else throw invalidDataFormat(code)
  • ./choiceTag = tag
  • " />
  • <choiceTagRepCalc
  • value=" let tagRep = if (./choiceTag = "AS-CPREN") then "RA40"
  • else if (./choiceTag = "AS-CPSLS") then "IC40"
  • else if (./choiceTag = "AS-CPCP") then "CP80"
  • else throw inconsistentDef(./choiceTag)
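
The tag calculation and its inverse on this slide can be mirrored in a few lines of Python. This is only an illustrative sketch: the function names `choice_tag` and `choice_tag_rep` are hypothetical, and `ValueError` stands in for the slide's `invalidDataFormat` and `inconsistentDef`.

```python
# Forward mapping: first three characters of AS-COM-TRAN-ID -> active alternative.
TAG_BY_PREFIX = {"RA4": "AS-CPREN", "IC4": "AS-CPSLS", "CP8": "AS-CPCP"}

# Reverse mapping: active alternative -> transaction id written on output.
REP_BY_TAG = {"AS-CPREN": "RA40", "AS-CPSLS": "IC40", "AS-CPCP": "CP80"}


def choice_tag(tran_id: str) -> str:
    """Pick the variant from the header field (parse direction)."""
    code = tran_id[:3]
    if code not in TAG_BY_PREFIX:
        raise ValueError(f"invalid data format: {code!r}")
    return TAG_BY_PREFIX[code]


def choice_tag_rep(tag: str) -> str:
    """Pick the stored representation for a variant (write direction)."""
    if tag not in REP_BY_TAG:
        raise ValueError(f"inconsistent definition: {tag!r}")
    return REP_BY_TAG[tag]
```

Under this reading the round trip is lossy: `choice_tag("RA41")` selects `AS-CPREN`, but `choice_tag_rep("AS-CPREN")` yields `"RA40"`, so the last character of the transaction id is not recovered from the tag alone.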

20
Use Case: Cobol Redefines
  • <element name="AS-CPREN"><sequence>
  • <annotation>
  • // Txn-type RA4 Rental
  • </annotation>
  • <element name="REN-REVENUE" type="decimal" totalDigits="9" fractionDigits="2"/>
  • <element type="string" fixLength="1"/>
  • <element name="REN-LOS" type="decimal" totalDigits="3"/>
  • <element name="REN-EXT-VIEW-FEE" type="decimal" totalDigits="9" fractionDigits="2"/>
  • <element name="REN-QOH" type="decimal" totalDigits="5"/>
  • <element name="REN-RENT-CODE" type="string" fixLength="1"/>
  • <element name="REN-COPY-NO" type="decimal" unsigned="true" totalDigits="3"/>
  • <element name="REN-CUST-NO" type="string" fixLength="11"/>
  • <element name="REN-CRD-TYPE" type="string" fixLength="3"/>
  • </sequence></element>
  • <element name="AS-CPSLS"><sequence>
  • <annotation>
  • // Txn-type IC4 Sales
  • </annotation>

21
Expressions/References
  • Expression language
  • ./x/y/z references the data value at runtime
  • ./x/y references run-time properties of the
    data values
  • E.g., choiceTag to set the active alternative
  • E.g., length to set a variable length for a
    string
  • E.g., arrayLength ditto for array

22
Run-time properties
Conceptual model of run-time attributes available from DFDL expression-language syntax
[Class diagram. Types shown: AllTypes, SimpleType, Number, Numeric, Integer, Decimal, FloatingPoint, String, Duration, DateTime, Date, Time, Choice. Attributes shown include isNull: boolean, isValid: boolean, arrayLength: long, arrayLengths: long, selfIndex: long, selfIndices: long, length: long, choiceTag: int, and int-valued date/time fields (era, year, month, week, day, hour, minute, second, microsecond; years, months, days, hours, seconds, microseconds).]
23
Layered Translations
  • Use case: matrix with dynamic size in a text file
  • blank lines are ignored
  • C-style comments are ignored (equiv. to
    whitespace)
  • First line contains xdim ydim (whitespace
    separated)
  • Subsequent lines are rows of the 2-d matrix.
  • There must be exactly ydim rows
  • each containing xdim numbers
  • Within each row the values are whitespace
    separated.
  • The charset is UTF-8
  • Requires that we express preprocessing of the
    input data to handle the C-style comments and
    blank lines
  • The preprocessing is not part of the structure of
    the data

24
Layered Translations: Matrix w/Dynamic Size Example
  • /* obsv3 ??? 2003?08?27? ?? */
  • /* gbxx2. 140221 ???? 8 ?
  • */
  • 3 2
  • /* ?????????????? */
  • 1 2 3 /**/
  • 4 5 /*????*/ 6
  • /* ??????? */

25
Layered Translations: Matrix w/Dynamic Size Example
  • <dims>
  • <xdim>3</xdim>
  • <ydim>2</ydim>
  • </dims>
  • <ydata>
  • <xdata>1</xdata>
  • <xdata>2</xdata>
  • <xdata>3</xdata>
  • </ydata>
  • <ydata>
  • <xdata>4</xdata>
  • <xdata>5</xdata>
  • <xdata>6</xdata>
  • </ydata>

26
Layered Translations: Matrix w/Dynamic Size Example
  • <element name="example2">
  • <sequence>
  • <element name="dims">
  • <sequence>
  • <element name="xdim" type="int"/>
  • <element name="ydim" type="int"/>
  • </sequence>
  • </element>
  • <!-- XSD/XML Issues: XSD has no 2-d array. Also
    there is no way to constrain minOccurs or
    maxOccurs based on the value of other elements of
    the XML -->
  • <element name="ydata" minOccurs="0" maxOccurs="unbounded">
  • <sequence>
  • <element name="xdata" type="double"
    minOccurs="0" maxOccurs="unbounded"/>
  • </sequence>
  • </element>
  • </sequence>
  • </element>

27
Layered Translations: Matrix w/Dynamic Size Example
  • Underlying transformations
  • Bits to bytes
  • Bytes to Characters (UTF-8 encoding)
  • Removal of blank lines
  • Removal of C-style comments

28
Layered Translations: Matrix w/Dynamic Size Example
  • The data now looks like
  • 3 2
  • 1 2 3
  • 4 5 6
  • Let b = blank, n = newline. The data really is this
    string of characters:
  • 3b2n1b2b3bbn4b5bbb6n
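
The underlying transformations can be sketched as a chain of small Python functions, each consuming the previous layer's output. The function names and the sample input are hypothetical, and the comment regexp is one plausible choice, not the one the WG had in mind; comments are replaced by a single blank, which is what the character string above suggests.

```python
import re


def bytes_to_chars(data: bytes) -> str:
    """Layer 1: bytes to characters (UTF-8)."""
    return data.decode("utf-8")


def strip_c_comments(text: str) -> str:
    """Layer 2: replace each C-style comment with one blank."""
    return re.sub(r"/\*.*?\*/", " ", text, flags=re.DOTALL)


def strip_blank_lines(text: str) -> str:
    """Layer 3: drop lines that are empty or whitespace-only."""
    kept = [line for line in text.split("\n") if line.strip()]
    return "".join(line + "\n" for line in kept)


def preprocess(data: bytes) -> str:
    """Compose the layers in order."""
    return strip_blank_lines(strip_c_comments(bytes_to_chars(data)))


# Hypothetical input shaped like the commented matrix file.
SAMPLE = (
    b"/* header */\n/* multi\nline */\n3 2\n/* note */\n"
    b"1 2 3 /**/\n4 5 /*x*/ 6\n/* end */\n"
)


def show(text: str) -> str:
    """Render blanks as 'b' and newlines as 'n', as on the slide."""
    return text.replace(" ", "b").replace("\n", "n")
```

None of these layers knows anything about the matrix structure, which matches the point that the preprocessing is not part of the structure of the data.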

29
References: Matrix w/Dynamic Size Example
  • DFDL wants to treat mistakes like these as invalid:
  • 3 2
  • 1 2
  • 3 4 5 6
  • (line structure doesn't match dimensions) or
  • 3 2
  • 1 2 3
  • 4 5 6
  • 7 8 9
  • (too many rows)
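
As a rough illustration of the checks involved, the following Python sketch parses already-preprocessed text and rejects both kinds of mistake. The names `parse_matrix` and `is_valid` are hypothetical; this is a hand-written validator, not a rendering of any DFDL construct.

```python
def parse_matrix(text: str) -> list:
    """Parse 'xdim ydim' followed by ydim rows of xdim numbers each."""
    lines = text.splitlines()
    xdim, ydim = (int(token) for token in lines[0].split())
    rows = [[float(token) for token in line.split()] for line in lines[1:]]
    # The row count must match a value read earlier in the same data.
    if len(rows) != ydim:
        raise ValueError(f"expected {ydim} rows, found {len(rows)}")
    for index, row in enumerate(rows):
        if len(row) != xdim:
            raise ValueError(
                f"row {index} has {len(row)} values, expected {xdim}"
            )
    return rows


def is_valid(text: str) -> bool:
    """Return True when the text satisfies the dimension constraints."""
    try:
        parse_matrix(text)
    except ValueError:
        return False
    return True
```

Both constraints depend on the values of `xdim` and `ydim` in the data itself, which is exactly what plain XSD `minOccurs`/`maxOccurs` cannot express.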

30
References: Matrix w/Dynamic Size Example
  • <element name="example2">
  • <sequence>
  • <element name="dims">
  • <sequence>
  • <annotation><appinfo>
  • <dfdl:terminator value="\p{whitespace}\p{Line_Separator}"/>
  • <dfdl:separator value="\p{whitespace}"/>
  • </appinfo></annotation>
  • <element name="xdim" type="int"/>
  • <element name="ydim" type="int"/>
  • </sequence>
  • </element>
  • <element name="ydata" minOccurs="0" maxOccurs="unbounded">
  • <annotation><appinfo>
  • <dfdl:separator value="\p{whitespace}\p{Line_Separator}"/>
  • </appinfo></annotation>
  • <sequence>
  • <element name="xdata" type="double"
    minOccurs="0" maxOccurs="unbounded">

31
References: Matrix w/Dynamic Size Example
  • <element name="example2">
  • <sequence>
  • <element name="dims">
  • <sequence>
  • <annotation> </annotation>
  • <element name="xdim" type="int"/>
  • <element name="ydim" type="int"/>
  • </sequence>
  • </element>
  • <element name="ydata" minOccurs="0" maxOccurs="unbounded">
  • <annotation><appinfo>
  • <dfdl:separator value="\p{whitespace}\p{Line_Separator}"/>
  • <dfdl:validation expr=" ./arrayLength = ../dims/ydim "/>
  • </appinfo></annotation>
  • <sequence>
  • <element name="xdata" type="double"
    minOccurs="0" maxOccurs="unbounded">

32
Layered Translations: Matrix w/Dynamic Size Example
  • Now add in the layered transformations of the
    streams.
  • <annotation><appinfo>
  • <container name="charStream" type="string">
  • <rep charset="UTF-8"
    container="byteStream"> <!-- a built in container -->
  • <valueCalc exp=" bytesToChars() "/>
  • </rep>
  • </container>
  • <container name="noCommentsStream" type="string">
  • <rep container="charStream">
  • <valueCalc exp="replaceString( '...a regexp
    for comments...', ' ')"/>
  • </rep>
  • </container>
  • <container name="noBlankLinesStream" type="string">
  • <rep container="noCommentsStream">
  • <valueCalc exp=" replaceString( '..a regexp
    for blanklines..',' ')"/>

33
Modularity
  • Consider this example
  • <xs:simpleType name="binaryInt">
  • <xs:restriction base="xs:int">
  • <xs:annotation><xs:appinfo>
  • <compositeMapping>
  • <mapping name="data-bytes"/>
  • <mapping name="bytes-int"/>
  • </compositeMapping>
  • </xs:appinfo></xs:annotation>
  • </xs:restriction>
  • </xs:simpleType>
  • This connects the definition of binaryInt all the
    way back to how bits are turned into bytes
  • This over-specification limits reusability

34
Modularity
  • Issue: why should binaryInt care about where the
    bytes come from?
  • They could come from a binary file
  • They could come from conversion of uuencoded text
    back into binary data
  • They could come from decompression.
  • DFDL defined types want to be parameterized by
    where they get their underlying data
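
One way to picture a type parameterized by its data source is to pass the byte supplier in as an argument. The Python sketch below is only an analogy: `binary_int`, the `ByteSource` alias, and the three sample sources are hypothetical, and a 4-byte big-endian signed int is assumed.

```python
import binascii
import struct
import zlib
from typing import Callable

# A source is anything that yields the underlying bytes on demand.
ByteSource = Callable[[], bytes]


def binary_int(source: ByteSource) -> int:
    """Decode a 4-byte big-endian signed int; the origin of the bytes is opaque."""
    (value,) = struct.unpack(">i", source()[:4])
    return value


RAW = struct.pack(">i", 1234)
UUENCODED = binascii.b2a_uu(RAW)
COMPRESSED = zlib.compress(RAW)


def from_binary_file() -> bytes:
    """Stand-in for bytes read directly from a binary file."""
    return RAW


def from_uuencoded_text() -> bytes:
    """Bytes recovered by converting uuencoded text back to binary."""
    return binascii.a2b_uu(UUENCODED)


def from_decompression() -> bytes:
    """Bytes recovered by decompression."""
    return zlib.decompress(COMPRESSED)
```

The decoding step stays identical across all three sources, so the definition of the integer type can be reused without naming how the bytes were produced.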

35
DFDL-WG Session 2: Multilayered descriptions and
references
  • This session focuses on issues and potential
    problems with the current approach
  • Representations of multilayered descriptions
  • Descriptions of compressions, encryption etc
  • Discussion of conditional types
  • how to represent unions, clever string etc.