Title: Use of XML in LDR's Integrated Tax System
1 Use of XML in LDR's Integrated Tax System
Louisiana Department of Revenue
Technology Conference, San Antonio, TX, August 13-16, 2000
2 Background
LDR is currently engaged in a cooperative endeavor with IBM for a complete redesign and redevelopment of the software systems that support the administration of taxes. The system is being designed, developed and implemented using the following:
- Thin PC clients (Windows NT) with applications and data residing on a mainframe server
- Object-oriented analysis and design using Rational Rose design tools
- Java development using VisualAge for Java
- MQSeries for message handling
- MQSeries Workflow as the workflow manager
- DB2 (6.x, most current version at time of implementation) as the database on OS/390
3 Challenge
- The department exchanges and processes data from multiple sources in a variety of formats.
- For today and in the future, the goal is to develop a system with a standardized approach for processing data created and processed in multiple formats.
4 Solution: XML
- XML is the most logical solution to the challenge of developing a system with a standardized approach for processing data that is created and processed in multiple formats.
5 Reasons for choosing XML
- XML is simple, straightforward and human readable
- XML is platform independent
- XML is programming language independent
- XML is extensible and easy to maintain
- Standardized interfaces (APIs) for processing XML data
- Many tools exist for parsing and transforming XML data
- Standardized (W3C)
6 Key Definitions
- XML - Extensible Markup Language is an open standard (W3C) that provides a data format and a data modeling language for defining data.
- DTD - A document type definition is the modeling mechanism for XML. It provides the rules for how XML data is defined and logically related.
- Well-formed XML - an XML document that follows XML syntax rules but is not required to conform structurally to a DTD (flat XML).
- Valid XML - a well-formed XML document that also conforms structurally to a DTD (structured XML).
- XSLT - Extensible Stylesheet Language Transformations is a language for transforming XML documents into other XML documents.
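The difference between well-formed and not well-formed XML can be sketched in a few lines. This is an illustrative Python example, not part of the LDR system (which is Java based); the helper name `is_well_formed` and the two sample strings are assumptions made for the illustration. Python's `xml.etree.ElementTree` checks well-formedness only; checking validity against a DTD would need a validating parser.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as XML, False otherwise.

    This checks syntax only; it does not validate against a DTD.
    """
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples in the style of the flat XML shown later in the deck.
GOOD = '<form><field id="1000">2003</field></form>'
BAD = '<form><field id="1000">2003</form>'  # <field> is never closed

print(is_well_formed(GOOD))
print(is_well_formed(BAD))
```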
7 Uses of XML at LDR
- Data exchange format for forms processed by the system
  - External sources
  - Internal sources
- Data exchange format for data between sub-systems
- Data exchange format for legacy system data being converted into the new system
- Data exchange format for data exchanged to other LDR systems
8 External Sources of Forms
- Data entry of original forms
- Scanned original forms
- Others, as defined and implemented
- Electronic filings
- EDI
- EFT
- Internet
- Flat files of various types of data (tape and
diskette)
9 Original Documents from Data Entry or Scanners
- All original forms (remitted by the taxpayer) are converted into an internally developed format called Universal Data Format (UDF).
- The data is validated for syntactical and contextual correctness.
- Data passing validation is routed further into the system for conversion into flat XML.
- Data failing validation will not be processed any further within the system.
10 Reasons for Using UDF
- Validating data in UDF format is a relatively new process that works well.
- This step provides assurance that data the system played no part in creating is valid to the extent that it can be processed within the system.
- Cost and timing factors weighed into the decision to retain this method of validation.
- Future plans are to develop validation routines against data in XML format to eliminate this step.
11 Example of UDF Record
000001INIT0101BATHDR1234567890123451998-12-31-00.00.00.00000000560013001000010001300200003000130030000400016004072894263
Callouts 1 through 20 on the original slide mark the fields of this record in order, left to right.
12 Example of UDF Record (cont.)
Where,
1.) 000001 is the record identifier attribute value of the header
2.) INIT is the record type attribute value of the batch header
3.) 01 is the segment number attribute value of the header
4.) 01 is the total number of the segments attribute value of the header
5.) BATHDR is the document type attribute value of the header
6.) 123456789012345 is the batch identifier attribute value of the header, for illustration purposes only
7.) 1998-12-31-00.00.00.000000 is the processing date attribute value of the header
8.) 0056 is the length of the variable portion of the record
9.) 0013 is the parameter length (parameter included in the total) of the Number of Returns in the Batch parameter
10.) 0010 is the code identifier of the Number of Returns in the Batch parameter
11.) 00030 is the value of the Number of Returns in the Batch parameter
. . .
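The layout described in the legend can be sketched as a small parser. This is a minimal Python illustration, not the department's validation code. The header field widths are inferred from the example record and the legend, and the rule that each parameter is a 4-digit length, a 4-digit code and a value is inferred from items 9 through 11; both are assumptions rather than a published UDF specification. The names `HEADER_LAYOUT` and `parse_udf` are hypothetical, and the sketch does not check the declared variable-portion length.

```python
# Header fields and widths, inferred from the legend (assumption).
HEADER_LAYOUT = [
    ("record_id", 6),
    ("record_type", 4),
    ("segment_number", 2),
    ("segment_total", 2),
    ("document_type", 6),
    ("batch_id", 15),
    ("processing_date", 26),
    ("variable_length", 4),
]


def parse_udf(record):
    """Split a UDF record into a header dict and a list of (code, value) pairs."""
    header = {}
    pos = 0
    for name, width in HEADER_LAYOUT:
        header[name] = record[pos:pos + width]
        pos += width

    params = []
    while pos < len(record):
        # Each parameter starts with its own total length (length field included).
        length = int(record[pos:pos + 4])
        if length < 8 or pos + length > len(record):
            raise ValueError("bad parameter length at offset %d" % pos)
        code = record[pos + 4:pos + 8]
        value = record[pos + 8:pos + length]
        params.append((code, value))
        pos += length
    return header, params


# The example record from the slide, split into pieces for readability.
RECORD = (
    "000001" "INIT" "01" "01" "BATHDR" "123456789012345"
    "1998-12-31-00.00.00.000000" "0056"
    "0013001000010" "0013002000030" "0013003000040" "0016004072894263"
)

header, params = parse_udf(RECORD)
print(header["record_type"], header["document_type"])
print(params)
```

A parser of this shape makes the syntactic part of validation straightforward: a record whose parameter lengths run past the end of the data is rejected before any contextual rule is applied.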
13 UDF Validation Diagram
14 Conversion of UDF to XML
- UDF format records are converted to flat (well-formed) XML. The flat form of the record is simply a mapping of the UDF data into XML format.
- The flat XML records are transformed using XSLT into structured (valid) XML. XSLT requires at least a well-formed document as input.
15 Reasons for Two-Phased Conversion
- Allows a very simple format of flat XML data to be created by external systems for conversion, as required, to structured XML.
- Flat XML is a better format for receiving data from external sources because that single format is simple and can be transformed with XSLT into many different structured versions of the record, each conforming to its own DTD.
16 UDF to XML Conversion Diagram
17 Example of Flat XML Document
<?xml version="1.0"?>
<form>
  <field id="1000">2003</field>
  <field id="1010">1234567891</field>
  <field id="1015">333333333</field>
  <field id="1017">233300</field>
  <field id="1040">19991231</field>
  <field id="1050">CITM</field>
  <field id="1055">20000522</field>
  <field id="1060">20000526</field>
  <field id="1105">MAIL</field>
  <field id="1125">N</field>
  <field id="1130">N</field>
  <field id="1135">N</field>
  <field id="1140">N</field>
  . . .
</form>
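The first conversion phase, mapping identifier and value pairs into flat XML, might look roughly like the following Python sketch. It is illustrative only: the function name `to_flat_xml` is hypothetical, and the slides do not say how UDF parameter codes correspond to the `id` attributes, so the input pairs here are simply taken from the example document.

```python
import xml.etree.ElementTree as ET


def to_flat_xml(fields):
    """Build a flat <form> document from (field id, value) pairs."""
    form = ET.Element("form")
    for field_id, value in fields:
        field = ET.SubElement(form, "field", id=field_id)
        field.text = value
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(form, encoding="unicode")


flat = to_flat_xml([("1000", "2003"), ("1010", "1234567891")])
print(flat)
```

Because every field has the same shape, an external system can produce this format without knowing anything about the structured DTD.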
18 Example of DTD
<?xml encoding='UTF-8' ?>
<!-- edited with XML Spy v3.0 NT (http://www.xmlspy.com) -->
<!--
  STARTER FILE CONTAINING ALL TAX FORM XML
  Includes: Global Definitions, LDR Form CFT4, LDR Form CIFT620, LDR Form IT620ES
  Revision: DRAFT
  Date: May 30, 2000
  TBD: refine definitions for entities with strict formats?
-->
<!-- ENTITIES -->
<!-- TBD: promote appropriate constructs to entities as needed -->
<!-- ELEMENTS -->
<!-- -->
<!-- Generics -->
<!-- -->
19 Example of DTD (cont.)
<!-- Identification Numbers -->
<!ELEMENT LRAN (#PCDATA)>
<!-- Louisiana Revenue Account Number -->
<!ELEMENT FEIN (#PCDATA)>
<!-- Federal Employer Identification Number -->
<!ELEMENT BusinessCodeNumber (#PCDATA)>
<!-- Dates -->
<!ELEMENT YearMonthDay (#PCDATA)>
<!ELEMENT YearMonth (#PCDATA)>
<!ELEMENT Year (#PCDATA)>
<!ELEMENT DateIssued (#PCDATA)>
<!-- Periods -->
<!ELEMENT Period (PeriodStart?, PeriodEnd)>
<!ELEMENT PeriodStart (#PCDATA)>
<!ELEMENT PeriodEnd (#PCDATA)>
<!-- Names -->
<!ELEMENT BusinessName (#PCDATA)>
<!ELEMENT PersonName (#PCDATA)>
20 Example of DTD (cont.)
<!-- Addresses -->
<!ELEMENT MailingAddress (Street, StateOrProvince, Country?, ZipOrPostalCode)>
<!ELEMENT Street (#PCDATA)>
<!ELEMENT StateOrProvinceOfIncorporation (#PCDATA)>
<!ELEMENT StateOrProvince (#PCDATA)>
<!ELEMENT ZipOrPostalCode (#PCDATA)>
<!ELEMENT Country (#PCDATA)>
<!-- Telephone Numbers -->
<!ELEMENT Telephone (#PCDATA)>
. . .
21 Example of a Structured XML Document
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE TaxFormLDRCIFT620 (View Source for full doctype...)>
<TaxFormLDRCIFT620>
  <FormHeader formName="CIFT620" />
  <BasicBusinessInfo>
    <LRAN>1234567891</LRAN>
    <FEIN>333333333</FEIN>
    <BusinessCodeNumber>233300</BusinessCodeNumber>
  </BasicBusinessInfo>
  <Period>
    <PeriodEnd>19991231</PeriodEnd>
  </Period>
  . . .
</TaxFormLDRCIFT620>
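The second phase, from flat to structured XML, is done with XSLT in the system described here. Python's standard library has no XSLT processor, so the sketch below swaps in a plain lookup table to show the same idea: each flat field id is placed at a named position in the structured document. The id-to-element mapping in `FIELD_PATHS` is inferred by matching values between the flat and structured examples and is an assumption, as are the function and constant names.

```python
import xml.etree.ElementTree as ET

# Flat field id -> path of element names under the root.
# Inferred by matching values in the two example documents (assumption).
FIELD_PATHS = {
    "1010": ("BasicBusinessInfo", "LRAN"),
    "1015": ("BasicBusinessInfo", "FEIN"),
    "1017": ("BasicBusinessInfo", "BusinessCodeNumber"),
    "1040": ("Period", "PeriodEnd"),
}


def flat_to_structured(flat_xml, root_tag="TaxFormLDRCIFT620", form_name="CIFT620"):
    """Rearrange a flat <form> document into a nested structure."""
    flat = ET.fromstring(flat_xml)
    root = ET.Element(root_tag)
    ET.SubElement(root, "FormHeader", formName=form_name)
    for field in flat.findall("field"):
        path = FIELD_PATHS.get(field.get("id"))
        if path is None:
            continue  # ids without a mapping are skipped in this sketch
        parent = root
        for tag in path[:-1]:
            child = parent.find(tag)
            if child is None:
                child = ET.SubElement(parent, tag)
            parent = child
        ET.SubElement(parent, path[-1]).text = field.text
    return root


FLAT = (
    '<form>'
    '<field id="1000">2003</field>'
    '<field id="1010">1234567891</field>'
    '<field id="1015">333333333</field>'
    '<field id="1017">233300</field>'
    '<field id="1040">19991231</field>'
    '</form>'
)

structured = flat_to_structured(FLAT)
print(ET.tostring(structured, encoding="unicode"))
```

Supporting another form type would mean supplying a different mapping table, which mirrors the point of the two-phased design: one simple input format, many structured outputs.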
22 Document Renderer
- Developed for the specific purpose of converting form data from other formats into XML, and XML into other data formats, for processing within the system.
- Enables transforming between XML and Java objects for processing and efficient storage of data.
- Data is rendered using a SAX-compliant parser. SAX (Simple API for XML) is a simple API for parsing XML documents, and almost all parsers support it.
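SAX-style parsing is event driven: the parser calls back as it meets each start tag, run of text and end tag, and the handler builds whatever objects it needs. The following Python sketch uses the standard `xml.sax` module to collect the fields of a flat form into a dictionary. It is an illustration of the SAX pattern only; the actual document renderer is a Java component, and the names `FlatFormHandler` and `render_flat_form` are hypothetical.

```python
import xml.sax


class FlatFormHandler(xml.sax.ContentHandler):
    """Collect <field id="...">value</field> elements into a dict."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current_id = None
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "field":
            self._current_id = attrs.get("id")
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._current_id is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "field" and self._current_id is not None:
            self.fields[self._current_id] = "".join(self._buffer)
            self._current_id = None


def render_flat_form(xml_text):
    """Parse flat form XML and return {field id: value}."""
    handler = FlatFormHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.fields


SAMPLE = '<form><field id="1000">2003</field><field id="1050">CITM</field></form>'
print(render_flat_form(SAMPLE))
```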
23 Processing of Form Data using the Document Renderer
- A file of structured XML forms that are ready to be validated is read by a form validation application.
- The application will rely on the document renderer to convert the structured XML version of the form to the corresponding domain objects required for validation.
- The validation rule engine will validate the form for correctness within the context of the taxpayer's registration and accounting profile.
- The validated data in the domain objects is persisted in the underlying database.
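The render, validate and persist sequence above can be sketched end to end. This Python example is a simplified illustration under stated assumptions: the `TaxForm` class, the two validation rules and the in-memory `store` list (standing in for the database) are all hypothetical and are not taken from the LDR rule engine.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TaxForm:
    """Hypothetical domain object for a structured tax form."""
    form_name: str
    lran: str
    period_end: str


def render(xml_text):
    """Convert structured XML into a domain object."""
    root = ET.fromstring(xml_text)
    return TaxForm(
        form_name=root.find("FormHeader").get("formName"),
        lran=root.findtext("BasicBusinessInfo/LRAN"),
        period_end=root.findtext("Period/PeriodEnd"),
    )


def validate(form, registered_lrans):
    """Return a list of error messages; an empty list means the form passed."""
    errors = []
    # Rule 1 (assumed): the account must exist in the registration profile.
    if form.lran not in registered_lrans:
        errors.append("LRAN is not registered")
    # Rule 2 (assumed): the period end must be a real date in YYYYMMDD form.
    try:
        datetime.strptime(form.period_end or "", "%Y%m%d")
    except ValueError:
        errors.append("PeriodEnd is not a valid YYYYMMDD date")
    return errors


def process(xml_text, registered_lrans, store):
    """Render, validate, and persist the form only if validation passes."""
    form = render(xml_text)
    errors = validate(form, registered_lrans)
    if not errors:
        store.append(form)  # stand-in for a database insert
    return errors


SAMPLE = (
    '<TaxFormLDRCIFT620>'
    '<FormHeader formName="CIFT620" />'
    '<BasicBusinessInfo><LRAN>1234567891</LRAN></BasicBusinessInfo>'
    '<Period><PeriodEnd>19991231</PeriodEnd></Period>'
    '</TaxFormLDRCIFT620>'
)

store = []
print(process(SAMPLE, {"1234567891"}, store))
print(store)
```

Keeping validation on domain objects rather than on raw XML means the same rules apply whether the form arrived from an external file or from an internal process.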
25 Internal Sources of Forms
- Internal forms originating from a GUI
- Internal forms generated from system processes
26 Internal Forms Originating from a GUI
- Users key data utilizing a GUI. The resulting raw data is passed on for transformation.
- The raw data is converted to an XML version of a form and passed on for rendering.
- The XML version of the form is rendered into the domain objects for validation.
- Once validation is complete, the data is either persisted in the database or routed back to the presentation layer for correction.
28 Internal Forms Generated from System Processes
- Processes within the system may independently determine that there is a need for adjustments to the taxpayer's account.
- All accounting data emanates from a form.
- A request must therefore be made for the creation of a form, which in turn initiates the creation of the adjusting accounting data.
30 Legacy Data Conversion
- Taxpayer registration data must be migrated from the legacy system into the new system to populate the database.
- A COBOL program was written to extract legacy registration data and create a file of structured XML records.
- The structured XML records are parsed into domain objects and the data contained in the domain objects is persisted in the underlying database.
32 Data Exchanged to Other LDR Systems
- A data warehouse application will require input data from the main system database.
- The data needs to be reformatted to assume a meaningful context in the warehouse applications.
34 Lessons Learned
- Keep things simple
- Carefully plan, design and develop the DTD using clear and descriptive comments in its definition
- Many XML manipulation tools are available and many more are on the way!
- Keep abreast of advancements in technology
  - XML Schema
  - XML Binding
  - Others
35 Sites of Interest
- www.alphaworks.ibm.com
- www.apache.org
- xml.com
- xml.org
- www.ebxml.org
- www.oasis-open.org
- www.w3c.org
- Java.sun.com/xml
36 Conclusion
- What's the big deal? There's nothing magic going on here.
- XML simply serves as a means of exchanging and transforming data.
- With a standardized format for data exchange and open source initiatives for software to transform this data, the bulk of the design and development effort can be targeted toward the logic of business processing.
37 Contact Information
Barry Aucoin
Louisiana Department of Revenue
LDR Information Services Division
Email: baucoin_at_rev.state.la.us
Phone: (225) 925-4220
Fax: (225) 922-0850