Title: Corporate Overview
1Creating order from IT chaos?
Central Data eXchange (CDX): Then, Now and in the Future
Michael C. Daconta, Chief Scientist, Advanced Programs Group
8/28/2003
2250 Corporate Park Drive, Suite 500
Herndon, VA 20171
703.326.1000
http://www.mcdonaldbradley.com
2Agenda
- Introduction
- Then
- eMortgage experience
- VKB experience
- XML Training, Books and Design experience
- Now
- Unnecessary Complexity
- No integration strategy
- Limited Semantic Interoperability
- Future
- Streamline current system
- Federation
- Semantic Technologies
- Conclusion
3Introduction
- Michael C. Daconta
- Director, Semantic Web Technologies; Chief Scientist, Advanced Programs Group, MBI
- Chief DIA Architect, VKB Collateral Space/NCES
- Author/co-author of 10 technical books
- Inventor of Fannie Mae XML Electronic Mortgage
Standards (2 patents pending)
4Then Overview
- The proposal that almost was
- Lockheed Martin missed the CDX submission deadline by 20 minutes.
- Daconta as Key Personnel
- Would have participated in Orals
- Team leveraged MBI's architecture experience in proposal.
- Lessons Learned from
- Electronic mortgages XML development
- Teaching XML
- Virtual Knowledge Base Design and Implementation
5Then eMortgage experience
- Chief architect of Fannie Mae's eMortgage architecture.
- Adopted as a MISMO standard
- Patented by Fannie Mae
- Key achievements
- SMART Documents (patented)
- Rule-based validation (patented)
- Secure Transmission Package with Manifest
6eMortgage Experience (2)
- SMART Document
- Separates presentation and data
- Arcs link/transform fields
- Lessons Learned
- Simple Structure
- No ID/IDREF
- Secured
- XML Signature
- Verifiable
- Validator
- Use of precedent and analogy in design
7Then VKB Experience
- Virtual Knowledge Base
- Virtual
- Data Bus Architecture (web services)
- Standard Enterprise Services (producer/consumer federation)
- Knowledge Base
- Resource Registry
8Then XML Design Experience
- XML since 1996
- Standard in 1998
- XML Training
- Web Services in 1999
- Standard in 2003
- Design articles
- Are Elements and Attributes Interchangeable?
- Other Design issues
- Containment versus IDREF
- Modularization
- Latest: RPC versus Document-based Web Services
- Lesson: XML Design matters
9Now Overview
- Quick Review of key documents
- Blueprint for a National Environmental Information Exchange Network
- Core Reference Model
- XML Design Rules and Conventions
- XML Schemas
- Final Network node Functional Specification V1.0
- CDX Web Services Definition Language (WSDL)
- Following analysis of CDX and EIEN
- Caveat: understanding based only on document review and conversations with Brand Niemann
10Now Unnecessary Complexity
- Second-system pitfall
- "Every technology goes through three stages: first, a crudely simple and quite unsatisfactory gadget; second, an enormously complicated group of gadgets designed to overcome the shortcomings of the original and achieving thereby somewhat satisfactory performance through extremely complex compromise; third, a final proper design therefrom." - Robert Heinlein, The Rolling Stones
- XML Design Complexities slow adoption
- Container approach simpler than ID/IDREF
- Always choose the simplest method to achieve a goal!
- XLink/XPointer not well supported/needed
- Multiple Namespaces is an advanced concept.
- Data Type Derivation must be carefully scoped
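A minimal Python sketch of the container versus ID/IDREF point, using the standard-library xml.etree.ElementTree. The element and attribute names (Facility, Contact, contactRef) are hypothetical and are not taken from any CDX schema.

```python
import xml.etree.ElementTree as ET

# Containment: the contact is nested inside the facility that owns it.
CONTAINED = """
<Facility name="Plant A">
  <Contact><Name>J. Smith</Name></Contact>
</Facility>
"""

# ID/IDREF style: the facility points at a contact defined elsewhere.
REFERENCED = """
<Submission>
  <Facility name="Plant A" contactRef="c1"/>
  <Contact id="c1"><Name>J. Smith</Name></Contact>
</Submission>
"""

def contact_name_contained(xml_text):
    facility = ET.fromstring(xml_text)
    # One path expression; no lookup table needed.
    return facility.findtext("Contact/Name")

def contact_name_referenced(xml_text):
    root = ET.fromstring(xml_text)
    # The reader must first index every id, then resolve the reference.
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    facility = root.find("Facility")
    target = by_id[facility.get("contactRef")]
    return target.findtext("Name")
```

Both functions return the same name, but the referenced form needs an index pass and fails with a KeyError if the target id is missing, which is the extra burden the container approach avoids.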
11Now Unnecessary Complexity (2)
- Core Reference Model
- The metamodel reinvents the wheel
- Data Element, Data Block, Compound Data Block, Major Data Group -> better to stick with OOP terms.
- OOP adoption: Java, .NET, UML, OWL
- Omits concept of Instances and Associations
- Too many parts: need to Simplify and Streamline
- Precedent: OSI versus TCP/IP
- Document-level and data-level distinction unnecessary
- Class is a recursive concept: merge Data Block and Data Group (or better, switch to Classes).
- Simplify and Streamline with a clear vision (see future slides) will enable:
- Aggressive completion of remaining formats (< 6 mos.)
- Return to national initiative
- Universal exchange (or access) is a necessary pre-condition to higher level goals like semantic interoperability.
12Now No Integration Strategy
- Hub and spokes architecture
- CDX is the exchange hub
- Point-to-Point XML exchanges
- No Information Integration
- Blueprint stated: "Information, especially integrated information, is an increasingly important environmental management tool."
- What's the mechanism for integration?
- Virtual Knowledge Base federation concepts
- Resource Registration
- Data Producer Registration
- Data Consumer Registration
- DoD's Net-centric Enterprise Services (NCES) Strategy
- Post to a shared, managed, network space
- Space accessible via portal interface or system interface
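The registration concepts can be sketched as a small in-memory registry in Python. This is an illustration only: the Registry class, its method names, and the node and topic names are hypothetical and do not describe the VKB or NCES implementations.

```python
from dataclasses import dataclass, field

@dataclass
class Registry:
    """In-memory stand-in for a federation resource registry (hypothetical)."""
    producers: dict = field(default_factory=dict)  # topic -> set of producer names
    consumers: dict = field(default_factory=dict)  # topic -> set of consumer names

    def register_producer(self, name, topic):
        self.producers.setdefault(topic, set()).add(name)

    def register_consumer(self, name, topic):
        self.consumers.setdefault(topic, set()).add(name)

    def sources_for(self, consumer):
        """Return every producer publishing a topic this consumer registered for."""
        found = set()
        for topic, names in self.consumers.items():
            if consumer in names:
                found |= self.producers.get(topic, set())
        return sorted(found)

registry = Registry()
registry.register_producer("state_node_va", "beach_monitoring")
registry.register_producer("state_node_md", "beach_monitoring")
registry.register_producer("state_node_va", "drinking_water")
registry.register_consumer("epa_portal", "beach_monitoring")
```

With a registry in the middle, a consumer discovers its sources by topic instead of being wired to each producer point-to-point.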
13Now Limited Semantic Interoperability
- Core Reference Model standardization
- Some Shared Semantics
- Schemas Review
- Beach-Monitoring Data
- Good
- Good use of enumerations
- Bad
- Exclusively elements
- Weak hierarchy
- StartDate, StopDate -> concept of Duration
- BinaryLargeObjectTypeCode should use MIME type.
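One way to read the Duration remark is that a start and stop date belong together as a single reusable type. A minimal Python sketch under that assumption follows; the Duration class and its field names are hypothetical, not part of the Beach-Monitoring schema.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass(frozen=True)
class Duration:
    """A start/stop pair modeled once and reused, instead of two loose fields."""
    start: date
    stop: date

    def __post_init__(self):
        # The pairing lets the type enforce a rule two separate tags cannot.
        if self.stop < self.start:
            raise ValueError("stop precedes start")

    @property
    def length(self) -> timedelta:
        return self.stop - self.start

season = Duration(date(2003, 6, 1), date(2003, 8, 28))
```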
14Schema Review (2)
- Drinking Water Bacteriological Analysis
- Good
- Robust hierarchy
- Bad
- Exclusively elements
- High tag-to-data ratio in example: not enough optional or conditional tags?
- Certification statement should use XML signature
- Label redundancy, i.e. MailingAddressCityName, MailingAddressText, MailingAddressStateName
- Mixed Metaphors: SenderPhoneFaxEmail versus SamplerContactInfo -> both are Contact
- Facility versus PWSFacility ambiguity.
15Schema Review (3)
- National Emissions Inventory Schema
- Only reviewed Area Non Road Submission Group
- Good
- Good Type creation/constraints
- Bad
- Exclusively elements: affects performance since sample file is 2.6 MB. Poor design decision.
- Uses different elements for the Contact and Duration concepts than the other schemas.
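A rough Python sketch of why an elements-only design inflates file size. The record and its field names are made up for illustration and are not taken from the National Emissions Inventory schema; the sketch compares serialized byte counts only and does not measure parse time.

```python
import xml.etree.ElementTree as ET

# Hypothetical emission record; field names are illustrative only.
RECORD = {"pollutant": "NOX", "value": "12.5", "unit": "TON", "year": "2002"}

def as_elements(record):
    """Serialize every field as a child element (open tag plus close tag)."""
    row = ET.Element("Emission")
    for key, val in record.items():
        ET.SubElement(row, key).text = val
    return ET.tostring(row)

def as_attributes(record):
    """Serialize every field as an attribute on a single element."""
    return ET.tostring(ET.Element("Emission", record))

element_bytes = len(as_elements(RECORD))
attribute_bytes = len(as_attributes(RECORD))
```

Each element repeats its name in the closing tag, so the per-field overhead grows with the tag length; across a multi-megabyte submission that difference adds up.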
16Schema Review (4)
- Facility Registration System V2.2
- Good
- Good documentation
- Good modularization
- Good hierarchy
- Bad
- Could use better semantics
- Person versus Individual Data Type
- individual is a broader concept
- FacilitySiteTypeName is a string?
- Most types should be enumerations.
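The enumeration point can be sketched in Python with the standard enum module. The three values below are placeholders; the real list of facility site types is not reproduced here.

```python
from enum import Enum

class FacilitySiteType(Enum):
    # Hypothetical values; the actual FRS code list is an assumption here.
    STATIONARY = "STATIONARY"
    MOBILE = "MOBILE"
    PORTABLE = "PORTABLE"

def parse_site_type(text):
    """Reject anything outside the enumerated list instead of storing free text."""
    try:
        return FacilitySiteType(text.strip().upper())
    except ValueError:
        raise ValueError(f"unknown FacilitySiteTypeName: {text!r}") from None
```

A plain string would accept a misspelling such as "Statonary" silently; the enumerated type turns it into an error at the point of entry.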
17Schema Review (5)
- CDX WSDL
- SOAP encoding violates WS-I Basic Profile
- Paragraph 5.6.4 (see http://www.ws-i.org/Profiles/Basic/2003-08/BasicProfile-1.0a.htm)
- Should use Document-based web services
- (See http://webservices.devchannel.org/webservices/03/07/11/2122220.shtml?tid25tid38)
- Type should use MIME type.
- Document structure should include Dublin Core metadata.
- Only binary content should be base64 encoded (expands size by 1/3).
- Operations are too generic to serve as federated web services. (See IFIS in Future slides)
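The base64 size penalty is easy to check with the Python standard library. The helper name base64_overhead is made up for this sketch.

```python
import base64

def base64_overhead(payload: bytes) -> float:
    """Return encoded size divided by raw size for a non-empty payload."""
    encoded = base64.b64encode(payload)
    return len(encoded) / len(payload)

# Base64 maps every 3 input bytes to 4 output characters.
ratio = base64_overhead(bytes(3000))
```

Text content that is already valid XML gains nothing from the encoding and simply pays the one-third increase.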
18Future Overview
- Initial Recommendations
- Streamline, Simplify and complete Current System
- Establish a distributed Federation
- Leverage Semantic Technologies
- Each recommendation needs further study and refinement
- And aggressive Implementation!
19Future Streamline Current System
- Simplify
- Remove complexities from guidance
- Streamline
- Remove unnecessary parts
- Leverage Enterprise Schema Management Tools
- SchemaLogic, BlueOxide, etc.
- Complete
- Aggressive drive to 1.0 drafts for all data exchanges
- Most efficient mechanism: expert draft followed up by a workshop with the affected organizations to hammer it into a 1.0.
20Future Federation
- What is a federation?
- A group of cooperating independent entities.
- Also known as grids, fabrics, service-oriented architectures and distributed systems.
- Examples
- Digital library federation, grid-computing, Jini, agent federations, ebXML and UDDI federated registries
- Requirements for Data federation
- Registry
- Web-service polymorphism (static or dynamic binding)
- Reliability and failover
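A minimal Python sketch of two of the federation requirements: one common call signature for every service (polymorphism) and ordered replicas per service (failover). The function names, service names, and stub nodes are hypothetical; a real federation would make web-service calls where these stubs return canned results.

```python
def call_with_failover(query, replicas):
    """Try each replica of one service in order; return the first success."""
    last_error = None
    for replica in replicas:
        try:
            return replica(query)
        except ConnectionError as err:
            last_error = err  # remember the failure and move to the next replica
    raise RuntimeError("all replicas failed") from last_error

def federated_search(query, services):
    """services maps a service name to its ordered list of replicas."""
    results = []
    for name in sorted(services):
        results.extend(call_with_failover(query, services[name]))
    return results

# Stub nodes sharing one signature: take a query, return a list of hits.
def node_a(query):
    return [f"A:{query}"]

def node_b(query):
    return [f"B:{query}"]

def node_down(query):
    raise ConnectionError("node unreachable")

services = {"beach": [node_down, node_a], "water": [node_b]}
```

Because every node answers the same call, the caller can swap or add replicas through the registry without changing its own code.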
21Future Federation (2)
- NCES Federation Prototype (OSD program)
- Intelligent Federated Index Search (IFIS)
currently integrates 6 enterprise systems.
22Future Semantic Technologies
- Overview
- Enterprise Architect Article
- Smart Data Continuum
- NCES impact: put the smarts in the data and NOT the application endpoints
- Semantic Technologies for eGov Workshop
- Purpose of Semantic Web Book
- Understandable by non-PhDs
- Show the Return On Investment (ROI)
- Provide a Roadmap
23Future Semantic Technologies (2)
- Data Independence is step one.
- Data is more important than applications.
- Data value increases with the number of connections it shares.
- Data about data can expand to as many layers as there are meanings.
- Data modeling harmony is the alignment of syntax, semantics, and pragmatics.
- Data and logic are the yin and yang of information processing.
- Data modeling makes the implicit explicit and the transparent apparent.
- Data standardization is not amenable to competition.
- Data modeling must be decentralized.
- Data relations must not be based on probability or luck.
- Data is truly independent when the next generation need not reinvent it.
24Conclusion
- Then
- Experience matters.
- Now
- You are on the right track (though progress has been slow)
- Current problems are fixable.
- Need aggressive champions with expertise to finish!
- Future
- Get to federation.
- Get to Semantic Interoperability.
- McDonald Bradley, Inc. can help you get there!
- Questions?