Maximizing Peer-to-Peer Portals Through XML: An Integration Case Study from the EPA

Description:

Enterprise Web and Corporate Portal Conference. September 5-6, 2001, Santa ... Notes) and Content Management System (e.g., Interwoven, Vignette, and DOCS Open) ...

Slides: 50
Provided by: Niem

Transcript and Presenter's Notes

1
Maximizing Peer-to-Peer Portals Through XML: An
Integration Case Study from the EPA
  • Presentation for
  • Enterprise Web and Corporate Portal Conference
  • September 5-6, 2001, Santa Clara, California
  • by Brand Niemann, Ph.D.
  • Office of Environmental Information
  • U.S. Environmental Protection Agency

2
Overview
  • The Big Picture
  • XML and P2P 101
  • Integrating Information and Applications
  • Questions and Answers

3
The Big Picture
[Diagram labels: The Semantic Web; XML; Structured Data Information; Categories, Metadata, Databases; Integrated Web Pages; Titles, Metatags; Personal Web Pages; P2P Content Networks; Web Pages; Portals]
4
Portals and Content Networks
[Diagram labels: NXT 3 Interface (Search, Personalization, Document Management, Metadata, etc.); Content Network (Hierarchical Folders, Each a Portal!); Portal(s); Portlets]
5
Portals and Content Networks
  • NXT 3 options
  • Customize the NXT 3 interface as a portal (4-day
    class).
  • Integrate with Groupware (e.g., Lotus Notes) and
    Content Management System (e.g., Interwoven,
    Vignette, and DOCS Open).
  • NextPage Product Announcement
  • Universal Updates in Peer Space, called
    Proactive Delivery at this conference.
  • Solo: offline distribution of portal.
  • Matrix: collaborate on distributed content in
    context of a business process.

6
XML and P2P are a Disruptive Technology and
Architecture
  • Repurpose or republish content.
  • Breaks down information silos/stovepipes.
  • Challenges traditional centralization and
    security practices.
  • Improves upon a simple topics view of
    categorization (as content grows in size and
    diversity, need multiple topics and more topics).
  • Queries produce new content.
  • Etc.

7
The Value Proposition
  • Corporate and government information is
    undervalued and hidden because it is trapped in
    proprietary formats and stovepipe systems so it
    is not fully accessible and is difficult and
    expensive to integrate.
  • XML makes information more accessible and
    interoperable and future-proofs it against
    periodic technology change.

8
XML 101
  • Key Questions
  • 1. Is it a programming language like Java or
    something in plain text that is read and acted
    upon by a browser?
  • 2. What browsers can handle it and how prevalent
    are they?
  • 3. What are some EPA problems and applications?
  • 4. Is it as easy to learn as HTML?
  • 5. Are there helpful tools, as there are for HTML?
  • 6. How soon will it be prevalent?
  • 7. What would be a killer application for EPA?

9
Key Question 1
  • Is it a programming language like Java or
    something in plain text that is read and acted
    upon by a browser?
  • Simple answer: Plain text, on purpose.
  • More complete answer: eXtensible Markup Language
    (XML) is an incredibly powerful system for
    managing information. Use it with many other
    technologies (Java, ASP (Active Server Pages),
    etc.). HTML defines how elements are displayed;
    XML defines what those elements contain.
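As a minimal sketch of "plain text that describes content", the Python below parses a small XML record with the standard library and pulls values out by meaning rather than by appearance. The element names (facility, permit, and so on) are hypothetical and are not taken from any EPA schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: the tags say what each value IS, not how to show it.
RECORD = """<facility>
  <name>Example Plant</name>
  <state>CA</state>
  <permits>
    <permit type="air">A-100</permit>
    <permit type="water">W-200</permit>
  </permits>
</facility>"""

root = ET.fromstring(RECORD)

# Look values up by element name and attribute.
name = root.findtext("name")
air_permits = [p.text for p in root.iter("permit") if p.get("type") == "air"]

print(name, air_permits)
```

Because the markup carries meaning, the same text file could feed a Web page, a database load, or a report without being rewritten.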

10
Background
  • 1991: Tim Berners-Lee designed the WWW (Weaving
    the Web, HarperBusiness, 2000, paperback)
  • 1993: Marc Andreessen created the Mosaic and
    Netscape Web browsers
  • 1996: XML proposed by the W3C
  • 2001: About 2 billion Web pages (mostly HTML)
  • HTML (HyperText Markup Language): A simple, but
    elegant way of formatting data with special tags
    in a text file that can be viewed on virtually
    any computer platform.
  • XML (eXtensible Markup Language): Based on the
    same parent as HTML (SGML), designed to better
    handle the task of managing information.
  • HTML lets everyone do some things and XML lets
    some people do practically anything.
  • W3C: World Wide Web Consortium
  • SGML: Standard Generalized Markup Language

11
Key Question 2
  • What browsers can handle it and how prevalent are
    they?
  • Simple answer: Internet Explorer 5.5 and W3C's
    Amaya (also an editor). Netscape 6 (Mozilla)?
  • More complete answer: Use XML to manage data now
    and convert it to HTML on the server side for Web
    browsers that lack XML support. Client-side
    technology is lagging, but the new SVG is an
    important step forward in Web user interface
    technology. (See http://maps.map.net/start)
  • SVG: Scalable Vector Graphics, nearly a stable W3C
    Recommendation.
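The server-side conversion mentioned above could look roughly like the following sketch: data is kept as XML and rendered to an HTML table only when a browser asks for it. The function name and the facilities/facility/name/state element names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape


def facilities_to_html(xml_text: str) -> str:
    """Render a hypothetical <facilities> document as an HTML table."""
    root = ET.fromstring(xml_text)
    rows = []
    for fac in root.findall("facility"):
        # Escape text so characters such as & and < stay valid in HTML.
        name = escape(fac.findtext("name", default=""))
        state = escape(fac.findtext("state", default=""))
        rows.append(f"<tr><td>{name}</td><td>{state}</td></tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"


if __name__ == "__main__":
    sample = (
        "<facilities>"
        "<facility><name>Example Plant</name><state>CA</state></facility>"
        "</facilities>"
    )
    print(facilities_to_html(sample))
```

In practice an XSLT stylesheet is a common alternative for this step; plain Python is used here only to keep the sketch self-contained.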

12
Background
  • World Wide Web Consortium (W3C)
  • Created in 1994 to lead the WWW to its full
    potential by developing common protocols that
    promote its evolution and ensure its
    interoperability.
  • More than 500 organizations worldwide participate
    in this forum for information, commerce,
    communication, and collective understanding.
  • http://www.w3.org/
  • XML is the universal format for structured
    documents and data on the Web and became a W3C
    Recommendation in February 1998.

13
Key Question 3
  • What are some EPA problems and applications?
  • Simple answer: XML technology and Peer-to-Peer
    (P2P) architecture will make practically
    everything we do better, faster, and cheaper
    (XML: A Manager's Guide, Addison-Wesley
    Information Technology Series, 2000).
  • More complete answer: It is being used or
    planned for use in Web database delivery, data
    exchange and integration, electronic records
    management, public access content management, and
    distributed content integration. The EPA XML
    Technical Advisory Group has a database of
    projects and applications in Lotus Notes.
  • Chartered by OEI management in July 2000.

14
Selected Examples
  • Web database delivery: OSWER's Chemical Emergency
    Preparedness and Prevention Office Local
    Emergency Planning Committee Database (LEPC),
    http://www.epa.gov/ceppo/lepclist.htm
  • Data exchange and integration: Integrated
    Taxonomic Information System (ITIS) Canadian XML
    Version, http://sis.agr.gc.ca/pls/itisca/taxaget?p_ifx
  • Public access content management and distributed
    content integration: EPA Node on a Federal
    Government Content Network,
    http://www.sdi.gov/server.htm

15
Background
16
Six Databases Need 30 Filters
[Diagram labels: Oracle, Postgres, Sybase, MySQL, Informix, Access]
17
Six Databases and An XML Hub Only Need 12 Filters
[Diagram labels: Oracle, Postgres, Sybase, MySQL, Informix, Access, connected through an XML Hub]
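The two filter counts follow from simple arithmetic if a "filter" is taken to be a one-way converter from one format to another (an assumption; the slides do not define the term). With n databases, direct exchange needs one filter per ordered pair, n(n-1), while a hub needs one filter in and one out per database, 2n. The function names below are illustrative.

```python
def point_to_point_filters(n: int) -> int:
    """One one-way filter for every ordered pair of distinct databases."""
    return n * (n - 1)


def hub_filters(n: int) -> int:
    """Each database needs one filter to the hub and one from it."""
    return 2 * n


if __name__ == "__main__":
    for n in (2, 6, 12):
        print(n, point_to_point_filters(n), hub_filters(n))
```

For six databases this gives 30 and 12, matching the slide titles, and the gap widens quickly as more systems are added.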
18
XML for Interchange Between Applications
[Diagram labels: Database; GIS; Spreadsheet; XML Repository; XML; OLAP Data Warehouse; 3D Visualization]
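A rough sketch of the interchange idea: one application exports its rows as XML, and another reads them back without knowing anything about the first application's native format. Here a CSV export stands in for the spreadsheet; the table/row element names and both function names are assumptions, and column headers are assumed to be valid XML element names.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text: str) -> str:
    """Turn spreadsheet-style CSV into a simple XML document."""
    root = ET.Element("table")
    for record in csv.DictReader(io.StringIO(csv_text)):
        row = ET.SubElement(root, "row")
        for field, value in record.items():
            # Each column header becomes an element name.
            ET.SubElement(row, field).text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_records(xml_text: str) -> list:
    """Read the XML back as a list of dicts, e.g. for a database load."""
    return [{cell.tag: cell.text for cell in row} for row in ET.fromstring(xml_text)]


if __name__ == "__main__":
    xml_doc = csv_to_xml("site,ozone\nA,0.08\nB,0.07\n")
    print(xml_doc)
    print(xml_to_records(xml_doc))
```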
19
Key Question 4
  • Is it as easy to learn as HTML?
  • Simple answer: No, but there are resources that
    make learning it like learning HTML. See XML for
    the World Wide Web: Visual QuickStart Guide,
    Elizabeth Castro, Peachpit Press,
    http://www.cookwood.com/xml/index.html
  • More complete answer: I recommend training for
    all managers and for hands-on workers because:
  • If you think XML is just for techies, or aren't
    sure what it is, you're already behind the curve.
  • XML is the new standard for exchanging data
    electronically.
  • XML is a better way of organizing Web content.
  • XML will help you do a lot of things faster,
    better, and cheaper.
  • Source: XML: A Manager's Guide, foreword by
    Dr. David A. Taylor.

20
Key Question 5
  • Are there helpful tools, as there are for HTML?
  • Simple answer: Yes, and they continue to
    improve.
  • More complete answer: XML-Journal Readers'
    Choice Awards (the Oscars of the software
    industry),
    http://www.sys-con.com/xml/readerschoice/index.html
  • XMLSpy 3.0 won 6 of the 13 award categories!
    http://www.xmlspy.com/

21
Key Question 6
  • How soon will it be prevalent?
  • Simple answer: It is becoming prevalent outside
    the agency now because of the Federal CIO XML
    Working Group and will become so at EPA because
    of the CDX and NEIEN projects.
  • More complete answer: See The State of XML: Why
    Individuals Matter, XML.COM,
    http://www.xml.com/pub/a/2001/05/30/stateofxml.html
  • Many existing technologies are being
    re-engineered to take advantage of XML, gaining
    interoperability benefits previously too costly
    to realize (called the attack of the angle
    brackets).
  • Industries are finding that XML vocabularies can
    form a basis for collaboration and cost-cutting.
  • XML's influence is proving disruptive to the
    technological status quo.
  • CDX: Central Data Exchange
  • NEIEN: National Environmental Information Exchange
    Network

22
Key Question 7
  • What would be a killer application for EPA?
    Bridge the Digital Divide and provide universal
    access to Web content!
  • Simple answer: The phone remains the ubiquitous
    communications device and can be used to meet the
    new Section 508 requirements, so if you can
    access content via the Web from a browser, you
    can access it using VoiceXML from the telephone.
  • More complete answer: The VoiceXML Forum, the
    W3C Voice Browser Working Group, and vendors
    provide standards and tools:
  • http://www.voicexml.org/
  • http://www.w3.org/Voice/
  • http://studio.tellme.com/
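To give a feel for what "Web content over the telephone" involves, the sketch below wraps a piece of text in a bare-bones VoiceXML document using the vxml, form, block, and prompt elements. The function name is hypothetical, and a real voice browser may expect more (a DOCTYPE or namespace declaration, grammars for caller input), so treat this as an outline rather than a deployable page.

```python
import xml.etree.ElementTree as ET


def text_to_voicexml(message: str) -> str:
    """Wrap plain text in a minimal VoiceXML document that reads it aloud."""
    vxml = ET.Element("vxml", version="1.0")
    form = ET.SubElement(vxml, "form")
    block = ET.SubElement(form, "block")
    # <prompt> holds the text the voice browser speaks to the caller.
    ET.SubElement(block, "prompt").text = message
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(text_to_voicexml("Today's air quality index is moderate."))
```

The same source text could be rendered as HTML for a browser and as VoiceXML for a phone, which is the repurposing argument made throughout the talk.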

23
VoiceXML Schematic
24
P2P 101
  • P2P Collaboration: Turn a desktop PC into a server
    that directly shares data with other PCs.
  • P2P Networking: Each node in the network
    functions as both server and client without the
    need for central systems.
  • P2P Technology: Just like the Internet: fast,
    flexible, and decentralized. (Ossi Urchs,
    Internet Guru)
  • P2P and Web Services: Both are about Web servers
    communicating with one another.
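The "both server and client" idea can be modelled in a few lines. The following is an in-memory toy, not a networked implementation: the Peer class and its method names are invented for illustration, and a real node would expose serve() over HTTP or another protocol.

```python
class Peer:
    """A node that both answers requests (server) and makes them (client)."""

    def __init__(self, name):
        self.name = name
        self.shared = {}      # content this peer offers to others
        self.neighbors = []   # peers it knows about; no central directory

    def connect(self, other):
        """Link two peers in both directions."""
        if other not in self.neighbors:
            self.neighbors.append(other)
        if self not in other.neighbors:
            other.neighbors.append(self)

    def serve(self, key):
        """Server role: return local content, or None if not held here."""
        return self.shared.get(key)

    def fetch(self, key):
        """Client role: look locally first, then ask each neighbor."""
        if key in self.shared:
            return self.shared[key]
        for peer in self.neighbors:
            value = peer.serve(key)
            if value is not None:
                return value
        return None


if __name__ == "__main__":
    epa, usgs = Peer("epa"), Peer("usgs")
    epa.shared["air.xml"] = "<air/>"
    usgs.shared["geology.xml"] = "<geology/>"
    epa.connect(usgs)
    print(epa.fetch("geology.xml"), usgs.fetch("air.xml"))
```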

25
Hierarchical Peer-to-Peer Architecture
Key: Client Nodes (outer circles), Server Nodes
(inner circles)
26
True Peer-to-Peer Architecture
Key: Peer Nodes (all circles)
27
Integrating Information and Applications
  • Environmental Node and Earth Science Portal: EPA
    and USGS
  • Federal Statistics and Tools: FedStats.Net and
    Beyond
  • XML Pilot Projects Portal: National
    Environmental Information Exchange Network and
    EPA's Central Data Exchange
  • The Uberportal: FedGov Content Network
  • Gartner Group, Local Briefing, Emerging Internet
    Technologies, June 27, 2001, p. 19.

28
Content Network Concepts
  • Folders can contain files, databases, and Web
    resources.
  • Folders can/should be on different Web servers,
    but look and function as though they are on the
    same Web server.
  • This is accomplished by two new XML-based
    standards that send lean XML messages between the
    Web servers:
  • Content Network Protocol (CNP)
  • eXtensible Indexing Language (XIL)
  • Distributed folders and nodes can be managed both
    centrally and locally by the Content Network
    Manager and the Manage Content Administration
    Tools.
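The slides do not show what CNP or XIL messages look like, so the sketch below does not attempt to reproduce them. It only illustrates the general idea of folders on different servers appearing as one tree: each server returns a small XML folder listing in a made-up format, and the listings are merged into a single view. All element, attribute, and host names here are hypothetical.

```python
import xml.etree.ElementTree as ET


def merge_listings(xml_listings):
    """Merge per-server folder listings into one virtual folder tree.

    Each listing is assumed (hypothetically) to look like
    <folders server="host"><folder name="..."/></folders>.
    """
    network = ET.Element("network")
    for xml_text in xml_listings:
        node = ET.fromstring(xml_text)
        server = node.get("server")
        for folder in node.findall("folder"):
            # Record where each folder lives so requests can be routed later.
            ET.SubElement(network, "folder", name=folder.get("name"), server=server)
    return network


if __name__ == "__main__":
    merged = merge_listings([
        '<folders server="epa.example"><folder name="Air"/><folder name="Water"/></folders>',
        '<folders server="usgs.example"><folder name="Geology"/></folders>',
    ])
    print(ET.tostring(merged, encoding="unicode"))
```

A user browsing the merged tree sees Air, Water, and Geology side by side, while each folder is still stored and managed on its own server.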

29
(No Transcript)
30
NXT 3 Technology
  • Version 3.2 just released
  • Cross-platform with port to Sun Solaris
    (Intel/Windows and Sun/Solaris can be connected
    peer Web servers).
  • ORACLE database adapter and 87 new language
    encoders.
  • New installation wizard migrates previous content
    collections.
  • Application Framework for use with leading portal
    products.

31
NXT 3 Architecture
32
Content Network Design and Management
33
Environmental Node
34
Earth Science Portal
35
Federal Statistics and Analysis Tools -
FedStats.Net and Beyond
  • Virtual centralization of Federal statistics.
    http://www.fcw.com/fcw/articles/2001/0108/cov-xmlbx3-01-08-01.asp
  • Repurposing of the Annual Statistical Abstract
    1999 into an XML document database.
  • Republishing of the Annual Statistical Abstract
    2000 into an Integrated Distributed Statistical
    Compendia for live content management.
  • Integration of Federal statistics with analysis
    and visualization tools from Insightful
    Corporation.

36
Virtual Centralization of Federal Statistics
37
Republishing of the Annual Statistical Abstract
2000
38
StatServer Modeling
39
StatServer Graphlets
40
XML Pilot Projects Portal
  • National Environmental Information Exchange
    Network
  • A standards-based, highly interconnected,
    dynamic, flexible and secure network, operating
    with broad-based voluntary participation of
    individual state environmental agencies and EPA.
  • Federal Computer Week, June 18, 2001
    http://www.fcw.com/fcw/articles/2001/0618/pol-epa-06-18-01.asp
  • EPA's Central Data Exchange
  • The point of electronic entry for nearly all
    environmental data submissions to the agency.
  • http://www.epa.gov/cdx/
  • TIBCO Extensibility Canon/Developer
  • A leading design-time repository that manages the
    development and deployment of XML assets
    utilizing a Web-based interface.

41
(No Transcript)
42
TIBCO Extensibility Canon/Developer
43
The Uberportal: FedGov Content Network
  • Purpose
  • The FedStats.Net Content Network proof of
    concept was a success so apply it to the
    integration of Federal Web portals (about 50)
    into a Federal Government Content Network.
  • FirstGov.Gov, Science.Gov, etc. could use content
    network technology to complement and supplement
    their work with search engine-based technologies.

44
Concepts
  • Web search engine-based technology and efforts
    help find and organize content for content
    networks.
  • Content network technology adds value to the Web
    experience by providing more structure to and
    improved searching of the actual content, either
    in its original form and location or repurposed
    or republished using XML and P2P technologies and
    tools.

45
Strategy
  • There are 6 basic ways to integrate Web portals
    into a Content Network using NXT 3 technology:
  • 1. Use the Web Content Service to crawl and index
    the contents of external Web sites to integrate
    their content.
  • 2. Use the Content Network Link to connect to
    other Web servers running NXT 3 to syndicate
    their content (Server P2P).
  • 3. Replicate the content of a Web server on a
    central Web server because of agency security
    constraints.

46
Strategy (continued)
  • Six basic ways (continued)
  • 4. Re-purpose or re-publish key content to
    improve its usability in a content network.
  • 5. XML-ize proprietary search engine indices.
  • 6. Use distributed content generation
    technologies to feed the content network from the
    grassroots level (Desktop P2P).
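The first of the six ways, crawling and indexing external content, rests on a standard technique: an inverted index mapping each word to the pages that contain it. The sketch below shows only that indexing step on already-fetched text. It is not the NXT 3 Web Content Service, and the function names and example URLs are assumptions.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of page URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


if __name__ == "__main__":
    pages = {
        "http://a.example/": "Air quality data by state",
        "http://b.example/": "Water quality reports",
    }
    idx = build_index(pages)
    print(search(idx, "quality"), search(idx, "air quality"))
```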

47
Examples of Work in Progress
  • Work on an example for each of the six basic
    ways:
  • 1. Crawl and index many Web documents and Web
    sites.
  • 2. Syndicate NXT 3 servers: NextPage, EPA, and
    USGS so far.
  • 3. Replicate content: Federal databases on
    CD/DVD and the Web (the top 100!?).

48
Examples of Work in Progress (continued)
  • Work on an example for each of the six basic ways
    (continued):
  • 4. Re-purpose or re-publish: Distributed
    Integrated Statistical Compendia based on the
    Census Bureau's Year 2000 Statistical Abstract
    and other examples.
  • 5. XML-ize proprietary search engine indices:
    looking for a search engine that will provide it.
  • 6. Distributed content generation technologies:
    using the Manage Content feature of NXT 3 and
    encouraging organizations to try My Association
    technology.

49
Questions and Answers
  • Brand Niemann, Ph.D.
  • USEPA Headquarters, EPA West, Room 6143D
  • Office of Environmental Information, MC 2822T
  • 1200 Pennsylvania Avenue, NW, Washington, DC
    20460
  • 202-566-1657
  • niemann.brand_at_epa.gov
  • EPA: http://161.80.70.167
  • Outside EPA: http://130.11.44.140