Title: Maximizing Peer-to-Peer Portals Through XML: An Integration Case Study from the EPA
Slide 1: Maximizing Peer-to-Peer Portals Through XML: An Integration Case Study from the EPA
- Presentation for the Enterprise Web and Corporate Portal Conference
- September 5-6, 2001, Santa Clara, California
- by Brand Niemann, Ph.D.
- Office of Environmental Information
- U.S. Environmental Protection Agency
Slide 2: Overview
- The Big Picture
- XML and P2P 101
- Integrating Information and Applications
- Questions and Answers
Slide 3: The Big Picture
[Figure: The Big Picture. Labels: The Semantic Web; XML; Structured Data/Information; Categories, Metadata, Databases; Integrated Web Pages; Titles, Metatags; Personal Web Pages; P2P Content Networks; Web Pages; Portals.]
Slide 4: Portals and Content Networks
[Figure: NXT 3 Interface (Search, Personalization, Document Management, Metadata, etc.); Content Network of hierarchical folders, each a portal; Portal(s); Portlets.]
Slide 5: Portals and Content Networks
- NXT 3 options
  - Customize the NXT 3 interface as a portal (4-day class).
  - Integrate with groupware (e.g., Lotus Notes) and content management systems (e.g., Interwoven, Vignette, and DOCS Open).
- NextPage product announcements
  - Universal Updates in Peer Space (called Proactive Delivery at this conference).
  - Solo: offline distribution of a portal.
  - Matrix: collaborate on distributed content in the context of a business process.
Slide 6: XML and P2P Are a Disruptive Technology and Architecture
- Repurpose or republish content.
- Break down information silos/stovepipes.
- Challenge traditional centralization and security practices.
- Improve upon a simple topics view of categorization (as content grows in size and diversity, it needs multiple topics and more topics).
- Queries produce new content.
- Etc.
Slide 7: The Value Proposition
- Corporate and government information is undervalued and hidden because it is trapped in proprietary formats and stovepipe systems, so it is not fully accessible and is difficult and expensive to integrate.
- XML makes information more accessible and interoperable, and future-proofs it against periodic technology change.
Slide 8: XML 101
- Key Questions
  - 1. Is it a programming language like Java, or something in plain text that is read and acted upon by a browser?
  - 2. What browsers can handle it, and how prevalent are they?
  - 3. What are some EPA problems and applications?
  - 4. Is it as easy to learn as HTML?
  - 5. Are there helpful tools like those for HTML?
  - 6. How soon will it be prevalent?
  - 7. What would be a killer application for EPA?
Slide 9: Key Question 1
- Is it a programming language like Java, or something in plain text that is read and acted upon by a browser?
- Simple answer: Plain text, on purpose.
- More complete answer: eXtensible Markup Language (XML) is an incredibly powerful system for managing information. Use it with many other technologies (Java, ASP (Active Server Pages), etc.). HTML defines how elements are displayed; XML defines what those elements contain.
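To make the HTML/XML distinction on this slide concrete, here is a minimal Python sketch (Python is used only for illustration; the element names `chemical`, `name`, and `casNumber` are hypothetical, not an EPA schema). The HTML fragment carries only display instructions, while the XML fragment labels what each value is, so a program can read the fields by name.

```python
import xml.etree.ElementTree as ET

# HTML: the tags only say how to show the text (bold, line break).
HTML_FRAGMENT = "<p><b>Benzene</b><br>CAS 71-43-2</p>"

# XML: the tags say what each piece of text is.
# The element names are hypothetical, chosen for illustration.
XML_FRAGMENT = """
<chemical>
  <name>Benzene</name>
  <casNumber>71-43-2</casNumber>
</chemical>
"""

record = ET.fromstring(XML_FRAGMENT.strip())
# Because the markup names the content, a program can read fields directly
# instead of guessing their meaning from formatting.
name = record.findtext("name")
cas_number = record.findtext("casNumber")
print(f"{name}: {cas_number}")
```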
Slide 10: Background
- 1991: Tim Berners-Lee designed the WWW (Weaving the Web, HarperBusiness, 2000, paperback).
- 1993: Marc Andreessen co-created the Mosaic Web browser (he later co-founded Netscape).
- 1996: XML proposed by the W3C.
- 2001: About 2 billion Web pages (mostly HTML).
- HTML (HyperText Markup Language): a simple but elegant way of formatting data with special tags in a text file that can be viewed on virtually any computer platform.
- XML (eXtensible Markup Language): based on the same parent as HTML (SGML), but designed to better handle the task of managing information.
- HTML lets everyone do some things, and XML lets some people do practically anything.
- W3C: World Wide Web Consortium
- SGML: Standard Generalized Markup Language
Slide 11: Key Question 2
- What browsers can handle it, and how prevalent are they?
- Simple answer: Internet Explorer 5.5 and the W3C's Amaya (also an editor). Netscape 6 (Mozilla)?
- More complete answer: Use XML to manage data now, and convert it to HTML on the server side for Web browsers that lack XML support. Client-side technology is lagging, but the new SVG is an important step forward in Web user interface technology. (See http://maps.map.net/start.)
- SVG (Scalable Vector Graphics): nearly a stable W3C Recommendation.
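A minimal sketch of the server-side approach described above, assuming a hypothetical `<facilities>` document: the server keeps the data in XML and sends plain HTML to browsers that lack XML support. XSLT is the W3C language usually used for this step; Python's standard library is used here only to keep the example self-contained.

```python
import xml.etree.ElementTree as ET
from html import escape


def xml_to_html(xml_text: str) -> str:
    """Render a hypothetical <facilities> document as an HTML table."""
    root = ET.fromstring(xml_text)
    rows = []
    for facility in root.findall("facility"):
        # findtext returns unescaped text, so escape it again for HTML output.
        name = escape(facility.findtext("name", default=""))
        state = escape(facility.findtext("state", default=""))
        rows.append(f"<tr><td>{name}</td><td>{state}</td></tr>")
    header = "<tr><th>Name</th><th>State</th></tr>"
    return "<table>" + header + "".join(rows) + "</table>"


source = (
    "<facilities>"
    "<facility><name>Plant A &amp; Co.</name><state>CA</state></facility>"
    "</facilities>"
)
html_page = xml_to_html(source)
print(html_page)
```

The XML stays the single managed copy of the data; only the HTML view is generated per request.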
Slide 12: Background
- World Wide Web Consortium (W3C)
  - Created in 1994 to lead the WWW to its full potential by developing common protocols that promote its evolution and ensure its interoperability.
  - More than 500 organizations worldwide participate in this forum for information, commerce, communication, and collective understanding.
  - http://www.w3.org/
- XML is the universal format for structured documents and data on the Web and became a W3C Recommendation in February 1998.
Slide 13: Key Question 3
- What are some EPA problems and applications?
- Simple answer: XML technology and Peer-to-Peer (P2P) architecture will make practically everything we do better, faster, and cheaper (XML: A Manager's Guide, Addison-Wesley Information Technology Series, 2000).
- More complete answer: It is being used or planned for use in Web database delivery, data exchange and integration, electronic records management, public access content management, and distributed content integration. The EPA XML Technical Advisory Group has a database of projects and applications in Lotus Notes.
- The EPA XML Technical Advisory Group was chartered by OEI management in July 2000.
Slide 14: Selected Examples
- Web database delivery: OSWER's Chemical Emergency Preparedness and Prevention Office Local Emergency Planning Committee Database (LEPC), http://www.epa.gov/ceppo/lepclist.htm
- Data exchange and integration: Integrated Taxonomic Information System (ITIS), Canadian XML Version, http://sis.agr.gc.ca/pls/itisca/taxaget?p_ifx
- Public access content management and distributed content integration: EPA Node on a Federal Government Content Network, http://www.sdi.gov/server.htm
Slide 15: Background
Slide 16: Six Databases Need 30 Filters
[Figure: Oracle, PostgreSQL, Sybase, MySQL, Informix, and Access, each connected directly to every other database.]
Slide 17: Six Databases and an XML Hub Only Need 12 Filters
[Figure: Oracle, PostgreSQL, Sybase, MySQL, Informix, and Access, each connected only to a central XML Hub.]
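The filter counts on slides 16 and 17 follow from simple arithmetic: with n databases connected point to point, every ordered pair needs its own converter, n(n - 1); with an XML hub, each database needs one converter into XML and one out of it, 2n. A small Python check of that reasoning:

```python
def filters_point_to_point(n: int) -> int:
    """One directed converter for every ordered pair of distinct databases."""
    return n * (n - 1)


def filters_with_hub(n: int) -> int:
    """One converter into XML and one out of XML per database."""
    return 2 * n


for n in (2, 3, 6, 20):
    print(n, filters_point_to_point(n), filters_with_hub(n))
```

The two approaches break even at three databases; from four on, the hub needs fewer filters, and the gap grows quickly (20 databases: 380 versus 40).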
Slide 18: XML for Interchange Between Applications
[Figure: Database, GIS, Spreadsheet, OLAP Data Warehouse, and 3D Visualization applications exchanging XML through an XML Repository.]
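A sketch of the interchange pattern in the diagram, assuming an in-memory SQLite table as a stand-in for the database application and CSV as a stand-in for the spreadsheet. The `sites` table, its rows, and the XML element names are hypothetical. The spreadsheet side reads only the XML, never the database.

```python
import csv
import io
import sqlite3
import xml.etree.ElementTree as ET

# 1. Stand-in "Database" application: an in-memory SQLite table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sites (id INTEGER, name TEXT, county TEXT)")
db.executemany(
    "INSERT INTO sites VALUES (?, ?, ?)",
    [(1, "Site One", "Kern"), (2, "Site Two", "Fresno")],
)

# 2. Export rows to a neutral XML document (hypothetical vocabulary).
root = ET.Element("sites")
for site_id, name, county in db.execute(
    "SELECT id, name, county FROM sites ORDER BY id"
):
    site = ET.SubElement(root, "site", id=str(site_id))
    ET.SubElement(site, "name").text = name
    ET.SubElement(site, "county").text = county
xml_text = ET.tostring(root, encoding="unicode")

# 3. Stand-in "Spreadsheet" application: builds CSV from the XML alone.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["id", "name", "county"])
for site in ET.fromstring(xml_text).findall("site"):
    writer.writerow([site.get("id"), site.findtext("name"), site.findtext("county")])
csv_text = buffer.getvalue()
print(csv_text)
```

Adding a GIS or visualization tool to this picture would mean writing one more XML reader, not one converter per existing application.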
Slide 19: Key Question 4
- Is it as easy to learn as HTML?
- Simple answer: No, but there are resources that make learning it like learning HTML. See XML for the World Wide Web: Visual QuickStart Guide, Elizabeth Castro, Peachpit Press, http://www.cookwood.com/xml/index.html
- More complete answer: I recommend training for all managers and for hands-on workers because:
  - If you think XML is just for techies, or aren't sure what it is, you're already behind the curve.
  - XML is the new standard for exchanging data electronically.
  - XML is a better way of organizing Web content.
  - XML will help you do a lot of things faster, better, and cheaper.
- Source: XML: A Manager's Guide, foreword by Dr. David A. Taylor.
Slide 20: Key Question 5
- Are there helpful tools like those for HTML?
- Simple answer: Yes, and they continue to improve.
- More complete answer: XML-Journal Readers' Choice Awards (the "Oscars" of the software industry), http://www.sys-con.com/xml/readerschoice/index.html
  - XMLSpy 3.0 won 6 of the 13 award categories! http://www.xmlspy.com/
Slide 21: Key Question 6
- How soon will it be prevalent?
- Simple answer: It is becoming prevalent outside the agency now because of the Federal CIO XML Working Group, and will become so at EPA because of the CDX and NEIEN projects.
- More complete answer: See "The State of XML: Why Individuals Matter," XML.com, http://www.xml.com/pub/a/2001/05/30/stateofxml.html
  - Many existing technologies are being re-engineered to take advantage of XML, gaining interoperability benefits previously too costly to realize (called "the attack of the angle brackets").
  - Industries are finding that XML vocabularies can form a basis for collaboration and cost-cutting.
  - XML's influence is proving disruptive to the technological status quo.
- CDX: Central Data Exchange
- NEIEN: National Environmental Information Exchange Network
Slide 22: Key Question 7
- What would be a killer application for EPA? Bridge the Digital Divide and provide universal access to Web content!
- Simple answer: The phone remains the ubiquitous communications device and can be used to meet the new Section 508 requirements, so if you can access content via the Web from a browser, you can access it using VoiceXML from the telephone.
- More complete answer: The VoiceXML Forum, the W3C Voice Browser Working Group, and vendors provide standards and tools:
  - http://www.voicexml.org/
  - http://www.w3.org/Voice/
  - http://studio.tellme.com/
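As a rough illustration of what a VoiceXML page looks like, the Python sketch below generates a minimal document that reads one message aloud. It assumes the element names and namespace of the later W3C VoiceXML 2.0 specification (`vxml`, `form`, `block`, `prompt`); the spoken message is made up. A voice browser reached by telephone renders such a page as speech instead of on screen.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"
# Serialize VoiceXML elements in the default namespace (no prefix).
ET.register_namespace("", VXML_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the VoiceXML namespace."""
    return f"{{{VXML_NS}}}{tag}"


def make_vxml(message: str) -> str:
    """Build a minimal VoiceXML document that reads one message aloud."""
    vxml = ET.Element(q("vxml"), version="2.0")
    form = ET.SubElement(vxml, q("form"))
    block = ET.SubElement(form, q("block"))
    prompt = ET.SubElement(block, q("prompt"))
    prompt.text = message
    return ET.tostring(vxml, encoding="unicode")


doc = make_vxml("Ozone levels in your area are moderate today.")
print(doc)
```

The same content that feeds an HTML page could feed the `prompt` text, which is the "Web content by telephone" idea on this slide.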
Slide 23: VoiceXML Schematic
Slide 24: P2P 101
- P2P collaboration: Turn a desktop PC into a server that directly shares data with other PCs.
- P2P networking: Each node in the network functions as both server and client, without the need for central systems.
- P2P technology: "Just like the Internet: fast, flexible, and decentralized." (Ossi Urchs, Internet guru)
- P2P and Web services: Both are about Web servers communicating with one another.
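A toy, in-process Python sketch of the "both server and client" idea on this slide: each hypothetical `Peer` answers requests for its own documents and asks its neighbors directly for anything it lacks, with no central system. Real P2P products communicate over the network and usually search more than one hop; this sketch only models the two roles.

```python
from typing import Dict, List, Optional


class Peer:
    """A toy peer that both serves its own documents and fetches from others."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.documents: Dict[str, str] = {}
        self.neighbors: List["Peer"] = []

    def serve(self, key: str) -> Optional[str]:
        """Server role: answer a request from another peer."""
        return self.documents.get(key)

    def fetch(self, key: str) -> Optional[str]:
        """Client role: look locally first, then ask neighbors directly."""
        if key in self.documents:
            return self.documents[key]
        for peer in self.neighbors:
            found = peer.serve(key)
            if found is not None:
                return found
        return None


a, b = Peer("EPA"), Peer("USGS")
a.neighbors.append(b)
b.neighbors.append(a)
b.documents["streamflow.xml"] = "<streamflow/>"
print(a.fetch("streamflow.xml"))
```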
Slide 25: Hierarchical Peer-to-Peer Architecture
Key: client nodes (outer circles); server nodes (inner circles).
Slide 26: True Peer-to-Peer Architecture
Key: peer nodes (all circles).
Slide 27: Integrating Information and Applications
- Environmental Node and Earth Science Portal: EPA and USGS
- Federal Statistics and Tools: FedStats.Net and Beyond
- XML Pilot Projects Portal: National Environmental Information Exchange Network and EPA's Central Data Exchange
- The Uberportal: FedGov Content Network
- Gartner Group, Local Briefing, "Emerging Internet Technologies," June 27, 2001, p. 19.
Slide 28: Content Network Concepts
- Folders can contain files, databases, and Web resources.
- Folders can (and should) be on different Web servers, but look and function as though they are on the same Web server.
- This is accomplished by two new XML-based standards that send lean XML messages between the Web servers:
  - Content Network Protocol (CNP)
  - eXtensible Indexing Language (XIL)
- Distributed folders and nodes can be managed both centrally and locally with the Content Network Manager and the Manage Content administration tools.
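A hedged sketch of the "different servers, one apparent folder tree" idea. The XML folder listings and server URLs below are invented for illustration; this is not the Content Network Protocol or XIL message format, which the slide names but does not describe. Each remote folder is mounted under one virtual root that remembers its real server, and the user sees a single set of paths.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical folder listings returned by two different Web servers.
# This message format is invented; it is NOT the CNP/XIL format.
LISTINGS: Dict[str, str] = {
    "http://server-a.example": "<folder name='Air'><doc>ozone.xml</doc></folder>",
    "http://server-b.example": "<folder name='Water'><doc>wells.xml</doc></folder>",
}


def build_virtual_root(listings: Dict[str, str]) -> ET.Element:
    """Mount each remote folder under one root, remembering where it lives."""
    root = ET.Element("folder", name="Content Network")
    for server, xml_text in listings.items():
        remote = ET.fromstring(xml_text)
        remote.set("server", server)  # keep the real location for later requests
        root.append(remote)
    return root


def list_paths(folder: ET.Element, prefix: str = "") -> List[str]:
    """Flatten the tree so the user sees one set of paths, whatever the server."""
    path = f"{prefix}/{folder.get('name')}"
    paths = [f"{path}/{doc.text}" for doc in folder.findall("doc")]
    for child in folder.findall("folder"):
        paths.extend(list_paths(child, path))
    return paths


for p in list_paths(build_virtual_root(LISTINGS)):
    print(p)
```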
Slide 30: NXT 3 Technology
- Version 3.2 just released.
- Cross-platform, with a port to Sun Solaris (Intel/Windows and Sun/Solaris servers can be connected as peer Web servers).
- Oracle database adapter and 87 new language encoders.
- New installation wizard migrates previous content collections.
- Application Framework for use with leading portal products.
Slide 31: NXT 3 Architecture
Slide 32: Content Network Design and Management
Slide 33: Environmental Node
Slide 34: Earth Science Portal
Slide 35: Federal Statistics and Analysis Tools: FedStats.Net and Beyond
- Virtual centralization of Federal statistics. http://www.fcw.com/fcw/articles/2001/0108/cov-xmlbx3-01-08-01.asp
- Repurposing of the Annual Statistical Abstract 1999 into an XML document database.
- Republishing of the Annual Statistical Abstract 2000 into Integrated Distributed Statistical Compendia for live content management.
- Integration of Federal statistics with analysis and visualization tools from Insightful Corporation.
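The "queries produce new content" idea from slide 6 applies directly to statistics held in XML. The sketch below uses a hypothetical `<table>`/`<row>` vocabulary and made-up values (not figures from the Statistical Abstract) and derives a new percent-change table from rows stored as XML.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# Hypothetical XML statistical table; the values are illustrative only.
TABLE = """<table title="Example population table (illustrative values)">
  <row state="A" year="1990" value="100"/>
  <row state="A" year="2000" value="120"/>
  <row state="B" year="1990" value="200"/>
  <row state="B" year="2000" value="210"/>
</table>"""


def percent_change(xml_text: str, start: str, end: str) -> Dict[str, float]:
    """Derive a new table: percent change per state between two years."""
    root = ET.fromstring(xml_text)
    values: Dict[Tuple[str, str], float] = {}
    for row in root.iter("row"):
        values[(row.get("state"), row.get("year"))] = float(row.get("value"))
    result: Dict[str, float] = {}
    for state in sorted({s for s, _ in values}):
        before, after = values[(state, start)], values[(state, end)]
        result[state] = round(100 * (after - before) / before, 1)
    return result


print(percent_change(TABLE, "1990", "2000"))
```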
Slide 36: Virtual Centralization of Federal Statistics
Slide 37: Republishing of the Annual Statistical Abstract 2000
Slide 38: StatServer Modeling
Slide 39: StatServer Graphlets
Slide 40: XML Pilot Projects Portal
- National Environmental Information Exchange Network
  - A standards-based, highly interconnected, dynamic, flexible, and secure network, operating with broad-based voluntary participation of individual state environmental agencies and EPA.
  - Federal Computer Week, June 18, 2001, http://www.fcw.com/fcw/articles/2001/0618/pol-epa-06-18-01.asp
- EPA's Central Data Exchange
  - The point of electronic entry for nearly all environmental data submissions to the agency.
  - http://www.epa.gov/cdx/
- TIBCO Extensibility Canon/Developer
  - A leading design-time repository that manages the development and deployment of XML assets using a Web-based interface.
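A minimal sketch of the kind of checking a central point of electronic entry might apply to an XML submission before accepting it. The required element names are hypothetical and do not reflect actual CDX or Exchange Network schemas; a production system would more likely validate against an XML Schema than rely on hand-written checks.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical required fields; real CDX/Exchange Network schemas differ.
REQUIRED = ("facilityId", "reportingYear", "pollutant")


def check_submission(xml_text: str) -> List[str]:
    """Return a list of problems; an empty list means this sketch accepts it."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]
    # A missing element and an empty one are both reported as missing.
    return [
        f"missing <{tag}>"
        for tag in REQUIRED
        if not (root.findtext(tag) or "").strip()
    ]


good = (
    "<submission><facilityId>F1</facilityId>"
    "<reportingYear>2000</reportingYear><pollutant>NOx</pollutant></submission>"
)
bad = "<submission><facilityId>F1</facilityId></submission>"
print(check_submission(good))
print(check_submission(bad))
```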
Slide 42: TIBCO Extensibility Canon/Developer
Slide 43: The Uberportal: FedGov Content Network
- Purpose
  - The FedStats.Net Content Network proof of concept was a success, so apply it to the integration of Federal Web portals (about 50) into a Federal Government Content Network.
  - FirstGov.Gov, Science.Gov, etc. could use content network technology to complement and supplement their work with search engine-based technologies.
Slide 44: Concepts
- Web search engine-based technology and efforts help find and organize content for content networks.
- Content network technology adds value to the Web experience by providing more structure to, and improved searching of, the actual content, either in its original form and location or repurposed or republished using XML and P2P technologies and tools.
Slide 45: Strategy
- There are six basic ways to integrate Web portals into a Content Network using NXT 3 technology:
  - 1. Use the Web Content Service to crawl and index the contents of external Web sites to integrate their content.
  - 2. Use the Content Network Link to connect to other Web servers running NXT 3 to syndicate their content (server P2P).
  - 3. Replicate the content of a Web server on a central Web server because of agency security constraints.
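Strategy 1 relies on crawling and indexing. As a simplified sketch (not the Web Content Service itself), the Python below builds an inverted index over pages a crawler is assumed to have already fetched; the URLs and page text are hypothetical, and a query returns the pages containing every query word.

```python
import re
from collections import defaultdict
from typing import Dict, Set

# Stand-in for pages a crawler has already fetched (hypothetical URLs).
PAGES: Dict[str, str] = {
    "http://agency-a.example/air.html": "Ozone and particulate monitoring",
    "http://agency-b.example/water.html": "Groundwater monitoring wells",
}


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each word to the set of URLs that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return URLs containing every word of the query (AND semantics)."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


index = build_index(PAGES)
print(sorted(search(index, "monitoring")))
print(sorted(search(index, "groundwater monitoring")))
```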
Slide 46: Strategy (continued)
- Six basic ways (continued):
  - 4. Repurpose or republish key content to improve its usability in a content network.
  - 5. XML-ize proprietary search engine indices.
  - 6. Use distributed content generation technologies to feed the content network from the grassroots level (desktop P2P).
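Strategy 5, XML-izing search engine indices, amounts to writing an index out in an open format that any tool can read. A minimal sketch, assuming a simple word-to-URLs index and a hypothetical `<index>`/`<term>`/`<hit>` vocabulary (not any vendor's actual index format):

```python
import xml.etree.ElementTree as ET
from typing import Dict, Set


def index_to_xml(index: Dict[str, Set[str]]) -> str:
    """Serialize a word -> URLs index as XML (hypothetical vocabulary)."""
    root = ET.Element("index")
    for word in sorted(index):
        term = ET.SubElement(root, "term", word=word)
        for url in sorted(index[word]):
            ET.SubElement(term, "hit", href=url)
    return ET.tostring(root, encoding="unicode")


def xml_to_index(xml_text: str) -> Dict[str, Set[str]]:
    """Read the XML back, so another tool can reuse the same index."""
    root = ET.fromstring(xml_text)
    return {
        term.get("word"): {hit.get("href") for hit in term.findall("hit")}
        for term in root.findall("term")
    }


sample = {
    "ozone": {"http://a.example/air.html"},
    "wells": {"http://b.example/water.html"},
}
xml_text = index_to_xml(sample)
print(xml_text)
```

Because the index round-trips through plain XML, another search tool or portal could load it without the original engine.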
Slide 47: Examples of Work in Progress
- Work on an example for each of the six basic ways:
  - 1. Crawl and index many Web documents and Web sites.
  - 2. Syndicate NXT 3 servers: NextPage, EPA, and USGS so far.
  - 3. Replicate content: Federal databases on CD/DVD and the Web (the top 100!?).
Slide 48: Examples of Work in Progress (continued)
- Work on an example for each of the six basic ways (continued):
  - 4. Repurpose or republish: Distributed Integrated Statistical Compendia based on the Census Bureau's Year 2000 Statistical Abstract, and other examples.
  - 5. XML-ize proprietary search engine indices: looking for a search engine that will provide it.
  - 6. Distributed content generation technologies: using the Manage Content feature of NXT 3 and encouraging organizations to try My Association technology.
Slide 49: Questions and Answers
- Brand Niemann, Ph.D.
- USEPA Headquarters, EPA West, Room 6143D
- Office of Environmental Information, MC 2822T
- 1200 Pennsylvania Avenue, NW, Washington, DC 20460
- 202-566-1657
- niemann.brand_at_epa.gov
- EPA: http://161.80.70.167
- Outside EPA: http://130.11.44.140