Title: Web Content Authoring and Management Tools with XML
- FedWeb 2002 Tutorial, May 20, 2002
- Brand Niemann and the Team
- US Environmental Protection Agency
- Office of Environmental Information
Overview
- Preface
- This tutorial is being videotaped by Susan
Turnbull (GSA) for use in the Instant Access
software at the Universal Access Collaboration
Workshop on July 16th.
- Part 1 (9 a.m.-12 noon)
- XML (eXtensible Markup Language), 9-10:15.
- VoiceXML (XML for the telephone), 10:30-11:15.
- GML (XML for geospatial databases), 11:15-11:45.
- Questions and Answers, 11:45-12.
- Part 2 (1-4 p.m.)
- Web Content Management Strategies, Tools and Best
Practices (Howard McQueen).
Part 1: The Team
- XML
- Brand L. Niemann (Sr.), US EPA.
- Brand K. Niemann (Jr.), Tax Analysts.
- VoiceXML
- Art Clarke, Tellme Networks.
- Janina Sajka and Katie Haritos-Shea, American
Foundation for the Blind.
- Simon Chung and Craig Brown, NextPage.
- GML
- Chris Tucker, Ionic Enterprise.
- Videotape and Indexing
- Susan Turnbull, GSA.
- Antoinette Purdon, Instant Index.
Part 1: Training Materials
- XML
- This presentation and others at
http://130.11.44.140.
- XML Web Services: EPA-State Content Network.
- Web Publishing on DVD: Repurposing Federal Data
in XML.
- VoiceXML
- XML Web Services: VoiceXML and Phone Directories,
and others by the Team members.
- GML
- GML and Open Web Services, and others at
http://www.ionicenterprise.com/.
Contents
- 1. Background
- 2. Creating Even Better Content from Good
Content
- 3. Managing Content as Collections and a Network
- 4. Web Content Management Tools
- 5. Contact Information
1. Background
- 1.1 What have I done with XML?
- 1.2 Why XML?
- 1.3 What is XML?
- 1.3.1 General.
- 1.3.2 Parts of a Well-Formed XML Document.
- 1.3.3 Supports Data Binding to HTML.
- 1.3.4 Separates Content from Presentation.
- 1.3.5 Supports Multi-Channel Dissemination.
- 1.3.6 Supports Web Content Management and XML Web
Services.
- 1.3.7 Exchange Nodes and Content Networks.
- 1.3.8 XML Training Resources.
1.1 What have I done with XML?
- Federal CIO Working Group on XML, January 2001
- XML Project Centralizes Agency Stats, Federal
Computer Week, January 8, 2001.
- FedWeb, March 2001
- The Future of Portals: Case Study of FedStats.Net
as a Model for Collaboration and Data
Integration.
- Portals: How e-Government is Transforming
Communication with Citizens: From Portals to Peer
Space.
- GSA Office of Intergovernmental Solutions
Newsletter, XML Applications in Government,
February 2002
- Building Peer-to-Peer XML Content Networks of Web
Services for Federal Scientific and Statistical
Data and Information: FedStats.Net and Beyond.
- GAO Report to Congress on XML, April 2002
- Challenges to Effective Adoption of the
Extensible Markup Language (contributor).
1.2 Why XML?
- The eXtensible Markup Language became a World
Wide Web Consortium (W3C) standard in 1998 as the
universal format for structured documents and
data on the Web (http://www.w3.org/XML/).
- The CIO Council created the XML Working Group in
2000 to facilitate the efficient and effective
use of XML through cooperative efforts among
government agencies, including partnerships with
commercial and industrial organizations
(http://xml.gov/).
- GAO report to Congress urges government to adopt
XML (http://www.gao.gov/new.items/d02327.pdf).
1.3.1 What is XML? General
- XML is a standard for preserving and
communicating information (encoding, tagging,
and internationalizing) that will be everywhere.
- Web Services provide communication between
applications running on different Web servers,
which will bring the Internet to a new level.
- XML Web Services are applications running on
different devices that communicate XML data using
XML messages.
- Web Services can and should be interoperable
across multiple vendor tools and platforms in the
enterprise (see
http://www.ws-i.org/Community.aspx).
1.3.2 What is XML? Parts of a Well-Formed XML
Document
- Prolog
- XML declaration.
- Comment.
- White space.
- Processing instruction (stylesheet reference,
href="Inventory01.css").
- White space.
- End of prolog.
- Document element (root element)
- Sample content: The Adventures of Huckleberry
Finn, Mark Twain, mass market paperback, 298,
5.49.
- Sample content: The Turn of the Screw, Henry
James.
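The parts listed on this slide can be sketched as a small document and checked with Python's standard `xml.etree.ElementTree` parser, which accepts only well-formed XML. The element names (`INVENTORY`, `BOOK`, `TITLE`, and so on) and the comment text are assumptions made for illustration; only the stylesheet file name and the book values come from the slide.

```python
import xml.etree.ElementTree as ET

# Element names and the comment text are assumptions for illustration;
# the values and the stylesheet file name come from the slide.
DOCUMENT = """<?xml version="1.0"?>
<!-- Comment: book inventory -->

<?xml-stylesheet type="text/css" href="Inventory01.css"?>

<INVENTORY>
  <BOOK>
    <TITLE>The Adventures of Huckleberry Finn</TITLE>
    <AUTHOR>Mark Twain</AUTHOR>
    <BINDING>mass market paperback</BINDING>
    <PAGES>298</PAGES>
    <PRICE>5.49</PRICE>
  </BOOK>
  <BOOK>
    <TITLE>The Turn of the Screw</TITLE>
    <AUTHOR>Henry James</AUTHOR>
  </BOOK>
</INVENTORY>
"""


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


root = ET.fromstring(DOCUMENT)  # the document (root) element
titles = [book.findtext("TITLE") for book in root.findall("BOOK")]
print(root.tag, titles)

# Mismatched start and end tags are rejected by the parser.
print(is_well_formed("<BOOK><TITLE>Unclosed</BOOK>"))
```

Everything before the root element's start tag is the prolog; the parser reads it but returns only the element tree.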
1.3.3 What is XML? Supports Data Binding to HTML
Link an XML document to an HTML page and then
bind standard HTML elements to individual XML
elements, which saves time and money when
delivering small Web databases. The XML file has
many other uses (e.g., Section 508 accessibility,
roundtrip to Excel, etc.) and future-proofs your
data against periodic technology changes.
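As a rough illustration of binding, the sketch below maps each XML element of a record to one HTML table cell. It is a script-side stand-in, not the browser data-binding mechanism the slide refers to, and the element names and the `bind_to_table` helper are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical XML records; the field names are assumptions.
XML_DATA = """<INVENTORY>
  <BOOK><TITLE>The Turn of the Screw</TITLE><AUTHOR>Henry James</AUTHOR></BOOK>
  <BOOK><TITLE>The Adventures of Huckleberry Finn</TITLE><AUTHOR>Mark Twain</AUTHOR></BOOK>
</INVENTORY>"""

FIELDS = ["TITLE", "AUTHOR"]  # each field is bound to one table column


def bind_to_table(xml_text, fields):
    """Render every child record of the root as one HTML table row."""
    root = ET.fromstring(xml_text)
    header = "".join(f"<th>{escape(name)}</th>" for name in fields)
    rows = []
    for record in root:
        cells = "".join(
            f"<td>{escape(record.findtext(name, default=''))}</td>"
            for name in fields
        )
        rows.append(f"<tr>{cells}</tr>")
    return f"<table><tr>{header}</tr>{''.join(rows)}</table>"


html_table = bind_to_table(XML_DATA, FIELDS)
print(html_table)
```

Because the data stays in the XML file, the same file can feed other outputs without touching the HTML page.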
1.3.4 What is XML? Separates Content from
Presentation
[Figure: two-by-two matrix with Presentation
(centralized or distributed) on one axis and
Content (centralized or distributed) on the other.
Quadrant labels: Traditional; Personalization and
Customer Relationship Management; Content Network
Integrated Portal; Content Network Uber Portal.]
1.3.5 What is XML? Supports Multi-Channel
Dissemination
- Web
- Print
- CD/DVD
- XML Web Service
- Telephone
- Digital Talking Books
- Other
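A minimal sketch of multi-channel dissemination, assuming a hypothetical `<item>` content record: one XML source is rendered for the Web, for print, and for the telephone. The telephone renderer emits a bare VoiceXML skeleton (`vxml`, `form`, `block`, `prompt`); a production VoiceXML document would also carry a namespace declaration and dialog logic.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical content item; element names and text are assumptions.
CONTENT = "<item><title>Office hours</title><body>Open 9 a.m. to 5 p.m.</body></item>"


def to_web(item):
    """Web channel: an HTML fragment."""
    return (f"<h1>{escape(item.findtext('title'))}</h1>"
            f"<p>{escape(item.findtext('body'))}</p>")


def to_print(item):
    """Print channel: plain text with an underlined title."""
    title = item.findtext("title")
    return f"{title}\n{'=' * len(title)}\n{item.findtext('body')}\n"


def to_telephone(item):
    """Telephone channel: a minimal VoiceXML document that reads the item."""
    vxml = ET.Element("vxml", version="2.0")
    block = ET.SubElement(ET.SubElement(vxml, "form"), "block")
    prompt = ET.SubElement(block, "prompt")
    prompt.text = f"{item.findtext('title')}. {item.findtext('body')}"
    return ET.tostring(vxml, encoding="unicode")


item = ET.fromstring(CONTENT)
web, printed, phone = to_web(item), to_print(item), to_telephone(item)
print(web)
print(printed)
print(phone)
```

Adding another channel means adding another renderer; the content record itself does not change.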
1.3.6 What is XML? Supports Web Content
Management and XML Web Services
- XML indexing of PDF document collections.
- Re-purposing PDF and Web documents to XML content
collections.
- Extracting and creating XML data tables from PDF
and other Web documents.
- Converting relational databases to XML and XML
Web Services.
- Delivering selected content to other channels
like the telephone.
- Converting spatial data to GML (Geography Markup
Language) and integrating it with non-spatial XML
content.
- Centralized and distributed.
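One item in the list above, converting a relational database to XML, can be sketched with Python's standard `sqlite3` and `xml.etree.ElementTree` modules. The table, its columns, and the sample rows are invented for illustration, and the `<row>` wrapper element is an assumption rather than any tool's actual output format.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_to_xml(conn, table):
    """Convert every row of a table into a <row> element, one child per column."""
    # The table name is interpolated directly, so it must come from trusted code.
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [desc[0] for desc in cursor.description]
    root = ET.Element(table)
    for values in cursor:
        row = ET.SubElement(root, "row")
        for column, value in zip(columns, values):
            ET.SubElement(row, column).text = "" if value is None else str(value)
    return root


# Invented sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counties (name TEXT, state TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO counties VALUES (?, ?, ?)",
    [("Example County", "VA", 1000), ("Sample County", "MD", 2500)],
)
counties_xml = table_to_xml(conn, "counties")
print(ET.tostring(counties_xml, encoding="unicode"))
```

Column names become element names here, which only works when they are valid XML names; a real converter would need a mapping step for names that are not.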
1.3.7 What is XML? Exchange Nodes and Content
Networks (http://www.epa.gov/neengprg/)
1.3.7 What is XML? Exchange Nodes and Content
Networks (http://fedgov.nextpage.com)
[Figure: the NXT 3 Interface (search,
personalization, document management, metadata,
etc.) over a content network of hierarchical
folders, each of which can be a portal on a
different Web server; portal(s) and portlets.]
1.3.8 What is XML? XML Training Resources
- Video
- E.g., Introduction to XML Video (see next slide).
- Commercial
- E.g., Microsoft Visual Studio .NET, etc.
- Online (free and cost)
- E.g., xml.gov, xml.org, and xml.com
- Develop in-house capability
- E.g., EPA http://130.11.44.140.
1.3.8 What is XML? Introduction to XML Video
- Chapter 1: XML in Business (20 minutes)
- Chapter 2: History of XML (27 minutes)
- Chapter 3: Theory of Markup (7 minutes)
- Chapter 4: Introduction to XML Syntax (14
minutes)
- Chapter 5: XML in the Real World (6 minutes)
- Chapter 6: Information Stewardship (4 minutes)
- More Information (1 minute)
- Purchase: http://www.synthbank.com/xmlvideo.htm
1.3.8 What is XML? Key questions answered by the
video
- What is XML?
- Who developed XML?
- How is XML different from HTML?
- Why is XML important to my business?
- Can I begin to use XML today?
- What tools and companies support XML?
2. Creating Even Better Content from Good Content
- 2.1 CIA
- World Fact Book Country Profiles.
- Repurposing HTML to XML and creating new content
presentations (XML HitList Table).
- 2.2 Census Bureau
- 2.2.1 Statistical Abstract of the United States.
- Repurposing PDF and Excel to XML.
- 2.2.2 USA County Spatial Statistics
- Content Adapter for Relational Database to XML
and Custom Search Form in XML.
- 2.3 Comparison of Navigation and Searching.
- 2.3.1 Census Bureau.
- 2.3.2 NXT 3 Content Collection.
2.1 World Fact Book Country Profiles
2.2.1 Statistical Abstract of the United States
2.2.2 USA County Spatial Statistics
2.3 Comparison of Navigation and Searching
- 2.3.1 Census Bureau
- 2.3.1.1 42 Separate PDF files.
- 2.3.1.2 No Search.
- 2.3.1.3 1500 Excel Tables on a Separate CD-ROM.
- 2.3.2 NXT 3 Content Collection
- 2.3.2.1 Hierarchical Structure.
- 2.3.2.2 Integration of Text and Tables.
- 2.3.2.3 Excel Table Within the Document.
- 2.3.2.4 Custom Search Form.
- 2.3.2.5 Relevance Ranked Hit List.
- 2.3.2.6 Hits Highlighted in Document and Tables.
2.3.1.1 42 Separate PDF files
2.3.1.2 No Search
2.3.1.3 1500 Excel Tables on a Separate CD-ROM
2.3.2.1 Hierarchical Structure
2.3.2.2 Integration of Text and Tables
2.3.2.3 Excel Table Within the Document
2.3.2.4 Custom Search Form
2.3.2.5 Relevance Ranked Hit List
2.3.2.6 Hits Highlighted in Document and Tables
3. Managing Content as Collections and a Network
- 3.1 Definitions.
- 3.2 Concepts
- 3.2.1 The Uberportal and the NXT 3 Interface.
- 3.2.2 Integration of Portals.
- 3.3 Examples
- 3.3.1 Housing and Urban Development.
- 3.3.2 US Geological Survey.
- 3.3.3 Environmental Protection Agency.
- 3.4 FirstGov Content Management Survey
- 3.4.1 General Questions (XML 1 of 12)
- 3.4.2 Author Questions (XML 0 of 20)
- 3.4.3 Advanced Questions (XML 4 of 22)
- Uberportal: a portal that sits on top of the
portals (The Gartner Group, Emerging Internet
Technologies, Local Briefing, June 27, 2001,
page 19).
3.1 Definitions
- A content collection is a collection of one or
more documents. Content collections may contain
any type of content (documents, databases,
applications, and other digital media), as well
as considerable internal folder structure.
- A site contains one or more content collections
organized into hierarchies. When users access a
site, they are presented with a site table of
contents that allows them to browse the content
collections stored on the site. They may also
search across all content collections or a subset
of the content collections. The result of a
search is a list of matches that are linked to
corresponding documents.
- Note: Special tools are used to build content
collections or to create an empty content
collection and check documents into this new
collection (see Section 4).
- Based on NextPage NXT 3. See NextPage NXT 3
Quick Start.
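The definitions above can be modeled as a small data structure. This is a sketch only: the class and method names are hypothetical and do not reflect the NXT 3 API, and the search is a plain substring match rather than a real index.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    title: str
    text: str


@dataclass
class ContentCollection:
    name: str
    documents: list = field(default_factory=list)
    subcollections: list = field(default_factory=list)  # internal folder structure

    def walk(self):
        """Yield (path, document) for this collection and everything below it."""
        for doc in self.documents:
            yield self.name, doc
        for sub in self.subcollections:
            for path, doc in sub.walk():
                yield f"{self.name}/{path}", doc


@dataclass
class Site:
    collections: list

    def table_of_contents(self):
        """Names of the top-level collections a user can browse."""
        return [coll.name for coll in self.collections]

    def search(self, term, only=None):
        """Search all collections, or only the subset named in `only`."""
        hits = []
        for coll in self.collections:
            if only is not None and coll.name not in only:
                continue
            for path, doc in coll.walk():
                if term.lower() in doc.text.lower():
                    hits.append((path, doc.title))  # match linked to its document
        return hits


# Invented sample content.
site = Site([
    ContentCollection(
        "Internal",
        [Document("Policies", "Travel policy and procedures")],
        [ContentCollection("ISO 9000", [Document("Quality manual", "Procedures for audits")])],
    ),
    ContentCollection("External", [Document("Standards", "Accounting standards")]),
])
print(site.table_of_contents())
print(site.search("procedures"))
print(site.search("standards", only={"External"}))
```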
3.2.1 The Uberportal and the NXT 3 Interface
[Figure: the NXT 3 Interface (search,
personalization, document management, metadata,
etc.) over a content network of hierarchical
folders, each of which can be a portal on a
different Web server; portal(s) and portlets.]
3.2.2 Integration of Portals
- Web search engine-based technology and efforts
help find and organize content for content
networks using NXT 3 as follows:
- 1. Use the Web Content Service to crawl and index
the contents of external Web sites to integrate
their content.
- 2. Use the Content Network Link to connect to
other Web servers running NXT 3 to syndicate
their content (Server P2P).
- 3. Replicate the content of a Web server on a
central Web server because of agency security
constraints or other mitigating circumstances.
- 4. Re-purpose or re-publish key content to
improve its usability in a content network.
- 5. XML-ize proprietary search engine indices.
- 6. Use distributed content generation
technologies to feed the content network from the
grassroots level (Desktop P2P).
3.3.1 Housing and Urban Development: Structured
documents
3.3.1 Housing and Urban Development: Real data
from relational databases
3.3.1 Housing and Urban Development: XML query
templates (housing by state)
3.3.1 Housing and Urban Development: Real data
XML hitlist query results
3.3.2 US Geological Survey: Agency Publications
3.3.2 US Geological Survey: General Interest
Publications
3.3.2 US Geological Survey: Special Interest
Publications
3.3.2 US Geological Survey: Search across all
publications
3.3.3 Environmental Protection Agency: Background
- Requests from multiple EPA offices for help with
XML training and pilots (financial, public
relations, environmental information, superfund,
research and development, and water).
- Selected the very best content for each office to
be XML-ized and to be integrated into a content
network using the best technology.
- Registered the best content with its metadata in
the content network that is both centralized and
distributed.
- The content network supports new agency
initiatives like the Environmental Indicators
Initiative and State of the Environment Report,
the Environmental Health Tracking Network (EHTN),
and the Situation Room.
- The content network supports the agency goals of
(1) creating the building blocks of an exchange
network, (2) enabling integration of environmental
data, and (3) providing vital services to EPA and
the public.
3.3.3 Environmental Protection Agency: EPA-State
Content Network
3.3.3 Environmental Protection Agency: National
Coastal Condition Report
- The Problem
- Large PDF files (14) totaling 114.6 MB!
- Files range in size from 0.1 to 17.2 MB.
- Pages slow to render and print (200 pages)
because of multi-colored backgrounds, graphics,
and photographs.
- Lots of data graphics, but few data tables.
- Neither a PDF file with a structured table of
contents nor in Tagged format for export to XML.
- The Solution
- NXT 3 makes search and display across the entire
collection of files very efficient and fast
because of XML.
- http://www.epa.gov/owow/oceans/nccr/index.html
3.4 FirstGov Content Management Survey
- 3.4.1 General Questions (XML 1 of 12)
- 9. Intended use of XML: How important is it for
you to employ XML in the acquisition, management,
and/or delivery of content?
- 3.4.2 Author Questions (XML 0 of 20)
- 3.4.3 Advanced Questions (XML 4 of 22)
- 1. Need for XML tools: How important is it for
you for the system to have native XML processing
tools and functions built in?
3.4 FirstGov Content Management Survey (continued)
- 3.4.3 Advanced Questions (XML 4 of 22)
- 2. Need for XML Standards Support: How important
is it for you to support XML-based standards such
as RSS, ICE, ebXML, and the Web Services family
(e.g., SOAP)?
- 3. Existing XML Usage: Have you already
developed DTDs or Schemas to validate your XML
content?
- 4. Current Usage of XML Stylesheets: Have you
already developed XSL Stylesheets to transform
your XML documents?
4. Web Content Management Tools
- 4.1 Recent Evaluations
- 4.2 NXT 3 e-Content P2P Platform
- 4.2.1 Concepts.
- 4.2.2 Architecture and Services.
- 4.2.3 Content Network Manager.
- 4.2.4 Manage Content (Interface for Document
Management and Building Metadata).
- 4.3 NextPage Apps and Tools
- 4.3.1 Matrix (Virtual Collaborative Peer to Peer
Space).
- See Washington Post, January 3, 2002, page E01,
Deals Become Online Models For Learning.
- 4.3.2 Solo (Offline Access to the Content
Network).
- 4.3.3 RapidApps (Interwoven Integration).
4.1 Recent Evaluations
- NextPage NXT 3 P2P Platform
- Andy Warzecha, The META Group, 3/12/2002
- If companies want to do cross-enterprise content
management, NextPage has the solution.
- "Content networks provide a way for users to
simultaneously access Internet sites, databases,
intranets and other formal or informal content
resources as if the content existed in a single
location."
- "The advantage of this approach is that new
content sources can be added quickly ... This
puts power in the hands of business users to
quickly tie in or disconnect the various content
sources they require access to." (see next slide)
- Peer-to-peer: Every device connected to the
network is both a server and consumer of content.
4.1 Recent Evaluations: META Group Content
Network with Portal
4.1 Recent Evaluations
- NextPage NXT 3 P2P Platform
- Esther Dyson's Release 1.0, 1/22/2002
- NextPage is unique in the content-management
market in its distributed approach.
- NextPage's platform, NXT 3, virtually connects
the distributed information sources and makes
them appear integrated to the user. Unlike
syndication, in which content is copied and
integrated with other content locally, NextPage
keeps objects where they are.
- NextPage uses the standard Simple Object Access
Protocol (SOAP) to exchange and normalize
information between local content directories,
assembling meta-indexes so that users can search
or manipulate content transparently, regardless
of physical location.
4.2.1 Concepts
- Folders can contain files, databases, and Web
resources.
- Folders can/should be on different Web servers,
but look and function as though they are on the
same Web server.
- This is accomplished by two new XML-based
standards that send lean XML messages between the
Web servers:
- Content Network Protocol (CNP)
- eXtensible Indexing Language (XIL)
- Distributed folders and nodes can be managed both
centrally and locally by the Content Network
Manager and the Manage Content Administration
Tools.
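The idea of folders on different servers answering as one can be sketched with a toy federated search. The `<query>` and `<hits>` message shapes, the `Node` class, and the sample documents are all hypothetical; this is not the CNP or XIL format, which these slides do not specify, and the "servers" here are in-process objects rather than networked machines.

```python
import xml.etree.ElementTree as ET


class Node:
    """Stand-in for one Web server holding a folder of documents."""

    def __init__(self, name, documents):
        self.name = name
        self.documents = documents  # {title: text}

    def handle(self, request_xml):
        """Answer a <query> message with a <hits> message."""
        term = ET.fromstring(request_xml).findtext("term").lower()
        hits = ET.Element("hits", node=self.name)
        for title, text in self.documents.items():
            if term in text.lower():
                ET.SubElement(hits, "hit").text = title
        return ET.tostring(hits, encoding="unicode")


def federated_search(nodes, term):
    """Send one small XML query to every node and merge the replies."""
    query = ET.Element("query")
    ET.SubElement(query, "term").text = term
    request = ET.tostring(query, encoding="unicode")
    results = []
    for node in nodes:
        reply = ET.fromstring(node.handle(request))
        results += [(reply.get("node"), hit.text) for hit in reply.findall("hit")]
    return results


# Invented sample nodes and documents.
nodes = [
    Node("hq", {"Water report": "Coastal water quality data"}),
    Node("region", {"Air report": "Air quality index", "Budget": "Financial tables"}),
]
results = federated_search(nodes, "quality")
print(results)
```

The documents never leave their node; only the short query and the list of matching titles travel between nodes.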
4.2.2 Architecture and Services
4.2.3 Content Network Manager
- The Content Network Manager is the graphical user
interface utility for managing NXT 3 servers. In
addition to providing a GUI for all server
configuration information and INI files
associated with the NXT 3 product, Content
Network Manager allows you to modify a site and
server configuration without shutting down the
server. Content Network Manager allows you to log
in and manage networks of sites on the local host
or on a remote server.
4.2.3 Content Network Manager (continued)
- As an NXT 3 site administrator, you can control
the organization of your site and what content to
display. Others may provide content for you, but
you determine how the content is integrated with
your site. Typically, related items are gathered
into collections or sub-collections to keep
themes together. For example, you might use two
primary collections on your site: one for
internally created information and one for
externally created information (purchased from a
content publisher). Each of these general
collections could then contain sub-collections.
For example, your internal collection might
contain accounting standards, policies and
procedures, human resource information, and ISO
9000 documents. Your external collection would
probably be organized by the publisher of the
information.
4.2.4 Manage Content: Interface for Document
Management and Building Metadata
- Properties
- A document stored in a content collection has a
set of properties. NXT 3 supports these document
element properties:
- ID
- Name
- Title
- Hidden
- Version
- Content Type
- Encoding
- Compression
- First-child-content
- Index
- Location
- DSE
- Indexsheet
4.2.4 Manage Content: Interface for Document
Management and Building Metadata
- Metadata
- NextPage recommends using the Resource
Description Framework (RDF) for metadata. The
metadata used by NXT 3 is specific metadata used
for searching or resource discovery. Metadata
support allows the defining, creating, storing,
indexing, searching, retrieving, etc. of
metadata. Metadata can exist within the resource
that it is describing (internal metadata), or it
can exist in a separate file (external metadata)
that is associated with the content file. By
default, NextPage uses the Dublin Core rules as a
foundation for processing external metadata, as
in Manage Content.
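An external metadata record of the kind described above can be sketched as an RDF/XML description with Dublin Core elements, built with Python's standard library. The `external_metadata` helper and the field values are illustrative assumptions; the two namespace URIs are the standard RDF and Dublin Core ones. This shows the general RDF plus Dublin Core shape, not the specific file layout NXT 3 expects.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", DC)


def external_metadata(resource_url, **fields):
    """Build an RDF description of a resource using Dublin Core elements."""
    root = ET.Element(f"{{{RDF}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": resource_url}
    )
    for name, value in fields.items():
        ET.SubElement(description, f"{{{DC}}}{name}").text = value
    return root


# Illustrative values; the record lives in a file separate from the content.
record = external_metadata(
    "http://www.epa.gov/owow/oceans/nccr/index.html",
    title="National Coastal Condition Report",
    creator="US Environmental Protection Agency",
    format="application/pdf",
)
print(ET.tostring(record, encoding="unicode"))
```

Internal metadata would carry the same elements inside the resource itself; keeping them external lets formats such as PDF be described without being modified.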
4.3.1 Matrix: Virtual Collaborative Peer to Peer
Space
4.3.2 Solo: Offline Access to the Content Network
4.3.3 RapidApps: Interwoven Integration
5. Contact Information
- Brand Niemann, Ph.D.
- USEPA Headquarters, EPA West, Room 6143D
- Office of Environmental Information, MC 2822T
- 1200 Pennsylvania Avenue, NW, Washington, DC
20460
- 202-566-1657
- niemann.brand_at_epa.gov
- EPA: http://161.80.70.167
- Outside EPA: http://130.11.44.140
Part 1: VoiceXML
- Demonstration of the EPA VoiceXML application.
- Brand Niemann, US EPA
- VoiceXML Using the Tellme Studio and Tellme
Networks Infrastructure
- Art Clarke of Tellme Networks.
- VoiceXML for Digital Talking Books
- Janina Sajka and Katie Haritos-Shea of the
American Foundation for the Blind and Art Clarke.
- XML for the Las Vegas Blue Pages Pilot
- Simon Chung and Craig Brown of NextPage.
- XML to VoiceXML for the Las Vegas Blue Pages
Pilot
- Art Clarke and Simon Chung.