Title: XML, Java, and the future of the Web
1. XML, Java, and the future of the Web
CSE 597B Computational Issues in Ecommerce
Sandip Debnath, Dr. C. Lee Giles, Dr. David Pennock, Dr. Ingemar Cox, Dr. Hongyuan Zha
2. Layout of the Presentation
- Background (HTML, SGML, etc.)
- XML Effort
  - What is XML?
  - Why XML?
  - How can it be used?
  - XML syntax, elements, attributes, validation, support, parsing, and displaying
  - Related concepts: CSS, XSL, etc.
  - Advanced concepts: Namespace, CDATA, Encoding, Server, etc. (will be discussed later)
  - XML applications and technologies
- Java Effort
  - General Java Concept
  - Java for XML
3. Background (HTML, SGML)
- Most documents on the Web are in HTML (which is based on SGML, ISO 8879)
- Problems in HTML
  - Extensibility: HTML does not allow users to specify their own tags.
  - Structure: HTML does not support the specification of deep structures.
  - Validation: The HTML specification does not allow consuming applications to check data for structural validity.
  - Structural looseness: HTML itself is not strict enough to impose structural integrity.
- However, SGML contains many optional features, and those needed for Web applications were tapped to create a new markup language, XML.
4. Birth of XML
- The first phase started in June 96 and culminated in XML 1.0, issued in Feb 98
- The second phase resulted in XML Namespaces (Jan 99) and Style Sheet Linking (June 99)
- In Sep 99, the third phase started, to finish the unfinished work of the second phase and to work on XML Query
- XML Protocol activity was launched in Sep 00
- Working groups
  - Schema working group
  - Query working group
  - Linking working group
  - Core working group
  - Coordination group
5. What is XML anyway?
- XML stands for eXtensible Markup Language
- XML is a markup language much like HTML.
- XML was designed to describe data.
- XML tags are not predefined in XML. You must define your own tags.
- XML uses a DTD (Document Type Definition) to describe the data.
- XML with a DTD is designed to be self-descriptive
- XML differs from HTML in the following ways
  - Information providers can define new tag and attribute names at will
  - Document structures can be nested to any level of complexity
  - Any XML document can contain an optional description of its grammar, which the consuming application can use to understand the document and validate its structural integrity.
6. Why XML?
- The differences from HTML show the initial benefits of XML and the reasons behind its birth.
- XML will
  - Enable internationalized media-independent electronic publishing
  - Allow industries to define platform-independent protocols for the exchange of data, especially the data of electronic commerce
  - Deliver information to user agents in a form that allows automatic processing after receipt
  - Make it easier to develop software to handle specialized information distributed over the Web
  - Make it easy for people to process data using inexpensive software
  - Allow people to display information the way they want it, under style sheet control
  - Make it easier to provide metadata -- data about information -- that will help people find information and help information producers and consumers find each other
(Source: W3C activity statement)
7. Why is XML so important?
- Plain text: XML is not a binary format, so you can create and edit files with anything from a standard text editor to a visual development environment. That makes it easy to debug your programs, and makes it useful for storing small amounts of data.
- Data identification: XML tells you what kind of data you have, not how to display it. Because the markup tags identify the information and break up the data into parts, an email program can process it, a search program can look for messages sent to particular people, and an address book can extract the address information from the rest of the message. In short, because the different parts of the information have been identified, they can be used in different ways by different applications.
- Stylability: When display is important, the stylesheet standard, XSL, lets you dictate how to portray the data.
- Inline reusability: Unlike in HTML, XML entities can be included "in line" in a document. The included sections look like a normal part of the document -- you can search the whole document at one time or download it in one piece. That lets you modularize your documents without resorting to links. You can single-source a section so that an edit to it is reflected everywhere the section is used, and yet a document composed from such pieces looks for all the world like a one-piece document.
- Linkability: The XLink protocol is a proposed specification to handle links between XML documents. In general, the XLink specification targets a document or document segment using its ID. The XPointer specification defines mechanisms for "addressing into the internal structures of XML documents", without requiring the author of the document to have defined an ID for that segment.
- Easily processed: Because XML is a vendor-neutral standard, you can choose among several XML parsers, any one of which takes the work out of processing XML data.
- Hierarchical: XML documents benefit from their hierarchical structure. Hierarchical document structures are, in general, faster to access because you can drill down to the part you need, like stepping through a table of contents. They are also easier to rearrange, because each piece is delimited. In a document, for example, you could move a heading to a new location and drag everything under it along with the heading, instead of having to page down to make a selection, cut, and then paste the selection into a new location.
8. How can it be used?
[Example XML document: a customer record with the name "Acme Pharmaceuticals Co." and a nested address (7301 Smokey Boulevard, Smallville, Indiana, 94571); the address carries the attribute country="US".]
- Matching start and end tags (strictly required, unlike in HTML)
- Element: a piece of information marked by tags
- Attributes (e.g. country="US")
- Note the nesting of tags
9. How can it be used? (contd.)
XML is a low-level syntax for representing structured data. You can use this simple syntax to support a wide variety of applications. (The following figure is taken from http://www.W3C.org)
10. How can it be used? (contd.)
- XML can separate data from HTML
- XML can be used to exchange data.
- XML and B2B: it is expected to become the main language for financial data exchange
- XML can be used to share data.
- XML can be used to store data
- XML can be used to create new languages (e.g. WML, used with WAP)
11. XML syntax
- XML documents use a simple and self-describing syntax (making it self-describing is also the creator's responsibility)

  <?xml version="1.0"?>
  <note>
    <to>Tove</to>
    <from>Jani</from>
    <heading>Reminder</heading>
    <body>Don't forget me this weekend!</body>
  </note>

- Every XML element must have an opening and a closing tag
- XML tags are case sensitive
- XML elements must be properly nested
- XML documents must have a root element
- Attribute values must be quoted
- With XML, white space in element content is preserved (unlike HTML, which collapses it)
- With XML, CR/LF is always converted to LF
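The syntax rules above are exactly what a parser enforces. As a minimal sketch (not part of the original slides), the Java class below asks the JAXP parser shipped with the JDK whether a string is well-formed; the class and method names are hypothetical, chosen only for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether a string is well-formed XML.
class WellFormedCheck {

    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal (syntax) errors and ignores warnings,
            // which also keeps the parser from printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            // Any parse failure is treated as "not well-formed" in this sketch.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<note><to>Tove</to><from>Jani</from></note>", // follows all rules
            "<note><to>Tove</note></to>",                  // improperly nested
            "<note><To>Tove</to></note>",                  // tag case mismatch
            "<note date=12><to>Tove</to></note>",          // unquoted attribute value
            "<to>Tove</to><from>Jani</from>"               // no single root element
        };
        for (String sample : samples) {
            System.out.println(isWellFormed(sample) + "  " + sample);
        }
    }
}
```

Only the first sample satisfies every rule on this slide; each of the others breaks exactly one of them.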
12. XML elements
- XML documents can be extended to carry more information
- XML elements have relationships (parent-child, etc.)
- In the last slide, note is the root element (a document must have a root element)
- In the last slide, to, from, etc. are called children of the root, and they are siblings of each other
- Elements can have different kinds of content
  - Mixed
  - Simple
  - Attributes
- Element naming rules
  - Names can contain letters, numbers, and other characters
  - Names must not start with a number or a punctuation character
  - Names must not start with the letters xml (or XML, Xml, ...)
  - Names cannot contain spaces
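The naming rules above can be approximated in a few lines of Java. This is a deliberately simplified, ASCII-only sketch: the real XML specification allows many more characters (including non-ASCII letters and a leading colon), and the class and method names here are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical, simplified check of the element naming rules listed on the slide.
class XmlNameCheck {

    // ASCII-only approximation: a letter or underscore first,
    // then letters, digits, dots, hyphens, or underscores. No spaces.
    private static final Pattern NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9._-]*");

    static boolean isAcceptableName(String name) {
        if (!NAME.matcher(name).matches()) {
            return false; // starts with a digit or punctuation, or contains a space
        }
        // Names beginning with "xml" in any letter case are reserved.
        return !name.regionMatches(true, 0, "xml", 0, 3);
    }

    public static void main(String[] args) {
        for (String name : new String[] {"note", "first_name", "1note", "first name", "XMLnote"}) {
            System.out.println(name + " -> " + isAcceptableName(name));
        }
    }
}
```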
13. XML Attributes
- XML elements can optionally have attributes
- Quote styles: "demo.asp" or 'demo.asp', both are valid
- Data can be stored either in child elements or in attributes. Either of the following is valid:
  - an element that carries the value "female" as an attribute and has child elements containing "Anna" and "Smith"
  - Or
  - an element with three child elements containing "female", "Anna", and "Smith"
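The choice between attribute and child element matters mainly to the code that reads the document. As a sketch only, the Java below reads the same value from either form using the DOM API; the element and attribute names (person, sex, firstname, lastname) are assumptions made for illustration, since the slide's markup is not reproduced here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical reader that accepts both storage styles.
class AttributeVsElement {

    /** Returns the "sex" value whether it is stored as an attribute or as a child element. */
    static String sexOf(String personXml) {
        try {
            Element person = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(personXml)))
                    .getDocumentElement();
            if (person.hasAttribute("sex")) {
                return person.getAttribute("sex");       // attribute form
            }
            NodeList children = person.getElementsByTagName("sex");
            return children.getLength() > 0
                    ? children.item(0).getTextContent()  // child-element form
                    : null;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a parsable person document", e);
        }
    }

    public static void main(String[] args) {
        String asAttribute =
                "<person sex=\"female\"><firstname>Anna</firstname><lastname>Smith</lastname></person>";
        String asElement =
                "<person><sex>female</sex><firstname>Anna</firstname><lastname>Smith</lastname></person>";
        System.out.println(sexOf(asAttribute));
        System.out.println(sexOf(asElement));
    }
}
```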
14. XML Validation
- Well-formed XML: an XML document that follows the XML syntax correctly
- Valid XML: an XML document that is well formed and also validates against the corresponding DTD.
- You can name the corresponding DTD inside a well-formed XML document.

  <?xml version="1.0"?>
  <!DOCTYPE note SYSTEM "...">
  <note>
    <to>Tove</to>
    <from>Jani</from>
    <heading>Reminder</heading>
    <body>Don't forget me this weekend!</body>
  </note>
15. XML Validation (contd.)
- DTD (Document Type Definition): A DTD defines the legal elements of an XML document. The purpose of a DTD is to define the legal building blocks of an XML document. It defines the document structure with a list of legal elements. A DTD can be declared inline in the XML document or as an external reference.

  <?xml version="1.0"?>
  <!DOCTYPE note [
    <!ELEMENT note (to,from,heading,body)>
    <!ELEMENT to (#PCDATA)>
    <!ELEMENT from (#PCDATA)>
    <!ELEMENT heading (#PCDATA)>
    <!ELEMENT body (#PCDATA)>
  ]>
  <note>
    <to>Tove</to>
    <from>Jani</from>
    <heading>Reminder</heading>
    <body>Don't forget me this weekend</body>
  </note>
16. XML Validation-DTD (contd.)
The DTD above is interpreted like this:
- !DOCTYPE note (in line 2) defines that this is a document of the type note.
- !ELEMENT note (in line 3) defines the note element as having four elements: "to,from,heading,body".
- !ELEMENT to (in line 4) defines the to element to be of the type "#PCDATA".
- !ELEMENT from (in line 5) defines the from element to be of the type "#PCDATA", and so on.
(PCDATA: Parsed Character DATA)
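To see the difference between well-formed and valid in practice, a validating parser can be pointed at a document that carries the note DTD inline. The Java below is a sketch under stated assumptions: it uses the JAXP parser bundled with the JDK, the DTD text follows the interpretation given above, and the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: validates a <note> element against an inline DTD.
class DtdValidation {

    // Inline DTD matching the interpretation on this slide.
    static final String PROLOG =
            "<?xml version=\"1.0\"?>\n"
          + "<!DOCTYPE note [\n"
          + "  <!ELEMENT note (to,from,heading,body)>\n"
          + "  <!ELEMENT to (#PCDATA)>\n"
          + "  <!ELEMENT from (#PCDATA)>\n"
          + "  <!ELEMENT heading (#PCDATA)>\n"
          + "  <!ELEMENT body (#PCDATA)>\n"
          + "]>\n";

    static boolean isValid(String noteElement) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e; // validity violations arrive here; treat them as failures
                }
            });
            builder.parse(new InputSource(new StringReader(PROLOG + noteElement)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String complete = "<note><to>Tove</to><from>Jani</from>"
                + "<heading>Reminder</heading><body>Don't forget me this weekend</body></note>";
        String missingBody = "<note><to>Tove</to><from>Jani</from><heading>Reminder</heading></note>";
        System.out.println(isValid(complete));
        System.out.println(isValid(missingBody));
    }
}
```

The second document is still well-formed; it fails only because the DTD requires a body element after heading.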
17. XML Validation-DTD (contd.)
[DTD example; only the content-model fragment "(CCC DDD)" is recoverable.]
18. XML Support
- Netscape has promised full XML support in its next browser.
- IE 5.0 supports XML 1.0 and the XML DOM (both are set by the W3C). IE 5.0 has the following support:
  - Viewing of XML documents
  - Full support for W3C DTD standards
  - XML embedded in HTML as Data Islands
  - Binding XML data to HTML elements
  - Transforming and displaying XML with XSL
  - Displaying XML with CSS
  - Access to the XML DOM
19. XML Parsing
- The following are some of the well-known XML parsers available on the market
  - GNOME XML (Unix/Linux/Windows)
  - Library Oracle XML parser for Java (Java)
  - XP (Java)
  - XML Validate (Java)
  - Xerces-C (Win32 with the MSVC 6.0 compiler, Linux (RedHat 6.0), Unix)
  - Oracle XML parser for C (Linux, Solaris 2.6, and NT 4 / Service Pack 3 and above)
  - Lark (Java)
  - XML4cobol (Cobol)
  - XML parser for PL/SQL (Oracle 8i)
  - HEX (Java)
  - TclXML (Tcl)
  - Xjparser (Java)
  - ActiveDOM (ActiveX)
  - Xmlproc (Python)
  - Xparse (JavaScript)
  - Java Project X (Java)
  - SAX2 XML Utilities (Java)
  - Electric XML (Windows, Unix)
20. XML Parsing (contd.)
- XML parser for C (C)
- DTDParser (Java; versions for Windows, Linux, Unix)
- XMLParser (Perl)
- Xerces-P (Perl)
- XML4C (C++)
- TinyXML (Java)
- XML for Java (Java)
- AElfred (Java)
- XmlTree (VB)
- XML Validator (C; binaries available for Windows and Linux-Intel platforms)
- XMLBooster (C, Cobol, Delphi, and Java)
- SP (C)
- JAXP (Java)
- Larval (Java)
- Markup (OCaml)
- Fxp (SML)
- SXP Silfide XML Parser
- X-Fetch Performer (Windows)
- Microsoft XML Parser
21. XML Parsing (contd.)
Explaining XML parsing in Java requires a basic introduction to Java first. The next few slides talk about Java, a new and evolving programming language.
22. Java Efforts
- The Java programming language has introduced a programming language that is
  - Platform independent
  - Supportive of a 100% object-oriented methodology
- It also has other good features, such as
  - Automatic garbage collection
  - Simple, pointer-less programming concepts
  - No multiple inheritance
  - A huge number of APIs
  - Networking support
  - CGI look-alike Servlet classes
  - Support for traditional programming as well as for new industry trends
23. Java Efforts (contd.)
- Some of the products under the Java umbrella are
- Java™ 2 Platform, Standard Edition (J2SE™)
  - The essential Java 2 SDK, tools, runtimes, and APIs for developers writing, deploying, and running applets and applications in the Java programming language. Also includes earlier Java Development Kit versions JDK™ 1.1 and JRE 1.1
- Java™ 2 Platform, Enterprise Edition (J2EE™)
  - Combines a number of technologies in one architecture with a comprehensive Application Programming Model and Compatibility Test Suite for building enterprise-class server-side applications.
- Java™ 2 Platform, Micro Edition (J2ME™)
  - A highly optimized Java runtime environment targeting a wide range of consumer products, including pagers, cellular phones, screenphones, digital set-top boxes, and car navigation systems.
- Consumer and Embedded Technologies and Products
  - The Java Consumer and Embedded technologies and products let you write code for small devices that are big on functionality but short on resources.
24. Java Efforts (contd.)
- COMPLETE PRODUCT LIST (by product group)
- Java 2 Platform, Standard Edition Product Family
- Software Development Kits and Runtimes
  - Java™ 2 SDK, Standard Edition, v 1.3
  - Java™ 2 SDK, Standard Edition, v 1.2.2
  - Java™ 2 SDK, Standard Edition, Source Release
  - Java™ 2 Runtime Environment, Standard Edition, v 1.2.2
  - Java™ Plug-in
  - Java™ Web Start
  - Java Development Kit (JDK™) 1.1.8 (JDK 1.1.8)
  - Java™ Runtime Environment 1.1.8 (JRE 1.1.8)
  - JDK™ Japanese Supplement 1.1.x
- Related Products
  - JavaBeans™ Development Kit (BDK)
  - Java HotSpot™ Server Virtual Machine
- Application Programming Interfaces (APIs), core to the Java 2 platform
  - Collections Framework
  - Java™ Foundation Classes (JFC)
  - Swing Components
  - Pluggable Look and Feel
  - Accessibility
25. Java Efforts (contd.)
- Java 2 Platform, Enterprise Edition
- Technologies
  - Enterprise JavaBeans™ Architecture
  - JavaServer Pages™
  - Java™ Servlet
  - Java Naming and Directory Interface™ (JNDI)
  - Java™ IDL
  - JDBC™
  - Java™ Message Service (JMS)
  - Java™ Transaction API (JTA)
  - Java™ Transaction Service (JTS)
  - JavaMail
  - RMI-IIOP
- Software Development Kit and Application Model
  - Java 2 SDK, Enterprise Edition
  - Sun BluePrints™ Design Guidelines for J2EE
26. Java Efforts (contd.)
- Consumer and Embedded Technologies and Products
- Technologies
  - Java 2 Platform, Micro Edition (J2ME™ technology)
  - Connected Device Configuration (CDC)
  - Connected Limited Device Configuration (CLDC)
  - C Virtual Machine (CVM)
  - K Virtual Machine (KVM)
  - PersonalJava™ Application Environment
  - PersonalJava™ Technology, Source Edition
  - EmbeddedJava™ Application Environment
  - EmbeddedJava™ Technology, Source Edition
  - Java Card™
  - JavaPhone™ API
  - Java TV™ API
  - Jini™ Network Technology
  - Mobile Information Device Profile (MIDP)
- Products
  - Personal Applications™ Suite
  - Java Dynamic Management™ Kit
  - Java Embedded Server™ Software
27. Java Efforts (contd.)
- Optional Packages
  - Optional Packages define APIs that extend the core Java platform API.
- Forte Fusion™
- Forte™ for Java™
- HotJava™ Product Family
- The JAIN™ APIs: JAIN™ TCAP, JAIN™ OAM
- Java Blend™
- JavaCheck™
- Java™ Electronic Commerce Framework
- Java™ Internationalization and Localization Toolkit 2.0
- Java™ Message Queue
- JavaServer™ Product Family
- Java™ Shared Data Toolkit
- JavaSpaces™
- Java™ Speech API
- Java™ Telephony API (JTAPI)
- Jini™ Network Technology
- Jiro™ Technology
- OSS through Java™ Initiative
28. XML Parsing (contd.)
- There are two main types of XML parsing available in these parsers
  - SAX, or Simple API for XML
  - DOM, or Document Object Model
- The Java SAX Parser API structure is as shown here (taken from Sun's Java site)
29. XML Parsing (contd.)
The Java DOM Parser API structure is as shown here (taken from Sun's Java site)
30. XML Parsing (contd.)
- When to use SAX and when to use DOM?
- SAX
  - If the information stored in your XML documents is machine-readable (and machine-generated) data, then SAX is the right API for giving your programs access to this information. Machine-readable and machine-generated data include things like
    - Java object properties stored in XML format
    - queries that are formulated using some kind of text-based query language (SQL, XQL, OQL)
    - result sets that are generated based on queries (this might include data in relational database tables encoded into XML).
  - So machine-generated data is information that you normally have to create data structures and classes for in Java. A simple example is the address book which contains information about persons, as shown in Figure 1. This address book XML file is not like a word processor document; rather, it is a document that contains pure data, which has been encoded into text using XML.
31. XML Parsing (contd.)
- When to use SAX and when to use DOM?
- SAX
  - When your data is of this kind, you have to create your own data structures and classes (object models) anyway in order to manage, manipulate, and persist this data. SAX allows you to quickly create a handler class which can create instances of your object models based on the data stored in your XML documents. An example is a SAX document handler that reads an XML document that contains my address book and creates an AddressBook object that can be used to access this information. The first SAX tutorial shows you how to do this. The address book XML document contains person elements, which contain name and email elements. My AddressBook object model contains the following classes
    - AddressBook class, which is a container for Person objects
    - Person class, which is a container for name and email String objects.
  - So my "SAX address book document handler" is responsible for turning person elements into Person objects, and then storing them all in an AddressBook object. This document handler turns the name and email elements into String objects.
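The address book handler described above can be sketched in Java with the standard SAX API. This is an illustration under stated assumptions, not the tutorial's own code: the root element name addressbook, the field layout of Person and AddressBook, and all class names are hypothetical; only the person, name, and email element names come from the slide.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical object model plus the SAX handler that fills it.
class SaxAddressBook {

    static class Person {
        String name;
        String email;
    }

    static class AddressBook {
        final List<Person> persons = new ArrayList<>();
    }

    static class Handler extends DefaultHandler {
        final AddressBook book = new AddressBook();
        private Person current;                          // person being built, if any
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            text.setLength(0);                           // start collecting fresh text
            if (qName.equals("person")) {
                current = new Person();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);              // may be called several times per element
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (current == null) {
                return;                                  // ignore anything outside a person
            }
            if (qName.equals("name")) {
                current.name = text.toString().trim();
            } else if (qName.equals("email")) {
                current.email = text.toString().trim();
            } else if (qName.equals("person")) {
                book.persons.add(current);
                current = null;
            }
        }
    }

    static AddressBook parse(String xml) {
        try {
            Handler handler = new Handler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.book;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse address book", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<addressbook>"
                + "<person><name>Tove</name><email>tove@example.com</email></person>"
                + "<person><name>Jani</name><email>jani@example.com</email></person>"
                + "</addressbook>";
        for (Person p : parse(xml).persons) {
            System.out.println(p.name + " <" + p.email + ">");
        }
    }
}
```

The handler never holds the whole document in memory; it keeps only the person currently being read, which is the property that makes SAX a good fit for pure data.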
32. XML Parsing (contd.)
- When to use SAX and when to use DOM?
- DOM
  - If your XML documents contain document data (e.g., Framemaker documents stored in XML format), then DOM is a completely natural fit for your solution. If you are creating some sort of document information management system, then you will probably have to deal with a lot of document data. An example of this is the Datachannel RIO product, which can index and organize information that comes from all kinds of document sources (like Word and Excel files). In this case, DOM is well suited to allow programs access to information stored in these documents.
  - However, if you are dealing mostly with structured data (the equivalent of serialized Java objects in XML), DOM is not the best choice. That is when SAX might be a better fit.
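For document-style data, the DOM approach loads the whole tree and lets the program query it in any order. The Java below is a minimal sketch using the JDK's DOM API; the element names (report, section, heading, para) and the class name are hypothetical, invented only for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: builds an outline from a document-style XML file.
class DomOutline {

    /** Returns the text of every heading element, in document order. */
    static List<String> headings(String xml) {
        try {
            // The whole document is parsed into an in-memory tree first.
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("heading");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent().trim());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<report>"
                + "<section><heading>Background</heading><para>HTML and SGML.</para>"
                + "<section><heading>Problems in HTML</heading><para>No custom tags.</para></section>"
                + "</section>"
                + "<section><heading>Birth of XML</heading><para>XML 1.0.</para></section>"
                + "</report>";
        System.out.println(headings(xml));
    }
}
```

Because the tree stays in memory, the same Document could be searched again or rearranged without re-reading the input, at the cost of holding the entire document at once.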
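33. XML Displaying
To display an XML document, you can add a CSS (Cascading Style Sheets) file containing all the necessary styles.

  <?xml-stylesheet type="text/css" href="cd_catalog.css"?>
  <CATALOG>
    <CD>
      <TITLE>Empire Burlesque</TITLE>
      <ARTIST>Bob Dylan</ARTIST>
      <COUNTRY>USA</COUNTRY>
      <COMPANY>Columbia</COMPANY>
      <PRICE>10.90</PRICE>
      <YEAR>1985</YEAR>
    </CD>
  </CATALOG>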
34. XML Displaying (contd.)
The CSS file may look like this:

  CATALOG { background-color: #ffffff; width: 100%; }
  CD { display: block; margin-bottom: 30pt; margin-left: 0; }
  TITLE { color: #FF0000; font-size: 20pt; }
  ARTIST { color: #0000FF; font-size: 20pt; }
  COUNTRY,PRICE { display: block; color: #000000; margin-left: 20pt; }
  YEAR,COMPANY { display: block; color: #00FF00; margin-left: 20pt; }
35. XML Displaying (contd.)
The output will look like this: Empire Burlesque Bob Dylan USA Columbia 10.90 1985
36. The Related Concepts/buzzwords
- CSS: Cascading Style Sheets
- XSL: eXtensible Stylesheet Language
- XSLT (with XPath): Extensible Stylesheet Language Transformations
- RELAX: Regular Language description for XML
- SOX: Schema for Object-oriented XML
- TREX: Tree Regular Expressions for XML
- Schematron: a rule-based schema language for XML that validates documents by asserting patterns in the document tree
- RDF: Resource Description Framework
- XTM: XML Topic Maps
- SMIL: Synchronized Multimedia Integration Language
- MathML: Mathematical Markup Language
- DrawML: Drawing Meta Language
- ICE: Information and Content Exchange
- ebXML: Electronic Business with XML
- cXML: Commerce XML
- CBL: Common Business Library
37. The Advanced Concepts
Namespace, CDATA, Encoding, Server, etc. (will be discussed later)
38. XML Applications/technologies
- The following types of applications are driving XML
  - Applications that require Web clients to mediate between two or more heterogeneous databases.
  - Applications that attempt to distribute a significant portion of the processing load from the Web server to the Web client.
  - Applications that require the Web client to present different views of the same data to different users.
  - Applications in which intelligent Web agents attempt to tailor information discovery to the needs of individual users.
39. A small XML application (1)
1) First we start with a simple XML document.
Take a look at our original demonstration document, the CD catalog.

  <?xml version="1.0"?>
  <CATALOG>
    <CD>
      <TITLE>Empire Burlesque</TITLE>
      <ARTIST>Bob Dylan</ARTIST>
      <COUNTRY>USA</COUNTRY>
      <COMPANY>Columbia</COMPANY>
      <PRICE>10.90</PRICE>
      <YEAR>1985</YEAR>
    </CD>
    ... more ...
  </CATALOG>
40. A small XML application (2)
2) Load the document into a Data Island
A Data Island can be used to access the XML file. To get your XML document "inside" an HTML page, add an XML Data Island to the page.

  <xml src="cd_catalog.xml" id="xmldso" async="false"></xml>

With the example code above, the XML file "cd_catalog.xml" will be loaded into an "invisible" Data Island called "xmldso". The async="false" attribute is added to the Data Island to make sure that all the XML data is loaded before any other HTML processing takes place.
41. A small XML application (3)
3) Bind the Data Island to an HTML Table
An HTML table can be used to display the XML data. To make your XML data visible on your HTML page, you must "bind" your XML Data Island to an HTML element. To bind your XML data to an HTML table, add a data source attribute to the table, and add data field attributes to <span> elements inside the table data.

  <table datasrc="#xmldso" width="100%" border="1">
    <thead>
      <th>Title</th>
      <th>Artist</th>
      <th>Year</th>
    </thead>
    <tr align="left">
      <td><span datafld="TITLE"></span></td>
      <td><span datafld="ARTIST"></span></td>
      <td><span datafld="YEAR"></span></td>
    </tr>
  </table>
42. A small XML application (4)
4) Bind the Data Island to <span> or <div> elements
<span> or <div> elements can be used to display XML data. You don't have to use a table to display your XML data. Data from a Data Island can be displayed anywhere on an HTML page. All you have to do is to add some <span> or <div> elements to your page. Use the data source attribute to bind the elements to the Data Island, and the data field attribute to bind each element to an XML element, like this:

  <br />Title: <span datasrc="#xmldso" datafld="TITLE"></span>
  <br />Artist: <span datasrc="#xmldso" datafld="ARTIST"></span>
  <br />Year: <span datasrc="#xmldso" datafld="YEAR"></span>
43. A small XML application (5)
5) Add a Navigation Script to your XML
Navigation has to be performed by a script. To add navigation to the XML Data Island, create a script that calls the movenext() and moveprevious() methods of the Data Island.

  <script type="text/javascript">
  function movenext()
  {
    x = xmldso.recordset
    if (x.absoluteposition < x.recordcount)
    {
      x.movenext()
    }
  }
  function moveprevious()
  {
    x = xmldso.recordset
    if (x.absoluteposition > 1)
    {
      x.moveprevious()
    }
  }
  </script>
44. A small XML application (6)
The result looks like this: