Title: XML: Introduction to XML
1 XML: Introduction to XML
- Ethan Cerami
- New York University
2 Road Map
- What is XML?
  - A Brief Overview
  - Origins of XML
- Creating XML Documents
  - Basic Rules
  - Example XML Documents
- Case Studies
3 Brief Overview of XML: XML v. HTML
4 What is XML?
- XML: eXtensible Markup Language
- "XML, to a certain extent, is HTML done right." (Simon St. Laurent)
- XML is HTML on steroids.
- XML
  - Extensible: can be extended to lots of different applications.
  - Markup language: language used to mark up data.
  - Meta Language: language used to create other languages.
5 XML v. HTML
- The best way to first understand XML is to contrast it with HTML.
- XML is Extensible
  - HTML: restricted set of tags, e.g. <TABLE>, <H1>, <B>, etc.
  - XML: you can create your own tags
- Example: Put a library catalog on the web.
  - HTML: You are stuck with regular HTML tags, e.g. H1, H3, etc.
  - XML: You can create your own set of tags: TITLE, AUTHOR, DATE, PUBLISHER, etc.
6 Book Catalog in HTML
HTML conveys the look and feel of your page. As a human, it is easy to pick out the publisher. But how would a computer pick out the publisher? Answer: XML.
    <HTML>
      <BODY>
        <H1>Harry Potter</H1>
        <H2>J. K. Rowling</H2>
        <H3>1999</H3>
        <H3>Scholastic</H3>
      </BODY>
    </HTML>
7 Book Catalog in XML
    <BOOK>
      <TITLE>Harry Potter</TITLE>
      <AUTHOR>J. K. Rowling</AUTHOR>
      <DATE>1999</DATE>
      <PUBLISHER>Scholastic</PUBLISHER>
    </BOOK>
Look at the new tags! A human and a computer can now easily extract the publisher data.
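To make the point concrete, here is a minimal sketch of how a program could pull the publisher out of the BOOK record above. It uses Python's standard xml.etree.ElementTree module; the names BOOK_XML and extract_publisher are illustrative and not part of the original slides.

```python
import xml.etree.ElementTree as ET

# The book record from the slide, as a string.
BOOK_XML = """
<BOOK>
  <TITLE>Harry Potter</TITLE>
  <AUTHOR>J. K. Rowling</AUTHOR>
  <DATE>1999</DATE>
  <PUBLISHER>Scholastic</PUBLISHER>
</BOOK>
"""


def extract_publisher(xml_text):
    """Return the text of the PUBLISHER child element (hypothetical helper)."""
    book = ET.fromstring(xml_text)      # parse the string into an element tree
    return book.findtext("PUBLISHER")   # look the field up by its tag name


print(extract_publisher(BOOK_XML))  # prints "Scholastic"
```

With the HTML version, the same program would have to guess which of the two H3 elements holds the publisher; here the tag name says so directly.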
8 XML v. HTML
- General Structure
  - Both have start tags and end tags.
- Tag Sets
  - HTML has a fixed set of tags.
  - XML lets you create your own tags.
- General Purposes
  - HTML focuses on "look and feel".
  - XML focuses on the structure of the data.
- XML is not meant to be a replacement for HTML. In fact, they are usually used together.
9 Origins of XML
10 Origins of XML
- XML is based on SGML: Standard Generalized Markup Language
- SGML
  - Developed in the 1970s
  - Used by big organizations: IRS, IBM, Department of Defense
  - Focuses on content structure, not look and feel
  - Good for creating catalogs, manuals.
  - Very complex
11 Origins of XML
- XML: SGML-Lite. 20% of SGML's complexity, 80% of its capacity.
- HTML and XML are both based on SGML.
[Diagram: SGML as the parent of both HTML and XML]
12 XML and the W3C
- XML is an official standard of the World Wide Web Consortium (W3C)
- The Official Version is 1.0
- Official information is available at
  - http://www.w3.org/XML/
- The Official spec is available at
  - http://www.w3.org/TR/1998/REC-xml-19980210
- The Official XML FAQ
  - http://www.ucc.ie/xml/
- W3C sponsors many projects which seek to enhance and improve on XML.
13 Creating XML Documents: Basic Rules
14 Basic Definitions
- Tag: a piece of markup
  - Example: <P>, <H1>, <TABLE>, etc.
- Element: a start and an end tag
  - Example: <H1>Hello</H1>
- HTML Code
  - <P>This is a <B>sample</B> paragraph.
- This code contains
  - 3 tags: <P>, <B>, and </B>
  - However, it only contains one element: <B>sample</B>
15 Rule 1: Well-Formedness
- XML is much more strict than HTML.
- XML requires that documents be well-formed
  - every start tag must have an end tag
  - all tags must be properly nested.
- XML Code
  - <P>This is a <B>sample</B> paragraph.</P>
Note the end </P>
16 Rule 1: Well-Formedness
- Another HTML Example
  - <b><i>This text is bold and italic</b></i>
- This will render in a browser, but contains a nesting error.
- XML Code (with proper nesting)
  - <b><i>This text is bold and italic</i></b>
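A quick way to see Rule 1 in action is to hand these snippets to an XML parser. The sketch below uses Python's standard xml.etree.ElementTree; the helper name is_well_formed is made up for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False if it reports a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Improper nesting: </b> appears before </i> has closed.
print(is_well_formed("<b><i>This text is bold and italic</b></i>"))  # False
# Proper nesting.
print(is_well_formed("<b><i>This text is bold and italic</i></b>"))  # True
# Missing end tag for <P>.
print(is_well_formed("<P>This is a <B>sample</B> paragraph."))       # False
# Same paragraph with the end </P> added.
print(is_well_formed("<P>This is a <B>sample</B> paragraph.</P>"))   # True
```

A browser would quietly render all four snippets; an XML parser rejects the two that break the rule.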
17 Rule 2: XML is Case Sensitive
- XML is case sensitive.
- HTML is not.
- The following is valid in HTML
  - <H1>Hello World</h1>
- This will not work in XML. It would result in a well-formedness error: H1 does not have a matching end H1 tag.
18 Rule 3: Attributes must be quoted.
- In HTML you can get away with doing the following
  - <FONT FACE=ARIAL SIZE=2>
- In XML, you must put quotes around all your attributes
  - <BOOK ID="894329">Harry Potter</BOOK>
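Rules 2 and 3 can be tried out with any XML parser. Here is a small sketch using Python's standard xml.etree.ElementTree; the helper parse_error is a name invented for this example, and the exact wording of the error messages depends on the underlying parser, so it is not shown.

```python
import xml.etree.ElementTree as ET


def parse_error(xml_text):
    """Return the parser's error message, or None if xml_text is well-formed."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as err:
        return str(err)


# Rule 2: the start tag H1 and the end tag h1 differ in case, so they do not match.
print(parse_error("<H1>Hello World</h1>"))
# Rule 3: an unquoted attribute value is rejected.
print(parse_error("<BOOK ID=894329>Harry Potter</BOOK>"))
# With quotes around the attribute value, the parser reports no error (None).
print(parse_error('<BOOK ID="894329">Harry Potter</BOOK>'))
```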
19 Examples
20 Examples
- To get a feel for XML, let's take a look at several examples
  - An XML Memo
  - CD Catalog
  - Plant Catalog
  - Restaurant Menu
21 Example 1: A Memo
    <?xml version="1.0" encoding="ISO8859-1" ?>
    <note>
      <to>Class</to>
      <from>Ethan</from>
      <heading>Introduction</heading>
      <body>This is an XML document!</body>
    </note>
This XML Note could be part of a message board application.
22 Example 2: CD Collection
    <?xml version="1.0" encoding="ISO8859-1" ?>
    <CATALOG>
      <CD>
        <TITLE>Empire Burlesque</TITLE>
        <ARTIST>Bob Dylan</ARTIST>
        <COUNTRY>USA</COUNTRY>
        <COMPANY>Columbia</COMPANY>
        <PRICE>10.90</PRICE>
        <YEAR>1985</YEAR>
      </CD>
A disclaimer: I did not pick these CDs! I just got the example off the web :-)
Continued...
23
      <CD>
        <TITLE>Hide your heart</TITLE>
        <ARTIST>Bonnie Tyler</ARTIST>
        <COUNTRY>UK</COUNTRY>
        <COMPANY>CBS Records</COMPANY>
        <PRICE>9.90</PRICE>
        <YEAR>1988</YEAR>
      </CD>
      <CD>
        <TITLE>Unchain my heart</TITLE>
        <ARTIST>Joe Cocker</ARTIST>
        <COUNTRY>USA</COUNTRY>
        <COMPANY>EMI</COMPANY>
        <PRICE>8.20</PRICE>
        <YEAR>1987</YEAR>
      </CD>
    </CATALOG>
Note that indentation helps you follow the flow of the document.
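Once the catalog is in XML, a program can query it by tag name. The following is a minimal sketch using Python's standard xml.etree.ElementTree; the catalog is trimmed to the TITLE, PRICE, and YEAR fields, and the function titles_under is a hypothetical helper, not something from the slides.

```python
import xml.etree.ElementTree as ET

# The three CDs from the catalog above, trimmed to the fields used here.
CATALOG_XML = """
<CATALOG>
  <CD>
    <TITLE>Empire Burlesque</TITLE>
    <PRICE>10.90</PRICE>
    <YEAR>1985</YEAR>
  </CD>
  <CD>
    <TITLE>Hide your heart</TITLE>
    <PRICE>9.90</PRICE>
    <YEAR>1988</YEAR>
  </CD>
  <CD>
    <TITLE>Unchain my heart</TITLE>
    <PRICE>8.20</PRICE>
    <YEAR>1987</YEAR>
  </CD>
</CATALOG>
"""


def titles_under(xml_text, max_price):
    """Return the titles of all CDs whose PRICE is below max_price."""
    catalog = ET.fromstring(xml_text)
    return [
        cd.findtext("TITLE")
        for cd in catalog.findall("CD")           # every <CD> child of <CATALOG>
        if float(cd.findtext("PRICE")) < max_price
    ]


print(titles_under(CATALOG_XML, 10.00))  # ['Hide your heart', 'Unchain my heart']
```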
24 Example 3: A Plant Catalog
    <?xml version="1.0" encoding="ISO8859-1" ?>
    <CATALOG>
      <PLANT>
        <COMMON>Bloodroot</COMMON>
        <BOTANICAL>Sanguinaria canadensis</BOTANICAL>
        <ZONE>4</ZONE>
        <LIGHT>Mostly Shady</LIGHT>
        <PRICE>2.44</PRICE>
        <AVAILABILITY>031599</AVAILABILITY>
      </PLANT>
Continued...
25
      <PLANT>
        <COMMON>Columbine</COMMON>
        <BOTANICAL>Aquilegia canadensis</BOTANICAL>
        <ZONE>3</ZONE>
        <LIGHT>Mostly Shady</LIGHT>
        <PRICE>9.37</PRICE>
        <AVAILABILITY>030699</AVAILABILITY>
      </PLANT>
      <PLANT>
        <COMMON>Marsh Marigold</COMMON>
        <BOTANICAL>Caltha palustris</BOTANICAL>
        <ZONE>4</ZONE>
        <LIGHT>Mostly Sunny</LIGHT>
        <PRICE>6.81</PRICE>
        <AVAILABILITY>051799</AVAILABILITY>
      </PLANT>
    </CATALOG>
26 Example 4: Restaurant Menu
    <?xml version="1.0" encoding="ISO8859-1" ?>
    <breakfast-menu>
      <food>
        <name>Belgian Waffles</name>
        <price>5.95</price>
        <description>two of our famous Belgian Waffles with plenty of real maple syrup</description>
        <calories>650</calories>
      </food>
Continued...
27
      <food>
        <name>Strawberry Belgian Waffles</name>
        <price>7.95</price>
        <description>light Belgian waffles covered with strawberries and whipped cream</description>
        <calories>900</calories>
      </food>
      <food>
        <name>Berry-Berry Belgian Waffles</name>
        <price>8.95</price>
        <description>light Belgian waffles covered with an assortment of fresh berries and whipped cream</description>
        <calories>900</calories>
      </food>
Continued...
28
      <food>
        <name>French Toast</name>
        <price>4.50</price>
        <description>thick slices made from our homemade sourdough bread</description>
        <calories>600</calories>
      </food>
      <food>
        <name>Homestyle Breakfast</name>
        <price>6.95</price>
        <description>two eggs, bacon or sausage, toast, and our ever-popular hash browns</description>
        <calories>950</calories>
      </food>
    </breakfast-menu>
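Documents like this menu are often generated by a program rather than typed by hand. Here is a rough sketch of that idea using Python's standard xml.etree.ElementTree; the function build_menu and its tuple layout are assumptions made for this example, and only the French Toast entry from the slide is used.

```python
import xml.etree.ElementTree as ET


def build_menu(items):
    """Build a <breakfast-menu> element from (name, price, description, calories) tuples."""
    menu = ET.Element("breakfast-menu")
    for name, price, description, calories in items:
        food = ET.SubElement(menu, "food")
        ET.SubElement(food, "name").text = name
        ET.SubElement(food, "price").text = price
        ET.SubElement(food, "description").text = description
        ET.SubElement(food, "calories").text = str(calories)
    return menu


menu = build_menu([
    ("French Toast", "4.50",
     "thick slices made from our homemade sourdough bread", 600),
])

# Serialize the tree to a string. The library writes the start tags, end tags,
# and nesting itself, so the output is well-formed by construction.
print(ET.tostring(menu, encoding="unicode"))
```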
29 Case Studies
30 Applications of XML
- Widely used today in major applications
  - Search Engines
  - News Distribution
  - E-Commerce
  - Real Estate
  - Genetics
  - Defense Department Applications
31 Case Study 1: Search the Web
32 Case Study 1: Web Search
- Scenario
  - You want to offer a web search functionality for your site.
  - You want control over the look and feel of the search results.
  - You do not want to support your own database of millions of web sites.
33 Case Study 1: Web Search
- XML to the Rescue
  - Several companies provide XML access to their Web Search Databases.
  - For example
    - Open a network connection and send search criteria.
    - Third Party returns results in XML.
34 How it Works
- How it works
  - User initiates a search request.
  - Servlet is invoked.
  - Servlet opens a network connection to Third Party and passes user search criteria.
  - Third Party searches its database, and returns an XML document.
  - Servlet transforms XML into HTML and returns to user.
35 How it Works
[Diagram: Browser -> Servlet (Search Criteria); Servlet -> Third Party Web Database (Search Criteria); Third Party Web Database -> Servlet (XML); Servlet -> Browser (HTML)]
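The last step of the flow, turning the third party's XML into HTML, might look roughly like the sketch below. Several things here are assumptions: the RESULTS/RESULT/TITLE/URL format is invented for illustration (a real provider defines its own), the network call is left out, and Python stands in for the Java servlet mentioned on the slides.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical XML that a third-party search provider might return.
SAMPLE_RESULTS = """
<RESULTS>
  <RESULT>
    <TITLE>XML at the W3C</TITLE>
    <URL>http://www.w3.org/XML/</URL>
  </RESULT>
  <RESULT>
    <TITLE>The XML FAQ</TITLE>
    <URL>http://www.ucc.ie/xml/</URL>
  </RESULT>
</RESULTS>
"""


def results_to_html(xml_text):
    """Turn the (assumed) result format into an HTML list of links."""
    root = ET.fromstring(xml_text)
    items = []
    for result in root.findall("RESULT"):
        # Escape the text so that characters such as & or < stay valid in HTML.
        title = escape(result.findtext("TITLE", default=""))
        url = escape(result.findtext("URL", default=""), quote=True)
        items.append(f'<li><a href="{url}">{title}</a></li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


print(results_to_html(SAMPLE_RESULTS))
```

Because the look and feel lives entirely in results_to_html, the site keeps control of how results are displayed while the third party supplies only the data.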
36 Case Study 2: Price Comparison
37 Case Study 2: Price Comparison
- Scenario
  - You want to create a site that compares prices of books.
  - For example, a user enters a book title, and your page displays the price at bn.com, amazon.com, bestbuy.com, etc.
  - User can choose the cheapest price.
38 How it might work
- How it works
  - User sends book title
  - Servlet makes three concurrent connections and queries the bookstores
    - Amazon, bn.com, bestbuy.com
  - Each Bookstore returns results in a standard XML format.
  - Servlet parses XML and creates a small price comparison table.
39 How it might work
[Diagram: Browser -> Servlet (Search Criteria); Servlet queries Amazon, BN.com, and BestBuy, and each returns XML; Servlet -> Browser (HTML)]
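As a rough sketch of this flow: query the three stores concurrently, parse each XML reply, and sort by price. Everything store-specific here is an assumption: the OFFER/STORE/PRICE format is invented, query_store is a stand-in that returns canned replies instead of opening a network connection, and the prices are placeholder values, not real ones. Python is used in place of the servlet.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

# Canned replies standing in for the three network calls (placeholder prices).
CANNED_REPLIES = {
    "Amazon": "<OFFER><STORE>Amazon</STORE><PRICE>10.50</PRICE></OFFER>",
    "BN.com": "<OFFER><STORE>BN.com</STORE><PRICE>11.25</PRICE></OFFER>",
    "BestBuy": "<OFFER><STORE>BestBuy</STORE><PRICE>9.75</PRICE></OFFER>",
}


def query_store(store, title):
    """Hypothetical stand-in for a network request; returns the store's XML reply."""
    return CANNED_REPLIES[store]


def compare_prices(title):
    """Query all stores concurrently and return (store, price) pairs, cheapest first."""
    stores = list(CANNED_REPLIES)
    with ThreadPoolExecutor(max_workers=len(stores)) as pool:
        # One worker thread per store, so the three queries overlap in time.
        replies = list(pool.map(lambda store: query_store(store, title), stores))
    offers = []
    for reply in replies:
        offer = ET.fromstring(reply)
        offers.append((offer.findtext("STORE"), float(offer.findtext("PRICE"))))
    return sorted(offers, key=lambda pair: pair[1])


# Print a small price comparison table.
for store, price in compare_prices("Harry Potter"):
    print(f"{store:10s} {price:6.2f}")
```

The sketch only works because every store answers in the same XML format; that shared format is what the slide means by "a standard XML".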
40 Case Study 3: Genomics
41 Case Study 3: Genomics
- Bioinformatic Sequence Markup Language (BSML)
  - BSML provides a standard DTD for representing genes and the DNA sequences that make up that gene.
  - This data can then be viewed via an XML Genome Browser (http://www.labbook.com)
  - The next three slides show an excerpt of BSML for the gene that regulates insulin production.
42
    <?xml version="1.0"?>
    <!DOCTYPE Bsml SYSTEM "BSML2_2.DTD">
    <Bsml>
    <Definitions>
    <Sequences>
    <Sequence id="G186439" title="HUMINSR"
              molecule="rna" ic-acckey="M10051" length="4723"
              representation="raw" topology="linear"
              strand="ds" comment="Human insulin receptor mRNA, complete cds.">
      <Attribute name="version" content="M10051.1 GI:186439"/>
      <Attribute name="source" content="Human placenta, cDNA to mRNA, clones lambda-IR1-15."/>
      <Attribute name="organism" content="Homo sapiens"/>
43
      <Feature-tables>
      <Feature-table>
      <Reference dbxref="85176928" title="1 (bases 1 to 4723)">
        <RefAuthors>
          Ebina,Y., Ellis,L., Jarnagin,K., Edery,M.,
          Graf,L., Clauser,E., Ou,J.-H., Masiarz,F.,
          Kan,Y.W., Goldfine,I.D., Roth,R.A. and
          Rutter,W.J.
        </RefAuthors>
        <RefTitle>
          The human insulin receptor cDNA: the structural
          basis for hormone-activated transmembrane
          signalling
        </RefTitle>
44
    <Seq-data>
    ggggggctgcgcggccgggtcggtgcgcacacga
    Gaaggacgcgcggcccccagcgctcttgggggccgcctcggagcat
    Acccccgcgggccagcgccgcgcgcctgatccgaggagaccccgcg
    Ctcccgcagccatgggcaccgggggccggcggggggcggcggccgc
    Gccgctgctggtggcggtggccgcgctgctactgggcgccgcgggcc
    Cctgtaccccggagaggtgtgtcccggcatggatatccggaacaacctc
    Actaggttgcatgagctggagaattgctctgtcatcgaaggacacttgcag
    atactcttgatgttcaaaacgaggcccga
DNA Sequences!
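Because BSML is ordinary XML, the same parsing tools apply to it. The sketch below reads a few fields from a simplified fragment modelled on the excerpt. Note the assumptions: the fragment has been closed so that it is well-formed, placing Seq-data inside Sequence is a guess about the nesting, only the first row of the sequence is included, and summarize is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Simplified, closed fragment modelled on the BSML excerpt (nesting assumed).
BSML_FRAGMENT = """
<Bsml>
  <Definitions>
    <Sequences>
      <Sequence id="G186439" title="HUMINSR" length="4723">
        <Attribute name="organism" content="Homo sapiens"/>
        <Seq-data>ggggggctgcgcggccgggtcggtgcgcacacga</Seq-data>
      </Sequence>
    </Sequences>
  </Definitions>
</Bsml>
"""


def summarize(xml_text):
    """Collect a few fields from the first Sequence element."""
    root = ET.fromstring(xml_text)
    seq = root.find("Definitions/Sequences/Sequence")
    # Select the Attribute element whose name attribute equals "organism".
    organism = seq.find("Attribute[@name='organism']").get("content")
    bases = seq.findtext("Seq-data").strip()
    return {
        "id": seq.get("id"),
        "declared_length": int(seq.get("length")),  # length stated in the record
        "organism": organism,
        "bases_in_fragment": len(bases),            # bases present in this sketch
    }


print(summarize(BSML_FRAGMENT))
```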