Title: Introduction to XML: A Librarian's Perspective
1Introduction to XML: A Librarian's Perspective
- Delphine Khanna, Rutgers University
- Palinet, July/August 1999
2Overview of the workshop
- What is XML? How does it work?
- Why XML? What is it going to change?
- Overview of some XML-related standards.
- XML in libraries (standards projects).
- Practical skills
- Creating an XML document,
- Creating an XSL style sheet,
- Work with MS Internet Explorer 5.0.
3Workshop Web site
- http://scc01.rutgers.edu/ceth/intromat/xml/
- Contents:
- This slide presentation,
- XML Samples used in this workshop,
- List of useful Web links and other resources.
4A First Look at XML
5Basics
- Simplified definition: XML is a kind of super-HTML where you can define your own tags.
6Term Clarification
- XML can be called:
- an encoding format,
- a language,
- a standard.
- We will prefer "standard".
7The XML Family: A whole family of standards
- XML,
- XSL,
- XLink, XPointer,
- Namespaces,
- RDF,
- XML Schemas,
- DOM,
- and more
8XML: Who? When?
- XML family developed by W3C.
- Very recent:
- XML 1.0: February 1998.
- Namespaces: January 1999.
- RDF: February 1999.
- XLink, XPointer, XSL, XML Schemas: still working drafts.
9XML: To develop the next generation of Web applications
- People want to do more sophisticated things with the Web.
- HTML is too limited for that.
- Need for a more powerful language: XML.
10XML Hype: 2 myths
- XML will replace everything.
- (HTML, back-end relational databases, etc.)
- XML is completely different from Web technologies
we had before.
11Why is XML better?
12Let's look at a typical HTML document
- <BODY>
- <H1>Lines Written in Early Spring</H1>
- <H2>William Wordsworth</H2>
- <P>I heard a thousand blended notes,<BR>
- While in grove I sate reclined,<BR>
- In that sweet mood when pleasant thoughts<BR>
- Bring sad thoughts to the mind.</P>
-
- <P>To her fair works did nature link<BR>
- The human soul that through me ran<BR>
- And much it griev'd me my heart to think<BR>
- What man has made of man.</P>
- </BODY>
13What is the problem?
- To do fancier things with documents:
- we need to make their logical structure explicit.
- Otherwise software applications:
- do not know what is what,
- do not have any handle on documents.
14Why XML is better: Overall
- HTML:
- Encoding too vague and messy.
- Logical structure is not clearly encoded.
- XML:
- Allows us to create clean, structured documents, where the logical structure of the document is totally explicit.
15The same document in XML
- <?xml version="1.0"?>
- <POEM>
- <TITLE>Lines Written in Early Spring</TITLE>
- <AUTHOR><FIRSTNAME>William</FIRSTNAME><LASTNAME>Wordsworth</LASTNAME></AUTHOR>
- <STANZA>
- <LINE N="1">I heard a thousand blended notes,</LINE>
- <LINE N="2">While in grove I sate reclined,</LINE>
- <LINE N="3">In that sweet mood when pleasant thoughts</LINE>
- <LINE N="4">Bring sad thoughts to the mind.</LINE>
- </STANZA>
- <STANZA>
- <LINE N="5">To her fair works did nature link</LINE>
- <LINE N="6">The human soul that through me ran</LINE>
- <LINE N="7">And much it griev'd me my heart to think</LINE>
- <LINE N="8">What man has made of man.</LINE>
- </STANZA>
- </POEM>
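To illustrate why an explicit logical structure gives software a "handle" on the document, here is a minimal Python sketch. It is not part of the workshop materials. It uses the standard-library `xml.etree.ElementTree` module on a shortened copy of the poem (first stanza only), and the variable names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the poem from the slide (first stanza only).
POEM_XML = """<?xml version="1.0"?>
<POEM>
  <TITLE>Lines Written in Early Spring</TITLE>
  <AUTHOR><FIRSTNAME>William</FIRSTNAME><LASTNAME>Wordsworth</LASTNAME></AUTHOR>
  <STANZA>
    <LINE N="1">I heard a thousand blended notes,</LINE>
    <LINE N="2">While in grove I sate reclined,</LINE>
    <LINE N="3">In that sweet mood when pleasant thoughts</LINE>
    <LINE N="4">Bring sad thoughts to the mind.</LINE>
  </STANZA>
</POEM>"""

poem = ET.fromstring(POEM_XML)

# Because the structure is explicit, each piece can be asked for by name.
title = poem.findtext("TITLE")
last_name = poem.findtext("AUTHOR/LASTNAME")

# Index the lines by their N attribute.
lines_by_number = {int(line.get("N")): line.text for line in poem.iter("LINE")}

print(title, "by", last_name)
print("Line 3:", lines_by_number[3])
```

With the HTML version, the same program could only guess that the `<H2>` holds the author and that each `<BR>` ends a verse line.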
16Why XML is better: Reason 1
- HTML: One single fixed tag set
- <H1>, <H2>, <P>, <IMG>, etc.
- XML: You can define your own tag set
- <POEM>, <STANZA>, <LINE N="18">.
- <P>, <ABSTRACT>, <FOOTNOTE>, <BIBL_ENTRY>.
- => Possible to describe the logical structure exactly.
17Why XML is better: Reason 2
- HTML: Lack of syntax control. <A><P>Hello</A></P><P> is considered OK.
- XML: Documents have to be at least well-formed. <P><A>Hello</A></P> is the only acceptable form.
- => Code much cleaner.
18Why XML is better: Reason 3
- HTML: Logical structure and display are mixed up
- <P>, <H1>, <H2>.
- This text is <FONT COLOR="blue">important</FONT>.
- XML: Clear distinction between logical structure and display
- This text is <EMPH>important</EMPH>.
- <EMPH> = <FONT COLOR="blue">
- => Code much cleaner.
19By the way, HTML is not that bad
- HTML:
- Really simple: attractive to basic users.
- Works fine for basic Web pages.
- XML:
- Clearly more complex: will scare off basic users.
- Probably overkill for basic Web pages.
20What will XML change? Or, why do we need to make the logical structure explicit?
21Different displays for different output devices
- Regular computer screens,
- Pocket computers, Palm Pilot,
- WebTV,
- Audio (visually-impaired, cars),
- Braille,
- Print.
22Term Clarification
- The Web is based on a client/server architecture.
23Server-side: Databases should speak to each other
- A very successful model:
- relational databases on the server side.
- Next step: data integration.
- Example 1: An online bookstore,
- Example 2: Medical records,
- Example 3: An index that knows which journals are available in the library.
24XML: representing structured data
- If XML can represent structured text, it can also represent structured data.
- XML is also very good at representing mixed data seamlessly.
25XML for Interchange: Example of a converted R-DB record
- <?xml version="1.0"?>
- <PACKAGE>
- <ID>33456</ID>
- <CATEGORY>Next day delivery</CATEGORY>
- <SHIPPING_COST>15</SHIPPING_COST>
- <LOC_DEPARTURE>New York City</LOC_DEPARTURE>
- <LOC_ARRIVAL>Pittsburgh</LOC_ARRIVAL>
- <DATE_DEPARTURE>07/30/1999</DATE_DEPARTURE>
- <DATE_ARRIVAL>07/31/1999</DATE_ARRIVAL>
- </PACKAGE>
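One way such a conversion could be scripted is sketched below in Python, under the assumption that the database row arrives as a dictionary of column names and values. The function name `row_to_xml` and the dictionary are hypothetical. Only the element names and values come from the slide.

```python
import xml.etree.ElementTree as ET

def row_to_xml(root_name, row):
    """Turn one database row (column name -> value) into an XML string."""
    root = ET.Element(root_name)
    for column, value in row.items():
        # One child element per column, in column order.
        ET.SubElement(root, column).text = str(value)
    return ET.tostring(root, encoding="unicode")

# Hypothetical row, using the values shown on the slide.
package_row = {
    "ID": 33456,
    "CATEGORY": "Next day delivery",
    "SHIPPING_COST": 15,
    "LOC_DEPARTURE": "New York City",
    "LOC_ARRIVAL": "Pittsburgh",
    "DATE_DEPARTURE": "07/30/1999",
    "DATE_ARRIVAL": "07/31/1999",
}

xml_text = row_to_xml("PACKAGE", package_row)
print(xml_text)
```

Any application that receives this text can parse it back without knowing anything about the originating database.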
26Client-side: The Web, more than an online fax-machine
- Web browsers: thin clients
- They just display documents.
- Clients can do more:
- Client workstation has a lot of unused power,
- Less strain on the network and on the server,
- Example: Viewing and sorting of a medical record.
27Client-side: The Web, more than an online fax-machine (2)
- Clients can do more:
- Personalized and sophisticated processing possible.
- Processing possibly provided by 3rd-party client applications.
- Example: Bibliography manager.
28XML: The nitty-gritty details
29Term Clarification
- Element,
- Tag (opening tag / closing tag / delimiter),
- Element content,
- Attribute (name / value).
- Example:
- <AUTHOR TYPE="novelist">John Smith</AUTHOR>
30Differences in Syntax between XML and HTML
- XML Declaration: <?xml version="1.0"?>
- Every opening tag must have a closing tag.
- Empty tags have a different syntax: <BR/>
- Tags are case sensitive: <STANZA> is different from <stanza>
31Two-Level Syntax Control
- XML documents can be
- Well-formed,
- Valid.
32Syntax Control: Well-formed documents
- All XML documents must be well-formed.
- XML parsers check the well-formedness.
- Criteria of well-formedness:
- Every opening tag must have a closing tag. Illegal: <P>Hello
- No overlapping elements. Illegal: <DIV><P>Hello</DIV></P>
- One unique root element.
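These three criteria can be tried out with any non-validating parser. The sketch below uses Python's standard-library `xml.etree.ElementTree` rather than IE5. The helper name `is_well_formed` and the sample strings are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts `text` as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = {
    "missing closing tag": "<P>Hello",
    "overlapping elements": "<DIV><P>Hello</DIV></P>",
    "two root elements": "<P>Hello</P><P>World</P>",
    "well-formed": "<DIV><P>Hello</P></DIV>",
}

for label, text in samples.items():
    print(label, "->", is_well_formed(text))
```

Only the last sample is accepted. The parser says nothing about whether the tags are the right ones for the document type, which is the job of validation.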
33Tree Representation
- POEM (root)
- Children of POEM: TITLE, AUTHOR, STANZA, STANZA
- Children of AUTHOR: FIRSTNAME, LASTNAME
- Children of each STANZA: LINE, LINE, LINE
34Create your own XML document
- The cooking recipe document:
- 1. Brainstorming on the structure of the document,
- 2. Creation of the document with a template.
35Editing XML Documents
- Textpad + Internet Explorer 5 as a parser.
- Caution: IE5 comes with limitations and proprietary features.
- Alternative:
- XML editor (e.g., SoftQuad's XMetaL).
36To get started
- Create file in Textpad and load it in IE5.
- File extension: .xml.
- Save regularly and reload in IE5.
- Begin with:
- <?xml version="1.0"?>
- <RECIPE>
- </RECIPE>
37Document Type Definitions (DTD)
38Document Type Definitions (DTD)
- Formal way of defining the tags used in a series of documents.
- A DTD:
- specifies a list of tags,
- defines the relationships between these tags.
- Allows us to create consistency across a collection of documents (e.g., 5000 poems).
39What does a DTD look like?
- <!ELEMENT POEM (TITLE, AUTHOR, STANZA+)>
- <!ELEMENT TITLE (#PCDATA)>
- <!ELEMENT AUTHOR (FIRSTNAME, LASTNAME)>
- <!ELEMENT FIRSTNAME (#PCDATA)>
- <!ELEMENT LASTNAME (#PCDATA)>
- <!ELEMENT STANZA (LINE+)>
- <!ELEMENT LINE (#PCDATA)>
- <!ATTLIST LINE N CDATA #REQUIRED>
40Creating a DTD
- Non-trivial task.
- Higher level of expertise needed than for using a DTD:
- In-depth knowledge of XML,
- In-depth knowledge of the type of documents being described.
- Preliminary Document Analysis.
- A DTD can be dozens of pages long.
41Syntax Control: Valid documents
- Higher level of control than well-formed documents.
- An XML document is valid if it conforms to its DTD.
- To validate an XML document, it is necessary to declare the name and location of its DTD.
42XML DTD declaration
- The DTD should be declared at the top of the XML document.
- Local file:
- <?xml version="1.0" standalone="no"?>
- <!DOCTYPE POEM SYSTEM "poem.dtd">
- URL:
- <?xml version="1.0" standalone="no"?>
- <!DOCTYPE POEM SYSTEM "http://scc01.rutgers.edu/ceth/intromat/xml/samples/poem/poem.dtd">
43Validation with IE5
- When loading a document, the IE5 parser does not validate it.
- Possible to validate a document through a script.
- Possible also to use a separate validating parser.
- For instance, the Scholarly Technology Group's XML parser at Brown U.
- (http://www.stg.brown.edu/service/xmlvalid/).
- Validating vs. non-validating parsers.
44Validation Strategy
- For now, best model:
- When creating documents: use a validating parser.
- (like the Scholarly Technology Group's XML Parser)
- When users download them: parser only checks if well-formed.
45Namespaces
- Need to use elements from several DTDs in the same document.
- Scheme to identify the source of each element.
- Special case: Same element name used by 2 DTDs.
46Namespace Example
- <book xmlns:books='urn:loc.gov:books'
- xmlns:isbn='url:ISBN:http://www.isbn.org/isbndtd'>
- <books:title>Cheaper by the Dozen</books:title>
- <isbn:number>1568491379</isbn:number>
- <books:notes>This is a funny book</books:notes>
- </book>
- Note: Adapted from example in the Namespaces recommendation.
47Namespace Example (2): Default Namespace
- <book xmlns='urn:loc.gov:books'
- xmlns:isbn='url:ISBN:http://www.isbn.org/isbndtd'>
- <title>Cheaper by the Dozen</title>
- <isbn:number>1568491379</isbn:number>
- <notes>This is a funny book</notes>
- </book>
- Note: Adapted from example in the Namespaces recommendation.
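A namespace-aware parser shows what the prefixes really stand for. In the Python sketch below (standard-library `xml.etree.ElementTree`), the prefixed and the default-namespace forms of the title resolve to the same qualified name. The ISBN namespace URI `urn:example:isbn` is a placeholder chosen for the example, not the URI from the slides, and the `notes` element is left out for brevity.

```python
import xml.etree.ElementTree as ET

# Prefixed form: elements are tied to a namespace through their prefix.
PREFIXED = """<book xmlns:books='urn:loc.gov:books' xmlns:isbn='urn:example:isbn'>
  <books:title>Cheaper by the Dozen</books:title>
  <isbn:number>1568491379</isbn:number>
</book>"""

# Default-namespace form: unprefixed elements belong to urn:loc.gov:books.
DEFAULT = """<book xmlns='urn:loc.gov:books' xmlns:isbn='urn:example:isbn'>
  <title>Cheaper by the Dozen</title>
  <isbn:number>1568491379</isbn:number>
</book>"""

# Prefixes used in the queries below; they are local to this script.
NS = {"b": "urn:loc.gov:books", "i": "urn:example:isbn"}

prefixed_root = ET.fromstring(PREFIXED)
default_root = ET.fromstring(DEFAULT)

prefixed_title = prefixed_root.find("b:title", NS)
default_title = default_root.find("b:title", NS)
isbn_number = default_root.findtext("i:number", namespaces=NS)

print(prefixed_title.tag)   # the parser expands the prefix to {URI}localname
print(default_title.tag)
print(isbn_number)
```

The prefix itself carries no meaning: the script uses `b` where the document uses `books`, and the lookup still succeeds because only the URI is compared.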
48More good things about XML
49Positive side-effects of XML (1)
- XML fosters the development of community-based standards.
- Concept of 2-level standard very powerful:
- XML: universal,
- DTDs: community-specific.
- Now developing a new standard amounts to writing a DTD.
- Much easier than starting from scratch.
- E.g., Xlit.
50Positive side-effects of XML (2)
- Widespread standards are stronger than those used by a limited community (regardless of their intrinsic value).
- HL7 --> XML.
- Easier to hire programmers.
- More documentation available.
- Actively maintained by very large base of people.
51Positive side-effects of XML (3)
- A set of standards bundled together is stronger than an isolated one.
- Likely to appeal to more people (the Microsoft Office idea).
- The standards reinforce each other.
52Stylesheet Languages for XML
53Stylesheet Languages for XML
- Specify how to display logical elements.
- XML supports 2 stylesheet languages:
- CSS:
- Quite limited,
- But eases the transition HTML --> XML.
- XSL:
- Very powerful,
- Still a working draft.
54Extensible Stylesheet Language (XSL)
- 2 Parts:
- Transformations:
- Transform the XML document (reorder, hide, add elements).
- Formatting Objects (FO):
- Attach formatting properties to XML elements.
55XSL in IE 5.0
- Supports transformations but not the FO.
- Trick: transform XML DTD-specific elements into HTML elements.
- Convenient because everybody knows HTML.
56XSL-to-HTML Stylesheets: Syntax
- Style Sheet Excerpt:
- <xsl:template match="BOOK">
- <P><xsl:apply-templates/></P>
- </xsl:template>
- <xsl:template match="AUTHOR">
- <B><xsl:value-of/></B>
- </xsl:template>
- <xsl:template match="TITLE">
- <xsl:value-of/>
- </xsl:template>
- XML Document Excerpt:
- <BOOK>
- <AUTHOR>Mary Brown</AUTHOR>
- <TITLE>Easy Cooking</TITLE>
- </BOOK>
- <BOOK>
- <AUTHOR>John Smith</AUTHOR>
- <TITLE>101 Recipes</TITLE>
- </BOOK>
- <BOOK>
- <AUTHOR>Sue Meyer</AUTHOR>
- <TITLE>Italian Cuisine</TITLE>
- </BOOK>
- HTML Output:
- <P><B>Mary Brown</B> Easy Cooking</P>
- <P><B>John Smith</B> 101 Recipes</P>
- <P><B>Sue Meyer</B> Italian Cuisine</P>
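The template mechanism can be imitated in a few lines of ordinary code, which may help in seeing what the stylesheet does. The Python sketch below is not an XSL processor. It pairs each element name with a small rule, the way each `xsl:template` pairs a `match` pattern with an output fragment. The `LIBRARY` wrapper element, the `RULES` table and the `render` function are hypothetical names introduced for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

# BOOK excerpts from the slide; the LIBRARY wrapper element is hypothetical.
LIBRARY_XML = """<LIBRARY>
<BOOK><AUTHOR>Mary Brown</AUTHOR><TITLE>Easy Cooking</TITLE></BOOK>
<BOOK><AUTHOR>John Smith</AUTHOR><TITLE>101 Recipes</TITLE></BOOK>
<BOOK><AUTHOR>Sue Meyer</AUTHOR><TITLE>Italian Cuisine</TITLE></BOOK>
</LIBRARY>"""

def render(element):
    """Apply the rule registered for this element's tag, if there is one."""
    rule = RULES.get(element.tag)
    if rule is None:
        # No rule for this element: process its children, one per output line.
        return "\n".join(render(child) for child in element)
    return rule(element)

# One rule per element name, playing the role of the three templates.
RULES = {
    "BOOK": lambda e: "<P>" + " ".join(render(child) for child in e) + "</P>",
    "AUTHOR": lambda e: "<B>" + escape(e.text) + "</B>",
    "TITLE": lambda e: escape(e.text),
}

html_output = render(ET.fromstring(LIBRARY_XML))
print(html_output)
```

The `BOOK` rule corresponds to `apply-templates` (hand the children to their own rules), while the `AUTHOR` and `TITLE` rules correspond to `value-of` (emit the element's text).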
57Beginning of an XSL-to-HTML Stylesheet
- <?xml version='1.0'?>
- <xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
- <xsl:template match="/">
- <xsl:apply-templates/>
- </xsl:template>
- <xsl:template match="POEM">
- <HTML>
- <BODY>
- <xsl:apply-templates/>
- </BODY>
- </HTML>
- </xsl:template>
-
- </xsl:stylesheet>
58Example of XSL-to-HTML Stylesheet
- See poem.xsl at:
- http://scc01.rutgers.edu/ceth/intromat/xml/samples/poem/poem.xsl
59Declaring an XSL Stylesheet in an XML document
- Just after the XML declaration (and the DTD declaration if there is one).
- Local file:
- <?xml-stylesheet type="text/xsl" href="poem.xsl"?>
- URL:
- <?xml-stylesheet type="text/xsl" href="http://scc01.rutgers.edu/ceth/intromat/xml/samples/poem/poem.xsl"?>
60Creating your own Stylesheet
- The XSL-to-HTML recipe stylesheet:
- XSL stylesheets can be tricky.
- Always use another stylesheet as a model.
- Name the file recipe.xsl.
- Make sure to declare it in the XML document:
- <?xml-stylesheet type="text/xsl" href="recipe.xsl"?>
- Always add one template at a time, and reload in IE5 to make sure it works.
61Recipe Stylesheet: Step 1
- <?xml version="1.0"?>
- <xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
- <xsl:template match="/">
- <xsl:apply-templates/>
- </xsl:template>
- </xsl:stylesheet>
62Recipe Stylesheet: Step 2
- <xsl:template match="RECIPE">
- <HTML>
- <BODY BGCOLOR="#FFFFCC">
- <xsl:apply-templates/>
- </BODY>
- </HTML>
- </xsl:template>
- <xsl:template match="TITLE">
- <H1><CENTER><FONT COLOR="red">
- <xsl:value-of/>
- </FONT></CENTER></H1>
- </xsl:template>
63Recipe Stylesheet: Step 3 and after
- For the rest of the stylesheet, see the sample recipe.xsl at:
- http://scc01.rutgers.edu/ceth/intromat/xml/samples/recipe/recipe.xsl
64Recipe Stylesheet: Advanced
- Sorting the ingredients:
- <xsl:template match="INGREDIENTLIST">
- <HR/>
- <H2><FONT COLOR="red">Ingredients</FONT></H2>
- <UL><xsl:apply-templates select="INGREDIENT"
- order-by="PRODUCT/@AISLE; PRODUCT"/></UL>
- <HR/>
- </xsl:template>
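The effect of the sort can be reproduced outside the browser. The Python sketch below assumes the sort key reads as "aisle first, then product name", and that each INGREDIENT holds a PRODUCT element with an AISLE attribute. The three sample ingredients are invented for the example and do not come from the workshop's recipe file.

```python
import xml.etree.ElementTree as ET

# Hypothetical ingredient list; only the element and attribute names
# (INGREDIENTLIST, INGREDIENT, PRODUCT, AISLE) come from the stylesheet.
INGREDIENTS_XML = """<INGREDIENTLIST>
<INGREDIENT><PRODUCT AISLE="3">Flour</PRODUCT></INGREDIENT>
<INGREDIENT><PRODUCT AISLE="1">Milk</PRODUCT></INGREDIENT>
<INGREDIENT><PRODUCT AISLE="1">Butter</PRODUCT></INGREDIENT>
</INGREDIENTLIST>"""

def sort_key(ingredient):
    """Sort by the AISLE attribute first, then by the product name."""
    product = ingredient.find("PRODUCT")
    # Both parts are compared as text in this sketch.
    return (product.get("AISLE"), product.text)

ingredient_list = ET.fromstring(INGREDIENTS_XML)
ordered = sorted(ingredient_list.findall("INGREDIENT"), key=sort_key)
product_names = [item.findtext("PRODUCT") for item in ordered]
print(product_names)
```

The source document is left untouched. Only the displayed order changes, which is the kind of client-side processing the earlier slides describe.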
65XML Formatting Objects: Example
- <xsl:template match="title">
- <fo:block font-weight="bold" font-color="rgb(0,255,255)" font-size="16pt">
- <xsl:apply-templates/>
- </fo:block>
- </xsl:template>
- Note: Adapted from stylesheet created by Lynn Lobash.
66Some Other XML-related Standards
67Linking Standards
- HTML links:
- Really primitive and limited.
- Linking standards for XML:
- Much more powerful.
- 2 parts:
- XLink (a.k.a. XLL),
- XPointer (a.k.a. XLP).
- Still working drafts.
68XLink
- To define links to one or several documents.
- 2 types of links:
- Simple,
- Extended.
69XLink: Simple link
- Example:
- <related_poem xml:link="simple" inline="false" href="poem1.xml">Go to related poem</related_poem>
- Other attributes / Alternative values:
- inline: true, false (link to same document vs. outside).
- show: replace, new, embed.
- actuate: user, auto.
- title (a caption).
- Similar to HTML links, but slightly fancier.
70XLink: Simple links (2)
- Example 2:
- anchor:
- <poem_anchor xml:link="simple" role="poem312"><title>Blue Mountain</title></poem_anchor>
- link:
- <related_poem xml:link="simple" inline="false" href="poem1.xml#poem312">Go to related poem</related_poem>
- Similar to HTML <A NAME="poem312">.
71XLink: Extended Link
- One link, several targets.
- For instance, the link "See related poems" would open as a list of links in a pop-up window.
72XLink: Extended Link (2)
- Example:
- <related_poems xml:link="extended" inline="false" title="See related poems">
- <poem_target xml:link="locator" inline="false" title="Blue Mountains" href="poem1.xml"/>
- <poem_target xml:link="locator" inline="false" title="Pink Flowers" href="poem2.xml"/>
- <poem_target xml:link="locator" inline="false" title="Sea of Green" href="poem3.xml"/>
- </related_poems>
73XPointer
- To define links that target points within documents.
- Special language to explain which spot is targeted.
- In HTML:
- Need to manually insert a tag <A NAME>.
- Hence need to own the document.
- With XPointer:
- No need to add anything to the target document.
74XPointer (2)
- Example:
- <related_poem xml:link="simple" inline="false" href="poem1.xml#root().child(2)">Go to related poem</related_poem>
- Other possibilities:
- root().child(3).child(4)
- id(poem273)
- root().descendant(2, stanza)
- root().string(1, my heart)
- span(root().child(3), root().child(5))
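To make the child-counting idea concrete, here is a small Python sketch that follows a chain of `child(n)` steps through a parsed document. It is an illustration only, assuming that `child(n)` counts element children starting from 1. It does not implement the XPointer draft, and the helper name `follow_child_steps` is hypothetical. The document is a shortened copy of the poem from the earlier slides.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the poem (first stanza only).
POEM_XML = """<POEM>
<TITLE>Lines Written in Early Spring</TITLE>
<AUTHOR><FIRSTNAME>William</FIRSTNAME><LASTNAME>Wordsworth</LASTNAME></AUTHOR>
<STANZA>
<LINE N="1">I heard a thousand blended notes,</LINE>
<LINE N="2">While in grove I sate reclined,</LINE>
<LINE N="3">In that sweet mood when pleasant thoughts</LINE>
<LINE N="4">Bring sad thoughts to the mind.</LINE>
</STANZA>
</POEM>"""

def follow_child_steps(root, steps):
    """Walk down from the root, taking the n-th child element at each step."""
    node = root
    for position in steps:
        # Positions are 1-based, Python lists are 0-based.
        node = list(node)[position - 1]
    return node

poem = ET.fromstring(POEM_XML)
second_child = follow_child_steps(poem, [2])       # like root().child(2)
fourth_line = follow_child_steps(poem, [3, 4])     # like root().child(3).child(4)

print(second_child.tag)
print(fourth_line.text)
```

Nothing had to be added to the poem to make these spots addressable, which is the advantage claimed over HTML's `<A NAME>` anchors. The trade-off is that such a pointer breaks if elements are later inserted before the target.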
75Resource Description Framework (RDF)
- Defines syntax for describing resources.
- Metadata:
- Similar to information in OPAC records.
- Essential for identification and retrieval of documents.
- Unlimited nesting.
76RDF Example
- <rdf:RDF>
- <rdf:Description about="http://www.w3.org">
- <Publisher>World Wide Web Consortium</Publisher>
- <Title>W3C Home Page</Title>
- <Date>1998-10-03T02:27</Date>
- </rdf:Description>
- </rdf:RDF>
- Note: Adapted from example in the RDF recommendation.
77RDF Example (with nesting)
- <rdf:RDF>
- <rdf:Description about="http://www.w3.org">
- <Publisher>World Wide Web Consortium</Publisher>
- <Title>W3C Home Page</Title>
- <Date>1998-10-03T02:27</Date>
- <Creator>
- <Person about="http://www.w3.org/staffId/85740">
- <Name>Ora Lassila</Name>
- <Email>lassila@w3.org</Email>
- </Person>
- </Creator>
- </rdf:Description>
- </rdf:RDF>
- Note: Adapted from example in the RDF recommendation.
78RDF in practice
- RDF defines only the syntax, not the content.
- Can accommodate most document description schemes.
- Example of use: Implementation of Dublin Core.
- Example of industry support: ABC News, CNN and Time Inc.
- Both for HTML and XML documents.
- Details of implementation not entirely clear.
79XML Schemas
- An alternative to DTDs.
- Still a working draft.
- Easier because it uses the XML syntax.
- Data typing possible, unlike DTDs
- (integer, floating-point number, date, string, etc.).
80XML Schema Example
- <?xml version="1.0"?>
- <Schema name="poemSchema"
- xmlns="urn:schemas-microsoft-com:xml-data"
- xmlns:dt="urn:schemas-microsoft-com:datatypes">
- <ElementType name="POEM" content="eltOnly" model="closed">
- <element type="TITLE"/>
- <element type="AUTHOR"/>
- <element type="STANZA" maxOccurs="*"/>
- </ElementType>
- <ElementType name="TITLE" content="textOnly" model="closed" dt:type="string"/>
- <ElementType name="AUTHOR" content="eltOnly" model="closed">
- <element type="FIRSTNAME"/>
- <element type="LASTNAME"/>
- </ElementType>
- ...
- </Schema>
81Unicode
- Default character encoding for XML.
- Great improvement for encoding of non-Western languages:
- more than 65,000 characters,
- Eventually will represent all alphabets and writing systems,
- Also includes special typographic characters (e.g., ¼).
82SGML, XML, HTML: What is the difference?
- XML = SGML slightly simplified.
- HTML = just an SGML DTD.
- Can be easily converted to an XML DTD.
- Relationship:
- XML and SGML are meta-languages,
- HTML is a language.
83Searching XML Documents
84Models for XML Repositories: Flat file system
- A bunch of XML documents in a folder.
- Native XML search engine:
- an XML-aware Web site search engine.
- XML Query Language (XQL):
- Still in development.
- Find the word "milk" only when it appears in attribute DIETINFO2 of element PRODUCT.
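The "milk" query is easy to express once the documents are parsed. Since XQL is still in development, the sketch below uses Python's standard-library `xml.etree.ElementTree` instead. The `CATALOG` wrapper and the three sample products are invented for the example. Only the names PRODUCT and DIETINFO2 come from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog: "milk" appears in an attribute, in element text,
# and in a product name, but only the attribute should count as a hit.
CATALOG_XML = """<CATALOG>
<PRODUCT DIETINFO2="contains milk">Chocolate</PRODUCT>
<PRODUCT DIETINFO2="vegan">Oat milk cookies</PRODUCT>
<PRODUCT>Milk</PRODUCT>
</CATALOG>"""

catalog = ET.fromstring(CATALOG_XML)

# Keep a PRODUCT only when the word occurs inside its DIETINFO2 attribute.
hits = [
    product.text
    for product in catalog.iter("PRODUCT")
    if "milk" in product.get("DIETINFO2", "").lower()
]
print(hits)
```

A plain full-text search over the same file would return all three products, which is the difference a structure-aware query makes.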
85Models for XML Repositories: Regular relational databases
- E.g., Web-based OPACs, Ovid, Amazon.
- Back-end relational DBMS:
- MS Access or Oracle, for instance.
- Web interface:
- uses scripts like CGI or Cold Fusion,
- Easy to change the scripts to output XML instead of HTML,
- Can even produce XML or HTML according to the capabilities of the requesting browser.
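A minimal sketch of that last point, in Python rather than CGI or Cold Fusion: the same database record is rendered either as XML or as an HTML table row, depending on a flag. How a real script would detect the browser's capabilities is left out. The `wants_xml` flag, the function name `render_record` and the record itself are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

def render_record(record, wants_xml):
    """Render one record as XML for capable clients, as HTML otherwise."""
    if wants_xml:
        book = ET.Element("BOOK")
        for field, value in record.items():
            # Field names become element names, values become element text.
            ET.SubElement(book, field).text = value
        return ET.tostring(book, encoding="unicode")
    # Fallback: a plain HTML table row, with the field names dropped.
    cells = "".join("<TD>" + escape(value) + "</TD>" for value in record.values())
    return "<TR>" + cells + "</TR>"

# Hypothetical record, as it might come back from the relational database.
record = {"AUTHOR": "Mary Brown", "TITLE": "Easy Cooking"}

xml_output = render_record(record, wants_xml=True)
html_output = render_record(record, wants_xml=False)
print(xml_output)
print(html_output)
```

The HTML branch loses the field names, so the receiving client can no longer tell author from title. The XML branch keeps them.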
86Models for XML Repositories: XML-aware relational DBs
- Benefit from R-Databases and XML advantages.
- Mixed record:
- Nested structured text difficult to map to R-DB.
- However, many structured texts have a table-like section (the bibliographic information).
- R-Databases: very mature technology (data integrity, security, load balancing, etc.).
87Models for XML Repositories: XML-aware relational DBs (2)
- Example of Oracle:
- Enhanced full-text capabilities:
- indexing,
- truncations, stemming, thesaurus, etc.,
- XML-like searching,
- can create SQL queries with embedded XML subqueries.
- Automatic mapping:
- R-DB record --> XML document,
- XML document --> R-DB record,
- Virtual flat file system.
88Information Retrieval Standard for XML
- Needed to implement cross-repository search:
- To query across several XML servers seamlessly,
- Whatever the implementation on the server side (flat file system, R-DBMS, etc.).
89Information Retrieval Standard: Z39.50
- Used in the library community.
- To query OPACs, indexes, etc.
- Possible to specify:
- A query language,
- The format of the results,
- A session protocol.
90Information Retrieval Standard: Z39.50 and XML
- Currently beginning to integrate XML:
- Defined as a possible output format,
- Some proposals to use XML as an alternative to BER for the overall Z39.50 syntax.
- Once XQL is stabilized it could be ported to Z39.50.
- Good candidate to become the IR standard for XML.
- Little known outside the library community.
91XML in Libraries
92Which library projects are already using XML/SGML?
- Mostly academic institutions.
- (as well as Library of Congress and NYPL.)
- Usually in SGML.
- (Very recent ones in XML.)
- Mostly:
- large and long-term digitization projects,
- involving the digitization of numerous texts.
- Converted to HTML on-the-fly.
93Text Encoding Initiative (TEI)
- Standard to encode primary sources in the Humanities.
- SGML-based. (It is an SGML DTD.)
- Currently being converted to XML.
- Maintained by TEI Consortium.
- Widely adopted in Humanities computing community.
- Has spread to libraries.
94Examples of TEI Projects
- Special collections:
- Library of Congress's American Memory Project,
- Literary texts:
- U. of Virginia's E-text Collection,
- Brown's Women Writers Project,
- Historical editions (MEP DTD):
- Abraham Lincoln Papers,
- Susan B. Anthony Papers.
95Encoded Archival Description (EAD)
- Finding Aids to Special Collections and Archives.
- SGML/XML-based standard. (It is a DTD.)
- Maintained by the Library of Congress.
- Widely adopted.
96Examples of EAD Projects
- Among many others: California Digital Library's Online Archive of California.
- Union Database: RLG's Archival Resources Project
- (MARC AMC records and EAD finding aids).
97Materials Used by Libraries
- Reference Materials
- Oxford English Dictionary,
- American National Biography,
- Electronic Journals
- Springer-Verlag's LINK.
98XML in Libraries: What will it change? (1)
- EAD finding aids:
- Offer precise and controlled search capabilities,
- Make the creation of union databases possible.
99XML in Libraries: What will it change? (2)
- Full-text databases of primary sources:
- Easy to search, display, etc.
- Next step, union databases.
- With precise and controlled search capabilities.
- Full-text databases of e-journals, monographs.
- Competition with PDF/page images, though.
- Again next step, union databases.
100XML in Libraries: What will it change? (3)
- More sophisticated and customized clients:
- Bibliography manager,
- Concordance program.
- New library standards based on XML:
- TEI, EAD
- MARC (!)
- XML is not just a fad: more than 10 years of SGML-based TEI.
101XML in Libraries: What will it change? (4)
- XML is more likely than any other format to resist obsolescence:
- Platform independent,
- Open standard
- (not proprietary),
- Written in ASCII/Unicode plain text
- (no binary encoding, the simplest text editor can read it),
- Tags are human-readable.
102Web-based reference: What will XML change? (1)
- Topic-specific meta-search engines:
- e.g., job search or book search.
- Will become ubiquitous.
- Already exist but awkward for developers.
- Small communities --> can agree on a DTD.
- See the agreement of CNN et al. on RDF.
- Can also work without a common DTD.
- In the vendors' interest.
- All database-based --> easy conversion.
103Web-based reference: What will XML change? (2)
- General search will not improve for a long time:
- A lot of legacy data:
- The whole current WWW!
- Numerous users will not switch to XML.
- Especially basic users.
- How to deal with thousands of different DTDs?
104Should you use XML in your project today?
- Is your data made up of a repetition of similar objects? (e.g., 3000 poems)
- Is your project database-based?
- Is your project large?
- Do you plan to:
- deliver to different output devices?
- integrate your project with others? (e.g., union database)
- develop advanced capabilities? (server-side or client-side)