Title: Using XML, XSLT, and CSS in a Digital Library
1Using XML, XSLT, and CSS in a Digital Library
- Markup Transformations
- SGML to XML Conversions
- Metadata Schema Generation
- Robert Ferrer
- r-ferrer_at_uiuc.edu
- ASIS Annual Meeting 2000
2SGML to XML Conversions - Modular
3SGML to XML Conversions - Basic
- Empty tags: <empty> to < .. />
- Processing instructions: <?Processing Instruction> to <? ... ?>
- CDATA to CDATA sections: <![CDATA[ ... ]]>
- Named entities remain unchanged: &alpha;
- <!DOCTYPE ...> refers to an XML DTD containing only character entity definitions that map to Unicode code points
  <!ENTITY alpha "&#945;">
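The basic conversions above can be sketched in Python. This is only an illustration under stated assumptions: the set of empty element names and the helper `close_empty_tags` are hypothetical, and the entity definition is placed in an internal DTD subset because the standard-library parser does not fetch external DTDs.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical set of elements declared EMPTY in the source SGML DTD.
EMPTY_ELEMENTS = {"FIG", "CITEREF"}


def close_empty_tags(text, empty_names=EMPTY_ELEMENTS):
    """Rewrite <NAME ...> as <NAME .../> for elements known to be empty."""
    names = "|".join(sorted(re.escape(n) for n in empty_names))
    pattern = re.compile(r"<(%s)\b([^<>]*?)\s*/?>" % names)
    return pattern.sub(lambda m: "<%s%s/>" % (m.group(1), m.group(2)), text)


# The DTD here only maps the named entity to its Unicode code point.
SGML_LIKE = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE p [\n'
    '  <!ENTITY alpha "&#945;">\n'
    ']>\n'
    '<p>&alpha; particle <FIG NAME="F1"></p>'
)

converted = close_empty_tags(SGML_LIKE)
root = ET.fromstring(converted)  # parses only because <FIG> is now closed
print(converted)
print(root.text)
```

The named entity stays in the document as `&alpha;`; the parser resolves it through the entity declaration.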
4SGML to XML Conversions - Linking
- Attributes to facilitate internal linking
- <CITEREF REFID="bib5" id="li_occurrence3" />
- External links represented as XLinks
- <FIG NAME="F1" xlink:type="simple" xlink:href="fig1.jpg" xlink:show="new" xlink:actuate="user" />
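As a rough sketch of both kinds of link, the Python below builds a simple XLink on a FIG element and resolves an internal REFID to its target. The helper names, the `BIB` target element, and the sample document are assumptions for illustration. The attribute values copy the slide, which appears to follow a draft of XLink; the final recommendation uses different values for `actuate`.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("xlink", XLINK_NS)


def make_simple_link(tag, name, href):
    """Build an element carrying simple XLink attributes (hypothetical helper)."""
    elem = ET.Element(tag, {"NAME": name})
    elem.set("{%s}type" % XLINK_NS, "simple")
    elem.set("{%s}href" % XLINK_NS, href)
    elem.set("{%s}show" % XLINK_NS, "new")
    elem.set("{%s}actuate" % XLINK_NS, "user")  # value as shown on the slide
    return elem


def resolve_refid(root, citeref):
    """Return the element whose id matches the REFID of an internal link."""
    target_id = citeref.get("REFID")
    for elem in root.iter():
        if elem is not citeref and elem.get("id") == target_id:
            return elem
    return None


fig = make_simple_link("FIG", "F1", "fig1.jpg")
serialized = ET.tostring(fig, encoding="unicode")
print(serialized)

# BIB is a made-up target element for the internal link.
doc = ET.fromstring(
    '<doc><CITEREF REFID="bib5" id="li_occurrence3"/>'
    '<BIB id="bib5">Cited work</BIB></doc>'
)
target = resolve_refid(doc, doc.find("CITEREF"))
print(target.tag, target.text)
```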
5SGML to XML Conversions - Math
- SGML Math converted to MathML
- Presentational MathML:
  <math xmlns="http://www.w3.org/">
    <msubsup>
      <mrow><mi>&alpha;</mi></mrow>
      <mrow><mi>i</mi></mrow>
      <mrow><mo>-</mo><mn>2</mn></mrow>
    </msubsup>
  </math>
- ISO 12083 Math:
  <dformula> <g>a</g> <sup>-2</sup> <inf>i</inf> </dformula>
- Identify and translate mathematical character references
- Identify and tokenize mathematical content
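The last two steps, translating character references and tokenizing content, might look roughly like the Python below. The entity table, the classification rules, and both function names are assumptions for illustration, not the Testbed's actual rules. Each letter becomes its own `<mi>`, so multi-letter names such as "mod" would need extra handling.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical entity table; a real one would come from the character-entity DTD.
CHAR_ENTITIES = {"alpha": "\u03b1", "minus": "\u2212"}

# One token per match: a number, a single letter, or any other visible character.
TOKEN = re.compile(r"\s*(?:(\d+(?:\.\d+)?)|([^\W\d_])|(\S))")


def translate_entities(text, table=CHAR_ENTITIES):
    """Replace known named references; leave unknown ones untouched."""
    return re.sub(r"&(\w+);", lambda m: table.get(m.group(1), m.group(0)), text)


def tokenize_math(text):
    """Return MathML token elements: <mn> numbers, <mi> letters, <mo> the rest."""
    tokens = []
    for number, letter, other in TOKEN.findall(text):
        if number:
            tag, value = "mn", number
        elif letter:
            tag, value = "mi", letter
        else:
            tag, value = "mo", other
        elem = ET.Element(tag)
        elem.text = value
        tokens.append(elem)
    return tokens


tokens = tokenize_math(translate_entities("&alpha; - 2"))
pairs = [(t.tag, t.text) for t in tokens]
print(pairs)
```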
6SGML to XML Conversions - Math
- Recognize and transform mathematical markup
  <xsl:template match="dformula">
    <xsl:when test="sup or inf">
      <xsl:for-each select="child::node()">
        <xsl:choose>
          <xsl:when test="name(self::node())='sup' and name(following-sibling::node()[1])='inf'">
            <xsl:element name="msubsup" namespace="http://www.w3.org/">
              <xsl:element name="mrow" namespace="http://www.w3.org/">
                <xsl:apply-templates select="preceding-sibling::node()[1]"/>
              </xsl:element>
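For readers without an XSLT processor at hand, the same sup/inf recognition can be approximated in Python. This is a sketch, not the stylesheet used in the project. It looks ahead from the base element rather than back from `<sup>`, it assumes `<g>a</g>` denotes Greek alpha, and it uses the full W3C MathML namespace URI where the slide shows an abbreviated one.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"
GREEK = {"a": "\u03b1"}  # assumption: <g>a</g> stands for Greek alpha


def q(tag):
    """Qualify a tag name with the MathML namespace."""
    return "{%s}%s" % (MATHML_NS, tag)


def text_of(node):
    text = (node.text or "").strip()
    return GREEK.get(text, text) if node.tag == "g" else text


def mrow(text):
    """Wrap text in <mrow>, splitting a leading minus sign into <mo>."""
    row = ET.Element(q("mrow"))
    if text.startswith("-"):
        ET.SubElement(row, q("mo")).text = "-"
        text = text[1:]
    if text:
        ET.SubElement(row, q("mn" if text.isdigit() else "mi")).text = text
    return row


def convert_dformula(dformula):
    """Turn base + <sup> + <inf> into <msubsup>; copy other children as <mrow>."""
    math = ET.Element(q("math"))
    kids = list(dformula)
    i = 0
    while i < len(kids):
        if [k.tag for k in kids[i + 1:i + 3]] == ["sup", "inf"]:
            node = ET.SubElement(math, q("msubsup"))
            node.append(mrow(text_of(kids[i])))      # base
            node.append(mrow(text_of(kids[i + 2])))  # subscript, from <inf>
            node.append(mrow(text_of(kids[i + 1])))  # superscript, from <sup>
            i += 3
        else:
            math.append(mrow(text_of(kids[i])))
            i += 1
    return math


source = ET.fromstring("<dformula><g>a</g><sup>-2</sup><inf>i</inf></dformula>")
math = convert_dformula(source)
print(ET.tostring(math, encoding="unicode"))
```

Note that `<msubsup>` takes its children in the order base, subscript, superscript, so the `<inf>` content is emitted before the `<sup>` content.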
7SGML to XML Conversions - TeX
- TeX converted to GIF images
- <FORM NOTATION="TEX" HIDE="TRUE">(j_0-a_2')\,\rm mod\,P</FORM><uie name="uie1" xlink:type="simple" xlink:href="fig1.gif" xlink:show="new" xlink:actuate="user" />
- TeX converted into MathML
- IBM TechExplorer
  (j_0-a_2')\,\rm mod\,P
  <math><mo>(</mo><msub><mrow><mi>j</mi></mrow><mrow><mn>0</mn></mrow></msub><mi>&minus;</mi><msubsup><mrow><mi>a</mi></mrow><mrow><mn>2</mn>..
8SGML to XML Conversions - DTD
- XML DTD does not permit inclusions and exclusions
- SGML: <!ELEMENT Article - - (front, body) +(%i.float;)>
- XML: <!ELEMENT Article (front | body | %i.float;)>
- XML DTD does not permit the & connector
- XML DTD permits mixed content only in the form (#PCDATA | name | ...)*, so models such as the following are not allowed
- <!ELEMENT Other ((author, journal) | (#PCDATA))>
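One common way to replace the SGML & connector is to spell out every permitted order as nested sequences. The Python sketch below does that; the function name is hypothetical and the slide does not say which rewriting the project actually used. The nesting keeps the model deterministic, but the output grows factorially with the number of names, so a looser repeatable choice group is often preferred for larger groups.

```python
def expand_and_group(names):
    """Rewrite the SGML group (a & b & ...) as nested XML sequences and choices."""
    names = list(names)
    if len(names) == 1:
        return names[0]
    branches = []
    for i, first in enumerate(names):
        rest = names[:i] + names[i + 1:]
        # Each branch fixes the first element, then allows any order of the rest.
        branches.append("(%s, %s)" % (first, expand_and_group(rest)))
    return "(%s)" % " | ".join(branches)


model = expand_and_group(["author", "journal"])
print("<!ELEMENT Other %s>" % model)
```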
9Metadata - Usage
- Metadata Within the DLI Testbed
- Normalize key fields from different publisher DTDs to facilitate searching
- Provide common and easily displayable intermediate search results
- Add value in the form of links to cited or citing articles within the Testbed, external abstracts and indexes, etc.
10Metadata - Schema
- Resource Description Framework (RDF) provides a standardized way to represent metadata using XML
- Encapsulates metadata elements
- Provides varying levels of granularity
- RDF container objects describe the relations between repeated metadata elements
11Metadata - Schema
- Dublin Core (DC) model is used to encapsulate all searchable metadata
- Provides the semantic framework for describing each object in the collection

  Content       Intellectual Property   Instantiation
  Title         Creator                 Date
  Subject       Publisher               Format
  Description   Contributor             Identifier
  Type          Rights                  Language
  Source
  Relation
  Coverage
12Metadata - Schema
- Extensive custom IDLI tags are included
- Offer a further level of granularity
- <DC:Description><idli:Abstract>...</idli:Abstract></DC:Description>
- Search clients familiar with the IDLI schema can achieve much greater precision
- Dublin Core Qualifiers (DCQ) substructure to replace many of the project-specific IDLI elements
- <DC:Description><DCQ:Abstract>...</DCQ:Abstract></DC:Description>
13Metadata - Schema
<rdf:seq>
  <rdf:li>
    <dc:Creator>
      <idli:author_name>Giust, G. K.</idli:author_name>
      <idli:organization_name>Department of Electrical Engineering, Arizona State University</idli:organization_name>
    </dc:Creator>
  </rdf:li>
  <rdf:li>
    <dc:Creator>
      <idli:author_name>Sigmon, T.W.</idli:author_name>
      <idli:organization_name>Department of Computer Science, Illinois State University</idli:organization_name>
    </dc:Creator>
  </rdf:li>
</rdf:seq>
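A search client that knows the IDLI schema could read such a record roughly as follows. This is a minimal sketch: the `idli` namespace URI is a placeholder, the `dc` URI is an assumption since the slide shows none, and the element names are spelled as on the slide (the RDF specification itself writes the container as `rdf:Seq`).

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",  # assumed; not given on the slide
    "idli": "http://example.org/idli#",        # hypothetical placeholder URI
}

RECORD = f"""<rdf:seq xmlns:rdf="{NS['rdf']}" xmlns:dc="{NS['dc']}"
         xmlns:idli="{NS['idli']}">
  <rdf:li>
    <dc:Creator>
      <idli:author_name>Giust, G. K.</idli:author_name>
      <idli:organization_name>Department of Electrical Engineering,
        Arizona State University</idli:organization_name>
    </dc:Creator>
  </rdf:li>
  <rdf:li>
    <dc:Creator>
      <idli:author_name>Sigmon, T.W.</idli:author_name>
      <idli:organization_name>Department of Computer Science,
        Illinois State University</idli:organization_name>
    </dc:Creator>
  </rdf:li>
</rdf:seq>"""


def creators(seq):
    """Return (author, organization) pairs in the order given by the sequence."""
    result = []
    for creator in seq.findall("rdf:li/dc:Creator", NS):
        name = creator.findtext("idli:author_name", "", NS)
        org = creator.findtext("idli:organization_name", "", NS)
        # Collapse line breaks and indentation inside the text content.
        result.append((" ".join(name.split()), " ".join(org.split())))
    return result


pairs = creators(ET.fromstring(RECORD))
for author, organization in pairs:
    print(author, "|", organization)
```

Because the container is an ordered sequence, the author order of the original article is preserved, and each organization stays attached to its own author.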
14Metadata - Extracting
- Metadata is extracted from the base XML files
- Utilization of XML Header
- DTD is used to resolve entities
- XML-Stylesheet processing instruction
- Visual Basic application serves as parser
- Document Object Model (DOM)
- XSLT Style Sheets
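The slide names a Visual Basic application as the parser. As a stand-in, the Python sketch below uses the standard-library DOM to show the same ingredients: the stylesheet processing instruction in the XML header and entity resolution through the DTD. The sample document, the stylesheet name `metadata.xsl`, and the helper are hypothetical.

```python
import re
from xml.dom import minidom

SOURCE = (
    '<?xml version="1.0"?>\n'
    '<?xml-stylesheet type="text/xsl" href="metadata.xsl"?>\n'
    '<!DOCTYPE Article [\n'
    '  <!ENTITY alpha "&#945;">\n'
    ']>\n'
    '<Article><title>&alpha; decay</title></Article>'
)


def stylesheet_href(document):
    """Return the href of the first xml-stylesheet processing instruction."""
    for node in document.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            match = re.search(r'href="([^"]*)"', node.data)
            if match:
                return match.group(1)
    return None


doc = minidom.parseString(SOURCE)
href = stylesheet_href(doc)

# The parser resolves &alpha; through the entity declaration in the DTD.
title = doc.getElementsByTagName("title")[0]
title_text = "".join(
    n.data for n in title.childNodes if n.nodeType == n.TEXT_NODE
)
print(href, title_text)
```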
15Metadata - Extracting
- Utilization of XSLT Style Sheets
- XSLT transformative features to generate the base metadata file and forward citation fragment
- XSLT scripting features to generate elements not directly expressed in the document
- XSLT instantiation of ActiveX objects to test for links
16Metadata - Extracting
- Utilization of DOM
- Insert pseudo elements (e.g. bibliographic data)
- Search reference citations from the generated
metadata object to insert forward references into
other metadata files
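The forward-reference step could be sketched as below: each citation found in one metadata record is matched against the other records, and a forward reference is inserted into the cited record. The element and attribute names (`metadata`, `citation`, `ref`, `forward_ref`, `from`) and the in-memory dictionary are all hypothetical; the slide does not show the actual structures.

```python
import xml.etree.ElementTree as ET


def insert_forward_refs(records):
    """Add <forward_ref from="..."/> to every record cited by another record.

    records maps an article id to the root element of its metadata file.
    Returns the number of forward references added.
    """
    added = 0
    for citing_id, root in records.items():
        for citation in root.iter("citation"):
            cited = records.get(citation.get("ref"))
            if cited is None or cited is root:
                continue  # cited article is outside the collection, or self-cited
            already = any(
                f.get("from") == citing_id for f in cited.iter("forward_ref")
            )
            if not already:
                ET.SubElement(cited, "forward_ref", {"from": citing_id})
                added += 1
    return added


records = {
    "A": ET.fromstring(
        '<metadata><citation ref="B"/><citation ref="X"/></metadata>'
    ),
    "B": ET.fromstring("<metadata/>"),
}
first_pass = insert_forward_refs(records)
second_pass = insert_forward_refs(records)  # nothing new to add
print(ET.tostring(records["B"], encoding="unicode"))
```

Running the step twice adds nothing the second time, which matters when metadata files are regenerated as new articles enter the collection.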