Illinois D-Lib Testbed: Technologies for Converting Legacy Mathematics for Display on the Web

1
Illinois D-Lib Testbed: Technologies for
Converting Legacy Mathematics for Display on
the Web
  • Timothy W. Cole
  • Thomas G. Habing
  • William H. Mischo
  • Grainger Engineering Library Information Center
  • University of Illinois at Urbana-Champaign
  • http://dli.grainger.uiuc.edu/Publications/MathMLConf/
  • thabing@uiuc.edu

2
Project Background & Objectives
  • Funded 1994-98 under DLI-I (NSF, DARPA, & NASA)
  • Continued 1998-2001 under CNRI's D-Lib Test Suite
  • Objectives
  • Construct Large-Scale, Multipublisher,
    Markup-Based Full-Text Journal Testbed.
  • Investigate Processing, Indexing, Normalization,
    Retrieval, Rendering and Linking.
  • Study End-User Searching Behavior and Needs.
  • Testbed contains 60,000 Articles from 50 Journal
    Titles
  • Received as SGML (various DTDs), converted to XML
  • Content support from AIP, APS, ASCE, IEE, ASM,
    ACM, Elsevier
  • Additional support from IEEE, NRL, NTT Learning
    Systems

3
Project Background (cont.)
  • Accomplishments
  • Process & Retrieve from Multiple Publishers &
    Heterogeneous DTDs.
  • SGML to XML Conversion.
  • Metadata Extraction, Representation, & Merging.
  • Dynamic Linking: Forward/Backward, from/to A & I
    DBs.
  • Current Investigations
  • Mathematics Markup & Rendering Issues
  • Metadata Harvesting: Replicative & Distributed
  • E-Journal Archiving
  • Local Resource Resolution
  • Asynchronous Searching of Multiple Resources

4
Converting Legacy Markup to MathML
  • Goal: Convert publisher-specific XML math markup
    to standard presentation MathML
  • Desired result: can then focus on a single
    rendering solution
  • Ground rules
  • Minimize need for human intervention
  • Utilize standards-based techniques (e.g., XSLT,
    JavaScript, DOM)
  • Embed MathML in full XML document
  • Validate success of conversion based on quality
    of presentation
  • Strive for consistency across MathML viewers
  • Scope
  • E.g., in 17,000 APS articles, > 2.3 M instances of
    math (100 K block)
  • http://dli.grainger.uiuc.edu/MathMLStyle/math_sample.htm

5
Mathematics Markup Transformations
  • Identify & translate mathematical character
    references
  • Identify & tokenize mathematical content
  • Recognize & transform mathematical markup (e.g.,
    embellishments, script & limit schemata, etc.)
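
The first two steps can be sketched briefly. This is a minimal Python illustration, not the project's code (the slides say the testbed did this work in JavaScript called from XSLT). The `GREEK` table, the function names, and the classification rule (numbers become `mn`, letters become `mi`, everything else becomes `mo`) are assumptions made for the example.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical lookup table: publisher Greek codes (as in <g>a</g>) to Unicode.
GREEK = {"a": "\u03b1", "b": "\u03b2", "g": "\u03b3"}

# A number with optional decimal part, a single letter, or any other non-space character.
TOKEN = re.compile(r"\d+(?:\.\d+)?|[^\W\d_]|\S")


def translate_greek(code):
    """Map each publisher Greek code to its Unicode character; pass unknown codes through."""
    return "".join(GREEK.get(ch, ch) for ch in code)


def tokenize(pcdata):
    """Split raw math text into presentation MathML token elements."""
    out = []
    for tok in TOKEN.findall(pcdata):
        if tok[0].isdigit():
            tag = "mn"   # numeric literal
        elif tok.isalpha():
            tag = "mi"   # identifier
        else:
            tag = "mo"   # operator, fence, or other symbol
        out.append(f"<{tag}>{escape(tok)}</{tag}>")
    return "".join(out)


print(tokenize(translate_greek("a") + "=2"))
```

A real tokenizer would also need rules for multi-letter function names such as "sin" and for publisher-specific entities, which this sketch leaves out.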

Presentational MathML:
    <math xmlns="http://www.w3.org/">
      <msubsup>
        <mrow><mi>&alpha;</mi></mrow>
        <mrow><mi>i</mi></mrow>
        <mrow><mn>2</mn></mrow>
      </msubsup>
    </math>
ISO 12083 Math:
    <dformula><g>a</g><sup>2</sup><inf>i</inf></dformula>

6
Approach & Algorithm
  • For each XML document: identify mathematical
    nodes (e.g., <dformula>, <formula>)
  • Recursively apply templates to every child node
    within mathematical nodes
  • Look up entities & special characters and convert
    to appropriate MathML characters & tokenize
    (JavaScript)
  • Tokenize remaining PCDATA (JavaScript)
  • Convert postfix markup to MathML (e.g., <sup>,
    <inf>)
  • Re-tag one-to-one transformations (e.g., <sum>,
    <ul>, <ll>)
  • Transformed mathematical nodes (<math>) replace
    original mathematical nodes in document
  • Include default namespace attribute
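
The outer loop of this algorithm (find the mathematical nodes, convert each one, and put the namespaced result back in its place) might look like the following Python sketch. It uses the standard library's ElementTree rather than the XSLT the slides describe. `convert_math` is a hypothetical placeholder for the recursive template step, and `replace_math_nodes` is likewise a name made up for the example. The namespace URI is the standard MathML one.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"
MATH_TAGS = {"dformula", "formula"}  # mathematical node names given on the slide


def convert_math(node):
    """Hypothetical stand-in for the real conversion: wraps the node's text in one <mi>."""
    math = ET.Element(f"{{{MATHML_NS}}}math")
    mi = ET.SubElement(math, f"{{{MATHML_NS}}}mi")
    mi.text = "".join(node.itertext()).strip()
    math.tail = node.tail  # keep the prose that follows the formula
    return math


def replace_math_nodes(parent):
    """Replace every mathematical node below `parent` in place; return how many were replaced."""
    count = 0
    for index, child in enumerate(list(parent)):
        if child.tag in MATH_TAGS:
            parent[index] = convert_math(child)
            count += 1
        else:
            count += replace_math_nodes(child)
    return count


doc = ET.fromstring(
    "<article><p>Energy <formula>E</formula> is conserved.</p>"
    "<dformula>x</dformula></article>"
)
print(replace_math_nodes(doc))
```

ElementTree keeps the namespace inside the tag name, so whether the serialized output carries a default `xmlns` attribute or a prefix depends on the serializer settings rather than on this code.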

7
Approach & Algorithm (cont.)
  • Illustrative XSLT

    <xsl:when test="sup or inf">
      <xsl:for-each select="child::node()">
        <xsl:choose>
          <xsl:when test="name(self::node())='sup' and name(following-sibling::node()[1])='inf'">
            <xsl:element name="msubsup" namespace="http://www.w3.org/">
              <xsl:element name="mrow" namespace="http://www.w3.org/">
                <xsl:apply-templates select="preceding-sibling::node()[1]"/>
              </xsl:element>
              <xsl:element name="mrow" namespace="http://www.w3.org/">
                <xsl:apply-templates select="following-sibling::node()[1]"/>
              </xsl:element>
              <xsl:element name="mrow" namespace="http://www.w3.org/">
                <xsl:apply-templates select="self::node()"/>
              </xsl:element>
            </xsl:element>
          </xsl:when>
          . . . THERE ARE FOUR MORE CASES TO HANDLE !
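
For comparison, the same postfix-to-MathML rewrite can be sketched in Python over an already tokenized child list. This is an illustration under assumptions, not the stylesheet's logic: the slide does not list the remaining cases, so the ones handled here (`<sup>` then `<inf>`, `<inf>` then `<sup>`, either one alone, and plain pass-through) are a guess, and `convert_scripts` is a hypothetical name. The namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET

SCRIPT_TAGS = ("sup", "inf")


def convert_scripts(children):
    """Turn `base <sup>..</sup> <inf>..</inf>` sibling runs into MathML script elements."""
    out, i = [], 0
    while i < len(children):
        base = children[i]
        i += 1
        scripts = {}
        # Gather at most one <sup> and one <inf> directly after the base, in either order.
        while (i < len(children) and children[i].tag in SCRIPT_TAGS
               and children[i].tag not in scripts):
            scripts[children[i].tag] = children[i]
            i += 1
        if not scripts:
            out.append(base)  # ordinary node: pass through unchanged
            continue
        if len(scripts) == 2:
            name = "msubsup"
        else:
            name = "msup" if "sup" in scripts else "msub"
        wrapper = ET.Element(name)
        ET.SubElement(wrapper, "mrow").append(base)
        # MathML expects the subscript before the superscript.
        for tag in ("inf", "sup"):
            if tag in scripts:
                ET.SubElement(wrapper, "mrow").extend(list(scripts[tag]))
        out.append(wrapper)
    return out


src = ET.fromstring(
    "<dformula><mi>\u03b1</mi><sup><mn>2</mn></sup><inf><mi>i</mi></inf></dformula>"
)
for element in convert_scripts(list(src)):
    print(ET.tostring(element, encoding="unicode"))
```

Like the XSLT fragment, the sketch emits the base first, then the `<inf>` content, then the `<sup>` content, each wrapped in its own `mrow`.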

8
Remaining Issues
  • JavaScript from within XSLT
  • Rely on MS-specific mechanisms to invoke
    extension functions
  • Inconsistent Rendering by MathML Viewers
  • Validating against TechExplorer, Amaya, Mozilla,
    MS IE (w/ CSS)
  • Incomplete MathML implementations
  • Ambiguity & Overuse of <mrow>
  • Limited impact on appearance
  • Verbosity: 60% increase for inline, 15%
    increase for block
  • Character / glyph issues
  • STIX project / Unicode update will provide some
    relief
  • Automated Checking for Errors / Problems
  • Rendering System Performance

9
Status
  • Developing publisher-specific XSLT stylesheets
  • See sample transformed issue of Physical Review
    Letters
  • XSLT allows us to generate standard MathML from
    publisher-dependent SGML math markup
  • Moves customization to pre-processing stage
  • Allows for single, common rendering solution
  • MathML can be rendered in some browsers / tools
    without the need to style (Mozilla, techexplorer,
    Mathematica)