From DTD generation to XML conversion

1
From DTD generation to XML conversion
  • Uwe Müller, Humboldt University, Berlin
  • Electronic Publishing Group, u.mueller_at_cms.hu-berlin.de

2
Background
  • Humboldt University: 800 to 1,000 dissertations per year
  • Germany: duty to publish dissertations
  • traditional methods:
      • publishing house
      • microfiche
      • 40 to 200 printed copies (depending on faculty regulations)
  • Humboldt U.: not mandatory to submit an ETD
  • ¼ of dissertations published electronically
  • XML as central strategy

3
Why XML?
  • Standardized format
  • Long-term preservation
  • easily convertible to
      • presentation formats (HTML, PDF)
      • other XML structures
  • qualified full-text retrieval
  • contains structural and contextual information in a
    machine-readable format

[Diagram labels: Office document, XML, HTML, PDF, digital signature]

4
XML Restrictions to deal with
  • XML source does not contain layout information
  • rather linear structure
  • XML is not used as an authoring system
  • authors use their 'own' systems:
      • Microsoft Word
      • LaTeX
      • Open Office / Star Office
      • FrameMaker
      • WordPerfect

5
How to bridge the gap?
  • meet the authors where they are
  • instructions and guidelines for authors
  • usage of style files (e.g., dissertation-hu.dot)
  • manuals, support hotline, regular courses
  • different conversion processes:
      • SGML Author (plug-in for MS Word versions up to 97)
      • Open Office / Star Office
          • exploit genuine XML format
      • MS Office 2003
          • XML according to DiML DTD
  • common pitfalls: tables, pictures

7
Conversion Process Using Open Office
[Diagram labels: example.doc; Open Office; example.sxw (zip file); content.xml; .gif, .jpg; example_stl.xml; example.xml; front.xml, chapter1.xml, chapter2.xml, chapter3.xml; example.html; front.html, chapter1.html, chapter2.html, chapter3.html]
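
The step from example.sxw to content.xml works because an .sxw file is an ordinary zip archive. A minimal sketch in Python, using only the standard library, of how that one step might look. The function names and the tiny demo archive are illustrative assumptions and not part of the EDoc tool chain; a real content.xml uses the Open Office namespaces and element names, which are left out here.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def read_content_xml(sxw_bytes: bytes) -> ET.Element:
    """Return the parsed root element of content.xml inside an .sxw archive."""
    with zipfile.ZipFile(io.BytesIO(sxw_bytes)) as archive:
        with archive.open("content.xml") as member:
            return ET.parse(member).getroot()


def make_demo_sxw() -> bytes:
    """Build a tiny stand-in archive (hypothetical content, not a real thesis)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "content.xml",
            "<doc><p style='Heading 1'>Introduction</p></doc>",
        )
    return buffer.getvalue()


root = read_content_xml(make_demo_sxw())
first_paragraph = root.find("p")
print(first_paragraph.get("style"), "-", first_paragraph.text)
```

The pictures (.gif, .jpg) could presumably be pulled out of the same archive with `archive.namelist()` and `archive.read()`; that part is not shown.
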
9
Principal Structure of a DiML document
  • <etd>
      • <front>..title...author...abstract...</front>
      • <body>
          • <chapter>
              • <section>
                  • ...
      • </body>
      • <back>..bibliography...appendix...vita...</back>
  • </etd>

10
From flat structure to hierarchy
  • only two types of styles in Word:
      • paragraph styles
      • character styles
  • e.g., at the first occurrence of the Heading 1 paragraph style,
    the converter has to know:
      • Heading 1 is the beginning of a chapter
      • Heading 1 implies a head element
      • the element chapter can only occur in body
  • </front>
  • <body>
  • <chapter>
  • <head id="anyID">Introduction</head>
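
The rule set on this slide can be sketched as a small stack-based converter. This is a sketch under stated assumptions, not the converter used at the EDoc Server: the input is taken to be a flat list of (paragraph style, text) pairs, and the mapping of "Heading 2" to section as well as the style name "Standard" are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from Word paragraph styles to structure elements and depth.
STYLE_TO_ELEMENT = {"Heading 1": ("chapter", 1), "Heading 2": ("section", 2)}


def build_body(paragraphs):
    """Turn a flat list of (style, text) pairs into a nested <body> element."""
    body = ET.Element("body")
    stack = [(0, body)]  # (depth, element) of the currently open containers
    for style, text in paragraphs:
        if style in STYLE_TO_ELEMENT:
            name, depth = STYLE_TO_ELEMENT[style]
            # Close every open container at the same or a deeper level.
            while stack[-1][0] >= depth:
                stack.pop()
            container = ET.SubElement(stack[-1][1], name)
            # A heading style implies a head element inside the new container.
            ET.SubElement(container, "head").text = text
            stack.append((depth, container))
        else:
            # Any other style becomes a plain paragraph in the open container.
            ET.SubElement(stack[-1][1], "p").text = text
    return body


flat = [
    ("Heading 1", "Introduction"),
    ("Standard", "Some text."),
    ("Heading 2", "Motivation"),
    ("Standard", "More text."),
    ("Heading 1", "Methods"),
]
print(ET.tostring(build_body(flat), encoding="unicode"))
```

Because every "Heading 1" pops the stack back to body before opening a chapter, a chapter can only ever appear directly inside body, which mirrors the third rule above. Generating the id attribute of head and handling a "Heading 2" that appears before any "Heading 1" are left out of this sketch.
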

17
One Core, Multiple Views
  • HTML generation (static or dynamic)
      • performance problems with XSLT and huge documents
      • solution: division of XML sources into components
        (easier and faster to process)
  • PDF: Print on Demand (http://www.proprint-service.de)
  • Current problems:
      • changing Office systems and versions
      • ongoing implementations and adaptations necessary
      • but might be restricted to XSL coding
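
The division of a large XML source into components could look roughly like the following Python sketch. It assumes the front/body/chapter structure of a DiML document and borrows the component names front.xml and chapter1.xml, chapter2.xml, ... from the conversion diagram; the function name is hypothetical, and the slides do not say how the split is actually implemented.

```python
import xml.etree.ElementTree as ET


def split_into_components(etd_xml: str) -> dict:
    """Return a mapping of component file name to a standalone XML string."""
    root = ET.fromstring(etd_xml)
    components = {}
    front = root.find("front")
    if front is not None:
        components["front.xml"] = ET.tostring(front, encoding="unicode")
    # One component per chapter, numbered in document order.
    for number, chapter in enumerate(root.findall("body/chapter"), start=1):
        components[f"chapter{number}.xml"] = ET.tostring(chapter, encoding="unicode")
    return components


document = (
    "<etd>"
    "<front><title>Example</title></front>"
    "<body>"
    "<chapter><head>Introduction</head></chapter>"
    "<chapter><head>Methods</head></chapter>"
    "</body>"
    "</etd>"
)
for name, xml_text in split_into_components(document).items():
    print(name, len(xml_text), "characters")
```

Each component can then be transformed on its own, so a stylesheet never has to load the whole dissertation at once.
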

18
Towards a universal DTD?
  • DiML originally taken from an SGML DTD at
    Virginia Tech ("ETD"), http://edoc.hu-berlin.de/diml
      • already many elements (> 100)
      • combines elements of different description levels
  • extended and adapted to local needs
      • special requirements from several departments
        (e.g., literature / dramatics, humanities,
        geography, ...)
      • necessity to include external DTDs (e.g.,
        CALS-Table, MathML, MusicML, ...)
  • publication types other than theses and
    dissertations
      • conference proceedings, electronic journals,
        other series, ...
  • first approach: extend DTD aiming at a universal
    'mega' DTD
      • problems: complexity, difficult maintenance
  • other possibility: create a completely new DTD
    for each purpose
      • loss of interoperability

19
Modular DTD Approach
  • idea: individually adapted DTDs
  • split up DTD into modules, such as
      • text, structure, citation, dramatics
  • handle external DTDs as modules as well, e.g.,
      • MathML, MusicML, CALS-Table
  • recombine a DTD out of user-selected modules
  • result:
      • a DTD with only the needed elements and modules
      • individual reference and sample documents

20
Modular DTD Approach: Benefits
  • modules are easily maintainable
  • distributed development
  • version numbers for each module
  • reusability
  • define (several) styles for each module
  • reference information for each module
  • support different languages
  • get a DTD that exactly fits your needs

21
DTDSys: Principal Architecture
  • modules: small packages of elements that belong together
      • stored in separate files in the DTDBase
      • include metadata, e.g., descriptive information,
        version numbers, and dependences on other modules
  • DTDSys generates DTD and reference files using
      • XSL / XSLT
      • Java
      • Web Interfaces
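
To make the idea of a module file concrete, here is a small Python sketch that turns one module description into DTD element declarations. The module file format shown (a `module` element with `name` and `version` attributes, a `description`, and `element` children) is entirely hypothetical, since the slides do not show the DTDBase format, and Python is used only for illustration: DTDSys itself is described as using XSL/XSLT and Java.

```python
import xml.etree.ElementTree as ET

# Hypothetical module file; the real DTDBase format is not shown in the slides.
MODULE_TEXT = """
<module name="text" version="1.0">
  <description>Inline text markup</description>
  <element name="em" content="(#PCDATA)"/>
  <element name="strong" content="(#PCDATA)"/>
</module>
"""


def module_to_dtd(module_xml: str) -> str:
    """Render the element declarations of one module as DTD text."""
    module = ET.fromstring(module_xml)
    # Carry the module metadata into the DTD as a comment.
    lines = [f"<!-- module {module.get('name')} version {module.get('version')} -->"]
    for element in module.findall("element"):
        lines.append(f"<!ELEMENT {element.get('name')} {element.get('content')}>")
    return "\n".join(lines)


print(module_to_dtd(MODULE_TEXT))
```

Keeping the version number in the generated output is one possible way to trace which module versions a given DTD was built from.
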

22
Modules and Dependences
  • text: br, em, strong, sup, sub, u, tt, pre
  • common: p, head, caption, url, name, foreign
  • structure: chapter, section, subsection
  • citation: quotations and references
  • documents: page numbers, footnotes, endnotes, ...
  • diml: front, body, back, abstract
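
Recombining a DTD from user-selected modules requires pulling in every module the selection depends on. A minimal Python sketch of such a resolution step follows. The dependences listed in the table are assumptions made for the example, since the slide names the modules but not what each one requires, and `resolve` is a hypothetical helper rather than part of DTDSys.

```python
# Assumed dependences for illustration only; the slides do not list them.
DEPENDENCES = {
    "text": [],
    "common": ["text"],
    "structure": ["common"],
    "citation": ["common"],
    "documents": ["common"],
    "diml": ["structure", "citation"],
}


def resolve(selected, dependences):
    """Return the selected modules plus everything they require, dependencies first."""
    ordered = []
    visiting = set()  # modules on the current path, used to detect cycles

    def visit(name):
        if name in ordered:
            return
        if name in visiting:
            raise ValueError(f"circular dependence at module {name!r}")
        visiting.add(name)
        for required in dependences[name]:
            visit(required)
        visiting.discard(name)
        ordered.append(name)

    for name in selected:
        visit(name)
    return ordered


print(resolve(["structure"], DEPENDENCES))
print(resolve(["diml"], DEPENDENCES))
```

Emitting the modules in dependency order means that a declaration a module relies on has already been written to the DTD by the time that module is added.
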

23
DTD Generation Process
[Diagram labels: DTDBase (module-text.xml, ...); selection.xml; full-dtd.xml; XSL; xdiml.dtd; dependences.html; dtd-reference.xml (including element info, description, dependences); Java/XSL; reference: p.php, chapter.php]

24
Outlook
  • SCOPE: Service Core for Open Publishing
    Environments
      • development of Publication Components (authoring
        tools, conversion mechanisms, layout and style
        definitions)
      • management system to maintain versions and
        dependences
      • publication system
      • workflow component
  • Long Term Preservation activities
      • Implementation of OAIS reference model
      • Sun Center of Excellence

25
Thanks
  • to Sabine Henneberger, Jakob Voß, Matthias Schulz
  • Thank you!
  • Questions?
  • u.mueller_at_cms.hu-berlin.de
  • http://edoc.hu-berlin.de/