Title: Consistent Electronic Publishing from Inconsistent Sources


1
Consistent Electronic Publishing from
Inconsistent Sources
  • Philip Mansfield
  • Yuri Khramov
  • Ahmet Gurcan
  • Nov 17, XML 2004 Conference, Washington, DC

2
Advantages of XML-Based Content Management
  • Accessibility
  • Consistent, organization-wide standards
  • Integrate well with others
  • Leverage widely-available XML tools
  • Easily validate, re-purpose, combine, transform,
    style, search, and render

3
Fantasy
  • If I can just get my whole organization using
    the same XML-based authoring tools in the same
    way, then I will be able to build the ultimate
    enterprise-wide content management solution.

4
Reality
  • Every division acts autonomously
  • People do not always read instructions, let alone
    follow them
  • Native authoring tool formats continue to be
    binary, not XML
  • Tool vendors preserve market share by making it
    difficult to migrate

5
Up-Conversion
  • Popular document formats do not encode structure
    at a useful semantic level
  • e.g. HR department wants name, address, past
    jobs, degrees, publications, etc. from resumes.
    Yet these categories of text are
    indistinguishable in the submitted
    word-processing documents and PDF files
  • Up-conversion is needed
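As a rough illustration of what up-conversion means for the resume example, the sketch below turns flat text into XML with one element per recognized section. The heading names, the element names and the rule that the first line is the candidate's name are all assumptions made for this example; real resumes would need far more robust heuristics.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical mapping from section headings to semantic element names.
HEADINGS = {
    "address": "address",
    "experience": "past-jobs",
    "education": "degrees",
    "publications": "publications",
}

def up_convert(text: str) -> ET.Element:
    """Turn flat resume text into XML with one element per recognized section."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    resume = ET.Element("resume")
    # Assumption: the first non-blank line is the candidate's name.
    ET.SubElement(resume, "name").text = lines[0]
    current = None
    for line in lines[1:]:
        key = re.sub(r"[^a-z]", "", line.lower())
        if key in HEADINGS:
            # A recognized heading opens a new semantic section.
            current = ET.SubElement(resume, HEADINGS[key])
        elif current is not None:
            # Any other line becomes an item of the current section.
            ET.SubElement(current, "item").text = line
    return resume

# Made-up sample input, for illustration only.
SAMPLE = """Jane Doe
Address
12 Example Street, Springfield
Experience
Editor, Example Press
Education
BSc, Example University
"""

resume = up_convert(SAMPLE)
print(ET.tostring(resume, encoding="unicode"))
```

Once the categories are explicit elements, a query such as `resume.find("past-jobs")` can retrieve what was indistinguishable in the source text.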

6
Requirements
  • An effective content management solution
      • should not assume anything about source formats
      • should be highly flexible, adaptable and
        re-configurable over time
      • should have strong up-conversion capabilities

7
Solution
  • Assemble content management solutions from a
    toolkit of useful components, not a monolithic
    application
  • Use a pipeline architecture to manage data flow
    and order of execution of components
  • The primary function of a component is to convert
    from one format to another
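A minimal sketch of the idea, assuming each component is simply a function from one representation to another and the pipeline fixes the order of execution. The three converters (CSV text to rows, rows to XML, XML to HTML) are hypothetical stand-ins, not components from the presentation.

```python
from functools import reduce
from typing import Callable
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# A component converts one representation into another.
Component = Callable[[object], object]

def pipeline(*components: Component) -> Component:
    """Compose components so that each one's output feeds the next."""
    def run(source: object) -> object:
        return reduce(lambda data, step: step(data), components, source)
    return run

def csv_to_rows(text: str) -> list[list[str]]:
    """Hypothetical extractor: CSV text to a list of rows."""
    return [line.split(",") for line in text.strip().splitlines()]

def rows_to_xml(rows: list[list[str]]) -> ET.Element:
    """Hypothetical converter: rows to a simple XML table."""
    root = ET.Element("table")
    for row in rows:
        row_el = ET.SubElement(root, "row")
        for cell in row:
            ET.SubElement(row_el, "cell").text = cell
    return root

def xml_to_html(root: ET.Element) -> str:
    """Hypothetical publisher: XML table to an HTML string."""
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(c.text or '')}</td>" for c in row) + "</tr>"
        for row in root
    )
    return f"<table>{body}</table>"

publish = pipeline(csv_to_rows, rows_to_xml, xml_to_html)
html = publish("a,b\n1,2\n")
print(html)
```

Because the pipeline only knows that components are callables, any stage can be swapped or reordered without touching the others.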

8
Phases of Processing
  • Extract
      • Import all documents and data to XML
  • Synthesize
      • Up-convert and transform to a more useful form of
        XML
  • Publish
      • Export for the Web or print

9
Cocoon
  • The Apache Cocoon project provides one possible
    framework for pipelining conversion components,
    tailored for Web publishing
  • Pipelines are defined in sitemap files
  • Components can generate, transform or serialize
    XML
  • A Cocoon-based content management system called
    Lenya is under development
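The division of labor among generators, transformers and serializers can be imitated in a few lines of Python. This is a loose analogy only: it does not use Cocoon's API or its sitemap syntax, and the `SITEMAP` table, the pattern strings and all function names are invented for illustration.

```python
import fnmatch
import xml.etree.ElementTree as ET

def generate_page(request_path: str) -> ET.Element:
    """Generator: produce XML for a request."""
    root = ET.Element("page")
    ET.SubElement(root, "title").text = request_path
    return root

def uppercase_title(root: ET.Element) -> ET.Element:
    """Transformer: XML in, XML out."""
    title = root.find("title")
    title.text = title.text.upper()
    return root

def serialize_xml(root: ET.Element) -> str:
    """Serializer: XML in, text out."""
    return ET.tostring(root, encoding="unicode")

# Hypothetical stand-in for a sitemap: each entry pairs a request pattern
# with a pipeline of (generator, transformers, serializer).
SITEMAP = [
    ("reports/*", (generate_page, [uppercase_title], serialize_xml)),
    ("*", (generate_page, [], serialize_xml)),
]

def handle(request_path: str) -> str:
    """Run the first pipeline whose pattern matches the request."""
    for pattern, (generator, transformers, serializer) in SITEMAP:
        if fnmatch.fnmatchcase(request_path, pattern):
            doc = generator(request_path)
            for transform in transformers:
                doc = transform(doc)
            return serializer(doc)
    raise LookupError(request_path)

print(handle("reports/q3"))
print(handle("index"))
```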

10
Use Case: Data-Driven Graphics
  • A business reporting system requires graphs and
    charts to be drawn on the fly from current data
  • Data is variously available in Microsoft Excel
    files, database tables and XML

11
Use Case: Online Newspapers
  • Articles from newspapers around the world are to
    be published as HTML pages
  • Newspapers are available in PDF form
  • The PDF file is to be augmented with hyperlinks
    to the HTML page for each article
  • The articles are to be indexed by title, author,
    section, etc.

12
Use Case: Conference Proceedings
  • Publish the XML 2004 conference papers as HTML,
    PDF and SVG
  • Source documents are Microsoft Word files and
    DocBook XML files with links to SVG, PNG, GIF,
    JPEG and BMP files
  • Index the papers by author, city, country,
    keyword, organization, time, title and track
  • Provide cross-reference hyperlinks throughout
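Indexing by several fields can be sketched as one generic function applied per field. The records below are placeholder data, not actual conference papers, and the field names are assumptions chosen to mirror the list above.

```python
from collections import defaultdict

# Placeholder records for illustration only.
PAPERS = [
    {"title": "Paper A", "author": ["Author One", "Author Two"], "track": "Publishing"},
    {"title": "Paper B", "author": ["Author One"], "track": "Core XML"},
]

def build_index(papers: list[dict], field: str) -> dict[str, list[str]]:
    """Map each value of `field` to the sorted titles of the papers that carry it."""
    index = defaultdict(list)
    for paper in papers:
        values = paper[field]
        if isinstance(values, str):
            values = [values]  # treat single-valued fields like one-item lists
        for value in values:
            index[value].append(paper["title"])
    return {key: sorted(titles) for key, titles in sorted(index.items())}

by_author = build_index(PAPERS, "author")
by_track = build_index(PAPERS, "track")
print(by_author)
print(by_track)
```

Each index entry could then be rendered as a list of hyperlinks to the published papers.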

13
Extract: Pattern 1
  [Diagram; labels in slide order: XML document, generator, binary data]

14
Extract: Pattern 2
  [Diagram; labels in slide order: XML document, query, database, XML document, generator]

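One way to read the query, database and generator labels of this pattern is a generator that runs a query and emits the result set as XML. The sketch below assumes that reading; it uses an in-memory SQLite database with a made-up `sales` table, and it assumes column names are valid XML element names.

```python
import sqlite3
import xml.etree.ElementTree as ET

def generate_xml(conn: sqlite3.Connection, query: str,
                 root_tag: str = "rows", row_tag: str = "row") -> ET.Element:
    """Run a query and wrap each result row in XML, one child element per column."""
    cursor = conn.execute(query)
    columns = [description[0] for description in cursor.description]
    root = ET.Element(root_tag)
    for record in cursor:
        row = ET.SubElement(root, row_tag)
        for column, value in zip(columns, record):
            # Assumption: column names are usable as element names.
            ET.SubElement(row, column).text = "" if value is None else str(value)
    return root

# Made-up table and figures, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, total INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("north", 120), ("south", 95)])

doc = generate_xml(conn, "SELECT region, total FROM sales ORDER BY region")
xml_text = ET.tostring(doc, encoding="unicode")
print(xml_text)
```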
15
Synthesize: Pattern 1
  [Diagram; labels in slide order: XML document, profile, XML document, transformer, XML document]

16
Synthesize: Pattern 2
  [Diagram; labels in slide order: XSLT generator, XML document, XSLT transformer, XML document]

17
Synthesize: Pattern 3
  [Diagram; labels in slide order: serializer, XML document, "compiles", XML document, transformer, XML document]

18
Synthesize: Pattern 4
  [Diagram; labels in slide order: XML document, transformer, batch process, binary data, serializer, transformer, XML document]

19
Publish: Pattern 1
  [Diagram; labels in slide order: XML document, XSLT, binary data, XSL-FO serializer, XSLT transformer, XML document, PDF]

20
Publish: Pattern 2
  [Diagram; labels in slide order: XML document, XSLT, XML document, XML serializer, XSLT transformer, XML document, XHTML, SVG]

21
Demonstrations
  • Data-driven graphics
  • Online newspapers
  • Conference proceedings

22
References
  • Cocoon Project: http://cocoon.apache.org/
  • SchemaSoft Catwalk: http://www.schemasoft.com/catwalk/
  • NewspaperDirect: http://www.newspaperdirect.com
  • SchemaSoft Word to DocBook Converter: http://www.schemasoft.com/DocBook/
  • XML 2004 Proceedings: http://www.idealliance.org/proceedings/xml04/

23
http://www.schemasoft.com