Processing of structured documents

Provided by: helenaah

1
Processing of structured documents
2
XSL Formatting model
  • An XSL stylesheet processor accepts a document in
    XML and an XSL stylesheet and produces the
    presentation of that XML source content that was
    intended by the designer of that stylesheet
  • two parts:
  • tree transformation: constructing a result tree
    from the XML source
  • formatting: interpreting the result tree to
    produce formatted results suitable for
    presentation on a display, on paper, in speech,
    or onto other media

3
XSL formatting process
  • XSLT -> element and attribute tree
  • objectify: formatting object tree
  • refinement
  • area tree
  • rendering

4
Tree transformation
  • Allows the structure of the result tree to be
    significantly different from the structure of the
    source tree
  • one could add a table of contents (a filtered
    selection of the original source document)
  • one could rearrange source data into a sorted
    tabular presentation
  • in constructing the result tree, the tree
    transformation process also adds the information
    necessary to format the result tree
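
The rearrangement mentioned above can be sketched in a few lines. This is only an illustration written in Python rather than XSLT, and it assumes a hypothetical source document: the element names parts, part, table, row and cell are invented for the example and do not come from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document (not from the slides).
SOURCE = """<parts>
  <part name="washer" qty="7"/>
  <part name="bolt" qty="3"/>
  <part name="nut" qty="12"/>
</parts>"""

def transform(source_root):
    """Build a result tree whose structure differs from the source tree:
    the parts are sorted by name and re-expressed as table rows."""
    table = ET.Element("table")
    for part in sorted(source_root.iter("part"), key=lambda p: p.get("name")):
        row = ET.SubElement(table, "row")
        ET.SubElement(row, "cell").text = part.get("name")
        ET.SubElement(row, "cell").text = part.get("qty")
    return table

result = transform(ET.fromstring(SOURCE))
names = [row[0].text for row in result]
print(names)
```

A real XSL processor would express the same idea declaratively, with templates and a sort instruction, instead of an explicit loop.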

5
Formatting
  • Formatting is enabled by including formatting
    semantics in the result tree (cf. CSS semantics)
  • formatting semantics are expressed in terms of
    classes of formatting objects
  • the nodes of the result tree are formatting
    objects
  • the classes of formatting objects denote
  • typographic abstractions such as page,
    paragraph, and table
  • finer control: formatting properties
  • indenting control, word- and letter-spacing,
    widow, orphan, and hyphenation control

6
In tree transformation
  • Tree transformation constructs the result tree
  • in XSL, the result tree is called the element and
    attribute tree
  • in the tree, the objects are primarily in the
    formatting objects namespace (fo)
  • a formatting object is represented as an XML
    element, with the formatting properties
    represented by a set of XML attribute-value pairs
  • the content of the formatting object is the
    content of the XML element
  • transformations are defined in the XSLT spec
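
As a rough sketch of what "a formatting object is an XML element whose properties are attributes" means in practice, the following Python fragment builds a single fo:block element. The helper name fo and the chosen property values are assumptions made for the example; the namespace URI is the one used for XSL formatting objects.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"  # XSL formatting objects namespace
ET.register_namespace("fo", FO_NS)

def fo(name, **properties):
    """Create a formatting object element (hypothetical helper).
    Keyword names use '_' where the property name has '-'."""
    attrs = {key.replace("_", "-"): value for key, value in properties.items()}
    return ET.Element(f"{{{FO_NS}}}{name}", attrs)

# The formatting properties become attribute-value pairs,
# and the content of the object is the content of the element.
block = fo("block", font_size="12pt", space_before="6pt")
block.text = "Processing of structured documents"

serialized = ET.tostring(block, encoding="unicode")
print(serialized)
```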

7
Formatting objects
  • Formatting interprets the result tree, in its
    formatting object tree form, to produce the
    presentation
  • Semantically, each formatting object represents a
    specification for a part of the pagination,
    layout and styling information that will be
    applied to the content of that formatting object
    as a result of formatting the whole result tree
  • e.g. the block formatting object represents the
    breaking of the content of a paragraph into lines
  • formatting of a paragraph also depends on the
    layout structure (i.e. aspects not defined in
    the block formatting object)

8
Formatting objects
  • Block-level objects, inline-level objects
  • refer to the types of areas that are generated
  • areas refer to their default placement method
  • Inline areas are collected into lines
  • stacking direction: inline-progression-direction
    (in Western writing systems, left-to-right)
  • Lines are block-level
  • stacking direction: block-progression-direction
    (in Western writing systems, top-to-bottom)
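
The two stacking directions can be illustrated with a toy layout routine. This is a minimal sketch, not how an XSL formatter works: it assumes that every character is one unit wide, that words are the only inline areas, and that lines are broken greedily. The function names are invented for the example.

```python
def break_into_lines(words, line_width):
    """Collect inline areas (words) into lines, left to right."""
    lines, current = [], []
    used = 0
    for word in words:
        needed = len(word) if not current else used + 1 + len(word)
        if current and needed > line_width:
            lines.append(current)          # the line is full, start a new one
            current, needed = [], len(word)
        current.append(word)
        used = needed
    if current:
        lines.append(current)
    return lines

def stack_lines(lines, line_height):
    """Stack the line areas in the block-progression direction (top to bottom).
    Returns (vertical offset, line text) pairs."""
    return [(index * line_height, " ".join(line)) for index, line in enumerate(lines)]

lines = break_into_lines("inline areas are collected into lines".split(), 16)
placed = stack_lines(lines, line_height=14)
for offset, text in placed:
    print(offset, text)
```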

9
Formatting properties
  • The formatting properties associated with an
    instance of a formatting object control the
    formatting of that object
  • CSS properties are included in the formatting
    properties
  • some of the properties, e.g., color, directly
    specify the formatted result
  • other properties, e.g., space-before, only
    constrain the set of possible formatted results
    without specifying any particular formatted
    result

10
Refinement
  • Refinement is a computational process which
    finalizes the specification of properties based
    on the attribute values in the XML result tree
  • refinement involves
  • propagating the inherited values of properties
  • evaluating expressions in property value
    specifications into actual values
  • converting relative numerics to absolute numerics
  • constructing some composite properties from more
    than one attribute
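
Two of the refinement steps listed above, inheritance and the conversion of relative values to absolute ones, can be sketched for a single property. The tree representation (nested dicts), the function name refine and the 12pt initial value are assumptions made for this example; a real formatter handles many more properties and units.

```python
def refine(node, inherited=None):
    """Resolve font-size on a toy formatting object tree.

    node is a dict: {"props": {...}, "children": [...]}.
    font-size is inherited when absent; percentages and em units
    are converted to absolute points relative to the parent size.
    """
    parent_size = inherited if inherited is not None else 12.0  # assumed initial value
    spec = node["props"].get("font-size")
    if spec is None:
        size = parent_size                       # propagate the inherited value
    elif spec.endswith("%"):
        size = parent_size * float(spec[:-1]) / 100
    elif spec.endswith("em"):
        size = parent_size * float(spec[:-2])
    elif spec.endswith("pt"):
        size = float(spec[:-2])                  # already absolute
    else:
        raise ValueError(f"unsupported font-size: {spec}")
    return {"font-size-pt": size,
            "children": [refine(child, size) for child in node.get("children", [])]}

tree = {"props": {"font-size": "10pt"},
        "children": [{"props": {"font-size": "150%"},
                      "children": [{"props": {}}]},
                     {"props": {"font-size": "2em"}}]}
refined = refine(tree)
```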

11
Area tree
  • Formatting consists of the generation of a tree
    of geometric areas, called the area tree
  • the areas are positioned on a sequence of one or
    more pages
  • each area has
  • a position on the page
  • a specification of what to display in that area
  • may have a background, padding and borders
  • areas may be nested
  • a character within a line, within a block, within
    a page

12
Area tree
  • As a general rule, the order of the area tree
    parallels the order of the formatting object tree
  • if one formatting object precedes another in the
    depth-first traversal of the formatting object
    tree, with neither containing the other, then all
    the areas generated by the first will precede all
    the areas generated by the second (in the
    depth-first traversal of the area tree), unless
    otherwise specified
  • typical exceptions: side floats, footnotes

13
Refinement
  • Some of the refinement operations (particularly
    evaluating expressions) depend on knowledge of
    the area tree
  • thus refinement is not necessarily a
    straightforward, sequential procedure
  • may involve look-ahead, backtracking, or
    control-splicing with other processes in the
    formatter
  • constraints may conflict
  • it is implementation-defined which constraints
    should be relaxed and in what order to satisfy
    the others

14
Rendering
  • Rendering takes the area tree (the abstract
    model of the presentation in terms of pages and
    their collections of areas) and
  • causes a presentation to appear on the relevant
    medium
  • e.g., a browser window on a computer display
    screen or sheets of paper

15
Alternatives for XML formatting
  • XSLT transformation produces HTML
  • CSS stylesheet attached to XML document
  • XSLT transformation makes structural changes and
    attaches a CSS stylesheet to the result
  • XSLT transformation produces formatting objects
  • e.g. FOP can make a conversion to PDF
  • XSmiles editor (HUT)
  • See
  • https://www.cs.helsinki.fi/I/hahonen/rado/style_ex.html
    (link from the course material page)

16
XML Namespaces
  • How do XML namespaces help to modularize and reuse
    existing definitions?
  • Documents (or their structure definitions,
    processing applications, etc.) are not always
    created from scratch, but more and more existing
    definitions are reused and combined
  • extremely important especially in E-commerce and
    other data interchange
  • agreement of common vocabularies

17
Author A writes a document
<?xml version="1.0"?>
<references>
  <name>Macmillan</name>
  <link href="http://www.mcp.com"/>
  <name>ABC News</name>
  <link href="http://www.abcnews.com"/>
</references>
18
Author B adds some rating.
<?xml version="1.0"?>
<references>
  <name>Macmillan</name>
  <link href="http://www.mcp.com"/>
  <rating>5 stars</rating>
  <name>ABC News</name>
  <link href="http://www.abcnews.com"/>
  <rating>3 stars</rating>
</references>
19
Also Author C wants to add some rating...
<?xml version="1.0"?>
<references>
  <name>Macmillan</name>
  <link href="http://www.mcp.com"/>
  <rating>G</rating>
  <name>ABC News</name>
  <link href="http://www.abcnews.com"/>
  <rating>PG</rating>
</references>
20
Author D would like to combine the documents...
<?xml version="1.0"?>
<references>
  <name>Macmillan</name>
  <link href="http://www.mcp.com"/>
  <rating>5 stars</rating>
  <rating>G</rating>
  <name>ABC News</name>
  <link href="http://www.abcnews.com"/>
  <rating>3 stars</rating>
  <rating>PG</rating>
</references>
21
Which rating? -> different names
<?xml version="1.0"?>
<references>
  <name>Macmillan</name>
  <link href="http://www.mcp.com"/>
  <qa-rating>5 stars</qa-rating>
  <pa-rating>G</pa-rating>
  <name>ABC News</name>
  <link href="http://www.abcnews.com"/>
  <qa-rating>3 stars</qa-rating>
  <pa-rating>PG</pa-rating>
</references>
22
Namespaces give a disciplined method for naming
<?xml version="1.0"?>
<references
    xmlns:qa="http://joker.com/2000/star-rating"
    xmlns:pa="http://penguin.xmli.com/2000/review"
    xmlns="http://pineapplesoft.com/1999/ref">
  <name>Macmillan</name>
  <link href="http://www.mcp.com"/>
  <qa:rating>5 stars</qa:rating>
  <pa:rating>G</pa:rating>
  ...
</references>
23
Namespaces
  • xmlns:qa="http://joker.com/2000/star-rating"
  • qa: the prefix
  • http://joker.com/2000/star-rating
  • the namespace
  • a unique name (the URI guarantees uniqueness); there
    is no need to retrieve anything from the address
  • xmlns="http://pineapplesoft.com/1999/ref"
  • the default namespace
  • elements without prefixes belong to this
    namespace
  • references, name, link

24
Namespaces
  • qa:rating
  • a qualified name (QName)
  • scoping
  • The namespace is valid for the element where it
    is declared and all the elements within its
    content
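
A namespace-aware parser makes the difference between the two rating elements visible. The sketch below uses Python's standard xml.etree.ElementTree, which reports each element name as {namespace URI}local-name. The document is the combined example with only the first entry kept; the prefix map NS is local to the script and its prefixes do not have to match those used in the document.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<references xmlns:qa="http://joker.com/2000/star-rating"
            xmlns:pa="http://penguin.xmli.com/2000/review"
            xmlns="http://pineapplesoft.com/1999/ref">
  <name>Macmillan</name>
  <link href="http://www.mcp.com"/>
  <qa:rating>5 stars</qa:rating>
  <pa:rating>G</pa:rating>
</references>"""

root = ET.fromstring(DOC)

# Prefix map used only for the queries below.
NS = {"ref": "http://pineapplesoft.com/1999/ref",
      "qa": "http://joker.com/2000/star-rating",
      "pa": "http://penguin.xmli.com/2000/review"}

quality = root.find("qa:rating", NS).text     # the star rating
parental = root.find("pa:rating", NS).text    # the review rating
tags = [child.tag for child in root]
print(root.tag)
print(tags)
```

Elements without a prefix (references, name, link) end up in the default namespace, so root.tag carries the pineapplesoft.com URI.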

25
Scoping
<?xml version="1.0"?>
<ref:references xmlns:ref="http://pineapplesoft.com/1999/ref">
  <ref:name>Macmillan</ref:name>
  <ref:link href="http://www.mcp.com"/>
  <pa:rating xmlns:pa="http://penguin.xmli.com/2000/review">G</pa:rating>
  <ref:name>ABC News</ref:name>
  <ref:link href="http://www.abcnews.com"/>
  <qa:rating xmlns:qa="http://joker.com/2000/star-rating">3 stars</qa:rating>
</ref:references>
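
The scoping rule can be checked with a namespace-aware parser: a prefix declared on one element is not visible to its siblings. The following sketch uses Python's xml.etree.ElementTree, which raises ParseError for an unbound prefix. The two test documents are shortened variants of the example and the function name is invented.

```python
import xml.etree.ElementTree as ET

IN_SCOPE = """<ref:references xmlns:ref="http://pineapplesoft.com/1999/ref">
  <qa:rating xmlns:qa="http://joker.com/2000/star-rating">3 stars</qa:rating>
</ref:references>"""

# The second qa:rating is a sibling of the element that declares qa,
# so the prefix is outside its scope there.
OUT_OF_SCOPE = """<ref:references xmlns:ref="http://pineapplesoft.com/1999/ref">
  <qa:rating xmlns:qa="http://joker.com/2000/star-rating">3 stars</qa:rating>
  <qa:rating>5 stars</qa:rating>
</ref:references>"""

def is_namespace_well_formed(text):
    """Return True if every prefix used in the document is in scope."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_namespace_well_formed(IN_SCOPE))
print(is_namespace_well_formed(OUT_OF_SCOPE))
```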
26
Namespaces and DTD
  • XML 1.0 DTDs are not namespace-aware
  • all the elements and attributes that are in some
    namespace have to be declared using the
    corresponding prefix
  • for elements with the prefix pre
  • an attribute xmlns:pre has to be declared
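
The consequence is that a DTD is tied to one particular choice of prefix, while a namespace-aware processor is not. A small sketch in Python illustrates the contrast; the prefix r is an arbitrary alternative chosen for the example.

```python
import xml.etree.ElementTree as ET

WITH_REF = '<ref:name xmlns:ref="http://pineapplesoft.com/1999/ref">Macmillan</ref:name>'
WITH_R = '<r:name xmlns:r="http://pineapplesoft.com/1999/ref">Macmillan</r:name>'

# A namespace-aware parser compares expanded names (URI + local name),
# so the two elements have the same name.
same_expanded_name = ET.fromstring(WITH_REF).tag == ET.fromstring(WITH_R).tag

# A DTD only sees the literal element type names, prefix included,
# so a declaration for ref:name does not cover r:name.
same_dtd_name = "ref:name" == "r:name"

print(same_expanded_name, same_dtd_name)
```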

27
Namespaces and DTD
<?xml version="1.0"?>
<!DOCTYPE ref:references [
<!ELEMENT ref:references (ref:name, ref:link, (pa:rating | qa:rating))+>
<!ATTLIST ref:references xmlns:ref CDATA #REQUIRED>
<!ELEMENT ref:name (#PCDATA)>
<!ELEMENT ref:link EMPTY>
<!ATTLIST ref:link href CDATA #REQUIRED>
<!ELEMENT pa:rating (#PCDATA)>
<!ATTLIST pa:rating xmlns:pa CDATA #REQUIRED>
<!ELEMENT qa:rating (#PCDATA)>
<!ATTLIST qa:rating xmlns:qa CDATA #REQUIRED>
]>
28
DTD external and internal subsets
  • the external and internal subsets together make up
    the DTD; the internal subset has higher precedence
  • syntax
  • <!DOCTYPE root-type-name SYSTEM "ex.dtd" [
      <!-- the external subset is in the file ex.dtd -->
      <!-- the internal subset may come here -->
    ]>
  • internal subset may declare new elements (with
    attributes) or new attributes for existing
    elements
  • namespaces facilitate the control of name
    conflicts

29
Namespaces and XML Schema
  • An XML Schema document contains declarations of
    namespaces that are used in the document
  • e.g. xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    for the elements with special XML Schema
    semantics
  • Target namespace: the definitions included in
    this schema define this namespace
  • targetNamespace="uri:mywork"

30
Namespaces and XML Schema
  • In XML Schema, schema components from different
    target namespaces can be used together
  • -> enables the schema validation of instance
    content defined across multiple namespaces

31
Importing schema declarations
  • Every top-level schema component is associated
    with a target namespace (or, explicitly, with
    none, if the target namespace is not defined)
  • a component may refer to another component that
    is in a different namespace, using an import
    element

32
Import
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:html="http://www.w3.org/1999/xhtml"
        targetNamespace="uri:mywork"
        xmlns:my="uri:mywork">
  <import namespace="http://www.w3.org/1999/xhtml"/>
  <complexType name="myType">
    <sequence>
      <element ref="html:p" minOccurs="0"/>
    </sequence>
  </complexType>
  <element name="myElt" type="my:myType"/>
</schema>
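
In the schema above, the reference html:p is a qualified name inside an attribute value, so a schema processor has to resolve the prefix itself and check that the namespace has been imported. The following Python sketch shows that resolution step with xml.etree.ElementTree. It is not a schema processor: the function name resolve is invented, and prefix scoping is ignored on the assumption that all prefixes are declared on the root element, as they are here.

```python
import io
import xml.etree.ElementTree as ET

SCHEMA = """<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:html="http://www.w3.org/1999/xhtml"
        targetNamespace="uri:mywork"
        xmlns:my="uri:mywork">
  <import namespace="http://www.w3.org/1999/xhtml"/>
  <complexType name="myType">
    <sequence>
      <element ref="html:p" minOccurs="0"/>
    </sequence>
  </complexType>
  <element name="myElt" type="my:myType"/>
</schema>"""

XSD = "{http://www.w3.org/2001/XMLSchema}"

# ElementTree expands element names but not QNames inside attribute
# values, so the prefix bindings are collected from "start-ns" events.
prefixes = {}
root = None
source = io.BytesIO(SCHEMA.encode("utf-8"))
for event, payload in ET.iterparse(source, events=("start-ns", "start")):
    if event == "start-ns":
        prefix, uri = payload
        prefixes[prefix] = uri
    elif root is None:
        root = payload            # the first "start" event is the root

def resolve(qname):
    """Expand 'prefix:local' to (namespace URI, local name)."""
    prefix, _, local = qname.rpartition(":")
    return prefixes[prefix], local

imported = {imp.get("namespace") for imp in root.iter(XSD + "import")}
ref_ns, ref_local = resolve(root.find(f".//{XSD}element[@ref]").get("ref"))
print(ref_ns, ref_local, ref_ns in imported)
```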
33
Type libraries
  • As XML schemas become more widespread, schema
    authors will want to create simple and complex
    types that can be shared and used as the basic
    building blocks for building new schemas
  • XML Schemas already provide types that play this
    role: the simple types
  • other examples: currency, units of measurement,
    business addresses

34
Example currencies
<schema targetNamespace="http://www.example.com/Currency"
        xmlns:c="http://www.example.com/Currency"
        xmlns="http://www.w3.org/2000/08/XMLSchema">
  <complexType name="Currency">
    <simpleContent>
      <extension base="decimal">
        <attribute name="name">
          <simpleType>
            <restriction base="string">
              <enumeration value="AED"/>
              <enumeration value="AFA"/>
              <enumeration value="ALL"/>
              ...
35
Extending content models
  • Mixed content models
  • an element can contain, in addition to
    subelements, also arbitrary character data
  • import
  • an element can contain elements whose types are
    imported from external namespaces
  • e.g. "this element may contain an HTML p element
    here"
  • a more flexible way:
  • any element, any attribute

36
Example
<purchaseReport xmlns="http://www.example.com/Report">
  <regions> <!-- part sales by regions --> </regions>
  <parts> <!-- part descriptions --> </parts>
  <htmlExample>
    <table xmlns="http://www.w3.org/1999/xhtml"
           border="0" width="100%">
      <tr>
        <th align="left">Zip Code</th>
        <th align="left">Part Number</th>
        <th align="left">Quantity</th>
      </tr>
      <tr><td>95819</td><td> </td><td> </td></tr>
      <tr><td> </td><td>872-AAA</td><td>1</td></tr>
      ...
37
Including an HTML table
  • To permit the appearance of HTML in the instance
    document we modify the report schema by declaring
    the content of the element htmlExample by the any
    element
  • in general, an any element specifies that any
    well-formed XML is permissible in a type's
    content model
  • in the example, we require the XML to belong to
    the namespace http://www.w3.org/1999/xhtml
  • -> the XML should be XHTML

38
Schema declaration with any
<element name="purchaseReport">
  <complexType>
    <sequence>
      <element name="regions" type="r:RegionsType"/>
      <element name="parts" type="r:PartsType"/>
      <element name="htmlExample">
        <complexType>
          <sequence>
            <any namespace="http://www.w3.org/1999/xhtml"
                 minOccurs="1" maxOccurs="unbounded"
                 processContents="skip"/>
          </sequence>
          ...
39
Schema validation
  • The attribute processContents
  • skip: no validation
  • strict: an XML processor is obliged to obtain the
    schema associated with the required namespace and
    validate the HTML appearing within the
    htmlExample element
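
What the wildcard still constrains under processContents="skip" can be approximated in a few lines. The sketch below is a toy check in Python, not schema validation: it only tests that each child of htmlExample is in the required namespace and that there is at least one child, mirroring minOccurs="1". The function name and the simplified test documents (htmlExample without its own namespace) are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

def wildcard_accepts(parent, namespace, min_occurs=1):
    """Toy check for <any namespace=... processContents="skip">.

    Every child element must be in the given namespace and there must
    be at least min_occurs children. The content of the children is
    not inspected, which corresponds to "skip".
    """
    children = list(parent)
    in_namespace = all(child.tag.startswith("{" + namespace + "}") for child in children)
    return in_namespace and len(children) >= min_occurs

good = ET.fromstring(
    '<htmlExample><table xmlns="http://www.w3.org/1999/xhtml"><tr/></table></htmlExample>')
bad = ET.fromstring('<htmlExample><table><tr/></table></htmlExample>')
empty = ET.fromstring('<htmlExample/>')

print(wildcard_accepts(good, XHTML), wildcard_accepts(bad, XHTML), wildcard_accepts(empty, XHTML))
```

With strict processing, the children would in addition have to be validated against the schema for the XHTML namespace, which this sketch does not attempt.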

40
anyAttribute
<element name="htmlExample">
  <complexType>
    <sequence>
      <any namespace="http://www.w3.org/1999/xhtml"
           minOccurs="1" maxOccurs="unbounded"
           processContents="skip"/>
    </sequence>
    <anyAttribute namespace="http://www.w3.org/1999/xhtml"/>
  </complexType>
</element>