Title: XML Syntax - Writing XML and Designing DTDs
1 XML Syntax - Writing XML and Designing DTDs
2 HTML 1st Example
- <html><head><title>Chocolate Cake</title><body>
- <b>Ingredient List</b><hr />
- <br>2 cups flour
- <br>1 cup sugar
- <br>2 bars chocolate
- <br>1 cup milk
- <br><br><b>Instructions</b>
- <hr><br>Mix flour, sugar and milk
- <br>Eat chocolate
- <br>Bake at 400 degrees
- </body></html>
3 XML Document Structure
- Text file containing Elements, Attributes and Text
- <?xml version="1.0" ?>
- <Recipe name="Chocolate Cake" type="Dessert">
- <IngredientList>
- <Ingredient>2 cups flour</Ingredient>
- <Ingredient>1 cup sugar</Ingredient>
- </IngredientList>
- <Instruction>Sift the flour</Instruction>
- </Recipe>
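A minimal sketch, assuming Python's standard xml.etree.ElementTree module, of how a parser exposes the three building blocks of the recipe document: elements, attributes and text. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0" ?>
<Recipe name="Chocolate Cake" type="Dessert">
  <IngredientList>
    <Ingredient>2 cups flour</Ingredient>
    <Ingredient>1 cup sugar</Ingredient>
  </IngredientList>
  <Instruction>Sift the flour</Instruction>
</Recipe>"""

root = ET.fromstring(doc)

# Element: the document element and its tag name.
print(root.tag)                      # Recipe

# Attributes: name/value pairs from the opening tag.
print(root.attrib["name"])           # Chocolate Cake

# Text: character data held inside elements.
ingredients = [item.text for item in root.iter("Ingredient")]
print(ingredients)                   # ['2 cups flour', '1 cup sugar']
print(root.findtext("Instruction"))  # Sift the flour
```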
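5 10 Rules - Well Formed XML
- 1. Must start with the XML declaration (strictly, XML 1.0 makes the declaration optional, but when present it must be the very first thing in the file)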
6 Rule 2. Must be only one document element
- Valid Example(s)
- <?xml version="1.0" ?>
- <recipe>
- </recipe>
- or
- <recipeBook>
- <recipe></recipe>
- <recipe></recipe>
- </recipeBook>
- Invalid Example
- <?xml version="1.0"?>
- <recipe>
- </recipe>
- <recipe>
- </recipe>
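The single-document-element rule can be checked with any XML parser. The sketch below assumes Python's standard xml.etree.ElementTree; is_well_formed is a hypothetical helper written for this example, not part of the library.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text, False if it raises ParseError."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# One document element wrapping two recipes.
one_root = (
    '<?xml version="1.0"?>'
    "<recipeBook><recipe></recipe><recipe></recipe></recipeBook>"
)

# Two top-level elements: the second <recipe> is rejected.
two_roots = '<?xml version="1.0"?><recipe></recipe><recipe></recipe>'

print(is_well_formed(one_root))   # True
print(is_well_formed(two_roots))  # False
```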
7 Rule 3. Match opening and closing tags
- Carry over from HTML origins
- <hr> <p> or <bold><italic></bold></italic>
- Browsers forgive, XML parsers do NOT
- <p></p> or <br />
- <bold><italic></italic></bold>
- <recipe></recipe>
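To see how unforgiving a parser is about crossed or unclosed tags, here is a small sketch assuming Python's standard xml.etree.ElementTree. The helper first_error is hypothetical and simply returns the parser's error message, if any.

```python
import xml.etree.ElementTree as ET

def first_error(text):
    """Return the parser's error message, or None if the text is well formed."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return str(err)
    return None

crossed = "<p><bold><italic>hi</bold></italic></p>"   # tags overlap
nested = "<p><bold><italic>hi</italic></bold></p>"    # tags nest properly
unclosed = "<p>line one<br>line two</p>"              # HTML-style <br>
closed = "<p>line one<br />line two</p>"              # XML empty-element form

for sample in (crossed, nested, unclosed, closed):
    print(first_error(sample))
```

Only the crossed and unclosed samples should produce an error message; the other two parse cleanly.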
8 Rule 4. Comments allowed, but not inside an attribute or element tag
- <!-- Isn't XML really cool? -->
- <!-- Just like being a student!!! -->
9 Rule 5. Element and attribute names must start with a letter (an underscore is also permitted)
- <Recipe> OK
- <Second third="false"> OK
- <2nd> INVALID
- <Recipe 2nd="true"> INVALID
10 Rule 6. Attributes must go in the opening tag
- Valid
- <recipe name="Chocolate Cake" category="Dessert"></recipe>
- Invalid
- <recipe></recipe name="Chocolate Cake">
11 Rule 7. Attribute values must be enclosed in matching quotes
- Can use either single or double quotes, but must use the same type to start and end the attribute value
- Name="Australian Computer Society"
- Name='Australian Computer Society'
12 Let's finish these rules!
- 8. Only simple text for attributes, no nested values. Nesting is allowed in elements, not in attributes.
- 9. Use &lt; &amp; &gt; &quot; and &apos; for special characters such as < and >
- 10. Write empty elements using the <recipe /> syntax if there are no nested values; the tag can still have attributes: <recipe type="dessert" />.
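Rules 9 and 10 can be tried out with the sketch below. It assumes Python's standard xml.sax.saxutils helpers escape and quoteattr for writing the special characters, and xml.etree.ElementTree for reading them back. The sample strings are invented for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

text = "Flour & sugar, oven < 400 degrees"
name = 'Mum\'s "best" cake'

# Rule 9: escape() rewrites & < > for element text.
print(escape(text))  # Flour &amp; sugar, oven &lt; 400 degrees

# quoteattr() escapes the value and adds the surrounding quotes itself.
markup = f"<recipe name={quoteattr(name)}>{escape(text)}</recipe>"
root = ET.fromstring(markup)

# The parser turns the entity references back into the original characters.
print(root.text == text, root.get("name") == name)  # True True

# Rule 10: an empty element has no content but may still carry attributes.
empty = ET.fromstring('<recipe type="dessert" />')
print(empty.text, len(empty), empty.get("type"))  # None 0 dessert
```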
13 With these 10 rules, we have a Well Formed XML document
- It means the XML can be read, processed or parsed.
- Doesn't mean the structure makes sense.
- <recipe model="Holden">
- <chapter></chapter>
- <engine cylinders="4"></engine>
- </recipe>
14 Examples
- Buggy dictionary
- Non-buggy dictionary
- FIDA
15 DTD - Document Type Definition
- Allows us to define the exact elements and attributes for the document
- These effectively become the rules of our own markup language, the extensible part of XML
- A DTD really only defines the structure; it is limited in what it can validate about the text values of an element or attribute.
16 Recipe DTD
- <!ELEMENT Recipe (Name, Description?, Ingredients?, Instructions?)>
- <!ELEMENT Name (#PCDATA)>
- <!ELEMENT Ingredient (Qty, Item)>
- <!ELEMENT Qty (#PCDATA)>
- <!ATTLIST Qty unit CDATA #REQUIRED>
- <!ELEMENT Item (#PCDATA)>
- <!ATTLIST Item optional CDATA "0" isVegetarian CDATA "true">
17 Elements
- Basic rules
- Start tag <tag_name> and end tag </tag_name>
- Tags must be nested
- <tag1><tag2></tag2></tag1>
- Tags may be empty (no enclosed data)
- <empty_tag/>
- Whitespace in element content usually ignored
- <section><p> </p></section>
- <section> <p> </p></section>
18 Element Declarations
- Used to define new elements and their content
- <!ELEMENT name (#PCDATA)> -> <name> </name>
- Empty element has no content
- <!ELEMENT name EMPTY> -> <name/>
- When children allowed - ANY or model group
- <!ELEMENT name ANY>
- <!ELEMENT person (name, e-mail)>
19 Model Groups
- Used to define content of elements
- <!ELEMENT person (name, e-mail)>
- Used to define hierarchies of elements
- <!ELEMENT name (fname, surname)>
- <!ELEMENT fname (#PCDATA)>
- <!ELEMENT surname (#PCDATA)>
- <!ELEMENT e-mail (#PCDATA)>
- Control organisation of elements
- Sequence connector - ',' - (A, B, C) - then
- Choice connector - '|' - (A | B | C) - or
20 Model Group Quantity Indicators
- Describe constraints on elements in DTD
- A? - May occur - 0..1
- A+ - Must occur - 1..*
- A* - May occur - 0..*
- A | B - Either A or B
- A, B - A followed by B
- Groups can be combined: (A, B) and ((A, B?) | C)
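Content models behave much like regular expressions over child element names, which the sketch below tries to make concrete. It is only an analogy: the model (name, e-mail+, phone?) is invented for illustration, the regex is translated by hand, and matches_model is a hypothetical helper. A real validating parser does this work from the DTD itself.

```python
import re

# Hand translation of the hypothetical content model (name, e-mail+, phone?).
# Each child name is followed by a space so that names cannot run together.
MODEL = re.compile(r"name (e-mail )+(phone )?")

def matches_model(children):
    """Check a list of child element names against the translated model."""
    sequence = "".join(child + " " for child in children)
    return MODEL.fullmatch(sequence) is not None

print(matches_model(["name", "e-mail"]))                     # True
print(matches_model(["name", "e-mail", "e-mail", "phone"]))  # True
print(matches_model(["name"]))                               # False: e-mail+ needs at least one
print(matches_model(["e-mail", "name"]))                     # False: ',' fixes the order
```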
21 Attributes
- Provide additional information about an element
- Enclosed by quotes - either " or '
- Case-sensitive
- May be character data or tokenized
- value="Blue Peter" (character data)
- value="blue" (single token)
- value="red green blue" (tokens)
- Values may be enumerated or defaulted (DTD)
22 Attribute Declarations
- Attributes can be attached to elements
- Declared separately in ATTLIST declaration
- <!ATTLIST tag ... >
- Rest of definition specifies
- attribute name
- attribute type
- default value
23 Attribute Names and Types
- Attribute name
- <!ATTLIST tag name type default>
- <!ATTLIST tag first_attr ... second_attr ... third_attr ... >
- Attribute types
- CDATA, NMTOKEN, NMTOKENS, ENTITY, ENTITIES, ID, IDREF, IDREFS, NOTATION, name group
24 Attribute Types
- CDATA
- Character data
- NMTOKEN
- Single token
- NMTOKENS
- Multiple tokens
- ENTITY
- Attribute is entity ref
- ENTITIES
- Multiple entity refs
- ID
- Unique ID
- IDREF
- Match to ID
- IDREFS
- Match to multiple IDs
- NOTATION
- Describe non-XML data
- Name group
- Restricted list
25 Attribute Types
- CDATA
- <!ATTLIST person name CDATA ... >
- NMTOKEN
- <!ATTLIST mug color NMTOKEN ... >
- NMTOKENS
- <!ATTLIST temp values NMTOKENS ... >
- ENTITY
- <!ATTLIST person photo ENTITY ... >
- ENTITIES
- <!ATTLIST album photos ENTITIES ... >
- ID
- <!ATTLIST person id ID ... >
- IDREF
- <!ATTLIST person father IDREF ... >
- IDREFS
- <!ATTLIST person children IDREFS ... >
- NOTATION
- <!ATTLIST image format NOTATION (TeX | TIFF) ... >
- Name group
- <!ATTLIST point coord (X | Y | Z) ... >
26 Attribute Types
- CDATA
- name="Tom Jones"
- NMTOKEN
- color="red"
- NMTOKENS
- values="12 15 34"
- ENTITY
- photo="MyPic"
- ENTITIES
- photos="pic1 pic2"
- ID
- ID="P09567"
- IDREF
- IDREF="P09567"
- IDREFS
- IDREFS="A01 A02"
- NOTATION
- FORMAT="TeX"
- Name group
- coord="X"
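The ID, IDREF and IDREFS types amount to a cross-reference check: every ID must be unique, and every reference must match some ID. A validating parser does this from the DTD. The sketch below only imitates the check by hand with Python's standard xml.etree.ElementTree, reusing the attribute names id, father and children from the declarations above. The family document is invented for illustration.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<family>
  <person id="P01" children="P02 P03"/>
  <person id="P02" father="P01"/>
  <person id="P03" father="P01"/>
  <person id="P04" father="P99"/>
</family>
""")

# ID: each value may be used only once in the document.
ids = [person.get("id") for person in doc.iter("person")]
duplicates = {value for value in ids if ids.count(value) > 1}

# IDREF holds one reference; IDREFS holds a space-separated list of them.
dangling = []
for person in doc.iter("person"):
    refs = person.get("children", "").split()
    if person.get("father"):
        refs.append(person.get("father"))
    dangling += [ref for ref in refs if ref not in ids]

print(duplicates)  # set()
print(dangling)    # ['P99']
```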
27 Default Attribute Values
- Can specify a default attribute value for when it's missing from the XML document, or state that a value must be entered
- #REQUIRED - Must be specified
- #IMPLIED - May be specified
- "default" - Default value if unspecified
- #FIXED - Only one value allowed
- <!ATTLIST tag name type default>
- <!ATTLIST seqlist sepchar NMTOKEN #REQUIRED type (alpha | num) "num">
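A short sketch of a declared default being filled in, assuming Python's standard xml.etree.ElementTree. Its underlying expat parser reads the internal DTD subset and applies attribute defaults, but it does not validate, so a missing #REQUIRED attribute would not be reported by it. The seqlist content is invented for illustration.

```python
import xml.etree.ElementTree as ET

DTD = """<!DOCTYPE seqlist [
  <!ELEMENT seqlist (#PCDATA)>
  <!ATTLIST seqlist sepchar NMTOKEN #REQUIRED
                    type (alpha | num) "num">
]>
"""

# type is omitted here, so the declared default "num" is supplied.
defaulted = ET.fromstring(DTD + '<seqlist sepchar="-">1-2-3</seqlist>')

# An explicit value in the document takes precedence over the default.
explicit = ET.fromstring(DTD + '<seqlist sepchar="-" type="alpha">a-b-c</seqlist>')

print(defaulted.get("type"))  # num
print(explicit.get("type"))   # alpha
```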
28 Declarations
- Instructions for the XML processor
- Format - <! ... > or <! ... [ <! ... > ]>
- Document type - <!DOCTYPE ... >
- Character data - <![CDATA[ ... ]]>
- Entities - <!ENTITY ... >
- Notation - <!NOTATION ... >
- Element - <!ELEMENT ... >
- Attributes - <!ATTLIST ... >
- <![INCLUDE[ ... ]]> and <![IGNORE[ ... ]]>
29 Document Type Declaration
- Identifies the name of the document root element
- <!DOCTYPE My_XML_Doc>
- May also add entity definitions and DTD
- <!DOCTYPE My_XML_Doc [ ... ]>
- <My_XML_Doc> ... </My_XML_Doc>
30 Comment Declaration
- Comments are not considered part of the XML document and should not be published
- <!-- A comment -->
- Cannot have additional '--' in comment
- Cannot embed inside other declarations
31 Character Data Declaration
- For occasions when text must contain uninterpreted markup characters
- Press <<<ENTER>>>
- <![CDATA[Press <<<ENTER>>>]]>
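A quick sketch, assuming Python's standard xml.etree.ElementTree, comparing a CDATA section with the entity-escaped form and with raw text. The step element is invented for illustration.

```python
import xml.etree.ElementTree as ET

with_cdata = "<step><![CDATA[Press <<<ENTER>>>]]></step>"
with_escapes = "<step>Press &lt;&lt;&lt;ENTER&gt;&gt;&gt;</step>"
raw = "<step>Press <<<ENTER>>></step>"

# Both the CDATA section and the escaped form yield the same text.
print(ET.fromstring(with_cdata).text)    # Press <<<ENTER>>>
print(ET.fromstring(with_escapes).text)  # Press <<<ENTER>>>

# Unprotected markup characters make the document not well formed.
try:
    ET.fromstring(raw)
    raw_ok = True
except ET.ParseError:
    raw_ok = False
print(raw_ok)  # False
```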
32 Processing Instructions
- Information required by an external application
- Processing Instructions
- Format - <? ... ?>
- XML PI - <?xml version='1.0' ?>
- Confusingly, this line is formally called the XML declaration; it uses the same <? ... ?> syntax, but the XML 1.0 specification does not class it as a processing instruction
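The sketch below, assuming Python's standard xml.dom.minidom, lists the processing instructions at the top of a small document. The xml-stylesheet instruction and the recipe.css file name are illustrative only. The XML declaration is consumed by the parser and does not show up as a processing-instruction node.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/css" href="recipe.css"?>'
    "<recipe/>"
)

# Collect the top-level nodes that are processing instructions.
pis = [
    node for node in doc.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
]

for pi in pis:
    # target is the name after "<?"; data is the rest, passed on uninterpreted.
    print(pi.target, "|", pi.data)
```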
33 Entities
- XML document may be distributed among a number of files
- Each unit of information is called an entity
- Each entity has a name to identify it
- Defined using an entity declaration
- Used by calling an entity reference
34 When to use Entities
- Use an entity when the information
- Is used in several places
- May be represented differently
- Is part of a larger document that needs to be split up to be manageable
- Conforms to a data format other than XML
35 Types of Entity
- General Entity
- Referred to in XML document
- Parameter Entity
- Referred to in markup declarations in DTD
- Internal Entity
- Stored in main document
- Text content only
- External Entity
- Stored externally to the main document
- Text or binary
- Can use to group many internal entities together
36 General Entities
- Declared in 'Document Type Declaration'
- <!DOCTYPE My_XML_Doc [ <!ENTITY name "replacement"> ]>
- <!ENTITY xml "eXtensible Markup Language">
- The &xml; includes entities
- The eXtensible Markup Language includes entities
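A minimal sketch of a general entity being expanded, assuming Python's standard xml.etree.ElementTree, whose underlying parser reads internal entity declarations from the document type declaration.

```python
import xml.etree.ElementTree as ET

doc = """<!DOCTYPE My_XML_Doc [
  <!ENTITY xml "eXtensible Markup Language">
]>
<My_XML_Doc>The &xml; includes entities</My_XML_Doc>"""

root = ET.fromstring(doc)

# The reference &xml; is replaced by the declared replacement text.
print(root.text)  # The eXtensible Markup Language includes entities
```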
37 Parameter Entities
- Declared in 'Document Type Declaration'
- <!DOCTYPE My_XML_Doc [ <!ENTITY % name "replacement"> ]>
- <!ENTITY % param "(para | list)">
- <!ELEMENT section (%param;)>
38 External Entities
- External Text Entities
- Location specified with SYSTEM keyword
- <!ENTITY ent SYSTEM "/ENTS/MYENT.XML">
- May specify with public identifier
- <!ENTITY ent PUBLIC "-//EBI//ENTITIES ents//EN" ... >
- External Binary Entities
- Need to identify format of data - NDATA
- <!ELEMENT pic EMPTY>
- <!ATTLIST pic name ENTITY #REQUIRED>
- <!ENTITY photo SYSTEM "/ENTS/photo.tif" NDATA TIFF>
- Referenced by empty element
- A photograph <pic name="photo"/>.
39 Restrictions on Entities
- General text entities
- Can appear in element content
- <para> &ent; </para>
- Can appear in attribute value
- <para name="&ent;"> </para>
- Can appear in internal entity content
- <!ENTITY cod "&ent;">
- Cannot appear in other parts of DTD
40 Restrictions on Entities (2)
- Binary entities
- If entity content is not XML, the entity cannot be used as a textual reference
- Error - <!ELEMENT sec (para | &photo;)>
- Error - <para> &photo; </para>
- Binary entity can only appear as an attribute of type ENTITY
- <!ENTITY photo SYSTEM "photo.tif" NDATA TIFF>
- <!ELEMENT pic (#PCDATA)>
- <!ATTLIST pic name ENTITY #REQUIRED>
41 Parameter Entities
- Use parameter entities within DTD
- <!ENTITY % common "(para | list | table)">
- <!ELEMENT chapter ((%common;), section)>
- <!ELEMENT section (%common;)>
- Safest to include parentheses in entity definition and around entity reference
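The advice about parentheses is easier to see if parameter entity expansion is treated as plain text substitution. The sketch below does exactly that in Python. It is a simplification (a real parser also pads the replacement text with spaces), and expand is a hypothetical helper written for this example.

```python
def expand(declaration, entities):
    """Replace each %name; reference with its replacement text (simplified model)."""
    for name, value in entities.items():
        declaration = declaration.replace(f"%{name};", value)
    return declaration

# Parentheses in the definition and around the reference: the choice stays grouped.
safe = expand(
    "<!ELEMENT chapter ((%common;), section)>",
    {"common": "(para | list | table)"},
)

# No parentheses anywhere: '|' and ',' end up mixed in one group,
# which is not a legal content model.
risky = expand(
    "<!ELEMENT chapter (%common;, section)>",
    {"common": "para | list | table"},
)

print(safe)   # <!ELEMENT chapter (((para | list | table)), section)>
print(risky)  # <!ELEMENT chapter (para | list | table, section)>
```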
42 Putting it all together...
- Have now been introduced to the main components and rules of XML and DTDs
- Entities, elements, declarations, processing instructions, attribute lists
- Use all these components in the 'Document Type Definition' (DTD) to specify the rules about the format of the XML document