1
Sistemi basati su conoscenza: XML (Knowledge-Based Systems: XML)
  • Prof. M.T. PAZIENZA
  • Academic year 2004-2005

2
Introduction to XML
  • HTML (1990) was designed to display data
    (documents), and to focus on how data looks
  • XML (1996) was designed to describe data
    (documents), and to focus on what data is
  • HTML is about displaying information;
  • XML is about describing information.
  • Both derive from SGML (ISO 8879, 1986).
  • XML is a standard for describing content in
    addition to presentation aspects.

3
HTML
  • HTML is a markup language: it augments regular
    text with marks that hold special meaning for the
    Web browser handling the document.
  • Commands in the language are called tags (start
    and end tags),
  • <TAG>, </TAG>,
  • and have a fixed meaning.
  • HTML is adequate to represent the structure of
    documents only for display purposes.

4
XML (EXtensible Markup Language)
  • XML tags are not predefined. Authors must define
    their own tags and their own document structure.
  • XML can use a DTD (Document Type Definition) to
    describe the structure of any type of data (document).
  • XML with a DTD is designed to be
    self-descriptive.
  • XML is free and extensible.
  • XML is a cross-platform, software- and
    hardware-independent tool for transmitting information.

5
XML does not DO anything
  • XML was not designed to DO anything.
  • It is just pure information wrapped in XML
    tags. Someone must write a piece of software to
    send it, receive it or display it.
  • XML is not a language: it is a syntax supporting
    the creation of custom markup languages.

6
XML does not DO anything
  • Ex.:
      <?xml version="1.0" encoding="ISO-8859-1"?>
      <note>
        <to>Tove</to>
        <from>Jani</from>
        <heading>Reminder</heading>
        <body>Don't forget me this weekend!</body>
      </note>
  • The note has a header, and a message body. It
    also has sender and receiver information. But
    still, this XML document does not DO anything.
    Someone must write a piece of software to send
    it, receive it or display it.
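As an illustration of "someone must write a piece of software", here is a minimal sketch in Python using the standard-library `xml.etree.ElementTree` module. The function name `display_note` and the plain-text output format are hypothetical choices for this example, not part of XML itself.

```python
import xml.etree.ElementTree as ET

# The note from the slide, as raw bytes (the content is plain ASCII,
# so the ISO-8859-1 declaration is accurate).
NOTE = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""

def display_note(data: bytes) -> str:
    """Parse the note and render it as plain text for a human reader."""
    root = ET.fromstring(data)
    return (f"To: {root.findtext('to')}\n"
            f"From: {root.findtext('from')}\n"
            f"{root.findtext('heading')}: {root.findtext('body')}")

if __name__ == "__main__":
    print(display_note(NOTE))
```

The XML itself stays unchanged; all the "doing" lives in the program that reads it.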

7
XML to Exchange Data
  • XML was designed to store, carry and exchange
    data. With XML, data can be exchanged between
    incompatible systems.
  • In the real world, computer systems and databases
    contain data in incompatible formats.
  • Converting the data to XML can greatly reduce the
    complexity of data exchange and create data that
    can be read by many different types of
    applications.

8
XML used to Share Data
  • With XML, plain text files can be used to share
    data.
  • Since XML data is stored in plain text format,
    XML provides a software- and hardware-independent
    way of sharing data.
  • This makes it much easier to create data that
    different applications can work with. It also
    makes it easier to expand or upgrade a system to
    new operating systems, servers, applications, and
    new browsers. 

9
XML used to Store Data
  • With XML, plain text files can be used to store
    data.
  • XML can also be used to store data in files or in
    databases. Applications can be written to store
    and retrieve information from the store, and
    generic applications can be used to display the
    data.
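A minimal sketch of an application that stores data in an XML file and retrieves it again, assuming Python's standard `xml.etree.ElementTree`. The helpers `store_note` and `load_note` and the temporary file path are hypothetical.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def store_note(path: str, to: str, sender: str, heading: str, body: str) -> None:
    """Write a note as an XML file (UTF-8, with an XML declaration)."""
    note = ET.Element("note")
    for tag, text in (("to", to), ("from", sender),
                      ("heading", heading), ("body", body)):
        ET.SubElement(note, tag).text = text
    ET.ElementTree(note).write(path, encoding="utf-8", xml_declaration=True)

def load_note(path: str) -> dict:
    """Read the file back into a tag -> text dictionary."""
    root = ET.parse(path).getroot()
    return {child.tag: child.text for child in root}

if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "note.xml")
    store_note(path, "Tove", "Jani", "Reminder", "Don't forget me this weekend!")
    print(load_note(path))
```

Because the file is plain text, any other XML-aware program could read the same `note.xml`.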

10
XML Syntax
  • The syntax rules of XML are very simple and very
    strict. The rules are very easy to learn, and
    very easy to use.
  • Creating software that can read and manipulate
    XML is very easy to do.

11
XML Syntax
  • The element (delimited by a start tag and an end
    tag) is the primary building block of an XML
    document. XML element names are case-sensitive,
    and elements must be properly nested. Elements
    are related as parents and children.
  • With XML, the tag <Letter> is different from the
    tag <letter>. Opening and closing tags must
    therefore be written with the same case.
  • Attributes provide additional information about
    the element. Their values (enclosed in quotes)
    appear inside the start tag of an element. An
    attribute is a name-value pair separated by an
    equal sign (=).
  • Entities are shortcuts for portions of common
    text (an entity reference starts with & and ends
    with ;, as in &lt;).
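The three rules above (case-sensitive names, quoted attributes in the start tag, `&...;` entity references) can be observed with a standard XML parser. A minimal sketch with Python's `xml.etree.ElementTree`; the helper `parses` and the entity name `co` are hypothetical.

```python
import xml.etree.ElementTree as ET

def parses(text: str) -> bool:
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# Element names are case-sensitive: </letter> does not close <Letter>.
print(parses("<Letter>Hi</letter>"))   # False
print(parses("<Letter>Hi</Letter>"))   # True

# An attribute is a name="value" pair inside the start tag.
prod = ET.fromstring('<prod id="33-657" media="paper"></prod>')
print(prod.get("media"))               # paper

# Entity references start with & and end with ;. &lt;, &gt; and &amp; are
# predefined; "co" is a hypothetical entity declared in the internal DTD subset.
doc = ET.fromstring('<!DOCTYPE p [<!ENTITY co "ACME Corp.">]>'
                    '<p>&co; &lt;sales&gt; &amp; more</p>')
print(doc.text)                        # ACME Corp. <sales> & more
```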

12
XML Syntax
  • Comments (arbitrary text) may be inserted
    anywhere in an XML document outside other markup
    (a comment starts with <!-- and ends with -->).
  • Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
  • An example of a comment:
  • <!-- declarations for <head> & <body> -->
  • Note that the grammar does not allow a comment
    ending in --->.
  • The following example is not well-formed:
  • <!-- B, B, or B--->
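The two comment examples can be fed to a conforming parser to see the grammar in action. A minimal sketch with Python's `xml.etree.ElementTree`, wrapping each comment in a hypothetical `<doc>` root element:

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

good = "<doc><!-- declarations for <head> & <body> --></doc>"
bad = "<doc><!-- B, B, or B---></doc>"

print(is_well_formed(good))   # True: < and & are allowed inside a comment
print(is_well_formed(bad))    # False: "--" may not occur inside a comment

# By default ElementTree discards comments, so <doc> ends up with no children.
print(len(ET.fromstring(good)))   # 0
```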

13
XML Syntax
  • The Document Type Definition (DTD) is the set of
    rules that lets authors specify their own set of
    elements, attributes and entities. A DTD specifies
    which elements can be used and the constraints on
    them.
  • A DTD defines the legal elements of an XML
    document, that is, the legal building blocks of an
    XML document. It defines the document structure
    with a list of legal elements.
  • XML Schema is an XML-based alternative to DTD.

14
Why use a DTD?
  • XML provides an application-independent way of
    sharing data.
  • With a DTD, independent groups of people can
    agree to use a common DTD for interchanging data.
  • Any application can use a standard DTD to verify
    that data received from the outside world is
    valid.

15
XML Syntax
  • All XML elements must have a closing tag (an
    empty element may instead use the short form
    <prod/>). The XML declaration
  • <?xml version="1.0" encoding="ISO-8859-1"?>
  • is an exception: it is not part of the element
    tree, it is not an XML element, and it must not
    have a closing tag.
  • The XML declaration defines the XML version and
    the character encoding used in the document.

16
XML Syntax
  • All XML documents must have a root element.
  • The first element in an XML document is the root
    element.
  • All XML documents must contain a single tag pair
    to define the root element (e.g. <note>).
  • All other elements must be nested within the root
    element.
  • All elements can have sub-elements (children).
    Sub-elements must be correctly nested within
    their parent element:
      <root>
        <child>
          <subchild>.....</subchild>
        </child>
      </root>
  • In the note example there are 4 child elements of
    the root (to, from, heading, body).
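A minimal sketch, using Python's `xml.etree.ElementTree`, that reads the root and its four children from the note and shows that a missing single root or improper nesting is rejected. The helper `rejected` is hypothetical.

```python
import xml.etree.ElementTree as ET

NOTE = ("<note><to>Tove</to><from>Jani</from>"
        "<heading>Reminder</heading>"
        "<body>Don't forget me this weekend!</body></note>")

def rejected(text: str) -> bool:
    """Return True if the parser refuses text as not well-formed."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return True
    return False

root = ET.fromstring(NOTE)
print(root.tag)                         # note
print([child.tag for child in root])    # ['to', 'from', 'heading', 'body']

print(rejected("<to>Tove</to><from>Jani</from>"))   # True: no single root element
print(rejected("<root><child></root></child>"))     # True: improperly nested
```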

17
XML Syntax
  • Attribute values must always be quoted
  • With XML, it is illegal to omit quotation marks
    around attribute values. 
  • XML elements can have attributes in name/value
    pairs just like in HTML.
  • In XML the attribute value must always be quoted.

18
XML Syntax
  • Incorrect:
      <?xml version="1.0"?>
      <note date=12/11/99>
        <to>Tove</to>
        <from>Jani</from>
        <heading>Reminder</heading>
        <body>Don't forget me this weekend!</body>
      </note>
  • Correct:
      <?xml version="1.0"?>
      <note date="12/11/99">
        <to>Tove</to>
        <from>Jani</from>
        <heading>Reminder</heading>
        <body>Don't forget me this weekend!</body>
      </note>
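The difference between the two versions can be checked with a parser. A minimal sketch using Python's `xml.etree.ElementTree`; the helper `date_of` is hypothetical and returns `None` when the start tag is not well-formed.

```python
import xml.etree.ElementTree as ET

def date_of(note_start: str):
    """Return the date attribute of a <note> start tag, or None if it does not parse."""
    try:
        return ET.fromstring(note_start + "</note>").get("date")
    except ET.ParseError:
        return None

print(date_of("<note date=12/11/99>"))     # None: unquoted value, not well-formed
print(date_of('<note date="12/11/99">'))   # 12/11/99
print(date_of("<note date='12/11/99'>"))   # 12/11/99: single quotes are also allowed
```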

19
XML Syntax
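  • White space in content is preserved (it is passed
    to the application, not collapsed as in HTML).
  • CR / LF is converted to LF (a lone CR is also
    converted to LF).
  • A new line is always stored as LF.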
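Both behaviours can be seen in what a conforming parser hands to the application. A minimal sketch with Python's `xml.etree.ElementTree` (backed by the expat parser):

```python
import xml.etree.ElementTree as ET

# Runs of spaces in content reach the application unchanged.
print(repr(ET.fromstring("<p>two   spaces </p>").text))
# 'two   spaces '

# CR LF, and a lone CR, are normalized to a single LF before the application sees them.
print(repr(ET.fromstring("<p>line1\r\nline2\rline3</p>").text))
# 'line1\nline2\nline3'
```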

20
XML Syntax
  • There is nothing special about XML. It is just
    plain text with the addition of some XML tags
    enclosed in angle brackets.
  • Software that can handle plain text can also
    handle XML. In a simple text editor, the XML tags
    will be visible and will not be handled
    specially.
  • In an XML-aware application however, the XML tags
    can be handled specially. The tags may or may not
    be visible, or have a functional meaning,
    depending on the nature of the application.

21
XML Elements
  • XML Elements are Extensible
  • XML documents can be extended to carry more
    information.
  • XML Elements have Relationships
  • Elements are related as parents and children

22
XML Elements
  • Book Title: My First XML
  • Chapter 1: Introduction to XML
      • What is HTML
      • What is XML
  • Chapter 2: XML Syntax
      • Elements must have a closing tag
      • Elements must be properly nested

23
XML element (book description)
    <book>
      <title>My First XML</title>
      <prod id="33-657" media="paper"></prod>
      <chapter>Introduction to XML
        <para>What is HTML</para>
        <para>What is XML</para>
      </chapter>
      <chapter>XML Syntax
        <para>Elements must have a closing tag</para>
        <para>Elements must be properly nested</para>
      </chapter>
    </book>

24
XML element (book description)
  • book is the root element.
  • title, prod, and chapter are child elements of
    book.
  • book is the parent element of title, prod, and
    chapter.
  • title, prod, and chapter are siblings (or sister
    elements) because they have the same parent.

25
Elements have Content
  • Elements can have different content types.
  • An XML element is everything from (including) the
    element's start tag to (including) the element's
    end tag.
  • An element can have element content, mixed
    content, simple content, or empty content. An
    element can also have attributes.

26
Elements have Content
  • In the book description:
  • book has element content, because it contains
    other elements;
  • chapter has mixed content, because it contains
    both text and other elements;
  • para has simple content (or text content), because
    it contains only text;
  • prod has empty content, because it contains neither
    text nor child elements (its information is carried
    by its attributes).
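The four content types can be told apart mechanically. A minimal sketch with Python's `xml.etree.ElementTree`, assuming (as a simplification) that whitespace-only text between child elements is just indentation; a validating parser would decide this from the DTD instead. The function `content_type` is hypothetical.

```python
import xml.etree.ElementTree as ET

BOOK = """<book>
  <title>My First XML</title>
  <prod id="33-657" media="paper"></prod>
  <chapter>Introduction to XML
    <para>What is HTML</para>
    <para>What is XML</para>
  </chapter>
  <chapter>XML Syntax
    <para>Elements must have a closing tag</para>
    <para>Elements must be properly nested</para>
  </chapter>
</book>"""

def content_type(elem: ET.Element) -> str:
    """Classify content as element, mixed, simple or empty (ignoring whitespace-only text)."""
    has_children = len(elem) > 0
    texts = [elem.text or ""] + [child.tail or "" for child in elem]
    has_text = any(t.strip() for t in texts)
    if has_children and has_text:
        return "mixed"
    if has_children:
        return "element"
    if has_text:
        return "simple"
    return "empty"

book = ET.fromstring(BOOK)
for elem in (book, book.find("chapter"), book.find("chapter/para"), book.find("prod")):
    print(elem.tag, content_type(elem))
# book element / chapter mixed / para simple / prod empty
```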

27
Element Naming
  • Names can contain letters, numbers, and other
    characters
  • Names must not start with a number or punctuation
    character
  • Names must not start with the letters xml (or XML
    or Xml ..)
  • Names cannot contain spaces
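The naming rules above can be approximated with a regular expression. This is a minimal sketch under a stated simplification: it is ASCII-only and allows letters, digits, `-`, `_` and `.` (the full XML grammar also accepts many non-ASCII letters and the colon used by namespaces). The name `is_valid_element_name` is hypothetical.

```python
import re

# Simplified, ASCII-only form of the slide's rules: start with a letter or
# underscore, then letters, digits, '.', '_' or '-'; no spaces anywhere.
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")

def is_valid_element_name(name: str) -> bool:
    if not NAME_RE.fullmatch(name):
        return False   # empty, starts with a digit/punctuation, or contains a space
    # Names starting with "xml" in any combination of case are reserved.
    return not name.lower().startswith("xml")

for name in ("note", "body.head", "story.date", "1st_chapter", "first name", "XmlData"):
    print(name, is_valid_element_name(name))
```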

28
Element Naming
  • XML documents often have a corresponding
    database, in which fields exist corresponding to
    elements in the XML document. A good practice is
    to use the naming rules of the database for the
    elements in the XML documents.

29
Ex. XML News document
    <?xml version="1.0"?>
    <nitf>
      <head>
        <title>Colombia Earthquake</title>
      </head>
      <body>
        <body.head>
          <headline>
            <hl1>143 Dead in Colombia Earthquake</hl1>
          </headline>
          <byline>
            <bytag>By Jared Kotler, Associated Press Writer</bytag>
          </byline>
          <dateline>
            <location>Bogota, Colombia</location>
            <story.date>Monday January 25 1999 7:28 ET</story.date>
          </dateline>
        </body.head>
      </body>
    </nitf>

30
DTD
  • A DTD is enclosed in
  • <!DOCTYPE name [ DTD declaration ]>
  • where name is the name of the outermost enclosing
    tag, and DTD declaration is the text of the
    rules of the DTD.
  • The DTD starts with the outermost element, called
    the root element.

31
Internal DTD
    <?xml version="1.0"?>
    <!DOCTYPE note [
      <!ELEMENT note (to,from,heading,body)>
      <!ELEMENT to (#PCDATA)>
      <!ELEMENT from (#PCDATA)>
      <!ELEMENT heading (#PCDATA)>
      <!ELEMENT body (#PCDATA)>
    ]>
    <note>
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
    </note>
  • The DTD is interpreted like this:
    !ELEMENT note defines the element "note" as having
    four child elements: "to, from, heading, body".
    !ELEMENT to defines the "to" element to be of type
    "#PCDATA" (parsed character data).
    !ELEMENT from defines the "from" element to be of
    type "#PCDATA", and so on.
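Python's standard-library parsers read the internal DTD but do not validate against it (a validating parser, e.g. the third-party lxml, would be needed for that). To show what validation checks for this DTD, here is a minimal hand-coded sketch of the `note` content model; `NOTE_MODEL` and `matches_note_model` are hypothetical names, and this is not a general DTD validator.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to,from,heading,body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Don't forget me this weekend!</body></note>"""

# Hand-coded copy of the sequence (to,from,heading,body) declared for <note>.
NOTE_MODEL = ["to", "from", "heading", "body"]

def matches_note_model(root: ET.Element) -> bool:
    """Check the child sequence, and that each child is #PCDATA (text, no sub-elements)."""
    if root.tag != "note" or [child.tag for child in root] != NOTE_MODEL:
        return False
    return all(len(child) == 0 for child in root)

print(matches_note_model(ET.fromstring(DOC)))   # True
print(matches_note_model(ET.fromstring(
    "<note><from>Jani</from><to>Tove</to><heading/><body/></note>")))   # False: wrong order
```

Both documents are well-formed; only the second violates the DTD, which is exactly the difference between well-formed and valid.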

32
CDATA Sections
  • CDATA sections may occur anywhere character data
    may occur; they are used to escape blocks of text
    containing characters which would otherwise be
    recognized as markup.
  • CDATA sections begin with the string "<![CDATA["
  • and end with the string
  • "]]>".

33
CDATA Sections
  • CDSect  ::= CDStart CData CDEnd
  • CDStart ::= '<![CDATA['
  • CData   ::= (Char* - (Char* ']]>' Char*))
  • CDEnd   ::= ']]>'
  • Within a CDATA section, only the CDEnd string is
    recognized as markup, so that left angle brackets
    and ampersands may occur in their literal form;
    they need not (and cannot) be escaped using
    "&lt;" and "&amp;". CDATA sections cannot nest.

34
CDATA Sections
  • An example of a CDATA section, in which
    "<greeting>" and "</greeting>" are recognized as
    character data, not markup:
  • <![CDATA[<greeting>Hello, world!</greeting>]]>
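A minimal sketch with Python's `xml.etree.ElementTree` showing that the parser delivers the CDATA content as plain text, identical to the same text written with the predefined entities. The wrapper element `<example>` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Inside a CDATA section, < and & are ordinary characters; only ]]> ends it.
doc = ET.fromstring("<example><![CDATA[<greeting>Hello, world!</greeting>]]></example>")
print(doc.text)   # <greeting>Hello, world!</greeting>
print(len(doc))   # 0: no <greeting> element was created

# Without CDATA, the same character data must be escaped with entities.
escaped = ET.fromstring("<example>&lt;greeting&gt;Hello, world!&lt;/greeting&gt;</example>")
print(escaped.text == doc.text)   # True
```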

35
External DTD
  • This is the same XML document with an external
    DTD:
      <?xml version="1.0"?>
      <!DOCTYPE note SYSTEM "note.dtd">
      <note>
        <to>Tove</to>
        <from>Jani</from>
        <heading>Reminder</heading>
        <body>Don't forget me this weekend!</body>
      </note>

36
External DTD
  • This is a copy of the file "note.dtd" containing
    the Document Type Definition (in an external DTD,
    the optional text declaration on the first line
    must include the encoding):
      <?xml version="1.0" encoding="UTF-8"?>
      <!ELEMENT note (to,from,heading,body)>
      <!ELEMENT to (#PCDATA)>
      <!ELEMENT from (#PCDATA)>
      <!ELEMENT heading (#PCDATA)>
      <!ELEMENT body (#PCDATA)>

37
XML document (with DTD)
  • An example of an XML document with a document
    type declaration:
      <?xml version="1.0"?>
      <!DOCTYPE greeting SYSTEM "hello.dtd">
      <greeting>Hello, world!</greeting>
  • The system identifier "hello.dtd" gives the
    address (a URI reference) of a DTD for the
    document.

38
XML document (with DTD)
  • The declarations can also be given locally, as in
    this example:
      <?xml version="1.0" encoding="UTF-8" ?>
      <!DOCTYPE greeting [
        <!ELEMENT greeting (#PCDATA)>
      ]>
      <greeting>Hello, world!</greeting>

39
XML document (with DTD)
  • If both the external and internal subsets are
    used, the internal subset is considered to occur
    before the external subset.
  • This has the effect that entity and
    attribute-list declarations in the internal
    subset take precedence over those in the external
    subset.
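The precedence follows from the XML rule that, when an entity is declared more than once, the first declaration encountered is binding; since the internal subset is read first, its declarations win. A minimal, self-contained sketch of that rule with Python's `xml.etree.ElementTree` (expat), placing both declarations in the internal subset because the standard library does not load external DTDs by default. The entity name `who` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The same entity is declared twice; the parser keeps the first declaration.
DOC = """<!DOCTYPE doc [
  <!ENTITY who "declared first">
  <!ENTITY who "declared second">
]>
<doc>&who;</doc>"""

print(ET.fromstring(DOC).text)   # declared first
```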

40
Language identification
  • In document processing, it is often useful to
    identify the natural or formal language in which
    the content is written.
  • A special attribute named xml:lang may be
    inserted in documents to specify the language
    used in the contents and attribute values of any
    element in an XML document.
  • In valid documents, this attribute, like any
    other, must be declared if it is used.

41
Language identification
  • A simple declaration for xml:lang might take the
    form:
  • xml:lang NMTOKEN #IMPLIED
  • The intent declared with xml:lang is considered
    to apply to all attributes and content of the
    element where it is specified, unless overridden
    with an instance of xml:lang on another element
    within that content.
  • Specific default values may also be given, if
    appropriate. In a collection of French poems for
    English students, with glosses and notes in
    English, the xml:lang attribute might be declared
    this way:
      <!ATTLIST poem  xml:lang NMTOKEN 'fr'>
      <!ATTLIST gloss xml:lang NMTOKEN 'en'>
      <!ATTLIST note  xml:lang NMTOKEN 'en'>
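The inheritance rule (an element's language is its own `xml:lang`, else the nearest ancestor's) can be computed by walking up the tree. A minimal sketch with Python's `xml.etree.ElementTree`, which reports `xml:lang` under the reserved XML namespace URI; the wrapper `<doc>` and the function `effective_lang` are hypothetical, and the content lines come from the slide examples.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

DOC = """<doc xml:lang="en">
  <p>The quick brown fox jumps over the lazy dog.</p>
  <p xml:lang="en-GB">What colour is it?</p>
  <sp who="Faust" desc='leise' xml:lang="de">
    <l>Habe nun, ach! Philosophie,</l>
  </sp>
</doc>"""

def effective_lang(root: ET.Element, target: ET.Element):
    """Return the xml:lang in scope for target: its own value, else the nearest ancestor's."""
    parent = {child: elem for elem in root.iter() for child in elem}
    node = target
    while node is not None:
        if XML_LANG in node.attrib:
            return node.attrib[XML_LANG]
        node = parent.get(node)
    return None   # no xml:lang on the path up to the root

root = ET.fromstring(DOC)
for elem in (root.find("p"), root.findall("p")[1], root.find("sp/l")):
    print(effective_lang(root, elem), "|", elem.text)
```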

42
Language identification
      <p xml:lang="en">The quick brown fox jumps over the lazy dog.</p>
      <p xml:lang="en-GB">What colour is it?</p>
      <p xml:lang="en-US">What color is it?</p>
      <sp who="Faust" desc='leise' xml:lang="de">
        <l>Habe nun, ach! Philosophie,</l>
        <l>Juristerei, und Medizin</l>
        <l>und leider auch Theologie</l>
        <l>durchaus studiert mit heißem Bemüh'n.</l>
      </sp>

43
Well formed XML documents
  • A well-formed XML document has correct XML
    syntax (i.e. it conforms to the XML syntax rules).
  • A valid XML document is a well-formed XML
    document that also conforms to the rules of a
    DTD (Document Type Definition).
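A minimal sketch of a well-formedness checker using Python's `xml.etree.ElementTree`, which reports where parsing failed via `ParseError.position` (a `(line, column)` tuple). It checks well-formedness only: the standard-library parsers do not validate against a DTD. The function name `well_formedness_error` is hypothetical.

```python
import xml.etree.ElementTree as ET

def well_formedness_error(text: str):
    """Return None if text is well-formed XML, else the (line, column) of the first error."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return err.position
    return None

print(well_formedness_error("<note><to>Tove</to></note>"))   # None
print(well_formedness_error("<note><to>Tove</note>"))        # (line, column) of the mismatched tag
```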

44
References
  • http://www.xml.com/pub/a/98/10/guide0.html
  • http://xmlfiles.com/xml/default.asp
  • http://www.brics.dk/amoeller/XML/index.html
  • http://msdn.microsoft.com/library/default.asp?url=/library/en-us/xmlsdk30/htm/xmtutxmltutorial.asp
  • http://www.w3.org/TR/2000/REC-xml-20001006#sec-well-formed