1
Text Annotation Techniques
  • by Brian Wanner

2
Annotated Text
Ordinary text (e.g.): This is an ordinary text
document.
Annotated text (e.g.): <html> <title> Sample
Document </title> <body> This is an annotated
text document. </body> </html>
3
Document Type Definition (DTD)
  • It is a specification that accompanies an
    annotated document.
  • It helps the parser identify the codes (or
    markup) that separate paragraphs, identify topic
    headings, and so on.
  • It also tells the parser how each tag is to be
    processed.
  • The DTD is generally placed at the top of the
    document, or referenced from a declaration
    there.

4
DTD Example
  • This is an XML document with a Document Type
    Definition:

    <?xml version="1.0"?>
    <!DOCTYPE note [
      <!ELEMENT note (to,from,heading,body)>
      <!ELEMENT to (#PCDATA)>
      <!ELEMENT from (#PCDATA)>
      <!ELEMENT heading (#PCDATA)>
      <!ELEMENT body (#PCDATA)>
    ]>
    <note>
      <to>Joe</to>
      <from>Jane</from>
      <heading>Reminder</heading>
      <body>Don't forget the homework</body>
    </note>

5
Why use DTD?
  • Each file can carry a description of its own
    format with it.
  • Independent groups of people can agree to use a
    common DTD for interchanging data.
  • Your application can use a standard DTD to verify
    that the data you receive from the outside world
    is valid.
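Python's standard library parses XML but does not validate it against a DTD, so the following is only a minimal hand-rolled sketch of the idea, not a DTD validator. It checks a document against the content model of the `note` DTD from slide 4: `note` must contain exactly `to`, `from`, `heading` and `body`, in that order, each holding only text. The helper name `matches_note_dtd` is hypothetical, and it ignores attributes and everything else a real DTD can declare.

```python
import xml.etree.ElementTree as ET

# Content model copied by hand from: <!ELEMENT note (to,from,heading,body)>
NOTE_CHILDREN = ["to", "from", "heading", "body"]


def matches_note_dtd(xml_text: str) -> bool:
    """Hypothetical helper: check one document against the note content model only."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False  # not well-formed, so it cannot be valid either
    children = list(root)
    if root.tag != "note" or [c.tag for c in children] != NOTE_CHILDREN:
        return False  # wrong root, wrong elements, or wrong order
    # Element content: only whitespace may appear between the children.
    if any(s and s.strip() for s in [root.text] + [c.tail for c in children]):
        return False
    # Each child is declared (#PCDATA): text only, no nested elements.
    return all(len(c) == 0 for c in children)


valid = ("<note><to>Joe</to><from>Jane</from><heading>Reminder</heading>"
         "<body>Don't forget the homework</body></note>")
print(matches_note_dtd(valid))                                        # True
print(matches_note_dtd("<note><from>Jane</from><to>Joe</to></note>"))  # False
```

A real application would hand this job to a validating parser; the point here is that the DTD turns "is this data what we agreed on?" into a mechanical check.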

6
Standard Generalized Markup Language (SGML)
  • SGML is a metalanguage: a language for writing
    languages in.
  • SGML is used to write DTDs, which define the
    abstract structure of a class of documents.

7
SGML
  • Each markup language defined in SGML is called an
    SGML application. An SGML application is
    generally characterized by:
      • An SGML declaration, which specifies which
        characters and delimiters may appear in the
        application.
      • A DTD, which defines the syntax of markup
        constructs.
      • A specification that describes the semantics to
        be ascribed to the markup. This specification
        also imposes syntax restrictions that cannot be
        expressed within the DTD.
      • Document instances containing data (content) and
        markup. Each instance contains a reference to the
        DTD to be used to interpret it.

8
(No Transcript)
9
  • HTML: HyperText Markup Language
  • XML: eXtensible Markup Language
  • WML: Wireless Markup Language
  • XHTML: eXtensible HyperText Markup Language
  • SVG: Scalable Vector Graphics
  • SMIL: Synchronized Multimedia Integration Language
10
  • Seen from a DTD point of view, all XML documents
    (and HTML documents) are made up of the following
    simple building blocks:
  • Elements
  • Tags
  • Attributes
  • Entities
  • PCDATA
  • CDATA

11
Elements
  • Elements are the main building blocks of both XML
    and HTML documents.
  • Examples of HTML elements are "body" and "table".
    Examples of XML elements could be "note" and
    "message". Elements can contain text, other
    elements, or be empty. Examples of empty HTML
    elements are "hr", "br" and "img".
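The three kinds of element content described above can be told apart mechanically. The sketch below uses Python's `xml.etree.ElementTree`; the helper `content_kind` is hypothetical, treats whitespace-only text as empty, and reports mixed content (text plus child elements) as "elements".

```python
import xml.etree.ElementTree as ET


def content_kind(element: ET.Element) -> str:
    """Classify an element's content as elements, text, or empty (hypothetical helper)."""
    if len(element) > 0:           # has child elements (mixed content lands here too)
        return "elements"
    if (element.text or "").strip():
        return "text"
    return "empty"


message = ET.fromstring("<message><body>Hello</body><hr/></message>")
for el in (message, message.find("body"), message.find("hr")):
    print(el.tag, content_kind(el))
# message elements
# body text
# hr empty
```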

12
Tags
  • Tags are used to markup elements.
  • A starting tag like <element_name> marks up the
    beginning of an element, and an ending tag like
    </element_name> marks up the end of an element.
  • Examples:
  • The body element marked up with body tags:
    <body>body text in between</body>
  • The message element marked up with message tags:
    <message>some message in between</message>
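Because every starting tag must be closed by a matching ending tag, a parser can track open elements with a stack. This is a simplified sketch, not how a real XML parser is written: the regular expression only recognizes bare tags such as `<body>` and `</body>`, and it deliberately ignores attributes, comments and empty-element tags. The name `tags_balanced` is hypothetical.

```python
import re

# Matches start tags like <body> and end tags like </body> (no attributes).
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)>")


def tags_balanced(text: str) -> bool:
    """Check that every starting tag is closed by a matching ending tag, in order."""
    stack = []
    for closing, name in TAG.findall(text):
        if not closing:
            stack.append(name)            # a starting tag opens an element
        elif not stack or stack.pop() != name:
            return False                  # ending tag with no matching start
    return not stack                      # every opened element must be closed


print(tags_balanced("<body>body text in between</body>"))        # True
print(tags_balanced("<message>some message in between</body>"))  # False
```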

13
Attributes
  • Attributes provide extra information about
    elements.
  • Attributes are always placed inside the starting
    tag of an element. Attributes always come in
    name/value pairs. The following "img" element has
    additional information about a source file:
  • <img src="computer.gif" />
  • The name of the element is "img". The name of the
    attribute is "src". The value of the attribute is
    "computer.gif". Since the element itself is empty,
    it is closed by " />".
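Once parsed, each name/value pair ends up in a dictionary attached to the element. This sketch uses Python's `xml.etree.ElementTree`, where the pairs are exposed as `attrib` and read with `get()`.

```python
import xml.etree.ElementTree as ET

img = ET.fromstring('<img src="computer.gif" />')
print(img.tag)             # img
print(img.attrib)          # {'src': 'computer.gif'}
print(img.get("src"))      # computer.gif
print(len(img), img.text)  # 0 None  (the element is empty)
```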

14
Entities
  • Entities are variables used to define common
    text.
  • Entity references are references to entities.
    For example, the predefined entity reference
    &lt; stands for the character "<".
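The sketch below declares an entity in an internal DTD and lets the parser expand both it and some predefined entity references. It relies on Python's `xml.etree.ElementTree`, which expands internal entities; the entity name `writer` and its text are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The internal DTD declares the entity "writer"; &writer; is a reference to it.
doc = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ENTITY writer "Donald Duck.">
]>
<note><from>&writer;</from><body>3 &lt; 4 &amp; 4 &gt; 3</body></note>"""

root = ET.fromstring(doc)
print(root.find("from").text)  # Donald Duck.
print(root.find("body").text)  # 3 < 4 & 4 > 3
```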

15
PCDATA
  • PCDATA means parsed character data.
  • Think of character data as the text found between
    the start tag and the end tag of an XML element.
  • PCDATA is text that will be parsed by a parser.
    Tags inside the text will be treated as markup
    and entities will be expanded. 

16
CDATA
  • CDATA also means character data.
  • CDATA is text that will NOT be parsed by a
    parser. Tags inside the text will NOT be treated
    as markup and entities will not be expanded.
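The difference between parsed and unparsed character data shows up directly in what a parser returns. This sketch uses a CDATA section (`<![CDATA[ ... ]]>`), the XML syntax for character data the parser must not interpret, and Python's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Parsed character data: <b> becomes a child element and &amp; is expanded.
pcdata = ET.fromstring("<m>Hi <b>there</b> &amp; bye</m>")
print([child.tag for child in pcdata])  # ['b']
print(repr(pcdata.find("b").tail))      # ' & bye'

# CDATA section: the same characters pass through untouched.
cdata = ET.fromstring("<m><![CDATA[Hi <b>there</b> &amp; bye]]></m>")
print(list(cdata))                      # []
print(cdata.text)                       # Hi <b>there</b> &amp; bye
```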

17
What is HTML?
  • HTML is a non-proprietary format based on SGML.
    It can be created and processed by a wide range
    of tools, from simple plain-text editors (where
    you type it in from scratch) to sophisticated
    WYSIWYG authoring tools.

18
(No Transcript)
19

HTML uses tags such as <h1> and </h1> to
structure text into headings, paragraphs, lists,
hypertext links, etc.
  • Markup: <H1>Heading 1</H1> <H2>Heading 2</H2>
    <H3>Heading 3</H3> <H4>Heading 4</H4>
  • Rendered: four headings, "Heading 1" through
    "Heading 4", in decreasing size.
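Because the heading tags carry structure, a program can recover a document outline from them. This sketch uses Python's standard `html.parser.HTMLParser`, which lower-cases tag names (so `<H1>` and `<h1>` are treated alike); the class `HeadingCollector` is a hypothetical example, not part of any library.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect (level, text) pairs for <h1> to <h6> elements."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None  # level of the heading currently open, if any

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lower-cased, so <H1> is seen as "h1".
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self.headings.append((self._level, ""))

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self._level = None

    def handle_data(self, data):
        if self._level is not None:  # only keep text inside a heading
            level, text = self.headings[-1]
            self.headings[-1] = (level, text + data)


parser = HeadingCollector()
parser.feed("<H1>Heading 1</H1> <H2>Heading 2</H2> <H3>Heading 3</H3> <H4>Heading 4</H4>")
parser.close()
print(parser.headings)
# [(1, 'Heading 1'), (2, 'Heading 2'), (3, 'Heading 3'), (4, 'Heading 4')]
```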

20
What is XML?
  • Extensible Markup Language (XML) is a simple,
    very flexible text format derived from SGML.
  • Originally designed to meet the challenges of
    large-scale electronic publishing, XML is also
    playing an increasingly important role in the
    exchange of a wide variety of data on the Web and
    elsewhere.

21
XML Sample
    <?xml version="1.0"?>
    <order orderid="THX1138" customerNumber="3263827">
      <lineitem itemid="C33">
        <quantity>36</quantity>
        <unitprice currency="dollars">.35</unitprice>
      </lineitem>
      <lineitem itemid="M48">
        <quantity>1</quantity>
        <unitprice currency="dollars">2200</unitprice>
      </lineitem>
    </order>
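Since the order sample describes data rather than its appearance, a program can compute with it directly. This minimal sketch totals the order with Python's `xml.etree.ElementTree`, assuming integer quantities and that every unit price uses the same currency (the sample only uses "dollars"). `Decimal` is used so that prices such as .35 are not subject to floating-point rounding.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

ORDER = """<?xml version="1.0"?>
<order orderid="THX1138" customerNumber="3263827">
  <lineitem itemid="C33">
    <quantity>36</quantity>
    <unitprice currency="dollars">.35</unitprice>
  </lineitem>
  <lineitem itemid="M48">
    <quantity>1</quantity>
    <unitprice currency="dollars">2200</unitprice>
  </lineitem>
</order>"""

order = ET.fromstring(ORDER)
total = Decimal(0)
for item in order.iter("lineitem"):
    quantity = int(item.findtext("quantity"))
    price = Decimal(item.findtext("unitprice"))  # exact decimal arithmetic
    print(item.get("itemid"), quantity, price)
    total += quantity * price

print(order.get("orderid"), total)  # THX1138 2212.60
```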

22
XML/HTML
  • XML is not a replacement for HTML. XML and HTML
    were designed with different goals.
  • XML was designed to describe data and to focus on
    what data is. HTML was designed to display data
    and to focus on how data looks.
  • HTML is about displaying information; XML is
    about describing information.

23
XML/HTML
  • The tags used to markup HTML documents and the
    structure of HTML documents are predefined. The
    author of HTML documents can only use tags that
    are defined in the HTML standard.
  • XML allows authors to define their own tags and
    their own document structure.

24
What is XHTML?
  • The Extensible HyperText Markup Language (XHTML)
    is a family of current and future document types
    and modules that reproduce, subset, and extend
    HTML, reformulated in XML.
  • XHTML Family document types are all XML-based,
    and ultimately are designed to work in
    conjunction with XML-based user agents.
  • XHTML is the successor of HTML, and a series of
    specifications has been developed for XHTML.
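Because XHTML is HTML reformulated in XML, an XHTML document must be well-formed XML, something browsers do not demand of ordinary HTML. The sketch below checks only well-formedness with Python's `xml.etree.ElementTree`; it does not validate against any XHTML DTD, and the helper name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<p>line one<br>line two</p>"))     # False: <br> never closed
print(is_well_formed("<p>line one<br />line two</p>"))   # True: XHTML-style empty tag
print(is_well_formed("<p><b>bold <i>both</b></i></p>"))  # False: improper nesting
```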

25
SMIL
  • SMIL authoring offers a new way to assemble and
    deliver streaming multimedia presentations.
    Rather than the traditional way of creating a
    presentation by compiling a set of media into a
    single distributable file, SMIL lets authors
    choreograph separate media assets quickly and
    easily, with tools as simple as a text editor.
    Perhaps the best feature of SMIL is that its code
    can be generated on the fly, as many Web pages
    already are, to offer personalized streaming
    multimedia.
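Generating SMIL on the fly can be as simple as building the XML tree per request. This sketch uses SMIL 1.0-style element names (`smil`, `body`, `seq`, `par`, `video`, `audio`) without a namespace; the function `build_smil` and all media file names are hypothetical, standing in for clips a server might choose for a particular user.

```python
import xml.etree.ElementTree as ET


def build_smil(clips):
    """Build a SMIL document that plays each (video, audio) pair in turn."""
    smil = ET.Element("smil")
    body = ET.SubElement(smil, "body")
    seq = ET.SubElement(body, "seq")      # children of <seq> play one after another
    for video, audio in clips:
        par = ET.SubElement(seq, "par")   # children of <par> play at the same time
        ET.SubElement(par, "video", src=video)
        ET.SubElement(par, "audio", src=audio)
    return ET.tostring(smil, encoding="unicode")


# Hypothetical per-user selection, e.g. English soundtracks.
presentation = build_smil([("intro.mpg", "intro-en.mp3"), ("news.mpg", "news-en.mp3")])
print(presentation)
```

The media assets stay as separate files; only the small SMIL document that choreographs them changes per user.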

26
SVG
  • SVG is a language for describing two-dimensional
    graphics in XML
  • SVG allows for three types of graphic objects:
  • vector graphic shapes (e.g., paths consisting of
    straight lines and curves)
  • images
  • text
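The three object types can be combined in one SVG document. This sketch builds one of each with Python's `xml.etree.ElementTree`, using the SVG namespace and `xlink:href` for the image reference (the SVG 1.1 form); the image file name and all coordinates are made up for illustration.

```python
import xml.etree.ElementTree as ET

SVG = "http://www.w3.org/2000/svg"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("", SVG)        # write SVG elements without a prefix
ET.register_namespace("xlink", XLINK)


def q(ns: str, name: str) -> str:
    """ElementTree's {namespace}name form of a qualified name."""
    return f"{{{ns}}}{name}"


root = ET.Element(q(SVG, "svg"), {"width": "200", "height": "100"})
# 1. Vector shape: a path of one straight line and one cubic Bezier curve.
ET.SubElement(root, q(SVG, "path"),
              {"d": "M 10,80 L 60,80 C 80,20 120,20 140,80",
               "stroke": "black", "fill": "none"})
# 2. Image: "computer.gif" is a hypothetical file name.
ET.SubElement(root, q(SVG, "image"),
              {q(XLINK, "href"): "computer.gif",
               "x": "150", "y": "10", "width": "40", "height": "40"})
# 3. Text.
label = ET.SubElement(root, q(SVG, "text"), {"x": "10", "y": "95"})
label.text = "Hello, SVG"

svg_markup = ET.tostring(root, encoding="unicode")
print(svg_markup)
```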

27
SVG Sample
    <symbol id="whiteYellowBezier" overflow="visible">
      <path style="stroke:black;fill:none" d="M 0,0 C
        0.25,-0.1 0.75,-0.1 1,0">
        <animate id="whiteYellowBezierAnim"
          attributeName="d" values="M 0,0 C 0.25,-0.1
          0.75,-0.1 1,0; M 0,0 C 25,-10 75,-10 100,0"
          dur="5s" repeatCount="3"/>
        <animate attributeName="stroke-width"
          values="1;3" dur="5s" repeatCount="3" />
        <animate attributeName="stroke"
          values="white;yellow" dur="5s" repeatCount="3" />
      </path>
    </symbol>
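Each `animate` element names the attribute it changes and a `values` list of keyframes, separated by semicolons, played over `dur` and repeated `repeatCount` times. This sketch reads those keyframes with Python's `xml.etree.ElementTree`. Its copy of the sample restores the ":" and ";" separators that appear to have been stripped from the slide text, which is an assumption, and it omits the SVG namespace, so it is a fragment rather than a standalone SVG file.

```python
import xml.etree.ElementTree as ET

# The slide's fragment, with ':' and ';' restored (assumed from the garbled text).
SAMPLE = """<symbol id="whiteYellowBezier" overflow="visible">
  <path style="stroke:black;fill:none" d="M 0,0 C 0.25,-0.1 0.75,-0.1 1,0">
    <animate id="whiteYellowBezierAnim" attributeName="d"
             values="M 0,0 C 0.25,-0.1 0.75,-0.1 1,0; M 0,0 C 25,-10 75,-10 100,0"
             dur="5s" repeatCount="3"/>
    <animate attributeName="stroke-width" values="1;3" dur="5s" repeatCount="3"/>
    <animate attributeName="stroke" values="white;yellow" dur="5s" repeatCount="3"/>
  </path>
</symbol>"""

animations = {}
for anim in ET.fromstring(SAMPLE).iter("animate"):
    # "values" lists the keyframes, separated by semicolons, played over "dur".
    keyframes = [v.strip() for v in anim.get("values").split(";")]
    animations[anim.get("attributeName")] = keyframes
    print(anim.get("attributeName"), keyframes, anim.get("dur"), "x" + anim.get("repeatCount"))
```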

28
(No Transcript)
29
WML
  • An annotation technique that allows the text
    portions of Web pages to be presented on cellular
    telephones and personal digital assistants
    (PDAs) via wireless access.
  • Although HTML could be used, WML is preferred
    because it requires less bandwidth.
  • WML also requires less processing power than
    HTML.

30
Future Direction
  • TEI (Text Encoding Initiative)
  • an international project to develop guidelines
    for the preparation and interchange of electronic
    texts for scholarly research.
  • It supported and promoted the use of SGML.

31
  • W3C (World Wide Web Consortium)
  • Vision: Contributions from several hundred
    dedicated researchers and engineers working for
    Member organizations, from the W3C Team, and
    from the entire Web community enable W3C to
    identify the technical requirements that must be
    satisfied if the Web is to be a truly universal
    information space.
  • Design: W3C designs Web technologies to realize
    this vision, taking into account existing
    technologies as well as those of the future.
  • Standardization: W3C contributes to efforts to
    standardize Web technologies by producing
    specifications (called "Recommendations") that
    describe the building blocks of the Web. W3C
    makes these Recommendations freely available to
    all.

32
  • Paper Critique