Tutorial on Standoff Markup

1
Tutorial on Standoff Markup
  • as used in
  • HCRC Map Task Corpus
  • MATE/NITE Workbench
  • Amy Isard
  • HCRC Language Technology Group, University of
    Edinburgh

2
Standoff Annotation
  • Don't keep all your data in one big document
  • One document for each annotation level (with its
    own DTD)
  • Links between documents

3
LTG link syntax (1)
  • an element can point to one or more contiguous
    elements in the same or a different document
  • each element is identified by a unique ID
  • a link is shown as an attribute on an element
  • default attributes in the DTD tell a program that
    this is a link

4
LTG link syntax (2)
  • attributes that describe a link whose target will be
    embedded in the original element in the output document
  • href CDATA #IMPLIED
  • xml:link CDATA #FIXED "simple"
  • show CDATA #FIXED "embed"
  • actuate CDATA #FIXED "auto"

5
Standoff Example (1): Words XML
  • <!DOCTYPE words SYSTEM "words.dtd">
  • <words>
  • <word id="w1">turn</word>
  • <word id="w2">right</word>
  • <word id="w3">for</word>
  • <word id="w4">three</word>
  • <word id="w5">centimetres</word>
  • <word id="w6">okay</word>
  • </words>

6
Standoff Example (2): Moves XML
  • <!DOCTYPE moves SYSTEM "moves.dtd">
  • <moves>
  • <move type="instruct" speaker="spk1" id="m1"
    href="words.xml#id(w1)..id(w5)"/>
  • <move type="align" speaker="spk1" id="m2"
    href="words.xml#id(w6)"/>
  • </moves>
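
How a program might follow such a link can be sketched in Python. This is only an illustration, not the LT XML implementation: it assumes the href takes the form `document#id(first)..id(last)` (or `document#id(only)`), that the linked elements are direct children of the target document's root, and the names `parse_href` and `resolve` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# The words document from the example, held in memory for the sketch.
WORDS_XML = """<words>
<word id="w1">turn</word>
<word id="w2">right</word>
<word id="w3">for</word>
<word id="w4">three</word>
<word id="w5">centimetres</word>
<word id="w6">okay</word>
</words>"""

# Assumed href shape: "file#id(START)" or "file#id(START)..id(END)".
HREF_RE = re.compile(
    r"^(?P<doc>[^#]*)#id\((?P<start>[^)]+)\)(?:\.\.id\((?P<end>[^)]+)\))?$"
)


def parse_href(href):
    """Split an href into (document name, first id, last id)."""
    m = HREF_RE.match(href)
    if m is None:
        raise ValueError(f"unsupported href: {href!r}")
    return m.group("doc"), m.group("start"), m.group("end") or m.group("start")


def resolve(href, documents):
    """Return the contiguous run of elements that the href points at."""
    doc, start, end = parse_href(href)
    children = list(documents[doc])
    ids = [child.get("id") for child in children]
    first, last = ids.index(start), ids.index(end)
    return children[first:last + 1]


documents = {"words.xml": ET.fromstring(WORDS_XML)}
instruct = resolve("words.xml#id(w1)..id(w5)", documents)
align = resolve("words.xml#id(w6)", documents)
print([w.text for w in instruct])
print([w.text for w in align])
```

Because the range is resolved by ID rather than by position in the file, the moves document stays valid when unrelated parts of the words document change.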

7
Standoff Example (3): Moves and Words XML
  • <!DOCTYPE words SYSTEM "words.dtd">
  • <words>
  • <word id="w1">turn</word>
  • <word id="w2">right</word>
  • <word id="w3">for</word>
  • <word id="w4">three</word>
  • <word id="w5">centimetres</word>
  • <word id="w6">okay</word>
  • </words>
  • <!DOCTYPE moves SYSTEM "moves.dtd">
  • <moves>
  • <move type="instruct" speaker="spk1" id="m1"
    href="words.xml#id(w1)..id(w5)"/>
  • <move type="align" speaker="spk1" id="m2"
    href="words.xml#id(w6)"/>
  • </moves>

8
Advantages of Standoff Annotation
  • It is possible to have levels of annotation which
    have crossing branches (not normally possible in
    XML)
  • New levels of annotation can be added without
    disturbing existing ones
  • Editing one level of annotation has minimal
    knock-on effects on others
  • People can work on different levels at the same
    time without worrying about creating different
    versions
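
The first point can be made concrete with a small Python check. The spans below are hypothetical (inclusive word positions invented for illustration, not taken from the corpus), and `crosses` is an assumed helper name. Two annotations cross when they overlap but neither contains the other, which is exactly the case a single XML tree cannot represent.

```python
def crosses(a, b):
    """True if inclusive spans a and b overlap without one containing the other."""
    (a0, a1), (b0, b1) = a, b
    overlap = a0 <= b1 and b0 <= a1
    nested = (a0 <= b0 and b1 <= a1) or (b0 <= a0 and a1 <= b1)
    return overlap and not nested


# Hypothetical spans over word positions 0..6.
move_span = (0, 4)         # a move covering words 0-4
disfluency_span = (3, 6)   # a disfluency covering words 3-6
nested_span = (1, 2)       # an annotation that sits inside the move

print(crosses(move_span, disfluency_span))  # overlapping, not nested
print(crosses(move_span, nested_span))      # properly nested
```

With standoff markup each span lives in its own document and points at the shared words, so crossing spans cause no well-formedness problem.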

9
Example Map Task Annotation Structure
  • [Figure: a dialogue excerpt between speakers S1 and S2
    ("turn right for three centimetres okay ...") annotated
    at four linked levels: Dialogue Games (Game: instruct),
    Dialogue Moves (instruct, ack, align), Words, and
    Disfluencies (reparandum, repair)]

10
HCRC Map Task XML Corpus Architecture
  • [Figure: linked annotation levels in the corpus: Gaze,
    Timed Units, Disfluencies, Landmark References, Tokens,
    Moves, Transactions, Tagged Words, Other Speaker's
    Words, Automatic Syntax, Games]

11
Tools and Software
  • LTXML tools: www.ltg.ed.ac.uk/software
  • MATE workbench (NITE): mate.nis.sdu.dk
    (nite.nis.sdu.dk)
  • Map Task XML: www.hcrc.ed.ac.uk/maptask

12
knit
  • Part of the LTXML toolkit
  • Allows you to expand links according to how
    they have been defined in the DTD (e.g. replace
    or embed)
  • Command line program, can be used in pipelines
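
The two expansion modes can be imitated in a few lines of Python. This is a rough sketch of the behaviour described here, not the knit program itself: the function names `targets` and `expand` are hypothetical, the href form `words.xml#id(w1)..id(w5)` is an assumption, and the mode is passed as an argument instead of being read from the DTD.

```python
import copy
import re
import xml.etree.ElementTree as ET

WORDS = ET.fromstring(
    '<words><word id="w1">turn</word><word id="w2">right</word>'
    '<word id="w3">for</word><word id="w4">three</word>'
    '<word id="w5">centimetres</word><word id="w6">okay</word></words>'
)
MOVES = ET.fromstring(
    '<moves>'
    '<move type="instruct" speaker="spk1" id="m1" href="words.xml#id(w1)..id(w5)"/>'
    '<move type="align" speaker="spk1" id="m2" href="words.xml#id(w6)"/>'
    '</moves>'
)

# Assumed link form: "...#id(FIRST)" optionally followed by "..id(LAST)".
HREF_RE = re.compile(r"#id\(([^)]+)\)(?:\.\.id\(([^)]+)\))?$")


def targets(href, target_root):
    """Copy the contiguous children of target_root named by the href."""
    first, last = HREF_RE.search(href).groups()
    children = list(target_root)
    ids = [child.get("id") for child in children]
    i, j = ids.index(first), ids.index(last or first)
    return [copy.deepcopy(child) for child in children[i:j + 1]]


def expand(source_root, target_root, show):
    """Build a new tree with every link either embedded or replaced."""
    out = ET.Element(source_root.tag, source_root.attrib)
    for el in source_root:
        href = el.get("href")
        if href is None:
            out.append(copy.deepcopy(el))      # not a link: copy unchanged
        elif show == "embed":
            holder = ET.SubElement(out, el.tag, el.attrib)
            holder.extend(targets(href, target_root))   # targets become children
        elif show == "replace":
            out.extend(targets(href, target_root))      # targets take its place
        else:
            raise ValueError(f"unknown show value: {show!r}")
    return out


embedded = expand(MOVES, WORDS, "embed")
replaced = expand(MOVES, WORDS, "replace")
print(ET.tostring(embedded, encoding="unicode"))
print(ET.tostring(replaced, encoding="unicode"))
```

With "embed" each move keeps its attributes and gains the words as children; with "replace" the moves disappear and only the words remain under the root.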

13
Standoff Example (3): Moves and Words XML
  • <!DOCTYPE words SYSTEM "words.dtd">
  • <words>
  • <word id="w1">turn</word>
  • <word id="w2">right</word>
  • <word id="w3">for</word>
  • <word id="w4">three</word>
  • <word id="w5">centimetres</word>
  • <word id="w6">okay</word>
  • </words>
  • <!DOCTYPE moves SYSTEM "moves.dtd">
  • <moves>
  • <move type="instruct" speaker="spk1" id="m1"
    href="words.xml#id(w1)..id(w5)"/>
  • <move type="align" speaker="spk1" id="m2"
    href="words.xml#id(w6)"/>
  • </moves>

14
Standoff Example (4): Moves XML with embed links
  • <!DOCTYPE moves SYSTEM "moves.dtd">
  • <moves>
  • <move type="instruct" speaker="spk1" id="m1"
    href="words.xml#id(w1)..id(w5)">
  • <word id="w1">turn</word>
  • <word id="w2">right</word>
  • <word id="w3">for</word>
  • <word id="w4">three</word>
  • <word id="w5">centimetres</word>
  • </move>
  • <move type="align" speaker="spk1" id="m2"
    href="words.xml#id(w6)">
  • <word id="w6">okay</word>
  • </move>
  • </moves>

15
Standoff Example (4): Moves XML with replace links
  • <!DOCTYPE moves SYSTEM "moves.dtd">
  • <moves>
  • <word id="w1">turn</word>
  • <word id="w2">right</word>
  • <word id="w3">for</word>
  • <word id="w4">three</word>
  • <word id="w5">centimetres</word>
  • <word id="w6">okay</word>
  • </moves>

16
Working with knit
  • Use knit on one XML document to work with one
    hierarchical view of the data
  • To work across hierarchies, knit several views
    and navigate using the structures plus the unique
    ids of elements

17
Stylesheets
  • stylesheet template rules consist of
      • a pattern which specifies which tree the rule
        applies to
      • a pattern which specifies which tree it should
        output
  • a stylesheet processor
      • reads an XML document and a stylesheet
      • carries out the instructions in the stylesheet
  • outputs a new XML document or

18
Template Matching
  • XPath is a language for addressing parts of an
    XML document, and is used by XSLT in the match
    attribute of a template, e.g. <template
    match="sentence"> matches any sentence element.
  • A stylesheet processor goes through the XML
    document matching elements to templates and
    carries out the instructions in the template.

19
Standard Stylesheet Example
  • <template match="dial">
  • <table>
  • <apply-templates/>
  • </table>
  • </template>
  • <template match="move">
  • <tr>
  • <apply-templates/>
  • </tr>
  • </template>
  • <template match="word">
  • <td>
  • <apply-templates/>
  • </td>
  • </template>
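
What a processor does with these three rules can be approximated in Python. This is a simplified sketch and not an XSLT engine: each rule is reduced to "element name maps to output tag", the recursive call stands in for apply-templates, and the input document and the names `RULES` and `transform` are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

# One entry per template rule: matched element name -> output element name.
RULES = {"dial": "table", "move": "tr", "word": "td"}


def transform(node):
    """Apply the rule for node, then recurse into its children."""
    out = ET.Element(RULES[node.tag])   # raises KeyError if no rule matches
    out.text = node.text                # text content is copied through
    for child in node:
        new_child = transform(child)    # plays the role of apply-templates
        new_child.tail = child.tail
        out.append(new_child)
    return out


# Hypothetical input: a dialogue whose moves already contain their words.
DIAL = ET.fromstring(
    '<dial><move id="m1"><word>turn</word><word>right</word></move>'
    '<move id="m2"><word>okay</word></move></dial>'
)

table = transform(DIAL)
print(ET.tostring(table, encoding="unicode"))
```

Each move becomes one table row and each word one cell, so the dialogue is displayed as a table with a row per move.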

20
The MATE Workbench
  • For display, querying, and especially annotation
    of XML corpora
  • Flexible user-defined user interfaces
  • Uses stylesheets to create Java display objects
    which have defined user interface behaviours
  • In MATE internal data representation, elements
    with link pointers are viewed as parent elements

21
MATE query language
  • Easy to write queries over more than one
    hierarchy
  • In MATE query language you define variables by
    element type and then relationships between them
  • (a b) means that element a is a parent of
    element b, either in the same document, or via a
    link.

22
MATE example query
  • Find all words which are in a move whose label is
    instruct and which are part of a disfluency
  • (w word)(m move)(d disfluency)
  • (m w) and (m label instruct) and
  • (d w)
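
The logic of this query can be spelled out in Python. This is a sketch of what the query asks for, not of how the MATE workbench evaluates it: the disfluency data is invented for illustration, each parent is represented simply by the IDs of the words it links to, and `find_words` is a hypothetical name.

```python
# Each parent element lists the ids of the words it dominates via links.
moves = [
    {"id": "m1", "label": "instruct", "words": ["w1", "w2", "w3", "w4", "w5"]},
    {"id": "m2", "label": "align", "words": ["w6"]},
]
# Invented for illustration: a disfluency that covers the word w4.
disfluencies = [{"id": "d1", "words": ["w4"]}]


def find_words(moves, disfluencies, label):
    """Return (word, move, disfluency) id triples satisfying the query."""
    results = []
    for m in moves:
        if m["label"] != label:          # the move's label must match
            continue
        for d in disfluencies:
            for w in m["words"]:         # m is a parent of w
                if w in d["words"]:      # d is also a parent of w
                    results.append((w, m["id"], d["id"]))
    return results


matches = find_words(moves, disfluencies, "instruct")
print(matches)
```

The move and the disfluency belong to different hierarchies; the query connects them only through the word they both dominate.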

23
Conclusions
  • Standoff markup is not just theoretically a good
    idea
  • Map Task standoff annotations in place for 5
    years, used regularly
  • Accessible to linguists with modest technical
    backgrounds