1
The Guidelines (P5) of the Text Encoding
Initiative (TEI)
  • Laurent Romary
  • Max Planck Digital Library
  • With many thanks to Lou Burnard (OUCS)

2
In the beginning
Michael Sperberg-McQueen
Lou Burnard
Antonio Zampolli
1. November 1987, Vassar College, Poughkeepsie
3
The basic scenario?
4
Early Americas Digital Archive
  • material
  • transcriptions of early printed books
  • organization
  • library metaphor
  • technologies
  • XSLT rendering in the browser; simple search via
    PHP
  • TEI source visible?
  • Yes, via view source

http://www.mith2.umd.edu/eada/
5
Samyukta Agama
  • material
  • multiple recensions from different traditions of
    the Buddhist Canon
  • organization
  • aligned texts for each sutra
  • technologies
  • Cocoon, eXist, XQuery
  • TEI source visible?
  • yes

http://buddhistinformatics.chibs.edu.tw/BZA/
6
Can scientists bear standards?
  • Standards are essentially bad for scientists
  • Freezing knowledge
  • Making one lose time that could be dedicated to
    research
  • Forcing diverging views to agree
  • especially if the work is done by others
  • A positive view on standards
  • Documenting data
  • Giving semantics to data
  • Pooling data from various origins
  • Allowing interoperability of tools
  • A possible answer
  • Standards as specification platforms
  • Does the TEI provide you with this?

7
An overview of the TEI elements
8
Basic structure(s)
  • Every TEI-conformant document comprises a header
    followed by (at least one) text
  • the header contains
  • mandatory file description
  • optional encoding, profile and revision
    descriptions
  • the header is essential for
  • bibliographic control and identification
  • resource documentation and processing
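
The header-plus-text layout described above can be sketched with Python's standard library. This is only an illustration: the title, publication statement and paragraph text are invented placeholders, and parsing for well-formedness is not the same as validating against a TEI schema.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical minimal document: a header holding the mandatory file
# description, followed by a single text.
MINIMAL_TEI = f"""
<TEI xmlns="{TEI_NS}">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>A sample title</title></titleStmt>
      <publicationStmt><p>Unpublished sample.</p></publicationStmt>
      <sourceDesc><p>Born digital.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>A single paragraph.</p>
    </body>
  </text>
</TEI>
""".strip()


def top_level_parts(xml_source):
    """Return the local names of the children of the root element."""
    root = ET.fromstring(xml_source)
    return [child.tag.split("}", 1)[1] for child in root]


def has_file_description(xml_source):
    """Check that the header contains a fileDesc element."""
    root = ET.fromstring(xml_source)
    path = f"{{{TEI_NS}}}teiHeader/{{{TEI_NS}}}fileDesc"
    return root.find(path) is not None


print(top_level_parts(MINIMAL_TEI), has_file_description(MINIMAL_TEI))
```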

9
Structure of a TEI text
  • In the simplest case, a text just consists of
    paragraphs or clauses or verse lines
  • A TEI text has a little more structure: it
    contains
  • optional front matter
  • optional back matter
  • a body

10
The body of a text usually has divisions
  • usually nested one within another
  • the type attribute labels a particular level, e.g.
    as "part" or "chapter"
  • the n attribute gives a particular division a
    name or number
  • the xml:id attribute gives a particular division
    a unique identifier

11
For example...
  • Book I.
  • xml:id="JA0101"
  • Of writing lives in
    general...
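
The markup of this example did not survive in the transcript. As a hedged reconstruction, nested divisions carrying type, n and xml:id might look like the snippet embedded below. Only the identifier JA0101 and the two headings come from the slide; the type values, the n values and the outer identifier JA01 are assumptions.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# ElementTree exposes xml:id under the reserved XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Assumed nesting: a book-level division containing a chapter-level one.
NESTED_DIVS = f"""
<body xmlns="{TEI_NS}">
  <div type="book" n="I" xml:id="JA01">
    <head>Book I.</head>
    <div type="chapter" n="1" xml:id="JA0101">
      <head>Of writing lives in general...</head>
    </div>
  </div>
</body>
""".strip()


def describe_divisions(xml_source):
    """List (type, n, xml:id) for every div, in document order."""
    root = ET.fromstring(xml_source)
    return [
        (div.get("type"), div.get("n"), div.get(XML_ID))
        for div in root.iter(f"{{{TEI_NS}}}div")
    ]


print(describe_divisions(NESTED_DIVS))
```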

12
TEI global attributes
  • The attribute class att.global defines these for
    all elements
  • xml:id supplies a unique identifier
  • n supplies a (non-unique) name or number
  • rend gives a suggestion about rendition
    (appearance)
  • xml:lang identifies the language using an ISO
    standard code
  • The linking module extends this class with
  • corresp, synch, ana for specific association
    types
  • next, prev for aggregating fragmented elements

13
Text components
  • What are divisions composed of?
  • prose is mostly paragraphs (p)
  • verse is mostly lines (l), sometimes in
    hierarchic groups (lg)
  • drama is mostly speeches (sp) containing p or l
    elements interspersed with stage directions
    (stage)
  • These may be mixed, and may also appear directly
    within undivided texts
  • .... but divisions can also contain embedded text
    or quote elements.

14
For example
  • Of Man's first disobedience, and the
    fruit
  • Of that forbidden tree whose mortal
    taste
  • Brought death into the World, and all our
    woe,
  • With loss of Eden...
  • ....
  • Summer grass
  • all that's left
  • of warriors' dreams

15
For example
  • Enter Barnardo and Francisco, two
    Sentinels, at several doors
  • Who's there?
  • Nay, answer me. Stand and unfold
    yourself.
  • Long live the king!
  • Barnardo?
  • He.
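
The tags for this dramatic example were lost in the transcript. A plausible encoding, assuming each speech is an sp element holding a verse line and identifying its speaker through the who attribute, is sketched below. The speaker identifiers and the enclosing scene division are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Assumed encoding: one stage direction followed by five speeches.
DRAMA = f"""
<div xmlns="{TEI_NS}" type="scene">
  <stage>Enter Barnardo and Francisco, two Sentinels, at several doors</stage>
  <sp who="#barnardo"><l>Who's there?</l></sp>
  <sp who="#francisco"><l>Nay, answer me. Stand and unfold yourself.</l></sp>
  <sp who="#barnardo"><l>Long live the king!</l></sp>
  <sp who="#francisco"><l>Barnardo?</l></sp>
  <sp who="#barnardo"><l>He.</l></sp>
</div>
""".strip()


def speeches(xml_source):
    """Return (speaker reference, spoken text) for each sp element."""
    root = ET.fromstring(xml_source)
    line_tag = f"{{{TEI_NS}}}l"
    return [
        (sp.get("who"), " ".join(line.text for line in sp.iter(line_tag)))
        for sp in root.iter(f"{{{TEI_NS}}}sp")
    ]


for who, words in speeches(DRAMA):
    print(who, words)
```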

16
not to mention
  • .... And he wrote on one side of the paper
  • HELP!
  • PIGLIT (ME)
  • and on the other side
  • IT'S ME PIGLIT, HELP HELP
  • Then he put the paper in the bottle...

17
What are speeches, paragraphs, and lines made of?
  • phrases that are conventionally typographically
    distinct
  • data-like (names, numbers, dates, times,
    addresses)
  • editorial interventions (corrections,
    regularizations, additions, omissions ...)
  • cross references and links
  • lists, notes, graphics, tables, bibliographic
    citations...
  • all kinds of annotations!
  • Which of these you need to mark up will depend on
    your research agenda

18
for example...
  • Of writing lives in general, and
    particularly of Pamela, with a
    word by the bye of Colley
    Cibber and others.
  • It is a trite but true observation, that
    examples work more forcibly on the mind than
    precepts
  • Mr. Joseph Andrews,
    the hero of our ensuing history, was
    esteemed to be ...

19
Direct speech
  • Use the who attribute to show speakers
  • Speeches can be nested in other speeches
  • Spaulding, he came down into the
    office just this
  • day eight weeks with this very paper in his hand,
    and he says
  • I wish to the Lord, Mr.
    Wilson, that I was a red-headed man.
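
Since the markup of this example is missing from the transcript, here is one way the nesting could be expressed, assuming q elements with who attributes. The speaker identifiers are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
Q_TAG = f"{{{TEI_NS}}}q"

# Assumed encoding: Wilson's speech contains a speech he attributes
# to Spaulding.
NESTED_SPEECH = f"""
<p xmlns="{TEI_NS}"><q who="#Wilson">Spaulding, he came down into the
office just this day eight weeks with this very paper in his hand, and
he says <q who="#Spaulding">I wish to the Lord, Mr. Wilson, that I was
a red-headed man.</q></q></p>
""".strip()


def quote_depths(element, depth=0):
    """Return (who, nesting depth) for every q element below `element`."""
    results = []
    for child in element:
        if child.tag == Q_TAG:
            results.append((child.get("who"), depth + 1))
            results.extend(quote_depths(child, depth + 1))
        else:
            results.extend(quote_depths(child, depth))
    return results


print(quote_depths(ET.fromstring(NESTED_SPEECH)))
```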

20
Foreign language phrases
  • The xml:lang attribute may be attached to any
    element
  • Use if nothing else is available
  • Use a BCP 47 language tag (built on ISO 639
    codes) to identify language
  • Have you read Die
    Dreigroschenoper?
  • Savoir-faire
    is French for know-how.
  • John has real savoir-faire
    .
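
The three sentences above lost their tags in the transcript. The sketch below assumes title, mentioned and foreign as the carrying elements and two-letter language codes as the xml:lang values; both choices are assumptions.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Assumed encoding of the three example sentences.
PHRASES = f"""
<p xmlns="{TEI_NS}">Have you read <title xml:lang="de">Die
Dreigroschenoper</title>? <mentioned xml:lang="fr">Savoir-faire</mentioned>
is French for know-how. John has real
<foreign xml:lang="fr">savoir-faire</foreign>.</p>
""".strip()


def tagged_languages(xml_source):
    """Return (element name, language, text) for elements carrying xml:lang."""
    root = ET.fromstring(xml_source)
    return [
        (el.tag.split("}", 1)[1], el.get(XML_LANG), " ".join(el.text.split()))
        for el in root.iter()
        if el.get(XML_LANG) is not None
    ]


print(tagged_languages(PHRASES))
```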

21
Inter class elements
  • lists of all kinds
  • notes (authorial or editorial)
  • pictures or figures
  • tables
  • bibliographic descriptions

22
for example...
  • For my true love
  • three calling birds
  • two french hens
  • a partridge in a pear
    tree
  • For Uncle Joe
  • socks as usual

23
Example
  • Mr Fezziwig's Ball
  • A Cruikshank engraving showing Mr
    Fezziwig leading a group of revellers.
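
A figure of this kind might be encoded with a heading, a description and a pointer to the image, as in the sketch below. The tags are reconstructed rather than taken from the transcript, and the image file name is a hypothetical placeholder.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI_NS}

# Assumed encoding; "fezzipic.png" is an invented file name.
FIGURE = f"""
<figure xmlns="{TEI_NS}">
  <graphic url="fezzipic.png"/>
  <head>Mr Fezziwig's Ball</head>
  <figDesc>A Cruikshank engraving showing Mr Fezziwig leading a group
  of revellers.</figDesc>
</figure>
""".strip()


def figure_summary(xml_source):
    """Collect the image reference, heading and description of a figure."""
    root = ET.fromstring(xml_source)
    description = root.findtext("tei:figDesc", namespaces=NS)
    return {
        "url": root.find("tei:graphic", NS).get("url"),
        "head": root.findtext("tei:head", namespaces=NS),
        # Collapse the line break inside the description.
        "description": " ".join(description.split()),
    }


print(figure_summary(FIGURE))
```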

24
Feeling overwhelmed?
  • All of this is just one way of looking at the
    TEI.
  • 1. The TEI is a modular system: you use it to
    build an encoding scheme appropriate to your
    needs, by selecting specific modules
  • 2. Each module defines a group of elements and
    attributes
  • 3. Elements are classified structurally and
    semantically
  • Define your goals before using the TEI!

25
Some other modules
  • Your choice from
  • 1. Transcription of spoken texts
  • 2. Dictionaries and lexica
  • 3. Varieties of linguistic annotation
  • 4. Nonstandard characters and glyphs
  • 5. Linking, alignment, non-hierarchic
    structures
  • 6. Detailed metadata (the TEI Header)
  • 7. Manuscript Description
  • 8. Text-critical apparatus
  • 9. Physical description
  • 10. Onomastics and ontologies
  • 11. The ODD system

26
Customizing the TEI: How do you use the TEI
modular structure?
27
The global TEI architecture
28
Following the TEI spirit
  • Conformance to the TEI means
  • Sharing a common text encoding culture
  • Sharing the same vocabulary (when applicable)
  • Allowing user autonomy in defining modifications
    (extensions, customization), but sharing the
    mechanisms to do so
  • The TEI gives you a lot of help in following
    these rules.

29
Important concepts
  • The TEI's literate programming system, ODD (One
    Document Does it all), provides
  • Schema specification
  • User-oriented documentation
  • Modularity: all specifications pertaining to a
    coherent sub-domain of the TEI
  • Classes: identifying shared behaviours or
    semantics
  • Extensibility: a consequence of the above
    mechanisms

30
The TEI ODD in practice
  • The TEI Guidelines, its schema, and its schema
    fragments are all produced from a single XML
    resource containing
  • 1. Descriptive prose (lots of it)
  • 2. Examples of usage (plenty)
  • 3. Formal declarations for components of the TEI
    Abstract Model
  • elements and attributes
  • Modules
  • classes and macros

31
Possibilities of customizing the TEI
  • The TEI has over 20 modules. A working project
    will
  • Choose the modules it needs
  • Probably narrow the set of elements within a
    module
  • Probably add local datatype constraints
  • Possibly add new elements
  • Possibly localize the names of elements

32
Quick and simple access to the TEI
  • Imagine that you have seen your colleague next
    door doing some encoding with the TEI and want
    to do the same thing
  • Go to Roma at http://tei.oucs.ox.ac.uk/Roma/
  • Generate a schema (Schema)
  • Make a trial with the editor, creating a simple
    document
  • Get back to Roma and make basic documentation

33
Roma: Start
34
Roma: Schema Select
35
Roma: Generate Doc
36
Subsetting the TEI
  • Suppose you now feel you want to use some more of
    the TEI, but not all of it
  • Go to Roma...
  • Look at Modules
  • Explore default modules by pointing to main
    elements (by order of interest). You can throw
    away most things, but
  • In textstructure, you should really keep TEI,
    text, body and div
  • In core, most people need p, q, list, pb and head
  • From header, keep everything unless you really
    understand the details
  • Start checking out elements
  • Make editorial choices (numbered vs. unnumbered
    heads)
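
A selection like the one above could be written down in ODD roughly as follows. The slides do not show what Roma actually emits, so this is only a sketch: the schema identifier mySubset is hypothetical, and the use of the include attribute on moduleRef to keep a subset of a module is one possible way of expressing the choice.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical customization: whole header module, selected elements
# from textstructure and core.
ODD_FRAGMENT = f"""
<schemaSpec xmlns="{TEI_NS}" ident="mySubset" start="TEI">
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="textstructure" include="TEI text body div"/>
  <moduleRef key="core" include="p q list pb head"/>
</schemaSpec>
""".strip()


def selected_elements(xml_source):
    """Map each module key to its kept elements (None means the whole module)."""
    root = ET.fromstring(xml_source)
    selection = {}
    for ref in root.iter(f"{{{TEI_NS}}}moduleRef"):
        include = ref.get("include")
        selection[ref.get("key")] = include.split() if include else None
    return selection


for module, elements in selected_elements(ODD_FRAGMENT).items():
    print(module, elements)
```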

37
Roma: Modules
38
Roma: Change Module
39
Adding TEI objects
  • You can add your own elements and attributes. But
  • make very sure you are not just making something
    which is syntactic sugar for an existing TEI
    concept
  • do not add an element merely to rename an
    existing one - you can do that directly in ODD
  • if you want facilities from a very different
    field of discourse, such as maths or vector
    graphics, use the existing standards in that area
  • consider interoperability

40
Roma: Add Element
41
Under the hood
  • TEI customizations are themselves expressed in
    TEI XML, using elements from the tagdocs module.
  • For example
  • This is TEI Lite with simplified
    heads

42
What does an ODD look like?

  • A unique
    identifier
  • supplies the identifier of
    the person or group pausing. Its value is the
    identifier of a person
  • or persGrp element
    in the TEI header.
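
The element specification on this slide lost its tags in the transcript. As a rough sketch of what an ODD declaration for pause could look like, the snippet below uses the class names that appear in the generated schema on the next slide. The module name spoken and the way the empty content model is written are assumptions, and the attribute definitions are left out.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Sketch only: class memberships follow the generated schema shown in
# the deck; the content model notation is an assumption.
PAUSE_SPEC = f"""
<elementSpec xmlns="{TEI_NS}" module="spoken" ident="pause">
  <classes>
    <memberOf key="model.divPart.spoken"/>
    <memberOf key="att.timed"/>
    <memberOf key="att.typed"/>
    <memberOf key="att.ascribed"/>
  </classes>
  <content><empty/></content>
</elementSpec>
""".strip()


def class_memberships(xml_source):
    """Split an element's class memberships into model and attribute classes."""
    root = ET.fromstring(xml_source)
    keys = [m.get("key") for m in root.iter(f"{{{TEI_NS}}}memberOf")]
    return {
        "element": root.get("ident"),
        "model": [k for k in keys if k.startswith("model.")],
        "attribute": [k for k in keys if k.startswith("att.")],
    }


print(class_memberships(PAUSE_SPEC))
```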


43
... from which we generate
  • element pause { pause.content, pause.attributes }
  • pause.content = empty
  • pause.attributes =
  • att.global.attributes,
  • att.timed.attributes,
  • att.typed.attributes,
  • att.ascribed.attributes,
  • model.divPart.spoken |= pause
  • att.timed |= pause
  • att.typed |= pause
  • att.ascribed |= pause

44
.. or
  • att.global.attributes
  • att.timed.attributes
  • att.typed.attributes
  • att.ascribed.attributes
  • "x.model.divPart.spoken n.event
    n.kinesic
  • n.pause n.shift n.u
  • n.vocal n.writing"

45
... and, indeed
46
MPDL CoLaboratory (MPDL CoLab)
  • Platform for community building and knowledge
    exchange
  • Aim
  • improve exchange of explicit knowledge and make
    tacit and individual know-how explicit
  • Supports community-building processes
  • Connects people with similar fields of interest
    and goals
  • Within the MPS: MPDL, librarians, scientists
  • Outside: underlying basis of our national and
    international collaborations
  • Provide information about existing standards and
    best practices in the domain of supporting
    scientific life cycles
  • Ensuring long-term compatibility between local
    and centralized initiatives within the MPDL

48
Exploring TEI P5
  • Visit http://www.tei-c.org/release/doc/tei-p5-doc/html/
  • alphabetical lists of classes, macros, elements
  • each chapter describes a distinct module
  • each module presents a semantically related list
    of elements, with examples of their use
  • Feedback and advice available to all on
    tei-l@listserv.brown.edu