Title: The Guidelines (P5) of the Text Encoding Initiative (TEI)
1. The Guidelines (P5) of the Text Encoding Initiative (TEI)
- Laurent Romary
- Max Planck Digital Library
- With many thanks to Lou Burnard (OUCS)
2. In the beginning
Michael Sperberg-McQueen
Lou Burnard
Antonio Zampolli
1 November 1987, Vassar College, Poughkeepsie
3. The basic scenario?
4. Early Americas Digital Archive
- material
  - transcriptions of early printed books
- organization
  - library metaphor
- technologies
  - XSLT rendering in the browser; simple search via PHP
- TEI source visible?
  - Yes, via view source
http://www.mith2.umd.edu/eada/
5. Samyukta Agama
- material
  - multiple recensions from different traditions of the Buddhist Canon
- organization
  - aligned texts for each sutra
- technologies
  - Cocoon, eXist, XQuery
- TEI source visible?
  - yes
http://buddhistinformatics.chibs.edu.tw/BZA/
6. Can scientists bear standards?
- Standards are essentially bad for scientists
  - Freezing knowledge
  - Making one lose time that could be dedicated to research
  - Forcing diverging views to agree
  - especially if the work is done by others
- A positive view on standards
  - Documenting data
  - Giving semantics to data
  - Pooling data from various origins
  - Allowing interoperability of tools
- A possible answer
  - Standards as specification platforms
- Does the TEI provide you with this?
7. An overview of the TEI elements
8. Basic structure(s)
- Every TEI-conformant document comprises a header followed by (at least one) text
- the header contains
  - mandatory file description
  - optional encoding, profile and revision descriptions
- the header is essential for
  - bibliographic control and identification
  - resource documentation and processing
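The header-plus-text rule can be checked mechanically. The sketch below is only an illustration, not a substitute for schema validation: the sample document, its title and the helper name `has_basic_structure` are assumptions made for this example, and the check covers only the simple case of one header followed by a text.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical minimal document; the title and descriptions are placeholders.
MINIMAL_TEI = """\
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>A sample title</title></titleStmt>
      <publicationStmt><p>Unpublished sample.</p></publicationStmt>
      <sourceDesc><p>Born digital.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body><p>Some text.</p></body>
  </text>
</TEI>
"""


def local_name(element):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return element.tag.split("}", 1)[-1]


def has_basic_structure(xml_source):
    """True if teiHeader comes first, a text follows, and the header has a fileDesc."""
    root = ET.fromstring(xml_source)
    children = [local_name(child) for child in root]
    header = root.find(f"{{{TEI_NS}}}teiHeader")
    return (
        len(children) >= 2
        and children[0] == "teiHeader"
        and "text" in children[1:]
        and header is not None
        and header.find(f"{{{TEI_NS}}}fileDesc") is not None
    )


print(has_basic_structure(MINIMAL_TEI))
```

The file description is the only header component tested here because it is the only mandatory one; the encoding, profile and revision descriptions may be absent.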
9. Structure of a TEI text
- In the simplest case, a text just consists of paragraphs or clauses or verse lines
- A TEI text has a little more structure: it contains
  - optional front matter
  - optional back matter
  - a body
10. The body of a text usually has divisions
- usually nested one within another
- the type attribute labels a particular level, e.g. as "part" or "chapter"
- the n attribute gives a particular division a name or number
- the xml:id attribute gives a particular division a unique identifier
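As a rough illustration of nested divisions, the sketch below walks a small body and lists each division with its type, n and xml:id. The fragment is hypothetical: the outer identifier JA01 and the n values are assumptions chosen for the example, and `outline` is a helper invented here.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical fragment: JA01 and the n values are illustrative assumptions.
BODY = """\
<body xmlns="http://www.tei-c.org/ns/1.0">
  <div type="book" n="I" xml:id="JA01">
    <head>Book I.</head>
    <div type="chapter" n="1" xml:id="JA0101">
      <head>Of writing lives in general...</head>
      <p>...</p>
    </div>
  </div>
</body>
"""


def outline(element, depth=0):
    """Return (depth, type, n, xml:id) for every nested div, in document order."""
    rows = []
    for div in element.findall(f"{TEI}div"):
        rows.append((depth, div.get("type"), div.get("n"), div.get(XML_ID)))
        rows.extend(outline(div, depth + 1))
    return rows


for depth, div_type, n, ident in outline(ET.fromstring(BODY)):
    print("  " * depth + f"{div_type} {n} ({ident})")
```

Because xml:id lives in the XML namespace, ElementTree exposes it under the expanded name held in `XML_ID` rather than as a plain `id` attribute.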
11. For example...
- Book I.
- xml:id="JA0101"
- Of writing lives in general...
12. TEI global attributes
- The attribute class att.global defines these for all elements
  - xml:id supplies a unique identifier
  - n supplies a (non-unique) name or number
  - rend gives a suggestion about rendition (appearance)
  - xml:lang identifies the language using an ISO standard code
- The linking module extends this class with
  - corresp, synch, ana for specific association types
  - next, prev for aggregating fragmented elements
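One way to picture next and prev is as a linked list over fragments of the same element. The sketch below follows next pointers to reassemble an interrupted quotation; the sentence, the identifiers q1a and q1b, and the helper `join_fragments` are all made up for the illustration, and only local pointers of the form #id are handled.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical paragraph: the wording and the identifiers are assumptions.
FRAGMENTS = """\
<p xmlns="http://www.tei-c.org/ns/1.0"><q xml:id="q1a" next="#q1b">I wish</q>, she said,
<q xml:id="q1b" prev="#q1a">that it would stop raining.</q></p>
"""


def join_fragments(xml_source, start_id):
    """Follow next pointers from start_id and join the text of each fragment."""
    root = ET.fromstring(xml_source)
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    parts, seen = [], set()
    current = by_id.get(start_id)
    while current is not None and current.get(XML_ID) not in seen:
        seen.add(current.get(XML_ID))  # guard against circular chains
        parts.append("".join(current.itertext()).strip())
        target = current.get("next")
        if target and target.startswith("#"):
            current = by_id.get(target[1:])
        else:
            current = None
    return " ".join(parts)


print(join_fragments(FRAGMENTS, "q1a"))
```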
13. Text components
- What are divisions composed of?
  - prose is mostly paragraphs (p)
  - verse is mostly lines (l), sometimes in hierarchic groups (lg)
  - drama is mostly speeches (sp) containing p or l elements interspersed with stage directions (stage)
- These may be mixed, and may also appear directly within undivided texts
- ... but divisions can also contain embedded text or quote elements.
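A short sketch of how these components combine, under stated assumptions: the fragment below mixes a line group with two speeches and a stage direction, the speaker elements and the type value "haiku" are additions made for the illustration, and `component_counts` is a helper invented here.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical fragment; speaker labels and the lg type value are assumptions.
SAMPLE = """\
<body xmlns="http://www.tei-c.org/ns/1.0">
  <lg type="haiku">
    <l>Summer grass</l>
    <l>all that's left</l>
    <l>of warriors' dreams</l>
  </lg>
  <stage>Enter Barnardo and Francisco, two Sentinels, at several doors</stage>
  <sp><speaker>Barnardo</speaker><l>Who's there?</l></sp>
  <sp><speaker>Francisco</speaker><l>Nay, answer me. Stand and unfold yourself.</l></sp>
</body>
"""


def component_counts(xml_source):
    """Count elements by local name, ignoring the namespace."""
    root = ET.fromstring(xml_source)
    return Counter(el.tag.split("}", 1)[-1] for el in root.iter())


counts = component_counts(SAMPLE)
print(counts["l"], counts["lg"], counts["sp"], counts["stage"])
```

Counting by element name is a quick way to see whether a text is mostly prose, verse or drama before deciding which modules a project needs.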
14. For example
- Of Man's first disobedience, and the fruit
- Of that forbidden tree whose mortal taste
- Brought death into the World, and all our woe,
- With loss of Eden...
- ....
- Summer grass
- all that's left
- of warriors' dreams
15. For example
- Enter Barnardo and Francisco, two Sentinels, at several doors
- Who's there?
- Nay, answer me. Stand and unfold yourself.
- Long live the king!
- Barnardo?
- He.
16. not to mention
- .... And he wrote on one side of the paper
- HELP!
- PIGLIT (ME)
- and on the other side
- IT'S ME PIGLIT, HELP HELP
- Then he put the paper in the bottle...
17. What are speeches, paragraphs, and lines made of?
- phrases that are conventionally typographically distinct
- data-like (names, numbers, dates, times, addresses)
- editorial interventions (corrections, regularizations, additions, omissions ...)
- cross references and links
- lists, notes, graphics, tables, bibliographic citations...
- all kinds of annotations!
- Which of these you need to mark up will depend on your research agenda
18. for example...
- Of writing lives in general, and particularly of Pamela, with a word by the bye of Colley Cibber and others.
- It is a trite but true observation, that examples work more forcibly on the mind than precepts
- Mr. Joseph Andrews, the hero of our ensuing history, was esteemed to be ...
19. Direct speech
- Use the who attribute to show speakers
- Speeches can be nested in other speeches
- Spaulding, he came down into the office just this day eight weeks with this very paper in his hand, and he says
- I wish to the Lord, Mr. Wilson, that I was a red-headed man.
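The nesting of one speech inside another can be sketched as follows. This is an illustration under assumptions: the q element is used for the quotations, and the pointer values #Wilson and #Spaulding are hypothetical references to persons that would be declared elsewhere; `speakers_by_depth` is a helper invented here.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# The who values are assumed pointers to person declarations not shown here.
PASSAGE = """\
<p xmlns="http://www.tei-c.org/ns/1.0"><q who="#Wilson">Spaulding, he came down into the
office just this day eight weeks with this very paper in his hand, and he says
<q who="#Spaulding">I wish to the Lord, Mr. Wilson, that I was a red-headed man.</q></q></p>
"""


def speakers_by_depth(element, depth=0):
    """Return (nesting depth, who) for each q element, outermost first."""
    rows = []
    for child in element:
        if child.tag == f"{TEI}q":
            rows.append((depth, child.get("who")))
            rows.extend(speakers_by_depth(child, depth + 1))
        else:
            rows.extend(speakers_by_depth(child, depth))
    return rows


for depth, who in speakers_by_depth(ET.fromstring(PASSAGE)):
    print("  " * depth + who)
```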
20. Foreign language phrases
- The xml:lang attribute may be attached to any element
- Use the foreign element if nothing else is available
- Use a BCP 47 language tag (built on ISO 639 codes) to identify the language
- Have you read Die Dreigroschenoper?
- Savoir-faire is French for know-how.
- John has real savoir-faire.
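The three sentences can be encoded along the following lines. The choice of title, mentioned and foreign as carrier elements is a plausible reconstruction, not necessarily the markup shown on the original slide, and `language_spans` is a helper invented for this sketch.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# The carrier elements are an assumed reconstruction of the slide's markup.
SENTENCES = """\
<div xmlns="http://www.tei-c.org/ns/1.0" xml:lang="en">
  <p>Have you read <title xml:lang="de">Die Dreigroschenoper</title>?</p>
  <p><mentioned xml:lang="fr">Savoir-faire</mentioned> is French for know-how.</p>
  <p>John has real <foreign xml:lang="fr">savoir-faire</foreign>.</p>
</div>
"""


def language_spans(xml_source):
    """List (element, language, text) for every descendant carrying xml:lang."""
    root = ET.fromstring(xml_source)
    return [
        (el.tag.split("}", 1)[-1], el.get(XML_LANG), "".join(el.itertext()))
        for el in root.iter()
        if el is not root and el.get(XML_LANG) is not None
    ]


for name, lang, text in language_spans(SENTENCES):
    print(name, lang, text)
```

Only the third sentence needs foreign: in the first two, another element (a title, a mentioned word) is already available to carry the language.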
21. Inter class elements
- lists of all kinds
- notes (authorial or editorial)
- pictures or figures
- tables
- bibliographic descriptions
22. for example...
- For my true love
  - three calling birds
  - two french hens
  - a partridge in a pear tree
- For Uncle Joe
  - socks as usual
23. Example
- Mr Fezziwig's Ball
- A Cruikshank engraving showing Mr Fezziwig leading a group of revellers.
24. Feeling overwhelmed?
- All of this is just one way of looking at the TEI.
- 1. The TEI is a modular system: you use it to build an encoding scheme appropriate to your needs, by selecting specific modules
- 2. Each module defines a group of elements and attributes
- 3. Elements are classified structurally and semantically
- Define your goals before using the TEI!
25. Some other modules
- Your choice from
- 1. Transcription of spoken texts
- 2. Dictionaries and lexica
- 3. Varieties of linguistic annotation
- 4. Nonstandard characters and glyphs
- 5. Linking, alignment, non-hierarchic structures
- 6. Detailed metadata (the TEI Header)
- 7. Manuscript Description
- 8. Text-critical apparatus
- 9. Physical description
- 10. Onomastics and ontologies
- 11. The ODD system
26. Customizing the TEI: How do you use the TEI modular structure?
27. The global TEI architecture
28. Following the TEI spirit
- Conformance to the TEI means
  - Sharing a common text encoding culture
  - Sharing the same vocabulary (when applicable)
  - Allowing user autonomy in defining modifications (extensions, customization), but sharing the mechanisms to do so
- The TEI gives you a lot of help in following these rules.
29. Important concepts
- The TEI's literate programming with ODD (One Document Does it all) provides
  - Schema specification
  - User oriented documentation
- Modularity: all specifications pertaining to a coherent sub-domain of the TEI
- Classes: identifying shared behaviours or semantics
- Extensibility: a consequence of the above mechanisms
30. The TEI ODD in practice
- The TEI Guidelines, its schema, and its schema fragments, are all produced from a single XML resource containing
  - 1. Descriptive prose (lots of it)
  - 2. Examples of usage (plenty)
  - 3. Formal declarations for components of the TEI Abstract Model
    - elements and attributes
    - modules
    - classes and macros
31. Possibilities of customizing the TEI
- The TEI has over 20 modules. A working project will
  - Choose the modules it needs
  - Probably narrow the set of elements within a module
  - Probably add local datatype constraints
  - Possibly add new elements
  - Possibly localize the names of elements
32. Quick and simple access to the TEI
- Imagine that you have seen your colleague next door doing some encoding with the TEI and want to do the same thing
  - Go to Roma at http://tei.oucs.ox.ac.uk/Roma/
  - Generate a schema
  - Make a trial with the editor, creating a simple document
  - Get back to Roma and make basic documentation
33. Roma: Start
34. Roma: Schema Select
35. Roma: Generate Doc
36. Subsetting the TEI
- Suppose you now feel you want to use some more of the TEI, but not all of it
  - Go to Roma...
  - Look at Modules
  - Explore default modules by pointing to main elements (by order of interest). You can throw away most things, but
    - In textstructure, you should really keep TEI, text, body and div
    - In core, most people need p, q, list, pb and head
    - From header, keep everything unless you really understand the details
  - Start checking out elements
  - Make editorial choices (numbered vs. unnumbered heads)
37. Roma: Modules
38. Roma: Change Module
39. Adding TEI objects
- You can add your own elements and attributes. But
  - make very sure you are not just making something which is syntactic sugar for an existing TEI concept
  - do not add an element just to rename an existing one; you can do that directly in ODD
  - if you want facilities from a very different field of discourse, such as maths or vector graphics, use the existing standards in that area
  - consider interoperability
40. Roma: Add Element
41. Under the hood
- TEI customizations are themselves expressed in TEI XML, using elements from the tagdocs module.
- For example
  - This is TEI Lite with simplified heads
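A customization of this kind can be sketched as a schemaSpec that names the modules to include and the changes to apply. The fragment below is hypothetical, not the example from the slide: the ident value myTEI and the decision to delete div1 are illustrative assumptions, and `summarize` is a helper invented to read the specification back.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical customization: ident and the deleted element are illustrative.
ODD = """\
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="myTEI" start="TEI">
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="textstructure"/>
  <elementSpec ident="div1" module="textstructure" mode="delete"/>
</schemaSpec>
"""


def summarize(odd_source):
    """Return the selected module keys and the idents of deleted elements."""
    spec = ET.fromstring(odd_source)
    modules = [ref.get("key") for ref in spec.findall(f"{TEI}moduleRef")]
    deleted = [
        el.get("ident")
        for el in spec.findall(f"{TEI}elementSpec")
        if el.get("mode") == "delete"
    ]
    return modules, deleted


modules, deleted = summarize(ODD)
print("modules:", ", ".join(modules))
print("deleted:", ", ".join(deleted))
```

Since the customization is itself TEI XML, it can be stored, versioned and processed with the same tools as the texts it describes.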
42. What does an ODD look like?
- A unique identifier
- supplies the identifier of the person or group pausing. Its value is the identifier of a person or persGrp element in the TEI header.
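As a hedged sketch of the class-membership part of such a declaration, the fragment below declares the pause element with the four classes that this deck associates with it (model.divPart.spoken, att.timed, att.typed, att.ascribed). The description, content model and attribute definitions are deliberately left out, and `class_memberships` is a helper invented for the illustration.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Partial, illustrative declaration: only the class memberships are shown.
ELEMENT_SPEC = """\
<elementSpec xmlns="http://www.tei-c.org/ns/1.0" ident="pause" module="spoken">
  <classes>
    <memberOf key="model.divPart.spoken"/>
    <memberOf key="att.timed"/>
    <memberOf key="att.typed"/>
    <memberOf key="att.ascribed"/>
  </classes>
</elementSpec>
"""


def class_memberships(spec_source):
    """Split memberOf keys into model classes and attribute classes."""
    spec = ET.fromstring(spec_source)
    keys = [member.get("key") for member in spec.iter(f"{TEI}memberOf")]
    return {
        "model": [key for key in keys if key.startswith("model.")],
        "attribute": [key for key in keys if key.startswith("att.")],
    }


print(class_memberships(ELEMENT_SPEC))
```

Model classes say where an element may appear; attribute classes say which attributes it inherits, which is why both kinds show up in the schema generated from the declaration.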
43. ... from which we generate
- element pause pause.content, pause.attributes
- pause.content empty
- pause.attributes
  - att.global.attributes,
  - att.timed.attributes,
  - att.typed.attributes,
  - att.ascribed.attributes,
- model.divPart.spoken pause
- att.timed pause
- att.typed pause
- att.ascribed pause
44. .. or
- att.global.attributes
- att.timed.attributes
- att.typed.attributes
- att.ascribed.attributes
- "x.model.divPart.spoken n.event n.kinesic n.pause n.shift n.u n.vocal n.writing"
45. ... and, indeed
46. MPDL CoLaboratory (MPDL CoLab)
- Platform for community building and knowledge exchange
- Aim
  - improve exchange of explicit knowledge and make tacit and individual know-how explicit
- Supports community-building processes
  - Connects people with similar fields of interest and goals
  - within the MPS: MPDL, librarians, scientists
  - Outside: underlying basis of our national and international collaborations
- Provide information about existing standards and best practices in the domain of supporting scientific life cycles
- Ensuring long-term compatibility between local and centralized initiatives within the MPDL
48. Exploring TEI P5
- Visit http://www.tei-c.org/release/doc/tei-p5-doc/html/
  - alphabetical lists of classes, macros, elements
  - each chapter describes a distinct module
  - each module presents a semantically related list of elements, with examples of their use
- Feedback and advice available to all on tei-l@listserv.brown.edu