Title: TEI: Text Encoding Initiative
1. TEI: Text Encoding Initiative
- Computing in the Humanities
July 11, 2003 John A. Mess
2. Overview
- Origins of Text Encoding Initiative
- Features of Encoded Text
- Tagging of Text
- Examples
Adapted in part from Lou Burnard's workshop (http://www.tei-c.org/Talks/ESS2001/elsnet-4.htm) and Project Gutenberg's TEI Guide, HTML Writers Guild (http://gutenberg.hwg.org/teidtds.html)
3. Text Encoding Initiative
- 1988, the date of the Poughkeepsie Conference
- 1999, when the process of setting up the TEI Consortium began
- Originally, a research project within the humanities
- Sponsored by leading professional associations
- Major influences:
  - digital libraries and text collections
  - language corpora
  - scholarly datasets
- International consortium established June 1999 (see http://www.tei-c.org/)
4. Goals of the TEI
- better interchange and integration of scholarly data
- support for all texts, in all languages, from all periods
- guidance for the perplexed: what to encode
  - user-driven codification of existing best practice
- assistance for the specialist: how to encode
  - loose framework into which unpredictable extensions can be fitted
- These apparently incompatible goals result in a highly flexible, modular environment for DTD customization.
5. TEI Deliverables
- A set of recommendations for text encoding, covering both generic text structures and some highly specific areas, based on (but not limited by) existing practice
- A very large collection of element definitions, combined into a very loose document type declaration
- A mechanism for creating multiple views (DTDs) of the foregoing
One such view, with an associated tutorial, is TEI Lite (http://www.tei-c.org/TEI/Lite/); for the full picture see http://www.tei-c.org/TEI/Guidelines/
6. Legacy of the TEI
- A way of looking at what text really is: a codification of current scholarly practice
- A set of shared assumptions and priorities about the digital agenda
  - focus on content and function (rather than presentation)
  - identify generic solutions (rather than application-specific ones)
7. Designing a DTD for the TEI
- How can a single mark-up scheme handle a large variety of requirements?
  - all texts are alike
  - every text is different
- learn from the database designers
  - one construct, many views
  - each view a selection from the whole
8. The Chicago Pizza Model
- A useful metaphor for expressing modularity, now implemented at http://www.hcu.ox.ac.uk/TEI/pizza.html
    <!ENTITY % base "deepDish | thinCrust | stuffed">
    <!ENTITY % topping "pepperoni | mushrooms | sausage | anchovies ...">
    <!ELEMENT pizza ((%base;), tomatoSauce, cheese, (%topping;)*)>
9. To build a TEI pizza, take...
- The core tagsets, the base of your choice, the toppings of your choice, and (optionally) a reference to your extensions
    <?xml version="1.0"?>
    <!DOCTYPE TEI.2 SYSTEM "tei2.dtd">
    <TEI.2>
      <teiHeader> .... </teiHeader>
      <text> .... </text>
    </TEI.2>
10. The core tagsets
- Detailed metadata provision: the TEI Header
- Tags for a large set of common textual requirements
  - paragraphs
  - highlighted phrases
  - names, dates, numbers, abbreviations...
  - editorial tags
  - notes, cross-references, bibliography
  - verse and drama
11. The base tagsets define
- the basic high-level structure of the document; one must be chosen from
  - prose, verse, or drama
  - transcribed speech
  - dictionaries and terminology
- or combine two or more, using either
  - the general base (anything anywhere)
  - the mixed base (homogeneous divisions)
12. TEI additional tagsets
- sets of elements for specialised application areas; can be mixed and matched freely
- currently provided
  - linking and alignment
  - analysis; feature structures; certainty
  - physical transcription
  - textual criticism; names and dates
  - graphs and trees; figures and tables
  - language corpora...
- in preparation...
  - manuscript description
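
The recipe described above (core tagsets, one base, any number of additional tagsets) is expressed in practice as parameter entities in the internal subset of the DOCTYPE. The following is a minimal Python sketch of that idea, not a definitive implementation. It assumes the TEI P4 entity names (`TEI.XML`, `TEI.prose`, `TEI.linking`, and so on) and the P4 public identifier as recalled from the Guidelines, so check both against the Guidelines before relying on them. `build_doctype` is a hypothetical helper, and the combined (general and mixed) bases are not covered.

```python
# Sketch: assemble a TEI P4 DOCTYPE that selects one base and some toppings.
# Entity names and the public identifier are assumptions recalled from the
# P4 Guidelines; build_doctype is a hypothetical helper, not a library call.

# Single bases only; the general and mixed bases need extra entities.
BASES = {"prose", "verse", "drama", "spoken", "dictionaries", "terminology"}


def build_doctype(base, toppings=()):
    """Return a DOCTYPE declaration enabling one base and any toppings."""
    if base not in BASES:
        raise ValueError(f"unknown base tagset: {base}")
    lines = [
        '<!DOCTYPE TEI.2 PUBLIC "-//TEI P4//DTD Main Document Type//EN" "tei2.dtd" ['
    ]
    # TEI.XML selects the XML (rather than SGML) form of the DTD.
    for name in ["XML", base, *toppings]:
        lines.append(f"  <!ENTITY % TEI.{name} 'INCLUDE'>")
    lines.append("]>")
    return "\n".join(lines)


print(build_doctype("prose", ["linking", "figures"]))
```

Each enabled tagset becomes one `INCLUDE` entity, so changing the "pizza" means editing only the DOCTYPE, not the document body.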
13. The Lampeter corpus in the Oxford Text Archive
- Fairly typical requirements for language corpora
  - light presentational tagging
  - structural markup for access
  - demographic information about text production
  - small number of tags to ease data capture and validation
- Implementation tagsets
  - prose base, and
  - tags from four additional sets; some extensions, many exclusions
14. Issues with the TEI
- Unmodified TEI offers authors too many choices
  - four different types of bibliographic citation
  - three (or four) different tags for proper names
  - an indigestibly rich choice of text editing tags
- At the same time, unmodified TEI lacks
  - a detailed table model
  - detailed tags for mathematical and other formulae
  - front matter for modern publications
  - tags for multimedia objects
- All this can be addressed by TEI customization
15. Why bother?
- The TEI is a well-known reference point
- Using the TEI enables
  - sharing of data and resources
  - shared modular software development
  - lower learning curve and reduced training costs
- The TEI is stable, rigorous, and well-documented
- The TEI is also flexible, customizable, and extensible in documented ways
- The architectural approach offers the best compromise for practical work.
16. Using the TEI for authoring
- A DTD for authoring should be prescriptive rather than descriptive, closely tied to current authoring practice, and very easy to use
- This suggests that we need content-full tagging: only the tags we need, and all the tags we need
- For details of version 4, see
  - http://www.tei-c.org/P4X/index.html
  - http://etext.lib.virginia.edu/tei/uvatei.html
17. Graphical Layout
18. The overall structure of a unitary text
    <TEI.2>
      <teiHeader> <!-- ... --> </teiHeader>
      <text>
        <front>
          <!-- front matter of copy text goes here. -->
        </front>
        <body>
          <!-- body of text goes here. -->
        </body>
        <back>
          <!-- back matter of text, if any, here. -->
        </back>
      </text>
    </TEI.2>
19. Core Structural Elements
    <!ELEMENT TEI.2 (teiHeader, text)>
    <!ELEMENT teiHeader (fileDesc, encodingDesc*, profileDesc*, revisionDesc?)>
    <!ELEMENT text
      ((index | interp | interpGrp | lb | milestone | pb | gap | anchor)*,
       (front, (index | interp | interpGrp | lb | milestone | pb | gap | anchor)*)?,
       (body | group),
       (index | interp | interpGrp | lb | milestone | pb | gap | anchor)*,
       (back, (index | interp | interpGrp | lb | milestone | pb | gap | anchor)*)?)>
    <!ELEMENT group
      ((argument | byline | docAuthor | docDate | epigraph | head | opener |
        salute | signed | index | interp | interpGrp | lb | milestone | pb |
        gap | anchor)*,
       (text | group), ...
20. TEI Header structure
    <teiHeader>
      <fileDesc> ... </fileDesc>
      <encodingDesc> ... </encodingDesc>
      <profileDesc> ... </profileDesc>
      <revisionDesc> ... </revisionDesc>
    </teiHeader>
21. The File Description <fileDesc>
- Mandatory
- Supplies a full description of the electronic file itself, and its source(s)
- Must specify at least a title, a publication statement, and a source
- Use of authority control is advisable but not required
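
The "at least a title, a publication statement, and a source" rule can be checked mechanically. The following is a minimal Python sketch using the standard library, assuming those three requirements correspond to the `titleStmt`, `publicationStmt`, and `sourceDesc` children of `<fileDesc>`. The function name `missing_file_desc_parts` and the sample header are hypothetical, and a real DTD validator would check far more than this.

```python
import xml.etree.ElementTree as ET

# Assumed mapping of the three mandatory parts onto fileDesc child elements.
REQUIRED = ("titleStmt", "publicationStmt", "sourceDesc")


def missing_file_desc_parts(header_xml):
    """Return the required <fileDesc> children absent from a teiHeader string."""
    header = ET.fromstring(header_xml)
    file_desc = header.find("fileDesc")
    if file_desc is None:
        return list(REQUIRED)
    return [name for name in REQUIRED if file_desc.find(name) is None]


# Hypothetical header that forgets the publication statement.
sample = """
<teiHeader>
  <fileDesc>
    <titleStmt><title>Riders of the Purple Sage</title></titleStmt>
    <sourceDesc><p>Original</p></sourceDesc>
  </fileDesc>
</teiHeader>
"""

print(missing_file_desc_parts(sample))  # ['publicationStmt']
```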
22. The File Description
    <fileDesc>
      <titleStmt>
      <editionStmt>        (MARC 250)
      <publicationStmt>
      <extent>             (MARC 300)
      <sourceDesc>
      <notesStmt>          (MARC 786)
    </fileDesc>
23. The source description
- May contain common TEI bibliographic elements <bibl>, <biblStruct>,
- or a nested file description <biblFull>
- or a list <listBibl>
- or a prose description
- or specialised elements for transcribed speech
- or (for the born-digital document) simply the text "Original"
24. Crosswalks
TEI element            Dublin Core          MARC
<title type="main">    DC.title.main        246
<author>               DC.creator.name      100
<publicationStmt>      DC.publisher.name    260
<sourceDesc>           DC.source            500, 534
<classDecl>            DC.subject.schema    6xx
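
A crosswalk like this one is essentially a lookup table. The following is a minimal Python sketch under that assumption, with the mappings copied from the crosswalk listing on this slide. The `CROSSWALK` and `to_dublin_core` names are hypothetical, and the input is simplified to a flat dictionary of header fields rather than a parsed TEI header.

```python
# TEI header element -> (Dublin Core element, MARC field), as listed on the slide.
CROSSWALK = {
    "title": ("DC.title.main", "246"),
    "author": ("DC.creator.name", "100"),
    "publicationStmt": ("DC.publisher.name", "260"),
    "sourceDesc": ("DC.source", "500,534"),
    "classDecl": ("DC.subject.schema", "6xx"),
}


def to_dublin_core(tei_fields):
    """Rename TEI header fields to Dublin Core names; skip unmapped fields."""
    return {
        CROSSWALK[name][0]: value
        for name, value in tei_fields.items()
        if name in CROSSWALK
    }


record = to_dublin_core({"title": "Riders of the Purple Sage", "author": "Zane Grey"})
print(record)
```

Keeping both targets in one table means the same header can feed a Dublin Core record and a MARC record without a second mapping.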
25. The Body of a Text
26. Front Matter
    <text>
      <front>
        <titlePage>
          <docTitle>
            <titlePart>RIDERS OF THE PURPLE SAGE</titlePart>
          </docTitle>
          <docAuthor>
            ZANE GREY
          </docAuthor>
        </titlePage>
      </front>
      ...rest of book content here...
    </text>
27. Within a Text
    <div1 type="chapter">
      <head n="1">CHAPTER I.</head>
      <head n="chaptitle">LASSITER</head>
      <p>
      A sharp clip-crop of iron-shod hoofs deadened and died away, and
      clouds of yellow dust drifted from under the cottonwoods out over
      the sage.
      </p>
      ...
    </div1>
28. Multi-Part Books
    <text>
      <front>
        ...book front content here...
      </front>
      <group>
        <text>
          <front> ...part 1 front content here...</front>
          <body> ...part 1 body content here...</body>
          <back> ...part 1 back content (if any) here...</back>
        </text>
        <text> ...content of part 2...</text>
        <text> ...content of part 3...</text>
        ...etc...
      </group>
      <back>
        ...book back content (if any) here...
      </back>
    </text>
29. Drama Markup
    <stage type="entrance">Enter HELENA</stage>
    <sp>
      <speaker who="Hermia">HERMIA</speaker>
      <l>God speed fair Helena! whither away?</l>
    </sp>
    <sp>
      <speaker who="Helena">HELENA</speaker>
      <l>Call you me fair? that fair again unsay.</l>
      <l>Demetrius loves your fair: O happy fair!</l>
      <l>Your eyes are lode-stars; and your tongue's sweet air</l>
      <l>More tuneable than lark to shepherd's ear,</l>
      <l>When wheat is green, when hawthorn buds appear.</l>
      <l>Sickness is catching: O, were favour so,</l>
      <l>Yours would I catch, fair Hermia, ere I go;</l>
    </sp>
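
Because speeches, speakers, and verse lines are tagged explicitly, simple questions about a play can be answered by walking the markup. The following is a minimal Python sketch using the standard library; the wrapping `<div>` element and the `lines_per_speaker` name are assumptions made for illustration, and the sample is abridged from the scene above.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Abridged scene; the enclosing <div> is added only so the sample has one root.
scene = """
<div>
  <stage type="entrance">Enter HELENA</stage>
  <sp>
    <speaker who="Hermia">HERMIA</speaker>
    <l>God speed fair Helena! whither away?</l>
  </sp>
  <sp>
    <speaker who="Helena">HELENA</speaker>
    <l>Call you me fair? that fair again unsay.</l>
    <l>Demetrius loves your fair: O happy fair!</l>
  </sp>
</div>
"""


def lines_per_speaker(xml_text):
    """Count <l> verse lines inside each <sp>, grouped by <speaker> text."""
    counts = Counter()
    for sp in ET.fromstring(xml_text).iter("sp"):
        speaker = sp.findtext("speaker", default="(unknown)").strip()
        counts[speaker] += len(sp.findall("l"))
    return dict(counts)


print(lines_per_speaker(scene))  # {'HERMIA': 1, 'HELENA': 2}
```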
30. Examples from Lady in Boomtown
    <p>The statistics on <rs type="place" key="GOLD1">Goldfield</rs> are from an article by Charles F. Spillman
    that appeared in the <rs type="place" key="NEVA1">Nevada</rs> News Letter of January 1,
    <date value="1916">1916</date>, a copy of which I had preserved. His figures corresponded to my own memory
    sufficiently for me to accept their accuracy.</p>
    <p>During <rs type="person" key="TASK1">Tasker Oddie's</rs> lifetime we talked for many happy hours
    about the old days, and much I have written came out of those conversations.
    He also furnished me with campaign literature pertaining to
    himself and <rs type="person" key="KEYP1">Key Pittman</rs>, together with magazine clippings about the
    battleship Nevada. All of these were returned to <rs type="person" key="TASK1">Senator Oddie</rs>, so I
    have no record of the published sources from which they were clipped.
    Most of the gossip came to me through my brother and my brother-in-law,
    who joined us when <rs type="place" key="TONO1">Tonopah</rs> began to boom. Whenever either
    one of them picked up <q>off the street</q> an exciting bit of information,
    they shared it with me and I added it to my notes.</p>
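
The `key` attribute is what makes this markup useful: different surface forms of the same person or place share one key and can be gathered into an index. The following is a minimal Python sketch using the standard library; the `index_references` name is hypothetical and the sample paragraph is abridged from the passage above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Abridged from the passage above; wording between the <rs> elements is shortened.
sample = """
<p>During <rs type="person" key="TASK1">Tasker Oddie's</rs> lifetime we talked
about <rs type="place" key="TONO1">Tonopah</rs>. All of these were returned to
<rs type="person" key="TASK1">Senator Oddie</rs>.</p>
"""


def index_references(xml_text):
    """Map (type, key) to the list of surface forms found in <rs> elements."""
    index = defaultdict(list)
    for rs in ET.fromstring(xml_text).iter("rs"):
        index[(rs.get("type"), rs.get("key"))].append("".join(rs.itertext()))
    return dict(index)


for ref, forms in index_references(sample).items():
    print(ref, forms)
```

Both "Tasker Oddie's" and "Senator Oddie" land under `("person", "TASK1")`, which a plain-text search for either name would miss.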