Title: Early English Books Online Text Creation Partnership (EEBO-TCP)
1Early English Books Online Text Creation Partnership (EEBO-TCP)
- Encoding early printed texts
- Jonathan Blaney, Emma Leeson, Judith Siefring
- University of Oxford
...waking up in the British Library
British Library Conference Centre, 25 October 2004
2Encoding Early Printed Books
- 1. General Aims, Processes, Texts
- 2. Feature tagging
- 3. Character-level capture
3Aims of EEBO-TCP encoding
- To accurately transcribe what is printed
- To organize the text into divisions, using XML
encoding based on TEI (Text Encoding Initiative)
guidelines
- To tag structural features of the text, such as
lists, tables, quotations, etc.
- To facilitate searching within and across texts
and navigation within each text
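As a rough illustration of how typed divisions support navigation, here is a minimal sketch. The element names (`div`, `list`, `item`), the `type` attribute and the sample text are assumptions made up for this example, not the project's actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: element names, attribute names and text are
# illustrative only, not taken from a real EEBO-TCP file.
SAMPLE = """\
<text>
  <div type="title page"><p>A sermon preached at Oxford</p></div>
  <div type="preface"><p>To the reader.</p></div>
  <div type="sermon">
    <p>The text of the sermon.</p>
    <list><item>First point</item><item>Second point</item></list>
  </div>
</text>
"""


def division_types(xml_text):
    """Return the type of every division, in document order."""
    root = ET.fromstring(xml_text)
    return [div.get("type") for div in root.iter("div")]


def divisions_containing(xml_text, feature):
    """Return the types of the divisions that contain a given feature tag."""
    root = ET.fromstring(xml_text)
    return [
        div.get("type")
        for div in root.iter("div")
        if div.find(f".//{feature}") is not None
    ]
```

Once divisions carry a type, a reader can jump to the preface, and a search can be limited to divisions that contain a list.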
4The review process
- Proofread 5% of each text to check accuracy
- Assess the structure of the text
- Edit existing tagging to reflect that structure
correctly
- Add tagging to any features not properly encoded
- Assign types (e.g. title page, preface) to the
divisions of the text
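The slide does not say how the proofreading sample is chosen. The sketch below assumes the figure is a percentage of pages and that pages are picked at random; the function name and parameters are hypothetical.

```python
import math
import random


def proofreading_sample(page_count, percent=5, seed=None):
    """Pick a random sample of page numbers (1-based) to proofread.

    Assumption: the sample is a percentage of the pages, rounded up,
    with at least one page for any non-empty text.
    """
    if page_count <= 0:
        return []
    size = max(1, math.ceil(page_count * percent / 100))
    rng = random.Random(seed)  # a seed makes the sample reproducible
    return sorted(rng.sample(range(1, page_count + 1), size))
```

Passing a seed lets a second reviewer reproduce the same sample for a given text.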
5Aims
- To accurately transcribe what is printed
- To organize the text into divisions
- To tag structural features of the text, such as
lists, tables, quotations, etc.
- To facilitate navigation within the texts and
searching within and across texts
Plain-text view of encoding
6Aims
- To accurately transcribe what is printed
- To organize the text into divisions
- To tag structural features of the text, such as
lists, tables, quotations, etc.
- To facilitate navigation within the texts and
searching within and across texts
Tags-on view of encoding, using XMetaL
7Structure
- Simple, e.g. sermons
- More complex but regular, e.g. drama
- Partly regular, with some jumbled elements
- Very complex and/or haphazard
- Aids for structure include tables of contents,
running headers, section headings
8Textual division types
- Straightforward: title page, table of contents,
act, scene
- Similar types: preface or address to reader?
License or imprimatur?
- Conflicting clues
- Sections entirely in a foreign language
- Generic: part, subpart, section, subsection
9Some features we tag
- Opening material: salutes, arguments
- Closing material: signatures, dates, datelines,
postscripts
- Letters
- Lists, tables
- Speakers, stage directions
- Quotations, bibliographic references, epigraphs
- Notes, milestones
10... and some we don't
- Non-Roman alphabets: Greek, Hebrew
- Complex mathematical material
- Music
- Illegible characters
- Handwritten material
- Damaged or missing material
11Marginal note or milestone?
- We use milestone for marginal structural
information
- Keyers often cannot differentiate them from
marginal notes
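To make the distinction concrete, here is a minimal sketch that separates the two kinds of marginal material in a small fragment. The `milestone` element with `unit` and `n` attributes and the `note` element with `place="margin"` follow general TEI usage; the sample text and the function name are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: a structural marker and a marginal note
# printed beside the same paragraph.
SAMPLE = """\
<p>
  <milestone unit="article" n="3"/>
  Some running text
  <note place="margin">See the second chapter.</note>
  and more text.
</p>
"""


def marginal_material(xml_text):
    """Split marginal material into structural markers and notes."""
    root = ET.fromstring(xml_text)
    # A milestone records where a unit starts; it has no text of its own.
    milestones = [(m.get("unit"), m.get("n")) for m in root.iter("milestone")]
    # A marginal note carries commentary or a reference.
    notes = [n.text for n in root.iter("note") if n.get("place") == "margin"]
    return milestones, notes
```

A marginal "3" marking the start of the third article belongs in the first group; a marginal cross-reference belongs in the second.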
12Part 2 SUBDIV tagging
Complex lists
13Common keying problems
- I versus J
- Only one character in Gothic type
- Confusion has led to J being captured as I
- Particularly true in italic type
- U versus V
- Lower case v looks like upper case U
- Modern usage creeps in
A list encoded in XMetaL
15Syllogisms
17Character-level capture
- Aim to get it right from the start
- Detailed keying guidelines available
- Fast and detailed response to any queries
- Record of all past queries and responses
18Examples
- Ligatures
- Ampersands
- Roman numerals
- Abbreviations
- Symbols
- Options: <abbr></abbr>, fire, n, oe, ae, ss, ye
19A highly abbreviated text
- Et frater meus celerarius fatebatur hoc esse
licitum in tempore magne necessitatis /
alias non allegauit quod Juriste dicunt
/ quod non est licitum in lege necessitas
facit licitum. Sed ipse dicebat contra
atque tenebat quod in omni tempore
per illud dictum s. Jacobi 'Confitemini
c.' preterea voluit habere homines
peregre proficiescentes in derisum ob quam
causam non venit nunc ad memoriam.
- Et fr meus celerariabus fatebatur hoc esse
licitu in tabpere magne necessitatis / alias
no allegauit abquod Iuriste dicut /
abquod no est licitu in lege necessitas
facit licitu. Sed ipse dicebat abcontra
atabque tenebat abquod i oi tabpere
abper illud dictu s. Iacobi Cofitemini c.
pterea voluit habere hoies peregre
proficiscetes in derisum ob qua causam no
venit nuc ad memoriam.
21Problem? A new character.
- Rare characters not anticipated by the guidelines
- Characters used only in one book
22Problem? A character with multiple meanings.
- z
- Can represent at least 5 different meanings (m,
dram, z, yogh, 3)
- Serves as an abbreviation stroke for at least
half a dozen words and morphemes (e.g. oz., viz.)
- An open triangle
- Can mean fire, trine, delta, generic marker
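One way to picture the problem is as a lookup table in which a single printed shape maps to several candidate meanings. The sketch below is illustrative only: the meanings are those listed on the slide, while the glyph labels, table layout and function names are hypothetical.

```python
# Meanings are the ones listed on the slide; the glyph labels and the
# table layout are hypothetical.
GLYPH_MEANINGS = {
    "z-like character": ["m", "dram", "z", "yogh", "3"],
    "open triangle": ["fire", "trine", "delta", "generic marker"],
}


def candidate_meanings(glyph_name):
    """Return every recorded meaning for a glyph, or an empty list if unknown."""
    return list(GLYPH_MEANINGS.get(glyph_name, []))


def needs_editorial_decision(glyph_name):
    """A glyph with more than one recorded meaning cannot be resolved automatically."""
    return len(candidate_meanings(glyph_name)) > 1
```

The table can only list the candidates; choosing among them still depends on the context in which the character is printed.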
23Problem? Multiple meanings.
- Recipe or Responsus, but also?
- Rotolo
- Abbreviation
- By Jupiter?
- J used for I
24Problem? Stray characters.
- Characters that appear from nowhere are a problem
- Upside-down characters are captured the right way
up: we correct the printing error
25What does this mean for searchers?
- Be aware of common keying problems (I or J, U or
V, f or long s)
- End-of-line soft hyphens are captured with or
- Searching for, for example, atque will also
retrieve occurrences of atabque
- Non-standard entity references (flower, yogh)
cannot be searched for at present, and are
displayed as text
- Be aware of printer errors and idiosyncrasies
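A searcher working on plain text outside the search interface could allow for the common keying variants with a pattern like the one sketched below. This is an assumption-laden illustration, not how the EEBO search engine works: the variant table and the function name are hypothetical, and the soft-hyphen characters are left out.

```python
import re

# Hypothetical variant table: each letter maps to the letters a keyer
# might have captured in its place.
VARIANTS = {
    "i": "ij",
    "j": "ij",
    "u": "uv",
    "v": "uv",
    "s": "sſf",  # s, long s, or long s mis-keyed as f
    "f": "fſ",   # f, or a long s that should have been read as f
}


def variant_pattern(word):
    """Build a whole-word, case-insensitive regex that tolerates I/J, U/V and f/long-s variation."""
    parts = []
    for ch in word.lower():
        alternatives = VARIANTS.get(ch)
        parts.append(f"[{alternatives}]" if alternatives else re.escape(ch))
    return re.compile(r"\b" + "".join(parts) + r"\b", re.IGNORECASE)
```

For example, `variant_pattern("justice")` also finds "Iustice", and `variant_pattern("love")` finds "loue". The price is some false matches, since a genuine f can be taken for a long s.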
26Further information
- www.odl.ox.ac.uk/eebo
- eebo.chadwyck.com/home
- Contact details
- eebo_at_sers.ox.ac.uk
- +44 (0)1865 280026
- SERS Building, Osney Mead, Oxford, OX2 0ES