Early English Books Online Text Creation Partnership (EEBO-TCP)

1
Early English Books Online Text Creation
Partnership (EEBO-TCP)
  • Encoding early printed texts
  • Jonathan Blaney, Emma Leeson, Judith Siefring
  • University of Oxford

..waking up in the British Library
British Library Conference Centre, 25 October 2004
2
Encoding Early Printed Books
  • 1. General Aims, Processes, Texts
  • 2. Feature tagging
  • 3. Character-level capture

3
Aims of EEBO-TCP encoding
  • To accurately transcribe what is printed
  • To organize the text into divisions, using XML
    encoding based on TEI (Text Encoding Initiative)
    guidelines
  • To tag structural features of the text, such as
    lists, tables, quotations, etc.
  • To facilitate searching within and across texts
    and navigation within each text

4
The review process
  • Proofread 5% of each text to check accuracy
  • Assess the structure of the text
  • Edit existing tagging to reflect that structure
    correctly
  • Add additional tagging to any features not
    properly encoded
  • Assign types (e.g. title page, preface) to the
    divisions of the text
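
The first review step can be sketched in code. This is only an illustration, assuming the figure on the slide means five per cent of the pages and that the pages are chosen at random; the slides do not say how the sample is selected, and the function name and page numbering are hypothetical.

```python
import random

def pick_proofreading_sample(page_ids, percent=5, seed=None):
    """Return a sorted random sample covering about `percent` of the pages."""
    if not page_ids:
        return []
    rng = random.Random(seed)
    # Integer arithmetic, rounded up, so that short texts still get one page.
    sample_size = max(1, (len(page_ids) * percent + 99) // 100)
    return sorted(rng.sample(page_ids, sample_size))

pages = list(range(1, 241))  # a hypothetical 240-page book
sample = pick_proofreading_sample(pages, seed=1)
print(len(sample), "pages to proofread:", sample)
```

Passing a fixed seed makes the same pages come back on every run, which would let a second reviewer check the same sample.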

5
Aims
  • To accurately transcribe what is printed
  • To organize the text into divisions
  • To tag structural features of the text, such as
    lists, tables, quotations, etc.
  • To facilitate navigation within the texts and
    searching within and across texts

Plain-text view of encoding
6
Aims
  • To accurately transcribe what is printed
  • To organize the text into divisions
  • To tag structural features of the text, such as
    lists, tables, quotations, etc.
  • To facilitate navigation within the texts and
    searching within and across texts

Tags-on view of encoding, using XMetaL
7
Structure
  • Simple, e.g. sermons
  • More complex but regular, e.g. drama
  • Partly regular, with some jumbled elements
  • Very complex and/or haphazard
  • Aids for structure include tables of contents,
    running headers, section headings

8
Textual division types
  • Straightforward: title page, table of contents,
    act, scene
  • Similar types: preface or address to reader?
    license or imprimatur?
  • Conflicting clues
  • Sections entirely in a foreign language
  • Generic: part, subpart, section, subsection
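
As a rough illustration of how typed divisions support navigation, the sketch below walks a small TEI-style fragment and builds an outline from the type of each division. The element and attribute names (div, type, n), the sample text, and the fallback to the generic type "part" are assumptions made for the example, not the project's actual schema.

```python
import xml.etree.ElementTree as ET

# A hypothetical, much simplified text with typed divisions.
SAMPLE = """\
<text>
  <div type="title page"><p>THE TRAGEDIE</p></div>
  <div type="to the reader"><p>Gentle reader</p></div>
  <div type="act" n="1">
    <div type="scene" n="1"><p>Enter two gentlemen.</p></div>
    <div type="scene" n="2"><p>Enter the king.</p></div>
  </div>
</text>
"""

def outline(element, depth=0):
    """Return (depth, type, n) for every nested div, in document order."""
    rows = []
    for child in element:
        if child.tag == "div":
            # Untyped divisions fall back to a generic label.
            rows.append((depth, child.get("type", "part"), child.get("n")))
            rows.extend(outline(child, depth + 1))
    return rows

root = ET.fromstring(SAMPLE)
toc = outline(root)
for depth, div_type, n in toc:
    label = div_type if n is None else f"{div_type} {n}"
    print("  " * depth + label)
```

The same list of rows could feed a table of contents or restrict a search to, say, title pages only.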

9
Some features we tag
  • Opening material: salutes, arguments
  • Closing material: signatures, dates, datelines,
    postscripts
  • Letters
  • Lists, tables
  • Speakers, stage directions
  • Quotations, bibliographic references, epigraphs
  • Notes, milestones

10
and some we don't
  • Non-Roman alphabet: Greek, Hebrew
  • Complex mathematical material
  • Music
  • Illegible characters
  • Handwritten material
  • Damaged or missing material

11
Marginal note or milestone?
  • We use milestone for marginal structural
    information
  • Keyers often can't differentiate them from
    marginal notes
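
The distinction matters because the two kinds of marginal text are handled differently downstream. The sketch below assumes the TEI convention of an empty milestone element carrying unit and n attributes, and a note element with place="margin"; the sample sentence is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical passage: one structural marker and one marginal comment.
SAMPLE = """\
<p>In the beginning <milestone unit="verse" n="2"/>the earth was without form
<note place="margin">See Augustine on this passage.</note> and void.</p>
"""

root = ET.fromstring(SAMPLE)

# Milestones carry structure only: what kind of unit starts here, and its number.
milestones = [(m.get("unit"), m.get("n")) for m in root.iter("milestone")]

# Marginal notes carry text of their own, which can be read and searched.
notes = [" ".join(note.text.split()) for note in root.iter("note")]

print(milestones)
print(notes)
```

If a keyer captures a structural marker as a note, it ends up in the second list and is lost as a navigation point, which is the kind of error a reviewer would have to correct.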

12
Part 2: SUBDIV tagging
  • Complex lists

13
Common keying problems
  • I versus J
  • Only one character in Gothic type
  • Confusion has led to J being captured as I
  • Particularly true in italic type
  • U versus V
  • Lower case v looks like upper case U
  • Modern usage creeps in

A list encoded in XMetaL
14
Marginal note or milestone?
  • We use milestone for marginal structural
    information
  • Keyers often can't differentiate them from
    marginal notes

15
Syllogisms
16
(No Transcript)
17
Character-level capture
  • Aim to get it right from the start
  • Detailed keying guidelines available
  • Fast and detailed response to any queries
  • Record of all past queries and responses

18
Examples
  • Ligatures
  • Ampersands
  • Roman numerals
  • Abbreviations
  • Symbols
  • Options: <abbr></abbr>, fire, n, oe, ae, ss, ye

19
A highly abbreviated text
  • Et frater meus celerarius fatebatur hoc esse
    licitum in tempore magne necessitatis /
    alias non allegauit quod Juriste dicunt
    / quod non est licitum in lege necessitas
    facit licitum. Sed ipse dicebat contra
    atque tenebat quod in omni tempore
    per illud dictum s. Jacobi 'Confitemini
    c.' preterea voluit habere homines
    peregre proficiscentes in derisum ob quam
    causam non venit nunc ad memoriam.
  • Et fr meus celerariabus fatebatur hoc esse
    licitu in tabpere magne necessitatis / alias
    no allegauit abquod Iuriste dicut /
    abquod no est licitu in lege necessitas
    facit licitu. Sed ipse dicebat abcontra
    atabque tenebat abquod i oi tabpere
    abper illud dictu s. Iacobi Cofitemini c.
    pterea voluit habere hoies peregre
    proficiscetes in derisum ob qua causam no
    venit nuc ad memoriam.

20
Common keying problems
  • I versus J
  • Only one character in Gothic type
  • Confusion has led to J being captured as I
  • Particularly true in italic type
  • U versus V
  • Lower case v looks like upper case U
  • Modern usage creeps in

21
Problem? A new character.
  • Rare characters not anticipated by the guidelines
  • Characters used only in one book

22
Problem? A character with multiple meanings.
  • z
  • Can represent at least 5 different meanings (m,
    dram, z, yogh, 3)
  • Serves as an abbreviation stroke for at least a
    half dozen words and morphemes (e.g. oz. viz.)
  • An open triangle
  • Can mean fire, trine, delta, generic marker

23
Problem? Multiple meanings.
  • Recipe or Responsus, but also?
  • Rotolo
  • Abbreviation
  • By Jupiter?
  • J used for I

24
Problem? Stray characters.
  • Characters that appear from nowhere are a problem
  • Upside-down characters are captured the right way
    up: we correct the printing error

25
What does this mean for searchers?
  • Be aware of common keying problems (I or J, U or
    V, f or long s)
  • End of line soft hyphens are captured with or
  • Searching for, for example, atque will also
    retrieve occurrences of atabque
  • Non-standard entity references (flower, yogh)
    cannot be searched for at present, and are
    displayed as text
  • Be aware of printer errors and idiosyncrasies

26
Further information
  • www.odl.ox.ac.uk/eebo
  • eebo.chadwyck.com/home
  • Contact details
  • eebo_at_sers.ox.ac.uk
  • +44 (0)1865 280026
  • SERS Building, Osney Mead, Oxford, OX2 0ES