Digital encoding of text - PowerPoint PPT Presentation

1
Digital encoding of text
  • Tomaž Erjavec

2
Scholarly digital editions of Slovenian
literature: http://nl.ijs.si/e-zrc/
  • Content provider: Institute of Slovenian
    Literature, Scientific Research Centre of the
    Slovenian Academy of Sciences and Arts, Ljubljana
  • Technology provider: Department of Knowledge
    Technologies, Jožef Stefan Institute, Ljubljana

3
Freising Manuscripts (FM)
  • Three religious texts
  • FM I: a confession form
  • FM II: a homily on penitence and remission
  • FM III: a confession form
  • Provenance: Upper Carinthia or Freising (Austria,
    Germany)
  • Place of use: Carinthian estates of the Freising
    diocese
  • Written after 27 May 972 and not after 1023

4
The history of the Freising Manuscripts
  • Discovered by B. J. Docen in 1806 in the Munich
    State Library
  • Many printed editions since then
  • First diplomatic transcription: 1827, by P. Köppen
    and A. H. Vostokov, Saint Petersburg
  • Critical edition by the Slovenian Academy of
    Sciences and Arts: 1992, 1993, 2004

5
The printed edition (2004), our source,
contains:
  • Diplomatic transcription (DT) with apparatus,
    comparing 9 older DTs
  • Critical transcription (CT) with apparatus, comparing
    13 older CTs
  • Phonetic transcription (PT) in IPA, with apparatus
  • Translations into Latin and 3 modern languages
  • Dictionary of all words in the CT, with the PT, the 4
    translations, Old Church Slavonic, and examples
    (concordances)
  • Bibliography of 600 items
  • Introductions
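The slide does not say how the apparatus was compiled. As a minimal sketch only, assuming each older transcription is available as a plain, whitespace-tokenised string, word-level variants against a base text can be found with Python's standard difflib module. The function name `collate` and the sample strings are hypothetical, not taken from the edition.

```python
import difflib


def collate(base: str, witness: str) -> list[tuple[str, str, str]]:
    """Compare a witness against a base text word by word.

    Returns (operation, base_reading, witness_reading) for every place where
    the two differ; identical stretches are skipped. `operation` is one of
    difflib's opcode tags: 'replace', 'delete' or 'insert'.
    """
    base_words = base.split()
    wit_words = witness.split()
    # autojunk=False: do not treat frequent words as "junk" when matching
    matcher = difflib.SequenceMatcher(a=base_words, b=wit_words, autojunk=False)
    entries = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            continue
        entries.append((op, " ".join(base_words[i1:i2]), " ".join(wit_words[j1:j2])))
    return entries
```

For example, `collate("the quick brown fox", "the quik brown fox jumps")` reports one replacement (`quick` → `quik`) and one addition (`jumps`). Running it once per older transcription would give raw material for an apparatus, which an editor would still have to review.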

6
The goal of the e-edition: to gather the 200-year
history of FM editions
  • Annotated text of all major transcriptions so
    far: the history of understanding
  • Alignment of all 16 transcriptions and
    translations: understanding through comparison
  • Sound recording added to the phonetic
    transcription: understanding through experience
  • Addition of translations into Polish and Italian:
    understanding for non-Slovenian speakers
  • Integration of materials: understanding for all
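Aligning 16 transcriptions and translations presupposes that each has been split into corresponding segments. Assuming, hypothetically, that every edition is already segmented and that corresponding segments share an identifier, a parallel view reduces to a table with one row per identifier and one column per edition. The function `parallel_rows` and the data layout are illustrative only, not the e-edition's actual implementation.

```python
from typing import Dict, List, Optional

# Hypothetical input: edition name -> {segment id -> text of that segment}
Editions = Dict[str, Dict[str, str]]


def parallel_rows(editions: Editions, order: List[str]) -> List[List[Optional[str]]]:
    """Build rows for a parallel view: one row per segment id,
    then one cell per edition in `order`.

    Segment ids are collected in first-seen order across the editions;
    an edition that lacks a segment gets None in that cell.
    """
    seg_ids: List[str] = []
    seen = set()
    for name in order:
        for seg in editions[name]:
            if seg not in seen:
                seen.add(seg)
                seg_ids.append(seg)
    return [[seg] + [editions[name].get(seg) for name in order] for seg in seg_ids]
```

With `{"DT": {"1": "x", "2": "y"}, "EN": {"1": "X", "3": "Z"}}` and the order `["DT", "EN"]`, the rows are `["1", "x", "X"]`, `["2", "y", None]` and `["3", None, "Z"]`, so gaps in one edition stay visible in the view.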

7
Production of the e-edition
  • Electronic original: a local editor format or
    re-keyed Word files
  • Conversion: dedicated Perl and XSLT filters
  • Target format: the Text Encoding Initiative (TEI)
    Guidelines P4
  • View format: XSLT transformation into HTML
  • Rapid prototyping and a cyclical process of
    refinement
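The slide names dedicated Perl and XSLT filters for the conversion. The sketch below swaps in Python's standard `xml.etree.ElementTree` to show only the general shape of the target: a TEI P4 document (no namespace, root element `TEI.2`) with the transcription's line breaks recorded as empty `<lb/>` milestones. The function name and the choice of a single `<p>` holding all lines are assumptions. The header is deliberately incomplete (a conformant P4 `fileDesc` also needs `publicationStmt` and `sourceDesc`), and the XSLT step into HTML is not shown.

```python
import xml.etree.ElementTree as ET


def lines_to_tei(title: str, lines: list[str]) -> str:
    """Wrap transcription lines in a minimal TEI P4-style document.

    Each source line is preceded by an <lb n="..."/> milestone; the line's
    text is stored as the milestone's tail, so it follows the empty element.
    """
    tei = ET.Element("TEI.2")
    header = ET.SubElement(tei, "teiHeader")
    file_desc = ET.SubElement(header, "fileDesc")
    title_stmt = ET.SubElement(file_desc, "titleStmt")
    ET.SubElement(title_stmt, "title").text = title
    body = ET.SubElement(ET.SubElement(tei, "text"), "body")
    p = ET.SubElement(body, "p")
    for n, line in enumerate(lines, start=1):
        lb = ET.SubElement(p, "lb", n=str(n))
        lb.tail = line  # special characters such as & are escaped on output
    return ET.tostring(tei, encoding="unicode")
```

Building the tree programmatically, rather than concatenating strings, guarantees well-formed XML; a filter like this would then feed the XSLT view stage.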

8
Challenging issues
  • Complex characters (see the ZRCola font,
    http://zrcola.zrc-sazu.si/)
  • Adding speech to the e-edition (manual
    segmentation, errors in the originals, inserting
    phrase and sentence boundaries into parallel views)
  • Dictionary conversion (idiosyncratic format,
    complex structure, difficult cross-references)
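One reason complex characters are a challenge: Unicode has precomposed code points for some letter and diacritic combinations but not for others, which must then be stored as a base letter plus combining marks and rendered correctly by the font; characters with no Unicode representation at all end up in the Private Use Area. The hypothetical helper `analyse` uses Python's standard `unicodedata` module to show what a string looks like after NFC normalisation.

```python
import unicodedata


def analyse(text: str) -> list[tuple[str, str, bool]]:
    """List each code point of the NFC-normalised text as
    (code point, Unicode name, is-combining-mark)."""
    nfc = unicodedata.normalize("NFC", text)
    return [
        (
            f"U+{ord(ch):04X}",
            unicodedata.name(ch, "<unnamed>"),  # PUA code points have no name
            unicodedata.combining(ch) != 0,     # non-zero combining class
        )
        for ch in nfc
    ]
```

`analyse("e\u0301")` collapses to the single code point U+00E9, whereas `analyse("j\u0301")` stays as U+006A followed by the combining acute accent U+0301, because Unicode has no precomposed j with acute.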

9
Examples: the TEI-encoded phonetic transcription

10
BS (Brižinski spomeniki, the Slovene name of the FM): dictionary

11
BS: bibliography

12
BS: basic parallel view
13
Further work: finishing the BS e-edition
  • TEI header (Slovene and English, also an HTML view)
  • Better treatment of PUA (Private Use Area) characters
    (documented in the header, with fallbacks)
  • Resolving outstanding content issues
  • Better overall structure and linking
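A minimal sketch of the fallback idea, assuming a hypothetical table that maps each PUA code point to a description and a plain-Unicode substitute (in the e-edition, such information is meant to be documented in the TEI header). PUA code points are recognised by their Unicode general category `Co`; undocumented ones are replaced by U+FFFD so that they remain visible rather than silently disappearing. The table entry, function names and fallback policy are all illustrative.

```python
import unicodedata

# Hypothetical table: PUA code point -> (description, plain-Unicode fallback).
PUA_TABLE = {
    "\uE000": ("example glyph (hypothetical)", "a"),
}


def is_pua(ch: str) -> bool:
    """True for Private Use Area code points (general category 'Co')."""
    return unicodedata.category(ch) == "Co"


def with_fallbacks(text: str, table=PUA_TABLE, unknown: str = "\uFFFD") -> str:
    """Replace PUA characters by their documented fallbacks; undocumented
    PUA characters become `unknown` (U+FFFD by default)."""
    out = []
    for ch in text:
        if is_pua(ch):
            out.append(table[ch][1] if ch in table else unknown)
        else:
            out.append(ch)
    return "".join(out)
```

Keeping the table in one place, as a header would, means the same data can drive both the fallback rendering and the human-readable documentation of each character.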

14
Further work: general goals
  • Incorporating language technologies into the
    e-editions (concordancing, lemmatisation,
    part-of-speech tagging)
  • Adaptable Web interface for viewing (selecting what
    to see and how: corrections, emendations, notes,
    facsimile)
  • Accessing and connecting the e-library as a whole
    (cataloguing, searching)
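Concordancing, the first of the language technologies listed, can be illustrated with a minimal keyword-in-context (KWIC) sketch. It assumes naive tokenisation with Python's Unicode-aware `\w+` pattern and case-insensitive matching of word forms; there is no lemmatisation, and the function name `kwic` is hypothetical. Text stored with combining diacritics would need a better tokeniser, since `\w` does not treat combining marks as word characters.

```python
import re


def kwic(text: str, keyword: str, width: int = 3) -> list[str]:
    """Return keyword-in-context lines: up to `width` tokens on each side
    of every case-insensitive occurrence of `keyword`, which is bracketed."""
    tokens = re.findall(r"\w+", text)
    key = keyword.casefold()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.casefold() == key:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines
```

For instance, `kwic("one two three two four", "two", width=1)` returns `["one [two] three", "three [two] four"]`. With a lemmatiser in place, matching on lemmas instead of surface forms would gather all inflected forms of a word into one concordance.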