Representing dictionaries with the TEI - PowerPoint PPT Presentation

1
Representing dictionaries with the TEI
  • Proposal for basic guidelines
  • Laurent Romary - Max Planck Digital Library
  • With the help of Susanne Alt - CNRS

2
Background
  • The P5 edition of the TEI guidelines
  • XML
  • ODD - Roma
  • Modules and classes
  • DTD, RELAX NG, W3C XML Schema
  • The dictionary chapter
  • Very close to the P4 version
  • Work to be done
  • Enhancing the coherence with the class system
  • Providing more examples

3
Proposal for today
  • Browse through the main features of the
    dictionary chapter
  • Identify questionable issues
  • Select best practices
  • Work with Roma and implement (part of) the best
    practices
  • Minimal schema that dictionary projects can start with
  • Bottom-up approach to customization
  • Discuss conformance

4
Dictionaries as TEI documents
  • Same general document structure as any other TEI
    document
  • <teiHeader>, <text>
  • Define a source-identification strategy shared with general text sources
  • Specific documentation of previous editions
  • Intuition that <teiCorpus> should not be retained here
  • <front>, <body>, <back>
  • Divisions
  • Strong case for unnumbered <div>s
  • Can we recommend/implement a basic dictionary-oriented typology?
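
As a minimal sketch of this shared document structure, the Python below builds a skeleton with `xml.etree.ElementTree`. It produces a `<teiHeader>` and a `<text>` split into `<front>`, `<body>` and `<back>`, with one unnumbered `<div>` in the body. Several details are hypothetical: the helper names `tei` and `make_dictionary_skeleton`, and the `type="letter"` division value. The empty `<publicationStmt>` and `<sourceDesc>` would need real content in a valid header.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace


def tei(local_name: str) -> str:
    """Qualify a local element name with the TEI namespace (Clark notation)."""
    return f"{{{TEI_NS}}}{local_name}"


def make_dictionary_skeleton(title: str) -> ET.Element:
    """Build <TEI> with a <teiHeader> and a <text> split into <front>, <body>, <back>.

    Hypothetical helper: <publicationStmt> and <sourceDesc> are left empty here,
    although a real header must fill them in.
    """
    root = ET.Element(tei("TEI"))
    file_desc = ET.SubElement(ET.SubElement(root, tei("teiHeader")), tei("fileDesc"))
    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = title
    ET.SubElement(file_desc, tei("publicationStmt"))
    ET.SubElement(file_desc, tei("sourceDesc"))

    text = ET.SubElement(root, tei("text"))
    ET.SubElement(text, tei("front"))
    body = ET.SubElement(text, tei("body"))
    # An unnumbered <div>; type="letter" is a hypothetical typology value.
    ET.SubElement(body, tei("div"), {"type": "letter"})
    ET.SubElement(text, tei("back"))
    return root


doc = make_dictionary_skeleton("Example dictionary")
print(ET.tostring(doc, encoding="unicode"))
```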

5
Issues
  • See Wuerzburg.xml
  • Providing precise guidelines for
  • <publicationStmt>
  • Make explicit the role and possible content of <publisher>
  • <sourceDesc>
  • Base the guidelines on <biblStruct> (<biblItem>?) and <listBibl>
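
The Python below sketches one possible shape for a `<sourceDesc>` built on `<listBibl>` and `<biblStruct>`. The `Edition` record and the `source_desc` function are hypothetical. For brevity the sketch omits the TEI namespace and assumes each source needs only a title and an imprint inside `<monogr>`.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class Edition:
    """Hypothetical record for one source or earlier edition of the dictionary."""
    title: str
    pub_place: str
    publisher: str
    date: str


def source_desc(editions: List[Edition]) -> ET.Element:
    """Describe each source as a <biblStruct>, grouped in a single <listBibl>."""
    desc = ET.Element("sourceDesc")
    list_bibl = ET.SubElement(desc, "listBibl")
    for ed in editions:
        monogr = ET.SubElement(ET.SubElement(list_bibl, "biblStruct"), "monogr")
        ET.SubElement(monogr, "title").text = ed.title
        imprint = ET.SubElement(monogr, "imprint")
        ET.SubElement(imprint, "pubPlace").text = ed.pub_place
        ET.SubElement(imprint, "publisher").text = ed.publisher
        ET.SubElement(imprint, "date").text = ed.date
    return desc


metz = Edition("Le Parler de Metz et du pays messin", "Metz", "Serpenoise", "2001")
print(ET.tostring(source_desc([metz]), encoding="unicode"))
```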

6
Describing dictionary entries
  • A variety of possible objects
  • <entry>, <entryFree>, <superEntry>, <dictScrap>
  • <hom>, <re>
  • First issue: dealing with the editorial workflow
  • Keep <dictScrap> for ongoing tagging activity
  • Depends on the degree of structure of the dictionary
  • Stay consistent in the use of entry/entryFree/superEntry/hom
  • Strong feeling for limiting ourselves to <entry>
  • Point to the importance of <re>
  • Embedded entries
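
One way to check whether a project stays consistent in its use of entry-like elements is simply to count them. The sketch below shows one possible form of such an audit; it is not part of any TEI tooling. It works on namespace-free fragments and uses `root.iter()`, so embedded entries are counted too.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Entry-level objects discussed on this slide.
ENTRY_LIKE = {"entry", "entryFree", "superEntry", "dictScrap", "hom", "re"}


def entry_like_usage(xml_text: str) -> Counter:
    """Count entry-like elements at any depth, so embedded entries are included."""
    root = ET.fromstring(xml_text)
    return Counter(el.tag for el in root.iter() if el.tag in ENTRY_LIKE)


sample = "<body><entry/><entryFree/><superEntry><entry/><entry/></superEntry></body>"
print(entry_like_usage(sample))
```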

7
Finding the right granularity
  • The core lexical unit: <entry>
  • Should be used coherently in a dictionary project to gather homogeneous lexical objects
  • Possible combination with
  • <superEntry> to group sets of homographs
  • Should only be used to record such a feature when it exists in legacy data
  • Should be avoided for new editorial projects
  • <hom> to subdivide senses into groups of homonyms

8
Example
  • Recording a series of homographs with <superEntry>
    <body>
      <entry/>
      <entry/>
      <superEntry>
        <entry type="hom" n="1"/>
        <entry type="hom" n="2"/>
      </superEntry>
    </body>
  • Issues
  • Values of the n attribute follow the source
  • Values of type are defined in att.entryLike
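
The two issues above suggest a simple check. The Python sketch below assumes a project rule that every `<entry>` child of `<superEntry>` carries `type="hom"` and an `n` copied from the source. `check_super_entries` is a hypothetical helper. It only requires `n` to be present, since its values follow the source numbering.

```python
import xml.etree.ElementTree as ET
from typing import List


def check_super_entries(xml_text: str) -> List[str]:
    """Report <superEntry> children that break the (hypothetical) project rule.

    Rule assumed here: every child is an <entry> with type="hom" and an n value.
    The n values follow the source, so only their presence is checked.
    """
    problems = []
    root = ET.fromstring(xml_text)
    for i, sup in enumerate(root.iter("superEntry"), start=1):
        for child in sup:
            if child.tag != "entry":
                problems.append(f"superEntry {i}: unexpected <{child.tag}>")
            elif child.get("type") != "hom":
                problems.append(f"superEntry {i}: entry without type='hom'")
            elif child.get("n") is None:
                problems.append(f"superEntry {i}: homograph entry without n")
    return problems


body = ('<body><entry/><entry/><superEntry>'
        '<entry type="hom" n="1"/><entry type="hom" n="2"/>'
        '</superEntry></body>')
print(check_super_entries(body))  # []
```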

9
Example
  • Recording a series of homographs with <hom>
    <entry>
      <hom n="1">
        <sense n="1"/><sense n="2"/>
      </hom>
      <hom n="2">
        <sense n="1"/><sense n="2"/><sense n="3"/>
      </hom>
    </entry>
  • Issues
  • Weak boundary between polysemes and homonyms
  • Why not just have separate entries?
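
The question of separate entries can be made concrete with a transformation that turns each `<hom>` into its own `<entry type="hom">`. This is a sketch, not a recommended conversion. `split_homonyms` is a hypothetical name, and copying shared children such as a `<form>` into every new entry is an assumption.

```python
import copy
import xml.etree.ElementTree as ET
from typing import List


def split_homonyms(entry: ET.Element) -> List[ET.Element]:
    """Turn each <hom> of an <entry> into a separate <entry type="hom">.

    Children of <entry> other than <hom> (e.g. a shared <form>) are copied
    into every new entry, ahead of that homonym's senses (an assumption).
    """
    shared = [child for child in entry if child.tag != "hom"]
    result = []
    for hom in entry.findall("hom"):
        new_entry = ET.Element("entry", {"type": "hom"})
        if hom.get("n") is not None:
            new_entry.set("n", hom.get("n"))
        for child in shared + list(hom):
            new_entry.append(copy.deepcopy(child))
        result.append(new_entry)
    return result


source = ET.fromstring(
    '<entry><hom n="1"><sense n="1"/><sense n="2"/></hom>'
    '<hom n="2"><sense n="1"/><sense n="2"/><sense n="3"/></hom></entry>'
)
for e in split_homonyms(source):
    print(ET.tostring(e, encoding="unicode"))
```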

10
From word to senses
  • Background
  • Semasiological vs. onomasiological views on
    lexical data
  • Two complementary data organisations
  • Two sets of standards
  • In ISO: TMF (ISO 16642) vs. LMF (ISO 24613)
  • In the TEI: Terminology vs. Print dictionary chapters

11
The LMF Model
[Figure: the LMF model, with boxes for Lexical DB, Global Info, Lexical Entry, Form and Sense, linked by associations labelled with cardinalities 1..1 and 0..n]
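
The LMF model on this slide can be read as nested records, where each entry has a form and a list of senses. The Python dataclasses below are a sketch. The class and field names (including `language` on `GlobalInfo`) are hypothetical. The cardinalities in the comments are our reading of the diagram, not the text of ISO 24613. `to_tei` shows how the form-then-sense orientation maps onto `<entry>`.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class Form:
    written_form: str


@dataclass
class Sense:
    definition: str


@dataclass
class LexicalEntry:
    lemma: Form                                           # assumed 1..1
    senses: List[Sense] = field(default_factory=list)     # assumed 0..n


@dataclass
class GlobalInfo:
    language: str                                         # hypothetical field


@dataclass
class LexicalDB:
    global_info: GlobalInfo                                      # assumed 1..1
    entries: List[LexicalEntry] = field(default_factory=list)    # assumed 0..n


def to_tei(entry: LexicalEntry) -> ET.Element:
    """Map the form-then-senses structure onto <entry>, <form>/<orth> and <sense>/<def>."""
    el = ET.Element("entry")
    ET.SubElement(ET.SubElement(el, "form"), "orth").text = entry.lemma.written_form
    for sense in entry.senses:
        ET.SubElement(ET.SubElement(el, "sense"), "def").text = sense.definition
    return el


db = LexicalDB(GlobalInfo("fr"), [LexicalEntry(Form("chat"), [Sense("Petit animal familier")])])
print(ET.tostring(to_tei(db.entries[0]), encoding="unicode"))
```
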
12
Consequences for dictionaries
  • Strong <form>-to-<sense> orientation
  • <form> qualifies the entry, identifying the headword and its morphological variants
  • <sense> is subordinated to the choice made for <form>
  • Role of grammatical information
  • Overall qualification of the entry
  • Qualification of morphological variants
  • Issue
  • <re> does not necessarily fit into the theory

13
Example
  • Basic structure of an <entry>
    <entry>
      <form>
        <orth>chat</orth>
      </form>
      <sense>
        <def>Petit animal familier</def>
      </sense>
    </entry>
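
The senses hang off an entry whose form is identified first, so reading such an entry back is straightforward. Below is a minimal sketch with `xml.etree.ElementTree`, assuming namespace-free input. `read_entry` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple


def read_entry(xml_text: str) -> Tuple[Optional[str], List[str]]:
    """Return the headword (form/orth) and the definitions (sense/def) of an <entry>."""
    entry = ET.fromstring(xml_text)
    headword = entry.findtext("form/orth")
    definitions = [d.text or "" for d in entry.findall("sense/def")]
    return headword, definitions


xml = """<entry>
  <form><orth>chat</orth></form>
  <sense><def>Petit animal familier</def></sense>
</entry>"""
print(read_entry(xml))  # ('chat', ['Petit animal familier'])
```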

14
Representing form and grammar
  • General issues
  • Multiple forms
  • <orth>, <pron>, etc.
  • Compounds
  • May be represented using embedded forms
  • Role of grammar (<gramGrp>)
  • In isolation, qualifies the entry
  • Within a form, marks special features associated with the form
  • Inflections
  • Can be represented by means of additional <form>s
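
Here is a sketch of how these principles might be applied when generating entries. The entry-level `<gramGrp>` qualifies the whole entry, while each inflected `<form>` carries its own `<gramGrp>`. The helpers `make_entry` and `add_gram_grp` are hypothetical. So is the convention of using feature names directly as element names (`pos`, `gen`, `number`).

```python
import xml.etree.ElementTree as ET
from typing import Dict


def add_gram_grp(parent: ET.Element, features: Dict[str, str]) -> None:
    """Append a <gramGrp> with one child element per feature (e.g. <pos>N</pos>)."""
    gram_grp = ET.SubElement(parent, "gramGrp")
    for name, value in features.items():
        ET.SubElement(gram_grp, name).text = value


def make_entry(lemma: str, grammar: Dict[str, str],
               inflections: Dict[str, Dict[str, str]]) -> ET.Element:
    """Lemma form, entry-level grammar, then one <form type="inflected"> per inflection."""
    entry = ET.Element("entry")
    ET.SubElement(ET.SubElement(entry, "form", {"type": "lemma"}), "orth").text = lemma
    add_gram_grp(entry, grammar)  # qualifies the whole entry
    for orth, features in inflections.items():
        form = ET.SubElement(entry, "form", {"type": "inflected"})
        ET.SubElement(form, "orth").text = orth
        add_gram_grp(form, features)  # features specific to this form
    return entry


entry = make_entry("chat", {"pos": "N", "gen": "m"}, {"chats": {"number": "pl"}})
print(ET.tostring(entry, encoding="unicode"))
```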

15
Example
  • A simple entry
    <entry>
      <form>
        <orth>chat</orth>
        <pron>ʃa</pron>
      </form>
      <gramGrp>
        <pos>N</pos>
        <gen>m</gen>
      </gramGrp>
    </entry>
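
Flattening such an entry into a record shows where each piece of information lives. Spelling and pronunciation sit inside `<form>`, and grammar sits in the entry-level `<gramGrp>`. `summarize` is a hypothetical helper that works on namespace-free XML.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional


def summarize(xml_text: str) -> Dict[str, Optional[str]]:
    """Flatten a simple <entry> into orth, pron, pos and gen (None when absent)."""
    entry = ET.fromstring(xml_text)
    return {
        "orth": entry.findtext("form/orth"),
        "pron": entry.findtext("form/pron"),
        "pos": entry.findtext("gramGrp/pos"),
        "gen": entry.findtext("gramGrp/gen"),
    }


xml = ("<entry><form><orth>chat</orth><pron>ʃa</pron></form>"
       "<gramGrp><pos>N</pos><gen>m</gen></gramGrp></entry>")
print(summarize(xml))
```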

16
Example
  • Simple entry with inflected form
    <entry>
      <form type="lemma">
        <orth>chat</orth>
      </form>
      <gramGrp>
        <pos>N</pos>
        <gen>m</gen>
      </gramGrp>
      <form type="inflected">
        <orth>chats</orth>
        <gramGrp>
          <number>pl</number>
        </gramGrp>
      </form>
    </entry>
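
One possible interpretation of this structure is that an inflected form inherits the entry-level grammar and adds its own features. The sketch below encodes that assumption, which the slides do not state explicitly. `grammar` and `forms_with_features` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple


def grammar(parent: ET.Element) -> Dict[str, str]:
    """Features of the <gramGrp> that is a direct child of `parent`."""
    gram_grp = parent.find("gramGrp")
    if gram_grp is None:
        return {}
    return {feat.tag: (feat.text or "").strip() for feat in gram_grp}


def forms_with_features(entry: ET.Element) -> List[Tuple[Optional[str], Dict[str, str]]]:
    """Pair each <form>'s orth with entry-level features, overridden by form-level ones."""
    base = grammar(entry)
    return [(form.findtext("orth"), {**base, **grammar(form)})
            for form in entry.findall("form")]


entry = ET.fromstring(
    '<entry><form type="lemma"><orth>chat</orth></form>'
    '<gramGrp><pos>N</pos><gen>m</gen></gramGrp>'
    '<form type="inflected"><orth>chats</orth>'
    '<gramGrp><number>pl</number></gramGrp></form></entry>'
)
for orth, features in forms_with_features(entry):
    print(orth, features)
```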

17
<form>: the case of the Campe dictionary
  • Step 1: dealing with the presence of determiners
    <form type="lemma">
      <form type="determiner">
        <orth>Das</orth>
      </form>
      <form type="headword">
        <orth>Aak</orth>
      </form>
    </form>
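
With the determiner and the headword in separate `<form>`s, each can be reached by an ordinary path query. The sketch assumes namespace-free XML; `split_lemma` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def split_lemma(xml_text: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (determiner, headword) from a <form type="lemma"> with typed sub-forms."""
    lemma = ET.fromstring(xml_text)
    determiner = lemma.findtext("form[@type='determiner']/orth")
    headword = lemma.findtext("form[@type='headword']/orth")
    return determiner, headword


xml = ('<form type="lemma">'
       '<form type="determiner"><orth>Das</orth></form>'
       '<form type="headword"><orth>Aak</orth></form>'
       '</form>')
print(split_lemma(xml))  # ('Das', 'Aak')
```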

18
<form>: the case of the Campe dictionary
  • Step 2: adding grammatical information
    <form type="lemma">
      <form type="determiner">
        <orth>Das</orth>
        <gramGrp>
          <pos value="D"/>
          <gen>n</gen>
        </gramGrp>
      </form>
      <form type="headword">
        <orth>Aak</orth>
        <gramGrp>
          <pos>N</pos>
          <gen>n</gen>
        </gramGrp>
      </form>
    </form>
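
Reading the grammatical features back for each sub-form has to cope with both encodings used here. One is a `value` attribute (`<pos value="D"/>`) and the other is element content (`<gen>n</gen>`). This is a sketch; `subform_grammar` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def subform_grammar(xml_text: str) -> Dict[str, Dict[str, str]]:
    """Map each sub-form's type to its grammatical features."""
    lemma = ET.fromstring(xml_text)
    result = {}
    for form in lemma.findall("form"):
        features = {}
        for feat in form.findall("gramGrp/*"):
            # <pos value="D"/> keeps its value in an attribute, <gen>n</gen> in its text.
            features[feat.tag] = feat.get("value") or (feat.text or "").strip()
        result[form.get("type")] = features
    return result


xml = ('<form type="lemma">'
       '<form type="determiner"><orth>Das</orth>'
       '<gramGrp><pos value="D"/><gen>n</gen></gramGrp></form>'
       '<form type="headword"><orth>Aak</orth>'
       '<gramGrp><pos>N</pos><gen>n</gen></gramGrp></form>'
       '</form>')
print(subform_grammar(xml))
# {'determiner': {'pos': 'D', 'gen': 'n'}, 'headword': {'pos': 'N', 'gen': 'n'}}
```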

19
<form>: the case of the Campe dictionary
  • Step 3: dealing with inflected forms
    <form type="inflected">
      <form type="determiner">
        <orth>des</orth>
        <gramGrp></gramGrp>
      </form>
      <form type="headword">
        <orth><oVar><oRef/>-es</oVar></orth>
        <gramGrp>
          <case value="G">G</case>
        </gramGrp>
      </form>
    </form>
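
To display the inflected form, `<oRef/>` has to be replaced by the headword. The sketch below assumes that a hyphen directly after `<oRef/>` marks a suffix, and that the hyphen is dropped when the suffix is attached. Under that assumption, `<oRef/>-es` with the headword Aak gives Aakes. The helpers `expand_orth` and `realize_inflected` are hypothetical, and so is the hyphen convention.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional


def expand_orth(orth: ET.Element, headword: str) -> str:
    """Spell out an <orth>, replacing <oRef/> inside <oVar> by the headword.

    Assumption: a "-" right after <oRef/> marks a suffix and is dropped.
    """
    ovar = orth.find("oVar")
    if ovar is None:
        return "".join(orth.itertext()).strip()
    parts = [orth.text or "", ovar.text or ""]
    for child in ovar:
        tail = child.tail or ""
        if child.tag == "oRef":
            parts.append(headword + (tail[1:] if tail.startswith("-") else tail))
        else:
            parts.append("".join(child.itertext()) + tail)
    parts.append(ovar.tail or "")
    return "".join(parts).strip()


def realize_inflected(xml_text: str, headword: str) -> Dict[str, Optional[str]]:
    """Return determiner, spelled-out headword and case of a <form type="inflected">."""
    form = ET.fromstring(xml_text)
    case = form.find("form[@type='headword']/gramGrp/case")
    return {
        "determiner": form.findtext("form[@type='determiner']/orth"),
        "headword": expand_orth(form.find("form[@type='headword']/orth"), headword),
        "case": case.get("value") if case is not None else None,
    }


xml = ('<form type="inflected">'
       '<form type="determiner"><orth>des</orth><gramGrp></gramGrp></form>'
       '<form type="headword"><orth><oVar><oRef/>-es</oVar></orth>'
       '<gramGrp><case value="G">G</case></gramGrp></form>'
       '</form>')
print(realize_inflected(xml, "Aak"))
# {'determiner': 'des', 'headword': 'Aakes', 'case': 'G'}
```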

20
Main arguments for the proposed changes
  • Coherent use of <form> and <orth>
  • Provides consistent access to orthographic information through form/orth
  • Coherent use of grammatical features
  • Danger of tag abuse with
    <gram type="art_n">Das</gram>
  • The type attribute should indicate a grammatical feature
  • The <gram> content should be the value of that feature
  • Non-differentiation of features (art_n conflates pos and gen)
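
This slide implies a migration from a combined `<gram type="art_n">` code to a determiner `<form>` with separate features. A script for it could look like the sketch below. The mapping `LEGACY_CODES`, which reads `art_n` as a neuter article, is hypothetical, as is the helper `migrate_gram`. The sketch also writes `pos` as element content rather than as a `value` attribute.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical mapping from legacy combined codes to separate features.
LEGACY_CODES: Dict[str, Dict[str, str]] = {"art_n": {"pos": "D", "gen": "n"}}


def migrate_gram(gram: ET.Element) -> ET.Element:
    """Rewrite <gram type="art_n">Das</gram> as a determiner <form> with a <gramGrp>."""
    features = LEGACY_CODES[gram.get("type")]  # KeyError for unmapped codes
    form = ET.Element("form", {"type": "determiner"})
    ET.SubElement(form, "orth").text = gram.text
    gram_grp = ET.SubElement(form, "gramGrp")
    for name, value in features.items():
        ET.SubElement(gram_grp, name).text = value
    return form


legacy = ET.fromstring('<gram type="art_n">Das</gram>')
print(ET.tostring(migrate_gram(legacy), encoding="unicode"))
```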

21
ltsensegt main components
  • Core elements
  • <def> to provide the definition
  • <dicteg>
  • Need to establish guidelines on the identification of sources
  • <etym>: a complex issue

22
Documenting examples
  • A bare quotation
    <dicteg><q>Ta gamine est assise trop <oRef/>, elle ne dépasse pas de la table.</q></dicteg>
  • A quotation with its source in a free-text <bibl>
    <dicteg>
      <cit>
        <q>Ta gamine est assise trop <oRef/>, elle ne dépasse pas de la table.</q>
        <bibl>Benoit M., Michel C., Le Parler de Metz...</bibl>
      </cit>
    </dicteg>
  • A quotation with a structured <biblStruct> source
    <dicteg>
      <cit>
        <q>Ta gamine est assise trop <oRef/>, elle ne dépasse pas de la table.</q>
        <biblStruct>
          <monogr>
            <author>BENOIT M., MICHEL C.</author>
            <title>Le Parler de Metz et du pays messin</title>
            <imprint>
              <pubPlace>Metz</pubPlace>
              <publisher>Serpenoise</publisher>
              <date>2001</date>
              <biblScope>p. 38</biblScope>
            </imprint>
          </monogr>
        </biblStruct>
      </cit>
    </dicteg>

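
Once examples are encoded this way, each quotation can be extracted together with its source. In this sketch `<oRef/>` is replaced by a headword supplied by the caller. `examples` and `quote_text` are hypothetical helpers. The source is taken from the `<bibl>` text, or else from the first `<title>` inside `<biblStruct>`.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple


def quote_text(q: ET.Element, headword: str) -> str:
    """Flatten a <q>, replacing each <oRef/> by the headword."""
    parts = [q.text or ""]
    for child in q:
        parts.append(headword if child.tag == "oRef" else "".join(child.itertext()))
        parts.append(child.tail or "")
    return " ".join("".join(parts).split())  # normalize whitespace


def examples(xml_text: str, headword: str) -> List[Tuple[str, Optional[str]]]:
    """Return (quotation, source) for each <dicteg>; source is None if undocumented."""
    result = []
    for eg in ET.fromstring(xml_text).iter("dicteg"):
        quote = quote_text(eg.find(".//q"), headword)
        source = eg.findtext("cit/bibl")
        bibl_struct = eg.find("cit/biblStruct")
        if source is None and bibl_struct is not None:
            source = bibl_struct.findtext(".//title")
        result.append((quote, source))
    return result


sense = ('<sense><dicteg><cit>'
         '<q>Ta gamine est assise trop <oRef/>, elle ne dépasse pas de la table.</q>'
         '<bibl>Benoit M., Michel C., Le Parler de Metz...</bibl>'
         '</cit></dicteg></sense>')
print(examples(sense, "HEADWORD"))  # "HEADWORD" is a placeholder
```
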
23
A quick glimpse into Roma
  • A journey in three steps
  • Adding the PD module and generating a schema
  • Checking out elements
  • Expressing constraints on specific values
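
Roma expresses constraints on specific values in the generated schema; in an ODD, for example, this is done through a closed `<valList>`. The Python below only mimics that idea outside any schema. The table `CLOSED_VALUES` and the function `check_values` are hypothetical. The allowed values are illustrative and mostly taken from the examples in these slides.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Set, Tuple

# Hypothetical closed value lists, illustrative only.
CLOSED_VALUES: Dict[Tuple[str, str], Set[str]] = {
    ("entry", "type"): {"main", "hom"},
    ("form", "type"): {"lemma", "inflected", "determiner", "headword"},
}


def check_values(xml_text: str) -> List[str]:
    """List attribute values that fall outside the closed lists above."""
    errors = []
    for el in ET.fromstring(xml_text).iter():
        for attr, value in el.attrib.items():
            allowed = CLOSED_VALUES.get((el.tag, attr))
            if allowed is not None and value not in allowed:
                errors.append(f'<{el.tag} {attr}="{value}"> not in {sorted(allowed)}')
    return errors


print(check_values('<entry type="hom"><form type="stem"><orth>x</orth></form></entry>'))
```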

24
Final discussion
  • What does it mean to be TEI conformant?