Title: Terminology Organization in Terminology Management Systems
1Terminology Organization in Terminology
Management Systems
- Angela Boll, Marina Kaneva, Claudia Himmler,
Chiara Huber, Annika Meinhardt, Patrick Johnson
2COMPILATION OF TERMINOLOGY
- the most practical way to process lexical data is by computer
- benefits: speed, flexibility and storage capacity
- growing trend towards the automation of terminological data processing
- from now on, all aspects of terminology compilation, storage and retrieval will be assisted by or directly carried out by computers
3PRINCIPLES OF COMPILATION
- automation fundamentally affects the compilation of terminology
- necessity to evolve completely new principles for compilation
4PRINCIPLES OF COMPILATION
- systematic terminology compilation is now firmly corpus-based
- text corpora reinforce the principle that terminology compilation is an ongoing and repeated activity
5PRINCIPLES OF COMPILATION
- many technical texts can now be preserved in or converted into a suitable format for terminological analysis
- texts which are to be processed by translators can be analysed and compared with current machine-readable terminology holdings and a machine-readable general dictionary in order to produce a listing of items not contained in either
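The comparison described above can be sketched as a set difference. This is a minimal illustration, not a method taken from the source: the function name `unlisted_items`, the toy word lists and the single-word tokenisation are all assumptions (real term extraction would also need to handle multi-word terms).

```python
import re

def unlisted_items(text, term_holdings, general_dictionary):
    """Return the word forms in `text` found in neither resource, sorted."""
    tokens = set(re.findall(r"[a-z][a-z-]*", text.lower()))
    known = {w.lower() for w in term_holdings} | {w.lower() for w in general_dictionary}
    return sorted(tokens - known)

# Hypothetical toy data, for illustration only.
term_holdings = {"toner", "printer"}
general_dictionary = {"the", "uses", "a", "and"}
text = "The printer uses a toner and a fuser."

print(unlisted_items(text, term_holdings, general_dictionary))
```

The resulting listing is only a set of candidates; deciding which of them are genuine terms remains the terminologist's task.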
6PRINCIPLES OF COMPILATION
- running text can be used totally independently of user requirements
- terminology compilation is becoming increasingly text-oriented
7PRINCIPLES OF COMPILATION
- the second major innovation affecting principles of compilation is the division which is now possible between
  - the raw data as they are found in the corpus,
  - the database which contains all the information that is collected in suitably structured form, and
  - all the various subsets of information which are created for specific purposes and uses
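The three levels can be pictured with plain Python data structures. This is only a sketch under assumed names: `corpus`, `database`, `glossary` and all sample sentences and definitions are hypothetical.

```python
# Level 1: raw data exactly as found in the corpus (hypothetical sentences).
corpus = [
    "A laser printer fuses toner onto paper.",
    "The toner cartridge must be replaced regularly.",
]

# Level 2: the structured database; each record points back into the corpus.
database = [
    {"term": "laser printer", "subject": "printing",
     "definition": "printer that uses a laser beam", "corpus_refs": [0]},
    {"term": "toner", "subject": "printing",
     "definition": "powder used to form the printed image", "corpus_refs": [0, 1]},
]

# Level 3: a subset derived for one specific purpose, here a plain glossary.
def glossary(records, subject):
    """Return term -> definition for all records in one subject field."""
    return {r["term"]: r["definition"] for r in records if r["subject"] == subject}

print(glossary(database, "printing"))
```

Because the glossary is derived on demand, the raw data and the database stay untouched when a new kind of subset is needed.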
8PRINCIPLES OF COMPILATION
9PRINCIPLES OF COMPILATION
- the terminologist now has appropriate tools which lift his work from a craft to a scientifically supported activity
- automatic processing and computer-assisted terminology compilation are therefore qualitatively superior to conventional methods
- terminologist is freed from the limitations of the past with respect to size of individual records and total quantity of records
10PRINCIPLES OF COMPILATION
- however, there is also a danger: private term collections of individual translators can become widely known
- instead, there should be only one major database of terminological information for each language community, to which all users would refer and contribute
- communication across all industrial and institutional barriers would be facilitated
11The nature and type of terminological information
- Information for the construction of a terminological record is varied and subject to change
- This affects the nature of the database system
- Items of information in the database must be considered independent of each other
- Information can be entered at different times and from different sources
12The nature and type of terminological information
- Full bibliographical information for each item is provided separately
- Limitation of human manipulation of lexical data to the specific interpretative tasks the computer cannot perform
- Concept is explained by indication of linguistic forms: antonyms, broader and narrower generic terms (refer to a whole class of terms), broader and narrower partitive terms (relate to a part of a whole)
13The nature and type of terminological information
- Exemplification of the usage of technical terms: example sentences (context) and usage notes
- The meaning of terms is semantically more changeable than that of items of the general lexicon of a language
14The nature and type of terminological information
- In conceptually-based terminological data banks definitions are given in one language only
- Bilingual terminology is directional and non-reversible → translation equivalents cannot be converted into entries of the source language
- Translation equivalents do not refer to an authentic concept because they introduce new concepts
15Methodological considerations
- Terminologists don't need to be concerned with how the data is stored in the computer, thanks to the modern techniques of computational linguistics
- Computer can store a multi-dimensional semantic network
- No physical limitation of size, unlike any non-magnetic medium
- Definitions can be as long as is necessary to properly define the term
16Methodological considerations
- Terminology compilation can be distributed physically and temporally
- Information can be collected and stored in stages
  - As long as each item of data satisfies the controls (e.g. bibliographical reference) → as much data as available can be entered at any time
- Information can be collected on a distributed basis → work can be distributed among various people and locations → it is particularly important for the compilation of multilingual terminology
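Staged, distributed entry with an input control might look like the following sketch. The classes `Item` and `TermRecord`, the `add` method and the sample values are hypothetical; the only rule taken from the slides is that each item must carry a bibliographical reference.

```python
from dataclasses import dataclass, field

@dataclass
class Item:
    category: str      # e.g. "definition", "context"
    value: str
    source: str        # bibliographical reference, required by the control
    contributor: str   # who entered the item

@dataclass
class TermRecord:
    term: str
    items: list = field(default_factory=list)

    def add(self, category, value, source, contributor):
        """Accept an item at any time, provided it passes the control."""
        if not source.strip():
            raise ValueError("every item needs a bibliographical reference")
        self.items.append(Item(category, value, source, contributor))

# Two contributors add data to the same record at different stages.
record = TermRecord("laser printer")
record.add("definition", "printer that uses a laser beam",
           "hypothetical-source-1", "terminologist A")
record.add("context", "The laser printer is offline.",
           "hypothetical-source-2", "terminologist B")
print(len(record.items))
```

Since each item is validated on its own, the record can remain incomplete for as long as necessary without becoming inconsistent.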
17Quality of data
- Computer usage for input control and validation has resulted in a trend towards terminology of a higher quality
- Increased danger of spreading terminology of low quality
- Increase in quality is very important
- Far-reaching effect of computerised terminology processing on the spread of terminology
18Quality of data
- Distinction between original source texts and translated texts
- Terms taken from texts in their original language are genuine terms and as such have full validity
- Terms taken from translated texts may either be valid terms or translation equivalents
19Quality of data
- Trend towards the use of genuine original texts for extraction of terms and contexts
- There is no exact match of concepts for many terms across languages
- Several possible equivalents together with context and usage information are needed for a correct choice
20Principles of data collection
- Set of basic principles for the compilation of terminological data
- Certain consistency of criteria
- Sources must be stated
- Distinction between original and translated texts
- Linguistic behaviour of terms should be
documented by contexts so that all relevant
textual variants are covered
21Terminological Data Banks-A Definition-
- Automated collection of vocabularies of special areas that serve a particular user group
- Used for large translation services
- Enhanced but still conventional glossaries transferred to a new medium
22Terminological Data Banks-A Definition-
- Designed to give responses to the same questions a good dictionary is supposed to answer
- But these questions only elicit direct responses from the various parts of the conventional dictionary
23Terminological Data Banks-A Definition-
- Examples
  ENTRY PART   QUESTION                                      ANSWER
  equivalent   what is the French word for laser printer?    imprimante laser
  gender       what is the gender of imprimante?             feminine
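A direct question of this kind amounts to reading one field of one record. The sketch below uses only the values from the example; the dictionary layout and the names `records`, `equivalent_fr`, `gender_fr` and `lookup` are assumptions.

```python
# One record holding the values from the example above.
records = {
    "laser printer": {"equivalent_fr": "imprimante laser", "gender_fr": "feminine"},
}

def lookup(entry_term, entry_part):
    """Answer a direct question by reading one field of one record."""
    return records[entry_term][entry_part]

print(lookup("laser printer", "equivalent_fr"))
print(lookup("laser printer", "gender_fr"))
```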
24Terminological Data Banks-A Definition-
- These responses are not sufficient for a wide range of dictionary users
- Answers may be ambiguous
- Full potential of a lexical database was not exploited by existing term banks
25Terminological Data Banks-A Definition-
- Reasons
  - Information was not unified in a suitable manner in order to retrieve it
  - Lack of coherent structure
  - Existing systems failed to exploit new and additional techniques for ordering and representing the data
26Terminological Data Banks-A Definition-
- There was an increasing demand for a system that can answer complex queries
- Example
  QUERY                                          SEARCH OF FIELD
  what do you call a machine that performs X?    definition or conceptual links
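A query of this type searches the definition field rather than the entry term. The following is a rough sketch with hypothetical records and a deliberately naive word match; the names `term_bank` and `terms_for` are assumptions.

```python
# Hypothetical records; only the term and its definition are needed here.
term_bank = [
    {"term": "laser printer", "definition": "machine that prints using a laser beam"},
    {"term": "shredder", "definition": "machine that cuts paper into strips"},
]

def terms_for(description_words):
    """Return the terms whose definition contains every given word."""
    wanted = [w.lower() for w in description_words]
    return [r["term"] for r in term_bank
            if all(w in r["definition"].lower().split() for w in wanted)]

# "What do you call a machine that cuts ...?"
print(terms_for(["machine", "cuts"]))
```

A production system would more plausibly follow conceptual links or use a full-text index; exact word matching is used here only to keep the example short.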
27Terminological Data Banks-A Definition-
- a collection, stored in a computer, of
special language vocabularies, including
nomenclatures, standardised terms and phrases,
together with the information required for their
identification, which can be used as a mono- or
multilingual dictionary for direct consultation,
as a basis for dictionary production, as a
control instrument for consistency of usage and
term creation and as an ancillary tool in
information and documentation.
28Terminological Data Banks-A Definition-
- Term banks are supposed to be used by people
with varying degrees of expertise and different
purposes
29Semantic Networks
- Complex storage of data to represent terminological relationships
- First developed in artificial intelligence research for formal representation of human knowledge
- Have no intrinsic meaning; they are basically directed graphs
- They have superficial similarity
30Semantic Networks
31Semantic Networks
- The relationships between concepts are expressed through abbreviations
- Generic relationship: is a type of (isa)
- Partitive relationship: is a part of / consists of (ispart-of / has-part)
- Nodes: different concepts
- Arcs: labelled links
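A network of this kind can be sketched as a list of labelled arcs between concept nodes. The arc labels `isa` and `ispart-of` come from the slide; the concepts, the name `ARCS` and the helper `broader_terms` are hypothetical.

```python
# Each arc is (source node, label, target node); the concepts are hypothetical.
ARCS = [
    ("laser printer", "isa", "printer"),
    ("printer", "isa", "output device"),
    ("toner cartridge", "ispart-of", "laser printer"),
]

def broader_terms(concept, label="isa"):
    """Follow arcs with the given label upwards and collect every node reached."""
    found = []
    frontier = [concept]
    while frontier:
        node = frontier.pop()
        for src, lab, dst in ARCS:
            if src == node and lab == label and dst not in found:
                found.append(dst)
                frontier.append(dst)
    return found

print(broader_terms("laser printer"))
print(broader_terms("toner cartridge", label="ispart-of"))
```

Keeping the label on the arc lets the same traversal serve both generic and partitive relationships.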
32Semantic Networks
- A wide variety of relationships between concepts
- To create semantic networks it is necessary to define a specific number of relationships and a coherent internal structure
- System must allow only one single method of description for each type of relationship
- Networks have to be subject field-specific
33Semantic Networks
- In order to get a perfect result the end-user poses questions to the system in the form of network fragments
- The fragments are matched against the network database
- Variable nodes in the fragments are bound to the values they must have in order to make the match perfect
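Matching a query fragment against the network can be sketched as simple pattern matching over labelled arcs. All names here (`NETWORK`, `match`, the `?x` variable convention) and the sample concepts are assumptions made for illustration.

```python
# Hypothetical network of (source, label, target) arcs.
NETWORK = [
    ("laser printer", "isa", "printer"),
    ("inkjet printer", "isa", "printer"),
    ("toner cartridge", "ispart-of", "laser printer"),
]

def is_variable(node):
    """Variable nodes are written with a leading question mark, e.g. '?x'."""
    return node.startswith("?")

def match(fragment, bindings=None):
    """Yield every binding of variables that makes all fragment arcs match."""
    bindings = bindings or {}
    if not fragment:
        yield dict(bindings)
        return
    (src, label, dst), rest = fragment[0], fragment[1:]
    for n_src, n_label, n_dst in NETWORK:
        if n_label != label:
            continue
        trial = dict(bindings)
        ok = True
        for pattern, value in ((src, n_src), (dst, n_dst)):
            if is_variable(pattern):
                # Bind the variable, or check it against an earlier binding.
                if trial.setdefault(pattern, value) != value:
                    ok = False
            elif pattern != value:
                ok = False
        if ok:
            yield from match(rest, trial)

# "Which kind of printer has a toner cartridge as a part?"
query = [("?x", "isa", "printer"), ("toner cartridge", "ispart-of", "?x")]
print(list(match(query)))
```

The second arc of the query rules out the inkjet printer, because `?x` must take the same value in every arc of the fragment.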
34Semantic Networks
- The success of term banks depends on several factors
- The semantics of the network arcs must be carefully defined
- System must be easy to implement and user-friendly
- Danger of an over-complicated system that is too detailed
35Compilation of TerminologyTerminological
information
- What terms are used in a terminological tool?
- The selection of the most effective terms is assisted by reference to terminological information which is collected in dictionaries/glossaries/term banks
- Principal factor of effectiveness: type and quality of information
36Terminological information
- International consensus on basic categories for terminological records:
- entry term
- a reference number
- a subject field
- a definition
- an indication of the usage
37Terminological information
- Customary to add indication of the sources of the term(s), definition, context or any foreign language equivalents
- It is up to the user to decide on appropriateness of terms
38[Figure: structure of a term record, drawn from corpora of raw data containing definitions, terms and contexts. Blocks: Source information (origin, type, No., page, date); Conceptual Specification; Linguistic Specification; Pragmatic Specification; FL equivalent Specification; Housekeeping information (pool number, record number, terminologist). Field labels shown: term, equivalent term, language, definition, context, grammatical information, links to other concepts, usage note or example, synonyms, scope notes, abbreviation, usage, subject field, variants.]
39Terminological informationBasic data categories
- What information is included in a multifunctional term record?
- Information is complex and consists of a number of subsets which can be compiled and processed quite separately.
40Terminological informationBasic data categories
- In which categories is the term record structured?
- 1. source information: links the term record to the raw data files
- 2. entry term: either linguistic item or a label of a concept, or both
- 3. semantic and conceptual specification: definition, a subject attribution, scope notes, set of links to other concepts
- 4. linguistic specification: e.g. variants, abbreviations
41Terminological informationBasic data categories
- 5. pragmatic specification: examples of the context in which term occurs, usage notes
- 6. housekeeping or administrative information: record number, name of terminologist, dates of first processing, up-dating of the record
- 7. foreign language equivalent specification in translation-orientated databases
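The seven categories can be sketched as one record type. The grouping follows the list above, but every field name and type is an assumption made for illustration, not a schema taken from any term bank.

```python
from dataclasses import dataclass, field

@dataclass
class TermRecord:
    # 1. source information: links back to the raw data files
    source_refs: list = field(default_factory=list)
    # 2. entry term
    entry_term: str = ""
    # 3. semantic and conceptual specification
    definition: str = ""
    subject_field: str = ""
    scope_notes: list = field(default_factory=list)
    concept_links: dict = field(default_factory=dict)   # link label -> other term
    # 4. linguistic specification
    variants: list = field(default_factory=list)
    abbreviations: list = field(default_factory=list)
    # 5. pragmatic specification
    contexts: list = field(default_factory=list)
    usage_notes: list = field(default_factory=list)
    # 6. housekeeping or administrative information
    record_number: int = 0
    terminologist: str = ""
    date_created: str = ""
    date_updated: str = ""
    # 7. foreign language equivalents (translation-orientated databases)
    equivalents: dict = field(default_factory=dict)     # language code -> term

record = TermRecord(entry_term="laser printer", record_number=1,
                    equivalents={"fr": "imprimante laser"})
print(record.entry_term, record.equivalents["fr"])
```

Every group has an empty default, so each subset can be compiled and filled in separately.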
42Terminological informationBasic data categories
- Now let us take a closer look at the information categories
- Entry Term
  - most common search item
  - presented in the most relevant form (e.g. sing. for nouns)
  - because the distinction between concept- or term-orientation affects the treatment of homographs/synonyms → decision whether entry term represents concept or is the linguistic form
43Terminological informationBasic data categories
- In concept-orientated term banks primary importance is placed on the definition of the concept, and all terms matching the definition are grouped together → imposes difficult choice of the order in which terms are listed
- Exclusive concept orientation (e.g. NORMATERM) is feasible in mono- and bilingual term banks which deal with subject fields of similar conceptual structures
44Terminological informationBasic data categories
- For multilingual term banks explanatory notes are required which indicate in every case the scope and degree of matching of a term with the concept defined in another language
- Three types of entry
  - 1. simple, compound or complex terms
  - 2. phrases, regardless of lexicalisation
  - 3. sentences
45Terminological informationBasic data categories
- Conceptual Specification
- Definition
  - first item that links entry term to the concept
  - can be in a style specific to the term bank, or extracted from an authoritative source
  - term banks can be classified by the way an entry is identified or explained
  - there are two major schools of thought
46Terminological informationBasic data categories
- The first can refer to a definition which is strictly limited in its validity to the range of texts which represent the source material for the term collection
- In the second there is no restricted corpus → no single valid definition in the first place
47Terminology informationBasic data categories
- Relationships
  - most controversial and least defined category of information; it may indicate no more than the most obvious broader term
  - information could be a reference to another record
- Subject Field
  - terminology is divided by subject field before being ordered in another way
  - because of the large quantities of terms it is advisable to introduce a classification of terms by subject areas
48Terminological informationBasic data categories
- Scope Note
  - can be considered a further specification of subject or register
  - is intended to indicate a special field of application
- Linguistic specification
- Grammatical Information
  - can consist of spelling, pronunciation, gender for nouns, parts of speech (e.g. n, v, adj.), principal parts of verbs (e.g. infinitive, past)
49Terminology informationBasic data categories
- Language
  - is important in term banks where it is combined with an indication of the country where it is used
- Parallel information categories to the entry term
  - usually have no separate record but are listed in an index with a reference to the record of the entry term
  - comprise information such as spelling, expanded forms, reduced forms or synonyms
  - several overlapping categories exist: variants, full synonyms, abbreviated forms
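An index that points parallel forms back to the record of the entry term could be sketched as follows. The record number, the sample term and the names `RECORDS`, `build_index` and `find` are hypothetical.

```python
# Hypothetical record store keyed by record number.
RECORDS = {
    101: {"entry_term": "cathode-ray tube",
          "abbreviations": ["CRT"],
          "variants": ["cathode ray tube"]},
}

def build_index(records):
    """Map every entry term, abbreviation and variant to its record number."""
    index = {}
    for number, rec in records.items():
        forms = [rec["entry_term"], *rec["abbreviations"], *rec["variants"]]
        for form in forms:
            index[form.lower()] = number
    return index

INDEX = build_index(RECORDS)

def find(form):
    """Return the entry term whose record covers the given form, if any."""
    number = INDEX.get(form.lower())
    return RECORDS[number]["entry_term"] if number is not None else None

print(find("CRT"))
```

The abbreviation and the variant have no record of their own; both resolve to record 101.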
50Terminology informationBasic data categories
- Pragmatic specification
- Context
  - Gives examples of the way that the entry term is used in a language
  - Is considered a successful way of showing any unusual features of word form, inflection or collocation
  - The context should make the definition and the usage note complete
51Terminology informationBasic data categories
- Usage Note
  - gives information about the way the entry term is used in context
  - cannot be provided in the form of examples of a real context
  - e.g. collocational restrictions of formal variants
- colloquial
- slang
- mandatory
- firm-specific
- standardised
- translation
- General language dictionaries use further usage labels such as archaic, informal, taboo, derogatory, offensive, vulgar, but these are only rarely found in terminology
52Terminology informationBasic data categories
- Quality Label
  - term banks show in many different ways whether a term is standardised or not and whether a term in a foreign language or borrowed from a foreign language can be considered established usage
- Synonyms
  - terms that differ from the entry term (by usage, context and subject field)
  - usually are full entry terms and represent a cross-reference in the term bank structure
53Terminology informationBasic data categories
- Source Reference Specification
- Sources
  - printed dictionaries rarely give an indication of the source
  - term banks: source of every relevant item of information is recorded
  - needed for entry term, definition, context, translation equivalents, possibly also for synonyms
  - Can determine the selection criteria according to which information is collected
54Terminology informationBasic data categories
- Can consist of
- source origin (reliable sources in the UK):
  - BSI - British Standards Institution
  - CEC - Commission of the European Communities
  - HMSO - Her Majesty's Stationery Office
  - ISO - International Organization for Standardization
  - IEC - International Electrotechnical Commission
- The origin of a term may be its best indication of quality and usage.
- detailed reference of the source (e.g. year of publication → acceptability of a term)
55Terminology informationBasic data categories
- Source type
  - Indication of the type of document of the source:
  - articles in specialist literature
  - contracts and legal usage
  - government circulars to the general public
  - journalistic publications
  - manuals
  - patents
  - publicity material
  - research reports
  - standards
  - dictionary words → should be avoided
56Terminology informationBasic data categories
- Sources of definition and contexts
  - should show different areas of usage
- Source for the foreign language equivalent should match the source of the entry term to make it suitable
- Source reference code or number
  - large databases: a separate source reference file that gives the full bibliographical details for written sources
  - databases of raw data: reference can point directly into the specific file
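A separate source reference file can be sketched as a lookup table keyed by source code. The code `SAG90`, the names `SOURCES`, `RECORD` and `cite`, and the link between this record and the source are assumptions for illustration; the bibliographical details are those given in the bibliography.

```python
# Separate source reference file: code -> full bibliographical details.
SOURCES = {
    "SAG90": {"author": "Sager, J. C.", "year": 1990,
              "title": "A Practical Course in Terminology Processing"},
}

# A term record stores only the short code for each item of information.
RECORD = {
    "entry_term": "terminological data bank",
    "sources": {"definition": "SAG90"},   # hypothetical attribution
}

def cite(record, category):
    """Resolve the source code of one item into a full reference string."""
    src = SOURCES[record["sources"][category]]
    return f'{src["author"]} {src["year"]}. {src["title"]}.'

print(cite(RECORD, "definition"))
```

Storing the code rather than the full reference keeps records short and means a correction to a reference is made in one place only.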
57Terminology informationBasic data categories
- Housekeeping information (or administrative information)
- Record number
  - Consists of a number for the entry, possibly with some subcategories
  - Possible subsets of the database can be used to identify a topic (e.g. the terminology of a particular product, manual, congress or set of documents which can cut across subject field divisions)
  - Such subsets are often the basis of the database; they are isolated for separate use
58Terminology informationBasic data categories
- Author of Record
  - for checking the work
  - author is either a terminologist or a committee
- Date of Record
  - date of the production of the first record and any up-dates
59Terminological informationMethods of compilation
- Methods of compilation
- No fully acknowledged and general methodology
- Terminology compilation must become user-oriented!
- Serious terminology compilation is firmly corpus-based → relies on the analysis of textual evidence
- Compilation can be a discontinuous process as long as certain items of information which are connected and have an effect on each other are compiled at the same time
- Compilation must be seen as an ongoing revision and up-dating process
60Terminological informationMethods of compilation
- Term bank software should provide a facility for prompting terminologists when building up terminological records.
- Some form of expert system is required to control the work of terminologists
- If machines themselves are to be end-users of terminological databases there must be greater precision and explicitness of identification in the compilation of data.
- Methods to be applied in the regular compilation of terminology depend on
  - 1. nature of data available
  - 2. purpose of compilation
61Terminological informationMethods of compilation
- Methods change with the degree of automatic support available → rapid advances in the design of automatic tools
- no specific model is possible
- in most cases:
  - 1. a corpus of text is put together in machine-readable form (criteria: representativeness, completeness, relevance)
  - 2. corpus is fully indexed
  - 3. terms are isolated and extracted
  - 4. terms are sorted automatically and variously grouped
  - 5. terms are matched with definitions
62Terminological informationMethods of compilation
  - 6. the provisional file is enlarged and corrected
  - 7. terms are placed in relationship to other terms
  - 8. terms are attributed to particular subject fields if required
  - 9. a term record is created which contains only the term with its linguistic variants
  - 10. the term record is completed with the addition of the house-keeping information
- The amount and diversity of data collected in the term record varies according to the range of purposes of the database.
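The automatable steps of this sequence can be sketched in a few lines. This is a toy under stated assumptions, not the method of any particular term bank: the function `compile_records`, the single-word candidate test against a general word list and all sample data are hypothetical, and steps 6 to 8 (correction, relationships, subject fields) are left to the terminologist.

```python
import re
from collections import defaultdict

def compile_records(corpus, general_words, definitions, terminologist):
    # Steps 1-2: take the machine-readable corpus and index every word form.
    index = defaultdict(list)
    for doc_no, text in enumerate(corpus):
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].append(doc_no)
    # Step 3: isolate candidate terms (here: forms absent from a general word list).
    candidates = [w for w in index if w not in general_words]
    # Step 4: sort the candidates.
    candidates.sort()
    # Steps 5, 9 and 10: match with definitions, create records, add housekeeping data.
    records = []
    for number, term in enumerate(candidates, start=1):
        records.append({
            "record_number": number,
            "term": term,
            "definition": definitions.get(term),  # None until one is found
            "corpus_refs": sorted(set(index[term])),
            "terminologist": terminologist,
        })
    return records

# Hypothetical toy data.
corpus = ["The toner is fused.", "Replace the toner and the fuser."]
general_words = {"the", "is", "and", "replace", "fused"}
definitions = {"toner": "powder used in laser printers"}

records = compile_records(corpus, general_words, definitions, "terminologist A")
print([r["term"] for r in records])
```

Records without a definition stay in the provisional file until the terminologist completes them, which matches the view of compilation as an ongoing process.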
63 IATE (iate.europa.eu)
- IATE (Inter-Active Terminology for Europe)
- it is the EU inter-institutional terminology database system
- IATE has been used in the EU institutions and agencies since summer 2004 for the collection, distribution and shared management of EU-specific terminology
64About IATE
- EU institutions and agencies involved
  - European Commission
  - Parliament
  - Council
  - Court of Justice
  - Court of Auditors
  - Economic and Social Committee
  - Committee of the Regions
  - European Central Bank
  - European Investment Bank
  - Translation Centre for the Bodies of the EU
65About IATE
- The project was launched in 1999 with the objective
  - to provide a web-based infrastructure for all EU terminology resources
  - enhancing the availability and standardisation of the information
66About IATE
- existing terminology databases of the European Commission, Council, Parliament and Translation Centre have been imported into IATE
  → single new, highly interactive and accessible interinstitutional database
  → approximately 1.4 million multilingual entries
67SDL MultiTerm 2007
- SDL MultiTerm captures, creates, manages and distributes terminology
- Designed for companies (e.g. marketing) which spend significant time and resources creating the words that position their brand, company and products in the market
68SDL MultiTerm 2007
- Concept-based terminology management
- Web- and server-based access
- Different search types (e.g. Fuzzy Search)
- Customisable data entry definitions and layouts
- Supports all languages (Unicode)
- Cross-references to easily link entries to each other
69SDL MultiTerm 2007
- Benefits
  - delivers accurate and approved terminology with real-time verification during the translation process
  - quickly builds corporate glossaries
  - improves publication quality
70Bibliography
- Sager, J. C. 1990. A Practical Course in Terminology Processing. John Benjamins B. V.
- Quah, C. K. 2006. Translation and Technology. Basingstoke (UK): Palgrave Macmillan.
- IATE, iate.europa.eu
- SDL MultiTerm 2007, www.sdl.com