From thesauri to ontologies: semantic standards for law


1
From thesauri to ontologies: semantic standards
for law
  • Daniela Tiscornia
  • tiscornia_at_ittig.cnr.it

2
Index of arguments
  • 1. Linguistic barriers
  • 2. Language-dependent approaches
  • - traditional tools (thesauri, keywords)
  • - metadata
  • - lexicons
  • 3. Language-independent tools: ontologies
  • 4. The Lois database
  • 5. Conclusions

3
Linguistic barriers
  • Linguistic barriers hamper:
  • access to content for non-expert users
  • semantic interoperability in e-government
  • cross-lingual legal information searching
  • commercial exploitation of Public Sector
    Information

4
References
  • Fellbaum C. (editor), WordNet: An Electronic
    Lexical Database, Cambridge, MA: The MIT Press,
    1998, 305, downloadable from
    http://mitpress.mit.edu/book-home.tcl?isbn=026206197X.
  • Gangemi A., Guarino N., Masolo C., Oltramari A.,
    Sweetening WordNet with DOLCE, AI Magazine 24(3),
    Fall 2003, 13-24.
  • Legal Knowledge and Information Systems,
    Proceedings of JURIX Conferences, Amsterdam,
    IOS Press.
  • Gangemi A., Sagri M.T., Tiscornia D., Metadata
    for Content Description in Legal Information,
    Workshop Legal Ontologies, ICAIL2003, Edinburgh.
    In press for Journal of Artificial Intelligence
    and Law, Kluwer.
  • Hirst G., Ontology and the lexicon. In Staab,
    Steffen and Studer, Rudi (editors), Handbook on
    Ontologies in Information Systems, Berlin:
    Springer, 2003, p. 14.

5
Limits of the language-based retrieval tools
  • Terminology vs common language: the Italian Code
    on Data Protection doesn't contain the term
    'privacy'
  • Polysemy: the Italian term ordine (order) has 4
    legal senses
  • Cross-lingual IR: the Italian term diritto means
    both 'right' and 'law'

6
Thesauri
  • Vertical (systematic) thesauri (e.g. Eurovoc):
    mono-hierarchical tree structure of terms
    interlinked by basic relationships (broader term
    BT, narrower term NT, related term RT); no
    distinction is made between terms and concepts,
    and the semantic specification of relations is
    missing.
  • Horizontal thesauri (e.g. the Italgiure semantic
    area): unstructured collection of terms without
    any distinction among words, concepts, types and
    parts of speech.
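As a rough illustration of the vertical thesaurus structure described on this slide, the following Python sketch stores terms as bare labels with BT/NT/RT links. The sample terms and the names `ThesaurusTerm` and `broader_chain` are hypothetical and are not taken from Eurovoc.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ThesaurusTerm:
    """A thesaurus entry: a bare string label plus BT/NT/RT links."""
    label: str
    broader: Optional[str] = None                       # BT: single parent (mono-hierarchy)
    narrower: List[str] = field(default_factory=list)   # NT
    related: List[str] = field(default_factory=list)    # RT


# Hypothetical sample entries (illustrative only, not real Eurovoc data).
THESAURUS = {
    "law": ThesaurusTerm("law", narrower=["civil law"]),
    "civil law": ThesaurusTerm("civil law", broader="law",
                               narrower=["contract"], related=["consumer"]),
    "contract": ThesaurusTerm("contract", broader="civil law"),
    "consumer": ThesaurusTerm("consumer"),
}


def broader_chain(label: str) -> List[str]:
    """Follow BT links from a term up to the root of its tree."""
    chain = []
    current = THESAURUS[label].broader
    while current is not None:
        chain.append(current)
        current = THESAURUS[current].broader
    return chain
```

In this sketch the links carry no meaning beyond their label: nothing states whether 'contract' is a kind of, a part of, or merely filed under 'civil law', which is the missing semantic specification the slide refers to.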

7
Semantic metadata
  • Semantic metadata are expected to support search
    engines for legal information retrieval by
    providing legal knowledge to include in their
    search strategies
  • Conceptual search strategies based on keywords
    still lack a clear semantics of terms, and this
    prevents conceptual query expansion
  • There is no semantic relationship between the
    information needs of the user and the information
    content of documents, apart from text pattern
    matching

8
Sense distinction
  • From EU legislative texts, four senses of
    'worker' are defined:
  • any worker as defined in Article 3 (a) of
    Directive 89/391/EEC who habitually uses display
    screen equipment as a significant part of his
    normal work.
  • any person employed by an employer, including
    trainees and apprentices but excluding domestic
    servants
  • any person carrying out an occupation on board a
    vessel, including trainees and apprentices, but
    excluding port pilots and shore personnel
    carrying out work on board a vessel at the
    quayside
  • any person who, in the Member State concerned, is
    protected as an employee under national
    employment law and in accordance with national
    practice
  • The corresponding lexical entry is defined as
    follows:
  • a person who works at a specific occupation

9
From words to concepts
  • A semantic theory requires an ontology of all the
    concepts or predicates expressed by the words of
    a language
  • Concepts are organized in structures that
    represent knowledge about the world
  • Lexicons map words to concepts: words are
    lexicalizations of a concept; a concept can be
    represented by many terms (words or phrases) in
    multiple languages; one term can identify several
    concepts
  • A lexicon is a bridge between a language and
    the knowledge expressed in that language (Sowa
    2000), but it is still language dependent!
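The many-to-many mapping between terms and concepts can be sketched in Python as follows. The concept identifiers (`C_RIGHT`, `C_LAW`) and the function name `translations` are hypothetical; the diritto example reuses the one given earlier in the slides.

```python
from collections import defaultdict

# (word, language, concept id). Concept ids are hypothetical placeholders.
LEXICALIZATIONS = [
    ("diritto", "it", "C_RIGHT"),
    ("diritto", "it", "C_LAW"),
    ("right", "en", "C_RIGHT"),
    ("law", "en", "C_LAW"),
]

# One term can identify several concepts, and one concept can be
# lexicalized by several terms, so both directions map to sets.
term_to_concepts = defaultdict(set)
concept_to_terms = defaultdict(set)
for word, lang, concept in LEXICALIZATIONS:
    term_to_concepts[(word, lang)].add(concept)
    concept_to_terms[concept].add((word, lang))


def translations(word, source_lang, target_lang):
    """Return target-language words sharing at least one concept with `word`."""
    result = set()
    for concept in term_to_concepts.get((word, source_lang), set()):
        for other_word, other_lang in concept_to_terms[concept]:
            if other_lang == target_lang:
                result.add(other_word)
    return result
```

Going through the concept level is what lets a single Italian term come back with two distinct English candidates, one per sense.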

10
Describing concepts
  • By lexical and semantic relations (e.g. WordNet)
  • By semantic roles among predicates (verbs) and
    their arguments (FrameNet)
  • By properties linked by formal relations
    (ontologies)

11
WordNet Family
  • WordNet (WN) (freeware, American English)
    (Cognitive Science Laboratory, Princeton
    University)
  • EuroWordNet (EWN) (proprietary, European
    languages) (ILC, Institute of Computational
    Linguistics, Pisa, for the Italian language)
  • ItalWordNet (IWN) (Italian part of EWN)
    (IRST-ICT, Trento)
  • Jur-(Ital)WordNet (JWN) (C.N.R. project: ITTIG,
    Institute of Theory and Techniques for Legal
    Information; ILC; LOA, Laboratory of Applied
    Ontology)

12
FrameNet (Fillmore 1997)
  • FrameNet is a frame-semantic description of
    lexical items based on semantically tagged
    corpora. Semantic roles (case roles, thematic
    roles, theta roles) characterize the semantic
    relation that a predicate can have to its
    arguments
  • Mapping between the syntactic constituents of a
    sentence and the frame semantic elements

13
Thematic roles
  • Based on:
  • syntactic patterns: subject, verb, object
  • semantic patterns: agent, action (the action or
    state associated with the verb, the participants,
    the roles of participants)
  • ontological assumptions: role as participant
    relation and roles as ontological classes
    (Guarino 2004)

14
Concept-oriented vs context-sensitive
representation
  • The traditional 'standardisation-oriented' and
    'concept-centred' approach, where (ideally) only
    one term is assigned to a concept, has proved to
    fail in cross-lingual conceptualizations
  • 'Termino-ontographers' need an intermediate
    structure of the domain, to distinguish
    language-independent concepts and relations from
    concepts and relations which are not (Kerremans
    and Temmerman, 2004)

15
The importance of context
  • Term extraction, term definition and inter-term
    relation identification need to be anchored in
    the contexts of use
  • In law, legislative definitions are contexts
    which have prescriptive force. This fact
    influences the determination of the number of
    senses of terms, and the setting of equivalences
    between legal concepts and lexical concepts

16
Lightweight ontologies
  • Lexicons are considered lightweight ontologies,
    linguistic expansions of the description of a way
    of perceiving reality, with limited formal
    modelling.
  • It is possible that a lexicon with a semantic
    hierarchy might serve as the basis for a useful
    ontology, and that an ontology may serve as a
    grounding for a lexicon. This is particularly the
    case in technical domains, in which vocabulary
    and ontology are more closely tied than in more
    general domains (Hirst 2003).

17
The proposed approach
  • Define a shareable conceptual model based on a
    semantic structure (classes of concepts and of
    semantically constrained relations).
  • Concepts in the model are lexicalized by a
    multilingual lexicon which provides a source of
    legal semantic metadata (e.g. the Lois
    database),
  • locally and dynamically incremented,
  • integrated with existing resources,
  • to support a semantically structured 'Google for
    Law'.

18
The Lois project
  • The Lois project (EDC 22161) aims at developing
    a multi-language legal thesaurus based on WordNet
    and EuroWordNet technology
  • WordNet lexicons belong to the class of
    computational lexicons that aim at making word
    content machine-understandable via a highly
    structured semantic representation of concepts.
    Concepts are represented by synsets: sets of all
    the terms expressing the same conceptual area,
    linked by a semantic relation of meaning
    equivalence. A synset is a set of one or more
    uninflected word forms (lemmas) with the same
    part of speech that can be interchanged in a
    certain context.
  • Cross-lingual equivalence relations are made
    explicit in the so-called Inter-Lingual-Index
    (ILI). The ILI is the superset of all concepts
    from all wordnets, and the concepts from the
    language-specific wordnets are linked to one or
    more ILI records by means of equivalence
    relations.
  • The ILI is an unordered list of concepts, i.e. it
    does not have any internal structure. The
    reason is that each language is assumed to impose
    its own language-specific structural constraints
    on the concepts. Therefore, any ordering of ILI
    concepts needs to be retrieved from knowledge
    bases that link into the ILI (or from
    ontological classification).
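A minimal sketch of how synsets and ILI links could be represented, assuming a flat list of equivalence links. The synset identifiers, the ILI record name and the near-synonym link for 'employee' are hypothetical and are not taken from the Lois database.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Synset:
    """A set of interchangeable lemmas sharing one part of speech."""
    synset_id: str
    language: str
    pos: str
    lemmas: Tuple[str, ...]


# Hypothetical synsets, for illustration only.
SYNSETS = [
    Synset("it-001", "it", "n", ("lavoratore",)),
    Synset("en-001", "en", "n", ("worker",)),
    Synset("en-002", "en", "n", ("employee",)),
]

# Equivalence links: (synset id, relation, ILI record).
# The ILI side is just an unordered collection of record identifiers.
EQ_LINKS = [
    ("it-001", "eq_synonym", "ILI-worker_1"),
    ("en-001", "eq_synonym", "ILI-worker_1"),
    ("en-002", "eq_near_synonym", "ILI-worker_1"),
]


def cross_lingual(synset_id, target_lang):
    """Find target-language synsets sharing an ILI record with `synset_id`."""
    by_id = {s.synset_id: s for s in SYNSETS}
    ili_records = {ili for sid, _, ili in EQ_LINKS if sid == synset_id}
    matches = []
    for sid, relation, ili in EQ_LINKS:
        target = by_id[sid]
        if ili in ili_records and target.language == target_lang:
            matches.append((target.lemmas, relation))
    return matches
```

Because the ILI records have no structure of their own in this sketch, any hierarchy has to come from the individual wordnets or from an external ontology, as the slide states.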

19
Lois Architecture

20
Multiple Levels in Legal Language
  • Diagram labels: Philosophy of Law, Lexical Data
    Base, Judges' discourse, EU-National Legal
    Concept, Legislator's language

21
National legal WNs
  • The Lexical Data Base conceptualizes general
    language entities pertaining to legal theory and
    legal dogmatics (structured according to the EWN
    methodology).
  • The Legislative Data Base (EU-National Legal
    Concept) is populated by concepts defined in
    European and national legislation.

22
LEXDB: Lexical DB
  • Lexical legal concepts: 1944 ILI records.
  • First nucleus translated from the Italian
    JurWN: 739 synsets
  • New concepts selected by legal experts provided
    by the Universities of Vienna, Evora, Prague and
    Sheffield.

23
The Lexical Data Base
  • Synsets are linked by:
  • internal relations: lexical relations (synonym,
    antonym, near-synonym) and semantic relations
    (hypernym, hyponym, role, instance, etc.)
  • cross-lingual relations: eq_synonym,
    eq_near_synonym, eq_has_hyperonym,
    eq_has_hyponym, etc.

24
EULX: EU lexical concepts
  • Terminology automatically extracted from EU texts
    in which it is not explicitly defined
  • The selection has been automated by analysing the
    English EU directives, extracting salient
    terms, mapping them to WordNet and selecting
    only the ones with one legal meaning in WordNet
  • This process has produced an automatic import of
    terms with gloss, and a plug-in synonym relation
    into WordNet
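The selection step described on this slide can be sketched as a simple filter. This is only an illustration: the toy sense inventory stands in for WordNet, the glosses and the `is_legal` flags are invented, and the slides do not say how a sense is recognised as legal.

```python
# Hypothetical toy sense inventory standing in for WordNet:
# term -> list of (gloss, is_legal_sense) pairs.
SENSE_INVENTORY = {
    "consumer": [("a person who uses goods or services", True)],
    "order": [
        ("a command given by a superior", True),
        ("a legally binding command issued by a court", True),
        ("logical or comprehensible arrangement", False),
    ],
    "vessel": [("a craft designed for water transportation", False)],
}


def select_monosemous_legal_terms(salient_terms):
    """Keep terms found in the inventory with exactly one legal sense."""
    selected = {}
    for term in salient_terms:
        senses = SENSE_INVENTORY.get(term, [])
        legal_glosses = [gloss for gloss, is_legal in senses if is_legal]
        if len(legal_glosses) == 1:
            # Import the term together with the gloss of its single legal sense.
            selected[term] = legal_glosses[0]
    return selected
```

Terms with several legal senses (such as 'order' in the toy data) are left out by this filter, so they would still need manual treatment.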

25
EULG: EU legal concepts
  • Concepts from EU directives with an explicit
    definition, obtained by a process of
    semi-automatic alignment of the EU directives in
    the different languages: 2332 ILI records

26
NATLG: National legal concepts
  • Concepts defined in national legislation within
    the domain of consumer law, or implementing the
    EU legislation in that domain, automatically
    or manually extracted.

27
EU-National Legal Concept
  • Diagram labels: EU-National legal document,
    National Legislation, ID
  • Celex definitions: about 2478 concepts
  • Implemented_as: National Legal Concepts
  • eq_synonym, eq_near_synonym, has_hyperonym

28
Kinds of equivalence in Lois
  • 1. Between lexical concepts
  • near-equivalence
  • hypo/hyper-equivalence
  • functional equivalence
  • 2. Between legal concepts
  • legal equivalence

32
Consumer Protection Law: structuring the domain
(I)
  • Lexical Def. ILI GLOSS - worker_1: a person who
    works at a specific occupation.
  • EU Def.s
  • 8. 2005-02-02 worker_2: any person who, in the
    Member State concerned, is protected as an
    employee under national employment law and in
    accordance with national practice
  • 23. 2005-02-02 worker_3: any person carrying out
    an occupation on board a vessel, including
    trainees and apprentices, but excluding port
    pilots and shore personnel carrying out work on
    board a vessel at the quayside
  • 22. 2005-02-02 worker_4: any person employed by
    an employer, including trainees and apprentices
    but excluding domestic servants
  • 21. 2005-02-02 worker_5: any worker as defined in
    Article 3 (a) of Directive 89/391/EEC who
    habitually uses display screen equipment as a
    significant part of his normal work.
  • Relations shown in the diagram: Has_hyper

33
Consumer Protection Law: structuring the domain
(II)
  • EU concept: device
  • National concept: medical device
  • National concept: active implantable medical
    device
  • Relations shown in the diagram: Implemented-as,
    Near-syn, Has_hyper

34
Consumer Protection Law: structuring the domain
(III)
  • Core Ontology: physical object, social object
  • EU concept: device
  • National concept: medical device
  • National concept: active implantable medical
    device
  • Relations shown in the diagram: Has_hyper

35
The Core Legal Ontology as ordering principle
  • Creating ILI records from WordNet high-level
    concepts.
  • Creating ILI records from the upper concepts of
    the IT-LEXDB linked to the Core Legal Ontology
    (together with their CLO links), used as
    hypernyms in local hierarchies.
  • Linking WordNet high-level concepts to CLO
    categories.
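To illustrate how core ontology classes can act as the top of local hierarchies, the sketch below climbs Has_hyper links until it reaches a core class. The device chain follows the example on the preceding slides; placing 'device' under 'physical object' is an assumption made here for illustration, and the name `core_class` is hypothetical.

```python
# Has_hyper links, child -> parent. The last link is an assumption.
HAS_HYPER = {
    "active implantable medical device": "medical device",
    "medical device": "device",
    "device": "physical object",
}

# Top-level classes named on the preceding slide.
CORE_ONTOLOGY_CLASSES = {"physical object", "social object"}


def core_class(concept):
    """Climb Has_hyper links until a core ontology class is reached."""
    seen = set()
    while concept not in CORE_ONTOLOGY_CLASSES:
        if concept in seen or concept not in HAS_HYPER:
            return None  # cycle or dangling concept: no core class found
        seen.add(concept)
        concept = HAS_HYPER[concept]
    return concept
```

A concept that returns `None` here is one whose local hierarchy has not yet been anchored to the core ontology.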

36
Why do we need a core legal ontology as ordering
principle?
  • Disadvantages:
  • manually performed
  • limited improvement of searching capabilities
  • Advantages:
  • aid in harmonizing lexical concepts proposed by
    national legal experts and existing lexical
    resources
  • added value: future use of the lexical resources
    in semantic tagging, information extraction,
    ontology building and knowledge-based systems.

37
DOLCE+D&S and the Core Legal Ontology (CLO)
  • DOLCE (a Descriptive Ontology for Linguistic and
    Cognitive Engineering) is a foundational ontology
    (FO) originally developed in the EU WonderWeb
    project
  • DOLCE, extended by means of the Descriptions
    and Situations (D&S) ontology, is suited to
    conceptualize domains (such as law) that are
    mainly constituted by non-physical (mental,
    social) objects.
  • A Description in DOLCE+D&S is a social object,
    which represents a conceptualization.
    Unlike physical objects, social objects
    are dependent on some agentive physical object
    that is able to conceive them. Descriptions have
    typical components, called concepts. A concept is
    also a social object, which is defined by a
    description and can be used in other
    descriptions. Figures, or social individuals
    (either agentive or not), are other social
    objects, defined by descriptions. Typical
    agentive figures are societies, organizations,
    and in general all socially constructed persons
    (Gangemi et al. 2005).

38
Dolce

39
CLO Categories
In CLO a norm is a Legal Description which has
components such as a Task (the set of actions
the norm aims to regulate), legal roles (played
by the legal subjects involved) and parameters,
such as temporal and physical locations. Legal
Descriptions are satisfied by Situations
(Fattispecie) composed of entities pertaining to
the real world (Legal Subjects such as Persons,
Bodies, etc.) and of Behaviours performed by them.
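The relation between a Legal Description and the Situations that satisfy it can be sketched with two small data classes. This is a deliberately naive reading, not the formal CLO semantics: the task is reduced to a single string, and the class names, field names and the `satisfies` rule are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class LegalDescription:
    """A norm: a task, the legal roles involved, and parameters."""
    task: str
    roles: Set[str]
    parameters: Dict[str, str] = field(default_factory=dict)


@dataclass
class Situation:
    """A concrete state of affairs (fattispecie)."""
    behaviour: str
    role_players: Dict[str, str]            # role -> legal subject playing it
    settings: Dict[str, str] = field(default_factory=dict)


def satisfies(situation: Situation, norm: LegalDescription) -> bool:
    """Naive check: same behaviour, every role filled, parameters matched."""
    return (
        situation.behaviour == norm.task
        and norm.roles <= set(situation.role_players)
        and all(situation.settings.get(key) == value
                for key, value in norm.parameters.items())
    )
```

For example, a norm regulating a sale with the roles `seller` and `consumer` would be satisfied by a situation in which both roles are played by some legal subject, and not by one in which the consumer role is left unfilled.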

40
'Translating' legal concepts
  • The Italian term contratto is, in terms of CLO
    concepts, a legal description, an information
    content and a physical object (the material
    support of the information content).
  • A legal institution, for instance the Prime
    Minister, is a figure, created by norms, but it
    is also a social role.

41
Comparing WordNet High-level and CLO classes
  • Example: Artificial Person
  • WN hypernym chain: Artificial Person → Person →
    Being 2 → Living thing 1 → Object 1 →
    Physical entity 1 → Entity
  • CLO hypernym chain: Artificial Person → Social
    Figure → Social concept → Non-Physical Object →
    Endurant → Entity

42
Comparing WordNet High-level and CLO classes
  • Example: Lease
  • WN hypernym chain: Lease → contract → written
    agreement → agreement 1 → statement 1 →
    message 2 → communication 2 → abstraction
    (abstract entity)
  • CLO hypernym chain: Lease → contract → social
    description → social concept → non-physical
    Endurant → Endurant → Entity

43
Comparing WordNet High-level and CLO classes
  • Example: Consumer
  • WN hypernym chain: Consumer → User 1 → Person →
    Being 2 → Living thing 1 → Object 1 →
    Physical entity 1 → Entity
  • CLO hypernym chain: Consumer → Social Role →
    Social Concept → Non-Physical Endurant →
    Endurant → Entity

44
Conclusions (I): the importance of semantic
metadata
  • Structural documentary standards (Legal XML) must
    be integrated with semantic ones for the
    description of content, to achieve a high level
    of semantic interoperability between sectors, in
    order to:
  • improve communication between areas and services
    of the Public Administration
  • make it possible for the user to access
    information and to make that information
    available for further use by other sections of
    the Public Administration
  • develop easy-to-access tools to incorporate and
    organize the data the users themselves are asked
    to supply.

45
Conclusions (II): the role of the semantic lexicon
  • A semantic lexicon for law should be a source of
    semantic metadata, shared by multilingual and
    multinational legal information systems.
  • It needs to be based on a common conceptual model
    of legal and world knowledge
  • The Lois project aims at defining a methodology
    to achieve this goal.

46
Conclusions (III): lessons learned
  • One of the main methodological points to be faced
    is the harmonization between:
  • lexical and legislative concepts,
  • linguistic and ontological levels,
  • domain and world entities,
  • and the integration between:
  • new and existing resources,
  • manual and semi-automatic procedures.

47
References
  • Breuker, J. and Hoekstra, R. (2004), Epistemology
    and ontology in core ontologies exemplified by
    two core ontologies for law: FOLaw and LRI-Core.
    In Coront-Wes Ekaw 2004.
  • Gangemi, A., Sagri, M.-T., Tiscornia, D.
    (2005), A Constructive Framework for Legal
    Ontologies. In Law and the Semantic Web
    (Benjamins, Casanovas, Breuker and Gangemi eds.),
    Springer Verlag, 2005.
  • Gangemi, A., Guarino, N., Masolo, C., Oltramari,
    A., Schneider, L. (2002), Sweetening Ontologies
    with DOLCE. In Proceedings of EKAW 2002.
  • Hirst, G. (2004), Ontology and the Lexicon. In
    Staab and Studer (eds.), Handbook on Ontologies,
    Springer, 2004.
  • Kerremans, K. and Temmerman, R. (2004), Towards
    Multilingual, Termontological Support in Ontology
    Engineering. In Proceedings of Termino 2004,
    workshop on Terminology.
  • Peters, W., Sagri, M. T., Tiscornia, D., The
    Structuring of Legal Knowledge in LOIS. In
    Artificial Intelligence and Law Journal,
    forthcoming.
  • Vossen, P., Peters, W. and Díez-Orzas, P. (1997),
    The Multilingual design of the EuroWordNet
    Database. In Mahesh, K. (ed.), Ontologies and
    multilingual NLP, Proceedings of IJCAI-97
    workshop, Nagoya, Japan, August 23-29.