Title: Detection of Relations in Textual Documents
1Detection of Relations in Textual Documents
- Manuela Kunze,
- Dietmar Rösner
University of Magdeburg, Knowledge Based Systems
and Document Processing
2Introduction
http://en.wikipedia.org/wiki/Unsupervised_learning
3Introduction
- to extract information from text, you can use
techniques like simple pattern matching etc.
- additional knowledge is required
- 'Thursday': a day of the week
- meaning of
- (implicit) 'open' vs. 'close'
- 'Pay-what-you-wish'
- text understanding / techniques of NLP
- 'Exhibition of over 30 color photographs and
stories of life in China's Yunnan Province'
4Introduction
- ontologies contain information about
- definition/description of concepts and
- description of instances
- kind of relation (name, type),
- definition of domain and range values,
- characteristics of the relation: cardinality,
transitivity, ...
5Natural Language Processing
- NLP techniques
- case frame analysis
- exploiting syntactic structures
- corpus-based IE for an initial ontology
- corpus
- autopsy protocols (400 protocols)
- different document parts
- findings
- histological findings
- background
- discussion
- short linguistic structures
- typical attribute-value structures
6Overview
- Case Frame
- Analysis of Specific Syntactic Structures
- Discussion/Conclusion
7Case Frames
- resources
- results from syntactic parser
    <NP TYPE="COMPLEX" RULE="NPC3" GEN="MAS" NUM="SG" CAS="NOM">
      <NP TYPE="FULL" RULE="NP1" CAS="NOM" NUM="SG" GEN="MAS">
        <N>Flachschnitt</N>
      </NP>
      <PP RULE="PP1" CAS="AKK">
        <PRP CAS="AKK">in</PRP>
        <NP TYPE="FULL" RULE="NP2" CAS="AKK" NUM="SG" GEN="NTR">
          <DETD>das</DETD>
          <N>Zungengewebe</N>
        </NP>
      </PP>
    </NP>
- results from semantic tagger
- description of case frames
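The parser output above can be read with a few lines of code. The following is a minimal sketch, assuming the parser emits well-formed XML with the element and attribute names shown on the slide; the function name `read_complex_np` is hypothetical and not part of the authors' system.

```python
import xml.etree.ElementTree as ET

# Parser output for "Flachschnitt in das Zungengewebe", as shown on the slide.
PARSE = """
<NP TYPE="COMPLEX" RULE="NPC3" GEN="MAS" NUM="SG" CAS="NOM">
  <NP TYPE="FULL" RULE="NP1" CAS="NOM" NUM="SG" GEN="MAS"><N>Flachschnitt</N></NP>
  <PP RULE="PP1" CAS="AKK">
    <PRP CAS="AKK">in</PRP>
    <NP TYPE="FULL" RULE="NP2" CAS="AKK" NUM="SG" GEN="NTR">
      <DETD>das</DETD><N>Zungengewebe</N>
    </NP>
  </PP>
</NP>
"""

def read_complex_np(xml_text):
    """Return the head noun and the PP attachments of a complex NP."""
    root = ET.fromstring(xml_text.strip())
    head = root.find("./NP/N").text          # noun of the directly embedded NP
    attachments = []
    for pp in root.findall("./PP"):
        attachments.append({
            "preposition": pp.find("PRP").text,
            "case": pp.get("CAS"),
            "noun": pp.find("./NP/N").text,
        })
    return head, attachments

head, attachments = read_complex_np(PARSE)
print(head, attachments)
```

The head noun selects the case frame to apply, and each attachment is a candidate filler for one of its roles.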
8Case Frames
- (corpus-based) definition of roles for a concept
- 'Flachschnitt' (flat cut)
  - role 'location'
    - sem. category: 'tissue'
    - form: PP, case of NP: accusative, preposition: 'in'
- 'Herausschleudern' (skidding)
  - role 'patient'
    - sem. category: 'body-hum'
    - form: NP, case of NP: genitive
  - role 'location'
    - sem. category: 'vehicle'
    - form: PP, case of NP: dative, preposition: 'aus'
9Case Frames
    <CONCEPT TYPE="medicalOperation">
      <WORD>Flachschnitt</WORD>
      <DESC>medizinischer Schnitt</DESC>
      <SLOTS>
        <RELATION TYPE="LOCATION">
          <ASSIGN_TO>TISSUE</ASSIGN_TO>
          <FORM>P(akk, fak, in)</FORM>
          <CONTENT>in das Zungengewebe</CONTENT>
        </RELATION>
      </SLOTS>
    </CONCEPT>

    <CONCEPT TYPE="traffic-event">
      <WORD>Herausschleudern</WORD>
      <DESC>event</DESC>
      <SLOTS>
        <RELATION TYPE="PATIENT">
          <ASSIGN_TO>BODY-HUM</ASSIGN_TO>
          <FORM>N(gen, fak)</FORM>
          <CONTENT>des Koerpers</CONTENT>
        </RELATION>
        <RELATION TYPE="LOCATION">
          <ASSIGN_TO>VEHICLE</ASSIGN_TO>
          <FORM>P(dat, fak, aus)</FORM>
          <CONTENT></CONTENT>
        </RELATION>
      </SLOTS>
    </CONCEPT>
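To illustrate how such a case frame might be applied, here is a small sketch that loads the slots of the 'Flachschnitt' frame and matches a PP attachment against them. It rests on several assumptions: the first and last arguments of `P(...)` are read as case and preposition, the middle argument (`fak`) is passed through uninterpreted, and `SEM_CATEGORY` is a hypothetical stand-in for the semantic tagger. None of the function names come from the authors' system.

```python
import re
import xml.etree.ElementTree as ET

CASE_FRAME = """
<CONCEPT TYPE="medicalOperation">
  <WORD>Flachschnitt</WORD>
  <SLOTS>
    <RELATION TYPE="LOCATION">
      <ASSIGN_TO>TISSUE</ASSIGN_TO>
      <FORM>P(akk, fak, in)</FORM>
    </RELATION>
  </SLOTS>
</CONCEPT>
"""

# Hypothetical stand-in for the semantic tagger: token -> semantic category.
SEM_CATEGORY = {"Zungengewebe": "TISSUE"}

def load_slots(xml_text):
    """Read each RELATION into a dict; FORM is split into kind and arguments."""
    slots = []
    for rel in ET.fromstring(xml_text.strip()).iter("RELATION"):
        form = rel.findtext("FORM").strip()
        kind, args = re.fullmatch(r"(\w)\((.*)\)", form).groups()
        slots.append({
            "relation": rel.get("TYPE"),
            "category": rel.findtext("ASSIGN_TO"),
            "kind": kind,                                  # "P" = PP, "N" = NP
            "args": [a.strip() for a in args.split(",")],
        })
    return slots

def fill_slots(slots, attachments):
    """Pair each PP attachment with the first slot whose constraints it meets."""
    filled = []
    for att in attachments:
        for slot in slots:
            if slot["kind"] != "P":
                continue
            case, _, prep = slot["args"]   # middle argument is not interpreted here
            if (att["case"].lower() == case
                    and att["preposition"] == prep
                    and SEM_CATEGORY.get(att["noun"]) == slot["category"]):
                filled.append((slot["relation"], att["noun"]))
                break
    return filled

attachments = [{"preposition": "in", "case": "AKK", "noun": "Zungengewebe"}]
result = fill_slots(load_slots(CASE_FRAME), attachments)
print(result)
```

A slot is filled only when syntactic form and semantic category both agree, which is what lets the case frame name the relation (here LOCATION) rather than leave it generic.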
10Case Frames
- coverage of phrases like 'fracture of elbow joint'?
- abstraction
  - 'fracture' (sem. category: 'trauma')
    - role 'patient': sem. category 'bone'
  - 'bruise' (sem. category: 'trauma')
    - role 'patient': sem. category 'organ'
  - 'hematoma' (sem. category: 'trauma')
    - role 'patient': sem. category 'tissue'
  - concept x (sem. category: 'trauma')
    - role 'patient': sem. category 'body-part'
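The abstraction step above, from 'bone', 'organ' and 'tissue' to 'body-part', can be sketched as a search for the most specific category that subsumes all observed fillers. The taxonomy fragment below is hypothetical and only mirrors the categories named on the slide; the authors' actual hierarchy is not given in the source.

```python
# Hypothetical taxonomy fragment: child -> parent. Not taken from the source ontology.
HYPERNYM = {
    "bone": "body-part",
    "organ": "body-part",
    "tissue": "body-part",
    "body-part": "physical-object",
}

def ancestors(category):
    """Return the category followed by all of its hypernyms, nearest first."""
    chain = [category]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain

def least_general_hypernym(categories):
    """Most specific category that subsumes every category in the list."""
    categories = list(categories)
    common = set(ancestors(categories[0]))
    for cat in categories[1:]:
        common &= set(ancestors(cat))
    for candidate in ancestors(categories[0]):   # nearest first
        if candidate in common:
            return candidate
    return None

# Observed 'patient' restrictions for fracture, bruise and hematoma.
print(least_general_hypernym(["bone", "organ", "tissue"]))  # body-part
```

Choosing the nearest common ancestor rather than the root keeps the restriction for the abstract trauma concept as tight as the corpus evidence allows.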
11Case Frames
- results
- relations are defined by the case frame
- name/type of relation
- domain, range
- corpus-based abstractions
- redefinition of semantic restriction
- use the least general hypernym as semantic
restriction
- not yet extracted
- information about the characteristic of a relation
12Overview
- Case Frame
- Analysis of Specific Syntactic Structures
- Discussion/Conclusion
13Analysis of Specific Syntactic Structures
- from general to specific information
- resources
- results from syntactic parser
- results from semantic tagger
- description of interpretation of syntactic
structures
- Which word class can be interpreted as
concept/instance?
- Which word class describes a relation?
- adjective in an NP describes the noun in the NP →
relation 'prop'
- negations negate concepts, verbs, or properties
of a concept
- particle: modification of adjectives
14Analysis of Specific Syntactic Structures
CLMed → N ADJ
N interpreted as concept
ADJ interpreted as concept
prop(N, ADJ)
result: prop_catadj(N, ADJ)
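The rule on this slide can be written down in a few lines. This is a sketch under assumptions: `SEM_CATEGORY` is a hypothetical lexicon standing in for the semantic tagger, and adjectives without a known category fall back to 'unspecified', which is the name the OWL example for "kaum wahrnehmbare Unterblutungen" uses for an unknown token.

```python
# Hypothetical semantic-tagger lexicon: lemma -> semantic category.
SEM_CATEGORY = {"lebergewebe": "tissue", "blutleer": "blood-concentration"}

def interpret_n_adj(noun, adjective):
    """Apply N ADJ -> prop(N, ADJ) and specialise 'prop' by the adjective's category."""
    category = SEM_CATEGORY.get(adjective.lower(), "unspecified")
    return (f"prop_{category}", noun, adjective)

print(interpret_n_adj("Lebergewebe", "blutleer"))
print(interpret_n_adj("Lebergewebe", "wahrnehmbar"))
```

The relation name carries the semantic category of the adjective, so the generic 'prop' becomes more specific exactly when the tagger knows the adjective.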
15Analysis of Specific Syntactic Structures
Steps
- nouns and adjectives are interpreted as
concept/instance
[Diagram: nodes liver_tissue, tissue, liver tissue, bloodless, blood concentration, bloodless; legend: concept, instance, relation]
- adjectives describe a relation
- in general 'prop'
16Analysis of Specific Syntactic Structures
    <owl:Class rdf:ID="lebergewebe">
      <rdfs:subClassOf><owl:Class rdf:ID="tissue"/></rdfs:subClassOf>
    </owl:Class>
    <owl:Class rdf:ID="blood-concentration"/>
    <owl:Class rdf:ID="blutleer">
      <rdfs:subClassOf rdf:resource="blood-concentration"/>
    </owl:Class>
    <owl:ObjectProperty rdf:ID="prop_blood-concentration">
      <rdfs:domain rdf:resource="tissue"/>
      <rdfs:range rdf:resource="blood-concentration"/>
    </owl:ObjectProperty>
    <lebergewebe rdf:ID="Lebergewebe_6">
      <prop_blood-concentration><blutleer rdf:ID="blutleer_7"/></prop_blood-concentration>
    </lebergewebe>
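Class and property definitions of this kind can be generated programmatically. The sketch below uses Python's standard `xml.etree.ElementTree` with the usual OWL, RDF and RDFS namespace URIs; the helper names `owl_class` and `owl_property` are hypothetical. Unlike the slide, it writes `rdf:resource` values with a leading `#`, which is an assumption about the intended local references.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
for prefix, uri in (("owl", OWL), ("rdf", RDF), ("rdfs", RDFS)):
    ET.register_namespace(prefix, uri)

def owl_class(parent, name, superclass=None):
    """Append an owl:Class, optionally with an rdfs:subClassOf reference."""
    cls = ET.SubElement(parent, f"{{{OWL}}}Class", {f"{{{RDF}}}ID": name})
    if superclass:
        ET.SubElement(cls, f"{{{RDFS}}}subClassOf",
                      {f"{{{RDF}}}resource": "#" + superclass})
    return cls

def owl_property(parent, name, domain, range_):
    """Append an owl:ObjectProperty with domain and range."""
    prop = ET.SubElement(parent, f"{{{OWL}}}ObjectProperty", {f"{{{RDF}}}ID": name})
    ET.SubElement(prop, f"{{{RDFS}}}domain", {f"{{{RDF}}}resource": "#" + domain})
    ET.SubElement(prop, f"{{{RDFS}}}range", {f"{{{RDF}}}resource": "#" + range_})
    return prop

root = ET.Element(f"{{{RDF}}}RDF")
owl_class(root, "tissue")
owl_class(root, "lebergewebe", superclass="tissue")
owl_class(root, "blood-concentration")
owl_class(root, "blutleer", superclass="blood-concentration")
owl_property(root, "prop_blood-concentration", "tissue", "blood-concentration")

owl_text = ET.tostring(root, encoding="unicode")
print(owl_text)
```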
17Analysis of Specific Syntactic Structures
"kaum wahrnehmbare Unterblutungen" (Engl. "hardly
detectable hematomas")
results of syntactic parser:
    <NP TYPE="FULL" RULE="NP4" CAS="_" NUM="PL" GEN="FEM">
      <ADJP RULE="ADJP1">
        <ADV>kaum</ADV>
        <ADJ>wahrnehmbare</ADJ>
      </ADJP>
      <N>Unterblutungen</N>
    </NP>
- resources for interpretation
- N: concept/instance
- ADJ
  - concept/instance
  - rel: prop
- ADV
  - concept/instance
  - rel: mod
- results of semantic tagger
- 'kaum': weak-graduation
- 'wahrnehmbar': unknown token
- 'Unterblutung': trauma
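Putting the parser output, the word-class rules and the tagger results together might look as follows. This is only a sketch: the lexicon is keyed on surface forms for brevity (the tagger result on the slide is given for the lemma 'Unterblutung'), unknown tokens map to 'unspecified', and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

PARSE = """
<NP TYPE="FULL" RULE="NP4" CAS="_" NUM="PL" GEN="FEM">
  <ADJP RULE="ADJP1"><ADV>kaum</ADV><ADJ>wahrnehmbare</ADJ></ADJP>
  <N>Unterblutungen</N>
</NP>
"""

# Semantic tagger results from the slide; tokens missing here count as unknown.
SEM_CATEGORY = {"kaum": "weak-graduation", "Unterblutungen": "trauma"}

def category(token):
    """Semantic category of a token, or 'unspecified' for unknown tokens."""
    return SEM_CATEGORY.get(token, "unspecified")

def interpret_np(xml_text):
    """Return (relation, subject, object) triples for an NP with an ADJP."""
    np = ET.fromstring(xml_text.strip())
    noun = np.findtext("N")
    triples = []
    for adjp in np.findall("ADJP"):
        adj = adjp.findtext("ADJ")
        # ADJ describes the noun: relation 'prop', specialised by the ADJ category.
        triples.append((f"prop_{category(adj)}", noun, adj))
        for adv in adjp.findall("ADV"):
            # ADV modifies the adjective: relation 'mod', specialised by the ADV category.
            triples.append((f"mod_{category(adv.text)}", adj, adv.text))
    return triples

triples = interpret_np(PARSE)
print(triples)
```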
18Analysis of Specific Syntactic Structures
- hardly detectable hematomas
Steps
- nouns, adjectives and adverbs are interpreted as
concept/instance
- adjectives and adverbs describe relations
[Diagram: nodes hematoma, trauma, hematoma, detectable, unspecified, hardly, hardly, weak-graduation; legend: concept, instance, relation]
19Analysis of Specific Syntactic Structures
- hardly detectable hematomas
    <owl:Class rdf:ID="unterblutung">
      <rdfs:subClassOf rdf:resource="trauma"/>
    </owl:Class>
    <owl:Class rdf:ID="trauma"/>
    <owl:Class rdf:ID="wahrnehmbar">
      <rdfs:subClassOf rdf:resource="unspecified"/>
    </owl:Class>
    <owl:Class rdf:ID="unspecified"/>
    <owl:Class rdf:ID="kaum">
      <rdfs:subClassOf rdf:resource="weak-graduation"/>
    </owl:Class>
    <owl:Class rdf:ID="weak-graduation"/>
20Analysis of Specific Syntactic Structures
- hardly detectable hematomas
    <owl:ObjectProperty rdf:ID="mod_weak-graduation">
      <rdfs:domain rdf:resource="unspecified"/>
      <rdfs:range rdf:resource="weak-graduation"/>
    </owl:ObjectProperty>
    <owl:ObjectProperty rdf:ID="prop_unspecified">
      <rdfs:domain rdf:resource="trauma"/>
      <rdfs:range rdf:resource="unspecified"/>
    </owl:ObjectProperty>
    <unterblutung rdf:ID="Unterblutungen_5">
      <prop_unspecified rdf:resource="wahrnehmbare_4"/>
    </unterblutung>
    <wahrnehmbar rdf:ID="wahrnehmbare_4">
      <mod_weak-graduation rdf:resource="kaum_3"/>
    </wahrnehmbar>
    <kaum rdf:ID="kaum_3"></kaum>
21Analysis of Specific Syntactic Structures
- Phrases like
- NP → NP NP
- NP → N Adj Conj Adj
- NP → N Conj N Adj
[Diagram with legend: concept, instance, relation; visualized with the Protégé plugin OntoViz]
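For the coordinated patterns listed above, one possible reading is that every adjective describes every noun in the phrase. The sketch below implements only that reading; the source does not state how scope is resolved, so the distribution over all nouns is an assumption, and the example tokens (including 'pale' and 'spleen') are hypothetical English glosses.

```python
def interpret_coordination(tagged):
    """tagged: list of (token, tag) pairs with tags N, ADJ, CONJ.

    Returns one generic 'prop' triple per noun/adjective pair (assumed scope).
    """
    nouns = [tok for tok, tag in tagged if tag == "N"]
    adjectives = [tok for tok, tag in tagged if tag == "ADJ"]
    return [("prop", noun, adj) for noun in nouns for adj in adjectives]

# NP -> N Adj Conj Adj
one_noun = interpret_coordination(
    [("tissue", "N"), ("pale", "ADJ"), ("and", "CONJ"), ("bloodless", "ADJ")]
)
# NP -> N Conj N Adj
two_nouns = interpret_coordination(
    [("liver", "N"), ("and", "CONJ"), ("spleen", "N"), ("bloodless", "ADJ")]
)
print(one_noun)
print(two_nouns)
```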
22Analysis of Specific Syntactic Structures
- results
- definition of concepts/instances
- corpus-based definition/concretion of relations
- prop → prop_catADJ
- information about domain, relation
- not extracted
- information about the characteristic of a
relation
23Overview
- Case Frame
- Analysis of Specific Syntactic Structures
- Discussion/Conclusion
24Conclusion
- NLP techniques for extraction of information
- analyse syntactic structures
- information about semantic categories
- result: corpus-based description of an initial
ontology
- case frame analysis
- relations are described in the case frame
- disadvantage: effort of creating the case frames
- advantage: a definition of the relation
- analysis of specific syntactic structures
- a general interpretation of tokens and the
syntactic structures
- redefined by results from the semantic tagger
- disadvantage: in some cases, only the general
relation definition is delivered
- advantage: less effort to describe the resources
25Conclusion
- no information about the characteristic of a
relation (cardinality, ...)
- solutions
- analyse occurrences in the corpus
- corpus-based assumption about cardinality
- integration of additional knowledge
- initial domain specific ontology
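A corpus-based assumption about cardinality could be derived by counting how many distinct fillers each relation takes per instance. The following sketch shows one way to do that; the instance identifiers `Flachschnitt_1` and `Zungengewebe_2` are hypothetical, and an observed maximum is only a lower bound on the true cardinality, not a proof of it.

```python
from collections import defaultdict

def max_cardinality(triples):
    """Largest number of distinct fillers seen per relation for a single subject."""
    fillers = defaultdict(set)
    for relation, subject, obj in triples:
        fillers[(relation, subject)].add(obj)
    observed = defaultdict(int)
    for (relation, _subject), objs in fillers.items():
        observed[relation] = max(observed[relation], len(objs))
    return dict(observed)

# Extracted (relation, subject instance, filler instance) triples.
triples = [
    ("prop_blood-concentration", "Lebergewebe_6", "blutleer_7"),
    ("LOCATION", "Flachschnitt_1", "Zungengewebe_2"),
    ("LOCATION", "Flachschnitt_1", "Lebergewebe_6"),
]
cardinalities = max_cardinality(triples)
print(cardinalities)
```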
26Key Aspects for IE
- conceptual preprocessing steps
  - Names of concepts occur in different linguistic
    structures: compound vs. complex noun phrase
    (like 'liver tissue' and 'tissue of liver')
  - handle only one canonical linguistic structure as
    a representative for all paraphrases
- treatment of generalisation within local contexts
  - The token 'liver' may occur in the first sentence
    of a paragraph. In the next sentences of the
    paragraph, only the hypernym 'organ' is used.
- concept or instance: which term in a linguistic
  structure has to be interpreted as a concept and
  which as an instance of a concept
- definition of the scope for a concept
  - a paragraph starts with a description of an organ
    (e.g. organ 'liver' in 'The liver shows ... .
    Bloodrichness of the tissue.'), after this
    follows a description of parts of the organ
    (e.g., 'Gewebe', tissue). In such cases, additional
    knowledge about the domain has to be employed
    (for example, about meronyms or holonyms)
  - tissue part-of liver vs. tissue part-of concept X
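The first key aspect, one canonical structure for all paraphrases, can be illustrated with a small normaliser. This is a sketch for English two-word examples only, assuming the paraphrases take the forms "X Y" and "Y of (the) X"; the function name `canonical_concept` is hypothetical, and German compounds would need a compound splitter that is not shown here.

```python
import re

def canonical_concept(phrase):
    """Map 'X Y' and 'Y of (the) X' to the single key 'x_y'."""
    phrase = phrase.lower().strip()
    match = re.fullmatch(r"(\w+) of (?:the )?(\w+)", phrase)
    if match:
        head, modifier = match.groups()
        return f"{modifier}_{head}"
    return "_".join(phrase.split())

print(canonical_concept("liver tissue"))         # liver_tissue
print(canonical_concept("tissue of liver"))      # liver_tissue
print(canonical_concept("Tissue of the liver"))  # liver_tissue
```

With a single key per concept, relations extracted from either paraphrase attach to the same ontology entry.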