Detection of Relations in Textual Documents

1
Detection of Relations in Textual Documents
  • Manuela Kunze,
  • Dietmar Rösner

University of Magdeburg, Knowledge Based Systems
and Document Processing
2
Introduction
http://en.wikipedia.org/wiki/Unsupervised_learning
3
Introduction
  • to extract information from text, you can use
    techniques like simple pattern matching etc.
  • additional knowledge is required
  • 'Thursday': a day of the week
  • meaning of
  • (implicit) 'open' vs. 'close'
  • 'Pay-what-you-wish'
  • text understanding / techniques of NLP
  • 'Exhibition of over 30 color photographs and
    stories of life in China's Yunnan Province'
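A minimal sketch of the point above: a plain word pattern finds candidate tokens, but it takes additional knowledge (here a small gazetteer) to know that a token such as 'Thursday' names a day of the week. The gazetteer, the function name, and the sample sentence are assumptions made for illustration, not material from the slides.

```python
import re

# Hypothetical gazetteer: the "additional knowledge" that a token names a weekday.
WEEKDAYS = {"monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"}

def find_weekdays(text):
    """Return weekday tokens found by a simple word pattern plus gazetteer lookup."""
    tokens = re.findall(r"[A-Za-z]+", text)
    return [t for t in tokens if t.lower() in WEEKDAYS]

# Invented sample sentence.
sample = "Pay-what-you-wish on Thursday evenings."
print(find_weekdays(sample))
```

The pattern alone cannot tell what 'Pay-what-you-wish' means or whether the venue is open; that is where text understanding comes in.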

4
Introduction
  • ontologies contain information about
  • definition/description of concepts and
  • description of instances
  • kind of relation (name, type),
  • definition of domain and range values,
  • characteristics of the relation: cardinality,
    transitivity, ...
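The pieces of a relation description listed above (name, domain, range, characteristics) could be held in a small record type. This is only a sketch: the class name `RelationDef`, its field names, and the `accepts` helper are assumptions. The example values are taken from the 'liver tissue bloodless' example later in the talk.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class RelationDef:
    """One relation of an ontology (hypothetical structure)."""
    name: str
    domain: str
    range_: str
    # Characteristics are optional: None means "not known (yet)".
    max_cardinality: Optional[int] = None
    transitive: Optional[bool] = None

    def accepts(self, subject_cat, object_cat):
        """Check that subject and object categories match domain and range exactly."""
        return subject_cat == self.domain and object_cat == self.range_

rel = RelationDef("prop_blood-concentration", "tissue", "blood-concentration")
print(rel.accepts("tissue", "blood-concentration"))
```

Leaving the characteristics as `None` by default mirrors the situation where only name, domain, and range can be extracted from a corpus.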

5
Natural Language Processing
  • NLP techniques
  • case frame analysis
  • exploiting syntactic structures
  • corpus-based IE for an initial ontology
  • corpus
  • autopsy protocols (400 protocols)
  • different document parts
  • findings
  • histological findings
  • background
  • discussion
  • short linguistic structures
  • typical attribute-value structures

6
Overview
  • Case Frame
  • Analysis of Specific Syntactic Structures
  • Discussion/Conclusion

7
Case Frames
  • resources
  • results from syntactic parser
  • <NP TYPE="COMPLEX" RULE="NPC3" GEN="MAS" NUM="SG" CAS="NOM">
      <NP TYPE="FULL" RULE="NP1" CAS="NOM" NUM="SG" GEN="MAS">
        <N>Flachschnitt</N>
      </NP>
      <PP RULE="PP1" CAS="AKK">
        <PRP CAS="AKK">in</PRP>
        <NP TYPE="FULL" RULE="NP2" CAS="AKK" NUM="SG" GEN="NTR">
          <DETD>das</DETD>
          <N>Zungengewebe</N>
        </NP>
      </PP>
    </NP>
  • results from semantic tagger
  • description of case frames
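As a rough illustration of how the parser result for "Flachschnitt in das Zungengewebe" can be consumed, the sketch below reads the XML with Python's standard `xml.etree.ElementTree` and pulls out the head noun and the attached prepositional phrase. The function name `head_and_pps` is an assumption; the XML is the parser output from the slide.

```python
import xml.etree.ElementTree as ET

# Parser output for "Flachschnitt in das Zungengewebe", as shown on the slide.
PARSER_OUTPUT = """
<NP TYPE="COMPLEX" RULE="NPC3" GEN="MAS" NUM="SG" CAS="NOM">
  <NP TYPE="FULL" RULE="NP1" CAS="NOM" NUM="SG" GEN="MAS">
    <N>Flachschnitt</N>
  </NP>
  <PP RULE="PP1" CAS="AKK">
    <PRP CAS="AKK">in</PRP>
    <NP TYPE="FULL" RULE="NP2" CAS="AKK" NUM="SG" GEN="NTR">
      <DETD>das</DETD>
      <N>Zungengewebe</N>
    </NP>
  </PP>
</NP>
"""

def head_and_pps(xml_text):
    """Return the head noun and (preposition, case, noun) for each attached PP."""
    root = ET.fromstring(xml_text.strip())
    head = root.findtext("NP/N")  # noun of the first embedded NP
    pps = [(pp.findtext("PRP"), pp.get("CAS"), pp.findtext("NP/N"))
           for pp in root.findall("PP")]
    return head, pps

print(head_and_pps(PARSER_OUTPUT))
```

The triple (preposition, case, noun) is the information a case frame needs in order to decide which role the PP fills.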

8
Case Frames
  • (corpus-based) definition of roles for a concept
  • 'Flachschnitt' (flat cut)
      • 'location'
          • sem. category: 'tissue'
          • PP, case of NP: accusative, preposition: 'in'
  • 'Herausschleudern' (skidding)
      • 'patient'
          • sem. category: 'body-hum'
          • NP, case of NP: genitive
      • 'location'
          • sem. category: 'vehicle'
          • PP, case of NP: dative, preposition: 'aus'

9
Case Frames
  • <CONCEPT TYPE="medicalOperation">
      <WORD>Flachschnitt</WORD>
      <DESC>medizinischer Schnitt</DESC>
      <SLOTS>
        <RELATION TYPE="LOCATION">
          <ASSIGN_TO>TISSUE</ASSIGN_TO>
          <FORM>P(akk, fak, in)</FORM>
          <CONTENT>in das Zungengewebe</CONTENT>
        </RELATION>
      </SLOTS>
    </CONCEPT>
  • <CONCEPT TYPE="traffic-event">
      <WORD>Herausschleudern</WORD>
      <DESC>event</DESC>
      <SLOTS>
        <RELATION TYPE="PATIENT">
          <ASSIGN_TO>BODY-HUM</ASSIGN_TO>
          <FORM>N(gen, fak)</FORM>
          <CONTENT>des Koerpers</CONTENT>
        </RELATION>
        <RELATION TYPE="LOCATION">
          <ASSIGN_TO>VEHICLE</ASSIGN_TO>
          <FORM>P(dat, fak, aus)</FORM>
          <CONTENT></CONTENT>
        </RELATION>
      </SLOTS>
    </CONCEPT>
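One way to picture how such a case frame is applied: compare each RELATION slot with a candidate phrase and report the roles whose form and semantic category match. The sketch assumes that the first FORM argument is the grammatical case and that the last argument of a `P(...)` form is the preposition. The middle argument (`fak`) is not interpreted here. The function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Case frame for 'Flachschnitt', as shown on the slide.
CASE_FRAME = """
<CONCEPT TYPE="medicalOperation">
  <WORD>Flachschnitt</WORD>
  <DESC>medizinischer Schnitt</DESC>
  <SLOTS>
    <RELATION TYPE="LOCATION">
      <ASSIGN_TO>TISSUE</ASSIGN_TO>
      <FORM>P(akk, fak, in)</FORM>
      <CONTENT>in das Zungengewebe</CONTENT>
    </RELATION>
  </SLOTS>
</CONCEPT>
"""

def parse_form(form):
    """Split a FORM string such as 'P(akk, fak, in)' into phrase type and arguments."""
    match = re.fullmatch(r"\s*(\w+)\((.*)\)\s*", form)
    if match is None:
        raise ValueError(f"unexpected FORM: {form!r}")
    return match.group(1), [arg.strip() for arg in match.group(2).split(",")]

def matching_roles(frame_xml, phrase_type, case, sem_category, prep=None):
    """Return the relation types whose slot description fits the candidate phrase."""
    frame = ET.fromstring(frame_xml.strip())
    roles = []
    for rel in frame.iter("RELATION"):
        kind, args = parse_form(rel.findtext("FORM"))
        if kind != phrase_type or args[0] != case:
            continue
        if kind == "P" and args[-1] != prep:  # assumption: last argument = preposition
            continue
        if rel.findtext("ASSIGN_TO") != sem_category:
            continue
        roles.append(rel.get("TYPE"))
    return roles

print(matching_roles(CASE_FRAME, "P", "akk", "TISSUE", prep="in"))
```

A PP in the wrong case or with a different preposition yields no role, which is the sense in which the case frame defines the relation.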

10
Case Frames
  • coverage of phrases like 'fracture of elbow
    joint'?
  • abstraction
  • 'fracture' (sem. category 'trauma')
      • role 'patient': sem. category 'bone'
  • 'bruise' (sem. category 'trauma')
      • role 'patient': sem. category 'organ'
  • 'hematoma' (sem. category 'trauma')
      • role 'patient': sem. category 'tissue'
  • concept x (sem. category 'trauma')
      • role 'patient': sem. category 'body-part'

11
Case Frames
  • results
  • relations are defined by the case frame
  • name/type of relation
  • domain, range
  • corpus-based abstractions
  • redefinition of semantic restriction
  • use the least general hypernym as semantic
    restriction
  • not yet extracted
  • information about the characteristics of a relation
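The abstraction step can be sketched as a search for the least general common hypernym of the categories observed in the corpus, e.g. 'bone', 'organ', and 'tissue' for the patient role of trauma concepts. The taxonomy below (including the top node 'entity') is a made-up fragment, and the function names are assumptions.

```python
# Hypothetical taxonomy fragment: category -> direct hypernym.
PARENT = {
    "bone": "body-part",
    "organ": "body-part",
    "tissue": "body-part",
    "body-part": "entity",
    "vehicle": "entity",
}

def ancestors(category):
    """Return the category followed by its hypernyms, most specific first."""
    chain = [category]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def least_general_hypernym(categories):
    """Return the most specific category that subsumes all given categories."""
    cats = list(categories)
    common = set(ancestors(cats[0]))
    for cat in cats[1:]:
        common &= set(ancestors(cat))
    for candidate in ancestors(cats[0]):  # walk upwards from the first category
        if candidate in common:
            return candidate
    return None

print(least_general_hypernym(["bone", "organ", "tissue"]))
```

Choosing the least general hypernym keeps the semantic restriction as tight as the corpus evidence allows.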

12
Overview
  • Case Frame
  • Analysis of Specific Syntactic Structures
  • Discussion/Conclusion

13
Analysis of Specific Syntactic Structures
  • from general to specific information
  • resources
  • results from syntactic parser
  • results from semantic tagger
  • description of interpretation of syntactic
    structures
  • Which word class can be interpreted as
    concept/instance?
  • Which word class describes a relation?
  • adjective in a NP describes the noun in the NP →
    relation prop
  • negations negate concepts, verbs, or properties
    of a concept
  • particle: modification of adjectives

14
Analysis of Specific Syntactic Structures
CLMed → N ADJ
N: interpreted as concept; ADJ: interpreted as
concept
prop(N, ADJ)
result: prop_catadj(N, ADJ)
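The refinement from the general `prop(N, ADJ)` to `prop_catadj(N, ADJ)` can be sketched as follows: the semantic category of the adjective is appended to the relation name, the noun's category becomes the domain, and the adjective's category becomes the range. The function name and the dictionary layout are assumptions.

```python
def refine_relation(noun_cat, adj_cat, base="prop"):
    """Build a refined relation definition from the semantic categories of N and ADJ."""
    return {
        "name": f"{base}_{adj_cat}",  # e.g. prop -> prop_blood-concentration
        "domain": noun_cat,
        "range": adj_cat,
    }

# Categories from the 'liver tissue bloodless' example in the talk.
print(refine_relation("tissue", "blood-concentration"))
```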
15
Analysis of Specific Syntactic Structures
  • liver tissue bloodless

Steps
  • nouns and adjectives are interpreted as
    concept/instance

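[Diagram: concept 'tissue' with subconcept 'liver_tissue' and instance 'liver tissue'; concept 'blood concentration' with subconcept 'bloodless' and instance 'bloodless'. Legend: concept, instance, relation]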
  • adjectives describe a relation
  • in general 'prop'

16
Analysis of Specific Syntactic Structures
  • liver tissue bloodless

<owl:Class rdf:ID="lebergewebe">
  <rdfs:subClassOf><owl:Class rdf:ID="tissue"/></rdfs:subClassOf>
</owl:Class>
<owl:Class rdf:ID="blood-concentration"/>
<owl:Class rdf:ID="blutleer">
  <rdfs:subClassOf rdf:resource="blood-concentration"/>
</owl:Class>
<owl:ObjectProperty rdf:ID="prop_blood-concentration">
  <rdfs:domain rdf:resource="tissue"/>
  <rdfs:range rdf:resource="blood-concentration"/>
</owl:ObjectProperty>
<lebergewebe rdf:ID="Lebergewebe_6">
  <prop_blood-concentration><blutleer rdf:ID="blutleer_7"/></prop_blood-concentration>
</lebergewebe>
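Fragments like these could be generated programmatically. The sketch below builds a subclass axiom and an object property with Python's standard `xml.etree.ElementTree`, using the standard OWL, RDF, and RDFS namespace URIs. The helper names are assumptions, and the `#` prefix in `rdf:resource` follows the usual RDF/XML convention for local references, whereas the slide shows the values without it.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
for prefix, uri in (("owl", OWL), ("rdf", RDF), ("rdfs", RDFS)):
    ET.register_namespace(prefix, uri)  # serialise with readable prefixes

def subclass(name, parent):
    """Build <owl:Class rdf:ID=name> with an rdfs:subClassOf reference to parent."""
    cls = ET.Element(f"{{{OWL}}}Class", {f"{{{RDF}}}ID": name})
    ET.SubElement(cls, f"{{{RDFS}}}subClassOf", {f"{{{RDF}}}resource": "#" + parent})
    return cls

def object_property(name, domain, range_):
    """Build <owl:ObjectProperty rdf:ID=name> with rdfs:domain and rdfs:range."""
    prop = ET.Element(f"{{{OWL}}}ObjectProperty", {f"{{{RDF}}}ID": name})
    ET.SubElement(prop, f"{{{RDFS}}}domain", {f"{{{RDF}}}resource": "#" + domain})
    ET.SubElement(prop, f"{{{RDFS}}}range", {f"{{{RDF}}}resource": "#" + range_})
    return prop

class_xml = ET.tostring(subclass("blutleer", "blood-concentration"), encoding="unicode")
prop_xml = ET.tostring(
    object_property("prop_blood-concentration", "tissue", "blood-concentration"),
    encoding="unicode",
)
print(class_xml)
print(prop_xml)
```

For a complete ontology file, these elements would still need to be wrapped in an `rdf:RDF` root element.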
17
Analysis of Specific Syntactic Structures
"kaum wahrnehmbare Unterblutungen" (Engl. "hardly
detectable hematomas")
results of syntactic parser:
<NP TYPE="FULL" RULE="NP4" CAS="_" NUM="PL" GEN="FEM">
  <ADJP RULE="ADJP1">
    <ADV>kaum</ADV>
    <ADJ>wahrnehmbare</ADJ>
  </ADJP>
  <N>Unterblutungen</N>
</NP>
  • resources for interpretation
  • N: concept/instance
  • ADJ
  • concept/instance
  • rel: prop
  • ADV
  • concept/instance
  • rel: mod
  • results of semantic tagger
  • 'kaum': weak-graduation
  • 'wahrnehmbar': unknown token
  • 'Unterblutung': trauma
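Putting the parser result, the interpretation rules, and the tagger results together might look like the sketch below. It treats the noun, adjective, and adverb as concept/instance, derives a `prop` relation from the adjective and a `mod` relation from the adverb, and names each relation after the semantic category of its filler. Tokens unknown to the tagger get the category 'unspecified', following the OWL output shown in the talk. The lookup by surface form (instead of lemma) and the function name are simplifying assumptions.

```python
import xml.etree.ElementTree as ET

# Parser result for "kaum wahrnehmbare Unterblutungen", as shown on the slide.
NP_XML = """
<NP TYPE="FULL" RULE="NP4" CAS="_" NUM="PL" GEN="FEM">
  <ADJP RULE="ADJP1">
    <ADV>kaum</ADV>
    <ADJ>wahrnehmbare</ADJ>
  </ADJP>
  <N>Unterblutungen</N>
</NP>
"""

# Semantic tagger results from the slide, keyed by surface form for simplicity.
# 'wahrnehmbare' is deliberately missing: the tagger reports it as unknown.
SEM = {"kaum": "weak-graduation", "Unterblutungen": "trauma"}

def interpret(np_xml, sem):
    """Return (relation, subject, filler) triples for an NP with adjective phrases."""
    np = ET.fromstring(np_xml.strip())
    noun = np.findtext("N")
    triples = []
    for adjp in np.findall("ADJP"):
        adj = adjp.findtext("ADJ")
        # ADJ describes the noun: prop, refined by the adjective's category.
        triples.append((f"prop_{sem.get(adj, 'unspecified')}", noun, adj))
        adv = adjp.findtext("ADV")
        if adv is not None:
            # ADV modifies the adjective: mod, refined by the adverb's category.
            triples.append((f"mod_{sem.get(adv, 'unspecified')}", adj, adv))
    return triples

for triple in interpret(NP_XML, SEM):
    print(triple)
```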

18
Analysis of Specific Syntactic Structures
  • hardly detectable hematomas

Steps
  • nouns, adjectives and adverbs are interpreted as
    concept/instance
  • adjectives and adverbs describe relations

[Diagram: concepts and instances for 'hematoma' (under 'trauma'), 'detectable' (under 'unspecified'), and 'hardly' (under 'weak-graduation'). Legend: concept, instance, relation]
19
Analysis of Specific Syntactic Structures
  • hardly detectable hematomas

<owl:Class rdf:ID="unterblutung">
  <rdfs:subClassOf rdf:resource="trauma"/>
</owl:Class>
<owl:Class rdf:ID="trauma"/>
<owl:Class rdf:ID="wahrnehmbar">
  <rdfs:subClassOf rdf:resource="unspecified"/>
</owl:Class>
<owl:Class rdf:ID="unspecified"/>
<owl:Class rdf:ID="kaum">
  <rdfs:subClassOf rdf:resource="weak-graduation"/>
</owl:Class>
<owl:Class rdf:ID="weak-graduation"/>
20
Analysis of Specific Syntactic Structures
  • hardly detectable hematomas

<owl:ObjectProperty rdf:ID="mod_weak-graduation">
  <rdfs:domain rdf:resource="unspecified"/>
  <rdfs:range rdf:resource="weak-graduation"/>
</owl:ObjectProperty>
<owl:ObjectProperty rdf:ID="prop_unspecified">
  <rdfs:domain rdf:resource="trauma"/>
  <rdfs:range rdf:resource="unspecified"/>
</owl:ObjectProperty>
<unterblutung rdf:ID="Unterblutungen_5">
  <prop_unspecified rdf:resource="wahrnehmbare_4"/>
</unterblutung>
<wahrnehmbar rdf:ID="wahrnehmbare_4">
  <mod_weak-graduation rdf:resource="kaum_3"/>
</wahrnehmbar>
<kaum rdf:ID="kaum_3"></kaum>
21
Analysis of Specific Syntactic Structures
  • Phrases like
  • NP → NP NP
  • NP → N Adj Conj Adj
  • NP → N conj N Adj

[Figure: ontology graph. Legend: concept, instance, relation. Visualization: Protégé plugin Ontoviz]
22
Analysis of Specific Syntactic Structures
  • results
  • definition of concepts/instances
  • corpus-based definition/concretion of relations
  • prop → prop_catADJ
  • information about domain, relation
  • not extracted
  • information about the characteristics of a
    relation

23
Overview
  • Case Frame
  • Analysis of Specific Syntactic Structures
  • Discussion/Conclusion

24
Conclusion
  • NLP techniques for extraction of information
  • analyse syntactic structures
  • information about semantic categories
  • result: corpus-based description of an initial
    ontology
  • case frame analysis
  • relations are described in the case frame
  • disadvantage: creation of case frames
  • advantage: a definition of the relation
  • analysis of specific syntactic structures
  • a general interpretation of tokens and the
    syntactic structures
  • redefined by results from the semantic tagger
  • disadvantage: in some cases, only the general
    relation definition is delivered
  • advantage: less effort to describe the resources

25
Conclusion
  • no information about the characteristics of a
    relation (cardinality, ...)
  • solutions
  • analyse occurrences in the corpus
  • corpus-based assumption about cardinality
  • integration of additional knowledge
  • initial domain specific ontology
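A corpus-based assumption about cardinality could, for instance, be derived by counting how many distinct fillers a relation has per subject instance and taking the maximum over the corpus. This is a sketch under that assumption only: the function name is hypothetical, and the third assertion in the sample data (with `sichtbare_9`) is invented to show a count above one.

```python
from collections import defaultdict

def observed_max_cardinality(assertions):
    """Map each relation to the largest number of distinct fillers seen for one subject.

    assertions: iterable of (relation, subject_instance, filler_instance).
    """
    fillers = defaultdict(set)
    for relation, subject, filler in assertions:
        fillers[(relation, subject)].add(filler)
    result = {}
    for (relation, _subject), values in fillers.items():
        result[relation] = max(result.get(relation, 0), len(values))
    return result

# Sample assertions; the last one is invented for illustration.
sample_assertions = [
    ("prop_blood-concentration", "Lebergewebe_6", "blutleer_7"),
    ("prop_unspecified", "Unterblutungen_5", "wahrnehmbare_4"),
    ("prop_unspecified", "Unterblutungen_5", "sichtbare_9"),
]
print(observed_max_cardinality(sample_assertions))
```

Such a count is only an observed lower bound on the true maximum cardinality, which is why it stays an assumption until additional knowledge confirms it.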

26
Key Aspects for IE
  • conceptual preprocessing steps: names of
    concepts occur in different linguistic
    structures, e.g. compound vs. complex noun phrase
    (like 'liver tissue' and 'tissue of liver')
  • handle only one canonical linguistic structure as
    a representative for all paraphrases
  • treatment of generalisation within local contexts
  • The token 'liver' may occur in the first sentence
    of a paragraph. In the next sentences of the
    paragraph, only the hypernym 'organ' is used.
  • concept or instance: which term in a linguistic
    structure has to be interpreted as a concept and
    which as an instance of a concept, respectively
  • definition of the scope for a concept
  • a paragraph starts with a description of an organ
    (e.g. organ 'liver' in 'The liver shows ... .
    Bloodrichness of the tissue.'); after this
    follows a description of parts of the organ
    (e.g., 'Gewebe' (tissue)). In such cases, additional
    knowledge about the domain has to be employed
    (for example, about meronyms or holonyms)
  • 'tissue part-of liver' vs. 'tissue part-of concept X'
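Two of the aspects above lend themselves to a small sketch: mapping a paraphrase such as 'tissue of liver' to one canonical form, and resolving a hypernym ('organ') to the more specific term ('liver') mentioned earlier in the same paragraph. The pattern, the one-entry lexicon, and both function names are assumptions for illustration. A real system would need the syntactic analysis and the domain ontology.

```python
import re

def canonical_np(phrase):
    """Rewrite 'X of (the) Y' as the compound-like form 'Y X'; leave other phrases as is."""
    match = re.fullmatch(r"(\w+) of (?:the )?(\w+)", phrase)
    return f"{match.group(2)} {match.group(1)}" if match else phrase

# Hypothetical lexicon fragment: hyponym -> hypernym.
HYPERNYM_OF = {"liver": "organ"}

def resolve_hypernyms(tokens):
    """Replace a hypernym by the most recent hyponym seen in the same paragraph."""
    last_hyponym = {}  # hypernym -> most recently mentioned hyponym
    resolved = []
    for token in tokens:
        key = token.lower()
        if key in HYPERNYM_OF:
            last_hyponym[HYPERNYM_OF[key]] = key
            resolved.append(token)
        elif key in last_hyponym:
            resolved.append(last_hyponym[key])
        else:
            resolved.append(token)
    return resolved

print(canonical_np("tissue of liver"))
# Invented two-sentence paragraph, already tokenised.
paragraph = "The liver is enlarged . The organ is firm .".split()
print(" ".join(resolve_hypernyms(paragraph)))
```

The lookup table is reset per call, so calling `resolve_hypernyms` once per paragraph limits the resolution to the local context.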