Title: Semantic Relation Detection in Bioscience Text
1 Semantic Relation Detection in Bioscience Text
- Marti Hearst
- SIMS, UC Berkeley
- http://biotext.berkeley.edu
- Supported by NSF DBI-0317510 and a gift from
Genentech
2 BioText Project Goals
- Provide flexible, intelligent access to information for use in biosciences applications.
- Focus on
- Textual information from journal articles
- Tightly integrated with other resources
- Ontologies
- Record-based databases
3 Project Team
- Project Leaders
- PI: Marti Hearst
- Co-PI: Adam Arkin
- Computational Linguistics
- Barbara Rosario
- Preslav Nakov
- Database Research
- Ariel Schwartz
- Gaurav Bhalotia (graduated)
- User Interface / IR
- Adam Newberger
- Dr. Emilia Stoica
- Bioscience
- Dr. TingTing Zhang
- Janice Hamerja
Supported primarily by NSF DBI-0317510 and a
gift from Genentech
4 BioText Architecture
Sophisticated Text Analysis
Annotations in Database
Improved Search Interface
5 The Nature of Bioscience Text
- Claim
- Bioscience semantics are simultaneously easier and harder than general text.
- Easier: fewer subtleties, fewer ambiguities, systematic meanings
- Harder: enormous terminology, complex sentence structure
6 Sample Sentence
- Recent research, in proliferating cells, has demonstrated that interaction of E2F1 with the p53 pathway could involve transcriptional up-regulation of E2F1 target genes such as p14/p19ARF, which affect p53 accumulation [67,68], E2F1-induced phosphorylation of p53 [69], or direct E2F1-p53 complex formation [70].
7 BioScience Researchers
- Read A LOT!
- Cite A LOT!
- Curate A LOT!
- Are interested in specific relations, e.g.
- What is the role of this protein in that pathway?
- Show me articles in which a comparison between
two values is significant.
8 This Talk
- Discovering semantic relations
- Between nouns in noun compounds
- Between entities in sentences
- Acquiring labeled data
- Idea: use text surrounding citations to documents to identify paraphrases
- A new direction; preliminary work only
9 Noun Compound Relation Recognition
10 Noun Compounds (NCs)
- Technical text is rich with NCs
- Open-labeled long-term study of the subcutaneous sumatriptan efficacy and tolerability in acute migraine treatment.
- An NC is any sequence of nouns that itself functions as a noun
- asthma hospitalizations
- health care personnel hand wash
11 NCs: 3 computational tasks
- Identification
- Syntactic analysis (attachments)
- Baseline headache frequency
- Tension headache patient
- Our Goal: Semantic analysis
- Headache treatment -> treatment for headache
- Corticosteroid treatment -> treatment that uses corticosteroid
12 Descent of Hierarchy
- Idea
- Use the top levels of a lexical hierarchy to identify semantic relations
- Hypothesis
- A particular semantic relation holds between all 2-word NCs that can be categorized by a lexical category pair.
13 Related work (Semantic analysis of NCs)
- Rule-based
- Finin (1980)
- Detailed AI analysis, hand-coded
- Vanderwende (1994)
- Automatically extracts semantic information from an on-line dictionary, manipulates a set of handwritten rules. 13 classes, 52% accuracy
- Probabilistic
- Lauer (1995)
- Probabilistic model, 8 classes, 47% accuracy
- Lapata (2000)
- Classifies nominalizations into subject/object. 2 classes, 80% accuracy
14 Related work (Semantic analysis of NCs)
- Lexical Hierarchy
- Barrett et al. (2001)
- WordNet, heuristics to classify an NC given the similarity to a known NC
- Rosario and Hearst (2001)
- Relations pre-defined
- MeSH, Neural Network. 18 classes, 60% accuracy
15 Linguistic Motivation
- Can cast NC into head-modifier relation, and assume head noun has an argument and qualia structure.
- (used-in) kitchen knife
- (made-of) steel knife
- (instrument-for) carving knife
- (used-on) putty knife
- (used-by) butcher's knife
16 The lexical Hierarchy: MeSH
- 1. Anatomy [A]
- 2. Organisms [B]
- 3. Diseases [C]
- 4. Chemicals and Drugs [D]
- 5. Analytical, Diagnostic and Therapeutic Techniques and Equipment [E]
- 6. Psychiatry and Psychology [F]
- 7. Biological Sciences [G]
- 8. Physical Sciences [H]
- 9. Anthropology, Education, Sociology and Social Phenomena [I]
- 10. Technology and Food and Beverages [J]
- 11. Humanities [K]
- 12. Information Science [L]
- 13. Persons [M]
- 14. Health Care [N]
- 15. Geographic Locations [Z]
17 The lexical Hierarchy: MeSH
- 1. Anatomy [A]
- Body Regions [A01]
- Musculoskeletal System [A02]
- Digestive System [A03]
- Respiratory System [A04]
- Urogenital System [A05]
- ...
- 2. [B], 3. [C], 4. [D], 5. [E], 6. [F], 7. [G]
- 8. Physical Sciences [H]
- 9. [I], 10. [J], 11. [K], 12. [L], 13. [M]
18 Descending the Hierarchy
- 1. Anatomy [A]
- Body Regions [A01]: Abdomen [A01.047], Back [A01.176], Breast [A01.236], Extremities [A01.378], Head [A01.456], Neck [A01.598], ...
- Musculoskeletal System [A02]
- Digestive System [A03]
- Respiratory System [A04]
- Urogenital System [A05]
- ...
- 2. [B], 3. [C], 4. [D], 5. [E], 6. [F], 7. [G]
- 8. Physical Sciences [H]
- 9. [I], 10. [J], 11. [K], 12. [L], 13. [M]
22 Descending the Hierarchy
- 1. Anatomy [A]
- Body Regions [A01]: Abdomen [A01.047], Back [A01.176], Breast [A01.236], Extremities [A01.378], Head [A01.456], Neck [A01.598], ...
- Musculoskeletal System [A02]
- Digestive System [A03]
- Respiratory System [A04]
- Urogenital System [A05]
- ...
- 2. [B], 3. [C], 4. [D], 5. [E], 6. [F], 7. [G]
- 8. Physical Sciences [H]
- Electronics: Amplifiers; Electronics, Medical; Transducers; ...
- Astronomy
- Nature
- Time
- Weights and Measures: Calibration, Metric System, Reference Standard
- ...
- 9. [I], 10. [J], 11. [K], 12. [L], 13. [M]
- Labels on the slide: Homogeneous, Heterogeneous
23 Mapping Nouns to MeSH Concepts
- headache recurrence
- C23.888.592.612.441 C23.550.291.937
- headache pain
- C23.888.592.612.441 G11.561.796.444
24 Levels of Description
- headache pain
- Level 0: C23 G11
- Level 1: C23.888 G11.561
- Level 2: C23.888.592 G11.561.796
- ...
- Original: C23.888.592.612.441 G11.561.796.444
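The levels of description above amount to truncating a MeSH tree number after a given number of dot-separated fields. A minimal sketch, assuming tree numbers are plain strings; the function name `mesh_level` is made up for this illustration and is not part of the talk:

```python
def mesh_level(tree_number, level):
    """Return the tree number cut at `level` (level 0 keeps only the first field)."""
    fields = tree_number.split(".")
    return ".".join(fields[: level + 1])


# The two concepts for "headache pain", as given on the slide.
headache = "C23.888.592.612.441"
pain = "G11.561.796.444"

for level in range(3):
    print(f"Level {level}: {mesh_level(headache, level)} {mesh_level(pain, level)}")
```

Asking for a level deeper than the tree number simply returns the full number, which keeps the helper safe for shallow concepts.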
25 Descent of Hierarchy
- Idea
- Words falling in homogeneous MeSH subhierarchies behave similarly with respect to relation assignment
- Hypothesis
- A particular semantic relation holds between all 2-word NCs that can be categorized by a MeSH category pair
26 Grouping the NCs
- CP A02 C04 (Musculoskeletal System, Neoplasms)
- skull tumors, bone cysts, bone metastases, skull osteosarcoma
- CP C04 M01 (Neoplasms, Person)
- leukemia survivor, lymphoma patients, cancer physician, cancer nurses
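The grouping step can be sketched as a lookup followed by bucketing. This is only an illustration: the noun-to-category table below is read off the slide's examples rather than taken from MeSH itself, and `group_by_category_pair` is a hypothetical helper name.

```python
from collections import defaultdict

# Illustrative noun -> level-0 category table, inferred from the slide's examples.
NOUN_TO_CATEGORY = {
    "skull": "A02", "bone": "A02",
    "tumors": "C04", "cysts": "C04", "metastases": "C04", "osteosarcoma": "C04",
    "leukemia": "C04", "lymphoma": "C04", "cancer": "C04",
    "survivor": "M01", "patients": "M01", "physician": "M01", "nurses": "M01",
}


def group_by_category_pair(noun_compounds):
    """Bucket two-word NCs by the (modifier category, head category) pair."""
    groups = defaultdict(list)
    for nc in noun_compounds:
        modifier, head = nc.split()
        pair = (NOUN_TO_CATEGORY[modifier], NOUN_TO_CATEGORY[head])
        groups[pair].append(nc)
    return dict(groups)


groups = group_by_category_pair([
    "skull tumors", "bone cysts", "bone metastases", "skull osteosarcoma",
    "leukemia survivor", "lymphoma patients", "cancer physician", "cancer nurses",
])
for pair, members in groups.items():
    print(pair, members)
```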
27 Distribution of Category Pairs
28 Collection
- 70,000 NCs extracted from titles and abstracts of Medline
- 2,627 CPs at level 0 (with at least 10 unique NCs)
- We analyzed
- 250 CPs with Anatomy (A)
- 21 CPs with Natural Science (H01)
- 3 CPs with Neoplasm (C04)
- This represents 10% of total CPs and 20% of total NCs
29 Classification Method
- For each CP
- Divide its NCs into training and testing sets
- Training: inspect NCs by hand
- Start from level 0
- While NCs are not all similar
- descend one level of the hierarchy
- Repeat until all NCs for that CP are similar
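The loop above can be sketched in code under two loud assumptions: the manual "are these NCs similar?" inspection is replaced by comparing hand-assigned relation labels, and descent happens only on the second noun, as in the decisions shown on the following slides. The function names and the `max_level` cap are hypothetical.

```python
from collections import defaultdict


def truncate(tree_number, level):
    """Cut a MeSH tree number after level + 1 dot-separated fields."""
    return ".".join(tree_number.split(".")[: level + 1])


def descend(ncs, max_level=4):
    """ncs: (modifier_code, head_code, relation) triples for one category pair.

    Returns {(modifier_prefix, head_prefix): [relations]}; a single relation
    in the list means the group was judged homogeneous at that level.
    """
    decisions = {}
    pending = [(0, ncs)]
    while pending:
        level, group = pending.pop()
        relations = {relation for _, _, relation in group}
        if len(relations) == 1 or level == max_level:
            modifier, head, _ = group[0]
            decisions[(truncate(modifier, 0), truncate(head, level))] = sorted(relations)
            continue
        # Not all similar: descend one level on the head noun and split.
        split = defaultdict(list)
        for nc in group:
            split[truncate(nc[1], level + 1)].append(nc)
        pending.extend((level + 1, subgroup) for subgroup in split.values())
    return decisions


# Toy training data for CP C04 M01; codes and labels follow the slides' example.
training = [
    ("C04", "M01.643", "person afflicted by disease"),  # e.g. lymphoma patients
    ("C04", "M01.526", "person who treats disease"),    # e.g. cancer physician
    ("C04", "M01.526", "person who treats disease"),    # e.g. cancer nurses
]
decisions = descend(training)
```

With this toy input the category pair is split one level down, mirroring the C04 M01 decision in the talk.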
30 Classification Decisions
- A02 C04
- B06 B06
- C04 M01
- C04 M01.643
- C04 M01.526
- A01 H01
- A01 H01.770
- A01 H01.671
- A01 H01.671.538
- A01 H01.671.868
- A01 M01
- A01 M01.643
- A01 M01.526
- A01 M01.898
31 Classification Decisions + Relations
- A02 C04 -> Location of Disease
- B06 B06 -> Kind of Plants
- C04 M01
- C04 M01.643 -> Person afflicted by Disease
- C04 M01.526 -> Person who treats Disease
- A01 H01
- A01 H01.770
- A01 H01.671
- A01 H01.671.538
- A01 H01.671.868
- A01 M01
- A01 M01.643
- A01 M01.526
- A01 M01.898
32 Classification Decisions + Relations
- A02 C04 -> Location of Disease
- B06 B06 -> Kind of Plants
- C04 M01
- C04 M01.643 -> Person afflicted by Disease
- C04 M01.526 -> Person who treats Disease
- A01 H01
- A01 H01.770
- A01 H01.671
- A01 H01.671.538
- A01 H01.671.868
- A01 M01
- A01 M01.643 -> Person afflicted by Disease
- A01 M01.526
- A01 M01.898
33 Classification Decision Levels
- Anatomy: 250 CPs
- 187 (75%) remain first level
- 56 (22%) descend one level
- 7 (3%) descend two levels
- Natural Science (H01): 21 CPs
- 1 (4%) remain first level
- 8 (39%) descend one level
- 12 (57%) descend two levels
- Neoplasms (C04): 3 CPs
- 3 (100%) descend one level
34 Evaluation
- Test the decisions on testing set
- Count how many NCs that fall in the groups defined in the classification decisions are similar to each other
- Accuracy (for 2nd noun)
- Anatomy: 91%
- Natural Science: 79%
- Neoplasm: 100%
- Total accuracy: 90.8%
- Generalization: our 415 classification decisions cover 46,000 possible CP pairs
35 Ambiguity: Two Types
- Lexical ambiguity
- mortality
- state of being mortal
- death rate
- Relationship ambiguity
- bacteria mortality
- death of bacteria
- death caused by bacteria
36 Four Cases
- Single MeSH sense, only one possible relationship: abdomen radiography, aciclovir treatment
- Multiple MeSH senses, only one possible relationship: alcoholism treatment
- Single MeSH sense, multiple relationships: hospital databases, education efforts, kidney metabolism
- Multiple MeSH senses, multiple relationships: bacteria mortality
- Ambiguity of relationship
37 Four Cases
- Single MeSH sense, only one possible relationship: abdomen radiography, aciclovir treatment
- Multiple MeSH senses, only one possible relationship: alcoholism treatment
- Single MeSH sense, multiple relationships: hospital databases, education efforts, kidney metabolism
- Multiple MeSH senses, multiple relationships: bacteria mortality
- Most problematic cases: ambiguity of relationship, but rare!
38 Conclusions on NN Relation Classification
- Very simple method for assigning semantic relations to two-word technical NCs
- 90.8% accuracy
- Lexical resource (MeSH) useful for this task
- Probably works because of the relative lack of ambiguity in this kind of technical text.
39 Entity-Entity Relation Recognition
40 Problem: Which relations hold between 2 entities?
Cure?
Prevent?
Side Effect?
41 Hepatitis Examples
- Cure
- These results suggest that con A-induced hepatitis was ameliorated by pretreatment with TJ-135.
- Prevent
- A two-dose combined hepatitis A and B vaccine would facilitate immunization programs
- Vague
- Effect of interferon on hepatitis B
42 Two tasks
- Relationship extraction
- Identify the several semantic relations that can occur between the entities disease and treatment in bioscience text
- Entity extraction
- Related problem: identify such entities
43 The Approach
- Data: MEDLINE abstracts and titles
- Graphical models
- Combine in one framework both relation and entity extraction
- Both static and dynamic models
- Simple discriminative approach
- Neural network
- Lexical, syntactic and semantic features
44 Related Work
- We allow several DIFFERENT relations between the same entities
- Thus differs from the problem statement of other work on relations
- Many find one relation which holds between two entities (many based on ACE)
- Agichtein and Gravano (2000), lexical patterns for location of
- Zelenko et al. (2002) SVM for person affiliation and organization-location
- Hasegawa et al. (ACL 2004) Person-Organization -> President relation
- Craven (1999, 2001) HMM for subcellular-location and disorder-association
- Doesn't identify the actual relation
45 Related work: Bioscience
- Many hand-built rules
- Feldman et al. (2002)
- Friedman et al. (2001)
- Pustejovsky et al. (2002)
- Saric et al. (this conference)
46 Data and Relations
- MEDLINE, abstracts and titles
- 3662 sentences labeled
- Relevant: 1724
- Irrelevant: 1771
- e.g., "Patients were followed up for 6 months"
- 2 types of entities, many instances
- treatment and disease
- 7 relationships between these entities
47 Semantic Relationships
- 810: Cure
- Intravenous immune globulin for recurrent spontaneous abortion
- 616: Only Disease
- Social ties and susceptibility to the common cold
- 166: Only Treatment
- Fluticasone propionate is safe in recommended doses
- 63: Prevent
- Statins for prevention of stroke
48 Semantic Relationships
- 36: Vague
- Phenylbutazone and leukemia
- 29: Side Effect
- Malignant mesodermal mixed tumor of the uterus following irradiation
- 4: Does NOT cure
- Evidence for double resistance to permethrin and malathion in head lice
49 Features
- Word
- Part of speech
- Phrase constituent
- Orthographic features
- is number, all letters are capitalized, first letter is capitalized
- MeSH (semantic features)
- Replace words, or sequences of words, with generalizations via MeSH categories
- Peritoneum -> Abdomen
50 Models
- 2 static generative models
- 3 dynamic generative models
- 1 discriminative model (neural network)
51 Static Graphical Models
- S1: observations dependent on Role but independent from Relation given roles
- S2: observations dependent on both Relation and Role
- [Figure: graphical structure of models S1 and S2]
52 Dynamic Graphical Models
- D1, D2: as in S1, S2
- D3: only one observation per state is dependent on both the relation and the role
53 Graphical Models
- Relation node
- Semantic relation (cure, prevent, none, ...) expressed in the sentence
54 Graphical Models
- Role nodes
- 3 choices: treatment, disease, or none
55 Graphical Models
- Feature nodes (observed)
- word, POS, MeSH
56 Graphical Models
- Different dependencies between the features and the relation nodes
57 Graphical Models
- For Dynamic Model D1
- Joint probability distribution over relation, roles and features nodes
- Parameters estimated with maximum likelihood and absolute discounting smoothing
58 Neural Network
- Feed-forward network (MATLAB)
- Training with conjugate gradient descent
- One hidden layer (hyperbolic tangent function)
- Logistic sigmoid function for the output layer representing the relationships
- Same features
- Discriminative approach
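To make the architecture concrete, here is a minimal forward-pass sketch with a tanh hidden layer and logistic sigmoid outputs. It is written in plain Python rather than the MATLAB used in the talk, the layer sizes and random weights are placeholders, and conjugate gradient training is not shown.

```python
import math
import random


def forward(features, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: tanh hidden layer, logistic sigmoid output layer."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return [
        1.0 / (1.0 + math.exp(-(sum(w * h for w, h in zip(row, hidden)) + b)))
        for row, b in zip(w_out, b_out)
    ]


def random_matrix(rows, cols):
    """Small random weights; stand-ins for trained parameters."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Placeholder sizes, chosen only for illustration.
random.seed(0)
N_FEATURES, N_HIDDEN, N_RELATIONS = 6, 4, 8

w_hidden = random_matrix(N_HIDDEN, N_FEATURES)
b_hidden = [0.0] * N_HIDDEN
w_out = random_matrix(N_RELATIONS, N_HIDDEN)
b_out = [0.0] * N_RELATIONS

features = [1.0, 0.0, 0.0, 1.0, 0.0, 1.0]  # toy binary feature vector
scores = forward(features, w_hidden, b_hidden, w_out, b_out)
predicted = max(range(N_RELATIONS), key=lambda k: scores[k])
```

Each output unit scores one relationship, and the highest-scoring unit is taken as the predicted relation in this sketch.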
59 Role extraction
- Results in terms of F-measure
- Graphical models
- Junction tree algorithm (BNT)
- Relation hidden and marginalized over
- Neural Net
- Couldn't run it (feature vectors too large)
- (Graphical models can do role extraction and relationship classification simultaneously)
60 Role Extraction Results
- F-measures
- D1 best when no smoothing
61 Role Extraction Results
- F-measures
- D2 best with smoothing, but doesn't boost scores as much as in relation classification
62 Role Extraction Results
- Static models better than Dynamic for
- Note: No Neural Networks
63 Relation classification: Results
With Smoothing and Roles, D1 best GM
64 Features impact: Role Extraction
- Most important features: 1) Word, 2) MeSH
- Models: D1, D2
- All features: 0.67, 0.71
- No word: 0.58 (-13.4%), 0.61 (-14.1%)
- No MeSH: 0.63 (-5.9%), 0.65 (-8.4%)
(rel. + irrel.)
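The negative figures in this table read as relative changes with respect to the all-features score. A short worked check, assuming that interpretation; because it starts from the rounded scores on the slide, the last digit can differ slightly from the slide's own figures:

```python
def relative_change(baseline, ablated):
    """Percentage change of `ablated` relative to `baseline`."""
    return 100.0 * (ablated - baseline) / baseline


# Role-extraction scores as shown on the slide.
f_measures = {
    "D1": {"all": 0.67, "no_word": 0.58, "no_mesh": 0.63},
    "D2": {"all": 0.71, "no_word": 0.61, "no_mesh": 0.65},
}

for model, scores in f_measures.items():
    for ablation in ("no_word", "no_mesh"):
        change = relative_change(scores["all"], scores[ablation])
        print(f"{model} {ablation}: {change:+.1f}%")
```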
65 Features impact: Relation classification
- Most important features: Roles
- Accuracy: D1, D2, NN
- All feat. + roles: 91.6, 82.0, 96.9
- All feat. - roles: 68.9 (-24.7%), 74.9 (-8.7%), 79.6 (-17.8%)
- All feat. + roles - Word: 91.6 (0%), 79.8 (-2.8%), 96.4 (-0.5%)
- All feat. + roles - MeSH: 91.6 (0%), 84.6 (+3.1%), 97.3 (+0.4%)
(rel. + irrel.)
66 Relation extraction
- Results in terms of classification accuracy (with and without irrelevant sentences)
- 2 cases
- Roles hidden
- Roles given
- Graphical models
- NN: simple classification problem
67 Relation classification: Results
Neural Net always best
68 Relation classification: Results
With Smoothing and No Roles, D2 best GM
69 Relation classification: Results
Dynamic models always outperform Static
70 Relation classification: Results
With no smoothing, D1 best Graphical Model
71 Relation classification: Confusion Matrix
- Computed for the model D2, rel. + irrel., only features
72 Features impact: Relation classification
- Most realistic case: Roles not known
- Most important features: 1) MeSH, 2) Word for D1 and NN (but vice versa for D2)
- Accuracy: D1, D2, NN
- All feat. - roles: 68.9, 74.9, 79.6
- All feat. - roles - Word: 66.7 (-3.3%), 66.1 (-11.8%), 76.2 (-4.3%)
- All feat. - roles - MeSH: 62.7 (-9.1%), 72.5 (-3.2%), 74.1 (-6.9%)
(rel. + irrel.)
73 Relation Recognition: Conclusions
- Classification of subtle semantic relations in bioscience text
- Discriminative model (neural network) achieves high classification accuracy
- Graphical models for the simultaneous extraction of entities and relationships
- Importance of lexical hierarchy
- Next Step
- Different entities/relations
- Semi-supervised learning to discover relation types
74 Acquiring Labeled Data using Citances
75 A discovery is made
A paper is written
76 That paper is cited
and cited
and cited
as the evidence for some fact(s) F.
77 Each of these in turn is cited for some fact(s)
until it is the case that all important facts
in the field can be found in citation sentences
alone!
78 Citances
- Nearly every statement in a bioscience journal article is backed up with a cite.
- It is quite common for papers to be cited 30-100 times.
- The text around the citation tends to state biological facts. (Call these citances.)
- Different citances will state the same facts in different ways
- so can we use these for creating models of language expressing semantic relations?
79 Using Citances
- Potential uses of citation sentences (citances)
- creation of training and testing data for semantic analysis,
- synonym set creation,
- database curation,
- document summarization,
- and information retrieval generally.
- Some preliminary results
- Citances to a document align well with a hand-built curation.
- Citances are good candidates for paraphrase creation.
80 Citances for Acquiring Examples of Semantic Relations
- A relationship type R between entities of type A and B can be expressed in many ways.
- Use citances to build a model of the different ways to express the relationship
- Seed learning algorithms with examples that mention A and B, for which relation R holds.
- Train a model to recognize R when the relation is not known.
- Results may extend to sentences that are not citances as well.
81 Issues for Processing Citances
- Text span
- Identification of the appropriate phrase, clause, or sentence that constructs a citance.
- Correct mapping of citations when shown as lists or groups (e.g., 22-25).
- Grouping citances by topic
- Citances that cite the same document should be grouped by the facts they state.
- Normalizing or paraphrasing citances
- For IR, summarization, learning synonyms, relation extraction, question answering, and machine translation.
82 Related Work
- Traditional citation analysis dates back to the 1960s (Garfield). Includes
- Citation categorization,
- Context analysis,
- Citer motivation.
- Citation indexing systems, such as ISI's SCI, and CiteSeer.
- Mercer and Di Marco (2004) propose to improve citation indexing using citation types.
- Bradshaw (2003) introduces Reference Directed Indexing (RDI), which indexes documents using the terms in the citances citing them.
83 Related Work (cont.)
- Teufel and Moens (2002) identify citances to improve summarization of the citing paper.
- Nanba et al. (2000) use citances as features for classifying papers into topics.
- Related field to citation indexing is the use of link structure and anchor text of Web pages.
- Applications include IR, classification, Web crawlers, and summarization.
84 Example: protein-protein
85 Early results: Paraphrase Creation from Citances
86 Sample Sentences
- NGF withdrawal from sympathetic neurons induces Bim, which then contributes to death.
- Nerve growth factor withdrawal induces the expression of Bim and mediates Bax dependent cytochrome c release and apoptosis.
- The proapoptotic Bcl-2 family member Bim is strongly induced in sympathetic neurons in response to NGF withdrawal.
- In neurons, the BH3 only Bcl2 member, Bim, and JNK are both implicated in apoptosis caused by nerve growth factor deprivation.
87 Their Paraphrases
- NGF withdrawal induces Bim.
- Nerve growth factor withdrawal induces the expression of Bim.
- Bim has been shown to be upregulated following nerve growth factor withdrawal.
- Bim implicated in apoptosis caused by nerve growth factor deprivation.
- They all paraphrase
- Bim is induced after NGF withdrawal.
88 Paraphrase Creation Algorithm
- 1. Extract the sentences that cite the target.
- 2. Mark the NEs of interest (genes/proteins, MeSH terms) and normalize.
- 3. Dependency parse (MiniPar).
- 4. For each parse
- For each pair of NEs of interest
- i. Extract the path between them.
- ii. Create a paraphrase from the path.
- 5. Rank the candidates for a given pair of NEs.
- 6. Select only the ones above a threshold.
- 7. Generalize.
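Step 4 can be illustrated with a small sketch of dependency-path extraction. The parse below is hand-built for this example and is not MiniPar output; the head-index representation and the function names are assumptions made for the illustration.

```python
def path_to_root(heads, index):
    """Follow head pointers from a token up to the root (head == -1)."""
    path = [index]
    while heads[path[-1]] != -1:
        path.append(heads[path[-1]])
    return path


def dependency_path(heads, a, b):
    """Token indices on the tree path from token a to token b."""
    up_a = path_to_root(heads, a)
    up_b = path_to_root(heads, b)
    on_b = set(up_b)
    # Lowest common ancestor: first node above a that is also above b.
    lca = next(node for node in up_a if node in on_b)
    left = up_a[: up_a.index(lca) + 1]
    right = up_b[: up_b.index(lca)]
    return left + right[::-1]


def skeleton(tokens, heads, a, b):
    """Path tokens put back in their original word order."""
    return " ".join(tokens[i] for i in sorted(dependency_path(heads, a, b)))


# Hand-built toy parse (an assumption): each entry is the index of the token's head.
tokens = ["NGF", "withdrawal", "induces", "the", "expression", "of", "Bim"]
heads = [1, 2, -1, 4, 2, 4, 5]

result = skeleton(tokens, heads, tokens.index("NGF"), tokens.index("Bim"))
print(result)  # NGF withdrawal induces expression of Bim
```

The skeleton drops "the", which is the kind of gap that the later "add words to improve grammaticality" step is meant to repair.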
89 Creating a Paraphrase
- Given the path from the dependency parse
- Restore the original word order.
- Add words to improve grammaticality.
- Bim shown be following nerve growth factor withdrawal.
- Bim has been shown to be upregulated following nerve growth factor withdrawal.
90 2-word Heuristic: Demonstration
- NGF withdrawal induces Bim.
- Nerve growth factor withdrawal induces the expression of Bim.
- Bim has been shown to be upregulated following nerve growth factor withdrawal.
- Bim is induced in sympathetic neurons in response to NGF withdrawal.
- member Bim implicated in apoptosis caused by nerve growth factor deprivation.
91 Evaluation (1)
- An influential journal paper from Neuron
- J. Whitfield, S. Neame, L. Paquet, O. Bernard, and J. Ham. Dominant-negative c-jun promotes neuronal survival by reducing bim expression and inhibiting mitochondrial cytochrome c release. Neuron, 29:629-643, 2001.
- 99 journal papers citing it
- 203 citances in total
- 36 different types of important biological factoids
- But we concentrated on one model sentence
- Bim is induced after NGF withdrawal.
92 Evaluation (2)
- Set 1: 67 citances pointing to the target paper and manually found to contain a good or acceptable paraphrase (do not necessarily contain Bim or NGF)
- (Ideal conditions)
- Set 2: 65 citances pointing to the target paper and containing both Bim and NGF
- Set 3: 102 sentences from the 99 texts, containing both Bim and NGF
- (Do citances do better than arbitrarily chosen sentences?)
93 Correctness (Judgments)
- Bad (0.0), if
- different relation (often phosphorylation aspect)
- opposite meaning
- vagueness (wording not clear enough).
- Acceptable (0.5), if it was not Bad and
- contains additional terms (e.g., DP5 protein) or topics (e.g., PPs like "in sympathetic neurons")
- the relation was suggested but not definitely.
- Else Good (1.0)
94 Results
- Obtained 55, 65 and 102 paraphrases for sets 1, 2 and 3
- Only one paraphrase from each sentence
- comparison of the dependency path to that of the model sentence
- good (1.0) or acceptable (0.5)
95 Correctness (Recall)
- Calculated on Set 1
- 60 paraphrases (out of 67 citances)
- 5 citances produced 2 paraphrases
- system recall: 55/67, i.e. 82.09%
- 10 of the 67 relevant in Set 1 initially missed by the human annotator
- 8 good,
- 2 acceptable.
- human recall is 57/67, i.e. 85.07%
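The recall figures can be reproduced with a few lines of arithmetic. This sketch reads the slide as saying that the 60 paraphrases came from 55 distinct citances (since 5 citances produced two each); that reading, and the variable names, are assumptions.

```python
paraphrases = 60        # paraphrases produced on Set 1
double_producers = 5    # citances that produced 2 paraphrases each
relevant = 67           # citances in Set 1
missed_by_human = 10    # relevant citances the annotator initially missed

# Each double-producing citance accounts for one extra paraphrase.
covered_by_system = paraphrases - double_producers
system_recall = covered_by_system / relevant
human_recall = (relevant - missed_by_human) / relevant

print(f"system recall: {covered_by_system}/{relevant} = {100 * system_recall:.2f}%")
print(f"human recall: {relevant - missed_by_human}/{relevant} = {100 * human_recall:.2f}%")
```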
96 Misses
- Sample system miss (no NGF)
- Growth factor withdrawal was shown to cause increased Bim expression in various populations of neuronal cell types.
- Sample human miss
- The precise targets of c-Jun necessary for the induction of apoptosis have been the subject of intense interest and recently, Bim and Dp5, both BH3-domain only family members, have been identified as pro-apoptotic genes induced in a c-Jun-dependent manner in both sympathetic neurons subjected to NGF withdrawal and in cerebellar granule cells deprived of KCl.
97 Grammaticality
- Missing coordinating "and"
- Hrk/DP5 Bim have been found to be upregulated after NGF withdrawal
- Verb subcategorization
- caused by NGF role for Bim
- Extra subject words
- member Bim implicated in apoptosis caused by NGF deprivation
- Sentence: In neurons, the BH3-only Bcl2 member, Bim, and JNK are both implicated in apoptosis caused by NGF deprivation.
98 Related Work
- Word-level paraphrases. Grefenstette uses a semantic parser to compare the distributional similarity of local contexts for synonym extraction.
- Phrase-level paraphrases. Barzilay & McKeown use POS information from the local context and co-training.
- Template paraphrases. Lin & Pantel apply the idea of Grefenstette to dependency tree paths. Later refined by Shinyama et al.
- Sentence-level paraphrases. Barzilay & Lee use multiple sequence alignment. Pang et al. merge parse trees into a transducer.
99 Relevant Papers
- Citances: Citation Sentences for Semantic Analysis of Bioscience Text, Preslav Nakov, Ariel Schwartz, and Marti Hearst, in the SIGIR'04 workshop on Search and Discovery in Bioinformatics.
- Classifying Semantic Relations in Bioscience Text, Barbara Rosario and Marti Hearst, in ACL 2004.
- The Descent of Hierarchy, and Selection in Relational Semantics, Barbara Rosario, Marti Hearst, and Charles Fillmore, in ACL 2002.
100 Thank you!
- Marti Hearst
- SIMS, UC Berkeley
- http://biotext.berkeley.edu
101 Additional slides
102 Thompson et al. 2003
- Frame classification and role labeling for FrameNet sentences
- Target word must be observed
- More relations and roles
Our D1
103 Smoothing: absolute discounting
- Lower the probability of seen events by subtracting a constant from their count (ML estimate)
- The remaining probability is evenly divided among the unseen events
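A minimal sketch of absolute discounting as described in the two bullets above. The discount value, the toy counts and the function name are assumptions for illustration; the talk does not state the constants it used.

```python
def absolute_discounting(counts, vocabulary, discount=0.5):
    """Discount each seen count by a constant and share the freed mass among unseen events."""
    total = sum(counts.values())
    seen = [event for event in vocabulary if counts.get(event, 0) > 0]
    unseen = [event for event in vocabulary if counts.get(event, 0) == 0]
    if not unseen:
        # Nothing to redistribute to: fall back to the maximum likelihood estimate.
        return {event: counts[event] / total for event in seen}
    freed = discount * len(seen) / total
    probabilities = {event: (counts[event] - discount) / total for event in seen}
    for event in unseen:
        probabilities[event] = freed / len(unseen)
    return probabilities


# Toy counts over four relation labels (illustrative numbers only).
counts = {"cure": 3, "prevent": 1}
labels = ["cure", "prevent", "vague", "side_effect"]
smoothed = absolute_discounting(counts, labels, discount=0.5)
```

The discount is assumed to be smaller than the smallest nonzero count, so every probability stays positive and the distribution still sums to one.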
104 F-measures for role extraction as a function of smoothing factors
105 Relation accuracies as a function of smoothing factors