Semantic Relation Detection in Bioscience Text


1
Semantic Relation Detection in Bioscience Text
  • Marti Hearst
  • SIMS, UC Berkeley
  • http://biotext.berkeley.edu
  • Supported by NSF DBI-0317510 and a gift from
    Genentech

2
BioText Project Goals
  • Provide flexible, intelligent access to
    information for use in biosciences applications.
  • Focus on
  • Textual Information from Journal Articles
  • Tightly integrated with other resources
  • Ontologies
  • Record-based databases

3
Project Team
  • Project Leaders
  • PI Marti Hearst
  • Co-PI Adam Arkin
  • Computational Linguistics
  • Barbara Rosario
  • Preslav Nakov
  • Database Research
  • Ariel Schwartz
  • Gaurav Bhalotia (graduated)
  • User Interface / IR
  • Adam Newberger
  • Dr. Emilia Stoica
  • Bioscience
  • Dr. TingTing Zhang
  • Janice Hamerja

Supported primarily by NSF DBI-0317510 and a
gift from Genentech
4
BioText Architecture
Sophisticated Text Analysis
Annotations in Database
Improved Search Interface
5
The Nature of Bioscience Text
  • Claim
  • Bioscience semantics are simultaneously easier
    and harder than general text.
  • Easier: fewer subtleties, fewer ambiguities,
    systematic meanings
  • Harder: enormous terminology, complex sentence
    structure
6
Sample Sentence
  • Recent research, in proliferating cells, has
    demonstrated that interaction of E2F1 with the
    p53 pathway could involve transcriptional
    up-regulation of E2F1 target genes such as
    p14/p19ARF, which affect p53 accumulation
    [67,68], E2F1-induced phosphorylation of p53
    [69], or direct E2F1-p53 complex formation [70].

7
BioScience Researchers
  • Read A LOT!
  • Cite A LOT!
  • Curate A LOT!
  • Are interested in specific relations, e.g.
  • What is the role of this protein in that pathway?
  • Show me articles in which a comparison between
    two values is significant.

8
This Talk
  • Discovering semantic relations
  • Between nouns in noun compounds
  • Between entities in sentences
  • Acquiring labeled data
  • Idea: use text surrounding citations to documents
    to identify paraphrases
  • A new direction; preliminary work only

9
Noun Compound Relation Recognition
10
Noun Compounds (NCs)
  • Technical text is rich with NCs
  • Open-labeled long-term study of the subcutaneous
    sumatriptan efficacy and tolerability in acute
    migraine treatment.
  • NC is any sequence of nouns that itself functions
    as a noun
  • asthma hospitalizations
  • health care personnel hand wash

11
NCs: 3 computational tasks
  • Identification
  • Syntactic analysis (attachments)
  • Baseline headache frequency
  • Tension headache patient
  • Our Goal: Semantic analysis
  • Headache treatment → treatment for headache
  • Corticosteroid treatment → treatment that uses
    corticosteroid

12
Descent of Hierarchy
  • Idea
  • Use the top levels of a lexical hierarchy to
    identify semantic relations
  • Hypothesis
  • A particular semantic relation holds between all
    2-word NCs that can be categorized by a lexical
    category pair.

13
Related work (Semantic analysis of NCs)
  • Rule-based
  • Finin (1980)
  • Detailed AI analysis, hand-coded
  • Vanderwende (1994)
  • automatically extracts semantic information from
    an on-line dictionary, manipulates a set of
    handwritten rules. 13 classes, 52% accuracy
  • Probabilistic
  • Lauer (1995)
  • probabilistic model, 8 classes, 47% accuracy
  • Lapata (2000)
  • classifies nominalizations into subject/object.
    2 classes, 80% accuracy

14
Related work (Semantic analysis of NCs)
  • Lexical Hierarchy
  • Barrett et al. (2001)
  • WordNet, heuristics to classify a NC given the
    similarity to a known NC
  • Rosario and Hearst (2001)
  • Relations pre-defined
  • MeSH, Neural Network. 18 classes, 60% accuracy

15
Linguistic Motivation
  • Can cast NC into head-modifier relation, and
    assume head noun has an argument and qualia
    structure.
  • (used-in) kitchen knife
  • (made-of) steel knife
  • (instrument-for) carving knife
  • (used-on) putty knife
  • (used-by) butcher's knife

16
The Lexical Hierarchy: MeSH
  • 1. Anatomy [A]
  • 2. Organisms [B]
  • 3. Diseases [C]
  • 4. Chemicals and Drugs [D]
  • 5. Analytical, Diagnostic and Therapeutic
    Techniques and Equipment [E]
  • 6. Psychiatry and Psychology [F]
  • 7. Biological Sciences [G]
  • 8. Physical Sciences [H]
  • 9. Anthropology, Education, Sociology and
    Social Phenomena [I]
  • 10. Technology and Food and Beverages [J]
  • 11. Humanities [K]
  • 12. Information Science [L]
  • 13. Persons [M]
  • 14. Health Care [N]
  • 15. Geographic Locations [Z]

17
The Lexical Hierarchy: MeSH
  • 1. Anatomy [A]
      • Body Regions [A01]
      • Musculoskeletal System [A02]
      • Digestive System [A03]
      • Respiratory System [A04]
      • Urogenital System [A05]
  • 2. [B]  3. [C]  4. [D]  5. [E]  6. [F]  7. [G]
  • 8. Physical Sciences [H]
  • 9. [I]  10. [J]  11. [K]  12. [L]  13. [M]

18
Descending the Hierarchy
  • 1. Anatomy [A]
      • Body Regions [A01]: Abdomen [A01.047], Back
        [A01.176], Breast [A01.236], Extremities
        [A01.378], Head [A01.456], Neck [A01.598], ...
      • Musculoskeletal System [A02]
      • Digestive System [A03]
      • Respiratory System [A04]
      • Urogenital System [A05]
  • 2. [B]  3. [C]  4. [D]  5. [E]  6. [F]  7. [G]
  • 8. Physical Sciences [H]
  • 9. [I]  10. [J]  11. [K]  12. [L]  13. [M]

19
Descending the Hierarchy
  • 1. Anatomy [A]
      • Body Regions [A01]: Abdomen [A01.047], Back
        [A01.176], Breast [A01.236], Extremities
        [A01.378], Head [A01.456], Neck [A01.598], ...
      • Musculoskeletal System [A02]
      • Digestive System [A03]
      • Respiratory System [A04]
      • Urogenital System [A05]
  • 2. [B]  3. [C]  4. [D]  5. [E]  6. [F]  7. [G]
  • 8. Physical Sciences [H]
      • Electronics
      • Astronomy
      • Nature
      • Time
      • Weights and Measures
      • ...
  • 9. [I]  10. [J]  11. [K]  12. [L]  13. [M]

20
Descending the Hierarchy
  • 1. Anatomy [A]
      • Body Regions [A01]: Abdomen [A01.047], Back
        [A01.176], Breast [A01.236], Extremities
        [A01.378], Head [A01.456], Neck [A01.598], ...
      • Musculoskeletal System [A02]
      • Digestive System [A03]
      • Respiratory System [A04]
      • Urogenital System [A05]
  • 2. [B]  3. [C]  4. [D]  5. [E]  6. [F]  7. [G]
  • 8. Physical Sciences [H]
      • Electronics: Amplifiers; Electronics, Medical;
        Transducers
      • Astronomy
      • Nature
      • Time
      • Weights and Measures
      • ...
  • 9. [I]  10. [J]  11. [K]  12. [L]  13. [M]

21
Descending the Hierarchy
  • 1. Anatomy [A]
      • Body Regions [A01]: Abdomen [A01.047], Back
        [A01.176], Breast [A01.236], Extremities
        [A01.378], Head [A01.456], Neck [A01.598], ...
      • Musculoskeletal System [A02]
      • Digestive System [A03]
      • Respiratory System [A04]
      • Urogenital System [A05]
  • 2. [B]  3. [C]  4. [D]  5. [E]  6. [F]  7. [G]
  • 8. Physical Sciences [H]
      • Electronics: Amplifiers; Electronics, Medical;
        Transducers
      • Astronomy
      • Nature
      • Time
      • Weights and Measures: Calibration; Metric
        System; Reference Standard
      • ...
  • 9. [I]  10. [J]  11. [K]  12. [L]  13. [M]

22
Descending the Hierarchy
  • 1. Anatomy [A] (homogeneous)
      • Body Regions [A01]: Abdomen [A01.047], Back
        [A01.176], Breast [A01.236], Extremities
        [A01.378], Head [A01.456], Neck [A01.598], ...
      • Musculoskeletal System [A02]
      • Digestive System [A03]
      • Respiratory System [A04]
      • Urogenital System [A05]
  • 2. [B]  3. [C]  4. [D]  5. [E]  6. [F]  7. [G]
  • 8. Physical Sciences [H] (heterogeneous)
      • Electronics: Amplifiers; Electronics, Medical;
        Transducers
      • Astronomy
      • Nature
      • Time
      • Weights and Measures: Calibration; Metric
        System; Reference Standard
      • ...
  • 9. [I]  10. [J]  11. [K]  12. [L]  13. [M]
23
Mapping Nouns to MeSH Concepts
  • headache recurrence
  • C23.888.592.612.441 C23.550.291.937
  • headache pain
  • C23.888.592.612.441 G11.561.796.444

24
Levels of Description
  • headache pain
  • Level 0: C23 G11
  • Level 1: C23.888 G11.561
  • Level 2: C23.888.592 G11.561.796
  • Original: C23.888.592.612.441 G11.561.796.444
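
The levels listed above amount to truncating a MeSH tree number after a given number of dot-separated components. A minimal Python sketch, assuming tree numbers are handled as plain strings; the helper name `mesh_level` is hypothetical and does not come from the talk:

```python
def mesh_level(tree_number, level):
    """Truncate a MeSH tree number to `level` (0 keeps only the top component, e.g. C23)."""
    parts = tree_number.split(".")
    return ".".join(parts[: level + 1])


# "headache pain" as mapped on the slide above.
headache, pain = "C23.888.592.612.441", "G11.561.796.444"
for level in range(3):
    print(level, mesh_level(headache, level), mesh_level(pain, level))
```

Asking for a level deeper than the tree number simply returns the full code, so the helper is safe to call on shallow entries.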

25
Descent of Hierarchy
  • Idea
  • Words falling in homogeneous MeSH subhierarchies
    behave similarly with respect to relation
    assignment
  • Hypothesis
  • A particular semantic relation holds between all
    2-word NCs that can be categorized by a MeSH
    category pair (CP)

26
Grouping the NCs
  • CP: A02 C04 (Musculoskeletal System, Neoplasms)
  • skull tumors, bone cysts, bone metastases, skull
    osteosarcoma
  • CP: C04 M01 (Neoplasms, Persons)
  • leukemia survivor, lymphoma patients, cancer
    physician, cancer nurses
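
The grouping step can be sketched in Python as follows. The lexicon here is a tiny hand-written stand-in for a real noun-to-MeSH lookup (an assumption for illustration only; it records just the level-0 category implied by the examples above), and `group_by_category_pair` is a hypothetical helper name:

```python
from collections import defaultdict

# Toy noun -> level-0 MeSH category mapping, written by hand for this sketch.
TOY_LEXICON = {
    "skull": "A02", "bone": "A02",
    "tumors": "C04", "cysts": "C04", "metastases": "C04",
    "leukemia": "C04", "lymphoma": "C04", "cancer": "C04",
    "survivor": "M01", "patients": "M01", "nurses": "M01",
}


def group_by_category_pair(noun_compounds, lexicon):
    """Group two-word NCs by the (modifier category, head category) pair."""
    groups = defaultdict(list)
    for nc in noun_compounds:
        modifier, head = nc.split()
        if modifier in lexicon and head in lexicon:
            groups[(lexicon[modifier], lexicon[head])].append(nc)
    return dict(groups)


ncs = ["skull tumors", "bone cysts", "bone metastases",
       "leukemia survivor", "lymphoma patients", "cancer nurses"]
groups = group_by_category_pair(ncs, TOY_LEXICON)
for cp, members in groups.items():
    print(cp, members)
```

NCs whose nouns are missing from the lexicon are skipped rather than guessed at.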

27
Distribution of Category Pairs
28
Collection
  • 70,000 NCs extracted from titles and abstracts
    of Medline
  • 2,627 CPs at level 0 (with at least 10 unique
    NCs)
  • We analyzed
  • 250 CPs with Anatomy (A)
  • 21 CPs with Natural Science (H01)
  • 3 CPs with Neoplasm (C04)
  • This represents 10% of total CPs and 20% of total
    NCs

29

Classification Method
  • For each CP
  • Divide its NCs into training and testing sets
  • Training: inspect NCs by hand
  • Start from level 0 for both nouns (0, 0)
  • While NCs are not all similar
  • descend one level of the hierarchy
  • Repeat until all NCs for that CP are similar
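
The descent loop can be sketched in Python under two simplifying assumptions that are not part of the talk: the hand inspection is replaced by pre-assigned relation labels, and only the second noun's code is descended. The function names and the sample relation strings are illustrative only:

```python
from collections import defaultdict


def truncate(code, level):
    """Keep the first level+1 dot-separated components of a MeSH code."""
    return ".".join(code.split(".")[: level + 1])


def descend(examples, max_level=4):
    """examples: (modifier_code, head_code, relation) triples for one CP.

    Returns {(modifier_prefix, head_prefix): [relations]}, descending the
    head noun's hierarchy only where the relation labels disagree.
    """
    decisions = {}
    pending = [(0, examples)]
    while pending:
        level, group = pending.pop()
        relations = {rel for _, _, rel in group}
        if len(relations) == 1 or level >= max_level:
            modifier, head, _ = group[0]
            decisions[(truncate(modifier, 0), truncate(head, level))] = sorted(relations)
            continue
        # Labels disagree: split the group one level further down.
        buckets = defaultdict(list)
        for example in group:
            buckets[truncate(example[1], level + 1)].append(example)
        for bucket in buckets.values():
            pending.append((level + 1, bucket))
    return decisions


examples = [
    ("C04", "M01.643", "person afflicted by disease"),
    ("C04", "M01.643", "person afflicted by disease"),
    ("C04", "M01.526", "person who treats disease"),
]
decisions = descend(examples)
print(decisions)
```

The `max_level` cap stops the descent when codes run out of components, in which case a group may keep more than one relation.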

30
Classification Decisions
  • A02 C04
  • B06 B06
  • C04 M01
  • C04 M01.643
  • C04 M01.526
  • A01 H01
  • A01 H01.770
  • A01 H01.671
  • A01 H01.671.538
  • A01 H01.671.868
  • A01 M01
  • A01 M01.643
  • A01 M01.526
  • A01 M01.898

31
Classification Decisions Relations
  • A02 C04 → Location of Disease
  • B06 B06 → Kind of Plants
  • C04 M01
  • C04 M01.643 → Person afflicted by Disease
  • C04 M01.526 → Person who treats Disease
  • A01 H01
  • A01 H01.770
  • A01 H01.671
  • A01 H01.671.538
  • A01 H01.671.868
  • A01 M01
  • A01 M01.643
  • A01 M01.526
  • A01 M01.898

32
Classification Decisions Relations
  • A02 C04 → Location of Disease
  • B06 B06 → Kind of Plants
  • C04 M01
  • C04 M01.643 → Person afflicted by Disease
  • C04 M01.526 → Person who treats Disease
  • A01 H01
  • A01 H01.770
  • A01 H01.671
  • A01 H01.671.538
  • A01 H01.671.868
  • A01 M01
  • A01 M01.643 → Person afflicted by Disease
  • A01 M01.526
  • A01 M01.898

33
Classification Decision Levels
  • Anatomy: 250 CPs
  • 187 (75%) remain at the first level
  • 56 (22%) descend one level
  • 7 (3%) descend two levels
  • Natural Science (H01): 21 CPs
  • 1 (4%) remains at the first level
  • 8 (39%) descend one level
  • 12 (57%) descend two levels
  • Neoplasms (C04): 3 CPs
  • 3 (100%) descend one level

34
Evaluation
  • Test the decisions on testing set
  • Count how many NCs that fall in the groups
    defined in the classification decisions are
    similar to each other
  • Accuracy (for 2nd noun)
  • Anatomy: 91%
  • Natural Science: 79%
  • Neoplasm: 100%
  • Total accuracy: 90.8%
  • Generalization: our 415 classification decisions
    cover 46,000 possible CP pairs

35
Ambiguity: Two Types
  • Lexical ambiguity
  • mortality
  • state of being mortal
  • death rate
  • Relationship ambiguity
  • bacteria mortality
  • death of bacteria
  • death caused by bacteria

36
Four Cases
  • Single MeSH senses
      • Only one possible relationship: abdomen
        radiography, aciclovir treatment
      • Multiple relationships (ambiguity of
        relationship): hospital databases, education
        efforts, kidney metabolism
  • Multiple MeSH senses
      • Only one possible relationship: alcoholism
        treatment
      • Multiple relationships (ambiguity of
        relationship): bacteria mortality
37
Four Cases
  • Single MeSH senses
      • Only one possible relationship: abdomen
        radiography, aciclovir treatment
      • Multiple relationships (ambiguity of
        relationship): hospital databases, education
        efforts, kidney metabolism
  • Multiple MeSH senses
      • Only one possible relationship: alcoholism
        treatment
      • Multiple relationships (ambiguity of
        relationship): bacteria mortality
  • Most problematic cases: ambiguity of
    relationship, but rare!
38
Conclusions on NN Relation Classification
  • Very simple method for assigning semantic
    relations to two-word technical NCs
  • 90.8% accuracy
  • Lexical resource (MeSH) useful for this task
  • Probably works because of the relative lack of
    ambiguity in this kind of technical text.

39
Entity-Entity Relation Recognition
40
Problem: Which relations hold between 2 entities?
Cure?
Prevent?
Side Effect?
41
Hepatitis Examples
  • Cure
  • These results suggest that con A-induced
    hepatitis was ameliorated by pretreatment with
    TJ-135.
  • Prevent
  • A two-dose combined hepatitis A and B vaccine
    would facilitate immunization programs
  • Vague
  • Effect of interferon on hepatitis B

42
Two tasks
  • Relationship Extraction
  • Identify the several semantic relations that can
    occur between the entities disease and treatment
    in bioscience text
  • Entity extraction
  • Related problem identify such entities

43
The Approach
  • Data: MEDLINE abstracts and titles
  • Graphical models
  • Combine in one framework both relation and entity
    extraction
  • Both static and dynamic models
  • Simple discriminative approach
  • Neural network
  • Lexical, syntactic and semantic features

44
Related Work
  • We allow several DIFFERENT relations between the
    same entities
  • Thus differs from the problem statement of other
    work on relations
  • Many find one relation which holds between two
    entities (many based on ACE)
  • Agichtein and Gravano (2000), lexical patterns
    for location of
  • Zelenko et al. (2002) SVM for person affiliation
    and organization-location
  • Hasegawa et al. (ACL 2004) Person-Organization →
    President relation
  • Craven (1999, 2001) HMM for subcellular-location
    and disorder-association
  • Doesn't identify the actual relation

45
Related work Bioscience
  • Many hand-built rules
  • Feldman et al. (2002),
  • Friedman et al. (2001)
  • Pustejovsky et al. (2002)
  • Saric et al. this conference

46
Data and Relations
  • MEDLINE, abstracts and titles
  • 3662 sentences labeled
  • Relevant: 1724
  • Irrelevant: 1771
  • e.g., "Patients were followed up for 6 months"
  • 2 types of Entities, many instances
  • treatment and disease
  • 7 Relationships between these entities

47
Semantic Relationships
  • Cure (810 sentences)
  • Intravenous immune globulin for recurrent
    spontaneous abortion
  • Only Disease (616)
  • Social ties and susceptibility to the common cold
  • Only Treatment (166)
  • Fluticasone propionate is safe in recommended
    doses
  • Prevent (63)
  • Statins for prevention of stroke

48
Semantic Relationships
  • Vague (36 sentences)
  • Phenylbutazone and leukemia
  • Side Effect (29)
  • Malignant mesodermal mixed tumor of the uterus
    following irradiation
  • Does NOT cure (4)
  • Evidence for double resistance to permethrin and
    malathion in head lice

49
Features
  • Word
  • Part of speech
  • Phrase constituent
  • Orthographic features
  • is number, all letters are capitalized,
    first letter is capitalized
  • MeSH (semantic features)
  • Replace words, or sequences of words, with
    generalizations via MeSH categories
  • Peritoneum → Abdomen

50
Models
  • 2 static generative models
  • 3 dynamic generative models
  • 1 discriminative model (neural network)

51
Static Graphical Models
  • S1: observations depend on Role but are
    independent of Relation given the roles
  • S2: observations depend on both Relation and
    Role

(Diagrams: S1, S2)
52
Dynamic Graphical Models
  • D1, D2: as in S1, S2
  • D3: only one observation per state is
    dependent on both the relation and the role

53
Graphical Models
  • Relation node
  • Semantic relation (cure, prevent, none..)
    expressed in the sentence

54
Graphical Models
  • Role nodes
  • 3 choices treatment, disease, or none

55
Graphical Models
  • Feature nodes (observed)
  • word, POS, MeSH

56
Graphical Models
  • Different dependencies between the features and
    the relation nodes

57
Graphical Models
  • For Dynamic Model D1
  • Joint probability distribution over relation,
    roles and features nodes
  • Parameters estimated with maximum likelihood and
    absolute discounting smoothing

58
Neural Network
  • Feed-forward network (MATLAB)
  • Training with conjugate gradient descent
  • One hidden layer (hyperbolic tangent function)
  • Logistic sigmoid function for the output layer
    representing the relationships
  • Same features
  • Discriminative approach
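
The network shape described above (one tanh hidden layer, logistic sigmoid outputs, one output per relation) can be illustrated with a plain-Python forward pass. This is only a sketch: the talk used MATLAB with conjugate-gradient training, which is not reproduced here, and the layer sizes and random weights below are placeholders:

```python
import math
import random


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: tanh hidden layer, logistic sigmoid output layer."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return [
        1.0 / (1.0 + math.exp(-(sum(w * h for w, h in zip(row, hidden)) + b)))
        for row, b in zip(w_out, b_out)
    ]


# Placeholder sizes: 5 input features, 3 hidden units, 7 relation outputs.
random.seed(0)
n_in, n_hidden, n_out = 5, 3, 7
w_hidden = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
w_out = [[random.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]
scores = forward([1, 0, 0, 1, 0], w_hidden, [0.0] * n_hidden, w_out, [0.0] * n_out)
print(scores)
```

Each output lies strictly between 0 and 1, and the highest-scoring output would be read as the predicted relation.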

59
Role extraction
  • Results in terms of F-measure
  • Graphical models
  • Junction tree algorithm (BNT)
  • Relation hidden and marginalized over
  • Neural Net
  • Couldn't run it (feature vectors too large)
  • (Graphical models can do role extraction and
    relationship classification simultaneously)

60
Role Extraction Results
  • F-measures
  • D1 best when no smoothing

61
Role Extraction Results
  • F-measures
  • D2 best with smoothing, but doesn't boost scores
    as much as in relation classification

62
Role Extraction Results
  • Static models better than Dynamic for
  • Note: no neural networks

63
Relation classification Results
With Smoothing and Roles, D1 best GM
64
Features impact: Role Extraction
  • Most important features: 1) Word, 2) MeSH
  • F-measure (rel. + irrel.)   D1              D2
    All features                0.67            0.71
    No word                     0.58 (-13.4%)   0.61 (-14.1%)
    No MeSH                     0.63 (-5.9%)    0.65 (-8.4%)
65
Features impact: Relation classification
  • Most important features: Roles
  • Accuracy (rel. + irrel.)    D1              D2              NN
    All feat. + roles           91.6            82.0            96.9
    All feat. - roles           68.9 (-24.7%)   74.9 (-8.7%)    79.6 (-17.8%)
    All feat. + roles - Word    91.6 (0%)       79.8 (-2.8%)    96.4 (-0.5%)
    All feat. + roles - MeSH    91.6 (0%)       84.6 (+3.1%)    97.3 (+0.4%)
66
Relation extraction
  • Results in terms of classification accuracy (with
    and without irrelevant sentences)
  • 2 cases
  • Roles hidden
  • Roles given
  • Graphical models
  • NN simple classification problem

67
Relation classification Results
Neural Net always best
68
Relation classification Results
With Smoothing and No Roles, D2 best GM
69
Relation classification Results
Dynamic models always outperform Static
70
Relation classification Results
With no smoothing, D1 best Graphical Model
71
Relation classification Confusion Matrix
  • Computed for the model D2, rel. + irrel., only
    features

72
Features impact: Relation classification
  • Most realistic case: Roles not known
  • Most important features: 1) MeSH, 2) Word for D1
    and NN (but vice versa for D2)
  • Accuracy (rel. + irrel.)    D1              D2              NN
    All feat. - roles           68.9            74.9            79.6
    All feat. - roles - Word    66.7 (-3.3%)    66.1 (-11.8%)   76.2 (-4.3%)
    All feat. - roles - MeSH    62.7 (-9.1%)    72.5 (-3.2%)    74.1 (-6.9%)
73
Relation Recognition Conclusions
  • Classification of subtle semantic relations in
    bioscience text
  • Discriminative model (neural network) achieves
    high classification accuracy
  • Graphical models for the simultaneous extraction
    of entities and relationships
  • Importance of lexical hierarchy
  • Next Step
  • Different entities/relations
  • Semi-supervised learning to discover relation
    types

74
Acquiring Labeled Data using Citances
75
A discovery is made
A paper is written
76
That paper is cited
and cited
and cited
as the evidence for some fact(s) F.
77
Each of these in turn are cited for some fact(s)
until it is the case that all important facts
in the field can be found in citation sentences
alone!
78
Citances
  • Nearly every statement in a bioscience journal
    article is backed up with a cite.
  • It is quite common for papers to be cited 30-100
    times.
  • The text around the citation tends to state
    biological facts. (Call these citances.)
  • Different citances will state the same facts in
    different ways
  • so can we use these for creating models of
    language expressing semantic relations?

79
Using Citances
  • Potential uses of citation sentences (citances)
  • creation of training and testing data for
    semantic analysis,
  • synonym set creation,
  • database curation,
  • document summarization,
  • and information retrieval generally.
  • Some preliminary results
  • Citances to a document align well with a
    hand-built curation.
  • Citances are good candidates for paraphrase
    creation.

80
Citances for Acquiring Examples of Semantic
Relations
  • A relationship type R between entities of type A
    and B can be expressed in many ways.
  • Use citances to build a model of the different
    ways to express the relationship
  • Seed learning algorithms with examples that
    mention A and B, for which relation R holds.
  • Train a model to recognize R when the relation is
    not known.
  • Results may extend to sentences that are not
    citances as well.

81
Issues for Processing Citances
  • Text span
  • Identification of the appropriate phrase, clause,
    or sentence that constructs a citance.
  • Correct mapping of citations when shown as lists
    or groups (e.g., 22-25).
  • Grouping citances by topic
  • Citances that cite the same document should be
    grouped by the facts they state.
  • Normalizing or paraphrasing citances
  • For IR, summarization, learning synonyms,
    relation extraction, question answering, and
    machine translation.
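
The citation-mapping issue above (lists or groups such as 22-25) can be illustrated with a small Python helper that expands a numeric citation marker into individual reference numbers. It is a sketch that assumes numeric, bracket-style markers; the name `expand_citations` is hypothetical:

```python
import re


def expand_citations(marker):
    """Expand a marker such as "[3, 7-9]" into [3, 7, 8, 9]."""
    numbers = []
    for part in re.split(r"\s*,\s*", marker.strip("[]() ")):
        # A range may use a hyphen or an en dash.
        match = re.fullmatch(r"(\d+)\s*[-\u2013]\s*(\d+)", part)
        if match:
            start, end = int(match.group(1)), int(match.group(2))
            numbers.extend(range(start, end + 1))
        elif part.isdigit():
            numbers.append(int(part))
    return numbers


print(expand_citations("[22-25]"))
print(expand_citations("67,68"))
```

Parts that are neither a number nor a range are ignored, so author-year citations would need separate handling.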

82
Related Work
  • Traditional citation analysis dates back to the
    1960s (Garfield). Includes
  • Citation categorization,
  • Context analysis,
  • Citer motivation.
  • Citation indexing systems, such as ISI's SCI, and
    CiteSeer.
  • Mercer and Di Marco (2004) propose to improve
    citation indexing using citation types.
  • Bradshaw (2003) introduces Reference Directed
    Indexing (RDI), which indexes documents using the
    terms in the citances citing them.

83
Related Work (cont.)
  • Teufel and Moens (2002) identify citances to
    improve summarization of the citing paper.
  • Nanba et al. (2000) use citances as features for
    classifying papers into topics.
  • Related field to citation indexing is the use of
    link structure and anchor text of Web pages.
  • Applications include IR, classification, Web
    crawlers, and summarization.

84
Example: protein-protein
85
Early Results: Paraphrase Creation from Citances
86
Sample Sentences
  • NGF withdrawal from sympathetic neurons induces
    Bim, which then contributes to death.
  • Nerve growth factor withdrawal induces the
    expression of Bim and mediates Bax dependent
    cytochrome c release and apoptosis.
  • The proapoptotic Bcl-2 family member Bim is
    strongly induced in sympathetic neurons in
    response to NGF withdrawal.
  • In neurons, the BH3 only Bcl2 member, Bim, and
    JNK are both implicated in apoptosis caused by
    nerve growth factor deprivation.

87
Their Paraphrases
  • NGF withdrawal induces Bim.
  • Nerve growth factor withdrawal induces the
    expression of Bim.
  • Bim has been shown to be upregulated following
    nerve growth factor withdrawal.
  • Bim implicated in apoptosis caused by nerve
    growth factor deprivation.
  • They all paraphrase
  • Bim is induced after NGF withdrawal.

88
Paraphrase Creation Algorithm
  • 1. Extract the sentences that cite the target.
  • 2. Mark the NEs of interest (genes/proteins, MeSH
    terms)
  • and normalize.
  • 3. Dependency parse (MiniPar).
  • 4. For each parse
  • For each pair of NEs of interest
  • i. Extract the path between them.
  • ii. Create a paraphrase from the path.
  • 5. Rank the candidates for a given pair of NEs.
  • 6. Select only the ones above a threshold.
  • 7. Generalize.
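
Step 4 can be sketched in Python on a hand-written dependency tree. The parse below is an assumed analysis typed in for illustration, not MiniPar output, and the function names are hypothetical. The path between the two named entities runs through their lowest common ancestor, and the path words are then put back in sentence order:

```python
def path_to_root(node, head_of):
    """Follow head links from `node` up to the root of the tree."""
    path = [node]
    while head_of[node] is not None:
        node = head_of[node]
        path.append(node)
    return path


def dependency_path(a, b, head_of):
    """Token indices on the tree path from `a` to `b` via their lowest common ancestor."""
    up_a = path_to_root(a, head_of)
    up_b = path_to_root(b, head_of)
    ancestors_a = set(up_a)
    lca = next(n for n in up_b if n in ancestors_a)
    return up_a[: up_a.index(lca) + 1] + list(reversed(up_b[: up_b.index(lca)]))


tokens = ["Bim", "is", "induced", "in", "neurons", "after", "NGF", "withdrawal"]
# Assumed head of each token (index into `tokens`); "induced" is the root.
head_of = {0: 2, 1: 2, 2: None, 3: 2, 4: 3, 5: 2, 6: 7, 7: 5}

path = dependency_path(0, 6, head_of)
paraphrase = " ".join(tokens[i] for i in sorted(path))
print(paraphrase)
```

The result drops "is" and "in neurons", which is why a later step adds words back to improve grammaticality.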

89
Creating a Paraphrase
  • Given the path from the dependency parse
  • Restore the original word order.
  • Add words to improve grammaticality.
  • Bim shown be following nerve growth factor
    withdrawal.
  • Bim has been shown to be upregulated
    following nerve growth factor withdrawal.

90
2-word Heuristic Demonstration
  • NGF withdrawal induces Bim.
  • Nerve growth factor withdrawal induces the
    expression of Bim.
  • Bim has been shown to be upregulated
    following nerve growth factor withdrawal.
  • Bim is induced in sympathetic neurons in
    response to NGF withdrawal.
  • member Bim implicated in apoptosis caused by
    nerve growth factor deprivation.

91
Evaluation (1)
  • An influential journal paper from Neuron
  • J. Whitfield, S. Neame, L. Paquet, O. Bernard,
    and J. Ham. Dominant-negative c-jun promotes
    neuronal survival by reducing bim expression and
    inhibiting mitochondrial cytochrome c release.
    Neuron, 29:629-643, 2001.
  • 99 journal papers citing it
  • 203 citances in total
  • 36 different types of important biological
    factoids
  • But we concentrated on one model sentence
  • Bim is induced after NGF withdrawal.

92
Evaluation (2)
  • Set 1: 67 citances pointing to the target paper
    and manually found to contain a good or
    acceptable paraphrase (do not necessarily contain
    Bim or NGF)
  • (Ideal conditions)
  • Set 2: 65 citances pointing to the target paper
    and containing both Bim and NGF
  • Set 3: 102 sentences from the 99 texts,
    containing both Bim and NGF
  • (Do citances do better than arbitrarily chosen
    sentences?)

93
Correctness (Judgments)
  • Bad (0.0), if
  • different relation (often phosphorylation
    aspect)
  • opposite meaning
  • vagueness (wording not clear enough).
  • Acceptable (0.5), if it was not Bad and
  • contains additional terms (e.g., DP5 protein) or
    topics (e.g., PPs like "in sympathetic neurons")
  • the relation was suggested but not stated
    definitely.
  • Else Good (1.0)

94
Results
  • Obtained 55, 65 and 102 paraphrases for sets 1, 2
    and 3
  • Only one paraphrase from each sentence
  • comparison of the dependency path to that of
    the model sentence
  • - good (1.0) or acceptable (0.5)

95
Correctness (Recall)
  • Calculated on Set 1
  • 60 paraphrases (out of 67 citances)
  • 5 citances produced 2 paraphrases
  • system recall 55/67, i.e. 82.09%
  • 10 of the 67 relevant in Set 1 initially missed
    by the human annotator
  • 8 good,
  • 2 acceptable.
  • human recall is 57/67, i.e. 85.07%
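
The two recall figures follow directly from the counts on this slide; a short Python check (the helper name `recall_percent` is illustrative only):

```python
def recall_percent(found, relevant):
    """Recall as a percentage, rounded to two decimals."""
    return round(100.0 * found / relevant, 2)


system_recall = recall_percent(55, 67)  # citances for which the system produced a paraphrase
human_recall = recall_percent(57, 67)   # 67 relevant minus the 10 initially missed
print(system_recall, human_recall)
```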

96
Misses
  • Sample system miss (no NGF)
  • Growth factor withdrawal was shown to cause
    increased Bim expression in various populations
    of neuronal cell types.
  • Sample human miss
  • The precise targets of c-Jun necessary for the
    induction of apoptosis have been the subject of
    intense interest and recently, Bim and Dp5, both
    BH3-domain only family members, have been
    identified as pro-apoptotic genes induced in a
    c-Jun-dependent manner in both sympathetic
    neurons subjected to NGF withdrawal and in
    cerebellar granule cells deprived of KCl.

97
Grammaticality
  • Missing coordinating "and"
  • Hrk/DP5 Bim have been found to be
    upregulated after NGF withdrawal
  • Verb subcategorization
  • caused by NGF role for Bim
  • Extra subject words
  • member Bim implicated in apoptosis caused by NGF
    deprivation
  • Original sentence: In neurons, the BH3-only Bcl2
    member, Bim, and JNK are both implicated in
    apoptosis caused by NGF deprivation.

98
Related Work
  • Word-level paraphrases. Grefenstette uses a
    semantic parser to compare the distributional
    similarity of local contexts for synonym
    extraction.
  • Phrase-level paraphrases. Barzilay & McKeown use
    POS information from the local context and
    co-training.
  • Template paraphrases. Lin & Pantel apply the idea
    of Grefenstette to dependency tree paths. Later
    refined by Shinyama et al.
  • Sentence-level paraphrases. Barzilay & Lee use
    multiple sequence alignment. Pang et al. merge
    parse trees into a transducer.

99
Relevant Papers
  • Citances: Citation Sentences for Semantic
    Analysis of Bioscience Text, Preslav Nakov, Ariel
    Schwartz, and Marti Hearst, in the SIGIR'04
    workshop on Search and Discovery in
    Bioinformatics.
  • Classifying Semantic Relations in Bioscience
    Text, Barbara Rosario and Marti Hearst, in ACL
    2004.  
  • The Descent of Hierarchy, and Selection in
    Relational Semantics, Barbara Rosario, Marti
    Hearst, and Charles Fillmore, in ACL 2002.

100
Thank you!
  • Marti Hearst
  • SIMS, UC Berkeley
  • http://biotext.berkeley.edu

101
Additional slides
102
  • Thompson et al. 2003
  • Frame classification and role
  • labeling for FrameNet sentences
  • Target word must be observed
  • More relations and roles

Our D1
103
Smoothing: absolute discounting
  • Lower the probability of seen events by
    subtracting a constant from their count (ML
    estimate)
  • The remaining probability is divided evenly among
    the unseen events
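
A minimal Python sketch of the scheme described above, assuming integer counts and a discount between 0 and 1; the function name, the default discount of 0.5, and the example counts are illustrative choices, not values from the talk:

```python
def absolute_discounting(counts, events, discount=0.5):
    """Smoothed probabilities over `events`.

    Each seen event loses `discount` from its count; the freed mass is
    shared evenly among the unseen events.
    """
    total = sum(counts.get(e, 0) for e in events)
    if total == 0:
        raise ValueError("need at least one observed event")
    seen = [e for e in events if counts.get(e, 0) > 0]
    unseen = [e for e in events if counts.get(e, 0) == 0]
    if not unseen:
        # Nothing to redistribute to: fall back to the ML estimate.
        return {e: counts[e] / total for e in seen}
    freed = discount * len(seen) / total
    probs = {e: (counts[e] - discount) / total for e in seen}
    probs.update({e: freed / len(unseen) for e in unseen})
    return probs


probs = absolute_discounting(
    {"cure": 3, "prevent": 1},
    ["cure", "prevent", "vague", "side effect"],
)
print(probs)
```

The smoothed values still sum to one, because the mass removed from seen events equals the mass handed to unseen ones.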

104
F-measures for role extraction as a function of
smoothing factors
105
Relation accuracies as a function of smoothing
factors