Learning Causal Relationships From Medical Abstracts

About This Presentation

Slides: 11

Provided by: jone51

Learn more at: http://www.cs.cmu.edu

1
Learning Causal Relationships From Medical Abstracts

Jonathan Elsas
Jaime Arguello

2
Goal

Extract entity pairs (A, B, P), where it is said that A → B with probability P
Extract contexts (pre, mid, post, direction, P) that mark causal relationships with probability P
Use the best entity pairs to extract more contexts
Use the best contexts to extract more entity pairs
BEST = highest $P(c \mid (A, B))$ and $P(c \mid (\mathrm{pre}, \mathrm{mid}, \mathrm{post}))$
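The two record types named on this slide can be written down as small immutable records. This is a minimal Python sketch, not the authors' code: the class names EntityPair and Context, the field p, and the +1/-1 encoding of direction are hypothetical, and "best" is taken literally as "highest estimated probability".

```python
from dataclasses import dataclass
from typing import List, Sequence, TypeVar


@dataclass(frozen=True)
class EntityPair:
    a: str          # entity A (hypothesized cause)
    b: str          # entity B (hypothesized effect)
    p: float = 0.0  # estimated probability that A -> B is causal


@dataclass(frozen=True)
class Context:
    pre: str        # words before the first entity
    mid: str        # words between the two entities
    post: str       # words after the second entity
    direction: int  # hypothetical encoding: +1 if A precedes B, -1 otherwise
    p: float = 0.0  # estimated probability that this context marks causality


T = TypeVar("T", EntityPair, Context)


def keep_best(items: Sequence[T], n: int) -> List[T]:
    """Return the n items with the highest estimated probability p."""
    return sorted(items, key=lambda x: x.p, reverse=True)[:n]
```

Because both record types expose the same p field, one ranking helper serves both the entity-pair list and the context list.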

3
Examples
Toxic dermatitis due to TB 1 therapy. (marked spans: pre, mid, post)
In 518 cases, a single factor caused the abortion (marked spans: pre, mid, post)
Risk of cancer persists for years following smoking cessation. (marked spans: mid, post)

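Given a sentence and the token positions of its two entities, a (pre, mid, post, direction) context can be read off directly. This is a minimal Python sketch under stated assumptions: input is whitespace-tokenized, entity spans are half-open token ranges, direction is encoded as +1 when A precedes B and -1 otherwise, and the hypothetical window parameter caps how many pre- and post-context tokens are kept. A fixed-size window is one simple choice, not the authors' method.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open [start, end) token range of one entity


def extract_context(
    tokens: List[str], a: Span, b: Span, window: int = 2
) -> Tuple[str, str, str, int]:
    """Split a sentence around entities A and B into (pre, mid, post, direction).

    direction is +1 when A appears before B, -1 otherwise (hypothetical encoding).
    pre and post keep at most `window` tokens each.
    """
    if a[0] <= b[0]:
        first, second, direction = a, b, 1
    else:
        first, second, direction = b, a, -1
    if first[1] > second[0]:
        raise ValueError("entity spans overlap")
    pre = tokens[max(0, first[0] - window):first[0]]
    mid = tokens[first[1]:second[0]]
    post = tokens[second[1]:second[1] + window]
    return " ".join(pre), " ".join(mid), " ".join(post), direction
```

For instance, tokenizing the third example sentence and passing "smoking" as A and "cancer" as B yields a context whose mid part is "persists for years following" with direction -1. The choice of entities here is illustrative and is not taken from the slide's own markup.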
4
Algorithm

Input: ContextList seeds, EntityPairList seeds
1 Use all contexts in ContextList to compute
$$P(c \mid (A_i, B_i)) = \sum_k P(c \mid S_k)\, P(S_k \mid (A_i, B_i))$$
2 Keep the N best (A_i, B_i) in EntityPairList
3 Use all entity pairs in EntityPairList to compute
$$P(c \mid S_i) = \sum_k P(c \mid (A_k, B_k))\, P((A_k, B_k) \mid S_i)$$
4 Keep the N best S_i in ContextList for use in step 1
5 Increment N and return to step 1
where |·| denotes a co-occurrence count:
$$P(S_k \mid (A_i, B_i)) = \frac{|S_k, (A_i, B_i)|}{|(A_i, B_i)|}, \qquad P((A_i, B_i) \mid S_k) = \frac{|S_k, (A_i, B_i)|}{|S_k|}$$
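The five steps and the two count-based estimates on this slide can be sketched as one loop. This is a minimal Python sketch, not the authors' implementation, and it rests on several assumptions: the corpus has already been reduced to a list of (context, entity pair) co-occurrences, one per sentence; seeds are given probability 1; N grows by a hypothetical n_step each round; and the loop stops after a hypothetical fixed number of iterations. When only entity-pair seeds are supplied, the first round skips steps 1 and 2.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple

Pair = Tuple[str, str]  # (A, B) entity pair
Ctx = Hashable          # any hashable context key, e.g. (pre, mid, post, direction)


def top(scores: Dict, k: int) -> Dict:
    """Keep the k highest-scoring entries (ties keep insertion order)."""
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k])


def bootstrap(
    observations: Iterable[Tuple[Ctx, Pair]],
    context_seeds: Iterable[Ctx],
    pair_seeds: Iterable[Pair],
    n: int = 2,
    n_step: int = 1,      # hypothetical: how much N grows per round
    iterations: int = 3,  # hypothetical stopping rule
) -> Tuple[Dict[Ctx, float], Dict[Pair, float]]:
    obs = list(observations)
    joint = Counter(obs)                     # |S_k, (A_i, B_i)|
    ctx_count = Counter(s for s, _ in obs)   # |S_k|
    pair_count = Counter(p for _, p in obs)  # |(A_i, B_i)|

    # Assumption: seeds are treated as certainly causal, P(c | seed) = 1.
    ctx_scores: Dict[Ctx, float] = {s: 1.0 for s in context_seeds}
    pair_scores: Dict[Pair, float] = {p: 1.0 for p in pair_seeds}

    for _ in range(iterations):
        if ctx_scores:
            # Step 1: P(c | pair) = sum_k P(c | S_k) * P(S_k | pair)
            new_pairs: Dict[Pair, float] = {}
            for (s, p), cnt in joint.items():
                if s in ctx_scores:
                    new_pairs[p] = new_pairs.get(p, 0.0) + ctx_scores[s] * cnt / pair_count[p]
            # Step 2: keep the N best entity pairs
            pair_scores = top(new_pairs, n)
        # Step 3: P(c | S) = sum_k P(c | pair_k) * P(pair_k | S)
        new_ctxs: Dict[Ctx, float] = {}
        for (s, p), cnt in joint.items():
            if p in pair_scores:
                new_ctxs[s] = new_ctxs.get(s, 0.0) + pair_scores[p] * cnt / ctx_count[s]
        # Step 4: keep the N best contexts
        ctx_scores = top(new_ctxs, n)
        # Step 5: increment N and repeat
        n += n_step
    return ctx_scores, pair_scores
```

A call such as bootstrap(obs, [], [("heavy smoking", "cancer")]) starts from a single entity-pair seed; obs is whatever (context, pair) list the extraction step produces. Because each sum only runs over the currently kept contexts or pairs, every score stays between 0 and 1.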

5
Corpus and Tools

300,000 MEDLINE abstracts
Annotated with POS tags and sentence boundaries
Limited to within-sentence contexts
Use Indri IR toolkit to index annotations
Allows sentence retrieval
Structured query language
e.g. #1(<#any:nn #any:nnp> cause.vb <#any:nn #any:nnp>)

6
Observations (1)

The algorithm seems to learn well during the first few iterations
One seed: (heavy smoking, cancer)
Learns the contexts "caused by" and "due to"
Learns the entity pairs (rubber, dermatitis) and (bromobenzene, necrosis)
All this within 2-3 iterations!

7
Observations (2)

Quality rapidly degrades
Quickly converges to a specialized area of the corpus
e.g. "A quickly growing population of people with appears to be non women"
Need a better way of limiting the window of pre- and post-context, or of detecting when pre- and post-context are irrelevant
e.g. "Yet while men are increasingly kicking the habit doctors fear that induced could become as much of a threat to women in the future."
Doesn't depend on whether we start with context seeds or entity-pair seeds

8
Next Steps