Learning Causal Relationships From Medical Abstracts

Slides: 11
Provided by: jone51
Learn more at: http://www.cs.cmu.edu
Transcript and Presenter's Notes


1
Learning Causal Relationships From Medical Abstracts
  • Jonathan Elsas
  • Jaime Arguello

2
Goal
  • Extract entity pairs (A, B, P), where it is said
    that A → B with Pr = P
  • Extract contexts (pre, mid, post, direction, P)
    that mark causal relationships with Pr = P
  • Use the best entity pairs to extract more
    contexts
  • Use the best contexts to extract more entity
    pairs
  • Best = highest $P(C \mid (A, B))$ and
    $P(C \mid (\text{pre}, \text{mid}, \text{post}))$
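
The two kinds of records named in the goal could be represented roughly as follows. This is only a minimal sketch: the type names, the field names, the string encoding of `direction`, and the helper `n_best` are all assumptions made for illustration, not part of the original slides.

```python
from typing import NamedTuple


class EntityPair(NamedTuple):
    """An (A, B, P) triple: A is said to cause B with probability P."""
    cause: str   # A
    effect: str  # B
    prob: float  # P


class Context(NamedTuple):
    """A (pre, mid, post, direction, P) pattern that marks a causal relation."""
    pre: str        # text before the first entity
    mid: str        # text between the two entities
    post: str       # text after the second entity
    direction: str  # assumed encoding: "forward" if the cause is mentioned first
    prob: float     # P


def n_best(items, n):
    """Return the n items with the highest probability, best first."""
    return sorted(items, key=lambda item: item.prob, reverse=True)[:n]
```

Both record types expose a `prob` field, so the same `n_best` helper can rank either entity pairs or contexts.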

3
Examples
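  • Toxic dermatitis due to TB 1 therapy.
    (pre, mid, and post contexts marked on the slide)
  • In 518 cases, a single factor caused the abortion
    (pre, mid, and post contexts marked on the slide)
  • Risk of cancer persists for years following
    smoking cessation.
    (mid and post contexts marked on the slide)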
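
As a rough illustration of what pre, mid, and post mean, the sketch below splits a sentence around two entity mentions. The function name `extract_context`, the `window` parameter, and the choice of which substrings count as the entities are assumptions for this example; the slides do not say how entities were located.

```python
def extract_context(sentence, first, second, window=3):
    """Split a sentence into (pre, mid, post) around two entity mentions.

    `first` must occur before `second` in the sentence; otherwise None is
    returned. `window` caps the number of whitespace-separated tokens kept
    in the pre- and post-context.
    """
    start_a = sentence.find(first)
    if start_a < 0:
        return None
    end_a = start_a + len(first)
    start_b = sentence.find(second, end_a)
    if start_b < 0:
        return None
    end_b = start_b + len(second)

    pre_tokens = sentence[:start_a].split()
    post_tokens = sentence[end_b:].split()
    # Keep only the tokens nearest to the entities.
    pre = " ".join(pre_tokens[max(0, len(pre_tokens) - window):])
    mid = sentence[end_a:start_b].strip()
    post = " ".join(post_tokens[:window])
    return pre, mid, post


example = extract_context(
    "In 518 cases, a single factor caused the abortion",
    "a single factor",
    "the abortion",
)
```

Here `example` holds the pre-context "In 518 cases,", the mid-context "caused", and an empty post-context.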

4
Algorithm
  • Input: ContextList seeds, EntityPairList seeds
  • 1. Use all contexts in ContextList to compute
    $P(c \mid (A_i, B_i)) = \sum_k P(c \mid S_k)\, P(S_k \mid (A_i, B_i))$
  • 2. Keep the N best $(A_i, B_i)$ in EntityPairList
  • 3. Use all entity pairs in EntityPairList to
    compute
    $P(c \mid S_i) = \sum_k P(c \mid (A_k, B_k))\, P((A_k, B_k) \mid S_i)$
  • 4. Keep the N best $S_i$ in ContextList, to be used in step 1
  • 5. Increment N and return to step 1
  • $P(S_k \mid (A_i, B_i)) = |S_k, (A_i, B_i)| \,/\, |(A_i, B_i)|$
  • $P((A_i, B_i) \mid S_k) = |S_k, (A_i, B_i)| \,/\, |S_k|$
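
The loop above can be sketched in a few lines of Python. This is an illustrative reading of the slide, not the authors' implementation: the function name `bootstrap`, the input format (a list of observed `(context, entity_pair)` co-occurrences), and the decision to skip steps 1 and 2 when no context seeds are given are all assumptions. The count of a pair or a context is approximated here by summing its co-occurrence counts.

```python
from collections import Counter


def bootstrap(cooccurrences, seed_contexts, seed_pairs, n_start=1, iterations=3):
    """Alternate between scoring entity pairs and scoring contexts.

    cooccurrences: iterable of (context, pair) observations from the corpus.
    seed_contexts: dict mapping context -> P(c | context).
    seed_pairs:    dict mapping pair -> P(c | pair).
    """
    joint = Counter(cooccurrences)  # |S_k, (A_i, B_i)|
    ctx_count = Counter()           # |S_k|
    pair_count = Counter()          # |(A_i, B_i)|
    for (ctx, pair), count in joint.items():
        ctx_count[ctx] += count
        pair_count[pair] += count

    contexts = dict(seed_contexts)
    pairs = dict(seed_pairs)
    n = n_start
    for _ in range(iterations):
        if contexts:  # assumption: with no contexts yet, keep the seed pairs
            # Step 1: P(c | pair) = sum_k P(c | S_k) * P(S_k | pair)
            pair_scores = Counter()
            for (ctx, pair), count in joint.items():
                if ctx in contexts:
                    pair_scores[pair] += contexts[ctx] * count / pair_count[pair]
            # Step 2: keep the N best entity pairs
            pairs = dict(pair_scores.most_common(n))

        # Step 3: P(c | S_i) = sum_k P(c | pair_k) * P(pair_k | S_i)
        ctx_scores = Counter()
        for (ctx, pair), count in joint.items():
            if pair in pairs:
                ctx_scores[ctx] += pairs[pair] * count / ctx_count[ctx]
        # Step 4: keep the N best contexts
        contexts = dict(ctx_scores.most_common(n))

        # Step 5: widen the beam and repeat
        n += 1
    return contexts, pairs


toy_corpus = [
    ("caused by", ("smoking", "cancer")),
    ("caused by", ("smoking", "cancer")),
    ("caused by", ("rubber", "dermatitis")),
    ("due to", ("rubber", "dermatitis")),
]
learned_contexts, learned_pairs = bootstrap(
    toy_corpus, seed_contexts={}, seed_pairs={("smoking", "cancer"): 1.0}, iterations=2
)
```

On this toy input, a single seed pair is enough to pick up the context "caused by" in the first pass, which in turn brings in the pair (rubber, dermatitis) and the context "due to" in the second.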

5
Corpus and Tools
  • 300,000 MEDLINE abstracts
  • Annotated with POS tags and sentence boundaries
  • Limited to within-sentence contexts
  • Use Indri IR toolkit to index annotations
  • Allows sentence retrieval
  • Structured query language
  • e.g. #1(<#any:nn #any:nnp> cause.vb <#any:nn
    #any:nnp>)

6
Observations (1)
  • The algorithm seems to learn well during the first
    few iterations
  • One seed: (heavy smoking, cancer)
  • Learns the contexts "caused by" and "due to"
  • Learns entity pairs:
  • (rubber, dermatitis), (bromobenzene, necrosis)
  • All this within 2-3 iterations!

7
Observations (2)
  • Quality rapidly degrades
  • Quickly converges to a specialized area of the
    corpus.
  • "A quickly growing population of people with
    _____ appears to be non _____ women"
  • Need a better way of limiting the window of pre-
    and post-context, or of detecting when pre- and
    post-context are irrelevant
  • "Yet while men are increasingly kicking the habit
    doctors fear that _____ induced _____ could become as
    much of a threat to women in the future."
  • Doesn't depend on whether we start with
    context seeds or entity-pair seeds

8
Next Steps
  • More Data!
  • 300,000 abstracts is way too sparse
  • Several million abstracts available from NLM
  • Add better entity extraction and
    canonicalization
  • Problem: the NLM Metathesaurus needs 26 GB!

9
Last Minute Google Results
  • Pre-context seems to help a lot
  • e.g. "studies show that _____ causes"
  • Some good contexts learned:
  • "null is less in people who quit null"
  • "null is the major single cause of null"
  • "null deaths in 2002 attributed to passive
    null"
  • "null accounts for at least 30% of all null"
  • Need to find a way to transform learned contexts
    into a more general form

10
Ideas?