HLTHINFO 730 Healthcare Decision Support Systems Lecture 6: Decision Trees - PowerPoint PPT Presentation

HLTHINFO 730 Healthcare Decision Support Systems
Lecture 6: Decision Trees
  • Lecturer: Prof Jim Warren

Decision Trees
  • Essentially flowcharts
  • A natural order of micro decisions (Boolean
    yes/no decisions) to reach a conclusion
  • In simplest form all you need is:
  • A start (marked with an oval)
  • A cascade of Boolean decisions (each with exactly
    two outbound branches)
  • A set of decision nodes (marked with ovals),
    representing all the leaves of the decision
    tree (no outbound branches)

  • Consider this fragment of the Prostate Cancer
    Workup (Evaluation) decision tree from

The page also shows supporting text: "Additional
testing is recommended for men expected to live 5
or more years or who have symptoms from the
cancer. For example, if the tumor is T1 or T2, a
bone scan is recommended if the PSA level is
greater than 20 or if the Gleason score is
greater than 8. A bone scan is also recommended
if the man has any symptoms, or the cancer is
growing outside the prostate (T3 or T4). A CT or
MRI of the pelvis is recommended when the tumor
is T1 or T2 and there is a 7% or greater chance
of lymph node spread based on the Partin tables,
or the tumor is growing outside the prostate (T3
or T4)."
KE problems for flowchart
  • The natural language may pack a lot in
  • E.g., "any one of the following"
  • Even harder if they say "two or more of the
    following", which implies they mean to compute
    some score and then ask if it's >= 2
  • Incompleteness
  • There are logically possible (and, worse,
    physically possible) cases that aren't handled
  • The "for example" in the text is a worry
  • Inconsistency
  • Are we trying to reach one decision (which test)
    or a set of decisions?
  • 1) whether to do a bone scan
  • 2) whether to do a CT or MRI
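
One way to deal with the inconsistency worry is to treat the two questions as separate Boolean functions. The following is a minimal Python sketch, assuming the thresholds exactly as quoted in the supporting text. The `Workup` record and its field names are illustrative, the life-expectancy precondition is left out, and none of this is verified clinical logic.

```python
from dataclasses import dataclass


@dataclass
class Workup:
    """Inputs named in the quoted text; the field names are illustrative."""
    stage: str           # "T1", "T2", "T3" or "T4"
    psa: float
    gleason: int
    has_symptoms: bool
    lns_high: bool       # Partin tables put lymph node spread at or above the quoted cut-off


def needs_bone_scan(w: Workup) -> bool:
    early = w.stage in ("T1", "T2")
    late = w.stage in ("T3", "T4")
    # "greater than 8" is coded literally, as the text states it
    return (early and (w.psa > 20 or w.gleason > 8)) or w.has_symptoms or late


def needs_pelvic_ct_or_mri(w: Workup) -> bool:
    early = w.stage in ("T1", "T2")
    late = w.stage in ("T3", "T4")
    return (early and w.lns_high) or late


patient = Workup(stage="T2", psa=25.0, gleason=6, has_symptoms=False, lns_high=False)
print(needs_bone_scan(patient), needs_pelvic_ct_or_mri(patient))
```

Keeping one function per decision means each can be checked for completeness on its own, instead of forcing both answers into the terminals of a single flowchart.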

Let's try it anyway
  • What's said for staging workup looks like this

Legend: S2 = step 2: CT or MRI of pelvis; BS =
bone scan; S3 = step 3: "All others: no additional
testing"; LNS = 7% or greater chance of lymph
node spread based on the Partin tables.
Please don't decide your father's prostate follow-up
from this! It's unverified, and I don't think a
tumour can be T1 or T2 and ALSO T3 or T4 (but
that's what it says!)
[Flowchart figure; branch labels include "T1 or T2" and "T3 or T4"]
Decision Tables
  • As you can see from the Prostate example, a
    flowchart can get huge
  • We can pack more into a smaller space if we
    relinquish some control on indicating the order
    of microdecisions
  • A decision table has
  • One row per rule
  • One column per decision variable
  • An additional column for the decision to take
    when that rule evaluates to true
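
A decision table with this shape maps directly onto a small data structure. The sketch below is an illustration only: the variables `a` and `b`, the action names, and the use of `None` for a cell whose value does not matter are all assumptions made for the example.

```python
from typing import Optional

# One entry per row (rule): the required value of each decision variable,
# then the decision to take. None means the variable does not matter.
Rule = tuple[dict[str, Optional[bool]], str]

RULES: list[Rule] = [
    ({"a": True,  "b": True},  "action 1"),
    ({"a": True,  "b": False}, "action 2"),
    ({"a": False, "b": None},  "action 3"),
]


def matching_decisions(facts: dict[str, bool], rules: list[Rule]) -> list[str]:
    """Return the decision of every rule whose conditions all hold."""
    hits = []
    for conditions, decision in rules:
        if all(want is None or facts[var] == want
               for var, want in conditions.items()):
            hits.append(decision)
    return hits


print(matching_decisions({"a": False, "b": True}, RULES))
```

Returning a list rather than a single value makes it visible when a table lets no rule, or more than one rule, evaluate to true.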

Decision Table example
d = doesn't matter (True or False)
From van Bemmel & Musen, Ch 15
Flowcharts v. Tables
  • Decision table is not as natural as a flowchart
  • But as we've seen, a "real" (complete and
    consistent) flowchart ends up very large (or
    representing a very small decision)
  • Decision table gets us close to production rules
  • Good as design specification to take to an expert
    system shell
  • Completeness is more evident with a flowchart
  • Decision table could allow for multiple rules to
    simultaneously evaluate to true
  • Messy on a flowchart (need multiple charts, or
    terminals that include every possible combination
    of decision outcomes)
  • Applying either in practice requires KE in a
    broad sense
  • E.g., may need to reformulate the goals of the

On to production rule systems
  • In a production rule system we have
    decision-table-like rules, but also the decision
    outcomes can feed back to the decision variables
  • Evaluating some special decision rule (or rules)
    is then the goal for the decision process
  • The other rules are intermediary, and might be
    part of the explanation of how externally-derived
    decision variables were used to reach a goal
  • The inference engine of the expert system shell
    chooses how to reach the goal
  • i.e., with backward chaining, or forward chaining
  • Possibly with some direction from a User
    Interface (UI) manager component (e.g., we might
    group sets of variables for input into forms as a
    web page)
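
The feedback from decision outcomes to decision variables can be sketched as a tiny forward-chaining loop. This is a minimal illustration, not how any particular expert system shell works; the fact names `p`, `q`, `r` and `goal` are placeholders.

```python
def forward_chain(facts: set[str],
                  rules: list[tuple[frozenset[str], str]]) -> set[str]:
    """Fire rules repeatedly until no rule adds a new fact."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # A rule fires when all of its premises are already known
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


RULES = [
    (frozenset({"p", "q"}), "r"),   # intermediary rule
    (frozenset({"r"}), "goal"),     # goal rule, fed by the outcome above
]

print(sorted(forward_chain({"p", "q"}, RULES)))
```

Backward chaining would start from `goal` and work out which premises still need to be established; the rule set itself stays the same.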

Boolean Algebra
  • To formulate flowchart decisions and
    (especially) decision table rows, it can help to
    have mastered Boolean algebra
  • Basic operators
  • NOT: if A is true, NOT A is false
  • AND: A AND B is only true if both A and B are
    true
  • OR: A OR B is true if either A, or B, or both
    are true (aka inclusive or)
  • This is not the place for a course on Boolean
    algebra, but a few ideas will help

  • Alas there are a lot of ways the operators are
    written
  • NOT A might appear as ¬A, ~A, A' or Ā
  • A AND B might appear as A.B, A∧B, A&B or simply
    AB
  • A OR B might appear as A+B or AvB
  • We can use parentheses like in normal algebra
  • C(A+B) means the expression is True if and only
    if C is true AND either A is true OR B is true
    (or both)
  • It's equivalent to CA + CB (C-AND-A or C-AND-B,
    evaluate AND before OR)
  • So AND is a bit like multiplication, whereas OR
    is a bit like addition
  • 1 + 1 = 1, 1 + 0 = 1 (inclusive OR)
  • 1 x 1 = 1, 1 x 0 = 0 (logical AND)
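
The equivalence of "C AND (A OR B)" and "(C AND A) OR (C AND B)" can be checked by brute force over all eight combinations of truth values. A minimal Python sketch follows; the function names `lhs` and `rhs` are just labels for the two sides.

```python
from itertools import product


def lhs(a: bool, b: bool, c: bool) -> bool:
    return c and (a or b)


def rhs(a: bool, b: bool, c: bool) -> bool:
    return (c and a) or (c and b)


# Every assignment of True/False to A, B and C
rows = list(product([False, True], repeat=3))
equivalent = all(lhs(a, b, c) == rhs(a, b, c) for a, b, c in rows)
print(len(rows), equivalent)
```

The same loop works for any pair of expressions you suspect are equivalent, which is handy when compressing decision table rows.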

  • If you just keep your head and focus on the
    meaning in the clinical domain, you can usually
    find the Boolean expression you need
  • Be sure to be precise
  • NOT (x > 43) is "x is NOT GREATER than 43", which
    is x <= 43 (get your equals in the right place!)
  • (with this advice, I won't teach you De Morgan's
    Law, truth tables, or Karnaugh maps, but feel
    free to look them up; they all Google well)
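
The point about the equals sign is easy to test at the boundary value. A small Python sketch, with sample values chosen to sit on either side of 43:

```python
samples = [42, 43, 44]

negated = [not (x > 43) for x in samples]    # NOT (x greater than 43)
at_most = [x <= 43 for x in samples]         # the correct rewrite
less_than = [x < 43 for x in samples]        # the tempting but wrong rewrite

for x, n, a, l in zip(samples, negated, at_most, less_than):
    print(x, n, a, l)
```

Only the row for 43 separates the two rewrites, which is why boundary cases belong in any check of a threshold rule.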

Venn diagrams
  • Visual representations of membership in sets
  • Can be very useful to decide what Boolean
    expression you need
  • Say A is the set of everything with two legs and
    B the set of everything that flies
  • AB would be true for a parrot
  • A would be true for a human, B would be false
  • B would be true for a mosquito, A would be false
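
Set membership maps straight onto Python's built-in set operations. A minimal sketch using the three animals above (the variable names are illustrative):

```python
two_legs = {"parrot", "human"}       # set A
can_fly = {"parrot", "mosquito"}     # set B

both = two_legs & can_fly            # A AND B: the overlap of the circles
either = two_legs | can_fly          # A OR B: everything inside either circle
legs_only = two_legs - can_fly       # A AND NOT B

print(both, sorted(either), legs_only)
```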

[Venn diagram: A = 2 legs, B = can fly]
Decision Tree Induction
  • An alternative to knowledge engineering a
    decision tree is to turn the task over to a
    machine learning algorithm
  • The decision tree can be induced (or inducted)
    from a sufficiently large set of examples
  • The ID3 algorithm is the classic for inducing a
    decision tree using Information Theory
  • If I have 50 examples where the patients survived
    and 50 where they didn't, I have total (1.0)
    entropy and zero information
  • Given a set of potential decision attributes I
    can try to create more order (less entropy, more
    information) in the data

Example Induced Decision Tree
Of course they go and use ovals for listing the
decision variables, put the test criteria on the
arcs and put leaf decisions in rectangles;
notations vary, get used to it!
From Chen et al., "Complete Blood Count as a
Surrogate CD4 Marker for HIV Monitoring in
Resource-limited Settings", 10th Conf on
Retroviruses and Opportunistic Infections, 2003.
Using Entropy measures in ID3
  • For a decision node S with a proportion $p_p$ of
    positive examples (e.g., surviving patients) and a
    proportion $p_n$ of negative examples
  • $\mathrm{Entropy}(S) = -p_p \log_2 p_p - p_n \log_2 p_n$
  • So with 15 survivors out of 25 patients
  • $\mathrm{Entropy}(S) = -(15/25) \log_2 (15/25) - (10/25) \log_2 (10/25) \approx 0.971$
  • I want to select a Boolean attribute A that
    splits S such that the two subsets are as ordered
    as possible, usually written
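
Assuming the form meant here is the standard ID3 information gain, $\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)$, with $S_v$ the cases where A takes value v, both measures can be sketched in a few lines of Python. The function names and the (positives, negatives) count pairs are illustrative choices, and the 12/2 versus 3/8 split is made-up data.

```python
from math import log2


def entropy(positives: int, negatives: int) -> float:
    """Two-class entropy in bits; 0 * log2(0) is treated as 0."""
    total = positives + negatives
    result = 0.0
    for count in (positives, negatives):
        if count:
            p = count / total
            result -= p * log2(p)
    return result


def gain(parent: tuple[int, int], subsets: list[tuple[int, int]]) -> float:
    """Entropy of the parent minus the size-weighted entropy of its subsets."""
    total = sum(parent)
    remainder = sum((p + n) / total * entropy(p, n) for p, n in subsets)
    return entropy(*parent) - remainder


print(round(entropy(15, 10), 3))   # the 15-of-25 survivors case, about 0.97
print(entropy(50, 50))             # the 50/50 case: maximum entropy
print(gain((15, 10), [(12, 2), (3, 8)]))   # hypothetical Boolean split
```

A split that separates survivors from deaths perfectly has a remainder of zero, so its gain equals the entropy of the parent node.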

ID3 continued
  • So if I have 20 available Boolean decision
    variables
  • I try splitting my cases, S, according to each,
    until I find the variable that gives the most
    Gain
  • I repeat this on each sub-tree until either every
    node is perfect (all survivors, or all deaths) or
    I run out of attributes
  • If my variables aren't Boolean, then I have more
    work to do
  • Actually, the Gain equation works fine if the
    attribute is multi-valued (Day of Week would be
    OK, I just have a 7-way split in my tree)
  • For continuous values I have to discretize:
    make one or more split points
  • e.g., SBP < 140? Now I've made continuous-valued
    blood pressure into a Boolean
  • Can be done based on knowledge (e.g., clinical
    significance), or handed to an algorithm to
    search for the max Gain
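
Handing the split point to an algorithm can be as simple as trying the midpoint between each pair of neighbouring values and keeping the one with the highest Gain. The sketch below is one minimal way to do that, not the procedure of any specific tool; the blood pressure readings and outcome labels are invented for the example.

```python
from math import log2


def entropy(labels: list[bool]) -> float:
    """Two-class entropy in bits of a list of Boolean outcomes."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return -sum(q * log2(q) for q in (p, 1 - p) if q > 0)


def best_split(values: list[float], labels: list[bool]) -> tuple:
    """Return (threshold, gain) for the Boolean test `value < threshold`."""
    base = entropy(labels)
    distinct = sorted(set(values))
    best = (None, 0.0)
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2   # candidate split point between neighbours
        below = [y for x, y in zip(values, labels) if x < t]
        above = [y for x, y in zip(values, labels) if x >= t]
        remainder = (len(below) * entropy(below)
                     + len(above) * entropy(above)) / len(labels)
        if base - remainder > best[1]:
            best = (t, base - remainder)
    return best


# Hypothetical systolic blood pressures and made-up outcomes
sbp = [110, 120, 130, 150, 160, 170]
outcome = [True, True, True, False, False, False]
print(best_split(sbp, outcome))
```

A knowledge-based cut-off and a searched one need not agree; comparing them is a useful sanity check on both the data and the clinical rule.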

See http://dms.irb.hr/tutorial/tut_dtrees.php
  • You don't find pure ID3 used too much
  • Other algorithms in a similar spirit to search
    for are C4.5 and AdaBoost
  • Tools
  • Matlab implements decision tree induction
  • Weka toolkit (from Waikato Uni) has a variety of
    Java tools for machine learning
  • Try Pierre Geurts' online decision tree induction
    applet, e.g., for animal descriptions from

I de-selected "backbone" from the available
decision attributes, hit "New Tree", then "Build",
and hit "Zoom" a couple of times (note that the
attribute order in the database affects how the
decision nodes end up phrased)
  • Decision trees are a basic design-level knowledge
    representation technique for logical (rule-based,
    Boolean-predicate-driven) decisions
  • Decision tables let you compactly compile a host
    of decisions on a fixed set of decision variables
  • These take you very close to the representation
    needed to encode production rules for an
    inference engine
  • Rule induction from data provides an alternative
    to conventional Knowledge Engineering
  • Computer figures out rules that fit past
    decisions instead of you pursuing experts to ask
    them what rules they use