Title: Induction and Decision Trees
1. Induction and Decision Trees
2. Artificial Intelligence
- The design and development of computer systems that exhibit intelligent behavior
- What is intelligence?
- The Turing test
- Developed in 1950 by Alan Turing (a pioneer of computer science)
- A computer and a human are in one room
- A human interrogator is in another room
- The interrogator asks questions; either the human or the computer answers
- If the interrogator cannot tell whether the human or the computer is answering, then the computer is intelligent
3. Classification of AI Systems
- Knowledge Representation Systems
- Capture existing expert knowledge and use it to consult end-users and provide decision support
- Main types: rule-based expert systems, case-based reasoning systems, frame-based knowledge systems, semantic networks
- Machine Learning
- Algorithms that use mathematical or logical techniques to find patterns in data and to discover or create new knowledge
- Main types: artificial neural networks, genetic algorithms, inductive decision trees, Naïve Bayesian algorithms, clustering and pattern-recognition algorithms
Data mining primarily involves the machine learning form of AI
4. Data Mining
- Textbook definition
- Knowledge discovery in databases
- Using statistical, mathematical, AI, and machine learning techniques to extract useful information, and subsequently knowledge, from large databases
- Key point: identifying patterns in large data sets
5. Microsoft SQL Server Data Mining Algorithms
- Decision Trees
- Naïve Bayesian
- Clustering
- Sequence Clustering
- Association Rules
- Neural Network
- Time Series
6. Decision Trees for Machine Learning
- Based on inductive logic
- Three types of logical structures are commonly used in AI systems:
- Deduction
- Abduction
- Induction
7. Deduction
- Premise (rule): if p then q
- Fact (axiom, observation): p
- Conclude: q
- This is classical logic (modus ponens). If the rule is correct and the fact is correct, then you know that the conclusion will be correct.
We are given the rule.
8. Abduction
- Premise (rule): if p then q
- Fact (axiom, observation): q
- Conclude: p
- This form of reasoning is a logical fallacy called affirming the consequent. The conclusion may be wrong, but it is a plausible explanation of the fact, given the rule. Useful for diagnostic tasks.
We are given the rule.
9. Induction
- 1. Observe p and q together
- 2. Observe p and q together
- ...
- n. Observe p and q together
- Conclude: if p then q
- This is stereotypical thinking, and it is highly error-prone.
We create the rule.
10. Example: Baby in the kitchen
11. ID3 Decision Tree Algorithm
- ID3 = Iterative Dichotomizer 3
- Developed by Ross Quinlan (1979)
- It is the basis for many commercial induction products
- The goal of the algorithm is to find rules resulting in YES or NO values (therefore, each generated rule has two possible outcomes)
- ID3 generates a tree in which each path represents a rule. The leaf node is the THEN part of the rule, and the nodes leading to it are the ANDed attribute-value combinations in the IF part of the rule, as the sketch below illustrates.
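To make the path-equals-rule correspondence concrete, here is a minimal Python sketch (not from the original deck): it assumes the tree is stored as a nested dict of the form {attribute: {value: subtree-or-leaf}} and walks every root-to-leaf path, printing it as an IF ... AND ... THEN rule. The tree literal below encodes the rules this deck derives on slide 19.

```python
def rules_from_tree(tree, path=()):
    """Yield one (conditions, outcome) pair per root-to-leaf path."""
    if not isinstance(tree, dict):       # leaf node: the THEN part
        yield path, tree
        return
    (attr, branches), = tree.items()     # each inner node tests one attribute
    for value, subtree in branches.items():
        yield from rules_from_tree(subtree, path + ((attr, value),))

tree = {"Touch_Oven": {"No": "No",
                       "Yes": {"Mom_In_Kitchen": {"Yes": "Yes", "No": "No"}}}}
for conditions, outcome in rules_from_tree(tree):
    print("IF", " AND ".join(f"{a} = {v}" for a, v in conditions),
          "THEN BOO_BOO =", outcome)
# IF Touch_Oven = No THEN BOO_BOO = No
# IF Touch_Oven = Yes AND Mom_In_Kitchen = Yes THEN BOO_BOO = Yes
# IF Touch_Oven = Yes AND Mom_In_Kitchen = No THEN BOO_BOO = No
```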
12. ID3 Algorithm
- Starting point
- an empty tree (this tree will eventually represent the final rules created by ID3)
- a recordset of data elements (e.g., records from a database)
- a set of attributes (fields), each with some finite number of possible values
- NOTE: one of the attributes is the decision field, with a YES or NO value (or some other two-valued option: GOOD/BAD, HIGH/LOW, WIN/LOSE, etc.)
- Output
- a tree, where each path of the tree represents a rule
13. ID3 Algorithm
- If all records in your recordset are positive (i.e., have YES values for their decision attribute), create a YES node and stop (end recursion)
- If all records in your recordset are negative, create a NO node and stop (end recursion)
- Select the attribute that best discriminates among the records (using an entropy function)
- Create a tree node representing that attribute, with n branches, where n is the number of values for the selected attribute
- Divide the records of the recordset into subsets (subrecordset 1, subrecordset 2, ..., subrecordset n), one subset for each value of the selected attribute
- Recursively apply the algorithm to each subrecordset i, with a reduced attribute set (don't include already-used attributes further down the path); see the sketch after this list
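The recursive steps above translate almost line for line into code. Below is a minimal Python sketch, assuming records are dicts keyed by attribute name; the function names, the majority-vote fallback when attributes run out, and the use of standard weighted-split entropy for the attribute-selection step are my choices, not the deck's.

```python
import math
from collections import Counter

def entropy_of(records, target):
    """Entropy (log base 2) of the decision attribute over a recordset."""
    total = len(records)
    counts = Counter(r[target] for r in records)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def split_entropy(records, attr, target):
    """Weighted entropy of the subsets produced by splitting on attr."""
    total = len(records)
    result = 0.0
    for value in {r[attr] for r in records}:
        subset = [r for r in records if r[attr] == value]
        result += (len(subset) / total) * entropy_of(subset, target)
    return result

def id3(records, attributes, target):
    outcomes = {r[target] for r in records}
    if len(outcomes) == 1:          # all YES or all NO: leaf node, stop
        return outcomes.pop()
    if not attributes:              # no attributes left: fall back to majority
        return Counter(r[target] for r in records).most_common(1)[0][0]
    # Best discriminating attribute = lowest weighted split entropy
    attr = min(attributes, key=lambda a: split_entropy(records, a, target))
    remaining = [a for a in attributes if a != attr]   # don't reuse attributes
    return {attr: {value: id3([r for r in records if r[attr] == value],
                              remaining, target)
                   for value in {r[attr] for r in records}}}
```

Applied to a hypothetical six-record recordset, built here only to match the proportions quoted on slide 16 (the actual slide-15 table is not reproduced in this text), the sketch yields the tree behind slide 19's rules:

```python
records = [
    {"Touch_Oven": "Yes", "Mom_In_Kitchen": "Yes", "BOO_BOO": "Yes"},
    {"Touch_Oven": "Yes", "Mom_In_Kitchen": "Yes", "BOO_BOO": "Yes"},
    {"Touch_Oven": "Yes", "Mom_In_Kitchen": "No",  "BOO_BOO": "No"},
    {"Touch_Oven": "No",  "Mom_In_Kitchen": "Yes", "BOO_BOO": "No"},
    {"Touch_Oven": "No",  "Mom_In_Kitchen": "Yes", "BOO_BOO": "No"},
    {"Touch_Oven": "No",  "Mom_In_Kitchen": "No",  "BOO_BOO": "No"},
]
print(id3(records, ["Touch_Oven", "Mom_In_Kitchen"], "BOO_BOO"))
# {'Touch_Oven': {'No': 'No',
#                 'Yes': {'Mom_In_Kitchen': {'Yes': 'Yes', 'No': 'No'}}}}
# (branch order may vary)
```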
14. Calculating Entropy
- Entropy: a measure of mixture (chaos)
- We want to pick the attribute with the lowest entropy
- Ideally, a particular value for the input attribute leads to ALL yes or ALL no in the outcome attribute, or comes as close to this as possible

An attribute's entropy is

$\mathrm{Entropy} = -\sum_{i=1}^{n} P(x_i)\,\log_2 P(x_i)$

where n is the total number of possible values for the attribute and $x_i$ is the i-th value.
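As a sketch, the formula above in Python (the function name and the list-of-proportions input format are my choices):

```python
import math

def attribute_entropy(proportions):
    """Entropy (log base 2) over an attribute's value proportions.

    `proportions` holds P(x_i) for each of the n values; zero
    proportions contribute nothing, since p * log2(p) -> 0 as p -> 0.
    """
    return -sum(p * math.log2(p) for p in proportions if p > 0)
```

attribute_entropy([0.5, 0.5]) returns 1.0 (maximal mixture), while attribute_entropy([1.0]) returns 0.0 (a pure, all-one-value split), matching the lowest-entropy-is-best rule above.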
15. Baby's RecordSet of Oven-Touching Experiences
16. ID3 Applied to Baby-in-the-Kitchen
- Which attribute should we start with? Based on the entropy measure (assuming log base 2):
- Touch stove: entropy = 0.918
- Mom in kitchen: entropy = 1.0
- To see this, note that:
- The probability of touching the stove leading to ouch is .67, and of it not leading to ouch is .33; .67 × .33 ≈ .22
- The probability of mom being in the kitchen leading to ouch is .5, and of mom being in the kitchen not leading to ouch is also .5; .5 × .5 = .25
- The product p(1 − p) peaks at a 50/50 split, so the larger product (.25) flags "Mom in kitchen" as the more mixed, higher-entropy attribute; "Touch stove" discriminates better
17. Applying the Touch Stove Attribute
18. Recurse: apply the Mom in Kitchen attribute where needed
19. Resulting decision rules
- If Touch_Oven = No then BOO_BOO = No
- If Touch_Oven = Yes and Mom_In_Kitchen = Yes then BOO_BOO = Yes
- If Touch_Oven = Yes and Mom_In_Kitchen = No then BOO_BOO = No
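For illustration, the three rules collapse into one small Python function (a sketch; the function name is mine):

```python
def boo_boo(touch_oven: str, mom_in_kitchen: str) -> str:
    """The three induced decision rules above, as one function."""
    if touch_oven == "No":
        return "No"                   # rule 1
    if mom_in_kitchen == "Yes":       # touch_oven is "Yes" from here on
        return "Yes"                  # rule 2
    return "No"                       # rule 3
```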
20. Now we'll do this with Microsoft SQL Server