1
Decision List
  • LING 572
  • Fei Xia
  • 1/12/06

2
Outline
  • Basic concepts and properties
  • Case study

3
Definitions
  • A decision list (DL) is an ordered list of
    conjunctive rules.
  • Rules can overlap, so the order is important.
  • A k-DL: the length of every rule is at most k.
  • A decision list determines an example's class by
    using the first matched rule.

4
An example
  • A simple DL:
  • If X1=v11 and X2=v21 then c1
  • If X2=v21 and X3=v34 then c2
  • Classify the example (v11, v21, v34): the first
    rule matches, so the class is c1.
  • The DL is a 2-DL (see the sketch below).
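A minimal sketch of first-match classification, assuming rules
are stored as (conditions, class) pairs; the attribute names
and values are illustrative, not from the slides:

  # A decision list as an ordered list of (conditions, class) pairs.
  # conditions maps attribute -> required value; the first rule
  # whose conditions all hold determines the class.
  def classify(decision_list, example, default=None):
      for conditions, label in decision_list:
          if all(example.get(a) == v for a, v in conditions.items()):
              return label
      return default  # no rule matched

  dl = [
      ({"X1": "v11", "X2": "v21"}, "c1"),
      ({"X2": "v21", "X3": "v34"}, "c2"),
  ]
  print(classify(dl, {"X1": "v11", "X2": "v21", "X3": "v34"}))  # -> c1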

5
Rivest's paper
  • It assumes that all attributes (including the goal
    attribute) are binary.
  • It shows that DLs are easily learnable from
    examples.

6
Assignment and formula
  • Input attributes: x1, …, xn
  • An assignment gives each input attribute a value
    (1 or 0), e.g., 10001
  • A boolean formula (function) maps each assignment
    to a value (1 or 0)

7
  • Two formulae are equivalent if they give the same
    value for the same input.
  • Total number of different formulae: 2^(2^n), since
    there are 2^n assignments and each can be mapped
    to 0 or 1.
  • ⇒ Classification problem: learn a formula given a
    partial (truth) table

8
CNF and DNF
  • Literal: a variable or its negation
  • Term: a conjunction (AND) of literals
  • Clause: a disjunction (OR) of literals
  • CNF (conjunctive normal form): a conjunction of
    clauses.
  • DNF (disjunctive normal form): a disjunction of
    terms.
  • k-CNF and k-DNF: each clause (resp. term) contains
    at most k literals.
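A small illustration (an example of mine, not from the slides):

  % a 2-CNF formula: a conjunction of clauses, each with at most 2 literals
  (x_1 \lor \lnot x_2) \land (x_3 \lor x_4)
  % a 2-DNF formula: a disjunction of terms, each with at most 2 literals
  (x_1 \land \lnot x_2) \lor (x_3 \land x_4)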

9
A slightly different definition of DT
  • A decision tree (DT) is a binary tree where each
    internal node is labeled with a variable, and
    each leaf is labeled with 0 or 1.
  • k-DT: the depth of the DT is at most k.
  • A DT defines a boolean formula: take the
    disjunction of the paths whose leaf node is
    labeled 1 (each path is a term).
  • An example

10
Decision list
  • A decision list is a list of pairs
  • (f1, v1), …, (fr, vr),
  • where the fi are terms and fr = true.
  • A decision list defines a boolean function:
  • given an assignment x, DL(x) = vj, where j is
    the least index s.t. fj(x) = 1.

11
Relations among different representations
  • CNF, DNF, DT, DL
  • k-CNF, k-DNF, k-DT, k-DL
  • For any k < n, k-DL is a proper superset of the
    other three.
  • Compared to DT, DL has a simple structure, but
    the complexity of the decisions allowed at each
    node is greater.

12
k-CNF and k-DNF are proper subsets of k-DL
  • k-DNF is a subset of k-DL:
  • Each term t of a DNF is converted into a decision
    rule (t, 1).
  • k-CNF is a subset of k-DL:
  • Every k-CNF formula is the complement of a k-DNF
    formula (k-CNF and k-DNF are duals of each other).
  • The complement of a k-DL is also a k-DL.
  • Neither k-CNF nor k-DNF is a subset of the other.
  • Ex: x1 ∨ x2 is in 1-DNF but cannot be written in
    1-CNF.

13
k-DT is a proper subset of k-DL
  • k-DT is a subset of k-DNF:
  • Each leaf labeled with 1 maps to a term in
    k-DNF.
  • k-DT is a subset of k-CNF:
  • Each leaf labeled with 0 maps to a clause in
    k-CNF.
  • ⇒ k-DT is a subset of k-CNF ∩ k-DNF (and hence of
    k-DL).

14
k-DT, k-CNF, k-DNF and k-DL
(Venn diagram: k-DT lies inside both k-CNF and k-DNF,
and all three lie inside k-DL)
15
Learnability
  • Positive examples vs. negative examples of the
    concept being learned.
  • In some domains, positive examples are easier to
    collect.
  • A sample is a set of examples.
  • A boolean function is consistent with a sample if
    it does not contradict any example in the sample.

16
Two properties of a learning algorithm
  • A learning algorithm is economical if it requires
    few examples to identify the correct concept.
  • A learning algorithm is efficient if it requires
    little computational effort to identify the
    correct concept.
  • ⇒ We prefer algorithms that are both economical
    and efficient.

17
Hypothesis space
  • Hypothesis space F: the set of concepts under
    consideration.
  • Ideally, the concept being learned is in the
    hypothesis space of the learning algorithm.
  • The goal of a learning algorithm is to select the
    right concept from F given the training data.

18
  • Discrepancy between two functions f and g with
    respect to the distribution Pn:
    Δ(f, g) = Pr_{x ~ Pn}[ f(x) ≠ g(x) ]
  • Ideally, we want Δ(f, g) to be as small as
    possible, i.e., at most an accuracy parameter ε.
  • To deal with bad luck in drawing examples
    according to Pn, we define a confidence
    parameter δ.

19
Polynomially learnable
  • A set of Boolean functions F is polynomially
    learnable if there exist an algorithm A and a
    polynomial function m(n, 1/ε, 1/δ) s.t.
  • when given a sample of f of size at least
    m(n, 1/ε, 1/δ) drawn according to Pn, A will with
    probability at least 1 − δ output a g in F
    s.t. Δ(f, g) ≤ ε.
  • Furthermore, A's running time is polynomially
    bounded in n and m.
  • k-DL is polynomially learnable.

20
How to build a decision list
  • Decision tree ⇒ Decision list
  • Greedy, iterative algorithm that builds DLs
    directly.

21
The algorithm in (Rivest, 1987)
  1. If the example set S is empty, halt.
  2. Examine each term of length at most k until a
     term t is found s.t. all examples in S which
     make t true are of the same type v.
  3. Add (t, v) to the decision list and remove those
     examples from S.
  4. Repeat 1-3.
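A compact sketch of this procedure, assuming binary attributes
and examples given as (assignment, label) pairs; the helper
names all_terms and satisfies are my own, not Rivest's:

  from itertools import combinations, product

  def all_terms(n, k):
      # All conjunctive terms of length <= k over x_0..x_{n-1}.
      # A term is a dict {var_index: required value}; the empty
      # dict plays the role of the term "true".
      for r in range(k + 1):
          for variables in combinations(range(n), r):
              for values in product([0, 1], repeat=r):
                  yield dict(zip(variables, values))

  def satisfies(x, term):
      # x is a tuple/list of 0/1 attribute values
      return all(x[i] == v for i, v in term.items())

  def learn_k_dl(examples, n, k):
      # examples: list of (assignment, label) pairs, binary labels
      S, dl = list(examples), []
      while S:
          for t in all_terms(n, k):
              covered = [(x, y) for x, y in S if satisfies(x, t)]
              labels = {y for _, y in covered}
              if covered and len(labels) == 1:
                  dl.append((t, labels.pop()))
                  S = [(x, y) for x, y in S if not satisfies(x, t)]
                  break
          else:
              return None  # no pure term found: no consistent k-DL
      return dl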

22
The general greedy algorithm
  • RuleList = {}, E = training_data
  • Repeat until E is empty or the gain is small:
  • f = Find_best_feature(E)
  • Let E' be the examples covered by f
  • Let c be the most common class in E'
  • Add (f, c) to RuleList
  • E = E − E'
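A sketch of this loop, where features are boolean predicates
over examples and the quality measure is assumed to be minimum
class entropy; a real implementation would also stop once the
gain becomes small:

  from collections import Counter
  from math import log2

  def entropy(labels):
      n = len(labels)
      return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

  def find_best_feature(examples, features):
      # Pick the feature whose covered examples have minimum
      # class entropy (one possible quality measure).
      best, best_score = None, float("inf")
      for f in features:
          covered = [y for x, y in examples if f(x)]
          if not covered:
              continue
          score = entropy(covered)
          if score < best_score:
              best, best_score = f, score
      return best

  def learn_dl(examples, features):
      rule_list, E = [], list(examples)
      while E:
          f = find_best_feature(E, features)
          if f is None:  # no feature covers any remaining example
              break
          E_prime = [(x, y) for x, y in E if f(x)]
          c = Counter(y for _, y in E_prime).most_common(1)[0][0]
          rule_list.append((f, c))
          E = [(x, y) for x, y in E if not f(x)]
      return rule_list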

23
Problem of greedy algorithm
  • The interpretation of rules depends on preceding
    rules.
  • Each iteration reduces the number of training
    examples.
  • Poor rule choices at the beginning of the list
    can significantly reduce the accuracy of the
    learned DL.
  • ⇒ Several papers propose alternative algorithms.

24
Summary of (Rivest, 1987)
  • Formal definition of DL
  • Show the relations among k-DT, k-CNF, k-DNF and
    k-DL.
  • Prove that k-DL is polynomially learnable.
  • Give a simple greedy algorithm to build k-DL.

25
Outline
  • Basic concepts and properties
  • Case study

26
In practice
  • Input attributes and the goal are not necessarily
    binary.
  • Ex: the previous word
  • A term ⇒ a feature (it is not necessarily a
    conjunction of literals)
  • Ex: the word appears in a k-word window
  • Only some feature types are considered, instead
    of all possible features
  • Ex: previous word and next word
  • Greedy algorithm: quality measure
  • Ex: a feature with minimum entropy

27
Case study: accent restoration
  • Task: to restore accents in Spanish and French
  • ⇒ A special case of WSD
  • Ex: ambiguous de-accented forms
  • cesse → cesse, cessé
  • cote → côté, côte, cote, coté
  • Algorithm: build a DL for each ambiguous
    de-accented form, e.g., one for cesse, another
    one for cote
  • Attributes: words within a window

28
The algorithm
  • Training:
  • Find the list of de-accented forms that are
    ambiguous.
  • For each ambiguous form, build a decision list.
  • Testing: check each word in a sentence
  • if it is ambiguous,
  • then restore the accented form according to the DL

29
Step 1: Identify forms that are ambiguous
30
Step 2: Collecting training context
Context: the previous three and the next three
words. Strip the accents from the data. Why?
31
Step 3: Measure collocational distributions
Feature types are pre-defined.
32
Collocations
33
Step 4: Rank decision rules by log-likelihood
There are many alternatives.
34
Step 5: Pruning DLs
  • Pruning
  • Cross-validation
  • Remove redundant rules: e.g., the domingo rule is
    redundant if a WEEKDAY rule precedes it.

35
Building the DL
  • For a de-accented form w, find all possible
    accented forms
  • Collect training contexts:
  • collect k words on each side of w
  • strip the accents from the data
  • Measure collocational distributions:
  • use pre-defined attribute combinations
  • Ex: -1 w, +1 w, +2 w
  • Rank decision rules by log-likelihood
  • Optional: pruning and interpolation
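A rough sketch of the ranking step for one ambiguous form: each
rule compares the most likely accented form against the next
most likely one. The counting scheme and the smoothing constant
are assumptions for illustration, not necessarily the paper's
exact formula:

  from collections import Counter, defaultdict
  from math import log

  def rank_rules(contexts, alpha=0.1):
      # contexts: list of (feature, accented_form) pairs collected
      # from the training data for one ambiguous de-accented form.
      counts = defaultdict(Counter)
      for feature, form in contexts:
          counts[feature][form] += 1
      forms = sorted({form for _, form in contexts})  # assumes >= 2 forms
      rules = []
      for feature, c in counts.items():
          top, second = sorted(forms, key=c.__getitem__, reverse=True)[:2]
          ll = log((c[top] + alpha) / (c[second] + alpha))
          rules.append((ll, feature, top))
      rules.sort(reverse=True)  # strongest evidence first
      return [(feature, form) for _, feature, form in rules]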

36
Experiments
Prior (baseline): choose the most common form.
37
Global probabilities vs. Residual probabilities
  • Two ways to calculate the log-likelihood:
  • Global probabilities: using the full data set
  • Residual probabilities: using the residual
    training data (the examples not covered by the
    preceding rules)
  • More relevant, but less data and more expensive
    to compute.
  • Interpolation: use both
  • In practice, global probabilities work better.

38
Combining vs. Not combining evidence
  • Each decision is based on a single piece of
    evidence.
  • Run-time efficiency and easy modeling
  • It works well, at least for this task, but why?
  • Combining all available evidence rarely produces
    different results.
  • The gross exaggeration of probabilities from
    combining all of these non-independent
    log-likelihoods is avoided.

39
Summary of case study
  • It allows a wider context (compared to n-gram
    methods)
  • It allows the use of multiple, highly
    non-independent evidence types (compared to
    Bayesian methods)
  • ⇒ a kitchen-sink approach of the best kind

40
Advanced topics
41
Probabilistic DL
  • DL: a rule is (f, v)
  • Probabilistic DL: a rule is
  • (f, v1/p1, v2/p2, …, vn/pn)
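A minimal sketch (my own, extending the earlier classify
example) in which the first matching rule returns a class
distribution rather than a single label:

  def classify_prob(prob_dl, example, default=None):
      # prob_dl: ordered list of (conditions, {class: prob}) pairs
      for conditions, dist in prob_dl:
          if all(example.get(a) == v for a, v in conditions.items()):
              return dist
      return default

  pdl = [({"A": 1, "B": 1}, {"c1": 0.8, "c2": 0.2})]
  print(classify_prob(pdl, {"A": 1, "B": 1}))  # -> {'c1': 0.8, 'c2': 0.2}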

42
Entropy of a feature q
(the formula distinguishes the examples where q fired
from those where q did not fire)
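One common way to write this quantity (an assumption; the
slide's exact formula is not preserved in this transcript):

  H(q) = P(q\ \text{fired})\, H(Y \mid q\ \text{fired})
       + P(q\ \text{not fired})\, H(Y \mid q\ \text{not fired}),
  \qquad H(Y \mid c) = -\sum_{y} P(y \mid c)\,\log P(y \mid c)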
43
Algorithms for building DLs
  • AQ algorithm (Michalski, 1969)
  • CN2 algorithm (Clark and Niblett, 1989)
  • Segal and Etzioni (1994)
  • Goodman (2002)

44
Summary of decision lists
  • Rules are easily understood by humans (but
    remember that the order matters)
  • DLs tend to be relatively small, and fast and
    easy to apply in practice.
  • DL is related to DT, CNF, DNF, and TBL.
  • Learning: a simple greedy algorithm and other
    improved algorithms
  • Extension: probabilistic DL
  • Ex: if A and B then (c1, 0.8), (c2, 0.2)