Machine Learning: Symbol-based

1
Machine Learning: Symbol-based
Chapter 9
9.0 Introduction
9.1 A Framework for Symbol-based Learning
9.2 Version Space Search
9.3 The ID3 Decision Tree Induction Algorithm
9.4 Inductive Bias and Learnability
9.5 Knowledge and Learning
9.6 Unsupervised Learning
9.7 Reinforcement Learning
9.8 Epilogue and References
9.9 Exercises
Additional sources used in preparing the slides:
Jeffrey Ullman's data mining lecture notes (clustering)
Ernest Davis' lecture notes (clustering)
Dean, Allen, and Aloimonos' AI textbook (reinforcement learning)
2
Unsupervised learning
3
Conceptual Clustering
  • The clustering problem
  • Given
  • a collection of unclassified objects, and
  • a means for measuring the similarity of objects
    (distance metric),
  • find
  • classes (clusters) of objects such that some
    standard of quality is met (e.g., maximize the
    similarity of objects in the same class.)
  • Essentially, it is an approach to discover a
    useful summary of the data.

4
Conceptual Clustering (contd)
  • Essentially, it is an approach to discover a
    useful summary of the data.
  • Ideally, we would like to represent clusters and their semantic explanations. In other words, we would like to define clusters intensionally (i.e., by general rules) rather than extensionally (i.e., by enumeration).
  • For instance, compare
    • {X | X teaches AI at MTU CS}, and
    • {John Lowther, Nilufer Onder}

5
Example a cholera outbreak in London
  • Many years ago, during a cholera outbreak in
    London, a physician plotted the location of cases
    on a map. Properly visualized, the data indicated
    that cases clustered around certain
    intersections, where there were polluted wells,
    not only exposing the cause of cholera, but
    indicating what to do about the problem.

[figure: map of plotted cholera cases clustering around certain intersections with polluted wells]
6
Higher dimensional examples
  • The observation that customers who buy diapers are more likely than average to also buy beer allowed supermarkets to place beer and diapers nearby, knowing many customers would walk between them. Placing potato chips between them increased the sales of all three items.

7
Higher dimensional examples (contd)
  • Skycat clustered 2 x 10^9 sky objects into stars,
    galaxies, quasars, etc. Each object was a point
    in a space of 7 dimensions, with each dimension
    representing radiation in one band of the
    spectrum. The Sloan Sky Survey is a more
    ambitious attempt to catalog and cluster the
    entire visible universe. Clustering sky objects
    by their radiation levels in different bands
    allowed astronomers to distinguish between
    galaxies, nearby stars, and many other kinds of
    celestial objects.

8
Skycat software
9
Higher dimensional examples (contd)
  • Documents may be thought of as points in a high-dimensional space, where each dimension corresponds to one possible word. The position of a document in a dimension is the number of times the word occurs in the document (or just 1 if it occurs, 0 if not). Clusters of documents in this space often correspond to groups of documents on the same topic.
  • Query 'salsa' submitted to MetaCrawler returns 246 documents in 15 clusters, of which the top are:
    • Puerto Rico Latin Music (8 docs)
    • Follow Up Post York Salsa Dancers (20 docs)
    • music entertainment latin artists (40 docs)
    • hot food chiles sauces condiments companies (79 docs)
    • pepper onion tomatoes (41 docs)

10
Measuring distance
  • To discuss whether a set of points is close enough to be considered a cluster, we need a distance measure D(x,y) that tells how far apart points x and y are.
  • The usual axioms for a distance measure D are:
    1. D(x,x) = 0. A point is distance 0 from itself.
    2. D(x,y) = D(y,x). Distance is symmetric.
    3. D(x,y) ≤ D(x,z) + D(z,y). The triangle inequality.

11
K-dimensional Euclidean space
  • The distance between any two points, say a = [a1, a2, ..., ak] and b = [b1, b2, ..., bk], is given in one of the usual manners:
    1. Common distance (L2 norm): sqrt( Σ_{i=1..k} (ai - bi)^2 )
    2. Manhattan distance (L1 norm): Σ_{i=1..k} |ai - bi|
    3. Max of dimensions (L∞ norm): max_{i=1..k} |ai - bi|
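A small Python sketch of these three distance measures (illustrative only):

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance: square root of the sum of squared differences."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def l1_distance(a, b):
    """Manhattan (L1) distance: sum of absolute differences."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))

def linf_distance(a, b):
    """L-infinity distance: maximum absolute difference over all dimensions."""
    return max(abs(ai - bi) for ai, bi in zip(a, b))

# Example: two points in 3-dimensional space
a, b = (1.0, 2.0, 3.0), (4.0, 6.0, 3.0)
print(l2_distance(a, b))    # 5.0
print(l1_distance(a, b))    # 7.0
print(linf_distance(a, b))  # 4.0
```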
12
Non-Euclidean spaces
  • Here are some examples where a distance measure
    without a Euclidean space makes sense.
  • Web pages: Roughly a 10^8-dimensional space where each dimension corresponds to one word. Rather, use vectors that deal with only the words actually present in documents a and b.
  • Character strings, such as DNA sequences: Rather, use a metric based on the LCS (Longest Common Subsequence).
  • Objects represented as sets of symbolic, rather than numeric, features: Rather, base similarity on the proportion of features that they have in common.

13
Non-Euclidean spaces (contd)
  • object1 = {small, red, rubber, ball}
  • object2 = {small, blue, rubber, ball}
  • object3 = {large, black, wooden, ball}
  • similarity(object1, object2) = 3/4
  • similarity(object1, object3) = similarity(object2, object3) = 1/4
  • Note that it is possible to assign different weights to features.
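A minimal Python sketch of this similarity measure; each object is represented here by named features, and the feature names (size, color, material, shape) are illustrative assumptions, since the slides list only the feature values:

```python
def similarity(x, y, weights=None):
    """Proportion of features on which two objects agree (optionally weighted)."""
    attrs = x.keys() & y.keys()                 # features both objects define
    if weights is None:
        weights = {a: 1.0 for a in attrs}       # equal weights by default
    total = sum(weights[a] for a in attrs)
    matched = sum(weights[a] for a in attrs if x[a] == y[a])
    return matched / total

# Hypothetical feature names; the slides only list the feature values.
object1 = {"size": "small", "color": "red",   "material": "rubber", "shape": "ball"}
object2 = {"size": "small", "color": "blue",  "material": "rubber", "shape": "ball"}
object3 = {"size": "large", "color": "black", "material": "wooden", "shape": "ball"}

print(similarity(object1, object2))   # 0.75
print(similarity(object1, object3))   # 0.25
```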

14
Approaches to Clustering
  • Broadly specified, there are two classes of clustering algorithms:
  • 1. Centroid approaches: We guess the centroid, or central point, of each cluster, and assign points to the cluster of their nearest centroid.
  • 2. Hierarchical approaches: We begin by assuming that each point is a cluster by itself. We repeatedly merge nearby clusters, using some measure of how close two clusters are (e.g., the distance between their centroids), or how good a cluster the resulting group would be (e.g., the average distance of points in the cluster from the resulting centroid).
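A minimal Python sketch of the hierarchical (agglomerative) idea: start with one cluster per point and repeatedly merge the two clusters whose centroids are closest, until k clusters remain (illustrative only):

```python
def centroid(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def dist(p, q):
    """Euclidean distance between two 2-D points."""
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def agglomerate(points, k):
    """Start with one cluster per point; repeatedly merge the two closest clusters."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # find the pair of clusters whose centroids are nearest to each other
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: dist(centroid(clusters[ij[0]]),
                                              centroid(clusters[ij[1]])))
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters

print(agglomerate([(0, 0), (0, 1), (5, 5), (5, 6), (9, 0)], k=3))
# [[(0, 0), (0, 1)], [(5, 5), (5, 6)], [(9, 0)]]
```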

15
The k-means algorithm
  • Pick k cluster centroids.
  • Assign points to clusters by picking the closest
    centroid to the point in question. As points are
    assigned to clusters, the centroid of the cluster
    may migrate.
  • Example: Suppose that k = 2 and we assign points 1, 2, 3, 4, 5, in that order. Outline circles represent points, filled circles represent centroids.

[figure: five points labeled 1-5 and two centroids]
16
The k-means algorithm example (contd)
[figure: successive assignments of points 1-5 to the two clusters, showing the centroids migrating]
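A Python sketch of the k-means idea (illustrative; this batch variant assigns all points and then recomputes the centroids, whereas the slide describes assigning points one at a time while the centroids migrate):

```python
import math

def kmeans(points, k, iterations=10):
    """Batch k-means: assign every point to its nearest centroid, then recompute centroids."""
    centroids = list(points[:k])            # naive initialization: the first k points as seeds
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign the point to the cluster of its nearest centroid
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # move each centroid to the mean of the points assigned to it
        for i, cluster in enumerate(clusters):
            if cluster:                     # keep the old centroid if a cluster went empty
                centroids[i] = tuple(sum(c) / len(cluster) for c in zip(*cluster))
    return centroids, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = kmeans(points, k=2)
print(centroids)    # approximately [(1.33, 1.33), (8.33, 8.33)]
```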
17
Issues
  • How to initialize the k centroids? Pick points sufficiently far away from any other centroid, until there are k.
  • As computation progresses, one can decide to
    split one cluster and merge two, to keep the
    total at k. A test for whether to do so might be
    to ask whether doing so reduces the average
    distance from points to their centroids.
  • Having located the centroids of k clusters, we
    can reassign all points, since some points that
    were assigned early may actually wind up closer
    to another centroid, as the centroids move about.

18
Issues (contd)
  • How to determine k? One can try different values for k until finding the smallest k such that increasing k does not much decrease the average distance of points to their centroids.

[figure: a set of points forming three apparent clusters]
19
Issues (contd)
• When k = 1, all the points are in one cluster, and the average distance to the centroid will be high.
• When k = 2, one of the clusters will be by itself and the other two will be forced into one cluster. The average distance of points to the centroid will shrink considerably.
[figure: the same points grouped into one cluster and into two clusters]
20
Issues (contd)
• When k = 3, each of the apparent clusters should be a cluster by itself, and the average distance from the points to their centroids shrinks again.
• When k = 4, one of the true clusters will be artificially partitioned into two nearby clusters. The average distance to centroid will drop a bit, but not much.
[figure: the same points grouped into three clusters and into four clusters]
21
Issues (contd)
[figure: average radius as a function of k, dropping sharply until k = 3 and flattening afterwards]
  • This failure to drop further suggests that k = 3 is right. This conclusion can be made even if the data is in so many dimensions that we cannot visualize the clusters.

22
The CLUSTER/2 algorithm
  • 1. Select k seeds from the set of observed
    objects. This may be done randomly or according
    to some selection function.
  • 2. For each seed, using that seed as a positive
    instance and all other seeds as negative
    instances, produce a maximally general definition
    that covers all of the positive and none of the
    negative instances (multiple classifications of
    non-seed objects are possible.)

23
The CLUSTER/2 algorithm (contd)
  • 3. Classify all objects in the sample according to these descriptions. Replace each maximally general description with a maximally specific description that covers all objects in the category (to decrease the likelihood that classes overlap on unseen objects).
  • 4. Adjust remaining overlapping definitions.
  • 5. Using a distance metric, select an element
    closest to the center of each class.
  • 6. Repeat steps 1-5 using the new central
    elements as seeds. Stop when clusters are
    satisfactory.

24
The CLUSTER/2 algorithm (contd)
  • 7. If clusters are unsatisfactory and no
    improvement occurs over several iterations,
    select the new seeds closest to the edge of the
    cluster.

25
The steps of a CLUSTER/2 run
26
A COBWEB clustering for four one-celled organisms (Gennari et al., 1989)
Note: we will skip the COBWEB algorithm
27
Related communities
  • data mining (in databases, over the web)
  • statistics
  • clustering algorithms
  • visualization
  • databases

28
Reinforcement Learning
  • A form of learning where the agent can explore and learn through interaction with the environment.
  • The agent learns a policy, which is a mapping from states to actions. The policy tells what the best move is in a particular state.
  • It is a general methodology: planning, decision making, and search can all be viewed in the context of reinforcement learning.

29
Tic-tac-toe a different approach
  • Recall the minimax approach: The agent knows its current state. It generates a two-layer search tree taking into account all the possible moves for itself and the opponent, backs up values from the leaf nodes, and takes the best move assuming that the opponent will also do so.
  • An alternative is to directly start playing with an opponent (the opponent does not have to be perfect, but could as well be). Assume no prior knowledge or lookahead. Assign values to states: 1 is a win, 0 is a loss or draw, 0.5 is anything else.

30
Notice that 0.5 is arbitrary; it cannot differentiate between good moves and bad moves. So the learner has no guidance initially. It engages in playing. When the game ends, if it is a win, the value 1 is propagated backwards. If it is a draw or a loss, the value 0 is propagated backwards. Eventually, earlier states will be labeled to reflect their true value. After several plays, the learner will learn the best move given a state (a policy).
31
Issues in generalizing this approach
  • How will the state values be initialized or
    propagated backwards?
  • What if there is no end to the game (infinite
    horizon)?
  • This is an optimization problem which suggests
    that it is hard. How can an optimal policy be
    learned?

32
A simple robot domain
The robot is in one of the states 0, 1, 2, 3. Each one represents an office; the offices are connected in a ring. Three actions are available:
  + moves to the next state
  - moves to the previous state
  @ remains at the same state
[figure: state-transition diagram of the four offices connected in a ring, with +, -, and @ arcs]
33
The robot domain (contd)
  • The robot can observe the label of the state it
    is in and perform any action corresponding to an
    arc leading out of its current state.
  • We assume that there is a clock governing the
    passage of time, and that at each tick of the
    clock the robot has to perform an action.
  • The environment is deterministic: there is a unique state resulting from any initial state and action. (Yes, the diagram on the previous slide is a state-transition diagram.)
  • Each state has a reward: 10 for state 3, 0 for the others.
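A minimal Python encoding of this domain (illustrative; the function names f and R follow the notation introduced on the later slides, and the action names are the slides' +, -, and @):

```python
STATES = [0, 1, 2, 3]                 # the four offices
ACTIONS = ['+', '-', '@']             # next office, previous office, stay put

def f(j, a):
    """Deterministic state-transition function for the office ring."""
    if a == '+':
        return (j + 1) % 4
    if a == '-':
        return (j - 1) % 4
    return j                          # '@' remains in the same office

def R(j):
    """Reward for ending up in state j: 10 for office 3, 0 for the others."""
    return 10 if j == 3 else 0
```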

34
Compare three policies
  • a. Every state is mapped to @.
    • The value of this policy is 0, because the robot will never get to office 3.
  • b. Every state is mapped to +. (policy 0)
    • The value of this policy is ∞, because the robot will end up in office 3 infinitely often.
  • c. Every state except 3 is mapped to +; 3 is mapped to @. (policy 1)
    • The value of this policy is also ∞, because the robot will end up in (stay in) office 3 infinitely often.

35
Compare three policies
So, it is easy to rule case a out, but how can we
show that policy 1 is better than policy 0?
  • POLICY 1
    • The average reward per tick for state 0 is 10.
    • The discounted cumulative reward for state 0 is 2.5.
  • POLICY 0
    • The average reward per tick for state 0 is 10/4.
    • The discounted cumulative reward for state 0 is 1.33.
36
Discounted cumulative reward
  • Assume that the robot associates a higher value with more immediate rewards and therefore discounts future rewards.
  • The discount rate (γ) is a number between 0 and 1 used to discount future rewards.
  • The discounted cumulative reward for a particular state with respect to a given policy is the sum, for n from 0 to infinity, of γ^n times the reward associated with the state reached after the n-th tick of the clock.

37
Discounted cumulative reward (contd)
  • Take γ = 0.5.
  • For state 0 with respect to policy 0:
    0.5^0 x 0 + 0.5^1 x 0 + 0.5^2 x 0 + 0.5^3 x 10 + 0.5^4 x 0 + 0.5^5 x 0 + 0.5^6 x 0 + 0.5^7 x 10 + ... = 1.25 + 0.078 + ... = 1.33 in the limit
  • For state 0 with respect to policy 1:
    0.5^0 x 0 + 0.5^1 x 0 + 0.5^2 x 0 + 0.5^3 x 10 + 0.5^4 x 10 + 0.5^5 x 10 + 0.5^6 x 10 + 0.5^7 x 10 + ... = 2.5 in the limit
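A short Python sketch that reproduces these sums by simulating the ring domain under each policy, truncating the infinite sum at a finite horizon (illustrative only):

```python
def discounted_reward(start, policy, f, R, gamma=0.5, horizon=50):
    """Sum of gamma^n * R(state after n ticks), truncated at a finite horizon."""
    total, state = 0.0, start
    for n in range(horizon):
        total += (gamma ** n) * R(state)
        state = f(state, policy[state])
    return total

def f(j, a):                           # office-ring transition function (as encoded earlier)
    return (j + 1) % 4 if a == '+' else (j - 1) % 4 if a == '-' else j

def R(j):                              # reward: 10 for office 3, 0 otherwise
    return 10 if j == 3 else 0

policy0 = {0: '+', 1: '+', 2: '+', 3: '+'}    # always move to the next office
policy1 = {0: '+', 1: '+', 2: '+', 3: '@'}    # move forward, then stay in office 3

print(discounted_reward(0, policy0, f, R))    # approaches 1.33
print(discounted_reward(0, policy1, f, R))    # approaches 2.5
```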

38
Discounted cumulative reward (contd)
  • Let:
    • j be a state,
    • R(j) be the reward for ending up in state j,
    • π be a fixed policy,
    • π(j) be the action dictated by π in state j,
    • f(j,a) be the next state, given that the robot starts in state j and performs action a,
    • V^π_i(j) be the estimated value of state j with respect to the policy π after the i-th iteration of the algorithm.
  • Using a dynamic programming algorithm, one can obtain a good estimate of V^π, the value function for policy π, as i → ∞.

39
A dynamic programming algorithm to compute values
for states
  • 1. For each j, set V^π_0(j) to 0.
  • 2. Set i to 0.
  • 3. For each j, set V^π_{i+1}(j) to R(j) + γ V^π_i( f(j, π(j)) ).
  • 4. Set i to i + 1.
  • 5. If i is equal to the maximum number of iterations, then return V^π_i; otherwise, return to step 3.
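A Python sketch of this iteration, applied to the office-ring domain with policy 1 (illustrative; the update is run a fixed number of times rather than tested for convergence):

```python
def evaluate_policy(policy, f, R, states, gamma=0.5, iterations=30):
    """Steps 1-5 above: V_0 = 0 everywhere, then V_{i+1}(j) = R(j) + gamma * V_i(f(j, policy(j)))."""
    V = {j: 0.0 for j in states}
    for _ in range(iterations):
        V = {j: R(j) + gamma * V[f(j, policy[j])] for j in states}
    return V

def f(j, a):                           # office-ring transition function
    return (j + 1) % 4 if a == '+' else (j - 1) % 4 if a == '-' else j

def R(j):                              # reward: 10 for office 3, 0 otherwise
    return 10 if j == 3 else 0

policy1 = {0: '+', 1: '+', 2: '+', 3: '@'}    # head for office 3 and stay there
print(evaluate_policy(policy1, f, R, [0, 1, 2, 3]))
# approximately {0: 2.5, 1: 5.0, 2: 10.0, 3: 20.0}
```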

40
Temporal credit assignment problem
  • The problem of assigning credit or blame to the
    actions in a sequence of actions where feedback
    is available only at the end of the sequence.
  • When you lose a game of chess or checkers, the
    blame for your loss cannot necessarily be
attributed to the last move you made, or even
    the next-to-the-last move.
  • Dynamic programming solves the temporal credit
assignment problem by propagating rewards
    backwards to earlier states and hence to actions
    earlier in the sequence of actions determined by
    a policy.

41
Computing an optimal policy
  • Given a method for estimating the value of states
    with respect to a fixed policy, it is possible to
    find an optimal policy. We would like to maximize
    the discounted cumulative reward.
  • Policy iteration [Howard, 1960] is an algorithm that uses the algorithm for computing the value of a state as a subroutine.

42
Policy iteration algorithm
  • 1. Let π_0 be an arbitrary policy.
  • 2. Set i to 0.
  • 3. Compute V^{π_i}(j) for each j.
  • 4. Compute a new policy π_{i+1} so that π_{i+1}(j) is the action a maximizing R(j) + γ V^{π_i}( f(j,a) ).
  • 5. If π_{i+1} = π_i, then return π_i; otherwise, set i to i + 1, and go to step 3.
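A Python sketch of policy iteration on the office-ring domain (illustrative; policy evaluation is approximated by a fixed number of iterations, and the helper names are not from the original slides):

```python
def policy_iteration(states, actions, f, R, gamma=0.5, eval_iters=30):
    """Alternate policy evaluation and greedy improvement until the policy stops changing."""
    policy = {j: actions[0] for j in states}              # step 1: an arbitrary initial policy
    while True:
        V = {j: 0.0 for j in states}                      # step 3: evaluate the current policy
        for _ in range(eval_iters):
            V = {j: R(j) + gamma * V[f(j, policy[j])] for j in states}
        new_policy = {j: max(actions, key=lambda a: R(j) + gamma * V[f(j, a)])
                      for j in states}                    # step 4: greedy improvement
        if new_policy == policy:                          # step 5: stop when the policy is stable
            return policy, V
        policy = new_policy

def f(j, a):                                              # office-ring transition function
    return (j + 1) % 4 if a == '+' else (j - 1) % 4 if a == '-' else j

def R(j):                                                 # reward: 10 for office 3, 0 otherwise
    return 10 if j == 3 else 0

best, V = policy_iteration([0, 1, 2, 3], ['+', '-', '@'], f, R)
print(best)   # {0: '-', 1: '+', 2: '+', 3: '@'}
```

Because the offices form a ring, office 3 is adjacent to office 0 in both directions, so the computed policy moves the robot backwards from office 0; for the other offices it agrees with policy 1 from the earlier slides.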

43
Policy iteration algorithm (contd)
  • A policy π is said to be the optimal policy if there is no other policy π′ and state j such that V^{π′}(j) > V^π(j) while V^{π′}(k) ≥ V^π(k) for all k ≠ j.
  • The policy iteration algorithm is guaranteed to
    terminate in a finite number of steps with an
    optimal policy.

44
Comments on reinforcement learning
  • A general model where an agent can learn to
    function in dynamic environments
  • The agent can learn while interacting with the
    environment
  • No prior knowledge except the (probabilistic)
    transitions is assumed
  • Can be generalized to stochastic domains (an
    action might have several different probabilistic
    consequences, i.e., the state-transition function
    is not deterministic)
  • Can also be generalized to domains where the
    reward function is not known

45
Famous example: TD-Gammon (Tesauro, 1995)
  • Learns to play Backgammon.
  • Immediate reward: +100 if win, -100 if lose, 0 for all other states.
  • Trained by playing 1.5 million games against itself (several weeks).
  • Now approximately equal to the best human player (won the World Cup of Backgammon in 1992; among the top 3 since 1995).
  • Predecessor: NeuroGammon [Tesauro and Sejnowski, 1989] learned from examples of labelled moves (very tedious for a human expert).

46
Other examples
  • Robot learning to dock on a battery charger
  • Pole balancing
  • Elevator dispatching [Crites and Barto, 1995]: better than industry standard
  • Inventory management [Van Roy et al.]: 10-15% improvement over industry standards
  • Job-shop scheduling for NASA space missions [Zhang and Dietterich, 1997]
  • Dynamic channel assignment in cellular phones [Singh and Bertsekas, 1994]
  • Robotic soccer

47
Common characteristics
  • delayed reward
  • opportunity for active exploration
  • possibility that state only partially observable
  • possible need to learn multiple tasks with same
    sensors/effectors
  • there may not be an adequate teacher