Machine Learning and ILP for Multi-Agent Systems

1
Machine Learning and ILP for Multi-Agent Systems
  • Daniel Kudenko Dimitar Kazakov
  • Department of Computer Science
  • University of York, UK

ACAI-01, Prague, July 2001
2
Why Learning Agents?
  • Agent designers are not able to foresee all
    situations that the agent will encounter.
  • To display full autonomy, agents need to learn from and adapt to novel environments.
  • Learning is a crucial part of intelligence.

3
A Brief History
[Diagram: from disembodied Machine Learning to Single-Agent Learning, then Multiple Single-Agent Learners and Social Multi-Agent Learners; in parallel, from Agents and Single-Agent Systems to Multiple Single-Agent Systems and Social Multi-Agent Systems]
4
Outline
  • Principles of Machine Learning (ML)
  • ML for Single Agents
  • ML for Multi-Agent Systems
  • Inductive Logic Programming for Agents

5
What is Machine Learning?
  • Definition: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." [Mitchell 97]
  • Example: T = play tennis, E = playing matches, P = score.

6
Types of Learning
  • Inductive Learning (Supervised Learning)
  • Reinforcement Learning
  • Discovery (Unsupervised Learning)

7
Inductive Learning
  • "An inductive learning system aims at determining a description of a given concept from a set of concept examples provided by the teacher and from background knowledge." [Michalski et al. 98]

8
Inductive Learning
Examples of Category C1
Examples of Category C2
Examples of Category Cn
Inductive Learning System
Hypothesis (Procedure to Classify New Examples)
9
Inductive Learning Example
Ammo = low,  Monster = near, Light = good   → Category: ¬shoot
Ammo = low,  Monster = far,  Light = medium → Category: ¬shoot
Ammo = high, Monster = far,  Light = good   → Category: shoot
Inductive Learning System
If (Ammo = high) and (Light ∈ {medium, good}) then shoot ...
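As a concrete illustration of this step, here is a minimal Python sketch that induces such a rule with a decision-tree learner. It assumes scikit-learn is available; the integer encoding of the features and the class labels are made up for the example and are not part of the original slides.

  # Sketch: supervised induction of a shoot / don't-shoot classifier
  # from feature-value examples (assumes scikit-learn is installed).
  from sklearn.tree import DecisionTreeClassifier

  # Encode the categorical features as integers:
  # ammo: low=0, high=1; monster: near=0, far=1; light: bad=0, medium=1, good=2.
  X = [[0, 0, 2],   # ammo=low,  monster=near, light=good   -> don't shoot
       [0, 1, 1],   # ammo=low,  monster=far,  light=medium -> don't shoot
       [1, 1, 2]]   # ammo=high, monster=far,  light=good   -> shoot
  y = ["dont_shoot", "dont_shoot", "shoot"]

  hypothesis = DecisionTreeClassifier().fit(X, y)

  # The learned tree is the hypothesis; use it to classify an unseen situation.
  print(hypothesis.predict([[1, 0, 1]]))   # ammo=high, monster=near, light=medium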
10
Performance Measure
  • Classification accuracy on unseen test set.
  • Alternatively: a measure that incorporates the cost of false positives and false negatives (e.g., recall/precision).

11
Where's the knowledge?
  • Example (or Object) language
  • Hypothesis (or Concept) language
  • Learning bias
  • Background knowledge

12
Example Language
  • Feature-value vectors, logic programs.
  • Which features are used to represent examples
    (e.g., ammunition left)?
  • For agents: which features of the environment are fed to the agent (or the learning module)?
  • Constructive Induction: automatic feature selection, construction, and generation.

13
Hypothesis Language
  • Decision trees, neural networks, logic programs,
  • Further restrictions may be imposed, e.g., depth
    of decision trees, form of clauses.
  • Choice of hypothesis language influences choice
    of learning methods and vice versa.

14
Learning bias
  • Preference relation between legal hypotheses.
  • Accuracy on training set.
  • Hypothesis with zero error on training data is
    not necessarily the best (noise!).
  • Occam's razor: the simpler hypothesis is the better one.

15
Inductive Learning
  • No real learning without language or learning
    bias.
  • IL is search through space of hypotheses guided
    by bias.
  • Quality of hypothesis depends on proper
    distribution of training examples.

16
Inductive Learning for Agents
  • What is the target concept (i.e., categories)?
  • Example: do(a), ¬do(a) for a specific action a.
  • Real-valued categories/actions can be
    discretized.
  • Where does the training data come from and what
    form does it take?

17
Batch vs Incremental Learning
  • Batch learning: collect a set of training examples and compute a hypothesis.
  • Incremental learning: update the hypothesis with each new training example.
  • Incremental learning is more suited for agents.

18
Batch Learning for Agents
  • When should (re-)computation of hypothesis take
    place?
  • Example: after the observed accuracy of the hypothesis drops below a threshold.
  • Which training examples should be used?
  • Example: sequences of actions that led to success.

19
Eager vs. Lazy learning
  • Eager learning: commit to the hypothesis computed after training.
  • Lazy learning: store all encountered examples and perform classification based on this database (e.g., nearest neighbour); see the sketch below.
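A minimal Python sketch of the lazy alternative: a 1-nearest-neighbour classifier that simply stores examples and defers all work to classification time. The overlap-based distance on categorical feature vectors is an assumption for illustration.

  # Sketch: lazy learning (1-nearest neighbour) over categorical feature vectors.
  def hamming(a, b):
      return sum(1 for x, y in zip(a, b) if x != y)

  class LazyLearner:
      def __init__(self):
          self.memory = []                  # learning = just storing examples

      def add_example(self, features, label):
          self.memory.append((features, label))

      def classify(self, features):         # all the work happens here
          _, label = min(self.memory, key=lambda ex: hamming(ex[0], features))
          return label

  learner = LazyLearner()
  learner.add_example(("low", "near", "good"), "dont_shoot")
  learner.add_example(("high", "far", "good"), "shoot")
  print(learner.classify(("high", "far", "medium")))   # -> "shoot"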

20
Active Learning
  • Learner decides which training data to receive
    (i.e. generates training examples and uses oracle
    to classify them).
  • Closed-loop ML: the learner suggests a hypothesis and verifies it experimentally. If the hypothesis is rejected, the collected data gives rise to a new hypothesis.

21
Black-Box vs. White-Box
  • Black-box learning: the interpretation of the learning result is unclear to a user.
  • White-box learning: creates (symbolic) structures that are comprehensible.

22
Reinforcement Learning
  • Agent learns from environmental feedback
    indicating the benefit of states.
  • No explicit teacher required.
  • Learning target: an optimal policy (i.e., a state-action mapping).
  • Optimality measure: e.g., cumulative discounted reward.

23
Q Learning
Value of a state = discounted cumulative reward:
V^π(s_t) = Σ_{i≥0} γ^i r(s_{t+i}, a_{t+i}),   0 ≤ γ < 1
γ is a discount factor (γ = 0 means that only immediate reward is considered); r(s_{t+i}, a_{t+i}) is the reward obtained by performing the actions specified by policy π.
Q(s,a) = r(s,a) + γ V(δ(s,a)), where δ(s,a) is the state reached by performing a in s.
Optimal policy: π*(s) = argmax_a Q(s,a)
24
Q Learning
Initialize all Q(s,a) to 0. In some state s choose some action a. Let s' be the resulting state and r the immediate reward. Update Q:
Q(s,a) ← r + γ max_a' Q(s',a')
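A minimal Python sketch of tabular Q-learning built around the update above. The environment interface (reset()/step()), the ε-greedy exploration, and the learning rate α (which the deterministic update on the slide does not need) are assumptions added for illustration.

  # Sketch: tabular Q-learning. Assumes an environment object with
  # reset() -> state and step(action) -> (next_state, reward, done).
  import random
  from collections import defaultdict

  def q_learning(env, actions, episodes=500, gamma=0.9, alpha=0.5, epsilon=0.1):
      Q = defaultdict(float)                # Q[(state, action)], initialised to 0
      for _ in range(episodes):
          s, done = env.reset(), False
          while not done:
              # epsilon-greedy exploration strategy (helps convergence)
              if random.random() < epsilon:
                  a = random.choice(actions)
              else:
                  a = max(actions, key=lambda x: Q[(s, x)])
              s2, r, done = env.step(a)
              # core update: Q(s,a) <- r + gamma * max_a' Q(s',a'),
              # smoothed by a learning rate alpha for stochastic rewards
              target = r + gamma * max(Q[(s2, x)] for x in actions)
              Q[(s, a)] += alpha * (target - Q[(s, a)])
              s = s2
      return Q, (lambda s: max(actions, key=lambda x: Q[(s, x)]))  # greedy policy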
25
Q Learning
  • Guaranteed convergence towards optimum
    (state-action pairs have to be visited infinitely
    often).
  • Exploration strategy can speed up convergence.
  • Basic Q-learning does not generalize: replace the state-action table with function approximation (e.g., a neural net) in order to handle unseen states.

26
Pros and Cons of RL
  • Clearly suited to agents acting and exploring an
    environment.
  • Simple.
  • Engineering of suitable reward function may be
    tricky.
  • May take a long time to converge.
  • The learning result may not be transparent (depending on the representation of the Q function).

27
Combination of IL and RL
  • Relational reinforcement learning [Dzeroski et al. 98] leads to a more general Q function representation that may still be applicable even if the goals or environment change.
  • Explanation-based learning and RL [Dietterich & Flann 95].
  • More on ILP and RL: see later.

28
Unsupervised Learning
  • Acquisition of useful or interesting patterns
    in input data.
  • Usefulness and interestingness are based on the agent's internal bias.
  • Agent does not receive any external feedback.
  • Discovered concepts are expected to improve agent
    performance on future tasks.

29
Learning and Verification
  • Need to guarantee agent safety.
  • Pre-deployment verification for non-learning
    agents.
  • What to do with learning agents?

30
Learning and Verification [Gordon 00]
  • Verification after each self-modification step.
  • Problem: time-consuming.
  • Solution 1: use property-preserving learning operators.
  • Solution 2: use learning operators which permit quick (partial) re-verification.

31
Learning and Verification
  • What to do if verification fails?
  • Repair (multi)-agent plan.
  • Choose different learning operator.

32
Learning in Multi-Agent Systems
  • Classification
  • Social Awareness.
  • Communication
  • Role Learning.
  • Distributed Learning.

33
Types of Multi-Agent Learning [Weiss & Dillenbourg 99]
  • Multiplied learning: no interference in the learning process by other agents (except for exchange of training data or outputs).
  • Divided learning: division of the learning task on a functional level.
  • Interacting learning: cooperation beyond the pure exchange of data.

34
Social Awareness
  • Awareness of existence of other agents and
    (eventually) knowledge about their behavior.
  • Not necessary to achieve near-optimal MAS behavior: rock sample collection [Steels 89].
  • Can it degrade performance?

35
Levels of Social Awareness [Vidal & Durfee 97]
  • 0-level agent: no knowledge about the existence of other agents.
  • 1-level agent: recognizes that other agents exist; models other agents as 0-level.
  • 2-level agent: has some knowledge about the behavior of other agents; models other agents as 1-level agents.
  • k-level agent: models other agents as (k-1)-level.

36
Social Awareness and Q Learning
  • 0-level agents already learn implicitly about
    other agents.
  • [Mundhe & Sen 00]: study of two Q-learning agents up to level 2.
  • Two 1-level agents display slowest and least
    effective learning (worse than two 0-level
    agents).

37
Agent models and Q Learning
  • Q: S × A^n → R, where n is the number of agents.
  • If the other agents' actions are not observable, an assumption about their actions is needed.
  • Pessimistic assumption: given an agent's action choice, the other agents will minimize the reward.
  • Optimistic assumption: the other agents will maximize the reward.

38
Agent Models and Q Learning
  • The pessimistic assumption leads to overly cautious behavior.
  • The optimistic assumption guarantees convergence towards the optimum [Lauer & Riedmiller 00]; see the sketch below.
  • If knowledge of the other agents' behavior is available, the Q-value update can be based on a probabilistic computation [Claus & Boutilier 98], but with no guarantee of optimality.
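Below is a minimal Python sketch of an optimistic independent-learner update in the spirit of [Lauer & Riedmiller 00]: each agent keeps a Q-table over its own actions only and never decreases a Q-value, which amounts to assuming the other agents choose their actions so as to maximize the reward. The interface and parameter names are illustrative assumptions.

  # Sketch: optimistic ("distributed") Q-update for one agent in a
  # cooperative MAS. Only the agent's own action a_i is indexed; updates
  # that would lower the value are discarded (blamed on the other agents).
  from collections import defaultdict

  def optimistic_update(Q, s, a_i, r, s2, own_actions, gamma=0.9):
      v2 = max(Q[(s2, b)] for b in own_actions)   # agent's current value of s'
      Q[(s, a_i)] = max(Q[(s, a_i)], r + gamma * v2)

  Q_i = defaultdict(float)   # this agent's local Q-table, initialised to 0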

39
Q Learning and Communication [Tan 93]
  • Types of communication
  • Sharing sensation
  • Sharing or merging policies
  • Sharing episodes
  • Results
  • Communication generally helps
  • Extra sensory information may hurt

40
Role Learning
  • Often useful for agents to specialize in specific
    roles for joint tasks.
  • Pre-defined roles reduce flexibility, often not
    easy to define optimal distribution, may be
    expensive.
  • How to learn roles?
  • [Prasad et al. 96]: learn an optimal distribution of pre-defined roles.

41
Q Learning of roles
  • [Crites & Barto 98], elevator domain: regular Q-learning, no specialization achieved (but highly efficient behavior).
  • [Ono & Fukumoto 96], hunter-prey domain: specialization achieved with a "greatest mass" merging strategy.

42
Q Learning of Roles [Balch 99]
  • Three types of reward function: local performance-based, local shaped, global.
  • Global reward supports specialization.
  • Local reward supports the emergence of homogeneous behaviors.
  • Some domains benefit from learning team heterogeneity (e.g., robotic soccer), others do not (e.g., multi-robot foraging).
  • Heterogeneity measure: social entropy (see the sketch below).
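A small Python sketch of such a heterogeneity measure: Shannon entropy over the distribution of agents across behavioural groups, in the spirit of [Balch 99] (his hierarchic social entropy additionally clusters behaviours; that step is omitted here). The group labels are illustrative.

  # Sketch: social entropy of a team, given each agent's behavioural group.
  from collections import Counter
  from math import log2

  def social_entropy(group_labels):
      """Shannon entropy (in bits) of the agents' distribution over groups."""
      n = len(group_labels)
      return -sum((c / n) * log2(c / n) for c in Counter(group_labels).values())

  print(social_entropy(["forager", "forager", "defender", "defender"]))  # 1.0
  print(social_entropy(["forager", "forager", "forager", "defender"]))   # ~0.81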

43
Distributed Learning
  • Motivation: agents learning a global hypothesis from local observations.
  • Application of MAS techniques to (inductive) learning.
  • Applications: distributed data mining [Provost & Kolluri 99], robotic soccer.

44
Distributed Data Mining
  • [Provost & Hennessy 96]: individual learners see only a subset of all training examples and compute a set of local rules based on these.
  • Local rules are evaluated by other learners based on their data.
  • Only rules with a good evaluation are carried over to the global hypothesis (see the sketch below).
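The following Python sketch illustrates this cooperative rule-sharing scheme schematically. The rule representation (a condition plus a predicted class), the accuracy criterion, and the threshold are illustrative assumptions, not the exact algorithm of [Provost & Hennessy 96].

  # Sketch: merge locally learned rules into a global hypothesis, keeping a
  # rule only if the other learners confirm it on their own data subsets.
  def rule_accuracy(rule, data):
      covered = [(x, y) for x, y in data if rule["condition"](x)]
      return (sum(1 for x, y in covered if y == rule["prediction"]) / len(covered)
              if covered else 0.0)

  def merge_rules(local_rule_sets, local_data_sets, threshold=0.9):
      global_hypothesis = []
      for i, rules in enumerate(local_rule_sets):
          other_data = [d for j, d in enumerate(local_data_sets) if j != i]
          for rule in rules:
              if all(rule_accuracy(rule, d) >= threshold for d in other_data):
                  global_hypothesis.append(rule)
      return global_hypothesis

  # Example of a local rule: "shoot when ammunition is high".
  example_rule = {"condition": lambda x: x["ammo"] == "high", "prediction": "shoot"}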

45
Bibliography
[Mitchell 97] T. Mitchell. Machine Learning. McGraw Hill, 1997.
[Michalski et al. 98] R.S. Michalski, I. Bratko, M. Kubat. Machine Learning and Data Mining: Methods and Applications. Wiley, 1998.
[Dietterich & Flann 95] T. Dietterich and N. Flann. Explanation-based Learning and Reinforcement Learning. In Proceedings of the Twelfth International Conference on Machine Learning, 1995.
[Dzeroski et al. 98] S. Dzeroski, L. De Raedt, and H. Blockeel. Relational Reinforcement Learning. In Proceedings of the Eighth International Conference on Inductive Logic Programming (ILP-98). Springer, 1998.
[Gordon 00] D. Gordon. Asimovian Adaptive Agents. Journal of Artificial Intelligence Research, 13, 2000.
[Weiss & Dillenbourg 99] G. Weiss and P. Dillenbourg. What is "Multi" in Multi-Agent Learning? In P. Dillenbourg (ed.), Collaborative Learning: Cognitive and Computational Approaches. Pergamon Press, 1999.
[Vidal & Durfee 97] J.M. Vidal and E. Durfee. Agents Learning about Agents: A Framework and Analysis. In Working Notes of the AAAI-97 Workshop on Multiagent Learning, 1997.
[Mundhe & Sen 00] M. Mundhe and S. Sen. Evaluating Concurrent Reinforcement Learners. In Proceedings of the Fourth International Conference on Multiagent Systems, IEEE Press, 2000.
[Claus & Boutilier 98] C. Claus and C. Boutilier. The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems. AAAI-98.
[Lauer & Riedmiller 00] M. Lauer and M. Riedmiller. An Algorithm for Distributed Reinforcement Learning in Cooperative Multi-Agent Systems. In Proceedings of the Seventeenth International Conference on Machine Learning, 2000.
46
Bibliography
[Tan 93] M. Tan. Multi-Agent Reinforcement Learning: Independent vs. Cooperative Agents. In Proceedings of the Tenth International Conference on Machine Learning, 1993.
[Prasad et al. 96] M.V.N. Prasad, S.E. Lander and V.R. Lesser. Learning Organizational Roles for Negotiated Search. International Journal of Human-Computer Studies, 48(1), 1996.
[Ono & Fukumoto 96] N. Ono and K. Fukumoto. A Modular Approach to Multi-Agent Reinforcement Learning. In Proceedings of the First International Conference on Multi-Agent Systems, 1996.
[Crites & Barto 98] R. Crites and A. Barto. Elevator Group Control Using Multiple Reinforcement Learning Agents. Machine Learning, 1998.
[Balch 99] T. Balch. Reward and Diversity in Multi-Robot Foraging. In Proceedings of the IJCAI-99 Workshop on Agents Learning About, From, and With Other Agents, 1999.
[Provost & Kolluri 99] F. Provost and V. Kolluri. A Survey of Methods for Scaling Up Inductive Algorithms. Data Mining and Knowledge Discovery, 3, 1999.
[Provost & Hennessy 96] F. Provost and D. Hennessy. Scaling Up: Distributed Machine Learning with Cooperation. AAAI-96, 1996.
47
B R E A K
48
Machine Learning and ILP for MAS Part II
  • Integration of ML and Agents
  • ILP and its potential for MAS
  • Agent Applications of ILP
  • Learning, Natural Selection and Language

50
From Machine Learning to Learning Agents
  • Machine Learning: learning as the only goal
  • Learning Agent(s): learning as one of many goals
[Diagram: a spectrum from Classic Machine Learning through Active Learning and Closed-Loop Machine Learning to Learning Agent(s)]
51
Integrating Machine Learning into the Agent
Architecture
  • Time constraints on learning
  • Synchronisation between agents' actions
  • Learning and Recall

52
Time Constraints on Learning
  • Machine Learning alone
  • predictive accuracy matters, time doesn't (just a price to pay)
  • ML in Agents
  • Soft deadlines: resources must be shared with other activities (perception, planning, control)
  • Hard deadlines imposed by the environment: "Make up your mind now!" (or they'll eat you)

53
Doing Eager vs. Lazy Learning under Time Pressure
  • Eager Learning
  • Theories typically more compact
  • and faster to use
  • Takes more time to learn: do it when the agent is idle
  • Lazy Learning
  • Knowledge acquired at (almost) no cost
  • May be much slower when a test example comes

54
Clear-cut vs. Any-time Learning
  • Consider two types of algorithms
  • Running a prescribed number of steps guarantees
    finding a solution
  • can use worst case complexity analysis to find an
    upper bound on the execution time
  • Any-time algorithms
  • a longer run may result in a better solution
  • don't know an optimal solution when they see one
  • example: Genetic Algorithms
  • policies halt learning to meet hard deadlines or
    when cost outweighs expected improvements of
    accuracy

55
Time Constraints on Learning in Simulated
Environments
  • Consider various cases
  • Unlimited time for learning
  • Upper bound on time for learning
  • Learning in real time
  • Gradually tightening the constraints makes
    integration easier
  • Not limited to simulations: real-world problems have a similar setting
  • e.g., various types of auctions

56
Synchronisation vs. Time Constraints
  • 1-move-per-round, batch update (unlimited time, upper bound): logic-based MAS for conflict simulations (Kudenko, Alonso)
  • 1-move-per-round, immediate update (unlimited time, upper bound, real time): the York MA Environment (Kazakov et al.)
  • Asynchronous: Multi-agent Progol (Muggleton)
57
Learning and Recall
  • Agent must strike a balance between
  • Learning, which updates the model of the world
  • Recall, which applies existing model of the world
    to other tasks

58
Learning and Recall (2)
Recall current model of world to choose and carry
out an action

Observe the action outcome
Update sensory information
Learn new model of the world
59
Learning and Recall (3)

Update sensory information
Recall current model of world to choose and carry
out an action
Learn new model of the world
  • In theory, the two can run in parallel
  • In practice, must share limited resources

60
Learning and Recall (4)
  • Possible strategies
  • Parallel learning and recall at all times
  • Mutually exclusive learning and recall
  • After incremental, eager learning, examples are
    discarded
  • or kept if batch or lazy learning used
  • Cheap on-the-fly learning (preprocessing),
    off-line computationally expensive learning
  • reduce raw information, change object language
  • analogy with human learning and the role of sleep

61
Machine Learning and ILP for MAS Part II
  • Integration of ML and Agents
  • ILP and its potential for MAS
  • Agent Applications of ILP
  • Learning, Natural Selection and Language

62
Machine Learning Revisited
  • ML can be seen as the task of
  • taking a set of observations represented in a
    given object/data language and
  • representing (the information in) that set in
    another language called concept/hypothesis
    language.
  • A side effect of this step the ability to deal
    with unseen observations.

63
Object and Concept Language
  • Object Language: (x, y, +/-).
  • Concept Language: any ellipse (5 parameters).

[Figure: positive (+) and negative (-) example points in the plane, with an ellipse as the learned concept]
64
Machine Learning Biases
  • The concept/hypothesis language specifies the
    language bias, which limits the set of all
    concepts/hypotheses that can be
    expressed/considered/learned.
  • The preference bias allows us to decide between
    two hypotheses if they both classify the training
    data equally.
  • The search bias defines the order in which
    hypotheses will be considered.
  • Important if one does not search the whole
    hypothesis space.

65
Preference Bias, Search Bias Version Space
  • Version space: the subset of hypotheses that have zero training error.

[Figure: the version space, bounded by the most general and the most specific concepts consistent with the + and - examples]
66
Inductive Logic Programming
  • Based on three pillars
  • Logic Programming (LP) to represent data and
    concepts (i.e., object and concept language)
  • Background Knowledge to extend the concept
    language
  • Induction as learning method

67
LP as ILP Object Language
  • A subset of First Order Predicate Logic (FOPL)
    called Logic Programming.
  • Often limited to ground facts, i.e.,
    propositional logic (cf. ID3 etc.).
  • In the latter case, data can be represented as a
    single table.

68
ILP Object Language Example
Good bargain cars and their ILP representation:
model     | mileage | price | y/n | ILP representation
BMW Z3    | 50,000  | 5000  |  +  | gbc(z3,50000,5000).
Audi V8   | 30,000  | 4000  |  +  | gbc(v8,30000,4000).
Fiat Uno  | 90,000  | 3000  |  -  | :- gbc(uno,90000,3000).
69
LP as ILP Concept Language
  • The concept language of ILP is relations
    expressed as Horn clauses, e.g.
  • equal(X,X).  greater(X,Y) :- X > Y.
  • Cf. the propositional logic representation: (arg1 = 1 ∧ arg2 = 1) or (arg1 = 2 ∧ arg2 = 2) or ...
  • Tedious for finite domains and impossible otherwise.
  • Most often there is one target predicate
    (concept) only.
  • exceptions exist, e.g., Progol 5.

70
Modes in ILP
  • Used to distinguish between
  • input attributes (mode +)
  • output attributes (mode -) of the predicate learned.
  • Mode # is used to describe attributes that must contain a constant in the predicate definition.
  • E.g., use mode car_type(+,+,#) to learn car_type(Doors,Roof,sports_car) :- Doors < 2, Roof = convertible.

73
Modes in ILP
  • Used to distinguish between
  • input attributes (mode +)
  • output attributes (mode -) of the predicate learned.
  • Mode # is used to describe attributes that must contain a constant in the predicate definition.
  • E.g., use mode car_type(-,-,#) to learn car_type(Doors,Roof,sports_car) :- (Doors = 1; Doors = 2), Roof = convertible.

74
Types in ILP
  • Specify the range for each argument
  • User-defined types are represented as unary predicates: colour(blue). colour(red). colour(black).
  • Built-in types are also provided: nat/1, real/1, any/1 in Progol.
  • These definitions may or may not be generative: colour(X) instantiates X, nat(X) does not.

75
ILP Types and Modes Example
Good bargain cars and their ILP representation (Progol):
model     | mileage | price | y/n | ILP representation
          |         |       |     | modeh(1,gbc(#model,+mileage,+price))?
BMW Z3    | 50,000  | 5000  |  +  | gbc(z3,50000,5000).
Audi V8   | 30,000  | 4000  |  +  | gbc(v8,30000,4000).
Fiat Uno  | 90,000  | 3000  |  -  | :- gbc(uno,90000,3000).
76
Positive Only Learning
  • A way of dealing with domains where no negative
    examples are available.
  • Learn the concept of non-self-destructive
    actions.
  • The trivial definition "anything belongs to the target concept" looks all right!
  • Trick: generate random examples and treat them as negative.
  • Requires generative type definitions.

77
Background Knowledge
  • Only very simple mathematical relations, such as identity and "greater than", have been used so far: equal(X,X).  greater(X,Y) :- X > Y.
  • These can also be easily hard-wired into the concept language of propositional learners.
  • ILP's big advantage: one can extend the concept language with user-defined concepts or background knowledge.

78
Background Knowledge (2)
  • The use of certain BK predicates may be a
    necessary condition for learning the right
    hypothesis.
  • Redundant or irrelevant BK slows down the
    learning.
  • Example:
  • BK: prod(Miles,Price,Threshold) :- Miles * Price < Threshold.
  • Modes: modeh(1,gbc(#model,+miles,+price))?  modeb(1,prod(+miles,+price,#threshold))?
  • Theory: gbc(z3,Miles,Price) :- prod(Miles,Price,250000001).

79
Choice of Background Knowledge
  • In an ideal world one should start from a
    complete model of the background knowledge of the
    target population. In practice, even with the
    most intensive anthropological studies, such a
    model is impossible to achieve. We do not even
    know what it is that we know ourselves. The best
    that can be achieved is a study of the directly
    relevant background knowledge, though it is only
    when a solution is identified that one can know
    what is or is not relevant.
  • The Critical Villager, Eric Dudley

80
ILP Preference Bias
  • Typically a trade-off between generality and
    complexity
  • cover as many positive examples (and as few
    negative ones) as you can
  • with as simple a theory as possible
  • Some ILP learners allow the users to specify
    their own preference bias.

81
Induction in ILP
  • Bottom-up (least general generalisation)
  • Map a term into a variable
  • Drop a literal from the clause body
  • Top-down (refinement operator)
  • Instantiate a variable
  • Add a literal to the clause body
  • Mixed techniques (e.g., Progol)

82
Example of Induction
BK: q(b). q(c).
Training examples: positive: p(b,a). p(f,g).  negative: p(i,j).
Clauses in the search space, from most general to most specific:
p(X,Y).
p(X,a).    p(X,Y) :- q(X).
p(b,a) :- q(b).
83
Induction in Progol
  • For each training example:
  • Find the most general theory (clause) ⊤
  • Find the most specific theory (clause) ⊥
  • Search the space in between in a top-down fashion

⊤ = p(X,Y).    ⊥ = p(X,a) :- q(X).
Intermediate clauses: p(X,a).    p(X,Y) :- q(X).
84
Summary of ILP Basics
  • Symbolic
  • Eager
  • Knowledge-oriented (white-box) learner
  • Complex, flexible hypothesis space
  • Based on Induction

85
Learning Pure Logic Programs vs. Decision Lists
  • Pure logic programs: the order of clauses is irrelevant, and they must not contradict each other.
  • Decision lists: the concept language includes the predicate cut (!).
  • The use of decision lists can make for simpler
    (more concise) theories.

86
Decision List Example
  • action(Cat,ObservedAnimal,Action).
  • action(Cat,Animal,stay) :- dog(Animal), owner(Owner,Animal), owner(Owner,Cat), !.
  • action(Cat,Animal,run) :- dog(Animal), !.
  • action(Cat,Animal,stay).

87
Updating Decision Lists with Exceptions
  • action(Cat,caesar,run) :- !.
  • action(Cat,Animal,stay) :- dog(Animal), owner(Owner,Animal), owner(Owner,Cat), !.
  • action(Cat,Animal,run) :- dog(Animal), !.
  • action(Cat,Animal,stay).

88
Updating Decision Lists with Exceptions
  • Could be very beneficial in agents when immediate updating of the agent's knowledge is important: just add the exception at the top of the list.
  • Computationally inexpensive: it does not need to modify the rest of the list.
  • Exceptions could be compiled into rules when the agent is inactive.

89
Replacing Exceptions with Rules Before
  • action(Cat,caesar,run) :- !.
  • action(Cat,rex,run) :- !.
  • action(Cat,rusty,run) :- !.
  • action(Cat,Animal,stay) :- dog(Animal), owner(Owner,Animal), owner(Owner,Cat), !.

90
Replacing Exceptions with Rules After
  • action(Cat,Animal,run) :-
  •   dog(Animal),
  •   owner(richard,Animal), !.
  • action(Cat,Animal,stay) :- dog(Animal), owner(Owner,Animal), owner(Owner,Cat), !.

91
Eager ILP vs. Analogical Prediction
  • Eager learning: learn a theory, dispose of the observations.
  • Lazy Learning
  • keep all observations
  • compare new with old ones to classify
  • no explanation provided.
  • Analogical Prediction (Muggleton, Bain 98)
  • Combines the often higher accuracy of lazy
    learning with an intelligible, explicit
    hypothesis typical for ILP
  • Constructs a local theory for each new
    observation that is consistent with the largest
    number of training examples.

92
Analogical Prediction Example
  • owner(richard,caesar).
  • action(Cat,caesar,run).
  • owner(richard,rex).
  • action(Cat,rex,run).
  • owner(daniel,blackie).
  • action(Cat,blackie,stay).
  • owner(richard,rusty).
  • action(Cat,rusty,?).

93
Analogical Prediction Example
  • owner(richard,caesar).
  • action(Cat,caesar,run).
  • owner(richard,rex).
  • action(Cat,rex,run).
  • owner(daniel,blackie).
  • action(Cat,blackie,stay).
  • owner(richard,rusty).
  • action(Cat,Dog,run) :-
  •   owner(richard,Dog).

94
Timing Analysis of Theories Learned with ILP
  • The more training examples, the more accurate the
    theory
  • but how long does it take to produce an answer ?
  • No theoretical work on the subject so far
  • Experiments show nontrivial behaviour (reminiscent of the phase transitions observed in SAT).

95
Timing Analysis of ILP Theories Example
  • [Kazakov, PhD Thesis]
  • left: a simple theory with low coverage succeeds or quickly fails → high speed
  • middle: medium coverage, fragmentary theory, lots of backtracking → low speed
  • right: a general theory with high coverage, less backtracking → high speed

96
Machine Learning and ILP for MAS Part II
  • Integration of ML and Agents
  • ILP and its potential for MAS
  • Agent Applications of ILP
  • Learning, Natural Selection and Language

97
Agent Applications of ILP
  • Relational Reinforcement Learning (Džeroski, De
    Raedt, Driessens)
  • combines reinforcement learning with ILP
  • generalises over previous experience and goals
    (Q-table) to produce logical decision trees
  • results can be used to address new situations
  • Don't miss the next talk (11:40-13:10h)!

98
Agent Applications of ILP
  • ILP for Verification and Validation of MAS
    (Jacob, Driessens, De Raedt)
  • Also uses FOPL decision trees
  • Observes the agents' behaviour and represents it as a logical decision tree
  • The rules in the decision tree can be compared with the designer's intentions
  • Test domain: RoboCup

99
Agent Applications of ILP
  • [Reid & Ryan 2000]
  • ILP used to help hierarchical reinforcement
    learning
  • ILP constructs high-level features that help
    discriminate between (state,action) transitions
    with non-deterministic behaviour

100
Agent Applications of ILP
  • [Matsui et al. 2000]
  • Proposed an ILP agent that avoids actions which will probably fail to achieve the goal.
  • Application domain: RoboCup
  • [Alonso & Kudenko 99]
  • ILP and EBL for conflict simulations.

101
The York MA Environment
  • Species of 2D agents competing for renewable,
    limited resources.
  • Agents have simple hard-coded behaviour based on
    the notion of drives.
  • Each agent can optionally have an ILP (Progol) mind: a separate process receiving observations and suggesting actions.
  • Allows the values of inherited features to be selected through natural selection.

102
The York MA Environment
103
The York MA Environment
  • ILP hasn't been used in experiments yet (to come soon).
  • A number of experiments using inheritance studied Kinship-driven Altruism among Agents.
  • The start-up project was sponsored by Microsoft.
  • Undergraduate students involved so far: Lee Mallabone, Steve Routledge, John Barton.

104
Machine Learning and ILP for MAS Part II
  • Integration of ML and Agents
  • ILP and its potential for MAS
  • Agent Applications of ILP
  • Learning, Natural Selection and Language

105
Learning and Natural Selection
  • In learning, search is trivial, choosing the
    right bias is hard.
  • But the choice of learning bias is always external to the learner!
  • To find the best-suited bias, one could combine arbitrary choices of bias with evolution and natural selection of the fittest individuals.

106
Darwinian vs. Lamarckian Evolution
  • Darwinian evolution: nothing learned by the individual is encoded in the genes and passed on to the offspring.
  • The Baldwin effect: learning abilities (good biases) are selected in evolution because they give the individual a better chance in a dynamic environment.
  • What is passed on to the offspring is useful, but
    very general.

107
Darwinian vs. Lamarckian Evolution (2)
  • Lamarckian evolution: individual experience acquired in life can be inherited.
  • Not the case in nature.
  • Doesn't mean we can't use it.
  • The inherited concepts may be too specific and
    not of general importance.

108
Learning and Language
  • Language uses concepts which are
  • specific enough to be useful to most/all speakers
    of that language
  • general enough to correspond to shared experience
    (otherwise, how would one know what the other is
    talking about !)
  • The concepts of a language serve as a learning
    bias which is inherited not in genes but
    through education.

109
Communication and Learning
  • Language
  • helps one learn (in addition to inherited biases)
  • allows one to communicate knowledge.
  • Distinguish between
  • Knowledge: things that one can explain to another by means of a language.
  • Skills: the rest; they require individual learning and cannot be communicated.
  • "If watching was enough to learn, the dog would have become a butcher." (Bulgarian proverb)

110
Communication and Learning (2)
  • In NLP, forgetting examples may be harmful (van
    den Bosch et al.)
  • "An expert is someone who does not think anymore; he knows." (Frank Lloyd Wright)
  • It may be difficult to communicate what one has
    learned because of
  • Limited bandwidth (for lazy learning)
  • The absence of appropriate concepts in the
    language (for black-box learning)

111
Communication and Learning (3)
  • In a society of communicating agents, less
    accurate white-box learning may be better than
    more accurate but expensive learning that cannot
    be communicated since the reduced performance
    could be outweighed by the much lower cost of
    learning.

112
Our Current Research
  • Inductive Bias Selection (Shane Greenaway)
  • Role Learning (Spiros Kapetanakis)
  • Inductive Learning for Games (Alex Champandard)
  • Machine Learning of Natural Language in MAS (Mark
    Bartlett)

113
The End