Title: Student presentations (starting April 13th)
1. Student presentations (starting April 13th)
2. Papers to select from
- Coalitional Games in Open Anonymous Environments, by M. Yokoo, V. Conitzer, T. Sandholm, N. Ohta, and A. Iwasaki. In Proc. AAAI 2005.
- A Polynomial-time Nash Equilibrium Algorithm for Repeated Games, by M. L. Littman and P. Stone. In Proc. 2003 ACM Conference on Electronic Commerce (EC'03).
- Communication Complexity as a Lower Bound for Learning in Games, by V. Conitzer and T. Sandholm. In Proc. ICML 2004.
- Coordination in Multiagent Reinforcement Learning: A Bayesian Approach, by G. Chalkiadakis and C. Boutilier. In Proc. AAMAS 2003.
- Distributed Implementations of Vickrey-Clarke-Groves Mechanisms, by D. C. Parkes and J. Shneidman. In Proc. AAMAS 2004.
- If Multi-Agent Learning is the Answer, What is the Question?, by Y. Shoham, R. Powers and T. Grenager. In JAI 2006.
- Envy-Free Auctions for Digital Goods, by A. V. Goldberg and J. D. Hartline. In Proc. 2003 ACM Conference on Electronic Commerce (EC'03).
- Distributed Perception Networks: An Architecture for Information Fusion Systems Based on Causal Probabilistic Models, by G. Pavlin, P. de Oude, M. Maris, and T. Hood. In Proc. Int. Conf. on Multisensor Fusion and Integration for Intelligent Systems, Heidelberg, 2006.
- Improvement Continuous Valued Q-learning and its Application to Vision Guided Behavior Acquisition, by Y. Takahashi, M. Takeda, and M. Asada. In Proc. Fourth Int. Workshop on RoboCup, 2000.
- Using the Max-Plus Algorithm for Multiagent Decision Making in Coordination Graphs, by J. Kok and N. Vlassis. RoboCup 2005 Symposium (best paper award).
3. What to do?
- 2 persons per paper
- Presentation length: 30 min (plus 15 min discussion)
- The same paper cannot be presented twice
4. Multiagent reinforcement learning
5. Multiagent reinforcement learning
- We assume that each state s ∈ S is fully observable to all agents.
- Each s ∈ S defines a local strategic game G_s with corresponding payoffs.
- We also assume a stochastic transition model p(s'|s, a), where a is the joint action of the agents.
- The task is to compute an optimal joint policy π(s) = (π_1(s), ..., π_n(s)) that maximizes discounted future reward.
- With cooperative agents the challenge is to guarantee that the individual optimal policies π_i(s) are coordinated.
6. Independent learning
- One approach is to let each agent run Q-learning independently of the others.
- In this case the other agents are treated as part of a dynamic environment and are not explicitly modeled.
- The problem is that p(s'|s, a_i) is then nonstationary (it changes over time), because the other agents are also learning.
- Convergence of Q-learning can therefore no longer be guaranteed.
- However, the method has been used in practice with reported success. (A sketch follows below.)
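To make this concrete, here is a minimal Python sketch of independent Q-learning, assuming a tabular setting; the class name and the parameters alpha, gamma, and epsilon are illustrative choices, not taken from the slides.

```python
import random
from collections import defaultdict

class IndependentQLearner:
    """Each agent runs this in isolation, ignoring the other agents."""
    def __init__(self, n_actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.Q = defaultdict(float)               # Q[(state, own_action)]
        self.n_actions = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, s):
        if random.random() < self.epsilon:        # simple epsilon-greedy exploration
            return random.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: self.Q[(s, a)])

    def update(self, s, a, r, s_next):
        # Standard Q-learning update over the agent's *own* actions only;
        # the other agents are just part of the (nonstationary) environment.
        best_next = max(self.Q[(s_next, a2)] for a2 in range(self.n_actions))
        self.Q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.Q[(s, a)])
```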
7. Joint action learning
- Better results can be obtained if the agents attempt to model each other.
- Each agent i maintains an action value function Q_i(s, a) for all states s and joint actions a.
- In this case the Q-learning update becomes
  Q_i(s, a) ← (1 − α) Q_i(s, a) + α (R + γ max_{a'} Q_i(s', a'))
- Issues to consider here:
  - Representation: how to represent Q_i(s, a).
  - Optimization: how to compute max_a Q_i(s, a).
  - Exploration: how to choose exploration actions a.
(A sketch of the update follows below.)
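As a minimal Python sketch of this update (tabular setting assumed; the function name and parameters are illustrative), with the joint action represented as a tuple holding one entry per agent:

```python
import itertools
from collections import defaultdict

def joint_q_update(Q, s, joint_a, r, s_next, action_sets, alpha=0.1, gamma=0.9):
    """One update of agent i's table Q[(state, joint_action)]."""
    # max over all joint actions a' = (a'_1, ..., a'_n) in the next state
    best_next = max(Q[(s_next, a2)] for a2 in itertools.product(*action_sets))
    Q[(s, joint_a)] = (1 - alpha) * Q[(s, joint_a)] + alpha * (r + gamma * best_next)

Q = defaultdict(float)
joint_q_update(Q, s=0, joint_a=(1, 0), r=1.0, s_next=1,
               action_sets=[range(2), range(2)])
```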
8. Representing Q_i(s, a)
- The simplest choice is to use a tabular representation: Q_i(s, a) is a matrix with as many entries as there are pairs of states s ∈ S and joint actions a ∈ ×_i A_i.
- Computing max_a Q(s, a) then involves just a for-loop.
- Alternatively, if many agents are involved, a coordination graph can be used. In this case we assume Q(s, a) = Σ_j Q_j(s, a_j), where a_j is the joint action of a subset of the agents.
- In this case max_a Q(s, a) can be computed with variable elimination. (See the sketch below.)
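Here is a small Python sketch of variable elimination on a coordination graph, for one fixed state and three agents in a chain, so that Q(a1, a2, a3) = Q12(a1, a2) + Q23(a2, a3); the payoff numbers are made up for illustration:

```python
import itertools

A = [0, 1]                                  # each agent has two actions
Q12 = {(a1, a2): float(a1 == a2) for a1 in A for a2 in A}  # illustrative payoffs
Q23 = {(a2, a3): float(a2 != a3) for a2 in A for a3 in A}

# Eliminate agent 3: for every a2, record agent 3's best response and its value.
f3 = {a2: max(Q23[(a2, a3)] for a3 in A) for a2 in A}
best_a3 = {a2: max(A, key=lambda a3: Q23[(a2, a3)]) for a2 in A}

# Maximize the remaining function over (a1, a2), then back-substitute a3.
a1, a2 = max(itertools.product(A, A), key=lambda p: Q12[p] + f3[p[1]])
print((a1, a2, best_a3[a2]))  # a maximizing joint action, without enumerating all of A^3
```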
9. Exploration in multiagent RL
- We assume for simplicity that all agents receive exactly the same reward.
- Then each agent can select an exploratory joint action a according to a Boltzmann distribution over joint actions.
- This requires that each agent samples the same joint action!
- Each agent runs Q-learning over joint actions identically and in parallel.
- In this case, the whole multiagent system is effectively treated as one 'big' single agent. (See the sketch below.)
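A minimal Python sketch of this shared Boltzmann exploration: if every agent holds the same Q-table and temperature and uses the same random seed for the current step, they all sample the same joint action without communicating. The function name, the temperature tau, and the per-step seed are assumptions for illustration.

```python
import itertools
import math
import random
from collections import defaultdict

def boltzmann_joint_action(Q, s, action_sets, tau=1.0, seed=0):
    joint_actions = list(itertools.product(*action_sets))
    weights = [math.exp(Q[(s, a)] / tau) for a in joint_actions]
    rng = random.Random(seed)      # same seed on every agent => same sample
    return rng.choices(joint_actions, weights=weights)[0]

Q = defaultdict(float)
a = boltzmann_joint_action(Q, s=0, action_sets=[range(2), range(2)], seed=42)
```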
10. (No transcript)
11. Bayesian Networks
12. Example Bayesian network
Causal network:
- a directed acyclic graph (DAG)
- with nodes A, B, C, D, E
13. Important relations
- Range of a probability: 0 ≤ P(A) ≤ 1
- Sum rule: if A and B are mutually exclusive, then P(A ∨ B) = P(A) + P(B)
- Logical equivalence: P(A ∧ B) = P(B ∧ A)
- Product rule: P(A ∧ B) = P(A|B) P(B)
14. Conditional probability
Given event b, the probability of a equals x; in symbols: P(a|b) = x.
The joint probability of a ∧ b equals P(a,b) = P(a|b) P(b).
15. Bayes' rule
1. P(a,b) = P(a|b) P(b). Similarly: 2. P(a,b) = P(b|a) P(a).
Combining 1. and 2. results in Bayes' rule: P(a|b) P(b) = P(b|a) P(a),
or: P(a|b) = P(b|a) P(a) / P(b).
16. Independence and joint probability
Example 1: somebody is White and Male. Suppose P(White) = 0.5 and P(Male) = 0.4. Suppose also that White and Male are independent. Then it holds that P(White|Male) = P(White), and the joint probability is P(White ∧ Male) = P(White|Male) P(Male) = P(White) P(Male) = 0.5 × 0.4 = 0.2.
Example 2: somebody is Tall and Male. Suppose P(Tall) = 0.5 and P(Male) = 0.4. Suppose also that Tall and Male are dependent, and that the conditional probability of Tall given Male equals P(Tall|Male) = 0.8. Now the joint probability is P(Tall ∧ Male) = P(Tall|Male) P(Male) = 0.8 × 0.4 = 0.32.
17. Continuing the two examples:
Example 1: it holds that P(White ∧ Male) = P(Male ∧ White), so P(White|Male) P(Male) = P(Male|White) P(White). Thus 0.5 × 0.4 = P(Male|White) × 0.5, so P(Male|White) = 0.4.
Example 2: it holds that P(Tall ∧ Male) = P(Male ∧ Tall), so P(Tall|Male) P(Male) = P(Male|Tall) P(Tall). Thus 0.8 × 0.4 = P(Male|Tall) × 0.5, so P(Male|Tall) = 0.32 / 0.5 = 0.64.
18. Symmetry: the inverse fallacy
- So don't confuse the probability that someone is Tall given Male, P(Tall|Male), with the probability that someone is Male given Tall, P(Male|Tall).
19. Definition of a Bayesian Network (BN)
- A Bayesian network consists of:
  - a set of variables and a set of directed connections between the variables;
  - every variable has a finite number of states;
  - the variables form a directed acyclic graph (DAG);
  - each variable A with parents B1, ..., Bn has a conditional probability table P(A | B1, ..., Bn).
20. Variables and states
The nodes are variables with states. The states are assigned probability values. The collection of probability values for all states is called the probability distribution of that variable.
If A is a variable with states a1, a2, ..., an, then P(A) is the probability distribution over these states: P(A) = (P(a1), P(a2), ..., P(an)), with Σ_i P(a_i) = 1.
21. Example
Conditional probabilities: A ∈ {a1, a2}, B ∈ {b1, b2, b3}.
P(B|A): [table shown on the slide; each row sums to 1]
22. Compute the joint probability from the conditional probabilities
Given the probability distribution of A: P(A) = (0.4, 0.6), and the table P(B|A) [shown on the slide].
It holds that P(b_i, a_j) = P(b_i|a_j) P(a_j), and so P(B,A) follows entry by entry [table shown on the slide].
23. Calculate P(B) from P(B,A)
P(B) = Σ_A P(B,A), that is, P(b_i) = Σ_j P(b_i, a_j).
And so P(B) = (0.52, 0.18, 0.3).
This process is called marginalisation.
24. Calculate P(A|B)
[Worked out on the slide; a sketch of slides 22-24 follows below.]
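A NumPy sketch of slides 22-24. The actual CPT was only shown as a figure, so the P(B|A) numbers below are an assumed example, chosen to be consistent with the stated P(A) = (0.4, 0.6) and the resulting P(B) = (0.52, 0.18, 0.3):

```python
import numpy as np

P_A = np.array([0.4, 0.6])            # prior over A = (a1, a2)
P_B_given_A = np.array([[0.7, 0.4],   # assumed CPT; rows b1, b2, b3,
                        [0.3, 0.1],   # columns a1, a2 (each column sums to 1)
                        [0.0, 0.5]])

P_BA = P_B_given_A * P_A              # joint: P(b_i, a_j) = P(b_i|a_j) P(a_j)
P_B = P_BA.sum(axis=1)                # marginalisation: P(b_i) = sum_j P(b_i, a_j)
print(P_B)                            # -> [0.52 0.18 0.3 ]

P_A_given_B = P_BA / P_B[:, None]     # Bayes' rule: P(a_j|b_i) = P(b_i, a_j) / P(b_i)
print(P_A_given_B)                    # row i is the posterior over A given b_i
```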
25. Evidence
- There are two types of evidence:
  - Hard evidence (instantiation): it is known that a node X is for sure in a particular state. Example: a soccer match can be in three states: win, lose, draw. After the game has ended, the state is known.
  - Soft evidence: for a node X an indication is known that makes a certain state more probable. Example: if after the first half of a match one team leads 3-0, the probability of that team's 'win' state can be increased.
26. Another example
S = stiff neck, M = meningitis. What is the probability that somebody has meningitis, given a stiff neck?
A priori: P(S) = 0.05, P(M) = 0.0002, P(S|M) = 0.9 (in other words: if somebody has meningitis, there is a high probability of having a stiff neck).
Bayes' rule gives P(M|S) = P(S|M) P(M) / P(S) = 0.9 × 0.0002 / 0.05 = 0.0036
(so, if someone has a stiff neck, there is only a small probability that he has meningitis).
27. Visit to Asia
[The classic 'ASIA' example network, shown as a diagram.]
28. Types of connections
In Bayesian networks, there are three types of connections:
- serial
- converging
- diverging
29. Serial connections
Evidence in node A influences both node B and node C.
If there is evidence in node A, evidence in node B has no influence on node C.
Example: [diagram on the slide]
30. Converging connection
31. Converging (continued)
If one of the parents is known, this does not influence the other parent node (hoofdpijn, 'headache').
If one of the parents is known and the child node antwoord ('answer') is also known, then a parent does influence the other parent node (hoofdpijn).
So, if no evidence is available about the child node, the parent nodes are independent; otherwise they are not. The parent nodes are said to be conditionally dependent given the child node. This is captured by the notion of d-separation.
Example: [diagram on the slide]
32. Diverging connection
33. Diverging (continued)
The child node geleerd ('learned') does not influence the child node hoofdpijn ('headache') through the parent node antwoord ('answer') if the parent is known (d-separated). (If antwoord is not known, then node geleerd does influence node hoofdpijn.)
The parent node antwoord influences both child nodes geleerd and hoofdpijn.
Example: [diagram on the slide]
34. D-separation
- Two nodes B and C in a Bayesian network are d-separated if on every path between B and C there is an intermediate node A for which it holds that:
  - the connection is serial or diverging and the state of A is known, or
  - the connection is converging and neither A nor its descendants have received evidence.
35. Why d-separation is important
- If we know that two variables are d-separated, they can be treated as independent (given the evidence available at that moment) and we don't have to compute or use conditional probabilities between them. So it speeds up computation.
- It can also be used for modeling, by creating models that exploit such causal relations, for example for the sake of distributedness.
36. Example: learned for an exam
Problem: a student has to learn for an exam. What is the probability that he gives the correct answer?
Solution: we start with a probability of 0.5 that the student has learned the material, so P(L=true) = 0.5. The student can give a correct or a wrong answer A, so we have P(A=correct), or P(A) for short. This probability is conditionally dependent on whether the student has learned the material.
37. The Bayesian network
P(A|L), modeled in Netica (www.norsys.com).
The a priori probabilities for P(A) given Learned are:
P(A=correct | L=true) = 0.9    P(A=wrong | L=true) = 0.1
P(A=correct | L=false) = 0.4   P(A=wrong | L=false) = 0.6
38. Computation of P(A=correct)
The probability P(A=correct) is calculated as:
P(A=correct) = P(A=correct|L=true) P(L=true) + P(A=correct|L=false) P(L=false)
             = 0.9 × 0.5 + 0.4 × 0.5 = 0.65
39. Influence of evidence
Suppose that the student takes the exam and produces a correct answer. This is information that we can feed into the network as evidence. Given this evidence, what is the probability that the student has learned the material, i.e., what is P(L=true|A=correct), or P(L|A) for short?
40. Evidence
With P(L) = 0.5, P(A) = 0.65, and P(A|L) = 0.9, Bayes' rule gives
P(L|A) = P(A|L) P(L) / P(A) = 0.9 × 0.5 / 0.65 = 0.692.
41. Example with more nodes
Suppose that we also want to model the influence of a headache H on the result. Then we can make a (converging) network: [diagram on the slide]
42. Calculation of P(A=correct)
The probability of a correct answer is the sum over the 4 combinations of L and H, each multiplied by the conditional probability of a correct answer. So:
P(A=correct) = Σ_{L,H} P(A=correct|L,H) P(L) P(H)
= P(A=correct|L=true,H=true) P(L=true) P(H=true) = 0.3 × 0.5 × 0.2 = 0.03
+ P(A=correct|L=true,H=false) P(L=true) P(H=false) = 0.9 × 0.5 × 0.8 = 0.36
+ P(A=correct|L=false,H=true) P(L=false) P(H=true) = 0.05 × 0.5 × 0.2 = 0.005
+ P(A=correct|L=false,H=false) P(L=false) P(H=false) = 0.4 × 0.5 × 0.8 = 0.16
Total: 0.555
(A sketch of this enumeration follows below.)
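The same enumeration in Python, using only the numbers given on the slide:

```python
P_L = {True: 0.5, False: 0.5}        # P(L): learned the material?
P_H = {True: 0.2, False: 0.8}        # P(H): has a headache?
P_correct_given = {(True, True): 0.3, (True, False): 0.9,
                   (False, True): 0.05, (False, False): 0.4}

# P(A=correct) = sum over L, H of P(A=correct|L,H) P(L) P(H)
p_correct = sum(P_correct_given[(l, h)] * P_L[l] * P_H[h]
                for l in (True, False) for h in (True, False))
print(p_correct)                     # -> 0.555 (up to float rounding)
```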
43. Explaining away
From the evidence that the car does not start, the network calculates that most likely the starter motor is broken (64.4% versus 21.7% for the battery being broken).
However, given the new evidence that the lights do not work either, the network now calculates that the probability that the battery is broken is highest. This effect, where evidence for one cause changes the belief in a competing cause, is called explaining away.
44. BNs are popular because:
- They can model events using semantics.
- Modeling is done with a graphical representation. This makes the interaction between the domain expert (who makes the model) and the engineer (who makes computation possible) very fruitful. Furthermore, expert knowledge is easily understandable in this form.
- They can calculate the uncertainty of events taking place, using probabilities.
- The use of evidence is a very powerful method for processing, modeling and learning.
- The best-known popular application is the Help Wizard in Microsoft Office.
- BNs have historically been used in (medical) diagnostic systems.
- Lately, they are also used in real-time reasoning systems (processing speed allows this now).
45. (No transcript)
46. Bayesian networks and Multi-agent Systems
47. The Rhino Cooperative Framework
- Created by Michael van Wie, Univ. of Rochester, NY.
- Used for robot soccer (RoboCup), to observe the other robots on the field.
- The display and observation of actions is the only type of communication.
48. Assumptions in Rhino
- Each agent may hold only one intention at any given time.
- All agents have the same reasoning ability.
- When choosing actions, agents refer to the same recipes (descriptions of how to carry out plans).
49. Observations in Rhino
- An observation (of an action) is made when an agent recognizes an action of an observed teammate. A likelihood is returned by the database storing all possible actions.
- Action recognition is therefore the process of generating observations.
50. First, define what an action is
- An action is a tuple ⟨a, t, F, W⟩:
  - a is an action;
  - t is a time interval during which the action a must run;
  - F is a list of preconditions (the agent's goals) that must be true before the action gets executed;
  - W is a list of effects that will be true when the action is completed.
51. Definition of an observation
- An observation O is a tuple ⟨a, v⟩:
  - a is the observed agent;
  - v is a vector storing the probabilities of the actions, one for each possible action (there are n of those).
(An illustrative encoding of both tuples follows below.)
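To make the two tuples concrete, here is an illustrative encoding as Python dataclasses; the field names and types are assumptions, since Rhino itself is not specified in Python:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Action:                     # the tuple <a, t, F, W> of slide 50
    name: str                                               # the action a
    interval: Tuple[float, float]                           # time interval t
    preconditions: List[str] = field(default_factory=list)  # F: must hold before
    effects: List[str] = field(default_factory=list)        # W: hold on completion

@dataclass
class Observation:                # the tuple <a, v> of slide 51
    agent: str                                              # the observed agent a
    likelihoods: List[float] = field(default_factory=list)  # v: one entry per action
```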
52. Single Agent Recipes (SARs)
- A SAR defines a set of possible actions to be taken to execute a plan (for example, scoring a goal).
- Example:
  - Suppose that the belief set specifies that an agent must have the ball in order to score.
  - If agent B does not have the ball, then agent B cannot execute the plan to score.
53. Multi-Agent Recipes (MARs)
- A MAR is a tuple (S, F):
  - S is the set of SARs;
  - F is the list of preconditions, at team level, for undertaking that MAR.
- For example, the team needs to have the ball to start the scoring MAR.
54. Action models / action recognition
Action recognition is the problem of inferring the goal-oriented intentions behind observed movements. For this, the agent needs a world model. Rhino uses Bayesian networks.
55. Example: a pass plan
After start, the agent only passes the ball to another agent if it sees an open path; otherwise it will dribble. The result can be OK or error.
[Plan diagram on the slide: from start, 'path open' leads to pass (with outcomes OK or ball intercepted), and 'path not open' leads to dribble (with outcomes such as ball stolen and error).]
In Rhino, start, dribble, pass, etc. are called atomic actions.
56. Belief sensors
Sensor outputs are converted into probabilistic beliefs.
[Diagram on the slide: World → sensors → beliefs such as Believe(ball is nearby), Believe(opponent has ball), Believe(goal is open).]
57. Belief example
The variable ball_distance has four states: Adjacent, Near, Moderate, Far.
Suppose P(ball_distance) = (0.1, 0.3, 0.5, 0.1).
58. Example: sensor belief change (approaching ball)
[Diagram on the slide: a Bayes network unrolled over time, with nodes Ball(t) and Ball(t+1) and a ball sensor attached.]
The output is a vector of probabilities for P(ball_distance): the new P(ball_distance) = (0.25, 0.5, 0.2, 0.05), where the old P(ball_distance) was (0.1, 0.3, 0.5, 0.1).
Now we can calculate the ball speed. (A sketch of such a filtering step follows below.)
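A minimal Python sketch of the filtering step behind the Ball(t) → Ball(t+1) network. The transition and sensor models below are assumptions for illustration; they are not the values used on the slide, so the resulting belief will differ from the slide's numbers:

```python
import numpy as np

states = ["Adjacent", "Near", "Moderate", "Far"]
belief = np.array([0.1, 0.3, 0.5, 0.1])       # old P(ball_distance)

T = np.array([[0.7, 0.3, 0.0, 0.0],           # assumed P(state'|state),
              [0.4, 0.4, 0.2, 0.0],           # one row per current state
              [0.0, 0.4, 0.4, 0.2],
              [0.0, 0.0, 0.3, 0.7]])
likelihood = np.array([0.5, 0.4, 0.1, 0.05])  # assumed P(sensor reading|state')

predicted = belief @ T                        # predict one step ahead
posterior = predicted * likelihood            # weigh by the sensor evidence
posterior /= posterior.sum()                  # normalise to get the new belief
print(dict(zip(states, posterior.round(2))))
```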
59. Other example: atomic action recognition
60. Other example: atomic action recognition (continued)
CPT for ball_possessed by agent A [table shown on the slide]
61. Combining atomic action recognitions into agent actions
CPT for Agent_A_passed_the_ball [table shown on the slide]
62. Bayesian Networks for MAS
- The Rhino system uses action recognition, exploiting Bayesian networks to generate beliefs about the observed actions of the other agents.
- With those beliefs, plans are selected and team play is organized (robot soccer).
- In general, using beliefs while interacting with other agents is a useful method whenever there is uncertainty about the received information.
63. Bayesian Networks in Distributed Perception Networks