Title: Student presentations (starting April 13th)
1. Student presentations (starting April 13th)
2. Papers to select from
- Coalitional Games in Open Anonymous Environments, by M. Yokoo, V. Conitzer, T. Sandholm, N. Ohta, and A. Iwasaki. In Proc. AAAI 2005.
- A Polynomial-time Nash Equilibrium Algorithm for Repeated Games, by M. L. Littman and P. Stone. In Proc. 2003 ACM Conference on Electronic Commerce (EC'03).
- Communication Complexity as a Lower Bound for Learning in Games, by V. Conitzer and T. Sandholm. In Proc. ICML 2004.
- Coordination in Multiagent Reinforcement Learning: A Bayesian Approach, by G. Chalkiadakis and C. Boutilier. In Proc. AAMAS 2003.
- Distributed Implementations of Vickrey-Clarke-Groves Mechanisms, by D. C. Parkes and J. Shneidman. In Proc. AAMAS 2004.
- If Multi-Agent Learning is the Answer, What is the Question?, by Y. Shoham, R. Powers and T. Grenager. In JAI 2006.
- Envy-Free Auctions for Digital Goods, by A. V. Goldberg and J. D. Hartline. In Proc. 2003 ACM Conference on Electronic Commerce (EC'03).
- Distributed Perception Networks: An Architecture for Information Fusion Systems Based on Causal Probabilistic Models, by G. Pavlin, P. de Oude, M. Maris, and T. Hood. In Proc. Int. Conf. on Multisensor Fusion and Integration for Intelligent Systems, Heidelberg, 2006.
- Improvement Continuous Valued Q-learning and its Application to Vision Guided Behavior Acquisition, by Y. Takahashi, M. Takeda, and M. Asada. In Proc. Fourth Int. Workshop on RoboCup, 2000.
- Using the Max-Plus Algorithm for Multiagent Decision Making in Coordination Graphs, by J. Kok and N. Vlassis. RoboCup 2005 Symposium (best paper award).
3. What to do?
- 2 persons per paper
- Presentation length: 30 min (plus 15 min discussion)
- The same paper cannot be presented twice
4. Multiagent reinforcement learning
5. Multiagent reinforcement learning
- We assume that each state s ∈ S is fully observable to all agents.
- Each s ∈ S defines a local strategic game G_s with corresponding payoffs.
- We also assume a stochastic transition model p(s'|s, a), where a is the joint action of the agents.
- The task is to compute an optimal joint policy π(s) = (π_1(s), ..., π_n(s)) that maximizes discounted future reward.
- With cooperative agents the challenge is to guarantee that the individual optimal policies π_i(s) are coordinated.
6. Independent learning
- One approach is to let each agent run Q-learning independently of the others.
- In this case the other agents are treated as part of a dynamic environment and are not explicitly modeled.
- The problem is that p(s'|s, a_i) is then nonstationary (it changes over time), because the other agents are also learning.
- Convergence of Q-learning can therefore no longer be guaranteed.
- However, the method has been used in practice with reported success. (A sketch follows below.)
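To make this concrete, here is a minimal Python sketch of independent Q-learning, assuming a tabular setting; the class name and the parameters alpha, gamma, and epsilon are illustrative choices, not taken from the slides.

```python
import random
from collections import defaultdict

class IndependentQLearner:
    """Each agent runs this in isolation, ignoring the other agents."""
    def __init__(self, n_actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.Q = defaultdict(float)               # Q[(state, own_action)]
        self.n_actions = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, s):
        if random.random() < self.epsilon:        # simple epsilon-greedy exploration
            return random.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: self.Q[(s, a)])

    def update(self, s, a, r, s_next):
        # Standard Q-learning update over the agent's *own* actions only;
        # the other agents are just part of the (nonstationary) environment.
        best_next = max(self.Q[(s_next, a2)] for a2 in range(self.n_actions))
        self.Q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.Q[(s, a)])
```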
7. Joint action learning
- Better results can be obtained if the agents attempt to model each other.
- Each agent i maintains an action value function Q_i(s, a) for all states s and joint actions a.
- In this case the Q-learning update becomes
  Q_i(s, a) ← (1 − α) Q_i(s, a) + α (R + γ max_{a'} Q_i(s', a'))
- Issues to consider here:
  - Representation: how to represent Q_i(s, a).
  - Optimization: how to compute max_a Q_i(s, a).
  - Exploration: how to choose exploration actions a.
(A sketch of the update follows below.)
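As a minimal Python sketch of this update (tabular setting assumed; the function name and parameters are illustrative), with the joint action represented as a tuple holding one entry per agent:

```python
import itertools
from collections import defaultdict

def joint_q_update(Q, s, joint_a, r, s_next, action_sets, alpha=0.1, gamma=0.9):
    """One update of agent i's table Q[(state, joint_action)]."""
    # max over all joint actions a' = (a'_1, ..., a'_n) in the next state
    best_next = max(Q[(s_next, a2)] for a2 in itertools.product(*action_sets))
    Q[(s, joint_a)] = (1 - alpha) * Q[(s, joint_a)] + alpha * (r + gamma * best_next)

Q = defaultdict(float)
joint_q_update(Q, s=0, joint_a=(1, 0), r=1.0, s_next=1,
               action_sets=[range(2), range(2)])
```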
8. Representing Q_i(s, a)
- The simplest choice is to use a tabular representation: Q_i(s, a) is a matrix with as many entries as there are pairs of states s ∈ S and joint actions a ∈ ×_i A_i.
- Computing max_a Q(s, a) then involves just a for-loop.
- Alternatively, if many agents are involved, a coordination graph can be used. In this case we assume Q(s, a) = Σ_j Q_j(s, a_j), where a_j is the joint action of a subset of the agents.
- In this case max_a Q(s, a) can be computed with variable elimination. (See the sketch below.)
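Here is a small Python sketch of variable elimination on a coordination graph, for one fixed state and three agents in a chain, so that Q(a1, a2, a3) = Q12(a1, a2) + Q23(a2, a3); the payoff numbers are made up for illustration:

```python
import itertools

A = [0, 1]                                  # each agent has two actions
Q12 = {(a1, a2): float(a1 == a2) for a1 in A for a2 in A}  # illustrative payoffs
Q23 = {(a2, a3): float(a2 != a3) for a2 in A for a3 in A}

# Eliminate agent 3: for every a2, record agent 3's best response and its value.
f3 = {a2: max(Q23[(a2, a3)] for a3 in A) for a2 in A}
best_a3 = {a2: max(A, key=lambda a3: Q23[(a2, a3)]) for a2 in A}

# Maximize the remaining function over (a1, a2), then back-substitute a3.
a1, a2 = max(itertools.product(A, A), key=lambda p: Q12[p] + f3[p[1]])
print((a1, a2, best_a3[a2]))  # a maximizing joint action, without enumerating all of A^3
```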
9. Exploration in multiagent RL
- We assume for simplicity that all agents receive exactly the same reward.
- Then each agent can select an exploratory joint action a according to a Boltzmann distribution over joint actions.
- This requires that each agent samples the same joint action!
- Each agent runs Q-learning over joint actions identically and in parallel.
- In this case, the whole multiagent system is effectively treated as one 'big' single agent. (See the sketch below.)
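A minimal Python sketch of this shared Boltzmann exploration: if every agent holds the same Q-table and temperature and uses the same random seed for the current step, they all sample the same joint action without communicating. The function name, the temperature tau, and the per-step seed are assumptions for illustration.

```python
import itertools
import math
import random
from collections import defaultdict

def boltzmann_joint_action(Q, s, action_sets, tau=1.0, seed=0):
    joint_actions = list(itertools.product(*action_sets))
    weights = [math.exp(Q[(s, a)] / tau) for a in joint_actions]
    rng = random.Random(seed)      # same seed on every agent => same sample
    return rng.choices(joint_actions, weights=weights)[0]

Q = defaultdict(float)
a = boltzmann_joint_action(Q, s=0, action_sets=[range(2), range(2)], seed=42)
```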
10. (No transcript)
11. Bayesian Networks
12. Example Bayesian network
Causal network:
- a directed acyclic graph (DAG)
- with nodes A, B, C, D, E
13. Important relations
- Range of a probability: 0 ≤ P(A) ≤ 1
- Sum rule: if A and B are mutually exclusive, then P(A ∨ B) = P(A) + P(B)
- Logical equivalence: P(A ∧ B) = P(B ∧ A)
- Product rule: P(A ∧ B) = P(A|B) P(B)
14. Conditional probability
Given event b, the probability of a equals x; in symbols: P(a|b) = x.
The joint probability of a ∧ b equals P(a,b) = P(a|b) P(b).
15. Bayes' rule
1. P(a,b) = P(a|b) P(b). Similarly: 2. P(a,b) = P(b|a) P(a).
Combining 1. and 2. results in Bayes' rule: P(a|b) P(b) = P(b|a) P(a),
or: P(a|b) = P(b|a) P(a) / P(b).
16. Independence and joint probability
Example 1: somebody is White and Male. Suppose P(White) = 0.5 and P(Male) = 0.4. Suppose also that White and Male are independent. Then it holds that P(White|Male) = P(White), and the joint probability is P(White ∧ Male) = P(White|Male) P(Male) = P(White) P(Male) = 0.5 × 0.4 = 0.2.
Example 2: somebody is Tall and Male. Suppose P(Tall) = 0.5 and P(Male) = 0.4. Suppose also that Tall and Male are dependent, and that the conditional probability of Tall given Male equals P(Tall|Male) = 0.8. Now the joint probability is P(Tall ∧ Male) = P(Tall|Male) P(Male) = 0.8 × 0.4 = 0.32.
17. Continuing the two examples:
Example 1: it holds that P(White ∧ Male) = P(Male ∧ White), so P(White|Male) P(Male) = P(Male|White) P(White). Thus 0.5 × 0.4 = P(Male|White) × 0.5, so P(Male|White) = 0.4.
Example 2: it holds that P(Tall ∧ Male) = P(Male ∧ Tall), so P(Tall|Male) P(Male) = P(Male|Tall) P(Tall). Thus 0.8 × 0.4 = P(Male|Tall) × 0.5, so P(Male|Tall) = 0.32 / 0.5 = 0.64.
18. Symmetry: the inverse fallacy
- So don't confuse the probability that someone is Tall given Male, P(Tall|Male), with the probability that someone is Male given Tall, P(Male|Tall).
19. Definition of a Bayesian Network (BN)
- A Bayesian network consists of:
  - a set of variables and a set of directed connections between the variables;
  - every variable has a finite number of states;
  - the variables form a directed acyclic graph (DAG);
  - each variable A with parents B1, ..., Bn has a conditional probability table P(A | B1, ..., Bn).
20. Variables and states
The nodes are variables with states. The states are assigned probability values. The collection of probability values for all states is called the probability distribution of that variable.
If A is a variable with states a1, a2, ..., an, then P(A) is the probability distribution over these states: P(A) = (P(a1), P(a2), ..., P(an)), with Σ_i P(a_i) = 1.
21. Example
Conditional probabilities: A ∈ {a1, a2}, B ∈ {b1, b2, b3}.
P(B|A): [table shown on the slide; each row sums to 1]
22. Compute the joint probability from the conditional probabilities
Given the probability distribution of A: P(A) = (0.4, 0.6), and the table P(B|A) [shown on the slide].
It holds that P(b_i, a_j) = P(b_i|a_j) P(a_j), and so P(B,A) follows entry by entry [table shown on the slide].
23. Calculate P(B) from P(B,A)
P(B) = Σ_A P(B,A), that is, P(b_i) = Σ_j P(b_i, a_j).
And so P(B) = (0.52, 0.18, 0.3).
This process is called marginalisation.
24. Calculate P(A|B)
[Worked out on the slide; a sketch of slides 22-24 follows below.]
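A NumPy sketch of slides 22-24. The actual CPT was only shown as a figure, so the P(B|A) numbers below are an assumed example, chosen to be consistent with the stated P(A) = (0.4, 0.6) and the resulting P(B) = (0.52, 0.18, 0.3):

```python
import numpy as np

P_A = np.array([0.4, 0.6])            # prior over A = (a1, a2)
P_B_given_A = np.array([[0.7, 0.4],   # assumed CPT; rows b1, b2, b3,
                        [0.3, 0.1],   # columns a1, a2 (each column sums to 1)
                        [0.0, 0.5]])

P_BA = P_B_given_A * P_A              # joint: P(b_i, a_j) = P(b_i|a_j) P(a_j)
P_B = P_BA.sum(axis=1)                # marginalisation: P(b_i) = sum_j P(b_i, a_j)
print(P_B)                            # -> [0.52 0.18 0.3 ]

P_A_given_B = P_BA / P_B[:, None]     # Bayes' rule: P(a_j|b_i) = P(b_i, a_j) / P(b_i)
print(P_A_given_B)                    # row i is the posterior over A given b_i
```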
25. Evidence
- There are two types of evidence:
  - Hard evidence (instantiation): it is known that a node X is for sure in a particular state. Example: a soccer match can be in three states: win, lose, draw. After the game has ended, the state is known.
  - Soft evidence: for a node X an indication is known that makes a certain state more probable. Example: if after the first half of a match one team leads 3-0, the probability of that team's 'win' state can be increased.
26. Another example
S = stiff neck, M = meningitis. What is the probability that somebody has meningitis, given a stiff neck?
A priori: P(S) = 0.05, P(M) = 0.0002, P(S|M) = 0.9 (in other words: if somebody has meningitis, there is a high probability of having a stiff neck).
Bayes' rule gives P(M|S) = P(S|M) P(M) / P(S) = 0.9 × 0.0002 / 0.05 = 0.0036
(so, if someone has a stiff neck, there is only a small probability that he has meningitis).
27. Visit to Asia
[The classic 'ASIA' example network, shown as a diagram.]
28. Types of connections
In Bayesian networks, there are three types of connections:
- serial
- converging
- diverging
29. Serial connections
Evidence in node A influences both node B and node C.
If there is evidence in node A, evidence in node B has no influence on node C.
Example: [diagram on the slide]
30. Converging connection
31. Converging (continued)
If one of the parents is known, this does not influence the other parent node (hoofdpijn, 'headache').
If one of the parents is known and the child node antwoord ('answer') is also known, then a parent does influence the other parent node (hoofdpijn).
So, if no evidence is available about the child node, the parent nodes are independent; otherwise they are not. The parent nodes are said to be conditionally dependent given the child node. This is captured by the notion of d-separation.
Example: [diagram on the slide]
32. Diverging connection
33. Diverging (continued)
The child node geleerd ('learned') does not influence the child node hoofdpijn ('headache') through the parent node antwoord ('answer') if the parent is known (d-separated). (If antwoord is not known, then node geleerd does influence node hoofdpijn.)
The parent node antwoord influences both child nodes geleerd and hoofdpijn.
Example: [diagram on the slide]
34. D-separation
- Two nodes B and C in a Bayesian network are d-separated if on every path between B and C there is an intermediate node A for which it holds that:
  - the connection is serial or diverging and the state of A is known, or
  - the connection is converging and neither A nor its descendants have received evidence.
35. Why d-separation is important
- If we know that two variables are d-separated, they can be treated as independent (given the evidence available at that moment) and we don't have to compute or use conditional probabilities between them. So it speeds up computation.
- It can also be used for modeling, by creating models that exploit such causal relations, for example for the sake of distributedness.
36. Example: learned for an exam
Problem: a student has to learn for an exam. What is the probability that he gives the correct answer?
Solution: we start with a probability of 0.5 that the student has learned the material, so P(L=true) = 0.5. The student can give a correct or a wrong answer A, so we have P(A=correct), or P(A) for short. This probability is conditionally dependent on whether the student has learned the material.
37. The Bayesian network
P(A|L), modeled in Netica (www.norsys.com).
The a priori probabilities for P(A) given Learned are:
P(A=correct | L=true) = 0.9    P(A=wrong | L=true) = 0.1
P(A=correct | L=false) = 0.4   P(A=wrong | L=false) = 0.6
38. Computation of P(A=correct)
The probability P(A=correct) is calculated as:
P(A=correct) = P(A=correct|L=true) P(L=true) + P(A=correct|L=false) P(L=false)
             = 0.9 × 0.5 + 0.4 × 0.5 = 0.65
39. Influence of evidence
Suppose that the student takes the exam and produces a correct answer. This is information that we can feed into the network as evidence. Given this evidence, what is the probability that the student has learned the material, i.e., what is P(L=true|A=correct), or P(L|A) for short?
40. Evidence
With P(L) = 0.5, P(A) = 0.65, and P(A|L) = 0.9, Bayes' rule gives
P(L|A) = P(A|L) P(L) / P(A) = 0.9 × 0.5 / 0.65 = 0.692.
41. Example with more nodes
Suppose that we also want to model the influence of a headache H on the result. Then we can make a (converging) network: [diagram on the slide]
42. Calculation of P(A=correct)
The probability of a correct answer is the sum over the 4 combinations of L and H, each multiplied by the conditional probability of a correct answer. So:
P(A=correct) = Σ_{L,H} P(A=correct|L,H) P(L) P(H)
= P(A=correct|L=true,H=true) P(L=true) P(H=true) = 0.3 × 0.5 × 0.2 = 0.03
+ P(A=correct|L=true,H=false) P(L=true) P(H=false) = 0.9 × 0.5 × 0.8 = 0.36
+ P(A=correct|L=false,H=true) P(L=false) P(H=true) = 0.05 × 0.5 × 0.2 = 0.005
+ P(A=correct|L=false,H=false) P(L=false) P(H=false) = 0.4 × 0.5 × 0.8 = 0.16
Total: 0.555
(A sketch of this enumeration follows below.)
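The same enumeration in Python, using only the numbers given on the slide:

```python
P_L = {True: 0.5, False: 0.5}        # P(L): learned the material?
P_H = {True: 0.2, False: 0.8}        # P(H): has a headache?
P_correct_given = {(True, True): 0.3, (True, False): 0.9,
                   (False, True): 0.05, (False, False): 0.4}

# P(A=correct) = sum over L, H of P(A=correct|L,H) P(L) P(H)
p_correct = sum(P_correct_given[(l, h)] * P_L[l] * P_H[h]
                for l in (True, False) for h in (True, False))
print(p_correct)                     # -> 0.555 (up to float rounding)
```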
43. Explaining away
From the evidence that the car does not start, the network calculates that most likely the starter motor is broken (64.4% versus 21.7% for the battery being broken).
However, given the new evidence that the lights do not work either, the network now calculates that the probability that the battery is broken is highest. This effect, where evidence for one cause changes the belief in a competing cause, is called explaining away.
44. BNs are popular because:
- They can model events using semantics.
- Modeling is done with a graphical representation. This makes the interaction between the domain expert (who makes the model) and the engineer (who makes computation possible) very fruitful. Furthermore, expert knowledge is easily understandable in this form.
- They can calculate the uncertainty of events taking place, using probabilities.
- The use of evidence is a very powerful method for processing, modeling and learning.
- The best-known popular application is the Help Wizard in Microsoft Office.
- BNs have historically been used in (medical) diagnostic systems.
- Lately, they are also used in real-time reasoning systems (processing speed allows this now).
45. (No transcript)
46. Bayesian networks and Multi-agent Systems
47. The Rhino Cooperative Framework
- Created by Michael van Wie, Univ. of Rochester, NY.
- Used for robot soccer (RoboCup), to observe the other robots on the field.
- The display and observation of actions is the only type of communication.
48. Assumptions in Rhino
- Each agent may hold only one intention at any given time.
- All agents have the same reasoning ability.
- When choosing actions, agents refer to the same recipes (descriptions of how to carry out plans).
49. Observations in Rhino
- An observation (of an action) is made when an agent recognizes an action of an observed teammate. A likelihood is returned by the database storing all possible actions.
- Action recognition is therefore the process of generating observations.
50. First, define what an action is
- An action is a tuple ⟨a, t, F, W⟩:
  - a is an action;
  - t is a time interval during which the action a must run;
  - F is a list of preconditions (the agent's goals) that must be true before the action gets executed;
  - W is a list of effects that will be true when the action is completed.
51. Definition of an observation
- An observation O is a tuple ⟨a, v⟩:
  - a is the observed agent;
  - v is a vector storing the probabilities of the actions, one for each possible action (there are n of those).
(An illustrative encoding of both tuples follows below.)
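To make the two tuples concrete, here is an illustrative encoding as Python dataclasses; the field names and types are assumptions, since Rhino itself is not specified in Python:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Action:                     # the tuple <a, t, F, W> of slide 50
    name: str                                               # the action a
    interval: Tuple[float, float]                           # time interval t
    preconditions: List[str] = field(default_factory=list)  # F: must hold before
    effects: List[str] = field(default_factory=list)        # W: hold on completion

@dataclass
class Observation:                # the tuple <a, v> of slide 51
    agent: str                                              # the observed agent a
    likelihoods: List[float] = field(default_factory=list)  # v: one entry per action
```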
52. Single Agent Recipes (SARs)
- A SAR defines a set of possible actions to be taken to execute a plan (for example, scoring a goal).
- Example:
  - Suppose that the belief set specifies that an agent must have the ball in order to score.
  - If agent B does not have the ball, then agent B cannot execute the plan to score.
53. Multi-Agent Recipes (MARs)
- A MAR is a tuple (S, F):
  - S is the set of SARs;
  - F is the list of preconditions, at team level, for undertaking that MAR.
- For example, the team needs to have the ball to start the scoring MAR.
54. Action models / action recognition
Action recognition is the problem of inferring the goal-oriented intentions behind observed movements. For this, the agent needs a world model. Rhino uses Bayesian networks.
55. Example: a pass plan
After start, the agent only passes the ball to another agent if it sees an open path; otherwise it will dribble. The result can be OK or error.
[Plan diagram on the slide: from start, 'path open' leads to pass (with outcomes OK or ball intercepted), and 'path not open' leads to dribble (with outcomes such as ball stolen and error).]
In Rhino, start, dribble, pass, etc. are called atomic actions.
56. Belief sensors
Sensor outputs are converted into probabilistic beliefs.
[Diagram on the slide: World → sensors → beliefs such as Believe(ball is nearby), Believe(opponent has ball), Believe(goal is open).]
57. Belief example
The variable ball_distance has four states: Adjacent, Near, Moderate, Far.
Suppose P(ball_distance) = (0.1, 0.3, 0.5, 0.1).
58. Example: sensor belief change (approaching ball)
[Diagram on the slide: a Bayes network unrolled over time, with nodes Ball(t) and Ball(t+1) and a ball sensor attached.]
The output is a vector of probabilities for P(ball_distance): the new P(ball_distance) = (0.25, 0.5, 0.2, 0.05), where the old P(ball_distance) was (0.1, 0.3, 0.5, 0.1).
Now we can calculate the ball speed. (A sketch of such a filtering step follows below.)
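A minimal Python sketch of the filtering step behind the Ball(t) → Ball(t+1) network. The transition and sensor models below are assumptions for illustration; they are not the values used on the slide, so the resulting belief will differ from the slide's numbers:

```python
import numpy as np

states = ["Adjacent", "Near", "Moderate", "Far"]
belief = np.array([0.1, 0.3, 0.5, 0.1])       # old P(ball_distance)

T = np.array([[0.7, 0.3, 0.0, 0.0],           # assumed P(state'|state),
              [0.4, 0.4, 0.2, 0.0],           # one row per current state
              [0.0, 0.4, 0.4, 0.2],
              [0.0, 0.0, 0.3, 0.7]])
likelihood = np.array([0.5, 0.4, 0.1, 0.05])  # assumed P(sensor reading|state')

predicted = belief @ T                        # predict one step ahead
posterior = predicted * likelihood            # weigh by the sensor evidence
posterior /= posterior.sum()                  # normalise to get the new belief
print(dict(zip(states, posterior.round(2))))
```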
59. Other example: atomic action recognition
60. Other example: atomic action recognition (continued)
CPT for ball_possessed by agent A [table shown on the slide]
61. Combining atomic action recognitions into agent actions
CPT for Agent_A_passed_the_ball [table shown on the slide]
62. Bayesian Networks for MAS
- The Rhino system uses action recognition, exploiting Bayesian networks to generate beliefs about the observed actions of the other agents.
- With those beliefs, plans are selected and team play is organized (robot soccer).
- In general, using beliefs while interacting with other agents is a useful method whenever there is uncertainty about the received information.
63. Bayesian Networks in Distributed Perception Networks