Student presentations (starting April 13th)

Slides: 64
Provided by: prei173


Transcript and Presenter's Notes


1
  • Student presentations (starting April 13th)

2
Papers to select from
  1. Coalitional Games in Open Anonymous Environments,
    by M. Yokoo, V. Conitzer, T. Sandholm, N. Ohta,
    and A. Iwasaki. In Proc. AAAI 2005.
  2. A polynomial-time Nash equilibrium algorithm for
    repeated games, by M. L. Littman and P. Stone. In
    Proc. 2003 ACM Conference on Electronic Commerce
    (EC'03).
  3. Communication Complexity as a Lower Bound for
    Learning in Games, by V. Conitzer and T.
    Sandholm. In Proc. ICML 2004.
  4. Coordination in Multiagent Reinforcement
    Learning: A Bayesian Approach, by G. Chalkiadakis
    and C. Boutilier. In Proc. AAMAS 2003.
  5. Distributed Implementations of
    Vickrey-Clarke-Groves Mechanisms, by D. C. Parkes
    and J. Shneidman. In Proc. AAMAS 2004.
  6. If Multi-Agent Learning is the Answer, What is
    the Question?, by Y. Shoham, R. Powers and T.
    Grenager. In JAI 2006.
  7. Envy-Free Auctions for Digital Goods, by A. V.
    Goldberg and J. D. Hartline. In Proc. 2003 ACM
    Conference on Electronic Commerce (EC'03).
  8. Distributed Perception Networks: An Architecture
    for Information Fusion Systems Based on Causal
    Probabilistic Models, by G. Pavlin, P. de Oude,
    M. Maris and T. Hood. In Proc. Int. Conf. on
    Multisensor Fusion and Integration for
    Intelligent Systems, Heidelberg, 2006.
  9. Improvement Continuous Valued Q-learning and its
    Application to Vision Guided Behavior
    Acquisition, by Y. Takahashi, M. Takeda and M.
    Asada. In Proc. Fourth Int. Workshop on RoboCup,
    2000.
  10. Using the Max-Plus Algorithm for Multiagent
    Decision Making in Coordination Graphs, by J. Kok
    and N. Vlassis. In RoboCup 2005 Symposium (best
    paper award).

3
What to do?
  • 2 persons per paper
  • Presentation length: 30 min (+ 15 min discussion)
  • No paper may be presented twice

4
  • Multiagent reinforcement learning

5
Multiagent reinforcement learning
  • We assume that each state $s \in S$ is fully
    observable to all agents.
  • Each $s \in S$ defines a local strategic game $G_s$
    with corresponding payoffs.
  • We also assume a stochastic transition model
    $p(s' \mid s, a)$, where $a$ is the joint action of
    the agents.
  • The task is to compute an optimal joint policy
    $\pi^*(s) = (\pi_i^*(s))$ that maximizes discounted
    future reward.
  • For cooperative agents, the challenge is to
    guarantee that the individual optimal policies
    $\pi_i^*(s)$ are coordinated.

6
Independent learning
  • One approach is to let each agent run Q-learning
    independently of the others.
  • In this case the other agents are treated as part
    of a dynamic environment and are not explicitly
    modeled.
  • The problem is that $p(s' \mid s, a_i)$ is
    nonstationary in this case (it changes with time)
    because the other agents are also learning.
  • Convergence of Q-learning can no longer be
    guaranteed.
  • However, the method has been used in practice with
    reported success.

7
Joint action learning
  • Better results can be obtained if the agents
    attempt to model each other.
  • Each agent maintains an action value function
    $Q^{(i)}(s, a)$ for all pairs of states and joint
    actions ($i$ denotes agent $i$).
  • In this case Q-learning becomes
  • $Q^{(i)}(s, a) := (1 - \alpha)\,Q^{(i)}(s, a) + \alpha \left[ R + \gamma \max_{a'} Q^{(i)}(s', a') \right]$,
    with learning rate $\alpha$ and discount factor
    $\gamma$.
  • Issues to consider here:
  • - Representation: how to represent $Q^{(i)}(s, a)$.
  • - Optimization: how to compute
    $\max_{a'} Q^{(i)}(s', a')$.
  • - Exploration: how to choose exploration
    actions $a$.
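The update rule on this slide can be sketched in a few lines of Python. This is a minimal tabular sketch, not the lecturer's implementation: the function name `q_update`, the state and action labels, and the default values of `alpha` and `gamma` are all assumptions made for illustration.

```python
from collections import defaultdict


def q_update(Q, s, a, reward, s_next, joint_actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step for a single agent over joint actions.

    Q maps (state, joint_action) to a value; a joint action is a tuple
    with one entry per agent. alpha and gamma are illustrative defaults.
    """
    # Value of the best joint action in the successor state.
    best_next = max(Q[(s_next, a2)] for a2 in joint_actions)
    # Blend the old estimate with the new sample.
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (reward + gamma * best_next)
    return Q[(s, a)]


# Hypothetical two-agent example with two actions per agent.
Q = defaultdict(float)
joint_actions = [(a1, a2) for a1 in ("left", "right") for a2 in ("left", "right")]
new_value = q_update(Q, "s0", ("left", "right"), 1.0, "s1", joint_actions)
```

Starting from an all-zero table, a reward of 1.0 moves the entry one tenth of the way towards the sampled target.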

8
Representing $Q^{(i)}(s, a)$
  • The simplest choice is to use a tabular
    representation.
  • $Q^{(i)}(s, a)$ is a matrix with as many entries as
    there are pairs of states $s \in S$ and joint
    actions $a \in \times_i A_i$.
  • Computing $\max_{a'} Q(s', a')$ involves just a
    for-loop.
  • Alternatively, if many agents are involved, a
    coordination graph can be used. In this case we
    assume $Q(s, a) = \sum_j Q_j(s, a_j)$, where $a_j$
    is the joint action of a subset of agents.
  • In this case $\max_{a'} Q(s', a')$ can be computed
    with variable elimination.
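The two ways of maximizing over joint actions can be compared on a toy problem. The sketch below assumes three agents on a chain coordination graph (1 - 2 - 3) with binary actions; the payoff tables `Q12` and `Q23` are hypothetical numbers for one fixed state, and the elimination order is chosen by hand rather than by a general algorithm.

```python
from itertools import product

ACTIONS = (0, 1)  # hypothetical binary action set for every agent

# Hypothetical local payoff tables for one fixed state s, so that
# Q(s, a) = Q12(a1, a2) + Q23(a2, a3).
Q12 = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 2.0}
Q23 = {(0, 0): 3.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1.0}


def q_total(a):
    a1, a2, a3 = a
    return Q12[(a1, a2)] + Q23[(a2, a3)]


def argmax_brute_force():
    """Tabular case: a plain loop over all joint actions."""
    return max(product(ACTIONS, repeat=3), key=q_total)


def argmax_variable_elimination():
    """Eliminate agent 3, then agent 1, then let agent 2 choose."""
    # Agent 3's best response and resulting value for each a2.
    best3 = {a2: max(ACTIONS, key=lambda a3: Q23[(a2, a3)]) for a2 in ACTIONS}
    f3 = {a2: Q23[(a2, best3[a2])] for a2 in ACTIONS}
    # Agent 1's best response and resulting value for each a2.
    best1 = {a2: max(ACTIONS, key=lambda a1: Q12[(a1, a2)]) for a2 in ACTIONS}
    f1 = {a2: Q12[(best1[a2], a2)] for a2 in ACTIONS}
    # Agent 2 maximizes the sum of the two incoming value functions.
    a2 = max(ACTIONS, key=lambda x: f1[x] + f3[x])
    return (best1[a2], a2, best3[a2])
```

The brute-force loop touches every joint action, so its cost grows exponentially with the number of agents. The elimination version only ever maximizes over one agent's actions at a time, conditioned on its neighbours in the graph.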

9
Exploration in multiagent RL
  • We assume for simplicity that all agents receive
    exactly the same reward.
  • Then each agent can select an exploratory joint
    action a according to a Boltzmann distribution
    over joint actions.
  • This requires that each agent samples the same
    joint action!
  • Each agent runs Q-learning over joint actions
    identically and in parallel.
  • In this case, the whole multiagent system is
    effectively treated as a 'big' single agent.
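One way to make every agent sample the same exploratory joint action is to share a random seed. The sketch below assumes exactly that; the function name `boltzmann_joint_action`, the seed value, the temperature and the Q-values are all hypothetical, and the slides do not say which mechanism was intended.

```python
import math
import random
from itertools import product


def boltzmann_joint_action(q_values, temperature, rng):
    """Sample a joint action with probability proportional to exp(Q / T)."""
    actions = sorted(q_values)  # fixed order so all agents index identically
    weights = [math.exp(q_values[a] / temperature) for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


# Hypothetical shared Q-values for one state: two agents, two actions each.
q_values = {a: float(sum(a)) for a in product((0, 1), repeat=2)}

SHARED_SEED = 42  # assumption: agreed on before learning starts
rng_agent_1 = random.Random(SHARED_SEED)
rng_agent_2 = random.Random(SHARED_SEED)

samples_1 = [boltzmann_joint_action(q_values, 1.0, rng_agent_1) for _ in range(5)]
samples_2 = [boltzmann_joint_action(q_values, 1.0, rng_agent_2) for _ in range(5)]
```

Because both generators start from the same seed and both agents hold identical Q-values, the two sample sequences coincide without any communication.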

10
(No Transcript)
11
Bayesian Networks
12
Example Bayesian network
Causal network
  • Directed Acyclic Graph (DAG)
  • with nodes A,B,C,D,E

13
Important relations
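Range of probability: $0 \le P(A) \le 1$
Sum rule: if $A$ and $B$ are mutually exclusive, $P(A \vee B) = P(A) + P(B)$
Logic equivalence: $P(A \wedge B) = P(B \wedge A)$
Product rule: $P(A \wedge B) = P(A \mid B)\,P(B)$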
14
Conditional probability
Given event $b$, the probability of $a$ equals $x$,
or $P(a \mid b) = x$.
The joint probability of $a \wedge b$ equals
$P(a, b) = P(a \mid b)\,P(b)$.
15
Bayes rule
1. $P(a, b) = P(a \mid b)\,P(b)$
Similarly, 2. $P(a, b) = P(b \mid a)\,P(a)$
Combining 1. and 2. results in Bayes' rule: $P(a \mid b)\,P(b) = P(b \mid a)\,P(a)$
Or: $P(a \mid b) = P(b \mid a)\,P(a) / P(b)$
16
Independence and joint probability
Somebody is White and Male. Suppose $P(\text{White}) = 0.5$
and $P(\text{Male}) = 0.4$. Suppose also that White
and Male are independent. Then it holds that
$P(\text{White} \mid \text{Male}) = P(\text{White})$, and the joint
probability is $P(\text{White} \wedge \text{Male}) = P(\text{White} \mid \text{Male})\,P(\text{Male}) = P(\text{White})\,P(\text{Male}) = 0.5 \times 0.4 = 0.2$.

Somebody is Tall and Male. Suppose $P(\text{Tall}) = 0.5$
and $P(\text{Male}) = 0.4$. Suppose also that Tall and
Male are dependent and that the conditional
probability of Tall given Male equals
$P(\text{Tall} \mid \text{Male}) = 0.8$. Now the joint probability is
$P(\text{Tall} \wedge \text{Male}) = P(\text{Tall} \mid \text{Male})\,P(\text{Male}) = 0.8 \times 0.4 = 0.32$.
17
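It holds that $P(\text{White} \wedge \text{Male}) = P(\text{Male} \wedge \text{White})$, so
$P(\text{White} \mid \text{Male})\,P(\text{Male}) = P(\text{Male} \mid \text{White})\,P(\text{White})$,
that is, $0.5 \times 0.4 = P(\text{Male} \mid \text{White}) \times 0.5$.
So $P(\text{Male} \mid \text{White}) = 0.4$.

It holds that $P(\text{Tall} \wedge \text{Male}) = P(\text{Male} \wedge \text{Tall})$, so
$P(\text{Tall} \mid \text{Male})\,P(\text{Male}) = P(\text{Male} \mid \text{Tall})\,P(\text{Tall})$,
that is, $0.8 \times 0.4 = P(\text{Male} \mid \text{Tall}) \times 0.5$.
So $P(\text{Male} \mid \text{Tall}) = 0.32 / 0.5 = 0.64$.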
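The numbers in the two examples can be checked with the product rule and Bayes' rule directly. A minimal sketch follows; the helper names `joint` and `invert` are made up for illustration.

```python
def joint(p_a_given_b, p_b):
    """Product rule: P(a, b) = P(a | b) * P(b)."""
    return p_a_given_b * p_b


def invert(p_a_given_b, p_a, p_b):
    """Bayes' rule: P(b | a) = P(a | b) * P(b) / P(a)."""
    return p_a_given_b * p_b / p_a


p_male = 0.4
p_white = 0.5
p_tall = 0.5

# Independent case: P(White | Male) equals P(White).
p_white_and_male = joint(p_white, p_male)
p_male_given_white = invert(p_white, p_white, p_male)

# Dependent case: P(Tall | Male) = 0.8 as given on the slide.
p_tall_and_male = joint(0.8, p_male)
p_male_given_tall = invert(0.8, p_tall, p_male)
```

In the independent case, inverting the condition returns the prior P(Male) = 0.4. In the dependent case it gives 0.64, which differs from both P(Male) and P(Tall | Male).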
18
Symmetry and the inverse fallacy
  • So do not confuse the probability that someone is
    Tall given Male, $P(\text{Tall} \mid \text{Male})$, with the
    probability that someone is Male given Tall,
    $P(\text{Male} \mid \text{Tall})$.

19
Definition of a Bayesian Network (BN)
  • A Bayesian network consists of
  • A set of variables and a set of directed
    connections between the variables
  • Every variable has a finite number of states
  • The variables form a directed acyclic graph (DAG)
  • Each variable $A$ with parents $B_1, \dots, B_n$ has a
    conditional probability table
    $P(A \mid B_1, \dots, B_n)$

20
Variables and states
The nodes are variables with states. The states
are assigned probability values. The collection
of probability values for all states is called the
probability distribution of that variable.
If $A$ is a variable with states $a_1, a_2, \dots, a_n$,
then $P(A)$ is the probability distribution
over these states: $P(A) = (P(a_1), P(a_2), \dots, P(a_n))$,
with $\sum_i P(a_i) = 1$.
21
Example
Conditional probabilities
$A$ has states $a_1, a_2$
$P(B \mid A)$
$B$ has states $b_1, b_2, b_3$
Each row sums to 1
22
Compute the joint probability from the
conditional probabilities
Given the probability distribution of $A$, $P(A) = (0.4, 0.6)$,
and the table $P(B \mid A)$,
it holds that $P(b_i, a_j) = P(b_i \mid a_j)\,P(a_j)$,
and so $P(B, A) = P(B \mid A)\,P(A)$.
23
Calculate P(B) from P(B,A)
$P(B) = \sum_A P(B, A)$
$P(b_i) = \sum_j P(b_i, a_j)$
And so $P(B) = (0.52, 0.18, 0.3)$
This process is called marginalisation.
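The joint table and the marginal can be computed in a few lines. Note the assumption: the conditional probability table $P(B \mid A)$ did not survive in this transcript, so the values below are hypothetical, chosen only because they reproduce the marginal $(0.52, 0.18, 0.3)$ quoted above.

```python
p_a = [0.4, 0.6]  # P(A) from the slide, states a1 and a2

# ASSUMPTION: hypothetical P(B | A) table, not taken from the slides.
# p_b_given_a[j][i] = P(b_i | a_j); each row sums to 1.
p_b_given_a = [
    [0.4, 0.3, 0.3],  # given a1
    [0.6, 0.1, 0.3],  # given a2
]

# Joint: P(b_i, a_j) = P(b_i | a_j) * P(a_j)
joint = [[p * p_a[j] for p in row] for j, row in enumerate(p_b_given_a)]

# Marginalisation: P(b_i) = sum over j of P(b_i, a_j)
p_b = [sum(joint[j][i] for j in range(len(p_a))) for i in range(3)]

# Conditioning the other way: P(a_j | b_i) = P(b_i, a_j) / P(b_i)
p_a_given_b = [[joint[j][i] / p_b[i] for j in range(len(p_a))] for i in range(3)]
```

The last line is the same Bayes inversion as before, applied to every cell of the joint table at once.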
24
Calculate $P(A \mid B)$
25
Evidence
  • There are two types of evidence
  • Hard evidence (instantiation). It is known for
    certain that a node X is in a particular state.
    Example: a soccer match can be in three states:
    win, lose, draw. After the game has ended, the
    state is known.
  • Soft evidence. For a node X, an indication is
    known that increases the probability of a
    certain state. Example: if one team leads 3 to 0
    after the first half of a match, the probability
    of that team's win state can be increased.

26
Other example
S = stiff neck, M = meningitis. What is the
probability that somebody has meningitis, given
a stiff neck? A priori: $P(S) = 0.05$, $P(M) = 0.0002$,
$P(S \mid M) = 0.9$ (in other words, if somebody has
meningitis, there is a high probability of
having a stiff neck).
By Bayes' rule, $P(M \mid S) = P(S \mid M)\,P(M) / P(S) = 0.9 \times 0.0002 / 0.05 = 0.0036$
(so, if someone has a stiff neck, there is only a
small probability that he has meningitis).
27
Visit to Asia
ASIA
28
Types of connections
In Bayesian networks, there are three types of
connections
  • Serial
  • Converging
  • Diverging

29
Serial connections
Evidence in node A influences both nodes B and C
If there is evidence in node A, evidence in node
B has no influence on node C
Example
30
Converging connection
31
Converging
One of the parents is known. This does not
influence the other parent node (headache).
One of the parents is known and the child node
'answer' is also known. In this case, the known
parent influences the other parent node (headache).
So, if no evidence is available about the child
node, the parent nodes are independent;
otherwise they are not. The parent nodes are said
to be conditionally dependent given the child
node. When the child is unknown, the parents are
d-separated.
Example
32
Diverging connection
33
Diverging
Child node 'learned' does not influence child
node 'headache' through parent node 'answer'
if the state of 'answer' is known (d-separated).
(If 'answer' is not known, then node 'learned'
influences node 'headache'.)
The parent node 'answer' influences both
child nodes, 'learned' and 'headache'.
Example
34
D-separation
  • Two nodes B and C in a Bayesian network are
    d-separated if, for all
  • paths between B and C, there exists a node A in
    between for which one of the following holds:
  • The connection is serial or diverging and the
    state of A is known
  • The connection is converging and the state of A
    (and of every descendant of A) is not known.

35
Why d-separation is important
  • If we know that two variables are d-separated,
    they can be treated as independent (given the
    evidence available at that moment) and we do not
    have to compute or use conditional probabilities.
    So it speeds up computation.
  • It can also be used for modeling, by creating
    models that exploit such causal relations, for
    example for the sake of distributedness.

36
Example: learning for an exam
Problem: a student has to learn for an exam.
What is the probability that he gives the
correct answer? Solution: we start with a
probability of 0.5 that the student has learned
the material, so $P(L = \text{true}) = 0.5$. The student
can give a correct or a wrong answer $A$, with
probability $P(A = \text{correct})$, or $P(A)$ for short.
This probability is conditionally dependent on
whether the student has learned the material.
37
The Bayesian network
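$P(A \mid L)$
In Netica (www.norsys.com)
The conditional probabilities for $A$ given Learned are
$P(A = \text{correct} \mid L = \text{true}) = 0.9$
$P(A = \text{wrong} \mid L = \text{true}) = 0.1$
$P(A = \text{correct} \mid L = \text{false}) = 0.4$
$P(A = \text{wrong} \mid L = \text{false}) = 0.6$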
38
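Computation of $P(A = \text{correct})$
The probability $P(A = \text{correct})$ is calculated as
$P(A = \text{correct}) = P(A = \text{correct} \mid L = \text{true})\,P(L = \text{true}) + P(A = \text{correct} \mid L = \text{false})\,P(L = \text{false})$
$= 0.9 \times 0.5 + 0.4 \times 0.5 = 0.65$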
39
Influence of evidence
Suppose that the student takes the exam and
produces a correct answer. This is information
that we can feed as evidence into the network.
With this evidence, what is the probability
that the student has learned the material, that is,
what is $P(L = \text{true} \mid A = \text{correct})$, or $P(L \mid A)$ for short?
40
Evidence
$P(A \mid L)$
$P(L) = 0.5$, $P(A) = 0.65$, $P(A \mid L) = 0.9$. And
so $P(L \mid A) = 0.9 \times 0.5 / 0.65 = 0.692$
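The two-node exam example can be reproduced numerically as a check on the marginal and on the posterior. This is a minimal sketch using only the numbers given on the slides; the variable names are illustrative.

```python
p_learned = 0.5
p_correct_given = {True: 0.9, False: 0.4}  # P(A = correct | L)

# Marginal: P(A = correct) = sum over L of P(A = correct | L) P(L)
p_correct = (p_correct_given[True] * p_learned
             + p_correct_given[False] * (1 - p_learned))

# Bayes' rule: P(L = true | A = correct)
p_learned_given_correct = p_correct_given[True] * p_learned / p_correct
```

Observing a correct answer raises the belief that the student learned the material from 0.5 to roughly 0.69.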
41
Example, more nodes
Suppose that we want to model the influence of a
headache H on the results. Then we can make a
(converging) network
42
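Calculation of $P(A = \text{correct})$
The probability of a correct answer is the sum
over the 4 combinations of L and H, each multiplied
by the conditional probability of a correct answer. So
$P(A = \text{correct}) = \sum_{L,H} P(A = \text{correct} \mid L, H)\,P(L)\,P(H)$
$P(A = \text{correct} \mid L = \text{true}, H = \text{true})\,P(L = \text{true})\,P(H = \text{true}) = 0.3 \times 0.5 \times 0.2 = 0.03$
$P(A = \text{correct} \mid L = \text{true}, H = \text{false})\,P(L = \text{true})\,P(H = \text{false}) = 0.9 \times 0.5 \times 0.8 = 0.36$
$P(A = \text{correct} \mid L = \text{false}, H = \text{true})\,P(L = \text{false})\,P(H = \text{true}) = 0.05 \times 0.5 \times 0.2 = 0.005$
$P(A = \text{correct} \mid L = \text{false}, H = \text{false})\,P(L = \text{false})\,P(H = \text{false}) = 0.4 \times 0.5 \times 0.8 = 0.16$
Total: 0.555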
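The four-term sum is an enumeration over the parent states, which a short loop makes explicit. The sketch below uses the numbers from this slide; the last two quantities are an added illustration (not on the slides) of how the two parents become dependent once the answer is observed.

```python
from itertools import product

p_l = {True: 0.5, False: 0.5}  # P(L)
p_h = {True: 0.2, False: 0.8}  # P(H)
# P(A = correct | L, H), keyed by (learned, headache)
p_correct_given = {
    (True, True): 0.3, (True, False): 0.9,
    (False, True): 0.05, (False, False): 0.4,
}


def p_joint_correct(l, h):
    """P(A = correct, L = l, H = h) for a priori independent parents."""
    return p_correct_given[(l, h)] * p_l[l] * p_h[h]


p_correct = sum(p_joint_correct(l, h) for l, h in product((True, False), repeat=2))

# Belief in a headache after a correct answer, without and with knowing L.
p_h_given_correct = sum(p_joint_correct(l, True) for l in (True, False)) / p_correct
p_h_given_correct_and_learned = p_joint_correct(True, True) / sum(
    p_joint_correct(True, h) for h in (True, False))
```

With the answer observed, additionally learning that the student studied changes the headache belief (about 0.063 versus about 0.077), which is the converging-connection behaviour described earlier.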
43
Explaining away
From the evidence that the car does not start,
the network calculates that most likely the
starter motor is broken (64.4% against 21.7% for
the battery being broken).
However, from new evidence that the lights
do not work either, the network now calculates that
the probability that the battery is broken is
highest. Comparing candidate causes in this manner,
so that evidence for one cause lowers the belief in
the others, is called explaining away.
44
BNs are popular because
  • They can model events using semantics
  • Modeling is done with a graphical representation.
    This makes interaction between the domain expert
    (who makes the model) and the engineer (who makes
    computation possible) very fruitful. Furthermore,
    expertise from experts is easily understandable
    in this manner.
  • They can calculate uncertainties of events taking
    place using probabilities
  • The use of evidence is a very powerful method,
    for processing, modelling and learning.
  • The best-known application is the Help
    Wizard in Microsoft Office.
  • BNs are historically used in (medical) diagnostic
    systems
  • Lately, they are used in real-time reasoning
    systems (processing speed allows this now)

45
(No Transcript)
46
Bayesian networks and Multi-agent Systems
47
The Rhino Cooperative Framework
  • Created by Michael van Wie, Univ. of Rochester,
    NY
  • Used for robot soccer (RoboCup) to observe the
    other robots in the field
  • Displaying actions and observing them is the
    only type of communication

48
Assumptions in Rhino
  • Each agent may hold only one intention at any
    given time
  • All agents have the same reasoning ability
  • When choosing actions, agents refer to the same
    recipes (descriptions of how to carry out
    plans)

49
Observations in Rhino
  • An observation (of an action) is made when an
    agent recognizes an action performed by an
    observed teammate. A likelihood is returned by
    the database storing all possible actions
  • Action recognition is therefore the process of
    generating observations

50
First, define what an action is
  • An action is a tuple <a, t, F, W>
  • a is an action
  • t is a time interval during which the action a
    must run
  • F is a list of preconditions (the agent's goals)
    that must be true before the action gets executed
  • W is a list of effects that will be true when the
    action is completed

51
Definition of observation
  • An observation O is a tuple <a, v>
  • a is the observed agent
  • v is a vector storing the probability of each of
    the n possible actions
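The action and observation tuples map naturally onto small record types. The sketch below is an assumption about how they might be represented, not Rhino's actual data model: the class names, field names, the `most_likely` helper and all example values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Action:
    name: str                       # a: the action
    interval: Tuple[float, float]   # t: time interval in which it must run
    preconditions: Tuple[str, ...]  # F: must hold before execution
    effects: Tuple[str, ...]        # W: hold once the action completes


@dataclass
class Observation:
    agent: str                # a: the observed agent
    likelihoods: List[float]  # v: one probability per possible action

    def most_likely(self, actions: List[Action]) -> Action:
        """Return the action with the highest likelihood."""
        index = max(range(len(self.likelihoods)),
                    key=self.likelihoods.__getitem__)
        return actions[index]


# Hypothetical action database with n = 2 entries.
actions = [
    Action("pass", (0.0, 1.0), ("has_ball", "path_open"), ("teammate_has_ball",)),
    Action("dribble", (0.0, 2.0), ("has_ball",), ("ball_moved",)),
]
obs = Observation(agent="teammate_7", likelihoods=[0.7, 0.3])
```

Calling `obs.most_likely(actions)` picks the entry of the action list whose position matches the largest value in the likelihood vector.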

52
Single Agent Recipes (SARs)
  • A SAR defines a set of possible actions to be
    taken to execute a plan (for example scoring a
    goal)
  • Example
  • Suppose that the belief set specifies that an
    agent must have the ball in order to score
  • If agent B does not have the ball, then agent B
    cannot execute the plan to score

53
Multi-Agent Recipes (MARs)
  • A MAR is a tuple (S, F)
  • S is the set of SARs
  • F is the list of preconditions at team level for
    undertaking that MAR
  • For example, the team needs to have the ball to
    start the scoring MAR

54
Action models / action recognition
Action recognition is the problem of inferring
goal-oriented intentions behind observed
movements. So the agent needs a world
model. Rhino uses Bayesian networks.
55
Example: a pass plan
After start, the agent only passes the ball to
another agent if it sees an open path; otherwise
it will dribble. The result can be OK or
error.
[Figure: plan diagram with nodes start, pass, dribble, OK and error, and transitions labeled path open, path not open, ball intercepted and ball stolen]
In Rhino, start, dribble, pass, etc. are called
atomic actions.
56
Belief sensors
Sensor outputs are converted into probabilistic
beliefs.
[Figure: world, observed through sensors, yielding the beliefs Believe(ball is nearby), Believe(opponent has ball) and Believe(goal is open)]
57
Belief example
The variable ball_distance has four states:
Adjacent, Near, Moderate, Far.
Suppose $P(\text{ball\_distance}) = (0.1, 0.3, 0.5, 0.1)$
58
Example: sensor belief change (approaching ball)
[Figure: Bayes network with nodes Ball (t), Ball (t+1) and ball sensor, drawn along a time axis]
The output is a vector of probabilities for
$P(\text{ball\_distance})$:
$P(\text{ball\_distance}) = (0.25, 0.5, 0.2, 0.05)$
(old $P(\text{ball\_distance}) = (0.1, 0.3, 0.5, 0.1)$)
Now we calculate the ball speed
59
Other example: atomic action recognition
60
Other example: atomic action recognition
CPT for ball_possesed by agentA
61
Combining atomic action recognitions into agent
actions
CPT for Agent_A_passed_the_ball
62
Bayesian Networks for MAS
  • The Rhino system uses action recognition,
    exploiting Bayesian networks to generate belief
    about the observed actions by the other agents
  • With those beliefs, plans are selected and
    team play is organized (robot soccer)
  • In general, using belief while interacting with
    other agents is a useful method in case there is
    uncertainty about the received information

63
Bayesian Networks in Distributed Perception
Networks