Satisfaction Equilibrium - PowerPoint PPT Presentation

About This Presentation
Title:

Satisfaction Equilibrium

Description:

Satisfaction Equilibrium. Stéphane Ross. Canadian AI 2006. Problem ... Agents generally do not know the preferences (rewards) of their opponents ...

Slides: 22
Provided by: stphan5
Learn more at: http://www.cs.cmu.edu

Transcript and Presenter's Notes

Title: Satisfaction Equilibrium


1
Satisfaction Equilibrium
  • Stéphane Ross

2
Problem
  • In real-life multiagent systems:
    • Agents generally do not know the preferences
      (rewards) of their opponents
    • Agents may not observe the actions of their
      opponents
  • In this context, most game-theoretic solution
    concepts are hardly applicable
  • We may try to define equilibrium concepts that:
    • do not require complete information
    • are achievable through learning, over repeated
      play

3
Plan
  • Game model
  • Satisfaction Equilibrium
  • Satisfaction Equilibrium Learning
  • Results
  • Conclusion
  • Questions

5
Game model
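  • Number of agents: $n$
  • Joint action space: $A = A_1 \times \dots \times A_n$
  • Set of possible outcomes: $\Omega$
  • $O : A \to \Omega$, the outcome function.
  • $R_i : \Omega \to \mathbb{R}$, agent $i$'s reward function.
  • Agent $i$ only knows $A_i$, $\Omega$ and $R_i$.
  • After each turn, every agent observes an outcome
    $o \in \Omega$.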
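
The game model above can be sketched as a small data structure. This is only an illustrative sketch: the class name `Game`, the method names, the string outcome labels, and the prisoner's dilemma payoff values are all assumptions made for the example, not taken from the slides.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, List, Sequence, Tuple

JointAction = Tuple[Hashable, ...]


@dataclass
class Game:
    """Full game description. No single agent ever sees all of it."""
    action_sets: List[Sequence[Hashable]]        # one action set per agent
    outcome: Callable[[JointAction], Hashable]   # outcome function
    rewards: List[Callable[[Hashable], float]]   # one reward function per agent

    @property
    def n_agents(self) -> int:
        return len(self.action_sets)

    def play(self, joint_action: JointAction) -> Hashable:
        """Return the outcome that every agent observes after one turn."""
        return self.outcome(joint_action)

    def agent_view(self, i: int):
        """Private knowledge of agent i: its own actions and reward function."""
        return self.action_sets[i], self.rewards[i]


# Hypothetical prisoner's dilemma payoffs, keyed by an opaque outcome label.
# The numbers are illustrative values, not taken from the slides.
PD_PAYOFFS = {"CC": (3, 3), "CD": (0, 5), "DC": (5, 0), "DD": (1, 1)}

pd_game = Game(
    action_sets=[["C", "D"], ["C", "D"]],
    outcome=lambda joint: "".join(joint),
    rewards=[lambda o, i=i: PD_PAYOFFS[o][i] for i in range(2)],
)

observed = pd_game.play(("C", "D"))                   # outcome label "CD"
own_rewards = [r(observed) for r in pd_game.rewards]  # each agent evaluates only its own reward
```

Keeping the outcome as an opaque label reflects the setting in which an agent cannot read the opponent's action or reward directly from what it observes.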

6
Game model
  • Observations
    • The agents do not know the game matrix
    • They are unable to compute best responses or
      Nash equilibria.
    • They can only reason about their history of
      actions and rewards.

8
Satisfaction Equilibrium
  • Since the agents can only reason about their history
    of payoffs, we may adopt satisfaction-based
    reasoning
  • If an agent is satisfied by its current reward,
    it should keep playing the same strategy
  • An unsatisfied agent may decide to change its
    strategy according to some exploration function
  • An equilibrium will arise when all agents are
    satisfied.

9
Satisfaction Equilibrium
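  • Formally
  • $S_i : \mathbb{R} \to \{0, 1\}$ is the satisfaction
    function of agent $i$
  • $S_i(r_i) = 1$ if $r_i \geq \sigma_i$ (agent $i$ is
    satisfied)
  • $S_i(r_i) = 0$ if $r_i < \sigma_i$ (agent $i$ is not
    satisfied)
  • $\sigma_i$ is the satisfaction threshold of agent $i$
  • A joint strategy $s$ is a satisfaction equilibrium
    if $S_i(r_i) = 1$ for every agent $i$, where $r_i$ is
    the reward agent $i$ receives under $s$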
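
As a rough illustration of the definition above, the check can be written in a few lines. The sketch assumes that "satisfied" means the reward is at least the threshold; the function names and the toy reward values are hypothetical.

```python
from typing import Sequence


def satisfaction(reward: float, threshold: float) -> int:
    """Return 1 if the reward meets the agent's threshold, else 0.

    Assumption: a reward exactly equal to the threshold counts as satisfying.
    """
    return 1 if reward >= threshold else 0


def is_satisfaction_equilibrium(rewards: Sequence[float],
                                thresholds: Sequence[float]) -> bool:
    """True when every agent is satisfied by the reward it receives.

    `rewards[i]` is the reward agent i gets under the joint strategy
    being tested, and `thresholds[i]` is its satisfaction threshold.
    """
    return all(satisfaction(r, t) == 1 for r, t in zip(rewards, thresholds))


# Toy values (illustrative only): two agents with thresholds 3 and 2.
both_satisfied = is_satisfaction_equilibrium(rewards=[3, 4], thresholds=[3, 2])
one_unsatisfied = is_satisfaction_equilibrium(rewards=[3, 1], thresholds=[3, 2])
```

Note that each agent could evaluate its own `satisfaction` value privately; only the joint check needs a global view, which is available to an observer but not to the agents.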

10
Example
  • Prisoner's dilemma
  • Possible satisfaction matrix
  • Dominant strategy: D
  • Nash Equilibrium: (D,D)
  • Pareto-Optimal: (C,C), (D,C), (C,D)
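
The matrices from this slide did not survive in the transcript. As a hedged stand-in, the sketch below builds a satisfaction matrix for a prisoner's dilemma with commonly used illustrative payoffs (3, 0, 5, 1) and lists the resulting satisfaction equilibria. The payoff values, the thresholds, and the function names are assumptions, not the presenter's numbers.

```python
# Hypothetical payoffs (row agent, column agent); not the values from the slide.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def satisfaction_matrix(thresholds):
    """Map each joint action to a tuple of 0/1 satisfaction flags."""
    return {
        joint: tuple(int(r >= t) for r, t in zip(rewards, thresholds))
        for joint, rewards in PAYOFFS.items()
    }


def satisfaction_equilibria(thresholds):
    """Joint actions in which every agent is satisfied."""
    return [joint for joint, flags in satisfaction_matrix(thresholds).items()
            if all(flags)]


# With demanding thresholds, only mutual cooperation satisfies both agents.
high = satisfaction_equilibria((3, 3))
# With modest thresholds, mutual defection also becomes a satisfaction equilibrium.
low = satisfaction_equilibria((1, 1))
```

Under these assumed numbers, the choice of thresholds decides which joint actions qualify: raising them rules out (D,D) and leaves the Pareto-optimal (C,C).
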
11
Satisfaction Equilibrium
  • However, even if a satisfaction equilibrium
    exists, it may be unreachable

13
Satisfaction Equilibrium Learning
  • If the satisfaction thresholds are fixed, we only
    need to apply satisfaction-based reasoning:
    • Choose a strategy randomly
    • If satisfied, keep playing the same strategy
    • Else, choose a new strategy randomly
  • We can also use other exploration functions which
    favour actions that have not been explored often
  • Ex
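
The fixed-threshold procedure above can be sketched as a short simulation. This is a minimal sketch under stated assumptions: the payoffs are the same illustrative prisoner's dilemma values used for demonstration purposes, `count_biased_explore` is one hypothetical way to favour rarely tried actions (weights proportional to 1 / (1 + count)) and is not the exploration formula from the slide, and all function and parameter names are invented for the example.

```python
import random


def uniform_explore(actions, counts, rng):
    """Pick a new action uniformly at random."""
    return rng.choice(actions)


def count_biased_explore(actions, counts, rng):
    """Hypothetical biased exploration: favour actions tried less often."""
    weights = [1.0 / (1 + counts[a]) for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


def learn_fixed_thresholds(action_sets, reward_fns, thresholds,
                           explore=uniform_explore, max_turns=1000, seed=0):
    """Repeated play with fixed thresholds; returns (joint action, turn) or (None, max_turns)."""
    rng = random.Random(seed)
    counts = [{a: 0 for a in actions} for actions in action_sets]
    joint = [explore(actions, counts[i], rng)
             for i, actions in enumerate(action_sets)]
    for turn in range(max_turns):
        for i, action in enumerate(joint):
            counts[i][action] += 1
        # The simulator computes rewards; each agent only ever sees its own.
        rewards = [fn(tuple(joint)) for fn in reward_fns]
        satisfied = [r >= t for r, t in zip(rewards, thresholds)]
        # Stopping test used by the experimenter, not known to the agents.
        if all(satisfied):
            return tuple(joint), turn
        for i, ok in enumerate(satisfied):
            if not ok:  # a satisfied agent keeps its current action
                joint[i] = explore(action_sets[i], counts[i], rng)
    return None, max_turns


# Illustrative prisoner's dilemma payoffs (assumed values).
PD = {("C", "C"): (3, 3), ("C", "D"): (0, 5), ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
pd_rewards = [lambda joint, i=i: PD[joint][i] for i in range(2)]
pd_actions = [["C", "D"], ["C", "D"]]

result_uniform = learn_fixed_thresholds(pd_actions, pd_rewards, thresholds=(3, 3))
result_biased = learn_fixed_thresholds(pd_actions, pd_rewards, thresholds=(3, 3),
                                       explore=count_biased_explore)
```

With both thresholds set to 3 in this assumed game, mutual cooperation is the only joint action that satisfies both agents, so it is the only point at which the loop can stop.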

14
Satisfaction Equilibrium Learning
  • We use a simple update rule
  • When the agent is satisfied, we increment its
    satisfaction threshold by some variable $\delta$
  • If the agent is unsatisfied, we decrement its
    satisfaction threshold by $\delta$
  • $\delta$ is multiplied by a factor $\gamma$ each
    turn such that it converges to 0
  • We also use a limited history of our previous
    satisfaction states and thresholds for each
    action to bound the value of the satisfaction
    threshold
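
A minimal sketch of the update rule described above, for a single agent and a single turn. The parameter names `delta` and `gamma` and the default value 0.99 are assumptions chosen for the example, and the history-based bounding of the threshold is left out because the slides do not give enough detail to reproduce it.

```python
def update_threshold(threshold, delta, satisfied, gamma=0.99):
    """Apply one step of the threshold update.

    Raise the threshold by `delta` when the agent is satisfied, lower it
    by `delta` otherwise, then shrink `delta` by the factor `gamma`
    (assumed to lie strictly between 0 and 1 so the step decays to 0).
    Returns the new (threshold, delta) pair.
    """
    if satisfied:
        threshold = threshold + delta
    else:
        threshold = threshold - delta
    return threshold, delta * gamma


# Two illustrative steps with gamma = 0.5 so the numbers are easy to follow.
step1 = update_threshold(0.0, 1.0, satisfied=True, gamma=0.5)
step2 = update_threshold(*step1, satisfied=False, gamma=0.5)
```

Because the step size shrinks every turn, early turns move the threshold a lot and later turns only fine-tune it, which is why the choice of the decay factor matters for where the threshold settles.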

16
Results
  • Fixed satisfaction thresholds
    • In simple games, we were always able to reach a
      satisfaction equilibrium.
    • Using a biased exploration improves the speed of
      convergence of the algorithm.
  • Learning the satisfaction thresholds
    • We are generally able to learn the optimal
      satisfaction equilibrium in simple games.
    • Using a biased exploration improves the
      convergence percentage of the algorithm.
    • The decay factor and the history size affect the
      convergence of the algorithm and need to be
      adjusted to get optimal results.

17
Results: Prisoner's dilemma
19
Conclusion
  • It is possible to learn stable outcomes without
    observing anything but our own rewards
  • Satisfaction equilibria can be defined on any
    Pareto-Optimal solution.
  • However, satisfaction equilibria are not always
    reachable
  • The proposed learning algorithms achieve good
    performance in simple games
  • However, they require game-specific adjustments
    for optimal performance

20
Conclusion
  • For more information, you can consult my
    publications at
  • http://www.damas.ift.ulaval.ca/ross
  • Thank You!

21
Questions
  • ?