Quiz 5: Expectimax/Prob review - PowerPoint PPT Presentation

Slides: 41
Provided by: Preferr1315
Transcript and Presenter's Notes

1
Quiz 5: Expectimax/Prob review
  • Expectimax assumes the worst-case scenario. False
  • Expectimax assumes the most likely scenario.
    False
  • P(X,Y) is the conditional probability of X given
    Y. False
  • P(X,Y,Z) = P(Z|X,Y) P(X|Y) P(Y). True
  • Bayes' rule computes P(Y|X) from P(X|Y) and P(Y).
    True
  • Bayes' rule only applies in Bayesian statistics.
    False
  • FALSE
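
The chain rule and Bayes' rule from the quiz can be checked numerically. The sketch below uses a made-up joint distribution over three binary variables (the eight weights are assumptions chosen only for illustration) and the hypothetical helpers `prob` and `cond`.

```python
from itertools import product

# Hypothetical joint distribution P(X, Y, Z) over three binary variables.
# The eight weights are made up for illustration and sum to 1.
weights = [0.10, 0.05, 0.20, 0.15, 0.05, 0.10, 0.25, 0.10]
joint = dict(zip(product([0, 1], repeat=3), weights))
NAMES = ("x", "y", "z")

def prob(**fixed):
    """Marginal probability of the given assignments, e.g. prob(x=1, y=0)."""
    return sum(p for outcome, p in joint.items()
               if all(outcome[NAMES.index(k)] == v for k, v in fixed.items()))

def cond(target, given):
    """Conditional probability P(target | given); both arguments are dicts."""
    return prob(**target, **given) / prob(**given)

x, y, z = 1, 0, 1

# Chain rule: P(x, y, z) = P(z | x, y) * P(x | y) * P(y)
chain = cond({"z": z}, {"x": x, "y": y}) * cond({"x": x}, {"y": y}) * prob(y=y)

# Bayes' rule: P(y | x) = P(x | y) * P(y) / P(x), where P(x) is obtained by
# summing P(x | y') * P(y') over every value y'.
p_x = sum(cond({"x": x}, {"y": yv}) * prob(y=yv) for yv in (0, 1))
bayes = cond({"x": x}, {"y": y}) * prob(y=y) / p_x
```

Here `chain` agrees with the joint entry for (1, 0, 1), and `bayes` agrees with P(Y=0 | X=1) computed directly from the joint table.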

2
CSE511a: Artificial Intelligence, Spring 2013
  • Lecture 8: Maximum Expected Utility
  • 02/11/2013

Robert Pless; course adapted from one given by
Kilian Weinberger; many slides over the course
adapted from either Dan Klein, Stuart Russell, or
Andrew Moore
3
Announcements
  • Project 2: Multi-agent Search is out (due 02/28)
  • Midterm, in class, March 6 (Wed. before spring
    break)
  • Homework (really, exam review) will be out soon.
    Due before exam.

5
  • Prediction of voting outcomes.
  • Solve for P(Vote | Polls, historical trends)
  • Creates models for
  • P(Polls | Vote)
  • Or P(Polls | voter preferences), under an
    assumption linking preference to vote
  • Keys
  • Understanding house effects, such as bias caused
    by only calling land-lines and other choices made
    in creating the sample.
  • Partly data driven: house effects are inferred
    from errors in past polls

6
From PECOTA to politics, 2008
Missed Indiana and 1 vote in Nebraska. 35/35 on
Senate races.
7
2012
8
Utilities
  • Utilities are functions from outcomes (states of
    the world) to real numbers that describe an
    agent's preferences
  • Where do utilities come from?
  • In a game, may be simple (+1/-1)
  • Utilities summarize the agent's goals
  • Theorem: any set of preferences between outcomes
    can be summarized as a utility function (provided
    the preferences meet certain conditions)
  • In general, we hard-wire utilities and let
    actions emerge

9
Expectimax Search
  • Chance nodes
  • Chance nodes are like min nodes, except the
    outcome is uncertain
  • Calculate expected utilities
  • Chance nodes average successor values (weighted)
  • Each chance node has a probability distribution
    over its outcomes (called a model)
  • For now, assume we're given the model
  • Utilities for terminal states
  • Static evaluation functions give us limited-depth
    search

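[Figure: limited-depth expectimax tree. Two nodes are
labeled 400 and 300, each an estimate of the true
expectimax value (which would require a lot of work
to compute); deeper nodes are labeled 492 and 362.]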

10
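Quiz: Expectimax Quantities
[Figure: expectimax tree with a uniform ghost. The
leaves are (3, 12, 9), (3, 0, 6), and (15, 6, 0); the
chance nodes evaluate to 8, 3, and 7; the root max
node evaluates to 8.]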
11
Expectimax Pruning?
[Figure: expectimax tree with a uniform ghost and
leaves (3, 12, 9), (3, 0, 6), (15, 6, 0).]
Only if utilities are strictly bounded.
12
Minimax Evaluation
  • Evaluation functions quickly return an estimate
    for a node's true value
  • For minimax, evaluation function scale doesn't
    matter
  • We just want better states to have higher
    evaluations (get the ordering right)
  • We call this insensitivity to monotonic
    transformations

13
Expectimax Evaluation
  • For expectimax, we need relative magnitudes to be
    meaningful

[Figure: example tree whose leaf values are
transformed by x^2.]
Expectimax behavior is invariant under positive
linear transformation
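
A small sketch of the difference between the two evaluation regimes: squaring the leaf values (a monotonic but nonlinear transformation) leaves the minimax choice unchanged but can flip the expectimax choice, while a positive linear transformation preserves both. The leaf values and function names are illustrative assumptions.

```python
def minimax_choice(branches):
    """Index of the branch whose worst-case leaf is best."""
    return max(range(len(branches)), key=lambda i: min(branches[i]))

def expectimax_choice(branches):
    """Index of the branch whose average leaf is best (uniform chance nodes)."""
    return max(range(len(branches)),
               key=lambda i: sum(branches[i]) / len(branches[i]))

def transform(branches, f):
    """Apply f to every leaf value."""
    return [[f(v) for v in leaves] for leaves in branches]

# Illustrative leaf utilities for two actions.
branches = [[0, 40], [20, 30]]
squared = transform(branches, lambda v: v * v)      # monotonic, not linear
linear = transform(branches, lambda v: 2 * v + 5)   # positive linear

print(minimax_choice(branches), minimax_choice(squared))
print(expectimax_choice(branches), expectimax_choice(squared),
      expectimax_choice(linear))
```

With these numbers the averages are 20 vs. 25 before squaring and 800 vs. 650 after, so only the expectimax decision changes.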
14
Expectimax for Pacman
Demo world assumptions
Results from playing 5 games

                     Minimizing Ghost    Random Ghost
Minimax Pacman
Expectimax Pacman

Pacman uses depth 4 search with an eval function
that avoids trouble. Ghost uses depth 2 search
with an eval function that seeks Pacman.
15
Expectimax for Pacman
Demo world assumptions
Results from playing 5 games

                     Minimizing Ghost            Random Ghost
Minimax Pacman       Won 5/5, Avg. Score 493     Won 5/5, Avg. Score 483
Expectimax Pacman    Won 1/5, Avg. Score -303    Won 5/5, Avg. Score 503

Pacman used depth 4 search with an eval function
that avoids trouble. Ghost used depth 2 search
with an eval function that seeks Pacman.
16
Expectimax Search
Having a probabilistic belief about an agent's
action does not mean that agent is flipping any
coins!
Arthur C. Clarke: "Any sufficiently advanced
technology is indistinguishable from magic."
17
Expectimax Pseudocode
def value(s):
    if s is a max node: return maxValue(s)
    if s is an exp node: return expValue(s)
    if s is a terminal node: return evaluation(s)

def maxValue(s):
    values = [value(s') for s' in successors(s)]
    return max(values)

def expValue(s):
    values = [value(s') for s' in successors(s)]
    weights = [probability(s, s') for s' in successors(s)]
    return expectation(values, weights)
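
The pseudocode can be turned into a short runnable sketch. The tree encoding here (tuples tagged "max" or "exp", plain numbers as terminals) and the helper `uniform` are assumptions made for illustration; the leaf values are those of the uniform-ghost quiz tree.

```python
def value(node):
    """Expectimax value of a node in a small explicit game tree."""
    if isinstance(node, (int, float)):   # terminal: the utility is stored directly
        return node
    kind, children = node
    if kind == "max":
        return max(value(child) for child in children)
    if kind == "exp":                    # children are (probability, subtree) pairs
        return sum(p * value(child) for p, child in children)
    raise ValueError(f"unknown node type: {kind}")

def uniform(*leaves):
    """Chance node that picks each successor with equal probability."""
    return ("exp", [(1 / len(leaves), leaf) for leaf in leaves])

tree = ("max", [uniform(3, 12, 9), uniform(3, 0, 6), uniform(15, 6, 0)])
print(value(tree))
```

The three chance nodes average to 8, 3, and 7, so the max node at the root takes the value 8 (up to floating-point rounding).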

18
Expectimax for Pacman
  • Notice that we've gotten away from thinking that
    the ghosts are trying to minimize Pacman's score
  • Instead, they are now a part of the environment
  • Pacman has a belief (distribution) over how they
    will act
  • Quiz: Can we see minimax as a special case of
    expectimax?
  • Quiz: What would Pacman's computation look like
    if we assumed that the ghosts were doing 1-ply
    minimax and taking the result 80% of the time,
    otherwise moving randomly?
  • If you take this further, you end up calculating
    belief distributions over your opponents' belief
    distributions over your belief distributions,
    etc.
  • Can get unmanageable very quickly!
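
One way to approach both quiz questions is to write the ghost model as a probability distribution and plug it into a chance node. The sketch below assumes that "moving randomly" means uniform over all moves, including the minimizing one; the function names and leaf values are illustrative.

```python
def ghost_distribution(successor_values, p_minimax=0.8):
    """Probability of each ghost move under the mixed model."""
    n = len(successor_values)
    best = min(range(n), key=lambda i: successor_values[i])  # 1-ply minimizing move
    probs = [(1 - p_minimax) / n] * n   # random part: uniform over all moves
    probs[best] += p_minimax            # extra mass on the minimizing move
    return probs

def chance_value(successor_values, p_minimax=0.8):
    """Expected value of a ghost node under the mixed model."""
    probs = ghost_distribution(successor_values, p_minimax)
    return sum(p * v for p, v in zip(probs, successor_values))

def pacman_value(branches, p_minimax=0.8):
    """Pacman maximizes over actions, each followed by one ghost node."""
    return max(chance_value(leaves, p_minimax) for leaves in branches)

branches = [[3, 12, 9], [3, 0, 6], [15, 6, 0]]
for p in (1.0, 0.8, 0.0):
    print(p, pacman_value(branches, p))
```

Setting `p_minimax=1.0` puts all probability on the minimizing move and reproduces minimax, and `p_minimax=0.0` reproduces the uniform-ghost expectimax.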

19
Mixed Layer Types
  • E.g. Backgammon
  • Expectiminimax
  • Environment is an extra player that moves after
    each agent
  • Chance nodes take expectations, otherwise like
    minimax

ExpectiMinimax-Value(state)
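
The body of ExpectiMinimax-Value did not survive in this transcript. A minimal sketch of the usual recursion, assuming an explicit tree of tagged tuples ("max", "min", "chance") and made-up leaf values, might look like this:

```python
def expectiminimax(node):
    """Value of a node in a tree with max, min, and chance layers."""
    if isinstance(node, (int, float)):   # terminal utility
        return node
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    if kind == "chance":                 # children are (probability, subtree) pairs
        return sum(p * expectiminimax(c) for p, c in children)
    raise ValueError(f"unknown node type: {kind}")

# Hypothetical game: the maximizer moves, a fair coin is flipped,
# then the minimizer moves.
tree = ("max", [
    ("chance", [(0.5, ("min", [4, 8])), (0.5, ("min", [2, 10]))]),
    ("chance", [(0.5, ("min", [6, 7])), (0.5, ("min", [1, 9]))]),
])
print(expectiminimax(tree))
```

The first action is worth 0.5 * 4 + 0.5 * 2 = 3 and the second 0.5 * 6 + 0.5 * 1 = 3.5, so the root value is 3.5.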
20
Maximum Expected Utility
  • Principle of maximum expected utility
  • A rational agent should choose the action which
    maximizes its expected utility, given its
    knowledge
  • Questions
  • Where do utilities come from?
  • How do we know such utilities even exist?
  • Why are we taking expectations of utilities (not,
    e.g. minimax)?
  • What if our behavior can't be described by
    utilities?

21
Utility and Decision Theory
22
Utilities: Unknown Outcomes
[Figure: decision tree for going to the airport from
home. Actions: take surface streets, take freeway.
Chance outcomes: clear, 10 min; traffic, 50 min;
clear, 20 min. Results: arrive early, arrive a
little late, arrive very late.]
23
Preferences
  • An agent chooses among
  • Prizes: A, B, etc.
  • Lotteries: situations with uncertain prizes
  • Notation

24
Rational Preferences
  • Transitivity: we want some constraints on
    preferences before we call them rational
  • For example: an agent with intransitive
    preferences can be induced to give away all of
    its money
  • If B > C, then an agent with C would pay (say) 1
    cent to get B
  • If A > B, then an agent with B would pay (say) 1
    cent to get A
  • If C > A, then an agent with A would pay (say) 1
    cent to get C
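
The money-pump argument can be simulated directly. This sketch assumes the cyclic preferences from the bullets and a 1-cent fee per trade; the function name and starting balance are illustrative.

```python
# Intransitive (cyclic) preferences: A > B, B > C, C > A.
# Maps the item currently held to the item the agent would pay to get instead.
PREFERRED_OVER = {"C": "B", "B": "A", "A": "C"}

def money_pump(start_item, cents, fee=1):
    """Keep offering the agent its preferred swap, charging `fee` cents each time."""
    item, trades = start_item, 0
    while cents >= fee:
        item = PREFERRED_OVER[item]   # the agent happily trades up...
        cents -= fee                  # ...and pays for the privilege
        trades += 1
    return item, cents, trades

print(money_pump("C", 300))
```

After 300 trades the agent holds the item it started with and has no money left.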

25
Rational Preferences
  • Preferences of a rational agent must obey
    constraints.
  • The axioms of rationality
  • Theorem: Rational preferences imply behavior
    describable as maximization of expected utility

26
MEU Principle
  • Theorem
  • [Ramsey, 1931; von Neumann & Morgenstern, 1944]
  • Given any preferences satisfying these
    constraints, there exists a real-valued function
    U such that
  • Maximum expected utility (MEU) principle
  • Choose the action that maximizes expected utility
  • Note: an agent can be entirely rational
    (consistent with MEU) without ever representing
    or manipulating utilities and probabilities
  • E.g., a lookup table for perfect tic-tac-toe,
    reflex vacuum cleaner
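
The formula following "such that" is not present in this transcript. The standard textbook statement of the theorem, given here as a reference rather than as the slide's exact notation, is that U preserves the preference ordering and that the utility of a lottery is its expected utility:

```latex
U(A) \ge U(B) \iff A \succeq B
\qquad\text{and}\qquad
U\bigl([p_1, S_1; \ldots; p_n, S_n]\bigr) = \sum_{i} p_i \, U(S_i)
```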

27
Utility Scales
  • Normalized utilities: u+ = 1.0, u- = 0.0
  • Micromorts: one-millionth chance of death, useful
    for paying to reduce product risks, etc.
  • QALYs: quality-adjusted life years, useful for
    medical decisions involving substantial risk

28
Q: What do these actions have in common?
  • smoking 1.4 cigarettes
  • drinking 0.5 liter of wine
  • spending 1 hour in a coal mine
  • spending 3 hours in a coal mine
  • living 2 days in New York or Boston
  • living 2 months in Denver
  • living 2 months with a smoker
  • living 15 years within 20 miles (32 km) of a
    nuclear power plant
  • drinking Miami water for 1 year
  • eating 100 charcoal-broiled steaks
  • eating 40 tablespoons of peanut butter
  • travelling 6 minutes by canoe
  • travelling 10 miles (16 km) by bicycle
  • travelling 230 miles (370 km) by car
  • travelling 6000 miles (9656 km) by train
  • flying 1000 miles (1609 km) by jet
  • flying 6000 miles (9656 km) by jet
  • one chest X ray in a good hospital
  • 1 ecstasy tablet

Source: Wikipedia
29
Human Utilities
  • Utilities map states to real numbers. Which
    numbers?
  • Standard approach to assessment of human
    utilities
  • Compare a state A to a standard lottery L_p
    between
  • best possible prize u+ with probability p
  • worst possible catastrophe u- with probability
    1-p
  • Adjust lottery probability p until A ~ L_p
  • Resulting p is a utility in [0,1]

Pay 50
30
Money
  • Money does not behave as a utility function, but
    we can talk about the utility of having money (or
    being in debt)
  • Given a lottery L = [p, $X; (1-p), $Y]
  • The expected monetary value EMV(L) is p*X +
    (1-p)*Y
  • U(L) = p*U($X) + (1-p)*U($Y)
  • Typically, U(L) < U(EMV(L)): why?
  • In this sense, people are risk-averse
  • When deep in debt, we are risk-prone
  • Utility curve: for what probability p
  • am I indifferent between
  • Some sure outcome x
  • A lottery [p, $M; (1-p), $0], M large

31
Money
  • Money does not behave as a utility function
  • Given a lottery L
  • Define expected monetary value EMV(L)
  • Usually U(L) < U(EMV(L))
  • I.e., people are risk-averse
  • Utility curve: for what probability p
  • am I indifferent between
  • A prize x
  • A lottery [p, $M; (1-p), $0] for large M?
  • Typical empirical data, extrapolated
  • with risk-prone behavior

32
Example: Insurance
  • Consider the lottery [0.5, $1000; 0.5, $0]
  • What is its expected monetary value? ($500)
  • What is its certainty equivalent?
  • Monetary value acceptable in lieu of lottery
  • $400 for most people
  • Difference of $100 is the insurance premium
  • There's an insurance industry because people will
    pay to reduce their risk
  • If everyone were risk-neutral, no insurance
    needed!
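
The quantities in this example can be computed for any concrete utility curve. The sketch below assumes a square-root utility of money, which is only one possible risk-averse curve and is not taken from the slides; it gives a certainty equivalent of about $250, whereas the $400 figure above is an empirical one.

```python
import math

def expected_monetary_value(lottery):
    """EMV of a lottery given as (probability, amount) pairs."""
    return sum(p * x for p, x in lottery)

def expected_utility(lottery, utility):
    """U(L): the probability-weighted utility of the prizes."""
    return sum(p * utility(x) for p, x in lottery)

def certainty_equivalent(lottery, utility, inverse_utility):
    """Sure amount whose utility equals the lottery's expected utility."""
    return inverse_utility(expected_utility(lottery, utility))

lottery = [(0.5, 1000), (0.5, 0)]

# Assumed utility of money: U(x) = sqrt(x), with inverse u -> u**2.
emv = expected_monetary_value(lottery)
ce = certainty_equivalent(lottery, math.sqrt, lambda u: u ** 2)
premium = emv - ce

print(emv, ce, premium)
# Risk aversion: U(L) is below U(EMV(L)) for a concave utility.
print(expected_utility(lottery, math.sqrt) < math.sqrt(emv))
```

A risk-neutral agent (linear utility) would have a certainty equivalent equal to the EMV, so the premium would be zero.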

33
Example: Insurance
  • Because people ascribe different utilities to
    different amounts of money, insurance agreements
    can increase both parties' expected utility

You own a car. Your lottery: L_Y = [0.8, $0;
0.2, -$200], i.e., 20% chance of crashing. You do
not want -$200!
U_Y(L_Y) = 0.2 * U_Y(-$200) = -200
U_Y(-$50) = -150

Amount    Your Utility U_Y
$0        0
-$50      -150
-$200     -1000
34
Example: Insurance
Insurance company buys risk: L_I = [0.8, $50;
0.2, -$150], i.e., $50 revenue + your L_Y. Insurer
is risk-neutral: U(L) = U(EMV(L)).
U_I(L_I) = U(0.8*50 + 0.2*(-150)) = U($10) >
U($0)
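
The arithmetic of this example can be reproduced in a few lines. The sketch uses the utilities given for the car owner (0 for $0, -150 for -$50, -1000 for -$200) and treats the insurer as risk-neutral; the variable and function names are illustrative.

```python
def expected_value(lottery, utility=lambda amount: amount):
    """Expected utility of (probability, amount) pairs; identity utility gives EMV."""
    return sum(p * utility(x) for p, x in lottery)

# Your utility for each monetary outcome, as given in the example.
YOUR_UTILITY = {0: 0, -50: -150, -200: -1000}

uninsured = [(0.8, 0), (0.2, -200)]    # your lottery L_Y
insured = [(1.0, -50)]                 # pay the $50 premium for certain
insurer = [(0.8, 50), (0.2, -150)]     # premium plus your lottery, L_I

eu_uninsured = expected_value(uninsured, YOUR_UTILITY.get)
eu_insured = expected_value(insured, YOUR_UTILITY.get)
insurer_emv = expected_value(insurer)

print(eu_uninsured, eu_insured, insurer_emv)
```

You move from an expected utility of -200 to -150, and the insurer moves from $0 to an expected $10, so both parties are better off.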
35
Example: Human Rationality?
  • Famous example of Allais (1953)
  • A: [0.8, $4k; 0.2, $0]
  • B: [1.0, $3k; 0.0, $0]
  • C: [0.2, $4k; 0.8, $0]
  • D: [0.25, $3k; 0.75, $0]
  • Most people prefer B > A, C > D
  • But if U($0) = 0, then
  • B > A ⇒ U($3k) > 0.8 U($4k)
  • C > D ⇒ 0.8 U($4k) > U($3k)
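
The contradiction can also be seen by brute force. The sketch below fixes U($0) = 0, scans an integer grid of candidate values for U($3k) and U($4k), and looks for any assignment under which expected utility ranks B above A and C above D. The grid range is an arbitrary assumption, and exact fractions are used so that rounding cannot create spurious matches.

```python
from fractions import Fraction as F

# The four Allais lotteries as (probability, prize) pairs.
LOTTERIES = {
    "A": [(F(4, 5), "4k"), (F(1, 5), "0")],
    "B": [(F(1), "3k")],
    "C": [(F(1, 5), "4k"), (F(4, 5), "0")],
    "D": [(F(1, 4), "3k"), (F(3, 4), "0")],
}

def expected_utility(name, u):
    """Expected utility of a named lottery under the utility table u."""
    return sum(p * u[prize] for p, prize in LOTTERIES[name])

def matches_typical_preferences(u):
    """True if u ranks B above A and C above D."""
    return (expected_utility("B", u) > expected_utility("A", u)
            and expected_utility("C", u) > expected_utility("D", u))

candidates = [{"0": 0, "3k": u3, "4k": u4}
              for u3 in range(0, 101) for u4 in range(0, 101)]
consistent = [u for u in candidates if matches_typical_preferences(u)]
print(len(consistent))
```

No candidate on the grid satisfies both preferences, in line with the two inequalities above pointing in opposite directions.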

36
And now, for something completely different.
  • Well, actually, rather related

37
Digression: Simulated Annealing
38
Example: Deciphering
Intercepted message to prisoner in California
state prison
Persi Diaconis
39
Search Problem
  • Initial State: Assign letters arbitrarily to
    symbols

40
Modified Search Problem
  • Initial State: Assign letters arbitrarily to
    symbols
  • Action: Swap the assignment of two symbols
  • Terminal State? Utility Function?

[Figure: swapping the assignments of A and D.]
41
Utility Function
  • Utility: Likelihood of randomly generating this
    exact text.
  • Estimate probabilities of any character following
    any other. (Bigram model)
  • P(a|d)
  • Learn model from arbitrary English document
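
A bigram utility of this kind can be sketched in a few lines. The version below works with log-probabilities (so the product of bigram probabilities becomes a sum) and uses add-one smoothing so unseen pairs do not get zero probability; both choices, the 27-character alphabet, and the tiny training string are assumptions for illustration.

```python
import math
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz "

def clean(text):
    """Lowercase the text and keep only characters in ALPHABET."""
    return "".join(c for c in text.lower() if c in ALPHABET)

def train_bigram(text):
    """Return log P(next | prev) for every character pair, with add-one smoothing."""
    text = clean(text)
    pair_counts = Counter(zip(text, text[1:]))
    prev_counts = Counter(text[:-1])
    return {
        (a, b): math.log((pair_counts[(a, b)] + 1) / (prev_counts[a] + len(ALPHABET)))
        for a in ALPHABET for b in ALPHABET
    }

def log_likelihood(text, model):
    """Utility of a candidate decoding: sum of log bigram probabilities."""
    text = clean(text)
    return sum(model[pair] for pair in zip(text, text[1:]))

model = train_bigram("the cat sat on the mat and the hat")
print(log_likelihood("the cat", model), log_likelihood("tqe czt", model))
```

A decoding that looks like the training text scores higher than a scrambled one, which is what the search over symbol assignments needs.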

42
Hill Climbing Diagram
43
Simulated Annealing
  • Idea: Escape local maxima by allowing downhill
    moves
  • But make them rarer as time goes on
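
A generic sketch of this idea follows. The geometric cooling schedule, the exp(delta / T) acceptance rule, and the toy one-dimensional landscape are assumptions chosen for illustration and are not taken from the slides.

```python
import math
import random

def simulated_annealing(state, utility, neighbor, steps=2000,
                        t_start=1.0, t_end=0.01, seed=0):
    """Maximize `utility`, accepting downhill moves with shrinking probability."""
    rng = random.Random(seed)
    current, current_u = state, utility(state)
    best, best_u = current, current_u
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        candidate = neighbor(current, rng)
        candidate_u = utility(candidate)
        delta = candidate_u - current_u
        # Always accept uphill moves; accept a downhill move with
        # probability exp(delta / T), which shrinks as T falls.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_u = candidate, candidate_u
            if current_u > best_u:
                best, best_u = current, current_u
    return best, best_u

# Toy landscape: index 2 is a local maximum, index 7 the global one.
LANDSCAPE = [0, 2, 4, 3, 1, 2, 5, 9, 6, 2]

def step_neighbor(i, rng):
    """Move one position left or right, staying inside the landscape."""
    return min(max(i + rng.choice((-1, 1)), 0), len(LANDSCAPE) - 1)

best, best_u = simulated_annealing(2, lambda i: LANDSCAPE[i], step_neighbor)
print(best, best_u)
```

For the deciphering problem, the state would be a symbol-to-letter assignment, the neighbor function a swap of two assignments, and the utility a bigram likelihood.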

46
From http://math.uchicago.edu/~shmuel/Network-course-readings/MCMCRev.pdf