Markov Decision Processes (MDPs)

1
Markov Decision Processes (MDPs)
  • read Ch. 17.1-17.2
  • utility-based agents
  • goals encoded in a utility function U(s), i.e. U: S → ℝ
  • effects of actions encoded in a state transition
    function T: S × A → S
  • or T: S × A → pdf(S) for non-deterministic effects
  • rewards/costs encoded in a reward function R: S × A → ℝ
  • Markov property: the effects of actions depend only on the
    current state, not on the previous history
    (a toy encoding of these ingredients is sketched below)
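
Below is a minimal Python sketch of these ingredients for a toy two-state
problem; the names (states, actions, T, R, gamma) and the numbers are
illustrative assumptions, not taken from the slides.

    # Toy MDP: states, actions, non-deterministic transitions, rewards.
    states = ["s0", "s1"]
    actions = ["stay", "go"]
    gamma = 0.9  # discount factor (used on the next slide)

    # T: S x A -> pdf(S), stored as {(s, a): {s': probability}}
    T = {
        ("s0", "stay"): {"s0": 0.9, "s1": 0.1},
        ("s0", "go"):   {"s0": 0.2, "s1": 0.8},
        ("s1", "stay"): {"s1": 1.0},
        ("s1", "go"):   {"s0": 0.5, "s1": 0.5},
    }

    # R: S x A -> real-valued reward
    R = {
        ("s0", "stay"): 0.0,
        ("s0", "go"):  -1.0,
        ("s1", "stay"): 1.0,
        ("s1", "go"):   0.0,
    }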

2
  • the goal: maximize reward over time
  • long-term discounted reward: Σ_t γ^t r_t, with
    discount factor 0 ≤ γ < 1
  • discounting handles infinite horizons and encourages
    quicker achievement of reward
  • plans are encoded in policies
  • mappings from states to actions, π: S → A
  • how to compute the optimal policy π* that maximizes
    long-term discounted reward? (a small example of the
    discounted return follows)
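
As a small illustration of the objective, here is a sketch of the long-term
discounted reward for one sequence of rewards, plus a policy written as a
plain state-to-action mapping; both are illustrative, not from the slides.

    def discounted_return(rewards, gamma=0.9):
        """Long-term discounted reward: sum over t of gamma**t * r_t."""
        return sum((gamma ** t) * r for t, r in enumerate(rewards))

    # A policy pi: S -> A is just a mapping from states to actions.
    pi = {"s0": "go", "s1": "stay"}

    print(discounted_return([0.0, -1.0, 1.0, 1.0, 1.0]))  # return of one trajectory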

3
(No Transcript)
4
  • value function V^π(s): expected long-term reward from
    starting in state s and following policy π
  • derive the policy from V(s)
  • π(s) = argmax_{a ∈ A} E[ R(s,a) + γ V(T(s, a)) ]
  •      = argmax_{a ∈ A} Σ_{s'} P(s' | s, a) (R(s,a) + γ V(s'))
    for non-deterministic transitions
  • the optimal policy comes from the optimal value function:
    π*(s) = argmax_{a ∈ A} Σ_{s'} P(s' | s, a) V*(s')
    (a greedy-extraction sketch follows)
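
The greedy extraction of a policy from a value function can be sketched as
follows, assuming the toy MDP dictionaries (states, actions, T, R) from the
earlier sketch and some value table V; the function name is hypothetical.

    def greedy_policy(V, states, actions, T, R, gamma=0.9):
        """pi(s) = argmax_a sum_{s'} P(s'|s,a) * (R(s,a) + gamma * V(s'))"""
        pi = {}
        for s in states:
            def q(a):
                # expected one-step reward plus discounted value of successors
                return sum(p * (R[(s, a)] + gamma * V[sp])
                           for sp, p in T[(s, a)].items())
            pi[s] = max(actions, key=q)
        return pi

    # e.g. pi_star = greedy_policy(V_star, states, actions, T, R)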


5
Calculating V(s)
  • Bellman's equations
  • V(s) = max_{a ∈ A} [ R(s,a) + γ Σ_{s'} P(s' | s, a) V(s') ]
    (eqn 17.5)
  • method 1: linear programming
  • n coupled equations, one per state
  • v1 = max(v2, v3, v4, ...)
  • v2 = max(v1, v3, v4, ...)
  • v3 = max(v1, v2, v4, ...)
  • solve for v1, v2, v3, ... using the GNU Linear Programming
    Kit (GLPK), etc. (an LP sketch follows)
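
One way to realize the linear-programming route is the standard LP form of the
Bellman equations: minimize the sum of the values subject to
V(s) ≥ R(s,a) + γ Σ_{s'} P(s'|s,a) V(s') for every state-action pair. The
sketch below uses SciPy's linprog purely for illustration (GLPK or any LP
solver would do) and assumes the toy MDP defined earlier.

    import numpy as np
    from scipy.optimize import linprog

    n = len(states)
    idx = {s: i for i, s in enumerate(states)}

    # Constraints: gamma * sum_s' P(s'|s,a) V(s') - V(s) <= -R(s,a)  for all (s, a)
    A_ub, b_ub = [], []
    for s in states:
        for a in actions:
            row = np.zeros(n)
            row[idx[s]] -= 1.0
            for sp, p in T[(s, a)].items():
                row[idx[sp]] += gamma * p
            A_ub.append(row)
            b_ub.append(-R[(s, a)])

    # Objective: minimize sum_s V(s); values may be negative, so free bounds.
    res = linprog(c=np.ones(n), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * n)
    V_star = dict(zip(states, res.x))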

6
  • method 2: Value Iteration
  • initialize V(s) = 0 for all states
  • iteratively update the value of each state from the
    values of its successor states
  • ...until convergence (sketched below)
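
A minimal value-iteration sketch, again over the assumed toy MDP dictionaries
from earlier (the function name and tolerance are illustrative):

    def value_iteration(states, actions, T, R, gamma=0.9, eps=1e-6):
        V = {s: 0.0 for s in states}      # initialize V(s) = 0 for all states
        while True:
            delta = 0.0
            for s in states:
                # Bellman update: V(s) <- max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
                best = max(
                    R[(s, a)] + gamma * sum(p * V[sp] for sp, p in T[(s, a)].items())
                    for a in actions
                )
                delta = max(delta, abs(best - V[s]))
                V[s] = best
            if delta < eps:               # ...until convergence
                return V

    # e.g. V_star = value_iteration(states, actions, T, R)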