On learning Tetris and the use of Afterstates

Provided by: kurtdri

1
On learning Tetris and the use of Afterstates
  • AKA: It didn't work this time (... yet)
  • Kurt Driessens
  • (in co-op with Jan Ramon)

2
It is now safe to turn off your brain.
3
Tetris
Reward = number of deleted lines
4
Q-learning
  • For each state-action pair, calculate/predict the Q-value
5
State Action features
  • Height of wall (max, avg, min)
  • Number of Holes
  • Height difference adjacent cols
  • ...
  • Fits, Increasesheight, ...
  • Number of deleted lines
  • Blockwidth
  • Blockheight
  • ...
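
A minimal sketch of how a few of the state features listed above could be computed. The board encoding (a list of rows, top row first, 1 for a filled cell) and all function names are assumptions made for illustration; the slides do not say how the features were implemented.

```python
from statistics import mean

def column_heights(board):
    """Height of each column, measured from the floor to its topmost filled cell."""
    n_rows = len(board)
    heights = []
    for col in range(len(board[0])):
        filled = [r for r in range(n_rows) if board[r][col]]
        heights.append(n_rows - filled[0] if filled else 0)
    return heights

def count_holes(board):
    """Empty cells with at least one filled cell above them in the same column."""
    holes = 0
    for col in range(len(board[0])):
        seen_block = False
        for row in board:
            if row[col]:
                seen_block = True
            elif seen_block:
                holes += 1
    return holes

def state_features(board):
    heights = column_heights(board)
    return {
        "max_height": max(heights),
        "avg_height": mean(heights),
        "min_height": min(heights),
        "holes": count_holes(board),
        # absolute height difference between each pair of adjacent columns
        "height_diffs": [abs(a - b) for a, b in zip(heights, heights[1:])],
    }

# Hypothetical 4x4 board, top row first.
board = [
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 1],
]
features = state_features(board)
```

Action features such as Fits or Number of deleted lines would additionally need the block and its placement, so they are left out of this sketch.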

6
Q(State,Action) in Tetris
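(Figure: two example placements with Q-values Q1 and Q2; the relation shown between them reads "Q1 Q2 2", operators lost.)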
7
Afterstates
  • Partial Model
  • direct consequence of 1 action
  • Allows computation of next state (partially)
  • calculation of reward
  • No problem for Tetris
  • also done by human player
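
The partial model described above can be sketched as a function that drops a block and returns the resulting afterstate together with the reward. The board encoding (rows listed top first, 1 for filled), the straight-down drop without sliding, and the name `afterstate` are assumptions for illustration, not the implementation behind the slides; a rotation would be passed in as an already rotated block.

```python
def afterstate(board, piece, left_col):
    """Drop `piece` with its left edge at `left_col`; return (afterstate, reward)."""
    n_rows, n_cols = len(board), len(board[0])
    cells = [(r, c) for r, row in enumerate(piece) for c, v in enumerate(row) if v]

    def fits(top_row):
        # True if every cell of the piece lands on a free cell inside the board.
        for r, c in cells:
            rr, cc = top_row + r, left_col + c
            if rr >= n_rows or cc < 0 or cc >= n_cols or board[rr][cc]:
                return False
        return True

    if not fits(0):
        return None, 0  # no legal placement: treated as game over in this sketch

    top_row = 0
    while fits(top_row + 1):  # let the piece fall until it rests on something
        top_row += 1

    new_board = [row[:] for row in board]
    for r, c in cells:
        new_board[top_row + r][left_col + c] = 1

    # Delete full rows; the reward is the number of deleted lines.
    remaining = [row for row in new_board if not all(row)]
    reward = n_rows - len(remaining)
    cleared = [[0] * n_cols for _ in range(reward)] + remaining
    return cleared, reward

# Hypothetical example: a 2x2 block dropped into the two rightmost columns.
board = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
]
square = [[1, 1], [1, 1]]
new_board, reward = afterstate(board, square, left_col=2)
```

Only the deterministic consequence of the action is modelled here; which block arrives next is not part of the afterstate.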

8
Exercise
  • Calculate Next State and Reward for

9
How to use Afterstates
  • Backup for Regular Q-learning

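(Diagram: the Q-values of the actions available in the next state s' are maximized and backed up into Q(s,a).)
$Q(s,a) \leftarrow r(s,a) + \gamma \max_{a'} Q(s',a')$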
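
As a rough illustration of the regular Q-learning back-up, here is a tabular sketch. The dictionary-based table, the string state names, and the discount value are assumptions; the work on the slides predicts Q-values with learned regressors rather than a table.

```python
from collections import defaultdict

def q_backup(Q, s, a, r, s_next, next_actions, gamma=0.9):
    """Set Q(s,a) to r plus the discounted best Q-value in the next state."""
    best_next = max((Q[(s_next, a2)] for a2 in next_actions), default=0.0)
    Q[(s, a)] = r + gamma * best_next
    return Q[(s, a)]

# Hypothetical values for two actions available in the next state.
Q = defaultdict(float)
Q[("s1", "left")] = 2.0
Q[("s1", "right")] = 5.0
updated = q_backup(Q, "s0", "drop", r=1.0, s_next="s1",
                   next_actions=["left", "right"])
```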
10
How to use Afterstates
  • Back-up using Afterstates

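(Diagram: state s branches to successor states s1, s2, s3, each scored by r + γV; the maximum is backed up into V(s).)
$V(s) \leftarrow \max_a \left[ r(s,a) + \gamma V(\delta(s,a)) \right]$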
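
The afterstate back-up can be sketched in the same tabular style. Here `model(s, a)` stands for the partial model that returns the state reached by the action and its reward; the toy transitions, the value table, and the discount are made up for illustration.

```python
def afterstate_backup(V, s, actions, model, gamma=0.9):
    """Set V(s) to the best value of r(s,a) + gamma * V(successor) over all actions."""
    candidates = []
    for a in actions:
        s_after, r = model(s, a)
        candidates.append(r + gamma * V.get(s_after, 0.0))
    V[s] = max(candidates)
    return V[s]

# Hypothetical partial model: three actions lead from s to s1, s2, s3.
TOY = {("s", "a1"): ("s1", 0.0), ("s", "a2"): ("s2", 1.0), ("s", "a3"): ("s3", 0.0)}

def toy_model(s, a):
    return TOY[(s, a)]

V = {"s1": 4.0, "s2": 3.0, "s3": 0.0}
backed_up = afterstate_backup(V, "s", ["a1", "a2", "a3"], toy_model, gamma=0.5)
```

With these numbers the three candidates are 2.0, 2.5 and 0.0, so the second action determines the backed-up value.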
11
How to use Afterstates
  • Policy generation in Q-learning

(Diagram: from state s, the action with the maximal Q-value is selected.)
$\pi(s) = \arg\max_a Q(s,a)$
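
A short sketch of greedy policy generation from Q-values, assuming a dictionary keyed by (state, action) pairs; the names and numbers are illustrative only.

```python
def greedy_q_policy(Q, s, actions):
    """Pick the action with the highest Q-value in state s."""
    return max(actions, key=lambda a: Q.get((s, a), 0.0))

# Hypothetical Q-values for three actions in one state.
Q = {("s", "left"): 1.0, ("s", "right"): 3.0, ("s", "rotate"): 2.0}
chosen = greedy_q_policy(Q, "s", ["left", "right", "rotate"])
```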
12
How to use Afterstates
  • Policy generation using Afterstates

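(Diagram: from state s, each action leads to a successor state s1, s2, s3 scored by r + γV; the best-scoring action is selected.)
$\pi(s) = \arg\max_a \left[ r(s,a) + \gamma V(\delta(s,a)) \right]$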
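
Policy generation with afterstates can be sketched as a one-step lookahead through the partial model. As before, `model(s, a)` returning (successor, reward), the toy transitions, and the discount are assumptions for illustration.

```python
def greedy_afterstate_policy(V, s, actions, model, gamma=0.9):
    """Pick the action whose reward plus discounted successor value is highest."""
    def score(a):
        s_after, r = model(s, a)
        return r + gamma * V.get(s_after, 0.0)
    return max(actions, key=score)

# Hypothetical partial model and value table.
TOY = {("s", "a1"): ("s1", 0.0), ("s", "a2"): ("s2", 1.0), ("s", "a3"): ("s3", 0.0)}

def toy_model(s, a):
    return TOY[(s, a)]

V = {"s1": 4.0, "s2": 3.0, "s3": 0.0}
chosen = greedy_afterstate_policy(V, "s", ["a1", "a2", "a3"], toy_model, gamma=0.5)
```

Unlike the Q-learning policy, this needs no value per action: a single V over states is enough because the model supplies the immediate reward and the successor.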
13
Exercise
Estimate V(s) of
14
RRL-TG
  • Language
  • Maxheight, minheight
  • Canyons of width 1 and 2 (number of)
  • Info on possibilities with next block (fits,
    canScore, ...)
  • Difference in height of adjacent columns

15
RRL-RIB
  • Height of columns
  • Difference in height
  • Maxheight
  • Inner product for kernel
  • Distance: $k(x,x) - 2k(x,y) + k(y,y)$
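
A small sketch of a distance derived from a kernel, using a plain inner product over column heights as the kernel. This version takes the square root of k(x,x) - 2k(x,y) + k(y,y), which is the usual kernel-induced metric; whether the slides used the squared or the rooted form is not recoverable, and the feature vectors below are made up.

```python
import math

def inner_product_kernel(x, y):
    """Plain inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(x, y))

def kernel_distance(x, y, k=inner_product_kernel):
    """Distance induced by kernel k: sqrt(k(x,x) - 2 k(x,y) + k(y,y))."""
    squared = k(x, x) - 2 * k(x, y) + k(y, y)
    return math.sqrt(max(squared, 0.0))  # clamp tiny negatives from rounding

# Hypothetical column-height vectors of two boards.
x = [3, 2, 0, 2]
y = [0, 2, 4, 2]
d = kernel_distance(x, y)
```

With the inner-product kernel this reduces to the Euclidean distance between the two height vectors.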

16
Results
Censored because of Awfulness
17-21
Some Preliminary Results
(Figures: result plots, five slides.)
22
What is going wrong?
  • I don't know
  • I would have solved it otherwise. I still hope
    it is a bug ...
  • Premise: "Tetris can be broken"
  • Beyond a certain number of deleted rows, 1000 extra
    lines is just luck.
  • Good policy versus very good policy: is there a
    difference?

23
Related Work
  • 1000-3000 lines per game
  • Hand-built strategy of 600 lines/game
  • Approximate Policy Iteration

24
How to test?
  • Hand-construct a good policy
  • See if it is very good sometimes
  • Hand-coded policies so far:
  • 1. Deletes - 67 lines (69, 67, 67, ...)
  • 2. Deletes - 75 lines
  • TG does not perform well with the same features (in
    first try)

25
What now?
  • Kernel Based Regression?
  • Better distance?
  • Better language for TG?
  • Open for suggestions, within certain limits :-)