On learning Tetris and the use of Afterstates

Slides: 26

Provided by: kurtdri

1
On learning Tetris and the use of Afterstates

AKA: It didn't work this time (...yet)
Kurt Driessens
(in co-op with Jan Ramon)

2
It is now safe to turn off your brain.
3
Tetris
Reward: number of deleted lines
4
Q-learning

For each state-action pair: calculate/predict the Q-value
5
State Action features

Height of wall (max, avg, min)
Number of Holes
Height difference adjacent cols
...
Fits, Increasesheight, ...
Number of deleted lines
Blockwidth
Blockheight
...
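
As an illustration of the state features listed above (wall height, holes, height differences between adjacent columns), here is a minimal sketch. The board encoding (a list of rows, top row first, 1 for a filled cell) and all function names are assumptions made for this example, not taken from the slides.

```python
from typing import Dict, List

Board = List[List[int]]  # assumed encoding: rows from top to bottom, 1 = filled, 0 = empty


def column_heights(board: Board) -> List[int]:
    """Height of each column, counted from its topmost filled cell down to the floor."""
    rows = len(board)
    heights = []
    for col in range(len(board[0])):
        height = 0
        for row in range(rows):
            if board[row][col]:
                height = rows - row
                break
        heights.append(height)
    return heights


def count_holes(board: Board) -> int:
    """Empty cells with at least one filled cell above them in the same column."""
    holes = 0
    for col in range(len(board[0])):
        seen_filled = False
        for row in range(len(board)):
            if board[row][col]:
                seen_filled = True
            elif seen_filled:
                holes += 1
    return holes


def state_features(board: Board) -> Dict[str, object]:
    """Collect the wall-height, hole, and height-difference features in one dict."""
    heights = column_heights(board)
    return {
        "max_height": max(heights),
        "avg_height": sum(heights) / len(heights),
        "min_height": min(heights),
        "holes": count_holes(board),
        "adjacent_diffs": [abs(a - b) for a, b in zip(heights, heights[1:])],
    }


example_board = [
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 1],
    [1, 1, 0, 1],
]
features = state_features(example_board)
```

The action features (Fits, Blockwidth, and so on) would need the falling block as an extra input and are left out of this sketch.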

6
Q(State,Action) in Tetris
Q1
Q2
Q1Q22
7
Afterstates

Partial Model
direct consequence of 1 action
Allows computation of next state (partially)
calculation of reward
No problem for Tetris
also done by human player
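
The partial model described above can be sketched as a function that maps a board and one action to the resulting board and reward. This is a simplified illustration under stated assumptions: the board encoding, the piece encoding as cell offsets, the function names, and the straight vertical drop (no sliding or rotation during the fall) are all choices made for this example.

```python
from typing import List, Tuple

Board = List[List[int]]        # assumed encoding: rows from top to bottom, 1 = filled
Piece = List[Tuple[int, int]]  # (row, col) offsets of the piece's cells, already rotated


def fits(board: Board, piece: Piece, top: int, left: int) -> bool:
    """True if every cell of the piece lies on an empty cell inside the board."""
    for dr, dc in piece:
        r, c = top + dr, left + dc
        if not (0 <= r < len(board) and 0 <= c < len(board[0])) or board[r][c]:
            return False
    return True


def afterstate(board: Board, piece: Piece, left: int) -> Tuple[Board, int]:
    """Drop the piece straight down at column offset `left`.

    Returns the board after landing and line removal, plus the reward
    (the number of deleted lines).
    """
    if not fits(board, piece, 0, left):
        raise ValueError("piece does not fit at the top of the board")
    top = 0
    while fits(board, piece, top + 1, left):
        top += 1
    placed = [row[:] for row in board]          # copy, so the input stays untouched
    for dr, dc in piece:
        placed[top + dr][left + dc] = 1
    kept = [row for row in placed if not all(row)]
    deleted = len(placed) - len(kept)
    width = len(board[0])
    return [[0] * width for _ in range(deleted)] + kept, deleted


board = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
]
vertical_domino = [(0, 0), (1, 0)]  # hypothetical two-cell piece for the example
next_board, reward = afterstate(board, vertical_domino, left=2)
```

The model is only partial because the next block is random: the function gives the board after the move, not the full next state including the new piece.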

8
Exercise

Calculate Next State and Reward for

9
How to use Afterstates

Backup for Regular Q-learning

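[Diagram: the Q-values of the actions available in the next state s' are compared, and their max is backed up to Q(s,a)]
$Q(s,a) \leftarrow r(s,a) + \gamma \max_{a'} Q(s',a')$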
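
A minimal tabular sketch of this backup, for illustration only. The dictionary-based Q-table, the function name, and the `alpha` learning-rate parameter are assumptions of this example; with `alpha=1.0` the update reduces to overwriting the old value with reward plus discounted best next Q-value.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple

QTable = Dict[Tuple[Hashable, Hashable], float]


def q_backup(
    q: QTable,
    state: Hashable,
    action: Hashable,
    reward: float,
    next_state: Hashable,
    next_actions: Iterable[Hashable],
    gamma: float = 0.9,
    alpha: float = 1.0,
) -> float:
    """Move Q(state, action) toward reward + gamma * max over next actions."""
    # A terminal next state has no actions; its best value is taken as 0.
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


q: QTable = defaultdict(float)
q[("s1", "left")] = 2.0
q[("s1", "right")] = 5.0
updated = q_backup(
    q, "s0", "drop", reward=1.0,
    next_state="s1", next_actions=["left", "right"], gamma=0.5,
)
```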
10
How to use Afterstates

Back-up using Afterstates

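[Diagram: from state s, each action leads to an afterstate s1, s2, s3 with its own reward r and value V; the max is backed up to V(s)]
$V(s) \leftarrow \max_a \left[ r(s,a) + \gamma V(\delta(s,a)) \right]$
(δ(s,a): the afterstate reached by taking action a in state s)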
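
The afterstate backup can be sketched in the same way. Here `model` stands for the partial model: a function returning the afterstate and reward for a state-action pair. The function names, the dictionary value table, and the toy transitions are assumptions made for this example.

```python
from typing import Callable, Dict, Hashable, Iterable, Tuple

Model = Callable[[Hashable, Hashable], Tuple[Hashable, float]]


def afterstate_backup(
    v: Dict[Hashable, float],
    state: Hashable,
    actions: Iterable[Hashable],
    model: Model,
    gamma: float = 0.9,
) -> float:
    """Set V(state) to the max over actions of reward + gamma * V(afterstate)."""
    candidates = []
    for action in actions:
        after, reward = model(state, action)
        candidates.append(reward + gamma * v.get(after, 0.0))  # unseen afterstates count as 0
    v[state] = max(candidates) if candidates else 0.0
    return v[state]


# Toy model with three actions leading to three afterstates.
transitions = {
    ("s", "a1"): ("s1", 0.0),
    ("s", "a2"): ("s2", 1.0),
    ("s", "a3"): ("s3", 0.0),
}


def toy_model(state: Hashable, action: Hashable) -> Tuple[Hashable, float]:
    return transitions[(state, action)]


values = {"s1": 4.0, "s2": 1.0, "s3": 0.0}
new_value = afterstate_backup(values, "s", ["a1", "a2", "a3"], toy_model, gamma=0.5)
```

Only one value per state is stored, instead of one per state-action pair; the model supplies the part that Q-learning would otherwise have to learn.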
11
How to use Afterstates

Policy generation in Q-learning

[Diagram: in state s, the Q-values of the available actions are compared and the max is selected]
$\pi(s) = \arg\max_a Q(s,a)$
12
How to use Afterstates

Policy generation using Afterstates

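[Diagram: in state s, each action leads to an afterstate s1, s2, s3 with its own reward r and value V; the action with the max is selected]
$\pi(s) = \arg\max_a \left[ r(s,a) + \gamma V(\delta(s,a)) \right]$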
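
Policy generation with afterstates amounts to a one-step lookahead through the partial model. The sketch below assumes a `model` function returning (afterstate, reward) and a dictionary of state values; these names and the toy data are hypothetical.

```python
from typing import Callable, Dict, Hashable, Sequence, Tuple

Model = Callable[[Hashable, Hashable], Tuple[Hashable, float]]


def greedy_afterstate_action(
    v: Dict[Hashable, float],
    state: Hashable,
    actions: Sequence[Hashable],
    model: Model,
    gamma: float = 0.9,
) -> Hashable:
    """Pick the action maximising reward + gamma * V(afterstate).

    Raises ValueError if `actions` is empty.
    """
    def score(action: Hashable) -> float:
        after, reward = model(state, action)
        return reward + gamma * v.get(after, 0.0)

    return max(actions, key=score)


transitions = {
    ("s", "a1"): ("s1", 0.0),
    ("s", "a2"): ("s2", 1.0),
    ("s", "a3"): ("s3", 0.0),
}


def toy_model(state: Hashable, action: Hashable) -> Tuple[Hashable, float]:
    return transitions[(state, action)]


values = {"s1": 4.0, "s2": 1.0, "s3": 0.0}
far_sighted = greedy_afterstate_action(values, "s", ["a1", "a2", "a3"], toy_model, gamma=0.5)
myopic = greedy_afterstate_action(values, "s", ["a1", "a2", "a3"], toy_model, gamma=0.0)
```

With `gamma=0.0` the choice depends on the immediate reward only, which shows how the learned V changes the decision.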
13
Exercise
Estimate V(s) of
14
RRL-TG

Language
Maxheight, minheight
Canyons of width 1 and 2 (number of)
Information on the possibilities with the next block (fits, canScore, ...)
Difference in height of adjacent columns

15
RRL-RIB

Height of columns
Difference in height
Maxheight
Inner product for kernel
Distance: $d(x,y) = \sqrt{k(x,x) - 2k(x,y) + k(y,y)}$
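
A small sketch of a kernel-induced distance over column-height vectors, using the plain inner product as kernel. The function names and the example vectors are assumptions; with this particular kernel the result coincides with the Euclidean distance.

```python
import math
from typing import Callable, Sequence

Kernel = Callable[[Sequence[float], Sequence[float]], float]


def inner_product_kernel(x: Sequence[float], y: Sequence[float]) -> float:
    """k(x, y) = sum over i of x_i * y_i."""
    return float(sum(a * b for a, b in zip(x, y)))


def kernel_distance(
    x: Sequence[float],
    y: Sequence[float],
    k: Kernel = inner_product_kernel,
) -> float:
    """Distance induced by kernel k: sqrt(k(x,x) - 2 k(x,y) + k(y,y))."""
    squared = k(x, x) - 2.0 * k(x, y) + k(y, y)
    return math.sqrt(max(squared, 0.0))  # clamp tiny negatives caused by rounding


heights_a = [3, 2, 0, 2]  # hypothetical column heights of one board
heights_b = [3, 2, 4, 2]  # same board with one column four cells higher
distance = kernel_distance(heights_a, heights_b)
```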

16
Results
Censored because of Awfulness
17-21
Some Preliminary Results
[Result plots not preserved in this transcript]
22
What is going wrong?

I don't know
(I would have solved it otherwise.) I still hope it is a bug ...
Premise: Tetris can be broken
Beyond a certain number of deleted rows, 1000 extra lines is just luck.
Good policy vs. very good policy: is there a difference?

23
Related Work

1000-3000 lines per game
Hand-built strategy of 600 lines/game
Approximate Policy Iteration

24
How to test?

Hand-construct a good policy
See if it is very good sometimes
Hand-coded policies so far:
1. Deletes - 67 lines (69, 67, 67, ...)
2. Deletes - 75 lines
TG does not perform well with the same features (in the first try)
