Title: Design Review
1. Design Review
- Scooby Doo gang
- Jonathan Hsieh
- Annie Pettengill
- Jim Hollifield
- Jeff Barbieri
- Matt Silverstein
2. Design Goals
- Mystery Machine Requirements
- Correctness / Proficiency
- Compliance with the external interface / protocol
- Support an interface for a human to play
- Asynchronous "Decide now" button
3. Other Goals
- Priorities
- Speedup over a pure-software baseline implementation on the 68HC11
- Hardware functional units
- Software optimization
- Interesting architecture
- Opening / closing book (download/upload of in-play configurations)
- Hardware HCI (software HCI optional)
4. Function Call Dependence
[Figure: call-dependency graph covering Main, Think, Search, Quiesce, Init, In_check, Gen, Gen_caps, Eval, Sort_pv, MakeMove, and Takeback, including the For loops and the recursive Search and Quiesce calls]
5. More Call Dependences
[Figure: call dependencies of In_check, Eval, Make_move, Gen, and Takeback, with the helpers Attack, Eval_light_pawn, Eval_light_king, Eval_dark_pawn, Eval_dark_king, Can_castle, Gen_push, and Gen_promote]
6. Read / Write Access Analysis
- Eval
- No writes to the board structure (but many reads)
- In_check / attack
- Many reads; returns a boolean. Could instead be an array of bits to look up whether a square is being attacked
- Gen
- Generates list values
- Variable run times
7. Quantify Parameters
- A profiling program on Sun machines
- Compiles code with special hooks
- Graphically displays call information and run-time information for profiling programs
- The idea (Amdahl's law): speeding up the parts that take the most time gives the most speedup, so the slowest parts move to hardware!
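For reference, here is the standard statement of Amdahl's law. If a fraction $f$ of the original run time is sped up by a factor $s$, the overall speedup $S$ is:

```latex
S(f, s) = \frac{1}{(1 - f) + f / s}
\qquad\text{and}\qquad
\lim_{s \to \infty} S(f, s) = \frac{1}{1 - f}
```

Even an infinitely fast hardware unit cannot beat $1/(1-f)$. The part left in software therefore caps the gain, which is why the functions with the largest $f$ are the ones worth moving to hardware.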
8. Quantify Results
- After about 20 moves, these functions take the most time (not including printf and scanf)
9. Summed Run-Time Analysis
- In_check -> attack
- 55% of program run time!
- Straightforward for hardware
- Eval -> eval_*
- 25% of program run time!
- Straightforward for hardware
- Gen
- 15% of program run time!
10. Conclusion
- Optimize in_check, eval, and gen by placing them in hardware
- This is most effective if the board is held in FPGA registers; try to figure out whether the FPGA can be used as memory for the processor
- Keep recursion on the processor
11. Hardware/Software Partition
- Serial interface
  - Can access anything in memory that the CPU can (in simulation)
- SW / CPU
  - Memory structures allow for recursion / dynamic structures; the compiler can handle that
  - Recursion cannot really happen in parallel (?)
  - Should be able to access RAM as well as FPGA registers
- Memory
  - Move histories
  - Recursion stacks
- FPGA
  - Many parallel executions happening
  - High-speed custom implementations
  - Good for static structures and constants
  - Simple for read-only functions if their inputs are read from registers (always executing!)
12. Implementation Plan
13. Implementation Plan
- Design hierarchy
- HW/SW split
- HW subsystems / goals
- SW goals
- Physical design
- HW/SW interface
- Memory access architecture
- FPGA/HC11/mem interaction.
14. Implementation Plan
- Handle all recursion on the HC11: compiled and assembler code is best for memory structures (trees, hash tables, etc.)
- Software analysis shows that three function trees (in_check/attack, eval, and gen) take the majority of the algorithm's time
15. Shared Memory Architecture
[Figure: the FPGA and the HC11 share one memory; the FPGA runs on clk and supplies the HC11's pseudo clock]
16. Architecture Features
- Observation
- Memory (12 ns -> 83 MHz) is about as fast as the maximum FPGA speed (100 MHz)
- About 10x faster than the 8 MHz HC11
- Zoinks!
- Clock set at the high FPGA clock speed
- The HC11 clock is a pseudo clock: a function in the FPGA that slows the FPGA clock to something in the HC11's range
17. Clocking Diagram
[Figure: FPGA clk, FPGA clk x2, and FPGA clk x4 (HC11 pseudo clk) waveforms, derived from a 3-bit counter on the FPGA clk stepping 000 through 111]
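The counter-based divider in the diagram can be modeled in a few lines of C. This is a minimal sketch that assumes a free-running 3-bit counter. Bit 0 has twice the FPGA clock period, bit 1 four times, and bit 2 eight times. The names `clk_divider`, `divider_tick`, and `divided_clock` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of the pseudo-clock divider: a 3-bit counter
 * that increments on every FPGA clock edge. */
typedef struct {
    uint8_t count; /* counter value, 000..111 */
} clk_divider;

/* Advance by one FPGA clock cycle; wraps 111 -> 000. */
static void divider_tick(clk_divider *d) {
    d->count = (uint8_t)((d->count + 1u) & 0x7u);
}

/* Level of the divided clock taken from counter bit `bit`.
 * Its period is 2^(bit + 1) FPGA cycles: bit 0 -> x2, bit 1 -> x4, bit 2 -> x8. */
static int divided_clock(const clk_divider *d, unsigned bit) {
    return (d->count >> bit) & 1u;
}
```

In hardware, whichever counter bit is chosen would drive the HC11 clock pin directly. That is why the HC11 period is a power-of-two multiple of the FPGA period.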
18. Clocking Diagram
[Figure: FPGA clk and FPGA clk x8 (HC11 pseudo clk) waveforms; memory read/write slots alternate between the FPGA and the HC11, with most slots going to the FPGA]
19. Pseudoclock Possibilities
- Could allow the processor to access memory as if it were the only thing using it
- While the processor is waiting for the next clock tick and is done with memory, the FPGA can read/write memory
- The FPGA can run and calculate information concurrently with the HC11!
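The time-slot idea can be sketched as a fixed arbiter. Each HC11 pseudo-clock period spans several FPGA cycles; one slot is reserved for the HC11 and the rest go to the FPGA. The period of 8 FPGA cycles follows the x8 diagram. Reserving slot 0 for the HC11 is an arbitrary assumption, and `mem_owner` and `count_slots` are hypothetical names.

```c
/* Who drives the shared memory bus in a given FPGA clock cycle. */
enum mem_owner { OWNER_HC11, OWNER_FPGA };

#define SLOTS_PER_PSEUDO_CLK 8u  /* assumption: FPGA clk x8 = one HC11 pseudo clock */
#define HC11_SLOT            0u  /* hypothetical: slot reserved for the HC11 */

/* Fixed time-division arbitration: no handshaking needed, because the
 * FPGA generates the HC11's clock and knows when the HC11 uses memory. */
static enum mem_owner mem_owner(unsigned fpga_cycle) {
    unsigned slot = fpga_cycle % SLOTS_PER_PSEUDO_CLK;
    return slot == HC11_SLOT ? OWNER_HC11 : OWNER_FPGA;
}

/* Count how many of the cycles [0, n) belong to each owner. */
static void count_slots(unsigned n, unsigned *hc11, unsigned *fpga) {
    *hc11 = *fpga = 0;
    for (unsigned c = 0; c < n; c++) {
        if (mem_owner(c) == OWNER_HC11) (*hc11)++;
        else (*fpga)++;
    }
}
```

Under these assumptions the FPGA gets 7 of every 8 memory cycles, while the HC11 still sees memory as if it were the only user.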
20. FPGA Hardware Units
[Figure: block diagram with the HC11, pseudo clk, memory bus controller, memory, chess board registers, chess piece registers, HCI, eval unit, attack/check unit, and gen unit]
21. FPGA/Memory Organization
- Specific addresses would contain specific information at all times
- Board representation address
- Current eval score
- In-check map
- Next generated moves
- Addresses can be proxied by the FPGA so that FPGA registers act like memory to the HC11!
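Proxying can be pictured as an address decoder that the FPGA applies to every HC11 bus cycle. The slides name the kinds of data but not the addresses. The map below is invented purely for illustration, and `decode` and the `ADDR_*` constants are hypothetical.

```c
#include <stdint.h>

/* Hypothetical address map (not from the design): fixed locations the
 * HC11 reads as if they were ordinary memory. */
#define ADDR_BOARD_BASE 0x8000u /* 64 bytes: board representation */
#define ADDR_EVAL_SCORE 0x8040u /* 2 bytes: current eval score */
#define ADDR_CHECK_MAP  0x8042u /* 1 byte: in-check flags */
#define ADDR_MOVES_BASE 0x8100u /* next generated moves */
#define ADDR_MOVES_END  0x8200u

enum target { TARGET_RAM, TARGET_FPGA_REG };

/* Decide whether an HC11 access goes to plain RAM or is proxied to an
 * FPGA register that merely looks like memory to the HC11. */
static enum target decode(uint16_t addr) {
    if (addr >= ADDR_BOARD_BASE && addr < ADDR_BOARD_BASE + 64u) return TARGET_FPGA_REG;
    if (addr == ADDR_EVAL_SCORE || addr == ADDR_EVAL_SCORE + 1u) return TARGET_FPGA_REG;
    if (addr == ADDR_CHECK_MAP) return TARGET_FPGA_REG;
    if (addr >= ADDR_MOVES_BASE && addr < ADDR_MOVES_END) return TARGET_FPGA_REG;
    return TARGET_RAM;
}
```

The software side then reads the eval score or the in-check flags with an ordinary load, with no special I/O protocol.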
22. Performance Prediction
- Baseline: 1
- Best case based on profiling (assuming hyper-idealized HW)
- attack: 55% -> 0
- eval: 25% -> 0
- gen: 15% -> 0
- HW accelerated -> 0.05 of baseline!
- 20x speedup
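The prediction is Amdahl's law with the accelerated parts taken to zero time. The small calculator below reproduces it. The function names are illustrative.

```c
/* Fraction of the baseline run time left after the listed fractions
 * are (hyper-idealized) reduced to zero time. */
static double remaining_fraction(const double *accelerated, int n) {
    double r = 1.0;
    for (int i = 0; i < n; i++) r -= accelerated[i];
    return r;
}

/* Overall speedup relative to the baseline (baseline = 1). */
static double speedup(double remaining) {
    return 1.0 / remaining;
}
```

With attack 0.55, eval 0.25, and gen 0.15, the remaining fraction is about 0.05 and `speedup` returns about 20. The later target of trimming 0.05 to 0.04 gives `speedup(0.04)`, about 25.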
23. Attack/in_check
24. In_check
- Input: the color of the side to test for check
- Output: true if that side is in check, otherwise false
25. in_check
- in_check looks at each of the 64 squares for the king of the color passed in to the function
- It then calls attack on that square with the other side's color
- If we used a piece implementation (versus a board implementation), the for loop and if statement would become a single call of the attack function
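The difference between the two implementations can be sketched in C. The board version scans all 64 squares for the king. The piece version keeps the king's square in a piece list, so only the attack call remains. To keep the sketch short, `attack_orthogonal` is a simplified, hypothetical stand-in for attack that only detects rooks and queens along ranks and files. `king_square` is likewise a hypothetical piece-list entry.

```c
#include <stdbool.h>

enum { LIGHT, DARK, NO_COLOR };
enum { PAWN, KNIGHT, BISHOP, ROOK, QUEEN, KING, NO_PIECE };

static int color64[64];     /* colour on each square (board implementation) */
static int piece64[64];     /* piece on each square */
static int king_square[2];  /* hypothetical piece-list entry per side */

/* Simplified stand-in for attack(): only rooks/queens sliding along a rank or file. */
static bool attack_orthogonal(int sq, int by) {
    static const int dr[4] = {1, -1, 0, 0}, dc[4] = {0, 0, 1, -1};
    for (int d = 0; d < 4; d++) {
        int r = sq / 8 + dr[d], c = sq % 8 + dc[d];
        while (r >= 0 && r < 8 && c >= 0 && c < 8) {
            int s = r * 8 + c;
            if (color64[s] != NO_COLOR) {  /* first piece in this direction */
                if (color64[s] == by && (piece64[s] == ROOK || piece64[s] == QUEEN))
                    return true;
                break;                     /* blocked */
            }
            r += dr[d];
            c += dc[d];
        }
    }
    return false;
}

/* Board implementation: a for loop and an if statement to find the king. */
static bool in_check_board(int side) {
    for (int i = 0; i < 64; i++)
        if (piece64[i] == KING && color64[i] == side)
            return attack_orthogonal(i, side ^ 1);
    return false;
}

/* Piece implementation: the king's square is already known. */
static bool in_check_pieces(int side) {
    return attack_orthogonal(king_square[side], side ^ 1);
}
```

Both functions give the same answer. The piece version removes the up to 64 loop iterations, which is the overhead the slides want to avoid.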
26. In_check
[Figure: board implementation (search loop runs 64 times) vs. piece implementation]
27. Attack
- Inputs: the square the piece is on and the color of the other side
- Outputs true if the square is being attacked by the color s, and false if it is not
28. Details About Attack
- The pawn is looked at separately because the way it moves is different from the way it attacks
- As currently organized, the moves are different for black and white pawns
- The other pieces are evaluated for every direction, to see whether they can actually move there and whether they can slide
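The structure described above (pawns first, then every direction of every other piece, with a slide flag) can be sketched as follows. This is a sketch, not the project's code. It uses plain row/column arithmetic, with row 0 on the far (dark) side so that light pawns move toward row 0. The `move_dirs` table layout is a hypothetical choice.

```c
#include <stdbool.h>

enum { LIGHT, DARK, NO_COLOR };
enum { PAWN, KNIGHT, BISHOP, ROOK, QUEEN, KING, NO_PIECE };

static int color64[64], piece64[64];

/* Per-piece direction list (row delta, column delta) and whether it slides. */
typedef struct { int n; bool slides; int dr[8], dc[8]; } move_dirs;

static const move_dirs DIRS[6] = {
    [KNIGHT] = {8, false, {-2, -2, -1, -1, 1, 1, 2, 2}, {-1, 1, -2, 2, -2, 2, -1, 1}},
    [BISHOP] = {4, true,  {-1, -1, 1, 1}, {-1, 1, -1, 1}},
    [ROOK]   = {4, true,  {-1, 1, 0, 0}, {0, 0, -1, 1}},
    [QUEEN]  = {8, true,  {-1, -1, -1, 0, 0, 1, 1, 1}, {-1, 0, 1, -1, 1, -1, 0, 1}},
    [KING]   = {8, false, {-1, -1, -1, 0, 0, 1, 1, 1}, {-1, 0, 1, -1, 1, -1, 0, 1}},
};

static bool on_board(int r, int c) { return r >= 0 && r < 8 && c >= 0 && c < 8; }

/* True if square sq is attacked by side s. */
static bool attack(int sq, int s) {
    int r0 = sq / 8, c0 = sq % 8;

    /* Pawns separately: they capture diagonally, unlike how they move,
     * and light and dark pawns capture in opposite directions. */
    int pr = (s == LIGHT) ? r0 + 1 : r0 - 1;  /* row an attacking pawn stands on */
    for (int dc = -1; dc <= 1; dc += 2) {
        int c = c0 + dc;
        if (on_board(pr, c) && color64[pr * 8 + c] == s && piece64[pr * 8 + c] == PAWN)
            return true;
    }

    /* Every other piece of side s: walk each of its directions. */
    for (int from = 0; from < 64; from++) {
        if (color64[from] != s || piece64[from] == PAWN) continue;
        const move_dirs *m = &DIRS[piece64[from]];
        for (int d = 0; d < m->n; d++) {
            int r = from / 8 + m->dr[d], c = from % 8 + m->dc[d];
            while (on_board(r, c)) {
                if (r * 8 + c == sq) return true;                        /* can move there */
                if (color64[r * 8 + c] != NO_COLOR || !m->slides) break; /* blocked / no slide */
                r += m->dr[d];
                c += m->dc[d];
            }
        }
    }
    return false;
}
```

Each (piece, direction) walk is independent of the others. That independence is what makes the function a good candidate for parallel hardware.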
29. Bigger Picture: Attack Tables
- Construct two chess boards, one for white pieces and one for black
- Instead of a piece, each square would contain true or false depending on whether the square is being attacked by pieces of that color
30. So...
- Every time a player makes a move, the attack function on the FPGA rebuilds the table
- Can tailor the attack function for specific pieces on specific squares, using a combination of the board and piece implementations
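One compact encoding of the two true/false boards is one 64-bit word per color, with bit i set when square i is attacked. The sketch below rebuilds both words by running 2 x 64 square tests, done sequentially here, that the FPGA could run in parallel. Later lookups become a single bit test. The per-square test is passed in as a function pointer. `toy_attacked` is a hypothetical placeholder (one light rook on square 0, no blockers) that only exercises the table mechanics; it is not a real attack function.

```c
#include <stdbool.h>
#include <stdint.h>

enum { LIGHT, DARK };

/* Bit i of by[side] set means square i is attacked by that side. */
typedef struct { uint64_t by[2]; } attack_table;

typedef bool (*attack_fn)(int sq, int side);

/* Rebuild both tables after a move: 128 independent square tests. */
static void rebuild_attack_table(attack_table *t, attack_fn attacked) {
    for (int side = LIGHT; side <= DARK; side++) {
        t->by[side] = 0;
        for (int sq = 0; sq < 64; sq++)
            if (attacked(sq, side))
                t->by[side] |= (uint64_t)1 << sq;
    }
}

/* Lookup used by other functions (e.g. in_check on the king's square). */
static bool is_attacked(const attack_table *t, int sq, int side) {
    return (t->by[side] >> sq) & 1u;
}

/* Hypothetical placeholder: a light rook on square 0 attacks its row
 * and column (no blockers); dark attacks nothing. */
static bool toy_attacked(int sq, int side) {
    return side == LIGHT && sq != 0 && (sq / 8 == 0 || sq % 8 == 0);
}
```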
31. Implementation of Attack Tables
32. Advantages
- Avoid lots of useless searching: with the piece implementation you know exactly where each piece is
- If running attack on one square, why not on all 128 squares (64 per color) in parallel? Or use a piece implementation of the attack table and run it on only 32 squares
- Acts as a lookup table for other functions
33. Perimeter versus Internal
- Use special tailoring for perimeter squares that checks only the pertinent cases
- Use a special numbering scheme for the location of pieces in the piece implementation
- Internal squares stay the same
34. Eval
35. Eval() Function Inputs
- Which side the current move is for
- Light or Dark
- Board configuration
- The algorithm uses two 64-entry arrays to represent the board
- Which pieces are where (piece64 structure)
- What color the pieces are (color64 structure)
- The hardware overhead can be cut by using an array that is indexed by piece, not by position
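Converting from the position-indexed arrays to a piece-indexed array is a single pass over the board. After that, consumers handle at most 16 entries per side instead of 64 squares. The `piece_list` layout and `build_piece_list` below are a hypothetical sketch, not the design's actual structure.

```c
enum { LIGHT, DARK, NO_COLOR };
enum { PAWN, KNIGHT, BISHOP, ROOK, QUEEN, KING, NO_PIECE };

#define MAX_PIECES 16 /* per side */

/* Piece-indexed view: for each side, the type and square of each piece. */
typedef struct {
    int count[2];
    int type[2][MAX_PIECES];
    int square[2][MAX_PIECES];
} piece_list;

/* Build the piece-indexed array from piece64/color64. */
static void build_piece_list(const int piece64[64], const int color64[64], piece_list *pl) {
    pl->count[LIGHT] = pl->count[DARK] = 0;
    for (int sq = 0; sq < 64; sq++) {
        int side = color64[sq];
        if (side == NO_COLOR || pl->count[side] == MAX_PIECES) continue; /* empty or full */
        int k = pl->count[side]++;
        pl->type[side][k] = piece64[sq];
        pl->square[side][k] = sq;
    }
}
```

In hardware, the equivalent is keeping the piece registers up to date on every move, so the eval unit needs one input per piece rather than one per square.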
36. Eval() Function Outputs
- Score
- An integer value
- Based on the present configuration of the board
- Calibrated for whether the current player is Light or Dark
37. Eval
- The function call breaks down into three main subsections
- Initialization
- Takes the current board configuration and sets all of the internal registers to appropriate values
- Sets up the pawn_rank, pawn_mat, and pawn_count structures
38. Eval (continued)
- Bonus / penalty assess
- For each square, calculates either a bonus or a penalty based on the relative benefit of certain pieces being on that square
- Sums the results for each square and provides a Light_score and a Dark_score
- Calculate score
- Combines the Light and Dark scores to provide a single return value for the function, based on whether it is presently Light's or Dark's turn
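The three phases can be sketched in C. This is a simplified sketch, not the real evaluator. Initialization here only totals material, whereas the slides' version also sets up pawn_rank and pawn_count. The material values and the center-square bonus in `square_bonus` are hypothetical, and dark squares are mirrored with `sq ^ 56` so both sides share one table.

```c
enum { LIGHT, DARK, NO_COLOR };
enum { PAWN, KNIGHT, BISHOP, ROOK, QUEEN, KING, NO_PIECE };

/* Hypothetical material values (the slides do not list the constants). */
static const int piece_value[6] = {100, 300, 300, 500, 900, 0};

/* Hypothetical bonus/penalty table, from Light's side: +10 on central squares. */
static int square_bonus(int piece, int sq) {
    int r = sq / 8, c = sq % 8;
    (void)piece; /* a real table would depend on the piece type */
    return (r >= 2 && r <= 5 && c >= 2 && c <= 5) ? 10 : 0;
}

static int eval(const int piece64[64], const int color64[64], int side_to_move) {
    int pawn_mat[2] = {0, 0}, piece_mat[2] = {0, 0}, score[2];

    /* 1. Initialization: summarize the board (here, material totals only). */
    for (int sq = 0; sq < 64; sq++) {
        int side = color64[sq];
        if (side == NO_COLOR) continue;
        if (piece64[sq] == PAWN) pawn_mat[side] += piece_value[PAWN];
        else piece_mat[side] += piece_value[piece64[sq]];
    }
    score[LIGHT] = pawn_mat[LIGHT] + piece_mat[LIGHT];
    score[DARK] = pawn_mat[DARK] + piece_mat[DARK];

    /* 2. Bonus / penalty assess: each square contributes independently
     *    (one hardware block per square), and the results are summed. */
    for (int sq = 0; sq < 64; sq++) {
        int side = color64[sq];
        if (side == NO_COLOR) continue;
        score[side] += square_bonus(piece64[sq], side == LIGHT ? sq : sq ^ 56);
    }

    /* 3. Calculate: a single value from the side to move's point of view. */
    return side_to_move == LIGHT ? score[LIGHT] - score[DARK]
                                 : score[DARK] - score[LIGHT];
}
```

Phase 2 has no dependencies between squares, which is what lets the hardware version instantiate one bonus/penalty block per square feeding an adder.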
39. Eval Structure
[Figure: eval block diagram with Init, board registers, bonus and penalty cases, a Light-or-Dark select, and Calculate]
40. Bonus / Penalty Assess Structure
[Figure: one bonus/penalty block for each input square; an adder sums the values generated at each block to produce Score_light]
41. Eval_pawn
- Inputs
- Square to calculate the penalty for
- Pawn_rank structure
- Pawn_count structure
- Based on the inputs, up to four different penalties may be assessed
42. Eval_pawn Penalties, Bonuses
- Penalty A if there's a pawn behind this one
- Penalty B if there are no friendly pawns adjacent to the current pawn
- Penalty C if the pawn is not isolated but is backward
- Bonus D if the pawn is passed
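The following is a sketch of eval_pawn for the light side, written so that each term is "penalty or 0", like one mux in the hardware, and the final sum is the adder. The layout is assumed, not taken from the slides: row 0 is the far (dark) side, and `pawn_rank[side][f + 1]` holds the row of that side's least advanced pawn on file f, with padding files 0 and 9. Penalty C is treated as a backward-pawn test, and all constants are hypothetical.

```c
enum { LIGHT, DARK };

/* Hypothetical sizes; the slides give no values. */
#define DOUBLED_PENALTY  10 /* A */
#define ISOLATED_PENALTY 20 /* B */
#define BACKWARD_PENALTY  8 /* C */
#define PASSED_BONUS     20 /* D, per row still to go */

/* Row of the least advanced pawn per file; empty file = 0 (light), 7 (dark). */
static int pawn_rank[2][10];

static void clear_pawn_rank(void) {
    for (int f = 0; f < 10; f++) { pawn_rank[LIGHT][f] = 0; pawn_rank[DARK][f] = 7; }
}

/* Initialization step: record one pawn (light moves toward row 0). */
static void add_pawn(int side, int sq) {
    int f = sq % 8 + 1, r = sq / 8;
    if (side == LIGHT && r > pawn_rank[LIGHT][f]) pawn_rank[LIGHT][f] = r;
    if (side == DARK && r < pawn_rank[DARK][f]) pawn_rank[DARK][f] = r;
}

static int eval_light_pawn(int sq) {
    int f = sq % 8 + 1, r = sq / 8;
    int isolated = pawn_rank[LIGHT][f - 1] == 0 && pawn_rank[LIGHT][f + 1] == 0;
    /* A: another friendly pawn behind this one on the same file (doubled) */
    int a = pawn_rank[LIGHT][f] > r ? -DOUBLED_PENALTY : 0;
    /* B: no friendly pawns on either neighbouring file (isolated) */
    int b = isolated ? -ISOLATED_PENALTY : 0;
    /* C: not isolated, but both neighbours' rearmost pawns are ahead (backward) */
    int c = (!isolated && pawn_rank[LIGHT][f - 1] < r && pawn_rank[LIGHT][f + 1] < r)
                ? -BACKWARD_PENALTY : 0;
    /* D: no enemy pawn ahead on this or adjacent files (passed) */
    int d = (pawn_rank[DARK][f - 1] >= r && pawn_rank[DARK][f] >= r &&
             pawn_rank[DARK][f + 1] >= r) ? (7 - r) * PASSED_BONUS : 0;
    return a + b + c + d; /* the adder */
}
```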
43. Eval_pawn Structure
[Figure: square, pawn_rank, and pawn_count feed the pawn penalty control logic; four muxes each select penalty A, B, C, or D versus 0, and an adder sums their outputs]
44. Eval_king
- Inputs (same as eval_pawn)
- Square to calculate the penalty for
- Pawn_rank structure
- Pawn_count structure
- The function returns a penalty value that is adjusted depending on how well the king is shielded by its own pawns
45. Eval_king Penalties
- The File A, B, C, F, G, and H penalties
- These penalties are assessed when there is no pawn in the file one row away from the king
- The magnitude of the penalty depends on how far (in rows) the pawn in that file is from the king
- The pawn attack penalty
- This penalty is assessed if the enemy's pawns have advanced too far down the board toward the king
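The following is a sketch of the light king's shield penalty. It uses the same assumed `pawn_rank` layout as a pawn-rank table (row 0 on the dark side, files padded to 1..8, empty = 0 for light and 7 for dark). The king is scored on files A-C when it sits on the queen side and on files F-H on the king side. The center case and all penalty sizes are hypothetical simplifications.

```c
enum { LIGHT, DARK };

/* Row of the least advanced pawn per padded file (assumed layout). */
static int pawn_rank[2][10];

/* Penalty for one file of the light king's shield (hypothetical values). */
static int light_shield_file(int f) {
    int r = 0;
    int own = pawn_rank[LIGHT][f];
    if (own == 0) r -= 25;          /* no shield pawn on this file */
    else if (own == 5) r -= 10;     /* shield pawn one row further from the king */
    else if (own != 6) r -= 20;     /* further away still (6 = unmoved: no penalty) */

    int enemy = pawn_rank[DARK][f]; /* pawn attack: enemy pawns approaching */
    if (enemy == 5) r -= 10;
    else if (enemy == 4) r -= 5;
    return r;
}

/* Files A-C for a queen-side king, F-H for a king-side king. */
static int eval_light_king(int sq) {
    int col = sq % 8;
    if (col < 3) return light_shield_file(1) + light_shield_file(2) + light_shield_file(3);
    if (col > 4) return light_shield_file(6) + light_shield_file(7) + light_shield_file(8);
    return 0; /* centre: not covered by this sketch */
}
```

Each file's penalty is a small lookup selected by two values, which maps onto the per-file penalty blocks, the mux, and the adder in the hardware structure.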
46. Eval_king Structure
[Figure: pawn_count and pawn_rank feed control logic and a mux that selects among the File A, B, C penalties (with pawn approach), the File F, G, H penalties (with pawn approach), and no penalty; adders sum the selected penalties]
47. Bonus Penalty Assess Structure
- Switching from a position to a piece representation of the board
- No longer need to repeat the mux 64 times
- The adder now has 16 inputs, one for each piece (vs. 64 inputs)
- Knight, bishop, and rook are still straight table lookups
- Pawn_eval is repeated 8 times
- Still better than 64 times
48. Gen
49. gen() Function
- Searches through all 64 squares
- Skips empty squares and opponent pieces
- Creates all possible moves for each friendly piece
- Pushes the moves onto move_stack (with the helper function gen_push())
50. Possible Moves
- Basic pawn moves
- Move forward 1 or 2 spaces
- Take left or right
- Non-pawn piece (N, B, R, Q, K) moves
- B, R, Q can slide (move more than one space), but stop when another piece blocks the path
- Castle (king or queen side)
- En passant
51. For Pawns
[Figure: pawn move flow. Light pawns: move forward 1 space, take left, take right. Dark pawns: same as Light, but reversed (move backward 1 space)]
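52. For Non-Pawn Pieces
[Flowchart: Is the space empty? If yes, does the piece slide? Yes: move to the next square in the current direction; no: move to the next direction. If no, is the piece friendly? No: take the piece. Reaching the edge of the board moves on to the next direction; with no more directions, move to the next piece]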
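The flowchart translates almost line for line into a loop over pieces, directions, and squares. This is a sketch assuming plain row/column arithmetic and a flat move stack. The `move` struct, `STACK_SIZE`, and `gen_pieces` are hypothetical, and pawns, castling, and en passant are left out.

```c
#include <stdbool.h>

enum { LIGHT, DARK, NO_COLOR };
enum { PAWN, KNIGHT, BISHOP, ROOK, QUEEN, KING, NO_PIECE };

typedef struct { int from, to; bool capture; } move;

#define STACK_SIZE 1120 /* hypothetical capacity */
static move move_stack[STACK_SIZE];
static int stack_top;
static int color64[64], piece64[64];

typedef struct { int n; bool slides; int dr[8], dc[8]; } move_dirs;
static const move_dirs DIRS[6] = {
    [KNIGHT] = {8, false, {-2, -2, -1, -1, 1, 1, 2, 2}, {-1, 1, -2, 2, -2, 2, -1, 1}},
    [BISHOP] = {4, true,  {-1, -1, 1, 1}, {-1, 1, -1, 1}},
    [ROOK]   = {4, true,  {-1, 1, 0, 0}, {0, 0, -1, 1}},
    [QUEEN]  = {8, true,  {-1, -1, -1, 0, 0, 1, 1, 1}, {-1, 0, 1, -1, 1, -1, 0, 1}},
    [KING]   = {8, false, {-1, -1, -1, 0, 0, 1, 1, 1}, {-1, 0, 1, -1, 1, -1, 0, 1}},
};

static void gen_push(int from, int to, bool capture) {
    if (stack_top < STACK_SIZE) move_stack[stack_top++] = (move){from, to, capture};
}

/* Non-pawn move generation following the flowchart. */
static void gen_pieces(int side) {
    for (int from = 0; from < 64; from++) {                 /* next piece */
        if (color64[from] != side || piece64[from] == PAWN) continue;
        const move_dirs *m = &DIRS[piece64[from]];
        for (int d = 0; d < m->n; d++) {                    /* next direction */
            int r = from / 8 + m->dr[d], c = from % 8 + m->dc[d];
            while (r >= 0 && r < 8 && c >= 0 && c < 8) {    /* edge of board ends it */
                int to = r * 8 + c;
                if (color64[to] != NO_COLOR) {              /* space not empty */
                    if (color64[to] != side) gen_push(from, to, true); /* take piece */
                    break;
                }
                gen_push(from, to, false);                  /* empty: quiet move */
                if (!m->slides) break;                      /* N, K: one step only */
                r += m->dr[d];                              /* slide to next square */
                c += m->dc[d];
            }
        }
    }
}
```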
53. Other Functions
- gen_caps()
- Same as gen(), except it only checks for capture moves; called by quiesce()
- gen_push()
- Pushes moves from gen and gen_caps onto move_stack
- gen_promote()
- Pushes pawn promotion moves onto move_stack
- One move for each possible piece (Q, B, R, or N)
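One way gen_push and gen_promote can cooperate is for gen_push to route a pawn move that reaches the last row to gen_promote. gen_promote then pushes one move per promotion piece. This routing rule, the `move` layout, and the helper `push_move` are assumptions made for illustration. Rows 0 and 7 are taken as the two back ranks.

```c
#include <stdbool.h>

enum { NONE, KNIGHT, BISHOP, ROOK, QUEEN };

typedef struct { int from, to, promote; } move; /* promote: NONE or piece type */

#define STACK_SIZE 1120 /* hypothetical capacity */
static move move_stack[STACK_SIZE];
static int stack_top;

static void push_move(int from, int to, int promote) {
    if (stack_top < STACK_SIZE) move_stack[stack_top++] = (move){from, to, promote};
}

/* One move for each possible promotion piece (Q, B, R, N). */
static void gen_promote(int from, int to) {
    push_move(from, to, QUEEN);
    push_move(from, to, BISHOP);
    push_move(from, to, ROOK);
    push_move(from, to, KNIGHT);
}

/* A pawn reaching a back rank becomes four moves; anything else is one. */
static void gen_push(int from, int to, bool is_pawn) {
    int row = to / 8;
    if (is_pawn && (row == 0 || row == 7)) gen_promote(from, to);
    else push_move(from, to, NONE);
}
```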
54. move_stack
[Figure: move_stack snapshots before ply 0, after ply 0 (before ply 1), and after ply 1 (before ply 2). Each ply's moves (move 0 ... move n) occupy a block from gen_begin to gen_end, and each new ply's block starts where the previous ply's block ended]
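The stacked layout in the diagram can be sketched with per-ply `gen_begin`/`gen_end` indices. Each ply's list starts where the previous ply's list ended. Taking a move back just lowers `ply`, and that ply's block is overwritten by the next generation. `start_gen`, `push`, and the sizes are hypothetical.

```c
#define MAX_PLY 32       /* hypothetical search depth limit */
#define STACK_SIZE 1120  /* hypothetical capacity */

typedef struct { int from, to; } move;

static move move_stack[STACK_SIZE];
static int gen_begin[MAX_PLY], gen_end[MAX_PLY];
static int ply; /* current search depth */

/* Begin generating for the current ply right after the previous ply's moves. */
static void start_gen(void) {
    gen_begin[ply] = (ply == 0) ? 0 : gen_end[ply - 1];
    gen_end[ply] = gen_begin[ply];
}

/* Append a move to the current ply's block. */
static void push(int from, int to) {
    if (gen_end[ply] < STACK_SIZE) move_stack[gen_end[ply]++] = (move){from, to};
}
```

Because each block is contiguous, the HC11 can sort one ply's moves in place between `gen_begin[ply]` and `gen_end[ply]` while the FPGA fills the stack.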
55. HW/SW Breakdown
- FPGA puts moves into the stack structure
- Currently done by the gen(), gen_caps(), gen_push(), and gen_promote() functions
- HC11 sorts the stack structure
- Currently done by the sort() and sort_pv() functions
56. gen() Hardware
[Figure: a move generator (FSM) reads the pieces board and sends moves to a pusher, which writes them into the stack in shared memory; current_ply, gen_begin, and gen_end track the stack]
57. Generator FSM
[Figure: move-generator state machine. From Reset it steps through pawn states (P move 1, P move 2, P take Left, P take Right), knight states (N move 0 through N move 7), bishop states (B move Forward Left/Right, Back Left/Right), rook states (R move Forward, Back, Left, Right), queen states (all eight Q move directions), king states (all eight K move directions), and special states (Castle K side, Castle Q side, En Passant Left, En Passant Right) before DONE. Repeat counts (x2, x7, x8) mark how many times a state can repeat; a legend marks the moves skipped by gen_caps() and the moves checked by gen_promote()]
58. HCI
59. Possible Ideas
- Interface Design 1
- 8x8 bar LED board on the left displaying the piece in each location (the period indicates black/white)
- DIP switches to select the from/to squares for a move
- Clock to make the move
- Interface Design 2
- 8x8 bar LED board on the left displaying the piece in each location (the period indicates black/white)
- A button in each square next to the bar LED to select the from and then the to square for a move
- Clock to make the move
60. Interface Design Idea 1
[Figure: From and To DIP switches and a Make Move control feed latches and other logic, connected through a ribbon cable]
61. Interface Design Idea 2
[Figure: Make Move control and ribbon cable connection]
62. Other Ideas
- Beep to signify an illegal move
- Touch screen for the board
- LEDs to signify whose move it is
- Alternate board layouts (many of these)
63. Considerations
- Selection of parts
- Numeric, alphanumeric, LCD, etc. LEDs
- Push buttons
- Latches
- Costs of parts
- Number of FPGA pins needed
- Time to wire-wrap the board
- Time for parts
64. Considerations
- Feasibility of design
- Design of the board in relation to the design of the chess game
65. Summary
- Many possibilities
- Two basic likely designs
- Lots of thought and planning need to go into the design before acquiring parts and building
66. Software Optimizations
67. Software Optimizations
- All that remains is
- sort 2.49 -> currently an O() algorithm; can change to a heap (log n / constant)
- makemove 1.15 -> these will be slower
- takeback 0.88 -> these will be slower
- quiesce 0.38 -> will probably go up
- search 0.19
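One way to realize the heap idea for move ordering is sketched below. Build a binary max-heap over the scored moves once (O(n)), then pop the best remaining move in O(log n) each time the search asks for the next move. `scored_move`, `heapify`, and `pop_best` are hypothetical names, not the existing sort() interface.

```c
typedef struct { int move; int score; } scored_move;

/* Restore the max-heap property below index i (heap of size n). */
static void sift_down(scored_move *h, int n, int i) {
    for (;;) {
        int best = i, l = 2 * i + 1, r = l + 1;
        if (l < n && h[l].score > h[best].score) best = l;
        if (r < n && h[r].score > h[best].score) best = r;
        if (best == i) return;
        scored_move t = h[i];
        h[i] = h[best];
        h[best] = t;
        i = best;
    }
}

/* O(n) heap construction over the generated moves. */
static void heapify(scored_move *h, int n) {
    for (int i = n / 2 - 1; i >= 0; i--) sift_down(h, n, i);
}

/* Remove and return the highest-scoring move in O(log n); requires *n > 0. */
static scored_move pop_best(scored_move *h, int *n) {
    scored_move top = h[0];
    h[0] = h[--(*n)];
    sift_down(h, *n, 0);
    return top;
}
```

Because alpha-beta cutoffs often stop the search after the first few moves, paying O(log n) per move actually tried can beat sorting the whole list up front.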
68. Algorithm Improvements
- search
- Killer heuristic (search attack branches first; already implemented)
- Think on the opponent's time (multithreading/interrupts! Do it)
- History heuristic (built in already?)
- Tighter searches (could be implemented)
- Refutation tables (not sure what they are)
- Transposition tables (not sure what they are)
69. Algorithm Improvements (continued)
- Eval
- Pawn formation hash table (not needed, in HW)
- King safety hash table (not needed, in HW)
- Jinkies!
- Can probably squeeze out another 20% reduction: 0.05 -> 0.04
- Idealized speedup target: 25x!
70. Integration Plan
- Software/Profiling
- HW/SW Partitioning
- Baseline Stats
- FPGA Attack
- FPGA Eval
- FPGA Gen
- HW/SW Interface
- SW modification / optimization
- FPGA HCI
- CoSim
- Physical Design
- Integration/Debugging
- Optimizations
- Baseline Stats
71. Division of Labor
If it weren't for those meddling kids!
72. Jon
- Group leader
- prevent him from dropping the class
- Software guru (algorithm, HC11, HiWare)
- strong software background
- hates wires
- overall system design
- some hw/sw partitioning experience (research).
73. Jim
- Move generation hardware considerations
- When wiring requirements dried up, we moved from individual projects to pairs; volunteered to put thought into this and implement it
- Wirewrap Whiz (physical interfacing) / FPGA interface Whiz
- Doesn't care what he does, and didn't volunteer for a Verilog job at first
74. Jeff
- Project management software
- Experience with lots of MS software
- HCI hardware
- Display, inputs, Verilog, (beeper?), and (wiring?)
75. Matt
- Verilog Whiz
- Eval function design
- Wanted it and did a good job with it
- Memory Master
- Got the FPGA demo 0 mem -> FPGA -> mem proof working
76. Annie
- Soldering / Wire Wrap Queen
- Wire-wrapped everything a lot faster than Jim
- HC11 hardware interface
- Got that portion of demo 0 to work
- Attack function design
77. Demonstration Plan
- Demo 1 (week of 10/4)
- Stats on the baseline chess algorithm
- HW/SW partitioning and interfacing method
- Details about HW subsystems
- eval / attack / gen / HCI
- (Co)simulation of separate parts of the HW/SW partitioning
78. Demonstration Plan (continued)
- Demo 2 (week of 11/1)
- Frozen physical hardware
- Co-simulation working with HW/SW
- Chess that works and communicates
- Preliminary stats on the new design
- Optimizing / debugging process
79. Demonstration Plan (continued)
- Final demo (11/29)
- Optimizations and speedup statistics
- PC interface GUI (depending on interface)
80. Demo 1 Work Schedule
- 9/17 F. Demo 0 completion
- 9/20 M. Internal design review
- 9/22 W. HW/SW partitioning details
- 9/23 R. Design review
- 9/29 W. HW/SW interfacing resolved
- 10/4 M. Verilog simulations for FPGA stuff.
81. Demo 2 Work Schedule
- 10/11 M. HW/SW integration/co-simulation
- 10/18 M. Physical hardware frozen
- 10/25 Algorithm Optimizations
- 11/1 Clock speed optimizations
82. Final Demo
- Final review ready; add more bells and whistles
- Have one month for unpredicted delays