Title: Applying Machine Learning to Circuit Design
1. Applying Machine Learning to Circuit Design
- David Hettlinger
- Amy Kerr
- Todd Neller
2. Channel Routing Problems (CRPs)
- Chip Design
- Each silicon wafer contains hundreds of chips.
- Chip components, specifically transistors, are
etched into the silicon. - Often groups of components need to be wired
together.
3. A Simple CRP Instance
Silicon Channel
4. One Possible Solution
Goal: Decrease the number of horizontal tracks.
5. CRP Constraints
- Horizontal: Horizontal segments cannot overlap.
- Vertical: If subnet x has a pin at the top of column a, and subnet y has a pin at the bottom of column a, then subnet x must be placed above subnet y.
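The vertical constraint can be sketched as a small check. The column-to-subnet lists `top` and `bottom`, and the convention that 0 means "no pin", are assumptions made here for illustration.

```python
def vertical_constraints(top, bottom):
    """Collect the (above, below) orderings forced by the columns.

    top[i] / bottom[i] give the subnet with a pin at the top / bottom
    of column i, with 0 meaning no pin (an assumed encoding).
    """
    return {(x, y)
            for x, y in zip(top, bottom)
            if x != 0 and y != 0 and x != y}

# Columns (1, 3) and (2, 1) force subnet 1 above 3, and 2 above 1.
constraints = vertical_constraints([1, 2, 0], [3, 1, 2])
```

A solver would then reject any partition that places a subnet below one it must sit above.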
6. Simulated Annealing (SA)
- Background
- The name comes from annealing, a process blacksmiths use.
- Metal is heated, then cooled slowly, to make it as malleable as possible.
- Statistical physicists developed SA.
7. SA Problem Components
- Definable states: The current configuration (state) of a problem instance must be describable.
- New state generation: A way to generate new states.
- Energy function: A formula for the relative desirability of a given state.
- Temperature and annealing schedule: A temperature value and a function for changing it over time.
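The four components combine into one short loop. This is a minimal sketch: the geometric cooling factor and the step count are assumptions, not part of the slides.

```python
import math
import random

def simulated_annealing(state, energy, neighbor, temp, cooling=0.95, steps=1000):
    """Generic SA loop built from the four components on the slide:
    a state, a neighbor() generator, an energy() function, and a
    temperature with a (here: simple geometric) annealing schedule."""
    best = state
    for _ in range(steps):
        candidate = neighbor(state)
        delta = energy(candidate) - energy(state)
        # Always accept downhill moves; accept uphill moves with
        # probability e^(-delta / T), which shrinks as T cools.
        if delta <= 0 or random.random() < math.exp(-delta / temp):
            state = candidate
        if energy(state) < energy(best):
            best = state
        temp *= cooling
    return best
```

For example, minimizing `energy = x**2` with random steps of up to 1 should settle near x = 0.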
8. A Visual Example
Local Minimum
Global Minimum
9. Applying Simulated Annealing to CRPs
- Definable states: The partitioning of subnets into groups.
- New state generation: Change the grouping of the subnets.
- Energy function: The number of horizontal tracks needed to implement a given partition.
- Temperature and annealing schedule: Start the temperature just high enough to accept any new configuration. As for the annealing schedule, reinforcement learning can help find it.
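Mapped to code, the state, energy, and move might look like the following sketch. Representing a partition as a list of sets of subnet ids is an assumption, and the constraint checking a real router needs is omitted.

```python
import random

def crp_energy(partition):
    """Energy: one horizontal track per group of subnets."""
    return len(partition)

def crp_neighbor(partition):
    """New state generation: move one random subnet into another
    (possibly brand-new) group. A real implementation would also
    reject moves that violate the horizontal/vertical constraints."""
    groups = [set(g) for g in partition]
    src = random.choice([g for g in groups if g])
    subnet = src.pop()
    fresh = set()
    targets = [g for g in groups if g is not src] + [fresh]
    random.choice(targets).add(subnet)
    return [g for g in groups + [fresh] if g]
```

Every move preserves the set of subnets; only the grouping, and hence the track count, changes.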
10. A Simple CRP Example
Start State of a CRP Instance
Partition Graph of this State
11. A Simple CRP Example
States 1 and 2
Partition Graphs of these States
12. A Simple CRP Example
States 1, 2 and 3
Partition Graphs of these States
13. A Simple CRP Example
Starting through Ending States
Partition Graphs of these States
14. A Generated CRP Instance
Start State (15 Horizontal Tracks)
A Solution (12 Horizontal Tracks)
15. SA Problem Components
- Definable states: The current configuration (state) of a problem instance must be describable.
- New state generation: A way to generate new states.
- Energy function: A formula for the relative desirability of a given state.
- Temperature and annealing schedule: A temperature value and a function for changing it over time.
16. The Drunken Topographer
- Imagine an extremely hilly landscape with many hills and valleys, high and low
- Goal: find the lowest spot
- Means: airlift a drunk!
- Starts at a random spot
- Staggers randomly
- More tired → rejects more uphill steps
17. Super-Drunks, Dead-Drunks, and Those In-Between
- The Super-Drunk never tires
- Never rejects uphill steps
- How well will the Super-Drunk search?
- The Dead-Drunk is absolutely tired
- Always rejects uphill steps
- How well will the Dead-Drunk search?
- Now imagine a drunk that starts in fine condition
and very gradually tires.
18. Traveling Salesman Problem
- Have to travel a circuit around n cities (n = 400)
- Different costs to travel between different cities (assume cost = distance)
- State: an ordering of cities (> 8×10^865 orderings for 400 cities)
- Energy: cost of all travel
- Step: select a portion of the circuit and reverse the ordering
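The TSP energy and step described above can be sketched directly; the distance-matrix representation is an assumption.

```python
import random

def tour_cost(tour, dist):
    """Energy: total cost of the circuit, including returning home."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]]
               for i in range(len(tour)))

def reverse_segment(tour):
    """Step: select a portion of the circuit and reverse its ordering
    (the classic 2-opt-style SA move)."""
    i, j = sorted(random.sample(range(len(tour)), 2))
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
```

The move visits the same cities in a different order, so SA can accept or reject it purely on the change in tour cost.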
19. Determining the Annealing Schedule
- The schedule of the cooling is critical
- Determining this schedule by hand takes days
- It takes a computer mere hours to compute!
20. Reinforcement Learning Example
- Need to learn how long we need to study to get an A
- Trial & error: study for various amounts of time
- Short-term reward: exam grades, amount of free time
- Grade affects future opportunities
- i.e., whether we can slack off later
- Our semester grades (the goal is to maximize these!)
21. Reinforcement Learning (RL)
- Learns completely by trial & error
- No preprogrammed knowledge
- No human supervision/mentorship
- Receives rewards for each action
1. Immediate reward (numerical)
2. Delayed reward: actions affect future situations and opportunities for future rewards
- Goal: maximize long-term numerical reward
22. RL: The Details
[Diagram: the Agent sends an Action to the Environment; the Environment returns a Sensation and a Reward.]
- Agent: the learner (i.e., the student)
- Environment: everything the agent cannot completely control.
- Includes reward functions (i.e., the grade scale)
- Description of the current state (i.e., current average)
- Call this description a Sensation
23. RL: Value Functions
- Use immediate & delayed rewards to evaluate the desirability of actions / to learn the task
- Value function of a state-action pair: Q(s,a)
- The expected reward for taking action a from state s
- Includes immediate & delayed reward
- Strategy: Most of the time, choose the action that corresponds to the maximal Q(s,a) value for the state.
- Remember, we must explore sometimes!
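The strategy of usually exploiting the best-looking action while occasionally exploring is commonly called epsilon-greedy selection. In this sketch, storing Q in a plain dict keyed by (state, action) is an assumption.

```python
import random

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    """Choose the action with the largest Q(state, a) most of the
    time, but explore a random action with probability epsilon."""
    if random.random() < epsilon:
        return random.choice(actions)   # explore
    return max(actions, key=lambda a: Q.get((state, a), 0.0))  # exploit
```

Setting epsilon to 0 gives pure exploitation; a small positive epsilon keeps the agent sampling alternatives.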
24. RL: Q-Functions
- To start, set Q(s,a) = 0 for all s, a.
- Agent tries various actions.
- Each time it experiences action a from state s, it updates its estimate of Q(s,a) towards the actual reward experienced.
- If we usually pick the action a that has the maximal Q-value for that state
→ maximal total reward → optimal performance
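The slides do not spell out the exact update rule, so this sketch uses the standard Q-learning form; the learning rate `alpha` and discount factor `gamma` are assumed parameters.

```python
def q_update(Q, s, a, reward, s_next, actions, alpha=0.5, gamma=0.9):
    """One update: move the estimate Q(s, a) toward the reward actually
    experienced plus the discounted value of the best next action."""
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    old = Q.get((s, a), 0.0)  # every Q-value starts at 0, as on the slide
    Q[(s, a)] = old + alpha * (reward + gamma * best_next - old)
```

Repeated updates pull Q(s, a) toward the long-term value of the action, combining the immediate and the delayed reward.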
25. Example: Grid World
- Can always move up, down, right, left
- Board wraps around
[Diagram: grid with Start and Goal squares]
- Get to the goal in as few steps as possible.
- Reward?
- Meaning of Q?
- What are the optimal paths?
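The wrap-around movement can be sketched as a tiny environment step; the board size is an assumption, and the reward scheme is left open, as the slide's questions suggest.

```python
def grid_step(pos, action, size=5):
    """One move on a wrap-around board. pos is (row, col); moving off
    one edge re-enters from the opposite edge via modular arithmetic."""
    moves = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
    dr, dc = moves[action]
    return ((pos[0] + dr) % size, (pos[1] + dc) % size)
```

Because the board wraps, the optimal path to the goal may go off an edge rather than across the middle.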
26. Applying RL to CRPs
- Can learn an approximately optimal annealing schedule using RL
- Rewarded for reaching a better configuration
- Penalized for the amount of time used to find this better configuration
→ The computer learns to find an approximately optimal annealing schedule in a time-efficient manner.
27. Conclusions
- The CRP is an interesting but complex problem
- Simulated annealing helps us solve CRPs
- Simulated annealing requires an annealing schedule (how to change the temperature over time)
- Reinforcement learning, which is just learning through trial & error, lets a computer learn an annealing schedule in hours instead of days.
28. Any Questions?