Title: Multi-Agent Exploration
1. Multi-Agent Exploration
Matthew E. Taylor
http://teamcore.usc.edu/taylorm/
2. DCOPs: Distributed Constraint Optimization Problems
- Multiple domains
  - Multi-agent plan coordination
  - Sensor networks
  - Meeting scheduling
  - Traffic light coordination
  - RoboCup soccer
- Distributed
- Robust to failure
- Scalable
- (In)Complete
- Quality bounds
3. DCOP Framework
Figure: constraint graph over agents a1, a2, a3; each edge (a1-a2, a2-a3) has a pairwise reward table with entries 10, 0, 0, 6 depending on the agents' joint assignment.
Different levels of coordination possible
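In a DCOP, the team objective is the sum of the rewards on all constraints under the agents' joint variable assignment. A minimal sketch of that computation, assuming binary variable domains and the 10/0/0/6 reward tables shown on this slide (the actual value labels are not given in the deck, so the 0/1 domains here are illustrative):

    # Pairwise reward tables: reward_tables[(i, j)][(value_i, value_j)] -> reward
    reward_tables = {
        ("a1", "a2"): {(0, 0): 10, (0, 1): 0, (1, 0): 0, (1, 1): 6},
        ("a2", "a3"): {(0, 0): 10, (0, 1): 0, (1, 0): 0, (1, 1): 6},
    }

    def team_reward(assignment):
        """Global DCOP reward: sum of all pairwise constraint rewards."""
        return sum(table[(assignment[i], assignment[j])]
                   for (i, j), table in reward_tables.items())

    print(team_reward({"a1": 0, "a2": 0, "a3": 0}))  # 20 (both constraints give 10)
    print(team_reward({"a1": 0, "a2": 1, "a3": 0}))  # 0  (a2 disagrees with both neighbors)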
4. Motivation: DCOP Extension
- Unrealistic assumption: often the environment is not fully known!
- Agents need to learn
- Maximize total reward
- Real-world applications
  - Mobile ad-hoc networks
  - Sensor networks
5. Problem Statement
- DCEE: Distributed Coordination of Exploration and Exploitation
- Address challenges:
  - Local communication
  - Network of (known) interactions
  - Cooperative
  - Unknown rewards
  - Maximize on-line reward
  - Limited time-horizon
  - (Effectively) infinite reward matrix
6. Mobile Ad-Hoc Network
- Rewards: signal strength between agents, in [1, 200]
- Goal: maximize signal strength over time
- Assumes:
  - Small-scale fading dominates
  - Topology is fixed
Figure: network of agents a1-a4 with current link signal strengths (100, 75, 95, 50).
7. MGM (Maximal Gain Message)
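MGM is a local-search DCOP algorithm: in each round every agent computes the best gain it could achieve by changing its own variable, exchanges that gain with its neighbors, and only agents whose gain is strictly largest in their neighborhood actually move. A minimal sketch of one round under those standard semantics (tie-breaking, usually by agent ID, is omitted; the function arguments are illustrative):

    def mgm_round(agents, neighbors, best_gain, apply_best_move):
        """One MGM round.
        agents             -- iterable of agent IDs
        neighbors[a]       -- iterable of a's neighbors in the constraint graph
        best_gain(a)       -- best local reward improvement a could achieve alone
        apply_best_move(a) -- commit a's best unilateral move
        """
        gains = {a: best_gain(a) for a in agents}
        for a in agents:
            # Move only if this agent's positive gain beats every neighbor's gain
            if gains[a] > 0 and all(gains[a] > gains[n] for n in neighbors[a]):
                apply_best_move(a)

Because neighboring agents never move in the same round, total reward improves monotonically when the true reward tables are known; in DCEE the rewards are unknown, so best_gain must be replaced by an estimate such as the ones below.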
8. Static Estimation: SE-Optimistic
Rewards in [1, 200]
"If I move, I'd get R = 200"
Figure: chain a1-a2-a3-a4 with current link rewards 100, 50, 75.
9. Static Estimation: SE-Optimistic
Rewards in [1, 200]
"If I move, I'd gain 275"
"If I move, I'd gain 250"
"If I move, I'd gain 100"
"If I move, I'd gain 125"
Figure: chain a1-a2-a3-a4 with link rewards 100, 50, 75; each agent optimistically estimates its gain from moving (the computation is sketched below).
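SE-Optimistic assumes any unexplored setting will yield the maximum possible reward (200 here), so an agent's bid is the sum, over its links, of (maximum reward minus current link reward). A small sketch reproducing the gains on this slide for the chain in the figure (names are illustrative):

    MAX_REWARD = 200  # top of the assumed [1, 200] reward range

    # Current link rewards for the chain a1 - a2 - a3 - a4
    links = {("a1", "a2"): 100, ("a2", "a3"): 50, ("a3", "a4"): 75}

    def se_optimistic_gain(agent):
        """Optimistic bid: every incident link is assumed to jump to MAX_REWARD after a move."""
        return sum(MAX_REWARD - r for (i, j), r in links.items() if agent in (i, j))

    for a in ("a1", "a2", "a3", "a4"):
        print(a, se_optimistic_gain(a))  # a1: 100, a2: 250, a3: 275, a4: 125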
10. Results (Simulation): Maximize total reward (area under curve)
Figure: cumulative reward over time for SE-Optimistic vs. No Movement.
11. Balanced Exploration Techniques
- BE-Backtrack
  - Decision-theoretic calculation of the value of exploration
  - Track the reward of the previous best location, Rb
  - Bid to explore for some number of steps (te)
Expected utility combines three terms: (reward while exploiting an improved value x P(improve reward)) + (reward while exploiting Rb x P(NOT improve reward)) + (reward gathered while exploring); see the sketch below.
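A rough, illustrative way to estimate the BE-Backtrack bid via Monte Carlo, assuming rewards are drawn i.i.d. from a known distribution on [1, 200] and that after exploring for te steps the agent exploits the better of its best sample and Rb for the rest of the horizon. The distribution, the Monte Carlo estimator, and all names here are assumptions for illustration, not the exact closed-form calculation used in the work:

    import random

    def be_backtrack_utility(R_b, t_e, T_remaining, sample_reward, n_sims=10000):
        """Estimate expected total reward of exploring for t_e steps, then exploiting
        max(best sample, R_b) for the remaining T_remaining - t_e steps."""
        total = 0.0
        for _ in range(n_sims):
            samples = [sample_reward() for _ in range(t_e)]
            explore_reward = sum(samples)                 # reward while exploring
            best = max(samples + [R_b])                   # backtrack to R_b if nothing better
            total += explore_reward + (T_remaining - t_e) * best
        return total / n_sims

    # Example: uniform rewards on [1, 200], best-so-far 120, 50 steps left, explore for 5
    u = be_backtrack_utility(R_b=120, t_e=5, T_remaining=50,
                             sample_reward=lambda: random.uniform(1, 200))

The agent would bid with the te that maximizes this utility and compare it against simply exploiting Rb for all remaining steps.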
12. Results (Simulation): Maximize total reward (area under curve)
Figure: cumulative reward over time for BE-Backtrack, SE-Optimistic, and No Movement.
13. Omniscient Algorithm
- (Artificially) convert DCEE to DCOP
- Run the MGM algorithm [Pearce & Tambe, 2007]
- Quickly find local optimum
- Establish upper bound
- Only works in simulation
14. Results (Simulation): Maximize total reward (area under curve)
Figure: cumulative reward over time for Omniscient, BE-Backtrack, SE-Optimistic, and No Movement.
15. Balanced Exploration Techniques
- BE-Rebid
  - Allows agents to backtrack
  - Re-evaluates the explore/backtrack decision every time-step [Montemerlo, 2004]
  - Allows for on-the-fly reasoning (see the sketch below)
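One way to picture BE-Rebid's per-step re-evaluation, reusing the be_backtrack_utility sketch above: at every time-step the agent recomputes the value of exploring (for the best choice of te) against the value of backtracking to Rb immediately, instead of committing to an exploration length up front. This is only a schematic of the decision rule, not the paper's exact bid:

    def be_rebid_decision(R_b, T_remaining, sample_reward):
        """Re-evaluated each step: explore (best t_e) or backtrack to the best value seen."""
        backtrack_value = T_remaining * R_b
        explore_value = max(
            be_backtrack_utility(R_b, t_e, T_remaining, sample_reward)
            for t_e in range(1, T_remaining + 1)
        )
        return "explore" if explore_value > backtrack_value else "backtrack"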
16. Balanced Exploration Techniques
- BE-Stay
  - Agents are unable to backtrack
  - True for some types of robots
  - Dynamic programming approach (see the sketch below)
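A minimal sketch of the dynamic-programming idea behind BE-Stay for a single agent, assuming rewards are drawn i.i.d. from a known discrete distribution (uniform on 1..200 here) and that an agent that cannot backtrack must choose, at each step, between keeping its current reward for the rest of the horizon or moving once more. The distribution, the single-agent view, and the timing convention are all simplifying assumptions:

    from functools import lru_cache

    REWARDS = range(1, 201)        # assumed discrete uniform reward values
    P = 1.0 / len(REWARDS)

    @lru_cache(maxsize=None)
    def explore_value(n):
        """Expected value of moving now with n steps left (no backtracking allowed)."""
        if n == 0:
            return 0.0
        # Receive a fresh reward r this step, then face the same stay-or-go choice
        return sum(P * (r + stay_or_go(r, n - 1)) for r in REWARDS)

    def stay_or_go(r, n):
        """Best achievable value given current reward r and n steps remaining."""
        return max(n * r, explore_value(n))

    # Example: current reward 120 with 10 steps left -- stay put or keep exploring?
    decision = "stay" if 10 * 120 >= explore_value(10) else "explore"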
17. Results (Simulation)
(10 agents, random graphs with 15-20 links)
18. Results (Simulation)
(chain topology, 100 rounds)
19. Results (Simulation)
(20 agents, 100 rounds)
20. Also Tested on Physical Robots
Used iRobot Creates (unfortunately, they don't vacuum)
21. Sample Robot Results
22. k-Optimality
- Increased coordination
- Find pairs of agents to jointly change their variables (locations); a pair's joint gain is sketched below
- Higher communication overhead
- SE-Optimistic → SE-Optimistic-2, SE-Optimistic-3
- SE-Mean → SE-Mean-2
- BE-Rebid → BE-Rebid-2
- BE-Stay → BE-Stay-2
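For the k=2 variants, a pair of neighboring agents evaluates the gain of changing their values together instead of separately, which is what the extra coordination and message overhead buy. A minimal sketch of an SE-Optimistic-2-style joint bid under the same optimistic assumption as before; the pairing and bookkeeping details are illustrative:

    MAX_REWARD = 200
    links = {("a1", "a2"): 100, ("a2", "a3"): 50, ("a3", "a4"): 75}

    def joint_optimistic_gain(pair):
        """Optimistic gain for two agents moving together: every link touching
        either agent (including the shared one) is assumed to reach MAX_REWARD."""
        return sum(MAX_REWARD - r for (i, j), r in links.items()
                   if i in pair or j in pair)

    # The pair (a2, a3) covers links a1-a2, a2-a3, and a3-a4
    print(joint_optimistic_gain({"a2", "a3"}))  # 375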
23. Confirm Previous DCOP Results
If (artificially) provided rewards, k=2 outperforms k=1
24. Sample Coordination Results
Figure panels: Full Graph, Chain Graph
25. Surprising Result: Increased Coordination Can Hurt
26. Surprising Result: Increased Coordination Can Hurt
27. Regular Graphs