Title: Speed-Range Dilemmas for Vision-Based Navigation in Unstructured Terrain
1 SPEED-RANGE DILEMMAS FOR VISION-BASED NAVIGATION
IN UNSTRUCTURED TERRAIN
- Pierre Sermanet¹², Raia Hadsell¹, Jan Ben², Ayse Naz Erkan¹, Beat Flepp², Urs Muller², Yann LeCun¹
- (1) Courant Institute of Mathematical Sciences, New York University
- (2) Net-Scale Technologies, Morganville, NJ 07751, USA
2 Outline
- Program and System overview
- Problem definition
- Architecture
- Results
3 Overview: Program
Overview Problem Architecture Results
- LAGR: Learning Applied to Ground Robots
  - Demonstrate learning algorithms in unstructured outdoor robotics
  - Vision-based only (passive), no expensive equipment
  - Reach a GPS goal as fast as possible without any prior knowledge of the location
  - DARPA funded, 10 teams (universities and companies), common platform
  - Comparison to state-of-the-art CMU baseline software and other teams
  - Monthly tests by DARPA in various unknown locations
- Unstructured outdoor robotics is highly challenging due to the wide diversity of environments (colors, shapes, sizes of obstacles, lighting and shadows, etc.)
- Conventional algorithms are unsuited; adaptability and learning are needed
4 Overview: Platform
- Constructor: CMU/NREC
- Vision-based only: 2 stereo pairs of cameras (plus GPS for global navigation)
- 4 Linux machines linked by Gigabit Ethernet
  - Two "eye" machines (dual-core 2 GHz): image processing
  - "planner" machine (single-core 2 GHz): planning and control loop
  - "controller" machine: low-level communication
- Maximum speed: 1.3 m/s
- Proprietary CMU/NREC API to sensors and actuators
- Proprietary CMU/NREC baseline
  - End-to-end navigation software (D*, etc.)
  - (not re-used)
[Figure: robot platform with labeled GPS, dual stereo cameras, and bumper]
5 Overview: Philosophy
- Main goal
  - Demonstrate machine learning algorithms for long-range vision (RSS07).
- Supporting goal
  - Build a solid software platform for long-range vision and navigation
  - Robust and reliable
  - Resistant to sensor imprecision and failures
[Figure: input image and stereo labels (short-range) feed self-supervised learning using a convolutional network. Input: context-rich image windows. Output: long-range labels.]
6 Overview: System
Note: Latency is tied not only to frequency but also to sensor, network, planning, and actuator latency.
7 Problem
- Local obstacle avoidance performance drops significantly when latency and period are too high (Performance Test of July 2006, Holmdel Park, NJ)
  - Artificially increasing latency and period increases the number of collisions with obstacles almost linearly
  - Human expert drivers of the UPI Crusher vehicle reported that a feedback latency of 400 ms was the maximum for good remote driving.
- How can good performance be guaranteed despite the increasing complexity introduced by sophisticated long-range vision modules?
- When does processing speed prevail over vision range, and vice versa?
8 Problem: Delays
- Latency and frequency determine performance, but latency is actually composed of 3 types of delays:
  (1) Sensor/actuator latency (LAGR API latency): images are already 190 ms old when made available to image processing
  (2) Processing latency
  (3) Robot dynamics latency (inertia, acceleration/deceleration): 1.5 sec. (worst case) between a wheel command and reaching the desired speed
- (1) and (3) are relatively high on the LAGR platform and must be compensated for and taken into account by (2).
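To get a feel for these delays, the sketch below converts each one into the distance the robot covers while it elapses. The 190 ms, 1.5 s, and 1.3 m/s figures come from the slides; the processing latency value is a placeholder assumption, and treating the robot as moving at top speed throughout each delay gives an upper bound rather than a measured value.

```python
MAX_SPEED_M_S = 1.3  # LAGR platform top speed (from the platform slide)

def distance_during(delay_s: float, speed_m_s: float = MAX_SPEED_M_S) -> float:
    """Upper bound on the distance covered while a delay elapses at constant speed."""
    return delay_s * speed_m_s

delays_s = {
    "(1) sensor (LAGR API)": 0.190,     # from the slide
    "(2) processing": 0.100,            # ASSUMPTION: placeholder value
    "(3) dynamics (worst case)": 1.5,   # from the slide
}

for name, delay in delays_s.items():
    print(f"{name}: {delay * 1000:.0f} ms -> up to {distance_during(delay):.2f} m")
```

Even before any processing happens, the image already describes a scene the robot may have moved roughly a quarter of a meter past.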
9 Problem: Solutions to delays
- To account for sensor and processing latencies (1) and (2):
  (a) Reduce processing time.
  (b) Estimate delays between path planning and actuation.
  (c) Place traversability maps according to delays before and after path planning.
- To account for dynamics latencies (3):
  (d) Model or record the robot's dynamics.
- -> All of solutions (a), (b), (c), and (d) are part of the global solution presented in the results section, but here we only describe a successful architecture for (a).
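Compensating for a known delay amounts to predicting where the robot will be once that delay has elapsed and placing the map relative to the predicted pose. A minimal sketch, assuming constant forward speed and turn rate over the delay (a unicycle model chosen here for illustration, not necessarily the model used on the LAGR robot); `Pose` and `predict_pose` are hypothetical names:

```python
import math
from dataclasses import dataclass

@dataclass
class Pose:
    x: float      # meters
    y: float      # meters
    theta: float  # heading in radians

def predict_pose(pose: Pose, speed: float, yaw_rate: float, delay_s: float) -> Pose:
    """Extrapolate a pose over delay_s, assuming constant speed and yaw rate."""
    if abs(yaw_rate) < 1e-9:
        # Straight-line motion along the current heading
        return Pose(pose.x + speed * delay_s * math.cos(pose.theta),
                    pose.y + speed * delay_s * math.sin(pose.theta),
                    pose.theta)
    # Motion along a circular arc of radius speed / yaw_rate
    radius = speed / yaw_rate
    new_theta = pose.theta + yaw_rate * delay_s
    return Pose(pose.x + radius * (math.sin(new_theta) - math.sin(pose.theta)),
                pose.y - radius * (math.cos(new_theta) - math.cos(pose.theta)),
                new_theta)

# An image is 190 ms old when processing starts: where is the robot by then?
print(predict_pose(Pose(0.0, 0.0, 0.0), speed=1.3, yaw_rate=0.0, delay_s=0.190))
```

The same function can be applied twice, once for the sensing delay before planning and once for the estimated actuation delay after it.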
10 Architecture
- Idea
  - Wagner et al.¹ showed that a walking human gazes more frequently close by than far away
    -> need higher frequency close by than far away
  - Close-by obstacles move toward the robot faster than far obstacles
    -> need lower latency close by than far away
- To satisfy these requirements, short- and long-range vision must be separated into 2 parallel and independent OD modules
  - Fast-OD: processing has to be fast; vision is not necessarily long-range.
  - Far-OD: vision has to be long-range; processing can be slower.
- How to make Fast-OD fast?
  -> Simple processing and reduced input resolution.
  -> Can we reduce resolution without reducing performance?
- ¹ M. Wagner, J. C. Baird, and W. Barbaresi. The locus of environmental attention. J. of Environmental Psychology, 1:195-206, 1980.
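One way to read the latency requirement is in terms of time-to-contact: at a fixed approach speed, a nearby obstacle leaves far less time to react than a distant one, so the same latency consumes a much larger share of the available time. The sketch below uses the platform's 1.3 m/s top speed together with the ranges and latencies reported in the results slides (3 m / 250 ms for Fast-OD, 30 m / 700 ms for Far-OD); the "share of time" figure is an illustration added here, not a metric from the slides.

```python
def time_to_contact(distance_m: float, closing_speed_m_s: float) -> float:
    """Seconds until an obstacle at distance_m is reached at constant closing speed."""
    if closing_speed_m_s <= 0:
        raise ValueError("closing speed must be positive")
    return distance_m / closing_speed_m_s

SPEED_M_S = 1.3  # platform top speed

for label, distance_m, latency_s in [("near (Fast-OD)", 3.0, 0.250),
                                     ("far (Far-OD)", 30.0, 0.700)]:
    ttc = time_to_contact(distance_m, SPEED_M_S)
    share = 100 * latency_s / ttc
    print(f"{label}: {ttc:.1f} s to contact, latency uses {share:.0f}% of it")
```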
11 Architecture: Fast-OD // Far-OD
[Figure: Far-OD and Fast-OD modules]
12 Architecture: Implementation notes
- CPU cycles: all cycles must be given to Fast-OD when it runs to guarantee low latency. Possible solutions:
  - Use a real-time OS and give high priority to Fast-OD.
  - With a regular OS, give Fast-OD control of Far-OD
    -> Fast-OD pauses Far-OD, runs, then sleeps for a bit and resumes Far-OD.
  - Use a dual-core CPU.
- Map merging: Fast and Far maps are merged together before planning according to their respective poses.
- 2-step planning: this architecture makes it easier to separate the planning algorithms suited for short and long range
  - Fast-OD planning happens in Cartesian space and takes robot dynamics into account (more important at short range)
  - Far-OD planning happens in image space and uses regular path planning.
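The "Fast-OD pauses Far-OD, runs, then resumes it" option can be sketched with a shared flag that the slow module checks between chunks of work. This is only an illustration of the control flow, assuming two Python threads and a `threading.Event`; the function names are hypothetical, the `time.sleep` calls stand in for real processing, and Python threads do not by themselves guarantee that Fast-OD gets every CPU cycle.

```python
import threading
import time

far_may_run = threading.Event()  # set -> Far-OD is allowed to work
far_may_run.set()
stop = threading.Event()
fast_log = []
far_chunks = []

def far_od_loop():
    """Slow module: works in small chunks and blocks whenever it is paused."""
    while not stop.is_set():
        if not far_may_run.wait(timeout=0.05):
            continue                      # still paused, check stop and retry
        time.sleep(0.005)                 # stand-in for one chunk of Far-OD work
        far_chunks.append(time.monotonic())

def fast_od_cycle(frame_id):
    """Fast module: pause Far-OD, process one frame, then resume Far-OD."""
    far_may_run.clear()
    try:
        fast_log.append(f"fast-od frame {frame_id}")  # stand-in for fast processing
    finally:
        far_may_run.set()

far_thread = threading.Thread(target=far_od_loop, daemon=True)
far_thread.start()
for frame_id in range(3):
    fast_od_cycle(frame_id)
    time.sleep(0.02)  # Fast-OD sleeps; Far-OD uses the remaining cycles
stop.set()
far_thread.join(timeout=1.0)
print(fast_log, len(far_chunks))
```

Because Far-OD only checks the flag between chunks, the chunk length bounds how long Fast-OD may have to wait, which is why the chunks are kept short here.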
[Figure: long-range planning in image space and dynamics planning in Cartesian space; range labels: 5m, 10m, 10m, infinity]
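Merging the Fast and Far maps according to their poses can be sketched as follows. The sketch assumes each module outputs a sparse grid of traversability costs indexed relative to the robot position at which its image was taken, and it handles translation only (rotation is ignored to keep the example short). Letting the fresher Fast-OD map override Far-OD where they overlap is a choice made for this sketch, not something stated in the slides. The cell size and all names are hypothetical.

```python
from typing import Dict, Tuple

Cell = Tuple[int, int]
CELL_SIZE_M = 0.2  # ASSUMPTION: grid resolution chosen for the example

def to_world(local_map: Dict[Cell, float], pose_xy: Tuple[float, float]) -> Dict[Cell, float]:
    """Shift a robot-relative grid into world cells using the pose it was built at."""
    dx = round(pose_xy[0] / CELL_SIZE_M)
    dy = round(pose_xy[1] / CELL_SIZE_M)
    return {(cx + dx, cy + dy): cost for (cx, cy), cost in local_map.items()}

def merge_maps(far_map, far_pose, fast_map, fast_pose) -> Dict[Cell, float]:
    """Combine both maps in world cells; Fast-OD wins where they overlap."""
    merged = to_world(far_map, far_pose)
    merged.update(to_world(fast_map, fast_pose))
    return merged

# Far-OD map built at the origin; Fast-OD map built 0.4 m further along x.
far_map = {(10, 0): 1.0, (5, 0): 0.0}
fast_map = {(3, 0): 1.0}
merged = merge_maps(far_map, (0.0, 0.0), fast_map, (0.4, 0.0))
print(merged)
```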
13 Results: Timing measures
- Fast-OD sensors latency: 190 ms
- Fast-OD actuation latency: 250 ms
- Fast-OD period (frequency): 100 ms (10 Hz)
- Far-OD actuation latency: 700 ms
- Far-OD period (frequency): 370 ms (2-3 Hz)
14 Results
- Short- and long-range navigation test
  - 1st obstacle appears quickly and suddenly to the robot -> tests short-range navigation
  - Cul-de-sac -> tests long-range navigation
- The parallel architecture is consistently better at short- and long-range navigation than the series architecture or Fast-OD only.
Note: Here Fast-OD has a 5 m radius and Far-OD a 15 m radius.
15 Results: More recent results
- Fast-OD and Far-OD in parallel
  - Short-range navigation consistently successful
  - 0 collisions over >5 runs
  - Finishes the run in about 16 sec. along the shortest path
  - Fast-OD: 10 Hz, 250 ms, 3 m range
  - Far-OD: 3 Hz, 700 ms, 30 m range
Video 1: collision-free bucket maze
Video 2: collision-free bucket maze
- Fast-OD and Far-OD in series
  - Short-range navigation consistently failing: >2 collisions over >5 runs
  - Finishes the run in >40 sec. along a longer path
  - Fast-OD/Far-OD: 3 Hz, 700 ms, 3 m/30 m range (frequency is acceptable but latency is too high)
Videos 3, 4, 5: obstacle collisions due to high latency and period.
16 Results: More recent results
- Fast-OD and Far-OD in parallel
  - Short-range navigation consistently successful: 0 collisions over >5 runs
  - Fast-OD: 10 Hz, 250 ms, 3 m range
  - Far-OD: 3 Hz, 700 ms, 30 m range
- Note: long-range planning is off, i.e. Far-OD is processing but ignored. Only short-range navigation was tested here.
Video 6: Natural obstacles
Video 7: Tight maze of artificial obstacles
17 Results: Moving obstacles
- Detects and avoids moving obstacles consistently.
Video 8: Fast-moving obstacle
18 Results: Beating humans
- Autonomous short-range navigation is consistently better than inexperienced human drivers and equal to or better than experienced human drivers.
  - (Driving with only the robot's images would be even harder for a human.)
Video 9: Experienced human driver
19 Results: Processing Speed - Vision Range dilemma
- We showed that processing speed prevails over vision range for short-range navigation, whereas vision range prevails over speed for long-range navigation.
- Only a 3 m vision range was necessary to build collision-free short-range navigation for a 1.3 m/s non-holonomic vehicle
  - Vehicle's worst-case stopping delay: 1.0 sec.
  - System's worst-case reaction time: 0.25 sec. latency + 0.1 sec. period
  - Worst-case reaction and stopping delay: 1.35 sec. (or 1.75 m)
  - Only 1.0 sec. of anticipation was necessary in addition to the worst-case reaction and stopping delay.
- A vision range of 15 m with high latency and lower frequency consistently improved long-range navigation when run in parallel with the short-range module.
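The arithmetic behind these figures can be checked in a few lines. All input values are taken from the slide above; reading the 3 m range as (reaction and stopping delay + anticipation) x top speed is an interpretation added here, and the function name is hypothetical.

```python
SPEED_M_S = 1.3          # vehicle top speed
STOPPING_DELAY_S = 1.0   # vehicle's worst-case stopping delay
LATENCY_S = 0.25         # system's worst-case latency
PERIOD_S = 0.1           # system's period
ANTICIPATION_S = 1.0     # extra anticipation found necessary

def range_budget(speed, stopping, latency, period, anticipation):
    """Return (delay in s, distance in m, required range in m) at constant speed."""
    reaction_and_stop_s = stopping + latency + period
    reaction_and_stop_m = reaction_and_stop_s * speed
    required_range_m = (reaction_and_stop_s + anticipation) * speed
    return reaction_and_stop_s, reaction_and_stop_m, required_range_m

delay_s, delay_m, needed_m = range_budget(
    SPEED_M_S, STOPPING_DELAY_S, LATENCY_S, PERIOD_S, ANTICIPATION_S)
print(f"reaction + stopping: {delay_s:.2f} s = {delay_m:.2f} m")
print(f"with anticipation:   {needed_m:.2f} m of vision range")
```

Under this reading the required range comes out at roughly 3 m, in line with the Fast-OD range used in the experiments.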
20 Summary
- We showed that both latency and frequency are critical in vision-based systems because of higher processing times.
- A simple, very low-resolution OD run in parallel with a high-resolution OD greatly increased the performance of a short- and long-range vision-based autonomous navigation system over commonly used higher-resolution and sequential approaches.
- Processing speed prevails over range in short-range navigation, and only 1.0 sec. of anticipation in addition to dynamics and processing delays was necessary.
- Additional key concepts such as dynamics modeling must be implemented to build a complete, successful end-to-end system.
- A robust collision-free navigation platform, dealing with moving obstacles and beating humans, was successfully built and leaves enough CPU cycles available for computationally expensive algorithms.
21 Questions?