Title: Ponte Corporate Overview (Investor)
1. Real-World Techniques for Automating Configuration of Network Devices
at NANOG 24
Mark Epstein, CTO, Ponte
February 11th, 2002
2. The Challenge: Large Scale
Epstein's rule of large numbers: Responsibility for large numbers of anything that must be individually managed is a real pain
- Large firms have large numbers of:
  - Specific business initiatives and functions
  - Vendors, models, and instances of devices
  - Employees and operators
  - Security breaches and breach attempts
- Additional challenges
  - High employee turnover
  - More operators than device-savvy staff
3. Service Provider Network
[Diagram: a Core WAN linking four regional POPs (RPOP) and two NOCs (NOC1, NOC2), with customers attached to the RPOPs]
4. Regional POP: Many Devices Working in Concert
[Diagram: the devices of a regional POP, connected to the Core WAN]
5. Network Security Control: Ponte nsControl Architecture
[Diagram: a control server in the Network Operations Center (security service modules, assembly templates, business systems, delivery drivers) connected over secured channels, across the Internet and the intranet, to home, headquarters, and branch offices]
6. Tcl/Expect: Many Interrelated Problems
- Issues
  - Buffer skew and device prompts
  - Timing and reset behavior
  - Terminal servers
  - Firmware revisions and delivery
  - High availability and fail-over
  - Control channel problems
  - Using existing configurations
  - Differential configuration
7. Tcl/Expect Issues: Prompts
- Buffer skew
  - Your code isn't looking at what you expect
    - typical enable prompt ends with "#"
    - banner: "authorized users only"
    - enable prompt: "router23#"
- Canonical vs. custom prompts
  - Can cause buffer skew
  - Know the prompts or be very flexible
- Strategies
  - Resync with unique text
  - Use time of output as additional sync?
8. Tcl/Expect Code: Prompts

    # resynchronize the buffer with text unlikely
    # to occur in the input buffer
    # PIX example
    proc BufferResync {} {
        set buffer_data ""
        send "who ?\r"
        expect {
            -ex "usage: who \[ip\]" {
                set buffer_data $expect_out(buffer)
            }
            timeout { error "BufferResync: timed out waiting for usage text" }
        }
        # now safe to expect prompt
        expect {
            -ex "#" {}
            timeout { error "BufferResync: timed out waiting for prompt" }
        }
        # empty the input buffer
        expect -re {.*}
        # error out if anything arrived after prompt
        expect {
            -timeout 1 -re {.+} { error "BufferResync: output arrived after prompt" }
            timeout {}
        }
        # erase input
        send "\025"    ;# control-u
        return $buffer_data
    }
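The resynchronization idea can be tried without a live device. The following is a minimal sketch in plain Tcl, not Expect: `resyncAfterMarker` and the session text are made up for illustration, and the banner is assumed to contain a `#` so that a naive match on the prompt character would stop too early.

```tcl
# Hypothetical helper (not from the slides): discard everything up to and
# including a marker string that should occur only once in the buffer.
proc resyncAfterMarker {buffer marker} {
    set idx [string first $marker $buffer]
    if {$idx < 0} {
        error "marker not found; buffer is still out of sync"
    }
    set start [expr {$idx + [string length $marker]}]
    return [string range $buffer $start end]
}

# Made-up session text: the banner contains '#', so matching on the prompt
# character alone would stop inside the banner instead of at the prompt.
set buffer "# authorized users only #\r\nusage: who \[ip\]\r\nrouter23# "
set rest [resyncAfterMarker $buffer "usage: who \[ip\]"]
puts [string trim $rest]
```

Once the marker has been consumed, whatever remains should be the real prompt, which is the state BufferResync aims for before it returns.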
9. Tcl/Expect Issues: Speed and Timing
- Device Reset Behavior
  - IOS devices disconnect the control terminal on a 'reload'
  - But still accept new connections
  - And leave other active connections up until later in the reload process
  - Thus difficult to detect when device has completed its reset
- Typing Speed: Some devices are command speed limited
  - Device communication over slow serial lines
  - Minimum-cost processors (i.e. slow)
  - Inter-command speed can be naturally limited
    - Throttle inter-command speed by processing intervening prompts
  - You cannot depend on prompts
    - Ex: when connecting through a terminal server to a device
    - Do not send an initial CR too quickly or device may drop it
10. Tcl/Expect Code: Speed Issues
- Measure actual device reset time, encode into scripts
  - Different for every device type
  - Sophistication makes sense but still device-specific
- Slow command entry (typing) may be critical for reliable behavior

    # ask for the reload, then wait 5min before
    # attempting to reconnect
    ExpectReload
    sleep 300
    ...
    # slow our typing speed for slow device
    set sendRate [JobVar DeviceBitRate]
    # can only accept data at 25% of bit rate
    set loadFactor 25
    set send_slow [deviceSpeed $sendRate $loadFactor]
    ...
    send -s "long data string\r"
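Expect's `send -s` reads the `send_slow` variable, a two-element list giving the number of bytes to send at once and the number of seconds to wait between sends. The slides do not show how `deviceSpeed` computes that list, so the following is only a sketch of one way to derive it; the name `deviceSpeedSketch` and the assumption of 10 bits on the wire per character (8N1 framing) are mine.

```tcl
# Hypothetical stand-in for the deviceSpeed helper named on the slide.
# Assumes 10 bits per character on the serial line (8N1 framing).
proc deviceSpeedSketch {bitRate loadFactor} {
    if {$bitRate <= 0 || $loadFactor <= 0 || $loadFactor > 100} {
        error "bitRate must be > 0 and loadFactor must be in 1..100"
    }
    set charsPerSecond [expr {$bitRate / 10.0}]
    # Only a percentage of the line rate is usable by the device.
    set usable [expr {$charsPerSecond * $loadFactor / 100.0}]
    # Send one character at a time, spaced to match the usable rate.
    return [list 1 [expr {1.0 / $usable}]]
}

# A 9600 bit/s line that can absorb 25% of the line rate:
# 240 characters per second, so one character every 1/240 s.
set send_slow [deviceSpeedSketch 9600 25]
puts $send_slow
```

Sending one character per interval is the most conservative choice; a larger first element with a proportionally longer delay gives the same average rate with fewer writes.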
11. Tcl/Expect Issues: Device Control via Terminal Servers 1-0
- Unpredictable prompt at connection
  - Serial vs. virtual-terminal TCP connection
  - Device may be in any state at all
  - Get device into known state
- Terminal server port resets
  - Terminal server ports get wedged
  - Good configuration reduces this problem
- Need to be terminal-server-aware
  - Pay careful attention to timeouts
  - Rebooting terminal server may cause device reboots!!
12. Tcl/Expect Code: Device Control via Terminal Servers 1-1

    proc ExpectLogin {access} {
        set timeout 10
        set retries 3
        set passwordfailed 0
        expect {
            -ex ">" {}
            -ex "#" {
                warning "device was left in enable mode"
                send "disable\r"
            }
            -ex "sername" {
                send "[getCSUserName $access]\r"
                exp_continue
            }
            -ex "assword" {
                if {$passwordfailed == 0} {
                    send "[getSystemPasswd $access]\r"
                    set passwordfailed 1
                } else {
                    error "System password was rejected"
                }
                exp_continue
            }
            -ex "Enter Selection" {
                # for c1900, enterprise edition
                send "K"
                exp_continue
            }
            # continued on the next slide
13. Tcl/Expect Code: Device Control via Terminal Servers 1-2

            # ExpectLogin, continued
            -ex "Press any key to continue." {
                send "\r"
                exp_continue
            }
            -ex "Password required, but none set" {
                error "Connection closed by foreign host. Possible cause: no password on device"
            }
            eof {
                retry "Telnet connection to device closed unexpectedly"
            }
            timeout {
                set timeout 120
                if {$retries > 0} {
                    incr retries -1
                    send "\r"
                    exp_continue
                } else {
                    retry "Login timed out waiting for \"Password\""
                }
            }
        }
    }
14. Tcl/Expect Issues: Device Control via Terminal Servers 2-0
- Console output
  - Usually console (serial) is the true console
  - Terminal page length may be fixed over the serial port
  - Asynchronous, unrelated output increases need for resynchronization and fault tolerance
15. Tcl/Expect Code: Device Control via Terminal Servers 2-1

    # Suppress console output
    #
    # suppress line width editing
    sendCmd "terminal width 0"
    # suppress console monitor messages
    sendCmd "terminal no monitor"
    # ... do stuff ...
    sendCmd "terminal monitor"
    # suppress "More" prompts
    sendCmd "no pager"
    ...
    sendCmd "pager"

    # Try, try again (what to do when you can't
    # suppress console output)
    for {set retries 0} {$retries < 3} {incr retries} {
        sendCmd "show version"
        set buffer_data [BufferResync]
        if {[regexp {VERSION (\W)} $buffer_data junk version]} {
            break
        }
    }
    if {![info exists version]} {
        error "could not read version after 3 attempts"
    }
16. Tcl/Expect Issues: Special Concerns Re: Firmware
- Configuration File Issues
  - Commands may be added or removed
  - Differences in meaning between versions
  - Often must reconfigure to support firmware
  - Wholesale firmware change (e.g., CatOS to IOS)
- Transfer Concerns
  - Distance vs. reliability
  - Some devices require local access
  - Pilot error
  - TFTP
17. Tcl/Expect Issues: Fail-over Devices
- Active/standby and primary/secondary
- IP address vs. terminal server mismatch
  - "Two men say they're Jesus, one of them must be wrong"
- Change volume limits (PIX example: 200 lines of conduit changes per commit)
- New and expanded commands
18. Tcl/Expect Code: Fail-over Devices detection (1)

    proc PIXActive {} {
        set cablestatus "NOT FOUND"
        set iam ""
        set state ""
        send "\r"
        expect {
            "#" {
                send "sho fai\r"
            }
            timeout {
                sendAbort
                error "PIXActive: timed out waiting for first prompt"
            }
        }
        expect {
            -re "Cable status: (\[^\r\n]*)" {
                set cablestatus $expect_out(1,string)
            }
            -re "(\n|\r)<--- More --->" {
                send " \r"
                exp_continue
            }
            timeout {
                sendAbort
                error "PIXActive: timed out searching for 'Cable status'"
            }
        }
        # continued on the next slide
19. Tcl/Expect Code: Fail-over Devices detection (2)

        # PIXActive, continued
        expect {
            -re "This host: (\[^ ]*) - (\[^ \r\n]*)" {
                set iam $expect_out(1,string)
                set state $expect_out(2,string)
            }
            -re "(\n|\r)<--- More --->" {
                send " \r"
                exp_continue
            }
            timeout {
                sendAbort
                error "PIXActive: timed out searching for 'This host'"
            }
        }
        expect {
            "#" {}
            -re "(\n|\r)<--- More --->" {
                send "q\r"
                exp_continue
            }
            timeout {
                sendAbort
                error "PIXActive: timed out waiting for final prompt"
            }
        }
        # continued on the next slide
20. Tcl/Expect Code: Fail-over Devices detection (3)

        # PIXActive, continued
        if {$iam == "Secondary" && $state == "Active"} {
            JobRetAdd -append Warning \
                failover_secondary_active \
                "Secondary PIX is Active, cable status $cablestatus\n"
        }
        if {$cablestatus != "Normal"} {
            sendAbort
            error "PIXActive: cable status failure: $iam Cable status: $cablestatus"
        }
        switch -- $state {
            Standby { return 0 }
            Active  { return 1 }
            default {
                sendAbort
                error "PIXActive: failed to determine if this host is active, host $iam, state $state, cable status $cablestatus"
            }
        }
    }
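The two regular expressions that PIXActive applies to the `show failover` output can be exercised on a plain string. This sketch assumes the output contains lines of the form `Cable status: ...` and `This host: ... - ...`, as the patterns above imply; the sample text is written by hand for illustration, not captured from a device, and `parseFailover` is a hypothetical name.

```tcl
# Hypothetical offline parser for the fields PIXActive looks for.
proc parseFailover {text} {
    set result [dict create cablestatus "NOT FOUND" iam "" state ""]
    if {[regexp {Cable status: ([^\r\n]*)} $text -> cable]} {
        dict set result cablestatus [string trim $cable]
    }
    if {[regexp {This host: ([^ ]*) - ([^ \r\n]*)} $text -> iam state]} {
        dict set result iam $iam
        dict set result state $state
    }
    return $result
}

# Hand-written sample text (an assumption about the output layout).
set sample "Failover On\r\nCable status: Normal\r\nThis host: Secondary - Active\r\n"
set status [parseFailover $sample]

# Same warning condition as on the slide: the secondary unit is active.
if {[dict get $status iam] eq "Secondary" && [dict get $status state] eq "Active"} {
    puts "warning: secondary is active, cable status [dict get $status cablestatus]"
}
```

Keeping the parsing separate from the `expect` blocks makes it possible to test the patterns against saved session logs before trusting them on production devices.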
21. Tcl/Expect Code: Fail-over Devices change volume

    proc ExpectConfigure {cmds} {
        ExpectConfigMode
        set count 0
        foreach cmd $cmds {
            send -s $cmd
            send "\r"
            expect {
                -ex "Type help or '?' for a list of available commands." {
                    sendAbort
                    error "ExpectConfigure: invalid configuration command detected, check session log"
                }
                -ex "(config)#" {}
                timeout {
                    sendAbort
                    error "ExpectConfigure: timed out waiting for (config)# after $cmd"
                }
            }
            # commit in batches to stay under the change volume limit
            if {[incr count] > [JobVar MaxConfigurationLines]} {
                set count 0
                ExpectWriteConfig
                sleep 30
                ExpectConfigMode
            }
        }
        ExpectWriteConfig
    }
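The batching in ExpectConfigure can also be expressed as a pure list operation, which is easier to test than a live session. This is a sketch under my own assumptions: `chunkCommands` is a hypothetical helper, and it closes a batch at exactly `maxLines` commands, whereas the slide's loop commits once the count exceeds the limit.

```tcl
# Hypothetical helper: split a command list into batches of at most
# maxLines commands, so each batch can be committed separately.
proc chunkCommands {cmds maxLines} {
    if {$maxLines < 1} {
        error "maxLines must be at least 1"
    }
    set chunks {}
    set current {}
    foreach cmd $cmds {
        lappend current $cmd
        if {[llength $current] >= $maxLines} {
            lappend chunks $current
            set current {}
        }
    }
    # Keep the final, possibly shorter, batch.
    if {[llength $current] > 0} {
        lappend chunks $current
    }
    return $chunks
}

# Five placeholder commands in batches of two give three batches.
set batches [chunkCommands {cmd1 cmd2 cmd3 cmd4 cmd5} 2]
puts "[llength $batches] batches"
```

With the batches computed up front, the send loop only has to write the configuration and pause between batches.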
22. Tcl/Expect Issues/Code: Control Channel Problems
- Loss of connection triggers Expect EOF
  - Many scripts consider this retry-able
  - Often caused by transient network failure
  - But what state was the device in, anyway?
- Distribute control to reduce risk
  - Place control close to devices
  - Distance between control and controlled device = risk of network failure

    expect {
        ....
        eof {
            retry "lost connection, retry request"
        }
    }
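The slides do not show what `retry` does internally. One plausible shape is a bounded retry wrapper, sketched here in plain Tcl; `withRetries` is a hypothetical name, and the script passed to it is assumed to start from a known device state on every attempt, since a lost connection says nothing about where the device was left.

```tcl
# Hypothetical bounded-retry wrapper (not the slides' retry command).
# Runs the script in the caller's scope until it succeeds or the
# attempts are used up.
proc withRetries {maxAttempts script} {
    set result "no attempts were made"
    for {set attempt 1} {$attempt <= $maxAttempts} {incr attempt} {
        if {![catch {uplevel 1 $script} result]} {
            return $result
        }
    }
    error "gave up after $maxAttempts attempts: $result"
}

# Simulated transient failure: the first two attempts lose the connection.
set calls 0
set answer [withRetries 3 {
    incr calls
    if {$calls < 3} { error "lost connection" }
    set calls
}]
puts "succeeded on attempt $answer"
```

Capping the attempts matters because a retry that never gives up can hide a device that is permanently unreachable.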
23. Tcl/Expect Issues: Turning Found Configurations into Data
- Retrieve configuration from a PIX
  - roam-request -a pixdevice -- req_class=AuditConfig \
    action=import-pixconfig
- After the configuration is retrieved, import-pixconfig uses roam-pixload to parse the configuration
  - roam-pixload -r requestId
- roam-pixload gets relevant data from the configuration
  - interface name, security level, mtu, speed/options, ip address, netmask, fail-over configuration, access and enable passwords
- Then pushes found data back into the device profile
  - roam-device pixdevice -- iface.inside.speed=auto
  - ...
- Device profile is used in combination with a template to create the configuration file
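The internals of roam-pixload are not shown, so the following is only a sketch of the general idea of turning configuration lines into profile keys. It assumes PIX 6.x style `nameif`, `interface`, and `mtu` lines; `pixConfigToProfile` is a hypothetical name, and apart from `iface.inside.speed`, which appears on the slide, the key names are invented for illustration.

```tcl
# Hypothetical sketch: extract a few interface settings from a PIX-style
# configuration into a flat dictionary of profile keys.
proc pixConfigToProfile {config} {
    set names {}      ;# hardware id -> interface name
    set speeds {}     ;# hardware id -> speed/options
    set profile {}
    foreach line [split $config "\n"] {
        set words [regexp -all -inline {\S+} $line]
        switch -- [lindex $words 0] {
            nameif {
                dict set names [lindex $words 1] [lindex $words 2]
                dict set profile iface.[lindex $words 2].security [lindex $words 3]
            }
            interface {
                dict set speeds [lindex $words 1] [lindex $words 2]
            }
            mtu {
                dict set profile iface.[lindex $words 1].mtu [lindex $words 2]
            }
        }
    }
    # Resolve speeds last so the order of nameif/interface lines does not matter.
    dict for {hwid speed} $speeds {
        if {[dict exists $names $hwid]} {
            dict set profile iface.[dict get $names $hwid].speed $speed
        }
    }
    return $profile
}

set config [join {
    "interface ethernet0 auto"
    "interface ethernet1 100full"
    "nameif ethernet0 outside security0"
    "nameif ethernet1 inside security100"
    "mtu inside 1500"
} "\n"]
set profile [pixConfigToProfile $config]
puts "iface.inside.speed=[dict get $profile iface.inside.speed]"
```

Each resulting key/value pair corresponds to one value that could be pushed back into the device profile.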
24. Tcl/Expect Issues: Differential Configuration
- The only way to update device configurations without losing connections on most devices
- Often not possible: many device commands are not invertible
  - Must take care to maintain control connectivity
  - Cannot do it for firmware update
- Often difficult
  - Often cannot just add onto end of existing configuration
  - Can cause serious security issues
  - Order-dependent configuration changes often cannot be made at all
- Much more difficult to do reliably than just replacing startup configuration and reloading
25. Tcl/Expect Code: Differential Configuration

    proc ConfigUpdate {} {
        global spawn_id timeout
        set system [getSystemPasswd [JobGetVar Access]]
        set enable [getEnablePasswd [JobGetVar Access]]
        if {[JobVarExists Failover] && [JobGetVar Failover]} {
            ConnectFailover $system $enable \
                [JobGetVar RemoteAddrList]
        } else {
            Connect [lindex [JobGetVar RemoteAddrList] 0] \
                $system $enable
        }
        # fetch the running configuration and check it is the right device
        set conffile [ExpectGetConfig]
        VerifyTarget $conffile
        set oldconf [prepare_config $conffile]
        set newconf [prepare_config \
            [JobGetFile [JobGetVar ConfigFile]]]
        # apply only the difference between old and new
        ExpectConfigure [ComputeDeltaConfig $oldconf $newconf]
        send "exit\r"
        ExpectClose
    }
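ComputeDeltaConfig is not shown on the slides. As a deliberately naive sketch of what a delta computation involves, the version below removes lines that are only in the old configuration by prefixing them with `no` and then appends lines that are only in the new one. Both the name `computeDeltaSketch` and the `no` convention are assumptions; many commands cannot be inverted this way and ordering is ignored, which are the difficulties described on the previous slide.

```tcl
# Naive, hypothetical delta: line-by-line set difference.
# Assumes every removed line can be undone with a "no" prefix,
# which is NOT true for many real device commands.
proc computeDeltaSketch {oldconf newconf} {
    set old [split [string trim $oldconf] "\n"]
    set new [split [string trim $newconf] "\n"]
    set delta {}
    foreach line $old {
        if {$line ne "" && [lsearch -exact $new $line] < 0} {
            lappend delta "no $line"
        }
    }
    foreach line $new {
        if {$line ne "" && [lsearch -exact $old $line] < 0} {
            lappend delta $line
        }
    }
    return $delta
}

set oldconf "mtu outside 1500\nmtu inside 1400"
set newconf "mtu outside 1500\nmtu inside 1500"
set delta [computeDeltaSketch $oldconf $newconf]
puts [join $delta "\n"]
```

Even in this small case the removal and the addition touch the same setting, so a production delta would need per-command knowledge rather than a textual diff.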