Title: Structured approach in troubleshooting
1 Structured approach in troubleshooting
- Collect and analyze symptoms
- Localize the problem
- Isolate the trouble
- Locate and correct the problem
- Verify the fix
2 Network Baseline
- Network monitoring
- Monitor the day-to-day operation of the network to establish what is normal for it
- Learn the average traffic of the network
- Learn the peak traffic times over the day, the week, and the month
- Learn which applications are used most and least
- Identify the network users that are most prone to difficulties
- Keep logs, so that the administrator can compare the problems encountered with baseline information showing what normal network operation should be
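A minimal sketch of turning logged traffic samples into baseline figures (average and peak hour). The sample format, a list of (hour of day, Mbps) pairs, and the function name are assumptions made for illustration; real monitoring tools export richer data.

```python
from collections import defaultdict
from statistics import mean


def summarize_baseline(samples):
    """Summarize (hour_of_day, mbps) samples into simple baseline figures."""
    samples = list(samples)
    by_hour = defaultdict(list)
    for hour, mbps in samples:
        by_hour[hour].append(mbps)
    # Average load seen in each hour of the day.
    hourly_avg = {hour: mean(values) for hour, values in by_hour.items()}
    # The hour whose average load is highest is the peak traffic time.
    peak_hour = max(hourly_avg, key=hourly_avg.get)
    return {
        "overall_avg_mbps": mean(mbps for _, mbps in samples),
        "hourly_avg_mbps": hourly_avg,
        "peak_hour": peak_hour,
    }


# Hypothetical samples: (hour of day, measured Mbps)
baseline = summarize_baseline(
    [(9, 10.0), (9, 20.0), (14, 40.0), (14, 60.0), (23, 2.0)]
)
print(baseline["peak_hour"], baseline["overall_avg_mbps"])
```

Comparing a later measurement against these figures shows whether a reported "slow network" is actually outside normal behaviour.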
3 Document network problems
- The more information an administrator has, the easier it should be to solve the problem
- Information can be collected by
- Network management tools
- Network analyzers
- Problems collected from clients
- Identification
- Preliminary information (who reported it, time, relation to previous problems, location, etc.)
- Network information collected by the network technician
- List of actions taken
- Summary (hardware, software, configuration, user problems)
4 Analyzing: locate and fix
- Localize and locate problems
- List all possible causes
- Generate a problem scenario based on knowledge and previous information
- Determine the most likely cause, by isolation and elimination
- Use diagnostic utilities built into the devices (e.g. NIC, routers, PC) to help locate the problem
- Check physical and logical indicators (LEDs) for the status of devices
- Correct the problems
- Use the replacement method to eliminate possible causes
- Start from the basics: user, cables, patch cords, malfunctioning machines
- Verify that the problems have disappeared
5 Focus: Basics and Standard Tools
- Solving network problems depends a lot on your understanding
- Simple tools can tell you what you need to know
- Example: ping is incredibly useful!
6 Troubleshooting
- Avoid it by
- redundancy
- documentation
- training
- Try quick fixes first
- simple problems often have big effects
- is the power on?
- is the network cable plugged into the right socket? Is the LED flashing?
- has anything changed recently?
- Change only one thing at a time
- test thoroughly after the change
- Be familiar with the system
- maintain documentation
- Be familiar with your tools
- before trouble strikes
7 Troubleshooting: Learn as you go
- Study and be familiar with the normal behaviour of your network
- Monitoring tools can tell you when things are wrong
- if you know what things look like when they are right
- Using tools such as Ethereal (now Wireshark) can help you understand
- your network, and
- TCP/IP better
8 Documentation
- Maintain an inventory of equipment and software
- a list mapping MAC addresses to machines can be very helpful
- Maintain a change log for each major system, recording
- each significant change
- each problem with the system
- each entry dated, with the name of the person who made the entry
- Two categories of documentation
- Configuration information
- describes the system
- use system tools to obtain a snapshot, e.g., sysreport in Red Hat Linux
- Procedural information
- how to do things
- use tools that automatically document what you are doing, e.g., script
9 Connectivity Testing: Cabling
- Label cables clearly at each end
- Cable testers
- ensure wired correctly, check
- attenuation
- length: is it too long?
- 100BaseT: less than 100 m
- Is the activity light on the interface blinking?
10 Software tools: ping
- Most useful check of connectivity
- Universal
- If you ping a hostname, it includes a rough check of DNS
- Sends an ICMP (Internet Control Message Protocol) ECHO_REQUEST
- Waits for an ICMP ECHO_REPLY
- Most pings can display round-trip time
- Most pings allow setting the packet size
- Can be used to make a crude measurement of throughput
11 ping: Roughly Estimating Throughput
- Example
- ping with packet size 100 bytes, round-trip time 30 ms
- ping with packet size 1100 bytes, round-trip time 60 ms
- So it takes 30 ms extra (15 ms one way) to send an additional 1000 bytes, or 8000 bits
- Throughput is roughly 8000 bits per 15 ms, or about 533,000 bits per second
- A very crude measurement: it takes no account of other traffic and treats all links on the path, there and back, as one
12 ping: Roughly Estimating Throughput
- This can be expressed as a simple formula
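One way to write the estimate from the previous slide as a formula. The symbol names are chosen here for illustration: P_s and P_l are the small and large packet sizes in bytes, and t_s and t_l are the corresponding round-trip times in seconds.

```latex
\[
\text{throughput (bits/s)} \;\approx\; \frac{8\,(P_l - P_s)}{(t_l - t_s)/2}
\;=\; \frac{16\,(P_l - P_s)}{t_l - t_s}
\]
```

With the example values (P_s = 100, P_l = 1100, t_s = 0.030, t_l = 0.060) this gives 16 x 1000 / 0.030, or roughly 533,000 bits per second.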
13 Multi Hop Paths
14 Throughput: Measuring with ping 1
- To measure throughput between two remote hosts, tools like ping may be used
- ping two locations with two packet sizes (4 pings altogether, minimum)
- Example
15 Throughput: Measuring with ping 2
- Time difference / 2 (round-trip time (RTT) -> one way)
- Divide the size difference in bits (8000) by this one-way time
- Multiply by 1000 (ms -> seconds)
- Convert bps to Mbps
16 Throughput: Measuring with ping 3
17 Throughput: Measuring with ping 4
18 Throughput: Measuring with ping 5
- Completing the calculation for throughput between 205.153.61.1 and 205.153.61.2
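The steps above can be sketched as a small function. The measured round-trip times for 205.153.61.1 and 205.153.61.2 are not preserved in this text, so the numbers below are hypothetical and only illustrate the arithmetic; the function and parameter names are likewise assumptions.

```python
def link_throughput_bps(small_bytes, large_bytes, near_rtts_ms, far_rtts_ms):
    """Estimate throughput of the path segment between a near and a far host.

    near_rtts_ms and far_rtts_ms are (rtt_small_packet, rtt_large_packet)
    pairs in milliseconds, one pair per pinged host (4 pings in total).
    """
    near_small, near_large = near_rtts_ms
    far_small, far_large = far_rtts_ms
    # Extra round-trip time caused by the larger packet on the segment
    # between the two hosts only (the shared path up to the near host cancels).
    extra_rtt_ms = (far_large - far_small) - (near_large - near_small)
    if extra_rtt_ms <= 0:
        raise ValueError("measurements too noisy to estimate throughput")
    one_way_ms = extra_rtt_ms / 2              # RTT -> one way
    size_diff_bits = 8 * (large_bytes - small_bytes)
    return size_diff_bits / one_way_ms * 1000  # bits per ms -> bits per second


# Hypothetical measurements, not the values from the original slides.
bps = link_throughput_bps(100, 1100, near_rtts_ms=(1.0, 3.0), far_rtts_ms=(5.0, 15.0))
print(f"{bps / 1_000_000:.1f} Mbps")
```

In practice each RTT should be the minimum or average of several pings, since a single sample is easily distorted by other traffic.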
19 How to Use ping?
- Ensure local host networking is enabled first: ping localhost, then the local IP address
- ping a known host on the local network
- ping the local and remote interfaces on the router
- ping by IP address as well as by hostname if the hostname ping fails
- confirm DNS with nslookup (or dig); see later
- ping from more than one host
20 What ping Result is Good, Bad?
- A steady stream of consistent replies indicates the path is probably okay
- Usually the first reply takes longer, due to ARP lookups at each router
- After that, ARP results are cached
- ICMP error messages can help in understanding the results
- Destination Network Unreachable indicates that the host doing the ping cannot reach the network
- Destination Host Unreachable may come from routers further away
21 Ping Responses
- On a Cisco router you will get responses like the following
- Actual response is
- router>ping www.yahoo.com
- Translating "www.yahoo.com"...domain server (209.1.221.10) [OK]
- Type escape sequence to abort.
- Sending 5, 100-byte ICMP Echos to 216.115.102.81, timeout is 2 seconds:
- !!!!!
- Success rate is 100 percent (5/5), round-trip min/avg/max = 4/16/24 ms
22 Troubleshooting with ping (1)
- Standard ping: used to check the availability of a host
- ping <ip address>
- Extended ping: used to track packet loss or latency (sending out 1 ping per second until the process is halted by CTRL-C)
- Unix / Linux
- ping -s <ip address> (Solaris; on Linux plain ping already repeats until interrupted, and -s sets the packet size)
- Windows
- ping -t <ip address>
23 Troubleshooting with ping 2
- A Cisco router sends a fixed number of packets as fast as it can and waits for the responses
- router>ping
- Protocol [ip]:
- Target IP address: www.inetdaemon.com
- Translating 209.1.221.10
- Repeat count [5]: 100
- Datagram size [100]:
- Timeout in seconds [2]:
- Extended commands [n]:
- Sweep range of sizes [n]:
- Type escape sequence to abort.
- Sending 100, 100-byte ICMP Echos to 207.150.192.12, timeout is 2 seconds:
- !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
- !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
- Success rate is 100 percent (100/100), round-trip min/avg/max = 12/19/280 ms
24 Possible causes of being unable to ping
- If you are trying to ping a name, try pinging the IP address of the destination machine. If ping fails when you try the name of the site but works when you try the IP address, it's NOT a network problem, it's DNS.
- If you are trying to ping a site and both the name and the IP address fail, the site may be blocking ping with an access list. Try a traceroute instead.
- If there are multiple hops between you and the destination, try pinging each host in the path to the destination until you find the host that fails to respond to ping. Use a (successful) traceroute to get a list of the hosts between you and the destination for this purpose.
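The reasoning on this slide can be written down as a small decision function. This is only a sketch: the function name and the wording of the returned hints are made up here, and the two booleans are assumed to come from pinging the destination by name and by IP address.

```python
def diagnose_ping(name_ok: bool, ip_ok: bool) -> str:
    """Suggest the next step from the results of pinging by name and by IP."""
    if name_ok and ip_ok:
        return "reachable"
    if ip_ok and not name_ok:
        # The network path works; only name resolution is failing.
        return "DNS problem: check with nslookup or dig"
    if not ip_ok and not name_ok:
        # Could be an access list blocking ICMP, or a failure along the path.
        return "ping may be blocked or the path is broken: try traceroute"
    # Name works but the IP address tried does not: the name probably
    # resolves to a different address than the one that was pinged.
    return "check that the IP address matches what the name resolves to"


print(diagnose_ping(name_ok=False, ip_ok=True))
```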
25 What ping is not
- Standard ping should not be used to prove the following
- Routing problems
- Latency
- Packet loss
- There are eight reasons why you cannot trust ICMP ping
26 1. Ping is end to end
- Ping reveals nothing regarding the intermediate devices
- Cannot be used to diagnose routing
27 2. Platform differences
- PCs, Unix machines and routers all handle ICMP and ping packets differently
- This introduces false latency that does not occur with ordinary TCP or UDP data, which is treated identically by all of the above platforms
28 3. Ping does not identify the host causing the problem
- If you ping www.yahoo.com and you think you see a problem at Yahoo, you have no way to know what the cause is without running additional tests (traceroute, dig or nslookup)
- A device in the middle of the path between you and Yahoo might be failing or over-utilised, making it appear that Yahoo is dropping packets when it is not; the reverse could be true too
29 4. Queuing and QoS
- Routers can implement queuing strategies, forcing them to handle ICMP differently from TCP and UDP
- Devices providing QoS functions may also handle ICMP in a way that differs from the standards and specifications in order to optimise availability for TCP and UDP traffic
- A QoS device might be programmed to drop 80% of all ICMP regardless of how much TCP or UDP traffic there is currently
30 5. Rate Limits
- A host may have an artificial rate limit or access list imposed to reduce the effect of a possible future denial-of-service attack
- This will artificially drop only the ICMP packets and leave the TCP and UDP packets untouched, i.e. 100% of the TCP and UDP packets will get through even though there is loss seen with ICMP
31 6. Baseline Dependency
- Most network administrators fail to do a 24-hour baseline performance evaluation before they buy bandwidth
- When the Internet pipe hits maximum utilisation during peak hours, the administrators panic and start screaming at their service providers after running a ping or two to various remote sites and before checking their own networks
32 7. Local Network Issues
- Momentary glitches in performance are normal occurrences on every network; this is yet another reason for performing extensive baselining
- In networks running OSPF, the entire network experiences latency every time the update timer ticks down to zero and the network is flooded with OSPF updates; good baselining and network planning will help to avoid this
- Ping can do nothing to identify the OSPF latency problem above; it will give totally random, unpredictable and therefore useless results
33 8. Bad Network Design
- A bottleneck may be engineered into the network, making everything on the far side of that connection appear to be slow
- There is no physical failure and all equipment is functioning normally; there simply isn't enough capacity
- A bad ping result is useless in this case, as there is no faulty equipment
34 arping: uses ARP requests
- Limited to local network
- Can work with MAC or IP addresses
- use to probe for ARP entries in router (very useful!)
- packet filtering
- can block ICMP pings, but
- won't block ARP requests
35 Path Discovery: traceroute
- Sends UDP packets
- (Microsoft tracert sends ICMP packets)
- increments Time to Live (TTL) in IP packet header
- Sends three packets at each TTL
- records round trip time for each
- increases TTL until enough to reach destination
36 traceroute: How it Works
- As IP packets pass through each router, the TTL in the IP header is decremented
- The packet is discarded when the TTL decrements to 0
- The router sends an ICMP TIME_EXCEEDED message back to the traceroute host
- When the UDP packet reaches the destination, it gets ICMP PORT_UNREACHABLE, since it uses an unused high UDP port
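The TTL mechanism can be illustrated with a toy simulation. Nothing here touches the network: the list of routers is simply given, and the function names are invented for this sketch. The addresses are example values.

```python
def probe(routers, destination, ttl):
    """Follow one probe packet and return (responder, ICMP message)."""
    for router in routers:
        ttl -= 1                      # each router decrements the TTL
        if ttl == 0:
            # Packet is discarded; this router reports back to the sender.
            return router, "TIME_EXCEEDED"
    # TTL was large enough: the packet reached the destination, where the
    # unused high UDP port triggers a PORT_UNREACHABLE reply.
    return destination, "PORT_UNREACHABLE"


def simulate_traceroute(routers, destination, max_ttl=30):
    """Increase the TTL from 1 until the destination answers."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        responder, message = probe(routers, destination, ttl)
        hops.append((ttl, responder, message))
        if message == "PORT_UNREACHABLE":
            break
    return hops


hops = simulate_traceroute(["192.168.5.1", "140.112.4.126"], "140.112.30.28")
for hop in hops:
    print(hop)
```

A real traceroute sends three probes per TTL and records the round-trip time of each; the simulation sends one to keep the mechanism visible.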
37 traceroute: Limitations
- Each router has a number of IP addresses
- but traceroute only shows the one it used
- get different addresses when traceroute is run from the other end
- sometimes the route is asymmetric
- a router may be configured not to send ICMP TIME_EXCEEDED messages
- get stars instead of round-trip times in traceroute output
38 traceroute: Example
- Explain the functions that are performed by
packets 1 to 26.
39 traceroute: Example (2)
- Packet 1 sends a DNS query to the DNS server (140.112.254.4) for the IP address of www.csie.ntu.edu.tw
- Packet 2 returns the IP address of www.csie.ntu.edu.tw, which is 140.112.30.28
- Packets 3 and 4: the host sends a UDP packet to 140.112.30.28 with time-to-live (TTL) set to 1. The TTL is decremented by 1 when the packet passes a router. Here the TTL becomes 0, causing the first router (192.168.5.1) to send back an ICMP "Time-to-live exceeded" message.
- Packets 5 and 6 try to resolve the name of the first router, but are unsuccessful.
- Packets 7 to 10 repeat what packets 3 and 4 did two more times, so that different response times can be collected to calculate the average.
40 traceroute: Example (3)
- Packets 11 to 18 send UDP packets to 140.112.30.28 with TTL set to 2. This time the packet reaches the second router (140.112.4.126) before it expires, causing the second router to send back an ICMP "Time-to-live exceeded" message. The same UDP packet is sent two more times to calculate the average response time.
- Packets 19 to 26 send UDP packets to 140.112.30.28 with TTL set to 3. This time the packet reaches the third hop (140.112.30.28), and since this hop's IP address matches the destination address of the UDP packet, it returns a different ICMP message, "Destination unreachable", because the destination port is deliberately chosen to be one that is normally not used (> 3000). Name resolution is performed successfully by packets 21 and 22. The same packet is sent two more times to calculate the average response time.
41 Performance Measurements: delay
- Three sources of delay
- transmission delay: time to put the signal onto the cable or media
- depends on transmission rate and size of frame
- propagation delay: time for the signal to travel across the media
- determined by type of media and distance
- queuing delay: time spent waiting in a router's queue before being forwarded
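A short worked example of the first two delays. The signal speed of 2e8 m/s (about two-thirds of the speed of light, a common rule of thumb for copper and fibre) is an assumption, as are the function names and the sample frame size and distance.

```python
def transmission_delay_s(frame_bytes, rate_bps):
    """Time to put a frame onto the medium: frame size / transmission rate."""
    return frame_bytes * 8 / rate_bps


def propagation_delay_s(distance_m, signal_speed_mps=2e8):
    """Time for the signal to travel the distance (assumed speed: 2e8 m/s)."""
    return distance_m / signal_speed_mps


# A 1500-byte frame on a 100 Mbps link over a 100 m cable.
tx = transmission_delay_s(1500, 100e6)
prop = propagation_delay_s(100)
print(f"transmission: {tx * 1e6:.1f} us, propagation: {prop * 1e6:.2f} us")
```

On a short LAN segment the transmission delay dominates; over long distances the propagation delay does. Queuing delay depends on load and cannot be computed from link properties alone.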
42 Performance Measurements 2
- bandwidth: the transmission rate through the link
- relates to transmission time
- throughput: amount of data that can be sent over the link in a given time
- relates to all causes of delay
- is not the same as bandwidth
- Other measurements needed
- e.g., for quality of service for multimedia
43 Using netstat -tua to See Network Connections
- netstat -tua shows all network connections, including those listening
- netstat -tu shows only connections that are established
- netstat -i is like ifconfig: it shows info and stats about each interface
- netstat -nr shows the routing table, like route -n
- Windows provides netstat also.
44 Traffic Measurements: netstat -i
- The netstat program can show statistics about network interfaces
- Linux netstat shows lost packets in three categories
- errors,
- drops (queue full: shouldn't happen!)
- overruns (last data overwritten by new data before old data was read: shouldn't happen!)
- drops and overruns indicate faulty flow control: bad!
- These values are cumulative (since the interface was brought up)
- You could put a load on the interface to see the current condition, with ping -l, to send a large number of packets to the destination
- See the difference in values
45 Measuring Traffic: netstat -i
- Here we run netstat -i on ictlab
- netstat -i
- Kernel Interface table
Iface   MTU Met     RX-OK RX-ERR RX-DRP RX-OVR      TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0   1500   0 407027830      0      0      0 1603191764      0      0      3 BMRU
lo    16436   0   2858402      0      0      0    2858402      0      0      0 LRU
- Notice that of the 1.6 billion packets transmitted, there were 3 overruns.
- Next, blast the path you want to test with packets using ping -l or the spray program, and measure again.
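Since the counters are cumulative, the useful number is the difference between two runs. The following sketch parses output in the column layout shown on this slide and subtracts two snapshots; the function names are invented here, and other netstat versions may print different columns.

```python
SAMPLE = """\
Kernel Interface table
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0 1500 0 407027830 0 0 0 1603191764 0 0 3 BMRU
lo 16436 0 2858402 0 0 0 2858402 0 0 0 LRU
"""


def parse_netstat_i(text):
    """Parse `netstat -i` output into {interface: {column: value}}."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    # Skip any title line such as "Kernel Interface table".
    start = next(i for i, row in enumerate(rows) if row[0] == "Iface")
    header = rows[start]
    table = {}
    for row in rows[start + 1:]:
        record = dict(zip(header, row))
        name = record.pop("Iface")
        # Every column except the flags string is an integer counter.
        table[name] = {k: (v if k == "Flg" else int(v)) for k, v in record.items()}
    return table


def counter_delta(before, after, iface):
    """Difference of each numeric counter between two snapshots."""
    return {k: after[iface][k] - before[iface][k]
            for k in after[iface] if k != "Flg"}


table = parse_netstat_i(SAMPLE)
print(table["eth0"]["TX-OK"], table["eth0"]["TX-OVR"])
```

Calling `counter_delta(first, second, "eth0")` on snapshots taken before and after the load test shows only the errors, drops and overruns that occurred during the test.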
46 Issues with netstat -i
- (Collis + Ierrs + Oerrs) / (Ipkts + Opkts) > 2%: network hardware problem
- (Collis / Opkts) > 10%: interface overloaded. Redistribute traffic to other interfaces or servers
- (Ierrs / Ipkts) > 25%: host drops packets, network / servers overloaded
- If > 120 collisions/s: network is overloaded
- If the sum of input and output packets per second is > 600 for a 10 Mbps interface or 6000 for a 100 Mbps interface, the network segment is too busy
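The ratio rules can be applied mechanically. This sketch reads the thresholds as percentages (2%, 10%, 25%), which is an assumption about the original slide; the function name and warning texts are made up for illustration.

```python
def netstat_warnings(ipkts, ierrs, opkts, oerrs, collis):
    """Apply the three ratio rules of thumb to interface counters."""
    warnings = []
    total = ipkts + opkts
    # Overall error and collision ratio above 2%.
    if total and (collis + ierrs + oerrs) / total > 0.02:
        warnings.append("network hardware problem")
    # Collisions relative to output packets above 10%.
    if opkts and collis / opkts > 0.10:
        warnings.append("interface overloaded")
    # Input errors relative to input packets above 25%.
    if ipkts and ierrs / ipkts > 0.25:
        warnings.append("host dropping packets")
    return warnings


print(netstat_warnings(ipkts=1000, ierrs=0, opkts=1000, oerrs=0, collis=150))
```

The two per-second rules (collisions/s and packets/s) need two samples and the time between them, so they are left out of this sketch.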
47 What is Packet Capture?
- Real-time collection of data as it travels over networks
- Tools called
- packet sniffers
- packet analysers
- protocol analysers, and sometimes even
- traffic monitors
- Sniffer, tcpdump, Ethereal. See Ethereal Lab at
- http://ictlab.tyict.vtc.edu.hk/tsangkt/snm/Tutorials/ethereal/
48 When Packet Capture?
- Most powerful technique
- When you need to see what client and server are actually saying to each other
- When you need to analyse the type of traffic on the network
- Requires understanding of network protocols to use effectively
49 Example
- The following gives the contents of an Ethernet frame captured by a protocol analyzer
- Sequence  Captured bit stream
0000  23 87 45 9A 43 88 34 CD 7E FF 34 62 08 00 45 FF
0010  12 34 23 76 40 00 64 06 CD AB 85 23 43 59 85 23
0020  43 5A 23 87 52 63 25 41 40 43 00 00 00 00 FF 75
0030  20 00 35 75 00 00 82 04 05 91 70 90
- Given that the Ethernet II frame format is (field sizes in bytes)
Preamble (8) | Dest. Address (6) | Source Address (6) | Type (2) | Data (variable) | FCS (4)
- What are the source and destination MAC addresses?
- What is the type of the Ethernet data?
50 Example (contd)
- If the frame contains an IP datagram with the above format, determine
- The Ethernet type value and version no. of the IP protocol.
- The source and destination IP addresses in dot-decimal notation.
- What is the protocol type?
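One way to check answers to this exercise is to decode the captured bytes programmatically. The sketch assumes the capture starts at the destination address (the preamble is not included), which is consistent with the value 08 00 appearing at byte offset 12, and it uses the standard IPv4 header layout. The function name is invented here.

```python
HEX_DUMP = """
23 87 45 9A 43 88 34 CD 7E FF 34 62 08 00 45 FF
12 34 23 76 40 00 64 06 CD AB 85 23 43 59 85 23
43 5A 23 87 52 63 25 41 40 43 00 00 00 00 FF 75
20 00 35 75 00 00 82 04 05 91 70 90
"""


def parse_frame(frame):
    """Decode the Ethernet II header and the basic IPv4 header fields."""
    def mac(raw):
        return ":".join(f"{b:02X}" for b in raw)

    def dotted(raw):
        return ".".join(str(b) for b in raw)

    ip = frame[14:]                      # payload after the 14-byte header
    return {
        "dest_mac": mac(frame[0:6]),
        "src_mac": mac(frame[6:12]),
        "ethertype": int.from_bytes(frame[12:14], "big"),  # 0x0800 = IPv4
        "ip_version": ip[0] >> 4,        # high nibble of first IP byte
        "protocol": ip[9],               # 6 = TCP, 17 = UDP, 1 = ICMP
        "src_ip": dotted(ip[12:16]),
        "dst_ip": dotted(ip[16:20]),
    }


frame = bytes.fromhex("".join(HEX_DUMP.split()))
fields = parse_frame(frame)
for key, value in fields.items():
    print(key, hex(value) if key == "ethertype" else value)
```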
51 Troubleshooting Protocols
52 DNS troubleshooting
- Suspect DNS when you get long timeouts before seeing any response
- ping the name, then the IP address, and see if only the IP address works
- tools on Linux, Unix
- nslookup, dig, host
- tools on Windows
- nslookup
53 nslookup: an interactive program
Here a user asks nslookup to provide the address of sysadmin.no-ip.com. nslookup displays the name and address of the server used to resolve the query; it then displays the answer to the query.
- nslookup
- > sysadmin.no-ip.com
- Server: dns04.netvigator.com
- Address: 218.102.32.208#53
- Non-authoritative answer:
- Name: sysadmin.no-ip.com
- Address: 202.69.77.139
54 nslookup: reverse lookups
- Maps IP address to hostname (PTR)
- > 202.69.77.139
- Server: dns04.netvigator.com
- Address: 218.102.32.208#53
- Non-authoritative answer:
- 139.77.69.202.in-addr.arpa name = 077-139.onebb.com.
- Authoritative answers can be found from:
- 77.69.202.in-addr.arpa nameserver = ns1.onebb.com.
- 77.69.202.in-addr.arpa nameserver = ns2.onebb.com.
- ns1.onebb.com internet address = 202.180.160.1
- ns2.onebb.com internet address = 202.180.161.1
- >
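The name queried in a reverse lookup is the IP address with its octets reversed under in-addr.arpa. A minimal illustration using Python's standard ipaddress module (the helper name is invented here):

```python
import ipaddress


def ptr_name(ip):
    """Return the in-addr.arpa name that a PTR lookup for this address queries."""
    return ipaddress.ip_address(ip).reverse_pointer


print(ptr_name("202.69.77.139"))
```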
55 DNS Record Types
Type   Name                Function
Zone Records
SOA    Start of Authority  This name server is authoritative for this domain
NS     Name Server         Identifies the name server for this domain
Basic Records
A      Address             Name-to-address mappings
PTR    Pointer             Address-to-name mappings
MX     Mail Exchanger      Makes mail routing decisions
Optional Records
CNAME  Canonical Name      Nicknames for a host
HINFO  Host Info           Identifies hardware and OS
56 The SOA Record
- Indicates that this name server is the best source of information for the data within this domain.
- There is only one SOA record for each zone; the zone continues until another SOA record is encountered.
- Other, secondary name servers within a domain are non-authoritative
- A non-authoritative name server periodically (every refresh time) checks with the primary authoritative name server in the domain, and requests a zone transfer whenever the serial number has been incremented.
57 SOA Record Example
@ IN SOA rusty.austin.edu admin.austin.edu (
        1       ; Serial
        10800   ; Refresh after 3 hours
        3600    ; Retry after 1 hour
        604800  ; Expire after 1 week
        86400 ) ; Minimum TTL of 1 day
- The symbol @ in the name field is shorthand for the name of the current zone. In this example, it is the same as austin.edu
- rusty.austin.edu is the zone's primary name server
- admin.austin.edu is the email address (replace the first dot with @) of the technical contact in charge of the data.
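The "replace the first dot" rule and the timer values can be checked with a few lines. This is a simplified sketch: the function name is invented, and it ignores the case where the mailbox name itself contains an escaped dot.

```python
def soa_contact_to_email(rname):
    """Turn an SOA contact field such as admin.austin.edu into an email address."""
    local, _, domain = rname.rstrip(".").partition(".")
    return f"{local}@{domain}"


# The timers from the example record, written out from their meaning.
REFRESH = 3 * 3600        # 3 hours
RETRY = 1 * 3600          # 1 hour
EXPIRE = 7 * 24 * 3600    # 1 week
MINIMUM_TTL = 24 * 3600   # 1 day

print(soa_contact_to_email("admin.austin.edu"), REFRESH, RETRY, EXPIRE, MINIMUM_TTL)
```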
58 Email: testing with telnet
- Email protocols SMTP and POP3 are text-based
- telnet is a good tool to test them
- syntax
- telnet server portnumber
- SMTP: port 25
- POP3: port 110
59 Test the VTC mail server
- telnet smtp.vtc.edu.hk 25
- Trying 192.168.79.191...
- Connected to smtp.vtc.edu.hk (192.168.79.191).
- Escape character is '^]'.
- 220 pandora.vtc.edu.hk ESMTP Mirapoint 3.2.2-GA Tue, 25 Feb 2003 11:15:30 +0800 (HKT)
- helo nickpc.tyict.vtc.edu.hk
- 250 pandora.vtc.edu.hk Hello 172.19.32.30, pleased to meet you
- mail from:<nicku@vtc.edu.hk>
- 250 <nicku@vtc.edu.hk>... Sender ok
- rcpt to:<nicku@vtc.edu.hk>
- 250 <nicku@vtc.edu.hk>... Recipient ok
- data
- 354 Enter mail, end with "." on a line by itself
- My message body.
- .
- 250 AFF21826 Message accepted for delivery
- quit
- 221 pandora.vtc.edu.hk closing connection
- Connection closed by foreign host.
60 SMTP commands for sending mail
- helo: identify your computer
- mail from: specify sender
- rcpt to: specify receiver
- data: indicates start of message body
- quit: terminate session
- Use names, not IP addresses, to specify destination
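As a memory aid, the command sequence for a manual test session can be generated from the addresses involved. This sketch only builds the lines to type into telnet; it does not connect to any server. The function name is invented, and the addresses below are placeholders.

```python
def smtp_test_commands(client_host, sender, recipient, body_lines):
    """Return the lines to type, in order, for a manual SMTP test."""
    commands = [
        f"helo {client_host}",
        f"mail from:<{sender}>",
        f"rcpt to:<{recipient}>",
        "data",
    ]
    for line in body_lines:
        # A body line starting with "." must be sent with an extra leading
        # "." so the server does not mistake it for the end of the message.
        commands.append("." + line if line.startswith(".") else line)
    commands.append(".")      # a lone "." ends the message body
    commands.append("quit")
    return commands


session = smtp_test_commands(
    "client.example.org", "alice@example.org", "bob@example.org",
    ["My message body."],
)
print("\n".join(session))
```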
61 Testing the VTC pop3 server 1
- telnet pop.vtc.edu.hk 110
- Trying 192.168.79.12...
- Connected to pop.vtc.edu.hk (192.168.79.12).
- Escape character is '^]'.
- +OK carme.vtc.edu.hk POP3 service (iPlanet Messaging Server 5.2 Patch 1 (built Aug 19 2002))
- user nicku
- +OK Name is a valid mailbox
- pass password
- +OK Maildrop ready
- stat
- +OK 1 673
62 Testing the pop3 server 2
- retr 1
- +OK 673 octets
- Return-path: <nicku@vtc.edu.hk>
- Received: from pandora.vtc.edu.hk (pandora.vtc.edu.hk 192.168.79.191)
- by carme.vtc.edu.hk (iPlanet Messaging Server 5.2 Patch 1 (built Aug 19 2002))
- with ESMTP id <0HAU00I35H3HGL@carme.vtc.edu.hk> for nicku@ims-ms-daemon
- (ORCPT nicku@vtc.edu.hk) Tue, 25 Feb 2003 11:16:29 +0800 (CST)
- Received: from nickpc.tyict.vtc.edu.hk (172.19.32.30)
- by pandora.vtc.edu.hk (Mirapoint Messaging Server MOS 3.2.2-GA)
- with SMTP id AFF21826 Tue, 25 Feb 2003 11:16:01 +0800 (HKT)
- Date: Tue, 25 Feb 2003 11:15:30 +0800 (HKT)
- From: Nick Urbanik <nicku@vtc.edu.hk>
- Message-id: <200302250316.AFF21826@pandora.vtc.edu.hk>
- My message body.
- .
- dele 1
- +OK message deleted
- quit
63 pop3 commands: retrieving mail
- See RFC 1939 for easy-to-read details
- First, must authenticate
- user username
- pass password
- stat: shows number of messages and total size in bytes
- list: lists all the message numbers and the size in bytes of each message
- retr messagenum: retrieve the message with number messagenum
- dele messagenum: delete the message with number messagenum
- quit
64 telnet: Testing Other Applications
- Many network protocols are text. telnet can be helpful in checking
- IMAP servers
- telnet hostname 143
- Web servers
- telnet hostname 80
- FTP servers
- telnet hostname 21
- Even ssh (can check version, if responding)
- telnet hostname 22
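The same check can be scripted: connect to the port and read whatever greeting the server sends. This is a rough sketch using only the standard socket module; the function name is invented, and it suits protocols where the server speaks first (SMTP, POP3, IMAP, FTP, ssh). A web server on port 80 waits for a request, so it would time out here.

```python
import socket


def read_banner(host, port, timeout=5.0):
    """Connect like `telnet host port` and return the first data the server sends."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        data = conn.recv(1024)   # first chunk only, up to 1024 bytes
    return data.decode("ascii", errors="replace").strip()


if __name__ == "__main__":
    # Hypothetical host name; replace with a server you are allowed to test.
    try:
        print(read_banner("mail.example.org", 25))
    except OSError as exc:
        print("no banner:", exc)
```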
65 Conclusion
- Check the simple things first
- Document what you do
- Become familiar with common tools
- Use the tools to become familiar with your network before troubles strike
- Know what is normal