Title: Structured approach to troubleshooting
1Structured approach to troubleshooting
- Collect and analyze symptoms
- Localize the problem
- Isolate the trouble
- Locate and correct the problem
- Verify the fix
2Network Baseline
- network monitoring
- monitor the day-to-day operation of the network to establish what is normal for it
- learn the average traffic on the network
- learn the peak traffic times over the day, the week, and the month
- learn which applications are used most and least
- identify the network users who are most prone to difficulties
- keep logs, so that the administrator can compare the problems encountered with the baseline information showing what normal network operation should be
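As a rough illustration of turning monitoring logs into a baseline, the sketch below summarizes traffic samples into an average and a peak hour. The sample format (hour of day, Mbps), the function names, and the "twice the peak-hour average" threshold are all assumptions made for this example, not part of any particular monitoring tool.

```python
from statistics import mean

def summarize_baseline(samples):
    """Summarize (hour_of_day, mbps) samples into a simple baseline.

    The sample format is hypothetical; real monitoring tools export
    richer data.
    """
    if not samples:
        raise ValueError("need at least one sample")
    by_hour = {}
    for hour, mbps in samples:
        by_hour.setdefault(hour, []).append(mbps)
    # Average traffic per hour of day, used to find the peak traffic time.
    hourly_avg = {hour: mean(values) for hour, values in by_hour.items()}
    peak_hour = max(hourly_avg, key=hourly_avg.get)
    return {
        "average_mbps": mean(mbps for _, mbps in samples),
        "peak_hour": peak_hour,
        "peak_hour_avg_mbps": hourly_avg[peak_hour],
    }

def is_abnormal(current_mbps, baseline, factor=2.0):
    """Flag traffic well above the busiest normal hour (assumed threshold)."""
    return current_mbps > factor * baseline["peak_hour_avg_mbps"]

samples = [(9, 10.0), (9, 20.0), (14, 40.0), (14, 60.0), (2, 1.0)]
baseline = summarize_baseline(samples)
print(baseline["peak_hour"], baseline["peak_hour_avg_mbps"])
```

A current reading can then be compared against the stored baseline with is_abnormal, which is the comparison the logs are kept for.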
3Document network problems
- The more information an administrator has, the easier it should be to solve the problem
- information can be collected by
- Network management tools
- network analyzers
- problems collected from clients
- identification
- preliminary information (who reported it, time, relation to previous problems, location, etc.)
- network information collected by the network technician
- list of actions taken
- summary (hardware, software, configuration, user problems)
4Analyze, locate and fix
- localize and locate problems
- list all possible causes
- generate a problem scenario based on knowledge and previous information
- determine the most likely cause, by isolation and elimination
- use diagnostic utilities built into the devices (e.g. NIC, routers, PC) to help locate the problem
- check physical and logical indicators (e.g. LEDs) for the status of devices
- correct the problems
- use the replacement method to eliminate possible causes
- start from the basics: user, cables, patch cords, malfunctioning machines
- verify that the problem has disappeared
5Focus: Basics and Standard Tools
- Solving network problems depends a lot on your understanding
- Simple tools can tell you what you need to know
- Example: ping is incredibly useful!
6Troubleshooting
- Avoid it by
- redundancy
- documentation
- training
- Try quick fixes first
- simple problems often have big effects
- is the power on?
- is the network cable plugged into the right socket? Is the LED flashing?
- has anything changed recently?
- Change only one thing at a time
- test thoroughly after the change
- Be familiar with the system
- maintain documentation
- Be familiar with your tools
- before trouble strikes
7Troubleshooting: Learn as you go
- Study and be familiar with the normal behaviour of your network
- Monitoring tools can tell you when things are wrong
- if you know what things look like when they are right
- Using tools such as Ethereal can help you understand
- your network, and
- TCP/IP better
8Documentation
- Maintain an inventory of equipment and software
- a list mapping MAC addresses to machines can be very helpful
- Maintain a change log for each major system, recording
- each significant change
- each problem with the system
- each entry dated, with the name of the person who made the entry
- Two categories of documentation
- Configuration information
- describes the system
- use system tools to obtain a snapshot, e.g., sysreport in Red Hat Linux
- Procedural information
- How to do things
- use tools that automatically document what you are doing, e.g., script
9Connectivity Testing Cabling
- Label cables clearly at each end
- Cable testers
- ensure cables are wired correctly, and check
- attenuation
- length: is it too long?
- 100BaseT: less than 100 m
- Is the activity light on the interface blinking?
10Software tools: ping
- Most useful check of connectivity
- Universal
- If you ping a hostname, this includes a rough check of DNS
- Sends an ICMP (Internet Control Message Protocol) ECHO_REQUEST
- Waits for an ICMP ECHO_REPLY
- Most pings can display round trip time
- Most pings allow setting the size of the packet
- Can be used to make a crude measurement of throughput
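To make the ECHO_REQUEST concrete, here is a minimal sketch that builds the bytes of an ICMP echo request (type 8, code 0) with the standard Internet checksum. It only constructs the packet; actually sending it needs a raw socket and usually administrator rights, which is left out. The function names and the 56-byte payload are choices made for this illustration.

```python
import struct

ICMP_ECHO_REQUEST = 8  # ICMP type for an echo request (echo reply is type 0)

def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by ICMP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_echo_request(identifier: int, sequence: int, payload: bytes) -> bytes:
    """Return the ICMP message: type, code, checksum, id, sequence, payload."""
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, checksum, identifier, sequence)
    return header + payload

packet = build_echo_request(identifier=0x1234, sequence=1, payload=b"x" * 56)
print(len(packet), "bytes, checksum verifies:", internet_checksum(packet) == 0)
```

Changing the payload length is what "setting the size of the packet" amounts to at this level.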
11ping: Roughly Estimating Throughput
- Example
- ping with packet size 100 bytes: round-trip time 30 ms
- ping with packet size 1100 bytes: round-trip time 60 ms
- So it takes 30 ms extra (15 ms one way) to send an additional 1000 bytes, or 8000 bits
- Throughput is roughly 8000 bits per 15 ms, or about 533,000 bits per second
- A very crude measurement: it takes no account of other traffic, and treats all links on the path, there and back, as one.
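12ping: Roughly Estimating Throughput
- This can be expressed as a simple formula: throughput (bits per second) = 2 x 8 x (P2 - P1) / (T2 - T1), where P1 and P2 are the two packet sizes in bytes and T1 and T2 are the corresponding round-trip times in seconds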
13Throughput: Measuring with ping 1
- To measure throughput between two remote hosts, you can use tools like ping
- ping two locations with two packet sizes (4 pings altogether, minimum)
- Example
14Throughput: Measuring with ping 2
- Time difference / 2 (round trip time (RTT) -> one way)
- Divide the size difference in bits (8000 in the earlier example) by this one-way time
- Multiply by 1000 (ms -> seconds)
- Convert bps to Mbps
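The arithmetic in these steps can be sketched as two small functions. The first handles a single path with two packet sizes. The second is one plausible reading of the "4 pings" approach, where the time differences measured to a nearer and a farther host are subtracted to isolate the segment between them; that interpretation, and all the names below, are assumptions for illustration.

```python
def throughput_bps(size_small, rtt_small_ms, size_large, rtt_large_ms):
    """Crude throughput estimate from two pings of different sizes.

    Sizes are in bytes, round-trip times in milliseconds.
    """
    extra_bits = (size_large - size_small) * 8
    one_way_ms = (rtt_large_ms - rtt_small_ms) / 2  # RTT -> one way
    if one_way_ms <= 0:
        raise ValueError("larger packet must take measurably longer")
    return extra_bits / one_way_ms * 1000  # bits per ms -> bits per second

def segment_throughput_bps(size_small, size_large,
                           near_small_ms, near_large_ms,
                           far_small_ms, far_large_ms):
    """Estimate throughput of the segment between a near and a far host.

    Assumes the far host is reached through the near one, so the near
    host's extra time can be subtracted from the far host's extra time.
    """
    extra_bits = (size_large - size_small) * 8
    extra_rtt_ms = (far_large_ms - far_small_ms) - (near_large_ms - near_small_ms)
    one_way_ms = extra_rtt_ms / 2
    if one_way_ms <= 0:
        raise ValueError("no measurable extra delay on the segment")
    return extra_bits / one_way_ms * 1000

bps = throughput_bps(100, 30, 1100, 60)
print(f"{bps:.0f} bps = {bps / 1e6:.3f} Mbps")
```

With the figures from the earlier example (100 bytes in 30 ms, 1100 bytes in 60 ms) this gives roughly 533,000 bps, or about 0.53 Mbps.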
15How to Use ping?
- Ensure local host networking is enabled first: ping localhost, then the local IP address
- ping a known host on the local network
- ping the local and remote interfaces on the router
- ping by IP as well as by hostname if the hostname ping fails
- confirm DNS with nslookup (or dig), see later
- Ping from more than one host
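The sequence above lends itself to a small script that works outward from the local host and stops at the first target that does not answer. This is a sketch only: it shells out to the system ping, relying on the count option being -c on Linux/Unix and -n on Windows, and the step labels and addresses in the usage example are placeholders.

```python
import platform
import subprocess

def ping_command(host, count=3, system=None):
    """Build the ping command line; the count flag differs on Windows."""
    system = system or platform.system()
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, str(count), host]

def ping_ok(host, count=3):
    """Return True if the system ping exits successfully for this host."""
    result = subprocess.run(
        ping_command(host, count),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

def first_failing_step(steps, check=ping_ok):
    """Walk (label, host) pairs in order; return the first label that fails."""
    for label, host in steps:
        if not check(host):
            return label
    return None
```

A call such as first_failing_step([("loopback", "127.0.0.1"), ("local address", "192.0.2.10"), ("router", "192.0.2.1")]) would then report where connectivity first breaks; the 192.0.2.x addresses are documentation placeholders, not real hosts.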
16What ping Result is Good, Bad?
- A steady stream of consistent replies indicates the path is probably okay
- Usually the first reply takes longer due to ARP lookups at each router
- After that, ARP results are cached
- ICMP error messages can help you understand the results
- Destination Network Unreachable indicates the host doing the ping cannot reach the network
- Destination Host Unreachable may come from routers further away
17Ping Responses
- On a Cisco router you will get the responses as to the right
- Actual response is
- router>ping www.yahoo.com
- Translating "www.yahoo.com"...domain server (209.1.221.10) OK
- Type escape sequence to abort.
- Sending 5, 100-byte ICMP Echos to 216.115.102.81, timeout is 2 seconds:
- !!!!!
- Success rate is 100 percent (5/5), round-trip min/avg/max = 4/16/24 ms
18Troubleshooting with ping (1)
- Standard ping: used to check the availability of a host
- ping <ip address>
- Extended ping: used to track packet loss or latency (sends out 1 ping per second until the process is halted by CTRL-C)
- Unix / Linux
- ping -s <ip address> (on Linux, plain ping already repeats until CTRL-C)
- Windows
- ping -t <ip address>
19Troubleshooting with ping 2
- A Cisco router sends a fixed number of packets as fast as it can and waits for the responses
- router>ping
- Protocol [ip]:
- Target IP address: www.inetdaemon.com
- Translating 209.1.221.10
- Repeat count [5]: 100
- Datagram size [100]:
- Timeout in seconds [2]:
- Extended commands [n]:
- Sweep range of sizes [n]:
- Type escape sequence to abort.
- Sending 100, 100-byte ICMP Echos to 207.150.192.12, timeout is 2 seconds:
- !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
- Success rate is 100 percent (100/100), round-trip min/avg/max = 12/19/280 ms
20Possible Causes of Being Unable to ping
- If you are trying to ping a name, try pinging the IP address of the destination machine. If ping fails when you try the name of the site, but works when you try the IP address, it's NOT a network problem, it's DNS.
- If you are trying to ping a site and both the name and the IP address fail, they may be blocking ping with an access list. Try a traceroute instead.
- If there are multiple hops between you and the destination, then try pinging each host in the path to the destination until you find the host that fails to respond to ping. Use a traceroute (successfully) to get a list of the hosts between you and the destination for this purpose.
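The decision logic in these three points can be written down as a small sketch. The result strings and function names are invented for this illustration, and the "responds" callback stands in for whatever ping check is actually used.

```python
def diagnose(name_ping_ok: bool, ip_ping_ok: bool) -> str:
    """Classify a ping failure following the rules of thumb above."""
    if ip_ping_ok and not name_ping_ok:
        return "dns"  # the network works; name resolution does not
    if not ip_ping_ok and not name_ping_ok:
        # Either ping is filtered (access list) or the path is broken.
        return "blocked-or-unreachable: try traceroute"
    return "ok"

def first_silent_hop(hops, responds):
    """Ping each hop in path order; return the first that does not answer.

    hops is the list of addresses taken from a successful traceroute;
    responds is a callable returning True when a hop answers ping.
    """
    for hop in hops:
        if not responds(hop):
            return hop
    return None

print(diagnose(name_ping_ok=False, ip_ping_ok=True))
```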
21Path Discovery traceroute
- Sends UDP packets
- (Microsoft tracert sends ICMP packets)
- increments Time to Live (TTL) in IP packet header
- Sends three packets at each TTL
- records round trip time for each
- increases TTL until enough to reach destination
22traceroute: How it Works
- As IP packets pass through each router, the TTL in the IP header is decremented
- The packet is discarded when the TTL decrements to 0
- The router sends an ICMP TIME_EXCEEDED message back to the traceroute host
- When the UDP packet reaches the destination, it gets ICMP PORT_UNREACHABLE, since it uses an unused high UDP port
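The TTL mechanism can be illustrated without touching the network. The sketch below simulates a path of routers ending at the destination host and reports which ICMP message each probe would trigger. It is a model of the behaviour described above, not a working traceroute, and the function names are made up for this example.

```python
def simulate_probe(path, ttl):
    """Return (icmp_message, responder) for one probe sent with this TTL.

    path lists the router addresses in order, ending with the destination.
    """
    for router in path[:-1]:
        ttl -= 1  # each router decrements the TTL
        if ttl == 0:
            return ("TIME_EXCEEDED", router)  # packet discarded here
    # The probe reached the destination, where the UDP port is unused.
    return ("PORT_UNREACHABLE", path[-1])

def simulate_traceroute(path, max_ttl=30, probes_per_ttl=3):
    """Increase the TTL from 1 until the destination answers."""
    results = []
    for ttl in range(1, max_ttl + 1):
        message, responder = simulate_probe(path, ttl)
        results.append((ttl, responder, [message] * probes_per_ttl))
        if message == "PORT_UNREACHABLE":
            break
    return results

for ttl, responder, replies in simulate_traceroute(
        ["192.168.5.1", "140.112.4.126", "140.112.30.28"]):
    print(ttl, responder, replies[0])
```

Each TTL value is probed three times, matching the three packets per TTL that traceroute sends.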
23traceroute Limitations
- Each router has a number of IP addresses
- but traceroute only shows the one it used
- you get different addresses when you run traceroute from the other end
- sometimes the route is asymmetric
- a router may be configured not to send ICMP TIME_EXCEEDED messages
- you get stars instead of round-trip times in the traceroute output
24traceroute Example
- Explain the functions that are performed by
packets 1 to 26.
25traceroute Example (2)
- Packet 1 sends a DNS query to the DNS server (140.112.254.4) for the IP address of www.csie.ntu.edu.tw
- Packet 2 sends back the IP address of www.csie.ntu.edu.tw, which is 140.112.30.28
- The host sends a UDP packet to 140.112.30.28 with time-to-live (TTL) set to 1 (packets 3 and 4). The TTL decrements by 1 when the packet passes a router. Here the TTL becomes 0, causing the first router (192.168.5.1) to send back an ICMP Time-to-live exceeded message.
- Packets 5 and 6 try to resolve the name of the first router, but are unsuccessful.
- Packets 7 to 10 repeat what packets 3 and 4 did two more times, so that different response times can be collected to calculate the average.
26traceroute Example (3)
- Packets 11 to 18 send UDP packets to 140.112.30.28 with TTL set to 2. This time the packet manages to reach the second router (140.112.4.126) before it expires, causing the second router to send back an ICMP Time-to-live exceeded message. The same UDP packet is sent two more times to calculate the average response time.
- Packets 19 to 26 send UDP packets to 140.112.30.28 with TTL set to 3. This time the packet reaches the third hop (140.112.30.28) before it expires. Since this hop's IP address matches the destination address of the UDP packet, it returns a different ICMP message, Destination unreachable, because the destination port is deliberately chosen to be one that is normally not used (> 3000). Name resolution is performed successfully by packets 21 and 22. The same packet is sent two more times to calculate the average response time.
27Performance Measurements: delay
- Three sources of delay
- transmission delay: time to put the signal onto the cable or media
- depends on transmission rate and size of frame
- propagation delay: time for the signal to travel across the media
- determined by type of media and distance
- queuing delay: time spent waiting in a router's queue to be forwarded
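The first two delays follow directly from simple ratios, as this sketch shows. The signal speed of 2 x 10^8 m/s (about two thirds of the speed of light, a common approximation for copper cable) is an assumption, as are the function names. Queuing delay depends on load and cannot be computed this way.

```python
def transmission_delay_s(frame_bytes, rate_bps):
    """Time to put the whole frame onto the link: bits / transmission rate."""
    return frame_bytes * 8 / rate_bps

def propagation_delay_s(distance_m, speed_mps=2.0e8):
    """Time for the signal to cross the media: distance / signal speed.

    The default speed is an assumed typical value for copper cable.
    """
    return distance_m / speed_mps

# A 1500-byte frame on a 100 Mbps link over 100 m of cable.
tx = transmission_delay_s(1500, 100e6)
prop = propagation_delay_s(100)
print(f"transmission {tx * 1e6:.1f} us, propagation {prop * 1e6:.2f} us")
```

On a short LAN segment the transmission delay dominates; over long distances the propagation delay does.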
28Performance Measurements 2
- bandwidth: the transmission rate through the link
- relates to transmission time
- throughput: amount of data that can be sent over the link in a given time
- relates to all causes of delay
- is not the same as bandwidth
- Other measurements needed
- e.g., for quality of service for multimedia
29What is Packet Capture?
- Real time collection of data as it travels over networks
- Tools called
- packet sniffers
- packet analysers
- protocol analysers, and sometimes even
- traffic monitors
- Sniffer, tcpdump, Ethereal. See the Ethereal Lab at
- http://ictlab.tyict.vtc.edu.hk/tsangkt/en/Tutorials/ethereal/
30When Packet Capture?
- Most powerful technique
- When you need to see what client and server are actually saying to each other
- When you need to analyse the type of traffic on the network
- Requires understanding of network protocols to use effectively
31Example
- The following gives the contents of an Ethernet frame captured by a protocol analyzer
- Sequence  Captured bit stream
- 0000  23 87 45 9A 43 88 34 CD 7E FF 34 62 08 00 45 FF
- 0010  12 34 23 76 40 00 64 06 CD AB 85 23 43 59 85 23
- 0020  43 5A 23 87 52 63 25 41 40 43 00 00 00 00 FF 75
- 0030  20 00 35 75 00 00 82 04 05 91 70 90
- Given that the Ethernet II frame format is
-   8         6        6        2     variable  4    (bytes)
-   Preamble  Dest.    Source   Type  Data      FCS
-             Address  Address
- What are the source and destination MAC addresses?
- What is the type of Ethernet data?
32Example (contd)
- If the frame contains an IP datagram with the above format, determine
- The Ethernet type value and version no. of the IP protocol.
- The source and destination IP addresses in dot-decimal notation.
- What is the protocol type?
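One way to check answers to this exercise is to decode the bytes programmatically. The sketch below assumes the captured bytes start at the destination address (analyzers normally do not show the preamble) and that the payload uses the standard IPv4 header layout. The variable and function names are choices made for this example.

```python
import struct

FRAME_HEX = """
23 87 45 9A 43 88 34 CD 7E FF 34 62 08 00 45 FF
12 34 23 76 40 00 64 06 CD AB 85 23 43 59 85 23
43 5A 23 87 52 63 25 41 40 43 00 00 00 00 FF 75
20 00 35 75 00 00 82 04 05 91 70 90
"""
frame = bytes.fromhex("".join(FRAME_HEX.split()))

def mac(raw: bytes) -> str:
    return ":".join(f"{b:02x}" for b in raw)

def dotted(raw: bytes) -> str:
    return ".".join(str(b) for b in raw)

# Ethernet II header: 6-byte destination, 6-byte source, 2-byte type.
dst_mac = mac(frame[0:6])
src_mac = mac(frame[6:12])
(ether_type,) = struct.unpack("!H", frame[12:14])

# IPv4 header starts right after the 14-byte Ethernet header.
ip = frame[14:]
ip_version = ip[0] >> 4        # high nibble of the first byte
ip_protocol = ip[9]            # 6 means TCP
src_ip = dotted(ip[12:16])
dst_ip = dotted(ip[16:20])

print(dst_mac, src_mac, hex(ether_type))
print(ip_version, ip_protocol, src_ip, dst_ip)
```

Under these assumptions the type field 0x0800 identifies an IP datagram, and protocol number 6 identifies TCP.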
33Troubleshooting Protocols
34DNS troubleshooting
- Suspect DNS when you get long timeouts before you see any response
- ping the name and the IP address; see if only the IP address works
- tools on Linux, Unix
- nslookup, dig, host
- tools on Windows
- nslookup
35nslookup: an interactive program
Here a user asks nslookup to provide the address of sysadmin.no-ip.com. nslookup displays the name and address of the server used to resolve the query, then displays the answer to the query.
- nslookup
- > sysadmin.no-ip.com
- Server: dns04.netvigator.com
- Address: 218.102.32.208#53
- Non-authoritative answer:
- Name: sysadmin.no-ip.com
- Address: 202.69.77.139
36nslookup: reverse lookups
- Maps IP address to hostname (PTR)
- > 202.69.77.139
- Server: dns04.netvigator.com
- Address: 218.102.32.208#53
- Non-authoritative answer:
- 139.77.69.202.in-addr.arpa name = 077-139.onebb.com.
- Authoritative answers can be found from:
- 77.69.202.in-addr.arpa nameserver = ns1.onebb.com.
- 77.69.202.in-addr.arpa nameserver = ns2.onebb.com.
- ns1.onebb.com internet address = 202.180.160.1
- ns2.onebb.com internet address = 202.180.161.1
- >
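The in-addr.arpa name queried in a reverse lookup is just the address with its octets reversed. Python's standard ipaddress module can produce it, as this small sketch shows; the wrapper function name is an invention for the example.

```python
import ipaddress

def ptr_name(ip: str) -> str:
    """Return the DNS name that holds the PTR record for this address."""
    return ipaddress.ip_address(ip).reverse_pointer

print(ptr_name("202.69.77.139"))
```

For the address used in the session above, this yields 139.77.69.202.in-addr.arpa, the name shown in the answer.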
37DNS Record Types
- Type    Name                Function
- Zone Records
- SOA     Start of Authority  This name server is authoritative for this domain
- NS      Name Server         Identifies the name server for this domain
- Basic Records
- A       Address             Name-to-address mappings
- PTR     Pointer             Address-to-name mappings
- MX      Mail Exchanger      Makes mail routing decision
- Optional Records
- CNAME   Canonical Name      Nicknames for a host
- HINFO   Host Info           Identifies hardware and OS
38The SOA Record
- Indicates that this name server is the best source of information for the data within this domain.
- There is only one SOA record for each zone; the zone continues until another SOA record is encountered.
- Secondary name servers within a domain hold a copy of the zone data obtained from the primary name server
- A secondary name server checks the primary name server periodically (refresh time) and requests a zone transfer whenever the serial number has been incremented.
39SOA Record Example
- @ IN SOA rusty.austin.edu admin.austin.edu (
-     1        ; Serial
-     10800    ; Refresh after 3 hours
-     3600     ; Retry after 1 hour
-     604800   ; Expire after 1 week
-     86400 )  ; Minimum TTL of 1 day
- The symbol @ in the name field is shorthand for the name of the current zone. In this example, it is the same as austin.edu
- rusty.austin.edu is the zone's primary name server
- admin.austin.edu is the email address (replace the first dot with @) of the technical contact in charge of the data.
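Two details of the SOA record are easy to misread: the contact field is a mailbox with its @ written as a dot, and the timers are plain seconds. The sketch below converts both into a more readable form; the function names are hypothetical and the unit handling is deliberately simple.

```python
def rname_to_email(rname: str) -> str:
    """Turn the SOA contact field into an email address.

    The first dot separates the mailbox name from the domain.
    """
    local, _, domain = rname.rstrip(".").partition(".")
    return f"{local}@{domain}"

def describe_seconds(seconds: int) -> str:
    """Express a timer as whole weeks, days, hours or minutes when exact."""
    units = [("week", 604800), ("day", 86400), ("hour", 3600), ("minute", 60)]
    for name, size in units:
        if seconds >= size and seconds % size == 0:
            count = seconds // size
            return f"{count} {name}{'s' if count != 1 else ''}"
    return f"{seconds} seconds"

print(rname_to_email("admin.austin.edu"))
for timer in (10800, 3600, 604800, 86400):
    print(timer, "=", describe_seconds(timer))
```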
40Email: testing with telnet
- Email protocols SMTP, POP3 are text
- telnet is a good tool to test them
- syntax
- telnet server portnumber
- SMTP: port 25
- POP3: port 110
41Test the VTC mail server
- telnet smtp.vtc.edu.hk 25
- Trying 192.168.79.191...
- Connected to smtp.vtc.edu.hk (192.168.79.191).
- Escape character is '^]'.
- 220 pandora.vtc.edu.hk ESMTP Mirapoint 3.2.2-GA Tue, 25 Feb 2003 11:15:30 +0800 (HKT)
- helo nickpc.tyict.vtc.edu.hk
- 250 pandora.vtc.edu.hk Hello 172.19.32.30, pleased to meet you
- mail from:<nicku@vtc.edu.hk>
- 250 <nicku@vtc.edu.hk>... Sender ok
- rcpt to:<nicku@vtc.edu.hk>
- 250 <nicku@vtc.edu.hk>... Recipient ok
- data
- 354 Enter mail, end with "." on a line by itself
- My message body.
- .
- 250 AFF21826 Message accepted for delivery
- quit
- 221 pandora.vtc.edu.hk closing connection
- Connection closed by foreign host.
42SMTP: commands for sending mail
- helo: identify your computer
- mail from: specify sender
- rcpt to: specify receiver
- data: indicates start of message body
- quit: terminate session
- Use names, not IP addresses, to specify the destination
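As a small aid when testing by hand, the command sequence above can be generated and the server's replies split into code and text. This sketch only handles the simple single-line replies seen in the session shown earlier. The function names are made up, and nothing here opens a connection.

```python
def smtp_commands(helo_name, sender, recipient, body_lines):
    """Return the lines to type, in order, for a minimal SMTP test."""
    return [
        f"helo {helo_name}",
        f"mail from:<{sender}>",
        f"rcpt to:<{recipient}>",
        "data",
        *body_lines,
        ".",  # a dot on a line by itself ends the message body
        "quit",
    ]

def parse_reply(line):
    """Split a single-line SMTP reply into (numeric code, text)."""
    code, _, text = line.strip().partition(" ")
    return int(code), text

for command in smtp_commands("nickpc.tyict.vtc.edu.hk", "nicku@vtc.edu.hk",
                             "nicku@vtc.edu.hk", ["My message body."]):
    print(command)
```

Reply codes starting with 2 or 3 (such as 220, 250 and 354) mean the server accepted the step; codes starting with 4 or 5 indicate a failure.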
43Testing the VTC pop3 server 1
- telnet pop.vtc.edu.hk 110
- Trying 192.168.79.12...
- Connected to pop.vtc.edu.hk (192.168.79.12).
- Escape character is '^]'.
- +OK carme.vtc.edu.hk POP3 service (iPlanet Messaging Server 5.2 Patch 1 (built Aug 19 2002))
- user nicku
- +OK Name is a valid mailbox
- pass password
- +OK Maildrop ready
- stat
- +OK 1 673
44Testing the pop3 server 2
- retr 1
- +OK 673 octets
- Return-path: <nicku@vtc.edu.hk>
- Received: from pandora.vtc.edu.hk (pandora.vtc.edu.hk 192.168.79.191)
- by carme.vtc.edu.hk (iPlanet Messaging Server 5.2 Patch 1 (built Aug 19 2002))
- with ESMTP id <0HAU00I35H3HGL@carme.vtc.edu.hk> for nicku@ims-ms-daemon
- (ORCPT nicku@vtc.edu.hk) Tue, 25 Feb 2003 11:16:29 +0800 (CST)
- Received: from nickpc.tyict.vtc.edu.hk (172.19.32.30)
- by pandora.vtc.edu.hk (Mirapoint Messaging Server MOS 3.2.2-GA)
- with SMTP id AFF21826 Tue, 25 Feb 2003 11:16:01 +0800 (HKT)
- Date: Tue, 25 Feb 2003 11:15:30 +0800 (HKT)
- From: Nick Urbanik <nicku@vtc.edu.hk>
- Message-id: <200302250316.AFF21826@pandora.vtc.edu.hk>
- My message body.
- .
- dele 1
- +OK message deleted
- quit
45pop3: commands for retrieving mail
- See RFC 1939 for easy-to-read details
- First, must authenticate
- user username
- pass password
- stat: shows number of messages and total size in bytes
- list: lists all the message numbers and the size in bytes of each message
- retr messagenum: retrieve the message with number messagenum
- dele messagenum: delete the message with number messagenum
- quit
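The reply to stat is easy to interpret in a script. This sketch assumes the reply has the form defined in RFC 1939, a +OK status followed by the message count and the total size in octets. The function name is an invention for the example.

```python
def parse_stat(reply: str):
    """Parse a POP3 STAT reply into (message_count, total_octets)."""
    parts = reply.split()
    if len(parts) < 3 or parts[0] != "+OK":
        raise ValueError(f"unexpected STAT reply: {reply!r}")
    return int(parts[1]), int(parts[2])

count, size = parse_stat("+OK 1 673")
print(f"{count} message(s), {size} bytes in total")
```

A reply beginning with -ERR, the POP3 failure status, is rejected with a ValueError.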
46telnet: Testing Other Applications
- Many network protocols are text. telnet can be helpful in checking
- IMAP servers
- telnet hostname 143
- Web servers
- telnet hostname 80
- Ftp servers
- telnet hostname 21
- Even ssh (can check version, if responding)
- telnet hostname 22
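Where telnet is not installed, the same quick check can be approximated with a few lines of Python that connect to a port and read whatever greeting the server sends first. This is a sketch under simple assumptions: it reads a single chunk, and it returns an empty string for servers that wait for the client to speak first (web servers, for example). The function name and the timeout value are choices made for this illustration.

```python
import socket

def read_banner(host: str, port: int, timeout: float = 5.0) -> str:
    """Connect to host:port and return the server's first line, if any."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.settimeout(timeout)
        try:
            data = conn.recv(1024)
        except socket.timeout:
            return ""  # connected, but the server sent no greeting
    return data.decode("ascii", errors="replace").strip()
```

Calling read_banner(hostname, 22) against an ssh server, for example, would be expected to return its version string, while a refused connection raises an OSError.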
47Conclusion
- Check the simple things first
- Document what you do
- Become familiar with common tools
- Use the tools to become familiar with your network before troubles strike
- Know what is normal