Network Troubleshooting

OK, so far we have talked a lot about binary math, about how to calculate subnets, and about how to determine the valid IP ranges to use for your network. You were diligent, calculated all your addresses carefully, set yourself up a router to the Internet, and configured everything, but it still doesn't work. What is wrong? How do you figure it out?

One of the common mistakes I see with young technicians is the "guessing game" approach. They start changing things, hoping to guess what is causing the problem. They do NOT take the simple approach of slicing the problem down and eliminating sections of the network, or possible causes, one at a time. Most times, by the time they give up, they have made so many incorrect changes to the network that it is impossible to determine what the original problem was.

Being an engineer, the division method comes naturally to me. You start by doing something that narrows down where on the network the problem can be. What tools are available for this under Linux (or any other operating system, for that matter)?

ifconfig (or ipconfig on a Windows box) - This little command will quickly tell you what you have on your machine: which interfaces are up and working, what IP addresses they have, and so on. The one caveat is that on a Windows 95 machine the command is winipcfg, which pops up a little GUI box with the same information in it. On NT, ipconfig /all will show you all the settings. On Linux/Unix, use ifconfig. Every operating system that supports TCP/IP has some version of this command, and as far as I know all the BSD and Linux ones use ifconfig. Below is an example ifconfig from my SuSE Linux box:

	bash-2.03# ifconfig
	eth0      Link encap:Ethernet  HWaddr 00:10:5A:A0:8D:37
	          inet addr:192.168.3.12  Bcast:192.168.3.255  Mask:255.255.255.0
	          inet6 addr: fe80::10:5aa0:8d37/10 Scope:Link
	          inet6 addr: fe80::210:5aff:fea0:8d37/10 Scope:Link
	          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
	          RX packets:532303 errors:0 dropped:0 overruns:0 frame:0
	          TX packets:250499 errors:0 dropped:0 overruns:0 carrier:0
	          collisions:75 txqueuelen:100
	          Interrupt:11 Base address:0xe400
 
	lo        Link encap:Local Loopback
	          inet addr:127.0.0.1  Mask:255.0.0.0
	          inet6 addr: ::1/128 Scope:Host
	          UP LOOPBACK RUNNING  MTU:3924  Metric:1
	          RX packets:689 errors:0 dropped:0 overruns:0 frame:0
	          TX packets:689 errors:0 dropped:0 overruns:0 carrier:0
	          collisions:0 txqueuelen:0
	 
	bash-2.03#

Not only does it show me the IP address and subnet mask, it also shows me the MAC address of each network card, the interface names (eth0, eth1, lo, etc.), and the interrupt (IRQ) and base address of each device. This is very helpful in troubleshooting hardware problems when setting up your network. Try this command on your own machine at home and see what it tells you. It is an excellent place to start understanding your own machine and network.
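If you want to pull those fields out of the output with a script, a minimal sketch in Python might look like the following. It assumes the older ifconfig layout shown above (inet addr:, Mask:, HWaddr); newer versions of ifconfig print these fields differently, so the patterns would need adjusting. The function name parse_ifconfig and the SAMPLE text are made up for illustration.

```python
import re

# Trimmed copy of the ifconfig output shown above (illustration only).
SAMPLE = """\
eth0      Link encap:Ethernet  HWaddr 00:10:5A:A0:8D:37
          inet addr:192.168.3.12  Bcast:192.168.3.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:3924  Metric:1
"""


def parse_ifconfig(text):
    """Return {interface: {"ip", "mask", "mac", "up"}} from old-style ifconfig text."""
    interfaces = {}
    current = None
    for line in text.splitlines():
        if line and not line[0].isspace():
            # A line starting in column 0 begins a new interface section.
            name = line.split()[0]
            current = interfaces.setdefault(name, {"up": False})
            mac = re.search(r"HWaddr\s+(\S+)", line)
            if mac:
                current["mac"] = mac.group(1)
        elif current is not None:
            addr = re.search(r"inet addr:(\S+)", line)
            if addr:
                current["ip"] = addr.group(1)
            mask = re.search(r"Mask:(\S+)", line)
            if mask:
                current["mask"] = mask.group(1)
            # The flags line starts with UP when the interface is up.
            if re.match(r"\s+UP\b", line):
                current["up"] = True
    return interfaces
```

Calling parse_ifconfig(SAMPLE)["eth0"] gives a small dictionary with the address, mask and MAC address of the card, which is handy when you have to check many machines.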

PING - ping is one of the MOST useful tools you could ever have in your bag of tricks. As far as I know, every networkable operating system ever written has a ping command. What does a ping command do? It basically sends a message to the address you provide and says "hey, are you there?" What is the format for a ping command under Linux?

	bash-2.03#  ping 192.168.7.7

The general form is ping followed by the IP address of the machine you want to check that you can communicate with. Ping is one of the commands I use the most when troubleshooting network equipment and Internet connections.

PROBLEM: You just configured a computer to access the Internet using a cable modem. You set everything up, got your IP address from the cable modem's DHCP server, and your machine seems happy. BUT, when you open Netscape Navigator and type in the URL http://www.nolug.org, it cannot find that address and gives you a "host not found" error. What are your steps to determine what is going on?

Step 1. Check your OWN machine. Ping yourself. Ping works not only on other machines; you can ping your own address too. If you ping yourself and don't get a response, then something is wrong with your TCP/IP setup, and there is no point looking any further until that is fixed. What does the response from a ping look like?

	bash-2.03# ping 192.168.3.12
	PING 192.168.3.12 (192.168.3.12): 56 data bytes
	64 bytes from 192.168.3.12: icmp_seq=0 ttl=255 time=0.238 ms
	64 bytes from 192.168.3.12: icmp_seq=1 ttl=255 time=0.157 ms
	64 bytes from 192.168.3.12: icmp_seq=2 ttl=255 time=0.166 ms
	64 bytes from 192.168.3.12: icmp_seq=3 ttl=255 time=0.130 ms
	64 bytes from 192.168.3.12: icmp_seq=4 ttl=255 time=0.158 ms
	64 bytes from 192.168.3.12: icmp_seq=5 ttl=255 time=0.139 ms
	64 bytes from 192.168.3.12: icmp_seq=6 ttl=255 time=0.129 ms
	64 bytes from 192.168.3.12: icmp_seq=7 ttl=255 time=0.150 ms
	--- 192.168.3.12 ping statistics ---
	8 packets transmitted, 8 packets received, 0% packet loss
	round-trip min/avg/max = 0.129/0.158/0.238 ms
	bash-2.03#     

This is where I pinged myself on my machine at home. Under Linux, ping keeps sending until you hit Ctrl-C to stop it. Under Windows, ping sends 4 times and quits on its own. What does it look like when a ping fails? Look at the following example of me pinging something I know isn't there.

	bash-2.03# ping 192.168.3.254
	PING 192.168.3.254 (192.168.3.254): 56 data bytes
	--- 192.168.3.254 ping statistics ---
	8 packets transmitted, 0 packets received, 100% packet loss
	bash-2.03#

I ran the ping command and the first line, the one starting with PING, showed up on the screen. Then nothing happened for several seconds, so I hit Ctrl-C. It then told me that it had sent 8 packets and none were returned. So I know that either that machine is not on, or the communication link between my machine and it is not working.
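The summary line at the bottom is the part worth checking in a script. As a rough sketch (the function name packet_loss is my own, and the pattern only covers the "N packets transmitted, M packets received" wording shown above plus the shorter "M received" wording some ping versions print), it could be read like this in Python:

```python
import re


def packet_loss(ping_output):
    """Return the percentage of lost packets, or None if no summary line is found."""
    match = re.search(
        r"(\d+) packets transmitted, (\d+) (?:packets )?received", ping_output
    )
    if match is None:
        return None
    sent = int(match.group(1))
    received = int(match.group(2))
    if sent == 0:
        return None  # nothing was sent, so a loss figure is meaningless
    return 100.0 * (sent - received) / sent
```

A result of 0.0 means every ping came back; 100.0 is the dead-host case from the example above. Anything in between usually points at a flaky link rather than a wrong address.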

You can also ping a domain name, e.g. www.insecure.org. Below is where I pinged BellSouth by domain name. Notice that I can find out the IP address behind a name by pinging it. This is very useful for checking whether your machine can translate DNS names to IP addresses correctly. Remember this trick if you are having trouble connecting to a website: ping its name and see what happens.

	bash-2.03# ping www.bellsouth.net
	PING www.bellsouth.net (205.152.0.46): 56 data bytes
	64 bytes from 205.152.0.46: icmp_seq=0 ttl=238 time=71.949 ms
	64 bytes from 205.152.0.46: icmp_seq=1 ttl=238 time=70.421 ms
	64 bytes from 205.152.0.46: icmp_seq=2 ttl=238 time=70.759 ms
	64 bytes from 205.152.0.46: icmp_seq=3 ttl=238 time=72.407 ms
	64 bytes from 205.152.0.46: icmp_seq=4 ttl=238 time=70.705 ms
	64 bytes from 205.152.0.46: icmp_seq=5 ttl=238 time=71.554 ms
	--- www.bellsouth.net ping statistics ---
	6 packets transmitted, 6 packets received, 0% packet loss
	round-trip min/avg/max = 70.421/71.299/72.407 ms
	bash-2.03#
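The name-to-address step can also be checked on its own, without sending any pings. Below is a minimal Python sketch using the standard socket module; the helper name resolve is made up for this example. It asks whatever resolver the machine is configured to use, so a None result for a name you know exists points at your DNS settings rather than at the remote site.

```python
import socket


def resolve(name):
    """Return the IPv4 address for name, or None if the lookup fails."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        # socket.gaierror (a subclass of OSError) is raised when the
        # name cannot be resolved, e.g. no DNS server is reachable.
        return None
```

For example, resolve("www.nolug.org") returning None while a ping to a known IP address succeeds is the signature of a DNS problem.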
             
 

Now, if you get a response similar to the first one above on your machine, then TCP/IP is installed and working correctly on it. The next step is to ping the next device in your network.

For the sake of argument, let's use my Telocity ADSL connection as an example. I just pinged myself, so I know TCP/IP is working on my machine, and now I need to check the next device down the line. My next connection is my FREESCO router. It has an INSIDE IP address of 192.168.3.1. So I ping 192.168.3.1 and get the following response:

	bash-2.03# ping 192.168.3.1
	PING 192.168.3.1 (192.168.3.1): 56 data bytes
	64 bytes from 192.168.3.1: icmp_seq=0 ttl=64 time=0.683 ms
	64 bytes from 192.168.3.1: icmp_seq=1 ttl=64 time=0.583 ms
	64 bytes from 192.168.3.1: icmp_seq=2 ttl=64 time=0.546 ms
	64 bytes from 192.168.3.1: icmp_seq=3 ttl=64 time=0.603 ms
	--- 192.168.3.1 ping statistics ---
	4 packets transmitted, 4 packets received, 0% packet loss
	round-trip min/avg/max = 0.546/0.603/0.683 ms
	bash-2.03#

This tells me I can communicate with the inside interface of my router. The next step is to ping the outside interface of the router. I do that and get a good response. Then I ping the ADSL modem itself by its IP address and get a good response, so I am good as far as the modem. The lights on the modem are all on, showing I have connectivity, so now I need to ping something outside my network. A good place to start is your ISP's DNS servers. Telocity uses 216.227.36.96 as one of its DNS servers. I ping that server and get a good response. So the IP portion of the network is working fine, and I am able to communicate with the Internet.

BUT, I still can't open a web page in my Netscape browser!!! If my IP addresses are right, what is the problem? Networking has many different pieces, and several different services must all work correctly for your network to work correctly. TCP/IP is the base on which all Unix/Linux and Internet connectivity is built, but being able to ping things isn't enough.

The problem I just described is a classic example of not having the DNS servers set correctly on the machine I am working on. I can ping everything in the world by IP address, but my browser cannot find a DNS server to tell it the IP address of www.nolug.org. Many people have this problem. They think something is wrong with their network, or something is broken in TCP/IP, and start changing things, when all they need to do is add their ISP's DNS server to their machine. It is a good idea to keep a list of common local DNS servers:

        216.227.108.197         Barry's DNS server and webserver running NOLUG.org
        205.152.128.20          BellSouth's primary DNS server
        205.152.0.5             BellSouth's secondary DNS server
        216.227.36.96           Telocity's DNS server

And there are thousands of other DNS servers out there that your machine can use to turn names into IP addresses.
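The walk outward from my own machine described above (myself, then the router, then something beyond it) can be written down as a short script, so that the checks always run in the same order and stop at the first thing that does not answer. This is only a sketch under a few assumptions: the ping flags are the Linux ones (-c for the count, -W for the wait in seconds) and other systems use different flags; the names ping_once, first_failure and HOPS are invented for the example; and the addresses are the ones from my own network, so substitute your own.

```python
import subprocess


def ping_once(host, timeout_s=2):
    """Send one ping and return True if a reply came back (Linux ping flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def first_failure(hops, probe=ping_once):
    """Probe each (label, address) pair in order, nearest first.

    Returns the first pair that does not answer, or None if every one answers.
    """
    for label, address in hops:
        if not probe(address):
            return (label, address)
    return None


# Nearest device first; these addresses come from the example network above.
HOPS = [
    ("my own machine", "192.168.3.12"),
    ("router, inside interface", "192.168.3.1"),
    ("ISP DNS server", "216.227.36.96"),
]

# Usage: print(first_failure(HOPS))
```

Whatever first_failure returns is the section of the network to look at; everything listed before it has already been eliminated. If it returns None and the browser still fails, the next suspect is name resolution, not IP.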

traceroute - Let us say you have your IP addresses correct and you're pinging different machines outside your network, but there seems to be one address or network that is unreachable. How can you find out where the problem lies? Try a traceroute to the offending address to see where your packets are going. Below is a traceroute I did to the BellSouth DNS server from my machine. Notice that some hops show just * * * instead of a name and times. That only means the hop did not answer the probes within the timeout; there is nothing wrong with this as long as later hops still respond. Using traceroute you can find out all kinds of things about the connections between you and another system.

	bash-2.03# traceroute 205.152.0.5
	traceroute to 205.152.0.5 (205.152.0.5), 30 hops max, 40 byte packets
	 1  192.168.3.1 (192.168.3.1)  1 ms  1 ms  0 ms
	 2  dsl-216-227-108-198.telocity.com (216.227.108.198)  2 ms  1 ms  1 ms
	 3  route-64-34-214-1.telocity.com (64.34.214.1)  20 ms  21 ms  23 ms
	 4  fe1-2-core1.dfw.tlct.net (216.227.96.65)  25 ms  22 ms  22 ms
	 5  209.246.152.61 (209.246.152.61)  22 ms  22 ms  23 ms
	 6  gigaethernet6-0.core1.Dallas1.Level3.net (209.244.15.37)  21 ms  25 ms  22 ms
	 7  so-5-0-0.mp2.Dallas1.level3.net (209.247.10.105)  22 ms  21 ms  23 ms
	 8  209.247.10.110 (209.247.10.110)  22 ms  22 ms  21 ms
	 9  209.245.240.138 (209.245.240.138)  24 ms  23 ms  23 ms
	10  140.at-6-0-0.XR1.DFW9.ALTER.NET (152.63.98.126)  25 ms  23 ms  23 ms
	11  185.at-2-0-0.TR1.DFW9.ALTER.NET (152.63.98.34)  23 ms  24 ms  23 ms
	12  128.at-5-1-0.TR1.ATL5.ALTER.NET (152.63.0.125)  66 ms  65 ms  78 ms
	13  297.ATM5-0.XR1.ATL1.ALTER.NET (152.63.81.29)  65 ms  67 ms  66 ms
	14  195.ATM4-0.GW6.ATL3.ALTER.NET (146.188.233.217)  67 ms  68 ms  68 ms
	15  bs-stonemountain-gw.customer.alter.net (157.130.72.54)  68 ms  70 ms  70 ms
	16  205.152.37.204 (205.152.37.204)  68 ms  68 ms  68 ms
	17  205.152.3.46 (205.152.3.46)  70 ms  69 ms  72 ms
	18  * * *
	19  * * *
	20  * * *
	21  * * *
	22  * * ns.bellsouth.net (205.152.0.5)  71 ms
	bash-2.03#

Notice that my path to the local BellSouth DNS server goes through Dallas-Fort Worth (DFW9.ALTER.NET). Just because a machine is physically close to you doesn't mean that the path to it is direct. It all depends on the connections that your ISP has to other networks. For Telocity, my packets go to the Dallas area to reach the backbone of the Internet, and are then carried to the correct subnet through BellSouth's backbone connection. This seems stupid and wasteful, but it is the best way for redundancy to occur, to prevent network outages from affecting too many people, and to allow multiple paths to the destinations.
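When a trace is long, it can help to let a script pick out the hops that stayed silent or the point where the times jump. The Python sketch below is one way to do that; it assumes the line layout shown in the trace above, and the names parse_hops, silent_hops and SAMPLE are invented for illustration.

```python
import re

# A few lines from the trace above (illustration only).
SAMPLE = """\
traceroute to 205.152.0.5 (205.152.0.5), 30 hops max, 40 byte packets
 1  192.168.3.1 (192.168.3.1)  1 ms  1 ms  0 ms
17  205.152.3.46 (205.152.3.46)  70 ms  69 ms  72 ms
18  * * *
22  * * ns.bellsouth.net (205.152.0.5)  71 ms
"""

HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")


def parse_hops(text):
    """Return one dict per hop line: hop number, IP (or None) and reply times in ms."""
    hops = []
    for line in text.splitlines():
        match = HOP_LINE.match(line)
        if not match:
            continue  # header and prompt lines do not start with a hop number
        rest = match.group(2)
        ip = re.search(r"\((\d+\.\d+\.\d+\.\d+)\)", rest)
        times = [float(t) for t in re.findall(r"(\d+(?:\.\d+)?) ms", rest)]
        hops.append(
            {
                "hop": int(match.group(1)),
                "ip": ip.group(1) if ip else None,
                "times_ms": times,
            }
        )
    return hops


def silent_hops(hops):
    """Hop numbers that returned no reply at all (the * * * lines)."""
    return [h["hop"] for h in hops if not h["times_ms"]]
```

If the silent hops run all the way to the end of the trace, the last hop that did answer is the place to start asking questions.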

nmap - There is a lot written about nmap, but what exactly does it do? Very simply, nmap goes to an address and scans it to see which ports are open on the machine. We talked about the common ports and their uses earlier. If you can ping a certain IP address successfully but cannot connect to its mail server, or cannot get ssh to connect, run nmap and see if the port you need is open. See the following example of nmapping an IP address:

	bash-2.03# nmap 216.227.110.205
	Starting nmap V. 2.3BETA14 by fyodor@insecure.org ( www.insecure.org/nmap/ )
	Interesting ports on dsl-216-227-110-205.telocity.com (216.227.110.205):
	Port    State       Protocol  Service
	9       open        tcp       discard
	21      open        tcp       ftp
	22      open        tcp       ssh
	25      open        tcp       smtp
	53      open        tcp       domain
	80      open        tcp       http
	113     open        tcp       auth
	443     open        tcp       https
	993     open        tcp       imaps
	995     open        tcp       pop3s

	Nmap run completed -- 1 IP address (1 host up) scanned in 15 seconds
	bash-2.03#

So you see, nmap gives you the details of which ports are open. If you are trying to ssh into a machine and port 22 does not show up as open, then either the machine or a firewall is NOT letting you through. Investigate further. Another example, using nmap on the BellSouth domain:

	bash-2.03# nmap www.bellsouth.net

	Starting nmap V. 2.3BETA14 by fyodor@insecure.org ( www.insecure.org/nmap/ )
	Interesting ports on services.bellsouth.net (205.152.0.46):
	Port    State       Protocol  Service
	80      open        tcp       http
	179     filtered    tcp       bgp

	Nmap run completed -- 1 IP address (1 host up) scanned in 13 seconds
	bash-2.03#
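If all you need to know is whether one particular port answers, for instance port 22 before blaming ssh, you do not need a full scan. The Python sketch below is not nmap and does far less: it simply tries a single ordinary TCP connection with the standard socket module. The function name port_is_open is made up for the example.

```python
import socket


def port_is_open(host, port, timeout_s=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers connection refused, timeouts, unreachable networks and
        # names that do not resolve.
        return False
```

A False result does not say why the connection failed; a closed port and a firewall silently dropping the packets look the same from here, which is where nmap's "filtered" state tells you more. Only run this against machines you are responsible for.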
              

Now don't just go using nmap to hit everyone in sight. If a website has an admin like Scott, that person will get real mad at you for nmapping them over and over. Some people use nmap to find ports through which to hack into someone else's machine. One security measure is to set up something that logs who is scanning you and what they are doing. If you are unsure how to do this, ask someone with experience. Explaining the intimate details of securing your firewall or your machine is an entire topic of its own.

route - Running a simple route command is a quick way to see how the routing table on your machine is set up. The routing table shows where your machine will send any packet you send to the network. Below is the routing table of my SuSE box at home:

	bash-2.03# route
	Kernel IP routing table
	Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
	192.168.3.0     *               255.255.255.0   U     0      0      0   eth0
	loopback        *               255.0.0.0       U     0      0      0   lo
	default         192.168.3.1     0.0.0.0         UG    0      0      0   eth0
	bash-2.03#

Notice my default gateway is set to 192.168.3.1, which is my FREESCO router. Any packet addressed to a network other than my 192.168.3.X internal network is sent to the router to be sent on its way. Your machine must have a default route set to know where to send packets bound for other networks. Also notice the first entry, 192.168.3.0. Together with the 255.255.255.0 mask, this is the standard notation for everything on my 192.168.3.X internal network. Your routing table can have many entries in it if you have a complicated internal network.
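To see how the machine chooses between those three entries, here is a small Python sketch using the standard ipaddress module. It copies my routing table into a list by hand (the ROUTES list and the function name pick_route are made up for the example) and applies the usual rule that the most specific matching entry, the one with the longest mask, wins. It ignores the Metric column, which only matters when two entries are equally specific.

```python
import ipaddress

# (destination network, gateway or None for "directly attached", interface)
ROUTES = [
    ("192.168.3.0/24", None, "eth0"),      # 192.168.3.0 with mask 255.255.255.0
    ("127.0.0.0/8", None, "lo"),           # loopback with mask 255.0.0.0
    ("0.0.0.0/0", "192.168.3.1", "eth0"),  # default route
]


def pick_route(destination, routes=ROUTES):
    """Return (gateway, interface) of the most specific route matching destination."""
    dest = ipaddress.ip_address(destination)
    best = None
    for network, gateway, iface in routes:
        net = ipaddress.ip_network(network)
        # Keep the matching entry with the longest prefix seen so far.
        if dest in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, gateway, iface)
    if best is None:
        return None  # no matching entry and no default route
    return best[1], best[2]
```

With this table, pick_route("192.168.3.7") returns (None, "eth0"), meaning the packet goes straight out the network card, while pick_route("205.152.0.5") returns ("192.168.3.1", "eth0"), meaning it is handed to the router. Take the default entry out of ROUTES and the second call returns None, which is exactly the "no route to host" situation.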

arp - This command shows the ARP cache on your machine, that is, the hosts on your local network whose hardware (MAC) addresses your machine has recently looked up. Sometimes useful, but it seems to be more useful for debugging the crazy caching of Windows. Below is the help output of the arp command on Linux.

	bash-2.03# arp --help
	Usage:
	  arp [-vn]  [<HW>] [-i <if>] [-a] [<hostname>]             <-Display ARP cache
	  arp [-v]          [-i <if>] -d  <hostname> [pub][nopub]    <-Delete ARP entry
	  arp [-vnD] [<HW>] [-i <if>] -f  [<filename>]              <-Add entry from file
	  arp [-v]   [<HW>] [-i <if>] -s  <hostname> <hwaddr> [temp][nopub] <-Add entry
	  arp [-v]   [<HW>] [-i <if>] -s  <hostname> <hwaddr> [netmask <nm>] pub  <-''-
	  arp [-v]   [<HW>] [-i <if>] -Ds <hostname> <if> [netmask <nm>] pub      <-''-

	        -a                       display (all) hosts in alternative (BSD) style
	        -s, --set                set a new ARP entry
	        -d, --delete             delete a specified entry
	        -v, --verbose            be verbose
	        -n, --numeric            dont resolve names
	        -i, --device             specify network interface (e.g. eth0)
	        -D, --use-device         read <hwaddr> from given device
	        -A, -p, --protocol       specify protocol family
	        -f, --file               read new entries from file or from /etc/ethers

	  <HW>=Use '-H <hw>' to specify hardware address type. Default: ether
	  List of possible hardware types (which support ARP):
	    ether (Ethernet) tr (16/4 Mbps Token Ring) tr (16/4 Mbps Token Ring (New))
	    ax25 (AMPR AX.25) netrom (AMPR NET/ROM) arcnet (ARCnet)
	    dlci (Frame Relay DLCI) irda (IrLAP)
	bash-2.03#

Below is a copy of running the arp command on my machine at home. Not very exciting.

	bash-2.03# arp
	Address                 HWtype  HWaddress           Flags Mask            Iface
	192.168.3.1             ether   00:60:97:6A:D9:22   C                     eth0
	bash-2.03#