Diagnosing and troubleshooting a problem in an enterprise network can be a daunting task. With multiple branch offices, hundreds or even thousands of hosts, dozens of routers, switches, and servers from different vendors running different firmware, and good old-fashioned human error, knowing where to start is key to implementing a quick solution. There is an established methodology for diagnosing a large network problem, and following its guidelines will help administrators keep an organized approach to troubleshooting.
Prior experience with the network in question can help administrators find and fix the issue. If most of the issues that arise on a network come from specific errors with a known fix, that fix becomes the “go-to” first choice when a new problem appears. Even without familiarity with the network, following a set procedure will help keep everyone involved on the right track.
The first and most obvious step is defining the problem. If a user is unable to connect to a file server to access their work, that defines the problem. This initial step usually takes care of itself: it’s rare to be called in for troubleshooting without a clear issue already presenting itself!
Next, gather information from the affected users or systems. In the above example about a user having trouble connecting to a file server, it would be worth the time to ask some basic questions. When was the last time the user was able to access the server? Has anything changed since then? Are other users also experiencing the same issue? If the problem is more widespread, it’s likely there’s an issue upstream in the network. If it’s isolated to just that one host, there probably isn’t a wider network issue that needs to be addressed. Gathering information might be one of the most important, and often overlooked, steps in troubleshooting a large network. The data and testimony gathered here can be used to guide administrators throughout the rest of the troubleshooting process.
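The questions above can be captured in a small structure so the answers are available for later analysis. This is only a sketch: the `Report` fields and the `likely_scope` helper are hypothetical names chosen for illustration, and the rule it applies (one affected host suggests a local issue, several suggest something upstream) is the rough heuristic from the paragraph above, not a definitive diagnosis.

```python
from dataclasses import dataclass


@dataclass
class Report:
    """One affected user's answers to the basic questions (hypothetical fields)."""
    host: str
    last_worked: str      # free-text answer, e.g. "yesterday afternoon"
    recent_change: bool   # has anything changed since it last worked?


def likely_scope(reports):
    """Give a rough first guess at where to look, based on how many hosts report the issue."""
    affected_hosts = {report.host for report in reports}
    if not affected_hosts:
        return "no reports"
    if len(affected_hosts) == 1:
        return "isolated host"
    return "possible upstream issue"


reports = [
    Report(host="pc-01", last_worked="yesterday afternoon", recent_change=False),
    Report(host="pc-02", last_worked="this morning", recent_change=False),
]
print(likely_scope(reports))
```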
Two information-gathering tools are important enough to warrant their own section: ping and traceroute. They provide much more information than their simple functions would imply, and a large amount of data can be gathered for later analysis using just these two commands.
As another example, let’s say that some users in one part of an office are unable to connect to the network. The ping command can be used to gather information and isolate the problem. This diagnostic tool operates at the network layer, and starting with it fits the divide-and-conquer approach to troubleshooting. It sends an ICMP echo request from the host machine to the destination and waits for an echo reply. Keep in mind that some interfaces may have access controls, or a hardware or software firewall may prevent pings from reaching a host, so the command’s usefulness can be limited, particularly on incoming WAN interfaces.
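A minimal sketch of scripting a ping check follows. It assumes the system `ping` utility is on the path; the function names `build_ping_command` and `host_responds` are hypothetical, and the address in the usage line is a documentation placeholder rather than a real host.

```python
import platform
import subprocess


def build_ping_command(target, count=4, system=""):
    """Return the ping command line for the given OS name (defaults to this machine)."""
    system = system or platform.system()
    # Windows ping takes the packet count with -n; Linux and macOS use -c.
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, str(count), target]


def host_responds(target, count=4):
    """Run ping and treat a zero exit code as a sign that replies came back.

    This is a rough signal only: a firewall or access list that drops ICMP
    can make a healthy host look unreachable.
    """
    result = subprocess.run(
        build_ping_command(target, count),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0
```

Calling `host_responds("192.0.2.10")` would then return `True` or `False` depending on whether the placeholder host answered.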
Cisco recommends a specific four-step procedure when using ping to help diagnose IP errors at the network layer:
Depending on the information gathered about the problem, some of these steps can be skipped. In the above example, if it’s already known that hosts inside that network can still communicate with each other, it makes sense to skip steps one and two.
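The skipping logic can be sketched in a few lines. The four steps are not spelled out in this text, so the sketch assumes the commonly taught order: loopback address, the host's own IP address, the default gateway, then the remote host. The function name `ping_plan` and the addresses used are hypothetical placeholders.

```python
def ping_plan(local_ip, gateway_ip, remote_ip, local_hosts_ok=False):
    """Return the ping steps still worth running, as (step number, label, target) tuples.

    The step order is an assumption based on the commonly taught sequence,
    not a list taken from this article.
    """
    steps = [
        (1, "loopback", "127.0.0.1"),
        (2, "local interface", local_ip),
        (3, "default gateway", gateway_ip),
        (4, "remote host", remote_ip),
    ]
    if local_hosts_ok:
        # Local hosts can already talk to each other, so steps one and two add nothing.
        steps = [step for step in steps if step[0] > 2]
    return steps


for number, label, target in ping_plan("192.0.2.25", "192.0.2.1", "203.0.113.9", local_hosts_ok=True):
    print(f"Step {number}: ping {label} at {target}")
```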
Another powerful command is traceroute (on Cisco IOS) or tracert (at the Windows command prompt). Traceroute sends packets toward the destination and reports each hop they pass through on the way there. If a router along the path fails to respond, that is reported back to the user running the command. This can highlight where a potential issue is occurring and give administrators a good idea of where to start looking for the problem.
Once the problem has been defined and information has been gathered, the data needs to be analyzed. This can be simple or complex, depending on the data present. Analysis is an important step in troubleshooting a network issue, as it indicates which troubleshooting approach to start with.
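As a rough illustration of reading trace output, the sketch below finds the last hop that answered. It assumes the output has already been captured as text lines, that each hop line starts with the hop number, and that a hop with no response shows three `*` marks, which is the usual layout but may vary by platform. The name `last_responding_hop` is hypothetical. A silent hop is not proof of a fault, since some routers simply do not answer trace probes.

```python
from typing import Iterable, Optional


def last_responding_hop(lines: Iterable[str]) -> Optional[int]:
    """Return the number of the last hop that replied, or None if no hop replied."""
    last = None
    for line in lines:
        tokens = line.split()
        # Skip headers and blank lines: hop lines begin with the hop number.
        if not tokens or not tokens[0].isdigit():
            continue
        # A hop that timed out on all three probes is printed as "* * *".
        if tokens[1:4] == ["*", "*", "*"]:
            continue
        last = int(tokens[0])
    return last
```

If the last responding hop is well short of the destination, the device just beyond it is a reasonable place to start looking.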
Every network is different, every problem is different, and administrators need to be able to adapt to a changing network environment in order to quickly and effectively diagnose and fix network issues. While a consistently followed and well-documented troubleshooting plan will help keep everyone on the same page to quickly address potential problems, flexibility is needed in order to speed up response and fix times. Understanding when not to follow procedures is key in maintaining a large network.
All networks will undergo a significant number of errors and problems. However, if the same issue is constantly rearing its ugly head, looking for a permanent fix is important. If one router is consistently failing, for example, it may be time for a replacement. Redundancy can help to address, but not solve, recurring network problems. Likewise, “stop-gap” or “quick-fix” solutions need to have long-term solutions implemented as soon as possible to prevent future headaches. Getting ahead of a problem is often the best way to solve it.
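Spotting a recurring problem is easier with a simple tally of past incidents. The sketch below assumes incidents are kept as (device, description) pairs; the record format, the device names, the `recurring_devices` helper, and the threshold of three are all hypothetical choices for illustration.

```python
from collections import Counter


def recurring_devices(incidents, threshold=3):
    """Return the devices that appear in at least `threshold` incident records."""
    counts = Counter(device for device, _description in incidents)
    return sorted(device for device, total in counts.items() if total >= threshold)


incidents = [
    ("router-a", "interface down"),
    ("switch-3", "port flapping"),
    ("router-a", "unexpected reload"),
    ("router-a", "interface down"),
]
print(recurring_devices(incidents))
```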
People make mistakes. They forget to plug things in, turn them on, configure them correctly, or just don’t know how to make something work. The best way to combat human error is with knowledge and practice. A well-informed user will cause far fewer networking nightmares than one who has received no instruction at all. Always account for the human factor when analyzing data and looking for a solution to a problem.
Likewise, humans will sometimes have unscrupulous goals when accessing a network. Always follow best security practices and be aware that network errors can sometimes have malicious origins designed to disrupt service. These kinds of attacks come in many forms and the best way to prevent them is with education and proactive defense.
There is a wealth of networking software available to help monitor, diagnose, and troubleshoot large networks. From open-source tools freely available on the internet to full-service, enterprise-oriented options, there is a software solution for almost every administrator managing a network. Using these tools can expedite network troubleshooting by handing much of the work that would otherwise consume staff time over to the software.
The biggest hurdle any network administrator will face is always going to be troubleshooting and maintaining their network. The range of potential problems and solutions is effectively endless, and covering them all is an impossible task. If a specific procedure is followed, pinpointing the trouble and getting a fix implemented will be much easier for administrators and their associates.