Troubleshooting basics

Effective network troubleshooting follows a multi-step approach. The steps below provide a framework for troubleshooting and help reduce the time spent resolving problems:

Identify the problem: This seems obvious, but we often assume that we know the exact scope of a problem when we would be better served by gathering information, identifying symptoms, and, when applicable, questioning users. If there is more than one problem, recognize that so you can approach each problem individually. Sometimes, end users are good sources of information. For example, you can ask a user how the system behaves during normal operation and compare that with how it behaves now. Recreate the problem, if possible, and try to isolate its location.
Formulate a theory of probable cause: A single problem can have many causes, but if you have done your homework with information gathering and you apply a modicum of common sense, you can eliminate many of them. Often, the most obvious cause is the correct one, so checking the simplest explanation first is a reasonable approach. Keep in mind, however, that your initial theory may be incorrect and you may have to consider other theories.
Test the theory: Once you have established a theory of probable cause, you should attempt to confirm the theory. If the theory can be confirmed, then you can move on to the next step. If not, you need to formulate another theory.
Establish a plan: Once you think you have identified the cause, you need to establish a plan of action. This becomes more important when troubleshooting in enterprise-level environments. Implementing a solution may involve taking systems offline, so you have to determine when they will be taken offline and for how long. Your organization may have formal or informal procedures for taking a system offline, which often include scheduling the work for a specific time, typically during non-working hours. Once you have a plan in place, you should be able to implement a solution.
Implement the solution: Once the corrective change has been made to your network, you still need to test the solution. You cannot assume that the solution has worked without testing it. Early results can be deceiving, so you may need to test again to confirm that the solution has worked.
Verify system functionality: Sometimes a solution that fixes one problem creates another. This is why it is important to verify full system functionality before you decide the solution was successful. In fact, it might be best to assume that any change you make will affect the network in one way or another, and to determine what that effect will be.
Document the problem and solution: Documenting the solution involves keeping a record of all the steps taken while solving the problem. Documenting both failures and successes can save you time in the future, and in large organizations, keeping a record of the person who implemented the solution can be helpful if someone in the organization has a question about it.
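
The theory-testing loop and the documentation step above can be sketched in a few lines of Python. This is only an illustrative sketch, not a prescribed tool: the names `Theory`, `TroubleshootingLog`, and `find_probable_cause` are hypothetical, and each theory's test is assumed to be a function that returns True when the suspected cause is confirmed.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Theory:
    """A candidate cause plus a check that returns True if the cause is confirmed."""
    description: str
    test: Callable[[], bool]


@dataclass
class TroubleshootingLog:
    """Record of every theory tried, kept for the documentation step."""
    problem: str
    entries: List[str] = field(default_factory=list)

    def record(self, message: str) -> None:
        self.entries.append(message)


def find_probable_cause(theories: List[Theory],
                        log: TroubleshootingLog) -> Optional[Theory]:
    """Test theories in the given order and return the first confirmed one.

    Both failures and successes are logged. None means every theory was
    ruled out and new theories have to be formulated.
    """
    for theory in theories:
        confirmed = theory.test()
        outcome = "confirmed" if confirmed else "ruled out"
        log.record(f"{theory.description}: {outcome}")
        if confirmed:
            return theory
    return None


# Hypothetical example: the lambdas stand in for real checks.
log = TroubleshootingLog(problem="Workstation cannot reach the file server")
theories = [
    Theory("Network cable unplugged", lambda: False),
    Theory("Switch port in the wrong VLAN", lambda: True),
    Theory("Server firewall blocks the connection", lambda: True),
]
cause = find_probable_cause(theories, log)
```

Listing the simplest theories first mirrors the advice to check the most obvious explanation before the less likely ones, and the log keeps the ruled-out theories so that they are not retested later.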

If the problem was initially reported by an end user, you might also consider providing feedback to the user. Such feedback might not only encourage users to report problems in the future, but also give you a chance to explain how the problem could have been avoided in the first place.

One way to evaluate a problem is to use the seven-layer OSI model and try to determine which layer or layers the problem is on:

Physical layer: This covers such problems as damaged or dirty cabling or terminations, high levels of signal attenuation, and insufficient cable bandwidth. It also covers such problems as wireless interference and malfunctioning access points.
Data link layer: This covers such problems as MAC address misconfiguration and VLAN misconfiguration or sub-optimal VLAN performance. It also encompasses some protocol issues, such as improper L2TP configuration.
Network layer: This covers such problems as damaged or defective networking devices, misconfigured devices, sub-optimal device configuration, improper configuration of routing protocols such as OSPF, authentication issues, and lack of sufficient network bandwidth.
Transport layer: This covers issues with the TCP and UDP protocols.
Session/Presentation/Application layers: This covers problems related to applications and application layer protocols (for example, FTP and SMTP).
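
As a rough illustration of working through the layers, the following Python sketch runs a list of checks in order and reports the first layer whose check fails. It uses only the standard library, so it can probe just the transport layer (a TCP connection attempt) and one application-layer service (name resolution); lower layers generally need other tools. The function names and the idea of pairing each layer with a check are assumptions made for this example.

```python
import socket
from typing import Callable, List, Optional, Tuple


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Transport-layer check: can a TCP connection be established?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def resolve_name(hostname: str) -> bool:
    """Application-layer check: does the name resolve (DNS or hosts file)?"""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def first_failing_layer(
    checks: List[Tuple[str, Callable[[], bool]]]
) -> Optional[str]:
    """Run the checks in order and return the first layer that fails.

    Returns None when every check passes.
    """
    for layer, check in checks:
        if not check():
            return layer
    return None
```

A caller might pass something like `[("transport", lambda: tcp_port_open("192.0.2.10", 21)), ("application", lambda: resolve_name("ftp.example.com"))]`, where the address and host name are placeholders. A failed check only points to the layer at which the symptom shows up; the cause may still lie in a lower layer.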
