CHAPTER 48. Strategies for Troubleshooting Network Problems

SOME OF THE MAIN TOPICS IN THIS CHAPTER ARE


A Documented Network Is Easier to Troubleshoot

Problem-Solving Techniques

Pitfalls of Troubleshooting

Although networks can be composed of many types of physical components, from copper wire or fiber-optic cables to wireless access points and network adapters, there are steps you can take to make troubleshooting network problems a little easier, regardless of their composition. Each device, protocol, or standard on your network may come with its own troubleshooting tools, but it’s important to take a structured approach to solving problems on the network as a whole. This chapter introduces a few concepts that make life much simpler for a network administrator, including documenting network components and documenting problems (and the solutions that work).


Note

In other chapters you’ll find discussions of specific tools used for troubleshooting. For example, the use of ping and traceroute for testing IP networks is covered in Chapter 27, “Troubleshooting Tools for TCP/IP Networks.” In Chapter 49, “Network Testing and Analysis Tools,” we’ll look further at some tools that can be used to troubleshoot physical components of the network.


A Documented Network Is Easier to Troubleshoot

One of the oldest abbreviations used on the Internet doesn’t have anything to do with a specific protocol or network service. It’s RTFM. If you ever get this in response to posting a question on a newsgroup, you can probably guess what the letters stand for. For those who don’t know, it’s something along the lines of “read the fine manual!” although “fine” is often replaced with a slightly different word. Use of this term is intended to point out that your question is a simple one that you can easily find an answer to, so you should quit wasting bandwidth with your postings.

Documentation consists of the manuals that come with software applications, operating systems, switches, and other network components. The quality of this sort of vendor-supplied documentation can vary widely from one vendor to another. You’ll find that many companies, such as Cisco, Microsoft, and Novell, provide a lot of online documentation for their products. Often the documentation you get from a vendor is a simplified booklet combined with more extensive documentation on a CD. One of the most widely used formats for creating user documents is the Adobe Portable Document Format (.PDF files), and you can download the Adobe Reader application free from www.adobe.com.

After you’ve gathered an assortment of documentation, from hard-copy manuals to files on a CD or a Web site, it’s time to consider how you will document the way your particular network is laid out, from both the physical and the logical point of view. When it comes time to troubleshoot a problem on the network, it’s nice to have documentation that enables you to quickly answer such simple questions as “Where are the configuration instructions for that router stored?” or “Just who is that user anyway?”


Note

Documentation made available online via the Internet can serve two purposes. First, you can quickly search and find information in a problem scenario. Second, you can read through any online documentation a vendor provides before you make a decision to purchase the particular software or hardware product. Along the same lines, you can also get an idea of the type of support you’ll receive if you review the documentation before you buy. If the documentation isn’t up to par, it might not matter how good the product is—support is everything.


For individual applications or operating systems, you can visit USENET newsgroups and participate in (or just lurk around and read) discussions about problems with particular products. You may just find your answer there. If not, you can post your question. One of the things that newsgroup members most dislike is someone posting a question without providing the details that led up to the problem. Provide the details! If you read the newsgroups on a regular basis, you may be apprised of problems before they appear on your network.


Tip

The most convenient way to search USENET newsgroups is to use Google Groups (http://groups.google.com). Its Advanced search option lets you specify the groups to search, the text to search for, dates, and so forth.


Some of the important things you should consider as potential candidates for documenting include the following:

• A logical map of the network. This may or may not match up with the physical way the network is laid out.

• A physical map of the network. This documentation should describe each physical component and illustrate the ways in which the different components are connected.

• Cabling and patch panel information. When you’ve got hundreds of cables in a wiring closet patching together different physical segments, you’ll need to know which cable connects this to that.

• Default settings for computers and other devices on the network. A spreadsheet is good for this. An application that manages servers, network components, and client computers is even better.

• Listings of applications and the computers or users that make use of them, as well as software versions, patch levels, and so on. Be sure to know whom to contact for a particular application. If you are a network administrator, you are primarily responsible for the underlying network. If a particular application is failing, but the network is up and running, you need to know whom to call. There should always be a contact on your list for application managers. A network manager can do only so much.

• Information about the user accounts, and associated permissions and rights, for the users and user groups on the network.

• A network overview. It’s nice to be able to give a new user a document that explains what she needs to know about the network. This should be a short document telling the user such things as which drives are mapped to her computer, and which printers offer what features. It should not be an extensive document like the physical and logical maps described earlier in this list.

• Problem reports. Keep track of problems as they arise, and document the cause and remedy. No need to solve the same problem twice! This also includes outage reports: keeping track of unscheduled downtime for a computer or network device can tell you over time just how reliable the device is.

A logical map of the network shows the relationships between components and the flow of information through the network. A physical map of the network tries to approximate on paper a representation of how each component of the network is connected to the network. For example, a logical map for a Windows network might show computers grouped by domains, even though the computers are not located physically in the same part of the network. A physical map would show the location of each of the computers, the hub or switch to which they are connected, and so on. In general, logical maps can be used to help isolate configuration or application problems, whereas physical maps can be used to isolate a problem that affects only a portion of the network, perhaps a single computer or other device.

You can do the same for an Ethernet network or a network based on any other technology. Knowing the physical layout can be a very important factor in troubleshooting a network problem. Logical maps are just as useful outside the Windows world. For example, Unix and some Linux systems group hosts and users through NIS (Network Information Service) and, more recently, LDAP (the Lightweight Directory Access Protocol), and a logical map can show those groupings.


Note

You can learn more about NIS in Chapter 29, “Network Name Resolution.” You can learn more about LDAP in Appendix D, “The Lightweight Directory Access Protocol.” If you want to learn about a specific instance of LDAP, read Chapter 30, “Using the Active Directory.”


You can use simple tools, such as Microsoft Paint, to create network mapping documents, or you can buy applications that automate the process. Using an application that is written specifically for creating network maps should be considered for anything but the smallest network. The capability to locate components, update them, and produce easy-to-understand printed documentation is the hallmark of a good network diagramming application. One such tool is Microsoft’s Visio, which allows you to create complex network drawings, and includes pictographic elements for most modern network devices that you can easily use.

Inside the wiring closet you can have a tangled mess of wires on a patch panel that haphazardly tie one network link to another. Or you can have an orderly system in which each port on the patch panel is labeled, using a standardized method so that making changes won’t be a hit-or-miss effort. The same goes for configuration information for other components of the wiring closet, such as switch ports or routers. In-depth documentation is important so that you can re-create the configuration from scratch if it becomes necessary to replace a device.
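A standardized labeling scheme has the added benefit that it can be checked mechanically. The following Python sketch assumes a hypothetical convention of building, closet, panel, and port (for example, B1-IDF2-PP03-17); substitute whatever scheme your site actually uses. It parses each label and lists any that don’t follow the convention, so they can be corrected before the next change in the closet.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical label convention: <building>-<closet>-PP<panel>-<port>,
# e.g. "B1-IDF2-PP03-17". Adjust the pattern to your own standard.
LABEL_PATTERN = re.compile(
    r"^(?P<building>B\d+)-(?P<closet>(?:MDF|IDF)\d*)-PP(?P<panel>\d{2})-(?P<port>\d{1,2})$"
)


class PatchPort(NamedTuple):
    building: str
    closet: str
    panel: int
    port: int


def parse_label(label: str) -> Optional[PatchPort]:
    """Return the parsed label, or None if it does not follow the convention."""
    match = LABEL_PATTERN.match(label.strip())
    if match is None:
        return None
    return PatchPort(
        building=match["building"],
        closet=match["closet"],
        panel=int(match["panel"]),
        port=int(match["port"]),
    )


def nonconforming(labels: list[str]) -> list[str]:
    """List labels that break the standard, so they can be relabeled."""
    return [label for label in labels if parse_label(label) is None]


if __name__ == "__main__":
    labels = ["B1-IDF2-PP03-17", "B1-MDF-PP01-4", "closet 2 panel 3 port 17"]
    for label in labels:
        print(label, "->", parse_label(label))
    print("Needs relabeling:", nonconforming(labels))
```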

Applications should be standardized, which means you shouldn’t have multiple applications that all perform the same function. It’s much simpler to support a standard application, such as an office suite, than it is to support multiple applications. And, although the same configuration might not be appropriate for every user, you can at least try to create several standard configurations for classes of users. This makes deploying a desktop computer for a new employee much easier. On that odd occasion when you find that something nonstandard is required, document it, along with the reasons behind the decision to use an alternative configuration.

Keeping track of which applications are in use and how they are configured serves another purpose. Some applications interact with others, or are tied to specific versions of an operating system. If you have adequate documentation of the applications used on your network, you can better plan for upgrades.

After you’ve documented the physical components of the network and the applications, what’s left? Oh, yes, the users. If not for the users, you would not have a job. Having a document of some sort that shows a user profile can be useful for troubleshooting purposes. If you know only a user’s logon username and the name of his computer, you have little to go on when he calls in with a problem. If you can quickly locate more information about the user, such as the applications installed on his computer, or the privileges and permissions assigned to the user account or the computer, then you have valuable information to use to help solve problems. Often you can’t get all this information from the user over the phone because many users don’t know that much about what resides on their system. They know only the applications they use and how they use them.

Lastly, keep track of problems. Record the symptoms, the tools used to troubleshoot the problem, and the resolution of the problem. This documentation can assist you in the future so you can quickly determine the solution to a problem based on the symptoms reported by the users. You can also use this information to assist in creating documentation that you give to new users. By informing them of problems that have occurred in the past, you can help prevent the same problems from happening again.

Documentation and Maintenance—Keeping Things Up-to-Date

Documentation is an ongoing process. Networks rarely stay the same for a long time. It has been my experience that the larger the network, the faster the rate of change, as users or departments are relocated and new equipment replaces older equipment. So when you consider what means you’ll use to create network documentation, be sure to take into consideration that it will need to be updated and you’ll need some way of keeping track of changes in an orderly fashion.

Some of the tools you can use to create network documentation include these:

• Word processors and spreadsheets. Each of these is beneficial. Word processors enable you to create professional-looking documents that can be easily changed and reprinted. Spreadsheets make it easy to organize information and to locate it quickly by sorting and filtering.

• Online tools. Use simple Web pages to create online documentation. If you have a specific application that has been customized for your network, create a frequently asked questions (FAQ) document for it and put it online (on your intranet). Be cautious about pointing users to FAQs and other documents available on the Internet, unless they are sites known to contain accurate information (such as www.rfc-editor.org). There is a great deal of information, as well as disinformation, on the Internet.

• Network mapping tools. Microsoft’s Visio and other applications can assist you with developing a complete map of your network. This type of tool is not inexpensive, but it may prove invaluable in a large installation.

• Hard copy. Printed paper documentation. Two words: Read it.


Tip

When you find a useful page online, you can save the page and all its graphics as a single file by using Internet Explorer. Click File, Save As, and specify Web Archive as the file type.

This is also a useful way to record the current configuration of a router, a wireless access point, or any other device with an integrated web server.


Word Processors and Spreadsheets

These two tools can be useful for creating documentation. You can use either one to gather information about the network and organize it so that you can locate information quickly and easily. Word processing and spreadsheet applications are easy to update, and for instances in which printed documentation is necessary, most of these programs provide excellent formatting and printing capabilities. For example, you can use tables in Microsoft Office’s Word program, or a spreadsheet, to create a list of all the network devices and computers that have an IP address assigned to them. If you want to locate a particular item of data, Word enables you to search a document, and spreadsheets let you sort and filter rows by key columns, such as device name or IP address, so that information is easy to find.

For a typical LAN today, it’s likely that you’ll have only a few important devices or servers that have static IP addressing information assigned. It’s easier to use DHCP servers to allocate IP configuration information to computers automatically when they boot. To keep track of dynamically assigned IP configuration information, you can consult the DHCP server application to determine what listing or reporting features are available. For computers or devices you configure with static IP information, you can use a spreadsheet to keep track of this information. Then, when it becomes necessary to replace a router or similar device, you can consult the documentation to get the required configuration information to use on the replacement.
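If the static-address spreadsheet is saved as comma-delimited text, a few lines of code can look up a device’s recorded settings and catch a common data-entry error: the same address recorded for two devices. This is a sketch only; the column names (device, ip_address, location) and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Optional

# Hypothetical static-address spreadsheet, saved as comma-delimited text.
STATIC_IPS_CSV = """device,ip_address,location
core-router,192.168.10.1,MDF rack 1
file-server,192.168.10.5,Server room
print-server,192.168.10.6,Server room
lab-switch,192.168.10.5,IDF2
"""


def lookup_device(csv_text: str, device: str) -> Optional[dict[str, str]]:
    """Return the recorded row for a device (e.g. before replacing it), if any."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["device"].strip() == device:
            return row
    return None


def find_duplicate_ips(csv_text: str) -> dict[str, list[str]]:
    """Map each IP address recorded more than once to the devices listed for it."""
    devices_by_ip: dict[str, list[str]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        devices_by_ip[row["ip_address"].strip()].append(row["device"].strip())
    return {ip: names for ip, names in devices_by_ip.items() if len(names) > 1}


if __name__ == "__main__":
    print(lookup_device(STATIC_IPS_CSV, "core-router"))
    for ip, names in find_duplicate_ips(STATIC_IPS_CSV).items():
        print(f"{ip} is recorded for: {', '.join(names)}")
```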


Note

The Dynamic Host Configuration Protocol (DHCP) is discussed in detail in Chapter 28, “BOOTP and Dynamic Host Configuration Protocol (DHCP).” If you use the Microsoft DHCP server that comes with Windows 2000 Server and the Windows Server 2003 family, you can also record in the DHCP database the static information that you manually configure some of your servers or devices to use. You can do this by excluding manually assigned addresses from the scope and setting up reservations, using the GUI for the DHCP service. In addition to making sure that the DHCP server doesn’t try to lease an address that you’ve already manually assigned to another computer, this enables you to use the DHCP database for reporting and analysis. Microsoft’s DHCP server and many others allow you to export data to files, such as comma-delimited ASCII text files, that can be imported into spreadsheets or other databases.

Many other programs and utilities have “output” capabilities so that you can send their information to a file. For example, on Windows (both workstation and server platforms), from earlier versions through Windows Server 2003, the IPCONFIG command can be used to display information about the current IP configuration on the computer. If you use the syntax ipconfig /all > %computername%.txt, the output from the command is sent to a file named after the computer (the value of the COMPUTERNAME environment variable) with the .txt extension, instead of to the screen. The point is that you don’t necessarily have to create all your documentation manually. Instead, use the tools and utilities provided by the operating system and applications to gather the data, and then import it into other programs that make it easier to manage.
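Files collected this way can be turned into spreadsheet rows. The sketch below parses the indented “Label . . . : value” lines that ipconfig /all prints and writes a CSV row per adapter. The sample text is an abbreviated illustration in the older style, where the address line is labeled “IP Address”; newer Windows versions use labels such as “IPv4 Address,” so check the field names against your own output before relying on them.

```python
import csv
import re
import sys

# Abbreviated, illustrative ipconfig /all output (older "IP Address" style).
SAMPLE_OUTPUT = """
Windows IP Configuration

   Host Name . . . . . . . . . . . . : ACCT-PC07
   Primary Dns Suffix  . . . . . . . : example.local

Ethernet adapter Local Area Connection:

   Physical Address. . . . . . . . . : 00-0C-29-3A-5B-7C
   DHCP Enabled. . . . . . . . . . . : Yes
   IP Address. . . . . . . . . . . . : 192.168.10.57
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . : 192.168.10.1
"""

# An indented "Label . . . . : value" line; the dot leaders are discarded.
FIELD_LINE = re.compile(r"^\s+(?P<label>[A-Za-z][A-Za-z0-9 ()-]*?)[ .]*:\s*(?P<value>.*)$")


def parse_ipconfig(text: str) -> dict[str, dict[str, str]]:
    """Group label/value pairs under the section header they appear in."""
    sections: dict[str, dict[str, str]] = {}
    current = "General"
    for line in text.splitlines():
        if line and not line[0].isspace():
            # Unindented lines are headers, e.g. "Ethernet adapter ...:".
            current = line.rstrip(":").strip()
            continue
        match = FIELD_LINE.match(line)
        if match:
            sections.setdefault(current, {})[match["label"]] = match["value"].strip()
    return sections


if __name__ == "__main__":
    parsed = parse_ipconfig(SAMPLE_OUTPUT)
    host = parsed.get("Windows IP Configuration", {}).get("Host Name", "")
    writer = csv.writer(sys.stdout)
    writer.writerow(["host", "adapter", "ip_address", "subnet_mask", "gateway"])
    for section, fields in parsed.items():
        if "IP Address" in fields:
            writer.writerow([host, section, fields["IP Address"],
                             fields.get("Subnet Mask", ""), fields.get("Default Gateway", "")])
```

Running it over a folder of per-computer files and appending the rows to one CSV gives you an IP inventory you can open in a spreadsheet.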


Other important things you may want to consider keeping track of for individual computers include the particulars of the hardware that make up the system, any customizations made on the system that aren’t part of a standard, and the user(s) of the system. If the computer is a server on your network, it’s a good idea to keep track of contact phone numbers for client representatives so that you can keep them informed during any troubleshooting efforts or downtime.

Online and Paper Documentation

The paperless office that was forecast during the early days of the PC revolution in the 1980s has yet to come about. No matter how small PDAs and laptops become, it’s generally easier to sit down with a printed manual. Having to stare at a screen for hours at a time can be a lot more cumbersome. Although word processors and other programs are great at making it easy to find information quickly, sometimes the best option is to print things for easier handling.

Today it is not uncommon to find paper documentation being replaced by hyperlinked text files on a Web site. Instead of looking in the index of a book to find the information you need, you can utilize the Web. A Web site can be useful for several reasons. First, for common problems, a simple FAQ document can help end users solve problems themselves so that your help desk doesn’t get a call. Second, for those who do sit at a help desk, clicking through a set of links to find information can be faster than having to juggle one or more manuals and talk to the end user on the phone at the same time.

User Feedback Can Improve Documentation

You can easily judge how well your documentation assists end users by soliciting feedback. Even the best-looking documents are of little value if the end user can’t make sense of the content. After you’ve created any kind of documentation, be sure to give users a way to send you questions or comments about it. Take these suggestions into consideration when it comes time to make updates.

Problem-Solving Techniques

After you’ve got a well-documented network, all you have to do is sit back and wait for problems to occur. Facetious as that may sound, it’s true. Sometime, some day, when you least expect it, something out of the blue will knock a server offline, disable a printer, and so on. If you have good documentation, you can tackle the problem in a structured way.

The troubleshooting method known as the problem resolution cycle builds on accurate documentation for the network and uses a simple question/answer technique to determine what has changed to bring about the problem.

The Problem Resolution Cycle

The problem resolution cycle is a method designed to meet two needs: to solve the immediate problem that prevents the network (or a component of the network) from working, and to provide insight into the cause of the problem so that it can be avoided or quickly solved in the future. The elements of a structured problem resolution cycle are listed here:

• Accurate and complete descriptions of the symptoms. Determine whether a problem really exists, or whether the user is using the computer or application improperly.

• Understanding how the network functions from a logical and physical point of view.

• Solving the problem instead of creating a makeshift fix.

• Providing a follow-up mechanism for recording and distributing solutions to others who may have a need to know, such as staff at a help desk or a departmental supervisor.

• Development of a solution-tracking system to keep you from having to solve the same problem over and over again.

In most cases, the more data you can collect about a problem, the easier the problem will be to solve. When selecting employees who will serve as help-desk personnel, for example, try to get someone with both good verbal and good listening skills, not just someone with technical know-how. Although the initial problem report might be something like “I can’t print this document,” a good help-desk technician can usually walk the user through a series of questions to determine whether other symptoms are present. In the example just given, it would be prudent to ask whether the user can print other documents, or whether the problem is with just the one document. What about different types of documents?

If the user can print a spreadsheet but not a word processor document, the problem may be with the application. If the user can print a text document with a laser printer but not a document containing a lot of graphics, the printer might not have enough memory to hold the document. Another good question would be to ask whether any other users of the printer are having a problem. As you gather more data, you can focus your troubleshooting efforts on the local user PC or the printer. If the user can’t print anything but no one else is having a problem, you can begin to troubleshoot the printer configuration (has the user made changes you are unaware of?). Or perhaps the user has lost network connectivity and it’s a simple matter to try to ping the computer. You can use utilities such as ping or tracert to determine whether connectivity exists between the user and the printer or print server. After that, you could start investigating to be sure that the correct print driver is installed, and so on.
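ping and tracert tell you whether a host is reachable, but not whether the print service on it is accepting connections. As a complement to those tools (not a replacement), the following sketch attempts a TCP connection to ports commonly used for printing: 9100 (raw or JetDirect-style printing), 515 (LPD), and 631 (IPP). Which ports matter depends on your printers, and a firewall can make a working service look closed, so treat a failure as a lead rather than a verdict.

```python
import socket

# Ports commonly used for network printing; which apply depends on your printers.
PRINT_PORTS = {9100: "raw/JetDirect", 515: "LPD", 631: "IPP"}


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts, and name-resolution failures all land here.
        return False


def check_print_server(host: str, timeout: float = 2.0) -> dict[int, bool]:
    """Try each printing port on the given host."""
    return {port: tcp_port_open(host, port, timeout) for port in PRINT_PORTS}
```

For example, check_print_server("print-srv01"), where print-srv01 is a hypothetical host name, returns a dictionary mapping each port to True or False. Run it from the user’s computer and from one that prints correctly, and compare the results.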

Utilities such as ping and tracert are covered in Chapter 27, “Troubleshooting Tools for TCP/IP Networks.”

This brings up the network maps mentioned earlier in this chapter. You can quickly locate what hub, switch, or other network device the user’s computer is attached to by using a physical map of the network. Using a logical map, you can find other users or computers that make use of the same information flow through the network.
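Even a simple machine-readable version of the two maps can answer these questions quickly. This sketch assumes hypothetical inventory records that tie each computer to a switch and port (the physical map) and to a logical group such as a domain or VLAN (the logical map).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    switch: str  # physical map: the switch (or hub) the computer plugs into
    port: int
    group: str   # logical map: e.g. domain, VLAN, or subnet


# Hypothetical inventory; in practice, export it from your mapping tool.
NODES = [
    Node("ACCT-PC07", "sw-floor2-a", 14, "ACCOUNTING"),
    Node("ACCT-PC08", "sw-floor2-a", 15, "ACCOUNTING"),
    Node("SALES-PC03", "sw-floor2-a", 22, "SALES"),
    Node("ACCT-PC12", "sw-floor3-b", 3, "ACCOUNTING"),
]


def find(name: str) -> Node:
    for node in NODES:
        if node.name == name:
            return node
    raise KeyError(f"{name} is not in the inventory")


def same_switch(name: str) -> list[str]:
    """Other computers attached to the same switch (physical neighbors)."""
    target = find(name)
    return [n.name for n in NODES if n.switch == target.switch and n.name != name]


def same_group(name: str) -> list[str]:
    """Other computers in the same logical group."""
    target = find(name)
    return [n.name for n in NODES if n.group == target.group and n.name != name]
```

If the other computers on the same switch are also failing, suspect the switch or its uplink; if the problem instead follows the logical group across switches, look at something the group shares, such as a server or a configuration setting.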

Sometimes things just fix themselves. For example, it may be that the user could not print because a router standing between the user and the printer was overloaded temporarily and was not able to route packets from the user’s network segment to the printer. In these situations, don’t let sleeping dogs lie. Instead, keep investigating (using your network maps) and try to determine what caused the problem. You can use performance and capacity reporting techniques for servers and network devices.

In the next chapter we’ll talk about the Simple Network Management Protocol (SNMP) and RMON (Remote Network Monitoring), which enable you to gather statistical information about network devices. Find out what caused a problem so that you can anticipate when it might happen again, and try to take measures to prevent it.

Keep track of all incidents in an orderly fashion, and make the information known to others who might encounter the same problem. A help desk should have a log of some sort so that every problem called into the help desk is tracked from the time the call is placed until the problem is solved and the call is closed. Provide feedback to the user about how the problem was solved. This is especially important when you have problems that are self-induced, such as when users try to change the configuration of their computer although they know only enough to be dangerous to themselves!

Don’t repeat past mistakes. By tracking problems and recording the troubleshooting effort and the solution to the problem, you make it easier to solve the same, or similar, problems in the future. Your help desk should have a database of some sort (such as a spreadsheet, or perhaps a Web site with documentation linked via HTML code) that can be used to see whether a problem with similar symptoms has been called in before.
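Whatever tool holds the log, the core of it is one record per incident, with open and close times, plus a way to search closed incidents by symptom. The sketch below is a minimal illustration; the field names, sample entries, and simple keyword-overlap matching are assumptions, and a real help-desk package would add ticket numbers, status history, and better search.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Incident:
    opened: datetime                   # when the call was placed
    symptoms: str
    tools_used: str = ""
    resolution: str = ""
    closed: Optional[datetime] = None  # None while the call is still open

    def time_open(self) -> Optional[timedelta]:
        return None if self.closed is None else self.closed - self.opened


def keywords(text: str) -> set[str]:
    """Lowercased words longer than three letters, punctuation trimmed."""
    words = (word.strip(".,;:!?\"'").lower() for word in text.split())
    return {word for word in words if len(word) > 3}


def similar_incidents(log: list[Incident], symptoms: str) -> list[Incident]:
    """Closed incidents sharing symptom keywords, best matches first."""
    wanted = keywords(symptoms)
    scored = [(len(wanted & keywords(i.symptoms)), i) for i in log if i.closed is not None]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [incident for _, incident in scored]


# Hypothetical entries for illustration.
LOG = [
    Incident(datetime(2024, 3, 4, 9, 0), "Cannot print graphics-heavy document to laser printer",
             tools_used="printer test page", resolution="Added printer memory",
             closed=datetime(2024, 3, 4, 11, 0)),
    Incident(datetime(2024, 3, 5, 10, 0), "Email client hangs on startup",
             resolution="Rebuilt mail profile", closed=datetime(2024, 3, 5, 10, 30)),
    Incident(datetime(2024, 3, 6, 14, 0), "Printer shows offline"),  # still open
]
```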

Is There Really a Problem?

Sometimes, as noted in the preceding section, problems just fix themselves. There are times when you can’t ever find the reason for a particular problem. In many cases, you’ll find that sporadic problems are caused not by equipment or software failure, but by users who are not using the system correctly. When any new application is deployed on a network, you need to be sure that the end users receive adequate training for using the application or else you may find that user errors begin to account for many of your help-desk calls. For example, a user may have corrupted files on a hard disk. Should you replace the disk? Should you search for a virus or another harmful program? These sound like logical things to do.

Or you could simply ask whether the user is properly shutting down the computer or just “power cycling” it when he gets stuck in an application and can’t find a way out. Some people find that just turning a computer off and back on again is a fine way to start anew, without realizing the problems they may encounter down the line. So, when troubleshooting, try to find out what has led up to the problem. It may be a simple case of user training that needs to be addressed.

I can’t stress enough the importance of training new users in the workings of the environment in which they will be placed. If you have configured a desktop in a certain manner, you can’t assume that a new employee will be able to make proper use of it. Although it’s easy to check someone’s résumé to determine what applications they are skilled at using, it’s difficult to be sure what the configuration of the application was at their previous place of employment. The same goes for training classes offered by temp agencies and other similar organizations. Although they may have used a standard installation for training purposes, any customizations or configuration changes you make need to be explained to the new user. So, as a general rule, no matter how qualified a new employee may appear to be, it’s just an appearance. You should have in place a structured training program and require each new employee to attend, or at least initiate a mentoring system so that one user can teach another.


Tip

Remember that training doesn’t stop at new hire orientation. As the network, applications, and so on change over time, retraining should also be a requirement.

When service packs for client and server operating systems or applications are installed, make sure you know whether the service pack changes the operating system or application’s user interface. For example, the move from the original or Service Pack 1 (SP1) version of Windows XP to Service Pack 2 (SP2) introduced a revamped firewall, changes to the Hardware tab of the Device Manager, a new Security Center, and other changes. In rare cases, some applications that ran flawlessly before SP2 couldn’t even start up once it was applied! Make sure help desk employees and computer users in general are aware of changes like this and know how to deal with them.


Has This Happened Before—What Is the Procedure to Follow?

Keeping track of how problems were solved will keep you from expending a lot of effort solving the same problem again and again. Using documentation that enables a quick lookup of information based on symptoms can help you find older problem reports or perhaps standard help-desk documentation that was written specifically because a particular problem frequently occurs. Indeed, when a problem does occur frequently, it’s time to find a better solution to the problem. So by tracking problems and the methods used to troubleshoot and solve the problem, you can not only find it easier to solve the current incident, but also provide a feedback mechanism so that you will know that a particular problem needs a better long-term solution.
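Spotting the problems that recur often enough to deserve a long-term fix can be as simple as counting calls by category. In this sketch the categories and the threshold of three calls are arbitrary assumptions.

```python
from collections import Counter


def recurring(categories: list[str], threshold: int = 3) -> list[tuple[str, int]]:
    """Problem categories reported at least `threshold` times, most frequent first."""
    counts = Counter(category.strip().lower() for category in categories)
    return [(name, n) for name, n in counts.most_common() if n >= threshold]


# Hypothetical month of help-desk call categories.
MONTH = ["password reset", "mapped drive missing", "Password reset", "vpn login",
         "password reset", "mapped drive missing", "mapped drive missing",
         "mapped drive missing"]

if __name__ == "__main__":
    for name, n in recurring(MONTH):
        print(f"{name}: {n} calls; candidate for a long-term fix")
```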

For problems that occur on a frequent basis, but that you don’t have a lot of control over (such as a user causing errors by not using an application or the network in the appropriate manner), you can at least create a step-by-step outline for solving the problem to make life at the help desk a little less frustrating.


Tip

If your help-desk personnel are bombarded with the same questions over and over again, consider creating a FAQ (frequently asked questions) document for each application or other topic, and distribute it in print form and via the corporate intranet or internal network. A basic FAQ lists questions and answers. If the document becomes more than one printed page or a couple of screens online, consider breaking it up into sections.

To make the FAQ more useful, create it in HTML, add figures if appropriate, and use hyperlinks to point to particular sections, examples, or additional internal or external resources.

Make sure users and help-desk personnel know how to access the FAQ. If nobody knows where to find it, it won’t help relieve pressure on your help desk.


First Things First: The Process of Elimination

If you understand how your network is put together, from both a logical and a physical point of view, then it is possible to use the process of elimination to narrow the focus of your troubleshooting efforts. Some things to think about when trying to pinpoint the cause of a network problem include the following:

• What devices (computers, hubs, switches, cables, and so on) are involved? Can you use troubleshooting tools to narrow your search to a single device or a subset of the network?

• If a single computer or device appears to be the only part of the network affected, what is unique about it? If another similar device is up and running, how do the devices differ in their configuration or location in the network?

• If the problem is occurring on multiple systems, what do they all have in common? Are they all on the same network segment? Do they all share a common subnet address? Do they all use the same path through the network to access a device or service that now appears to be unreachable?

• What task was the user performing when the problem occurred? Get specifics about exactly what the user was doing, both leading up to and at the moment the event occurred. For example, was he using more than one application, printing to more than one printer, or perhaps doing something he should not (like opening an email attachment that came from outside the local network)?

• Can the problem be reproduced? Walk the user through the same set of steps again and see whether the problem recurs. Next, try the same steps with another user to determine whether the problem is localized to one computer or is a symptom of a bigger problem or configuration issue.
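The “what do they have in common?” question can be answered mechanically if you record a few attributes for each system. This sketch, using hypothetical host names and attributes, keeps the attribute values shared by every affected system and discards any value that a working system also has. What remains is a list of suspects, not proof of the cause.

```python
Attributes = dict[str, str]


def common_factors(affected: dict[str, Attributes],
                   working: dict[str, Attributes]) -> Attributes:
    """Attribute values shared by every affected host and by no working host."""
    if not affected:
        return {}
    shared = set.intersection(*(set(attrs.items()) for attrs in affected.values()))
    on_working: set[tuple[str, str]] = set()
    for attrs in working.values():
        on_working |= set(attrs.items())
    return dict(sorted(shared - on_working))


# Hypothetical attributes gathered from the maps and inventory.
AFFECTED = {
    "ACCT-PC07": {"subnet": "192.168.10.0/24", "switch": "sw-floor2-a",
                  "gateway": "192.168.10.1", "os": "Windows XP SP2"},
    "SALES-PC03": {"subnet": "192.168.10.0/24", "switch": "sw-floor2-a",
                   "gateway": "192.168.10.1", "os": "Windows XP SP1"},
}
WORKING = {
    "ACCT-PC12": {"subnet": "192.168.10.0/24", "switch": "sw-floor3-b",
                  "gateway": "192.168.10.1", "os": "Windows XP SP2"},
}

if __name__ == "__main__":
    print(common_factors(AFFECTED, WORKING))
```

Here the subnet and gateway drop out because a working computer shares them, leaving the floor 2 switch as the first thing to examine.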

By narrowing your focus to only the section of the network that experiences the problem, you can more quickly look at the computers and other components of that part of the network to solve the problem. By reproducing the problem, you can be sure that you’ve isolated the cause. Eliminate the obvious (“Is it plugged in?”) and get to the specifics as quickly as you can. Actually, silly as it may sound, asking whether a computer is plugged in is really a very good question. More than once I’ve come in to work to find that a monitor or another device was off. A quick glance at the power strip can indicate that someone, perhaps a housekeeping employee, may have accidentally unplugged the strip, or flipped the switch to turn off the power.

Auditing the Network to Locate Problem Sources

It is important to know how your network operates from a logical and physical point of view. It’s also important to know the capacity of the components of the network, and the degree to which they are utilized. Sometimes problems are simply due to congestion on the network. You can detect these problems by using monitoring tools based on SNMP and RMON, and by baselining your network so that you know what its typical usage patterns are. Knowing when components of the network are stressed close to their usable capacity allows you to plan an upgrade to eliminate the bottleneck, or to reschedule user work habits to make more efficient use of the network.
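At its simplest, baselining means recording normal utilization and comparing new readings against it. The sketch below assumes you already have utilization percentages (for example, collected by an SNMP-based monitoring tool) and flags readings more than three standard deviations above the baseline mean, or above an 80 percent ceiling. Both thresholds are arbitrary starting points, not standards; tune them to your network.

```python
from statistics import mean, stdev


def build_baseline(samples: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation of utilization under normal load (needs 2+ samples)."""
    return mean(samples), stdev(samples)


def flag_readings(readings: list[float], baseline: tuple[float, float],
                  sigmas: float = 3.0, ceiling: float = 80.0) -> list[int]:
    """Indexes of readings that are unusually high or close to capacity."""
    avg, spread = baseline
    # A reading is flagged if it exceeds either limit, i.e. the lower of the two.
    limit = min(avg + sigmas * spread, ceiling)
    return [i for i, value in enumerate(readings) if value > limit]


# Hypothetical link utilization (%) sampled during a normal week.
BASELINE_SAMPLES = [22.0, 25.0, 24.0, 27.0, 23.0, 26.0, 24.0, 25.0]

if __name__ == "__main__":
    baseline = build_baseline(BASELINE_SAMPLES)
    print(flag_readings([24.0, 31.5, 26.0, 85.0], baseline))
```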

Pitfalls of Troubleshooting

Above all, when trying to research a problem on the network, remember that you are indeed on a network, and the actions you take can potentially affect other users. When using troubleshooting tools, be sure you first understand how they work and also the correct way to use them. For example, what procedure do you have in place to help users who forget their password? Is a simple call to the help desk all that is required to get the password changed? If so, how does the help-desk technician know who’s on the other end of the line? Although it may seem inconvenient to require the user to report to a supervisor or another person who is a delegated local authority to change his password, this technique is more secure than allowing a simple phone call to place your network in jeopardy.

Another example is using the route command to change routing tables. An experienced network technician should do this, and not as a quick fix for some network problem that you can’t quite put your finger on. If you don’t know why the routing problem is happening, don’t try to paper over it with a quick fix! You might end up causing other routers or computers to use less efficient routes and, in the long run, suffer degraded network performance. Understand the tools you use for your troubleshooting efforts.

In a complex network that involves DNS servers, DHCP servers, and possibly even WINS servers, you should be very careful before making changes on the fly. Again, I want to emphasize that a quick fix may solve the current problem, but can also possibly create another that you don’t become aware of until much later, after the damage has been done. Help-desk personnel should be required to contact experienced system administrators before changes are made on these types of servers.

A simple name change in a DNS server, for example, could render a server unreachable for everyone on your network if the wrong address or record type is entered by mistake. Along the same lines, a very common mistake is to assign a server a static IP address that falls within the range of addresses offered by a DHCP server. When the DHCP server leases that address to another client, the two computers end up with an IP address conflict, and one or both can lose network connectivity. Coordinate changes to important network databases and make sure that the person doing the work is fully competent to do it.
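Before assigning a static address, it is worth checking it against the DHCP scope. The sketch below uses Python’s standard ipaddress module; the scope boundaries and exclusion range are hypothetical, so take the real values from your DHCP server’s configuration or export. It treats an address inside the scope as safe only if it falls in an exclusion range; it does not model reservations.

```python
import ipaddress
from typing import Sequence


def conflicts_with_scope(static_ip: str, scope: tuple[str, str],
                         exclusions: Sequence[tuple[str, str]] = ()) -> bool:
    """True if the DHCP scope could lease static_ip (inside the scope, not excluded)."""
    addr = ipaddress.IPv4Address(static_ip)
    start, end = (ipaddress.IPv4Address(a) for a in scope)
    if not start <= addr <= end:
        return False
    for low, high in exclusions:
        if ipaddress.IPv4Address(low) <= addr <= ipaddress.IPv4Address(high):
            return False
    return True


SCOPE = ("192.168.10.50", "192.168.10.200")        # hypothetical scope
EXCLUSIONS = [("192.168.10.50", "192.168.10.59")]  # hypothetical exclusion range

if __name__ == "__main__":
    for candidate in ["192.168.10.5", "192.168.10.55", "192.168.10.120"]:
        verdict = "CONFLICT" if conflicts_with_scope(candidate, SCOPE, EXCLUSIONS) else "ok"
        print(candidate, verdict)
```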
