Chapter 2: Managing Common Problems
Exam Objectives
Recognizing and resolving Windows service problems
Identifying and correcting auto-restart and blue screen errors and system lockup problems
Knowing and fixing device driver failures
Resolving issues with missing DLL files
Distinguishing and rectifying application install, start, and load failures
In this chapter, you examine some common errors that you will encounter when servicing computers. I show you how to gather the information that you need to properly diagnose and correct the problem. You also look at several common errors and problems that you are likely to encounter when dealing with Windows-based systems.
And because these are some of the most common problems that you will run into at the OS level as a CompTIA A+ Certified Professional, this chapter will prepare you for these problems, and will help you on the exam.
Solving Windows-Specific Service Problems
Many jobs are performed on your Windows computer by services. Services are programs that operate in the background, typically programs without a user interface or windows. Most of the time, services will tick away in the background doing their jobs without an issue; but at times, they stop doing their job. A service is a program and is really no different from any other program; it may have programming flaws and may crash or stop responding. As an example, in this section, you take a look at the print spooler service.
Prior to print spooling being integrated with an OS, the application that was printing had to send each piece of data to the printer, halting the user from doing anything until the printing process was complete. A print spooler is used to quickly accept print data from the application, allowing control of the application to be returned to the user after a very brief spooling process. When the spooler takes the print data from the application, it stores the data in a temporary file, until the spooler has time to send the data out to the printer, or de-spool the data.
From time to time, when dealing with Windows, you can encounter a stalled print spooler. This seems to be a feature of the print spooler service. After a period of printing, and printing well, the print spooler decides to stop responding to further commands. This has been a long-running issue for Microsoft, but seems to happen less frequently with newer versions of the Windows OS. Although the print spooler service seems to be better behaved these days, other services on your computer may not be as well behaved.
When a service stops performing its job properly, the symptoms can vary. You can identify a stalled print spooler service from the following:
Users cannot add new jobs to the print queue.
Nobody can remove jobs from the print queue.
Existing jobs do not print.
The print queue appears empty even though print jobs were sent to it.
If the print queue exhibits these symptoms, restart the print spooler service; for any other stalled service, restart that service instead. Use either of the following methods to restart the spooler service:
The Services MMC snap-in
The Administrative Tools folder holds the Services console. In Windows XP, choose My Computer⇒Control Panel⇒Administrative Tools⇒Services, or in Windows Vista, choose Computer⇒Control Panel⇒System and Maintenance⇒Administrative Tools⇒Services. Locate the spooler service in the list of services, right-click it, and choose Restart (as shown in Figure 2-1). You could also choose Stop, wait for that action to complete, and then choose Start.
The net command
From the command prompt, use
net stop spooler
net start spooler
or
net stop "print spooler"
net start "print spooler"
If you don’t know the name of the service you want to start or stop, use net start with no service name, and Windows lists the services that are currently running.
If your print spooler is suffering from any problem, restarting the spooler service is usually the correct answer.
After the service restarts, your problems should be gone.
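The stop-then-start sequence can also be scripted. The following Python sketch is only an illustration: the function names restart_commands and restart_service are made up here, and the sketch assumes a Windows machine with net.exe on the path and a command prompt running with administrator rights.

```python
import subprocess


def restart_commands(service_name):
    """Return the two net.exe command lines that stop and then start a service."""
    # Passing the arguments as a list means a name with a space in it,
    # such as "print spooler", needs no extra quoting.
    return [
        ["net", "stop", service_name],
        ["net", "start", service_name],
    ]


def restart_service(service_name):
    """Run the stop and start commands in order (Windows only, needs admin rights)."""
    for command in restart_commands(service_name):
        # check=True raises CalledProcessError if net.exe reports a failure.
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch is safe to try.
    for command in restart_commands("spooler"):
        print(" ".join(command))
```

Calling restart_service("spooler") would actually run the two commands; the sketch only prints them by default so that nothing is stopped by accident.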
Figure 2-1: Restarting services is easy from the Services Administrative tool.
Solving Boot Errors and Errors Requiring Restarting
With the number of computers in most offices and how the law of averages works, it seems like there is always a problem that requires or causes a reboot. In large offices, it can seem like the Welcome to Windows chime is always playing. In fact, so many support people have used rebooting as the first step in dealing with problems that some users now instinctively reboot even before calling in a support person. This is not always the best course of action because it often removes information that can be used to locate the root of the problem.
Auto-restart errors
In many cases, auto-restart errors are caused by power-related issues, other hardware failures, or software configuration problems. If the computer auto-restarts, something is definitely wrong.
Some common issues that can cause auto-restarts include
Service configuration: When configuring Services, you can choose an action to be taken when the Service fails. One action is to restart the computer. If you have a Service that is failing and it is configured to restart the computer when it fails, this causes an auto-restart. This setting can be seen in Figure 2-2. More information on Services can be found in Book VI, Chapter 3.
Figure 2-2: Auto-restarting a service after a failure is just one of the options.
CPU fan: Most computers have a default thermal threshold at which the system shuts down temporarily. If the CPU fan fails, the CPU’s temperature quickly rises to the point where the motherboard interrupts the power to prevent damage to the CPU. Dust caked in and around the heat sink and fan has the same effect. A visual inspection of the CPU fan will show you whether it is running, or whether dirt or debris is the culprit. Book IV, Chapter 1, talks about ventilation methods as well as problems related to poor ventilation.
Power supply problem: Power supplies have limited life spans. When they fail, their demise is often complete and immediate. In some cases, they can supply some power for a period of time, but as they heat up, they cause an interruption in the power, thus restarting the computer. It is easier to diagnose a complete failure of the power supply rather than a power supply that fails when it is hot because for the latter, you need the power supply operating for a period of time — possibly days — before you see the problem. When in doubt, swap it out. Power supplies are not expensive and are easy to replace — and if it fixes the problem, you will know it quickly. For more on power supplies, see Book II, Chapter 7.
Power source problem: Some buildings are notorious for power spikes or drops. In some cases, the power can drop to a point that there is insufficient power for the computer, even though other electrical devices do not show any signs. If this problem happens regularly, it will likely affect a number of computers.
Choose from a variety of tools for testing the quality of the power coming from the wall. Some low-end, single-computer uninterruptible power supply (UPS) systems give you a display of the current voltage and provide line conditioning on a 120V line, down to 89V. The UPS can also supply some power when the voltage drops below that level. For more on power sources, see Book II, Chapter 7.
Bad memory: When memory is the problem (the computer’s, not yours), you might see the system running fine until the damaged or bad blocks of memory are used. At that time, the system spontaneously reboots itself or generates a Windows Stop error. Testing tools are available to test for bad memory, so this situation should be easy to diagnose after letting the system run with the testing software working through the blocks of memory. Of course, as the size of the system memory increases, so does the length of time that it takes to test it.
Network attack: The TCP/IP network stack is very complex, and with Services running over the stack, many software systems can have flaws. Many network attacks performed on Windows systems cause various buffer overflows, which then cause the system to reboot. Worms like Zotob fall into this category. Tools like TDIMon and TCPView (http://technet.microsoft.com/en-us/sysinternals/bb897437.aspx) can be used to view current Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) network activity on a computer, which can in turn be used to identify network attacks or problems. It is beneficial to learn how these tools work when you are not experiencing a problem because it will make it easier to identify nonstandard traffic. In most cases, keeping your system up to date on security patches and virus definitions offers a high level of protection. The only sure protection from network attacks is to not be connected to the network. For more information about network security, read through Book IX.
Automatic Updates: If your system is configured for Automatic Updates from the Microsoft Windows Update or Microsoft Update Web site, you might find that your computer reboots while you are not there. If the computer is configured to install updates in the middle of the night, you might see the next morning that it rebooted. Usually, there are warning messages leading up to the reboot although you might not be there to see them. To solve this issue, check the Windows Event Logs and the settings on Automatic Updates in the Control Panel. As part of the Windows Security Center, Windows suggests enabling Automatic Updates. Book IX, Chapter 3, has more information on the Windows Update process.
Unexpected reboots can happen for other reasons, but the preceding list describes the most common.
Blue screen errors
Blue screen errors (shown in Figure 2-3) are almost always related to driver, DLL, or configuration errors. Some people call them “Blue Screen of Death,” or BSOD, errors, but they are officially Stop errors. In many cases, these will be related to a driver change made just before the last reboot. If this is the case, try booting to the Last Known Good Configuration, which restores the system configuration to the state it was in at the last successful logon. For more information about booting to the Last Known Good Configuration, see Book VII, Chapter 3.
Figure 2-3: Stop errors are “affectionately” known as the Blue Screen of Death.
The most common reasons for Stop errors are
Service, application, or device errors
Compatibility problems
Hardware problems
File system corruption or errors
Compatibility issues with firmware or BIOS
Viruses
Stop errors occur when one driver or application attempts to access the memory used by another driver. Because drivers operate in the unprotected kernel memory space, this is very bad for system stability. When this happens, Windows does the only thing that it can think of to prevent further corruption — it stops everything. Because applications run in protected memory spaces, they cannot write to memory being used by another application. Thus, Stop errors are caused only by access attempts in the kernel memory space.
The image referenced in Figure 2-3 is a Stop error caused by an application, and the format of the text on the screen is not the same as the Stop error you might be more familiar with, which I discuss later in this section. On a standard Stop error, the first two lines of the message tell you the name of one of the drivers involved, or at least the memory locations involved.
Listing 2-1 shows text that might appear on a standard Stop error:
Listing 2-1: Sample Stop Error Message
*** STOP: 0x0000000A (0x802aa502, 0x00000002, 0x00000D00, 0xFA84001C)
IRQL_NOT_LESS_OR_EQUAL*** Address fa84001c has base fa840000 - i8042prt.SYS
CPUID: GenuineIntel 5.2.c irql:1f SYSVER 0xF0000565
Dll Base Date Stamp - Name Dll Base Date Stamp - Name
80100000 2be154c9 - ntoskrnl.exe 80400000 2bc153b0 - hal.dll
80200000 2bd49628 - ncrc710.sys 8025c000 2bd49688 - SCSIPORT.SYS
...
Address dword dump Build [1381] - Name
fe9cdaec fa84003c fa84003c 00000000 00000000 80149905 - i8042prt.SYS
fe9cdaf8 8025dfe0 8025dfe0 ff8e6b8c 80129c2c ff8e6b94 - SCSIPORT.SYS
fe9cdb10 8013e53a 8013e53a ff8e6b94 ff8e6f60 ff8e6b94 - ntoskrnl.exe
...
Hexadecimal (the Base 16 number system) is used to display all error codes and memory addresses on the Stop error. The first line tells you the error type numerically (0x0000000A), then the memory address of the code that was attempted to be accessed (0x802aa502), the type of access attempted or error parameter (0x00000002, 0x00000D00), and the memory address that made the access attempt (0xFA84001C). This is followed by a line providing you with a text-based description of the error and, if possible, the name of the driver that was found at that location. This driver might be the one that caused the error or the target of the error. After these lines is a section that lists the drivers found in the memory spaces near where the error occurred, then a section listing memory information for some of those memory spaces. If the error was a result of a recent change to the system, undo that change or update your drivers.
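To make the layout of the first line concrete, here is a small Python sketch that splits the sample Stop line from Listing 2-1 into its error code and four parameters, and then works out how far the faulting address sits from the driver’s base address. The function names parse_stop_line and offset_in_driver are invented for this illustration; nothing here is a Windows tool.

```python
import re

# Matches "*** STOP: <code> (<param>, <param>, <param>, <param>)".
STOP_LINE = re.compile(r"\*\*\* STOP: (0x[0-9A-Fa-f]+) \(([^)]*)\)")


def parse_stop_line(line):
    """Split a Stop error's first line into its code and its parameter list."""
    match = STOP_LINE.search(line)
    if match is None:
        raise ValueError("not a Stop error line")
    code = int(match.group(1), 16)  # base 16, because Stop screens use hexadecimal
    params = [int(p.strip(), 16) for p in match.group(2).split(",")]
    return code, params


def offset_in_driver(address, base):
    """Return how far past the driver's base address an address falls."""
    return address - base


line = "*** STOP: 0x0000000A (0x802aa502, 0x00000002, 0x00000D00, 0xFA84001C)"
code, params = parse_stop_line(line)
print(hex(code))                                     # 0xa
print(hex(offset_in_driver(params[3], 0xFA840000)))  # 0x1c
```

The second result shows why the Stop screen reports both an address and a base: 0xFA84001C lies 0x1C bytes past the base of the driver loaded at 0xFA840000.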
Some common Stop errors you may encounter are
0x0000000A IRQL_NOT_LESS_OR_EQUAL
0x0000001E KMODE_EXCEPTION_NOT_HANDLED
0x00000024 NTFS_FILE_SYSTEM
0x0000002E DATA_BUS_ERROR
0x00000050 PAGE_FAULT_IN_NONPAGED_AREA
0x0000007B INACCESSIBLE_BOOT_DEVICE
0x0000007F UNEXPECTED_KERNEL_MODE_TRAP
0xC000021A STATUS_SYSTEM_PROCESS_TERMINATED
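If you keep notes on the Stop errors you run into, a simple lookup table is a handy way to turn a numeric code into its name. The Python sketch below covers only the codes listed in this chapter; the names COMMON_STOP_CODES and describe_stop_code are made up for the example.

```python
# Stop codes listed in this chapter, keyed by their numeric value.
COMMON_STOP_CODES = {
    0x0000000A: "IRQL_NOT_LESS_OR_EQUAL",
    0x0000001E: "KMODE_EXCEPTION_NOT_HANDLED",
    0x00000024: "NTFS_FILE_SYSTEM",
    0x0000002E: "DATA_BUS_ERROR",
    0x00000050: "PAGE_FAULT_IN_NONPAGED_AREA",
    0x0000007B: "INACCESSIBLE_BOOT_DEVICE",
    0x0000007F: "UNEXPECTED_KERNEL_MODE_TRAP",
    0xC000021A: "STATUS_SYSTEM_PROCESS_TERMINATED",
}


def describe_stop_code(code):
    """Return the code as eight hex digits, followed by its name if known."""
    name = COMMON_STOP_CODES.get(code)
    if name is None:
        return f"0x{code:08X} (not in this chapter's list)"
    return f"0x{code:08X} {name}"


print(describe_stop_code(0x7B))  # 0x0000007B INACCESSIBLE_BOOT_DEVICE
```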
Blank screen on boot
When you boot your computer, at times you may find that you are looking at a blank screen. That may cause you to say, “What could be causing that?” Here is a short list of things that you can check:
If you do not see anything from the start of the POST: Ensure that your monitor is powered up and check all your video connections. If they are good, you could be looking at a bad video card, a defective monitor, or a completely dead computer. If this is a networked computer, verify whether the computer has completed its boot process and is visible on the network. If it is on the network, this is a video problem.
If you see POST information and then lose your video during the boot process, you know that your hardware is working; rather, you have an issue with the driver or screen resolution. Try booting to Safe Mode, which can use basic drivers and lower screen resolutions. From there, you should be able to adjust the resolution or change the driver that will be used on the next boot.
System lockup
System lockup happens when your computer stops responding to any system functions or user input. If you are using Windows XP or Vista, system lockups should be rare compared with older OSes, such as Windows 9x. More likely, you will have a period of slow responsiveness. If you are using older Windows OSes, you are likely to have complete system lockups. This has to do with how the operating systems work. Read Book VI, Chapter 2 to see how the Windows XP system architecture has been designed to prevent lockups.
If you are using Windows XP or newer Windows OS, you might experience a runaway application or Service that can cause the system to become unresponsive. This unresponsiveness might fool you into believing that your computer is locked up, when in fact it just doesn’t have enough clock cycles to pay attention to you. Press Ctrl+Alt+Del to open the security dialog box (shown in Figure 2-4), where you can launch Task Manager.
I have seen Windows XP lock completely up very few times, but there have been many other times when I have powered a machine off because it was slow to respond.
Figure 2-4: Use the Security dialog box in Windows XP to launch Task Manager.
How long you wait for an unresponsive system is typically related to the number of critical unsaved documents that you had open when it became unresponsive. Prior to turning off the computer’s power, I have waited an hour to get the dialog boxes to close Microsoft Word and save my documents. Turning off the power (as opposed to performing a graceful shutdown) is usually to be avoided because of the risk of disk corruption. This wait is not typical because in most cases if the system is unresponsive to the point that Task Manager takes more than five minutes to open, I usually hit the power button to kill the power to the computer.
The Processes tab of Task Manager lets you see which application is the runaway. Look for the process with the highest CPU value; that is the one hogging the CPU. You can try to switch to the application and close it normally, but there is a good chance that this will not work. If you cannot shut down the program normally, return to Task Manager, select the process, and click the End Process button. This should return your system to its previous level of responsiveness. In some cases, the process at fault will be a critical system process, like WinLogon, which cannot be stopped, so you will just have to reboot.
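The “look for the highest CPU value” rule is easy to express in code. The Python sketch below works on a made-up set of readings (process name mapped to CPU percentage) rather than on live Task Manager data; the function name find_runaway and the 90 percent threshold are assumptions chosen for the example. It skips System Idle Process, because a high value there means the CPU is free, not busy.

```python
def find_runaway(samples, threshold=90.0):
    """Return (name, cpu_percent) for the busiest process, or None.

    samples maps a process name to its CPU percentage. A process counts
    as a runaway only if it uses at least `threshold` percent of the CPU.
    """
    # System Idle Process reports unused CPU time, so leave it out.
    busy = {name: cpu for name, cpu in samples.items()
            if name != "System Idle Process"}
    if not busy:
        return None
    name, cpu = max(busy.items(), key=lambda item: item[1])
    return (name, cpu) if cpu >= threshold else None


# Hypothetical readings, as you might copy them from the Processes tab.
readings = {"winword.exe": 97.0, "explorer.exe": 1.0, "System Idle Process": 2.0}
print(find_runaway(readings))  # ('winword.exe', 97.0)
```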
If you are using 16-bit Windows applications on Windows XP or Vista, you are in for a different experience. Book VI, Chapter 2 examines how the system architecture contributes to a lack of system stability. This is mainly because OS components run in the same area of the system in which 16-bit applications execute. When any 16-bit application crashes, it has the potential to lock the 16-bit OS components as well, which can lock up an entire NTVDM (NT Virtual DOS Machine) but not the entire OS. If this happens, your only choice will be to terminate the NTVDM that has the problem.
Resolving Device Driver and Service Errors
Other than a system Stop error, device drivers or services might fail to load for a number of reasons:
Version: The version of the driver or service might not be compatible with the OS version.
Configuration: The driver or service could be misconfigured, and some settings require changing.
Incompatibility: The driver or service might not be compatible with some other driver, service, or application running on the system.
Dependency: The driver or service may be dependent on another driver or service that has not started up.
In most cases, the startup error for the driver will be listed in the Event Viewer error logs. Open Event Viewer from Start⇒Control Panel⇒Administrative Tools⇒Event Viewer; then click one of the three log files to view the error logs. After locating the error in Event Viewer, you may use other OS tools, such as Device Manager or Services, to correct the problem. Typically, the long-term fix for these issues involves a visit to the vendor’s Web site and downloading the latest version of the software or reviewing the vendor knowledge base for a fix to the compatibility problem. In rare cases, it might mean changing to a different version of the device or service, possibly from a different manufacturer.
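The dependency case in the preceding list lends itself to a quick check: given what a service depends on and what is currently running, list what is missing. The Python sketch below uses made-up sample data and an invented function name, unmet_dependencies; on a real system you would read the dependencies from the service’s properties in the Services console.

```python
def unmet_dependencies(service, depends_on, running):
    """List the services that `service` needs but that are not running."""
    return [dep for dep in depends_on.get(service, []) if dep not in running]


# Sample data for illustration only.
depends_on = {"Spooler": ["RPCSS"], "ExampleService": ["Spooler", "ExampleDriver"]}
running = {"RPCSS", "Spooler"}

print(unmet_dependencies("Spooler", depends_on, running))         # []
print(unmet_dependencies("ExampleService", depends_on, running))  # ['ExampleDriver']
```

An empty list means every dependency is running, so the cause of the failure lies elsewhere, such as version or configuration.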
Application Install, Start, and Load Failures
From time to time, you will have applications that will not start. A few reasons for this problem include corrupted or damaged shortcuts, damaged or corrupted settings, missing files, or corrupted memory space. In the following sections, I take a closer look at these four types of errors as well as application installation errors.
Corrupted shortcuts
When you create a shortcut to a program, the shortcut records information about the target file, such as size and creation date. If something happens to the original file, such as being moved to another directory, the shortcut will attempt to search the hard drive and repair the shortcut. In older versions of Windows, you will be prompted to verify that the correct file was found. If you are using Windows XP or newer OSes, automatic link tracking is enabled on shortcuts. What this means to you is that unless the file was deleted — in which case you will be asked to delete the shortcut — your shortcut links will always work, no matter how much you move a target file.
Damaged settings
Applications may store settings in a variety of locations, from the Registry to .ini files to custom settings files. If the application is failing to start due to corrupted settings, those settings need to be corrected or removed. Where you go to resolve the issue depends on where the settings are stored. So, you may need to locate the settings in the Registry, or edit or delete the configuration file in which the settings are stored.
Missing files or DLLs
If the computer user has been performing some manual disk cleanup, has been running programs from a server, or has suffered from disk corruption, required files might be missing. Depending on the error handling that the programmer is using, it may report to the user exactly which files are missing from the computer. In any case, if the running program has lost access to the files, this connection to the files needs to be restored. For users running programs from a network server, you may need to follow network troubleshooting processes to restore the network connections. For missing local files, you may need to copy the missing files from another computer with the application installed, or reinstall the application.
If the missing files are OS files rather than application files, you should look at the Emergency Repair process, covered in Chapter 3 of this minibook. Alternatively, for missing OS files, you may be required to reinstall Windows.
Corrupted memory space
Corrupted memory space occurs mostly with 16-bit Windows applications. Occasionally, 16-bit Windows applications can crash completely or partially when they are closed. When a 16-bit application crashes, it has a chance to corrupt the 16-bit Windows operating environment. If this happens, you will find that you cannot open or launch 16-bit Windows applications even though 32-bit Windows applications run fine.
If you are using older versions of Windows, you may have to reboot to allow a new 16-bit Windows environment to load. If you are using Windows XP, Windows Vista, or Windows 7, you can terminate the 16-bit Windows environment through Task Manager by following these steps:
1. Press Ctrl+Alt+Del and click the Task Manager button.
2. In Task Manager, click the Processes tab.
3. Locate the ntvdm.exe entry that has only wowexec.exe indented beneath it in the column.
4. Right-click it and choose End Process.
The next time you load a 16-bit Windows application, it will reload the default 16-bit Windows environment.
Applications will not install
In some cases, you will find that applications just do not install. This may be caused by the application performing a compatibility test, which your computer fails. If this is the case, you should be notified of the compliance failure by the application. If your computer is less powerful than recommended, it should be upgraded to support the application.
In some cases, specifically with Windows Installer applications, you might not be able to perform an installation if there is a previous installation that is still pending a reboot. The older setup applications might not install if they see another copy of setup.exe running. If this copy of setup is not expected to be running, check Task Manager to see whether it is there. Rebooting the system will correct both issues.
In addition to compliance failure, there might be a problem with how the setup program was written. Very often, the setup program itself is a 16-bit Windows-based application. If you recently had a 16-bit Windows application crash on your system and you have not corrected the corruption of the 16-bit memory space by rebooting or by terminating the NTVDM in Windows, the setup program might fail.
Solving Other Problems
In addition to the problems already discussed in this chapter, a number of other things may go wrong. In the following sections, I examine some of those problems.
General Protection Faults
General Protection Faults (GPFs) are OS-level errors. When applications are running, Windows prevents applications from interfering with each other by running them in their own memory space. However, some applications share a memory space, and these applications are mostly 16-bit applications. When a component attempts to reference memory that does not belong to it, Windows generates a GPF and attempts to prevent the improper reference by terminating the offending application.
To reduce the occurrence of GPFs, reduce the number of running 16-bit applications.
Illegal operation
An illegal operation error is similar to the GPF. In most cases, but not necessarily all, the illegal operation is a memory reference problem. When one 32-bit application attempts to reference an area of memory that belongs to another application or that it has somehow corrupted, it generates an illegal operation. When this happens, Windows treats it as a rogue or damaged application and terminates it before it can cause damage outside its own memory space.
The big difference between illegal operations and GPFs is which components are in use and affected at the time. In most cases, you can recover from an illegal operation by relaunching the application. Reducing the number of open applications reduces the chance of illegal operations because less space will be considered “out of bounds.”
Device will not function
Many devices might be attached to your computer from time to time, but besides the keyboard, the three devices that most computers have are a sound card, modem, and mouse. These devices all require an IRQ and an I/O address. You will have other devices that take resources. If all the devices are Plug and Play, and you have a Plug and Play BIOS, this is not an issue. But if you are using legacy devices, you will have to configure IRQ and I/O settings for your devices. Sound cards usually use IRQ 5, and your COM ports (for your modem) are IRQ 3 and IRQ 4. If you are using a serial mouse, you want to make sure that it is not sharing its IRQ with COM 3 or COM 4. To find out more about configuring resources, read Book III, Chapter 4.
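Checking for a shared IRQ comes down to grouping devices by IRQ number and looking for any group with more than one member. The Python sketch below does this for a made-up set of legacy assignments; the function name find_irq_conflicts and the device list are assumptions for illustration, not output from any Windows tool.

```python
from collections import defaultdict


def find_irq_conflicts(assignments):
    """Return {irq: [devices]} for every IRQ claimed by more than one device."""
    by_irq = defaultdict(list)
    for device, irq in assignments.items():
        by_irq[irq].append(device)
    return {irq: sorted(devices)
            for irq, devices in by_irq.items() if len(devices) > 1}


# Hypothetical legacy configuration: the serial mouse and COM3 both use IRQ 4.
assignments = {
    "sound card": 5,
    "COM1 (serial mouse)": 4,
    "COM2 (modem)": 3,
    "COM3": 4,
}
print(find_irq_conflicts(assignments))  # {4: ['COM1 (serial mouse)', 'COM3']}
```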
In addition to configuring the hardware support for the devices, you also have to load the appropriate driver for the device. This can be done through Add New Hardware or Add a Device in the Control Panel. Later versions of Windows, like Windows Vista and Windows 7, rely heavily on Windows Update to locate and install the correct drivers for your devices. For more information about installing devices on your computer, see Book VI, Chapter 1.
Getting an A+
This chapter goes over some problems that computer support personnel typically encounter. The problems include
Dealing with stalled print spooler and other services by restarting services
Identifying root causes for auto-restart and Stop errors
Using Event Viewer to diagnose driver problems
Dealing with application start problems in the form of shortcuts, settings, or memory space issues
Prep Test
1 When the print queue is unresponsive, what should you do first?
A Reboot the server.
B Reinstall the printer.
C Restart the spooler service.
D Redirect the printer to a remote print device.
2 What command is used to restart a service from the command prompt?
A service.exe
B services.exe
C cmdrun.exe
D net.exe
3 Only 16-bit Windows applications will produce GPFs. True or False?
A True
B False
4 What program can be used to terminate programs running on your computer?
A Device Manager
B Computer Administration Console
C System Information
D Task Manager
5 Your computer has had a Stop error, and the first two lines read
*** STOP: 0x0000000A (0x802aa502, 0x00000002, 0x00000D00, 0xFA84001C)
IRQL_NOT_LESS_OR_EQUAL*** Address fa84001c has base fa840000 - i8042prt.SYS
Which of the following statements are true? (Choose all that apply.)
A The type of error was IRQL_NOT_LESS_OR_EQUAL.
B The error is due to the BIOS version.
C The driver i8042prt.sys was involved in the error.
D The driver i8042prt.sys is located at memory address 0x802aa502.
6 What Windows XP function keeps your shortcut links from breaking?
A Link Tracking
B Shortcut Locator Service (SLS)
C System Information
D Task Manager
7 What tool should you check first when things are not working properly and you suspect an error, but you have not seen an error message?
A System Information
B WinMSD
C Device Manager
D Event Viewer
8 During your computer’s boot process, your computer reboots automatically. There is no sign of a blue screen error. What are the most likely issues? (Choose all that apply.)
A Hardware problem with power supply
B Driver issue
C Service configuration
D Memory error
Answers
1 C. Restarting the print spooler service should fix problems with the print queue. If it does not, then you may have to restart the server. See “Solving Windows-Specific Service Problems.”
2 D. net.exe is the command that can stop and start services from the command prompt. Review “Solving Windows-Specific Service Problems.”
3 B. This is false. 16-bit applications will GPF more often and are responsible for most, but not all, GPFs. Peruse “General Protection Faults.”
4 D. Task Manager can terminate running applications or background processes. Device Manager can be used to configure hardware devices. System Information can provide status and configuration information about most areas of your computer. Take a look at “System lockup.”
5 A, C. The error is of type 0x0000000A: IRQL_NOT_LESS_OR_EQUAL. The error involves i8042prt.sys, which is found at memory address 0xFA840000, not 0x802aa502. There is no information suggesting that the problem may involve the system BIOS. Peek at “Blue screen errors.”
6 A. Link tracking is the OS function that keeps track of the executables that shortcuts refer to. Look over “Corrupted shortcuts.”
7 D. Event Viewer is one of the first utilities that you should use when trying to find out why your system is not responding correctly. WinMSD is the old term used for the System Information tool. Device Manager would list devices that did not start up correctly but would be unrelated to some errors that your system may experience. Study “Resolving Device Driver and Service Errors.”
8 A, C, D. The only item that it might not be is a driver issue. Most of the time, if there is a driver issue, the driver will either not start up or will trigger a Stop (blue screen) error. Refer to “Solving Boot Errors and Errors Requiring Restarting.”