Chapter 21. Troubleshooting Problems

The Microsoft Office Communications Server 2007 R2 product suite builds on the functionality and features of Microsoft Office Communications Server 2007. However, to enable these new functions and feature sets, the product relies on many complex and sophisticated technologies distributed across several machines on a variety of networks, with different software components installed on each. Inconsistent local and global settings, network connectivity issues, and failures of external dependencies can cause problems that are difficult to troubleshoot. The goal of this chapter is to provide a systematic troubleshooting process that guides architects and administrators through hurdles they may encounter during the initial deployment. It is also intended to provide a framework that experts can use to explore more complex issues.

On the Companion Media

You can find links to additional resources on the topics addressed in this chapter, and throughout the book, on this book's companion CD.

Troubleshooting Process

This chapter identifies common problems and illustrates effective ways to diagnose a problem’s source. This troubleshooting framework is engineered to isolate the component(s) responsible for the problem so that the steps needed to find a resolution may be properly researched. Guidelines for the safe implementation of the changes required for resolution are also included.

The remainder of the chapter focuses on problem scenarios and includes specific ways to gather information to resolve the most common issues. For reference, most of the tools and resources that are used to work through these problems have been described in Chapter 20, which also includes information on where to locate them.

Determining the Root Cause

The purpose of troubleshooting an issue is to determine the root cause, which starts by understanding the issue’s symptoms. Often the symptoms reported can mask a deeper issue; an analysis of the symptoms will help isolate the components causing the problem. Once the faulty component is isolated, collect the relevant logs and use them to research the root cause. It is important to isolate the troubled component prior to starting the data collection process—this prevents information overload. If the problem appears complex or involves multiple components, the collection of preliminary data may be necessary to assist with root cause determination.

The next few sections outline the various steps of root cause determination.

Understanding the Symptoms

The first step in the troubleshooting process is to document and understand the symptoms. Common examples of symptoms include being unable to connect to a Web conference, a failed desktop sharing attempt, or less noticeable issues such as incorrect presence information. This step can lead an architect or administrator to the wrong conclusion if the symptoms first reported were incorrect or inconsistent. In resolving problems in a complex, distributed environment, understanding the full nature of the symptoms is key to reducing the time and effort spent determining the next steps.

During symptom review, start to look for underlying relationships. Common correlations to look for include the following:

  • Determine which users are impacted. Are they in the same Active Directory organizational unit (OU)? Do they belong to the same security group? Is a particular feature disabled for all of them? Are they in the same physical location?

  • Determine when the issue occurs. Does it occur the same time every day? Does the issue happen immediately after another process, such as a backup, completes? Does the issue occur during peak hours?

  • Determine whether the issue is related to a specific server. Are users experiencing the same problem homed on the same pool? Are call failures routed to the same Mediation Server?

Finding these correlations is often key to determining the problem component(s). Document your findings as you go to prevent duplicated work and to make it easier to share information with team members.
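Once the affected users have been listed, the search for a common factor can be done mechanically. The following is a minimal sketch, assuming you have already gathered each affected user's OU, home pool, and location into simple records; the `Report` type, the `shared_attributes` function, and all sample values (including the contoso.com names) are hypothetical and used only for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    """One affected user's symptom report (hypothetical fields for illustration)."""
    user: str
    ou: str    # Active Directory organizational unit
    pool: str  # pool the user is homed on
    site: str  # physical location


def shared_attributes(reports, threshold=1.0):
    """Return attribute values shared by at least `threshold` of the reports.

    threshold=1.0 keeps only values common to every affected user.
    """
    if not reports:
        return {}
    shared = {}
    for name in ("ou", "pool", "site"):
        counts = Counter(getattr(r, name) for r in reports)
        value, count = counts.most_common(1)[0]
        if count / len(reports) >= threshold:
            shared[name] = value
    return shared


reports = [
    Report("alice", "OU=Sales", "pool01.contoso.com", "Seattle"),
    Report("bob", "OU=Finance", "pool01.contoso.com", "Seattle"),
    Report("carol", "OU=Sales", "pool01.contoso.com", "Dallas"),
]
print(shared_attributes(reports))  # {'pool': 'pool01.contoso.com'}
```

Lowering `threshold` (for example, to 0.6) also surfaces values that most, but not all, affected users share, which can point to a partial correlation worth a second look.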

Collecting Information

Once the symptoms are fully understood, begin collecting relevant data to further isolate the components by using tools such as installation logs, event logs, validation wizards, the Best Practices Analyzer (BPA), Client Logging (UCCAPI logs), and Server Logging (Office Communications Server logger). Detailed information regarding these tools, including their location and how to use them, can be found in Chapter 20.

Installation Logs

During installation, the Office Communications Server 2007 R2 setup process generates two sets of logs that can be used to resolve various problems related to setup and installation. These setup logs are located under the %temp% directory and have the .log or .html extension.
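Because the setup logs share the temp directory with many unrelated files, it can help to list only the candidates, newest first. This is a minimal sketch, assuming it runs under the same account that ran setup (so that Python's `tempfile.gettempdir()` normally resolves to that account's %temp% directory); the `find_setup_logs` function is hypothetical, and it deliberately does not filter on file names because this chapter does not specify them.

```python
import tempfile
from pathlib import Path


def find_setup_logs(directory=None, extensions=(".log", ".html")):
    """List .log and .html files in a directory, most recently modified first.

    Defaults to the current user's temp directory (normally %temp% on Windows).
    """
    root = Path(directory or tempfile.gettempdir())
    logs = [p for p in root.iterdir()
            if p.is_file() and p.suffix.lower() in extensions]
    # Newest first: the most recent setup run is usually the one of interest.
    return sorted(logs, key=lambda p: p.stat().st_mtime, reverse=True)
```

Calling `find_setup_logs()` with no argument scans the temp directory; pass a path explicitly if setup was run under a different account.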

Event Logs

Office Communications Server reports a lot of information in the event logs that is extremely helpful when locating and resolving problems. With the addition of the MS-Diagnostic headers introduced in Office Communications Server 2007, the event logs can help pinpoint which component has failed and on which server. If the event logs cannot isolate the exact component that is experiencing the issues, they should at least isolate the server on which the error was generated, reducing the amount of information that needs to be collected.
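When a diagnostic header appears in a log or an event description, pulling out its fields makes it easier to see which component and server reported the failure. The sketch below assumes the header is named `ms-diagnostics` and takes the form of a numeric code followed by semicolon-separated `key="value"` parameters such as `reason` and `source` (where `source` may name the reporting server); the `parse_ms_diagnostics` function and the sample header values are hypothetical.

```python
import re

# Matches ;key="value" pairs. Values are assumed to be double-quoted and not
# to contain embedded double quotes.
_PARAM = re.compile(r';\s*([\w-]+)="([^"]*)"')


def parse_ms_diagnostics(header_line):
    """Split an ms-diagnostics header line into (numeric code, parameter dict)."""
    name, sep, value = header_line.partition(":")
    if not sep or name.strip().lower() != "ms-diagnostics":
        raise ValueError("not an ms-diagnostics header")
    value = value.strip()
    code_text = value.split(";", 1)[0].strip()
    params = dict(_PARAM.findall(value))
    return int(code_text), params


sample = 'ms-diagnostics: 1003;reason="Sample reason text";source="pool01.contoso.com"'
code, params = parse_ms_diagnostics(sample)
print(code, params["source"])  # 1003 pool01.contoso.com
```

The numeric code and reason text also make good search keywords when researching the problem later.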

Following is a list of common event logs used to troubleshoot Office Communications Server 2007 R2 issues and the types of problems they help resolve.

  • System logs:

    • Certificate failures

    • Logon failures

    • Distributed Component Object Model (DCOM) errors that affect Microsoft Management Console (MMC) administration

  • Application logs:

    • Back-end database issues

  • Office Communications Server logs:

    • Service failing to start

    • Certificate problems

    • Domain Name System (DNS) issues

    • Server Trust failures

Validation Wizards

The validation wizards are built directly into the setup process as well as the Office Communications Server 2007 R2 Management Console (Admin Tools). They are best used for resolving issues with initial installation as well as validating that the current environment is operating correctly.

Best Practices Analyzer

The Office Communications Server 2007 BPA is a diagnostic tool that gathers configuration information from Office Communications Server 2007 and Office Communications Server 2007 R2 environments and determines whether the configuration follows Microsoft best practices. Rule updates for this tool are being released with Office Communications Server 2007 R2 to accommodate the new server roles and functionality found in this release. The BPA is often one of the standard tools used to collect the information needed to resolve issues in an established Office Communications Server 2007 R2 environment. Here are some troubleshooting areas in which the BPA can assist:

  • Perform health checks proactively, verifying that the configuration is set according to recommended best practices

  • Generate a list of issues, such as suboptimal, unsupported, or ill-advised configuration settings

  • Judge the general health of a system

  • Help troubleshoot specific problems

  • Prompt you to download updates if they are available

  • Provide online and local documentation about reported issues, including troubleshooting tips

  • Generate configuration information that can be captured for later review
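Captured configuration information is most useful when compared against an earlier capture, for example, a baseline taken while the environment was healthy. The following is a minimal sketch, assuming both captures have already been reduced to flat setting-to-value dictionaries; it does not model the BPA's own export format, and the `diff_settings` function and setting names are hypothetical.

```python
def diff_settings(baseline, current):
    """Compare two flat {setting: value} snapshots.

    Returns (added, removed, changed); `changed` maps each setting to its
    (baseline_value, current_value) pair.
    """
    added = {k: current[k] for k in current.keys() - baseline.keys()}
    removed = {k: baseline[k] for k in baseline.keys() - current.keys()}
    changed = {
        k: (baseline[k], current[k])
        for k in baseline.keys() & current.keys()
        if baseline[k] != current[k]
    }
    return added, removed, changed


# Hypothetical setting names, for illustration only.
baseline = {"pool01/SettingA": 8, "pool01/SettingB": "Off"}
current = {"pool01/SettingA": 8, "pool01/SettingB": "On", "pool01/SettingC": 1}
added, removed, changed = diff_settings(baseline, current)
print(added, removed, changed)
```

Settings that changed between a known-good capture and the current one are natural candidates when isolating the failing component.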

Client Logging (UCCAPI Logs)

When a server is experiencing a high volume of traffic, it is difficult to collect and analyze server logs pertaining to failures that affect only a few clients. When the issue involves a small number of clients, the client-side logs available in Microsoft Office Communicator, Office Communicator 2007 R2 Attendant, Group Chat Console, and the Microsoft Office Live Meeting client are a great place to start. For details on how to configure logging, see Chapter 20.

Server Logging (Office Communications Server Logger)

Server Logging is valuable when troubleshooting problems because it provides the most detailed information in a single place. However, because servers often process requests for thousands of users at a time, the logs grow quickly, and it can be difficult to find the specific information you are looking for. Depending on the problem and the load on the server, going straight to Server Logging can be like looking for a needle in a haystack. Still, some data can be collected only at the server, so if the information you need cannot be obtained from other, easier-to-review sources, you can use Office Communications Server Logger and Snooper.

When reviewing a large server log, remember to also collect the logs from the affected client(s) at the same time. Use the smaller client log to find the information specific to the issue, such as the time of the failure or the SIP Call-ID of the failed request, and then filter the server log accordingly to obtain the detailed information you need.
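The client-to-server filtering step can be sketched in a few lines. SIP messages carry a Call-ID header that identifies the dialog, so Call-IDs taken from the client log can be used to keep only the matching server log lines. This minimal sketch assumes both logs are plain text containing SIP headers in their full `Call-ID:` form (not the compact form); the `call_ids` and `filter_server_log` functions and the sample data are hypothetical.

```python
import re

# Matches a Call-ID header and captures its value (full header name assumed).
_CALL_ID = re.compile(r"\bCall-ID:\s*(\S+)", re.IGNORECASE)


def call_ids(client_log_text):
    """Collect every Call-ID value that appears in a client log."""
    return set(_CALL_ID.findall(client_log_text))


def filter_server_log(server_lines, ids):
    """Yield only the server log lines that mention one of the given Call-IDs."""
    for line in server_lines:
        if any(call_id in line for call_id in ids):
            yield line


client_log = "INVITE sip:bob@contoso.com SIP/2.0\nCall-ID: abc123\nCSeq: 1 INVITE\n"
server_log = ["Call-ID: abc123", "Call-ID: zzz999", "unrelated line"]
print(list(filter_server_log(server_log, call_ids(client_log))))
```

Matching on Call-ID rather than on time stamps alone avoids pulling in unrelated traffic that happened to occur at the same moment on a busy server.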

For more information on how to install and use the Office Communications Server Logger and Snooper tools, refer to Chapter 20.

Reducing Complexity

When working through issues in complex deployments of Office Communications Server 2007 R2, the number of servers and components involved often makes it difficult to find the data relevant to the problem. In these situations, it is necessary to reduce the complexity of the environment by removing redundant or unnecessary server roles. Examples of reducing complexity include the following:

  • When troubleshooting intermittent nondelivery reports for instant messages sent through a pool with two servers behind a load balancer, reducing complexity could mean turning one of the servers off, or changing the client to manual configuration and connecting it directly to one of the servers.

  • For failed external communication or remote access in an environment containing a Director, reducing complexity could mean configuring the Edge Server to route directly to the pool, thus reducing the number of servers in the communication path.

Reducing complexity is a temporary diagnostic measure; it should not permanently compromise the scalability or redundancy of your Office Communications Server 2007 R2 infrastructure. Once you have resolved the problem, restore any modifications made to isolate it to their original state. Make these modifications in a controlled manner and record them in detail, so that no new problems are introduced into the environment and the steps can be retraced.
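Recording isolation changes as you make them also yields the restore plan for free. The following is a minimal sketch of such a record, assuming each change is noted together with the step that undoes it; the `ChangeJournal` class and the sample server names (FE01, FE02) are hypothetical. Undo steps are returned newest first, on the reasoning that a later change may depend on an earlier one.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Change:
    """One temporary modification made while isolating a problem."""
    description: str
    undo: str  # how to restore the original state
    made_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class ChangeJournal:
    """Record isolation changes so they can be retraced and restored."""

    def __init__(self):
        self._changes = []

    def record(self, description, undo):
        self._changes.append(Change(description, undo))

    def history(self):
        """Changes in the order they were made, with UTC time stamps."""
        return [(c.made_at.isoformat(), c.description) for c in self._changes]

    def restore_plan(self):
        """Undo steps in reverse order, newest change first."""
        return [c.undo for c in reversed(self._changes)]


journal = ChangeJournal()
journal.record("Removed FE02 from the load balancer", "Return FE02 to the load balancer")
journal.record("Set test client to manual configuration pointing at FE01",
               "Return test client to automatic configuration")
print(journal.restore_plan())
```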

Isolating the Component

Now that we have collected relevant data concerning the problem, the next step in the process is to use this information to identify the actual Office Communications Server component that is failing. Success in this step requires an understanding of how the individual components interact with each other.

Knowing how the components work when everything is operating optimally makes it easier to identify failing components. One way to build a better understanding of this interaction is to collect detailed logging from the Office Communications Server environment while performing tasks such as joining a Web conference. Reviewing these logs thoroughly helps build experience and knowledge of how the different components communicate and interact.

Issues can occur in a variety of components. The following list can help you narrow down the set of components in which the root cause might lie, further simplifying your investigation. This list is not comprehensive.

  • Client issues

    • Authentication

    • Connectivity

    • Certificate validation

  • Server issues

    • Certificates

    • Active Directory

    • Front-end services

    • Address Book services

    • Web Components

    • Data conferencing

    • Enterprise Voice

    • Server applications

  • Networking issues

    • Blocked ports

    • Improper network load balancer configuration

    • Missing or incorrect DNS records

  • Enterprise Voice issues

    • Public Switched Telephone Network (PSTN) gateways

    • IP Private Branch eXchange (PBX) interoperability

    • Number normalization

    • Inbound/outbound routing
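For the blocked-ports case in the list above, a quick TCP connection test from the affected network segment can confirm or rule out a firewall problem before deeper log analysis. This is a minimal sketch using only Python's standard `socket` module; the `port_reachable` function is hypothetical, and a successful handshake shows only that the port is not blocked, not that TLS certificates or SIP signaling are working.

```python
import socket


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Name resolution failures, refused connections, and timeouts all raise
    OSError subclasses, which are reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `port_reachable("pool01.contoso.com", 5061)` (a hypothetical pool name; 5061 is the port commonly used for SIP over TLS) returns False when a firewall drops or rejects the connection, or when the name does not resolve, which also makes it a rough first check for missing DNS records.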

Researching Your Findings

Based on the information collected, use keywords from the logs to research the existing body of knowledge about the problem. This research can take multiple paths, such as reading blogs, searching Microsoft Knowledge Base (KB) articles, and using search engines such as http://www.live.com. The objective is to determine the root cause of your issue and possible solutions to resolve it.
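Error codes and SIP response lines usually make better search terms than free-form log text. The following is a minimal sketch for pulling them out of a log excerpt, assuming errors appear as eight-digit hexadecimal codes prefixed with 0x and SIP responses as standard "SIP/2.0 <code> <reason>" status lines; the `search_terms` function and the sample log text are hypothetical.

```python
import re

# Assumed formats: eight-digit hex error codes (for example, 0x80004005) and
# SIP status lines (for example, "SIP/2.0 480 Temporarily Unavailable").
_HEX_CODE = re.compile(r"\b0x([0-9A-Fa-f]{8})\b")
_SIP_STATUS = re.compile(r"SIP/2\.0 ([1-6]\d\d) ([^\r\n]+)")


def search_terms(log_text):
    """Return unique error codes and SIP status texts, in first-seen order."""
    hex_codes = ["0x" + h.upper() for h in _HEX_CODE.findall(log_text)]
    statuses = [f"{code} {reason.strip()}"
                for code, reason in _SIP_STATUS.findall(log_text)]
    # dict.fromkeys removes duplicates while preserving order.
    return list(dict.fromkeys(hex_codes + statuses))


sample = ("Error 0x80004005 occurred\n"
          "SIP/2.0 480 Temporarily Unavailable\n"
          "retry 0x80004005\n")
print(search_terms(sample))  # ['0x80004005', '480 Temporarily Unavailable']
```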

Over the years, a large open online community has developed around the Office Communications suite of products and is available to help you identify similar problems and offer solutions that others have used. This support community can save you a considerable amount of time, effort, and money by providing information found in real-world scenarios that goes beyond the technical documentation.

Following is a list of some of the most active and valuable online resources that can be used for research:

  • The Microsoft Support Web site, located at http://go.microsoft.com/fwlink/?LinkId=136417, is the best resource for finding commonly reported problems and Microsoft-supported resolutions. Microsoft KB articles are a public compilation of known issues, causes, workarounds, patches, and hotfixes known to Microsoft support. Note that some hotfixes require you to contact Microsoft Support before you can download the files. These additional steps enable Microsoft to track all customers who are using the fix and also ensure that the problem you are experiencing will indeed be resolved by the requested hotfix.

  • The TechNet Unified Communications Forum, located at http://go.microsoft.com/fwlink/?LinkID=133756, is an excellent resource for getting assistance with Office Communications Server 2007 R2 and Office Communicator 2007 R2. This forum covers all aspects of Microsoft Unified Communications (UC), including server deployment, client deployment, troubleshooting, telephony, customization, administration, and monitoring.

  • The Microsoft Office Communications Server Team Blog, located at http://go.microsoft.com/fwlink/?LinkID=133634, is a resource from product group members covering many topics that are being discussed or addressed for the first time. Many of the posted articles eventually become Microsoft KB articles or provide a more thorough explanation than a KB article allows.

  • The Microsoft Office Communicator Team Blog, located at http://go.microsoft.com/fwlink/?LinkID=133635, is an excellent resource from product group members covering many topics that are being discussed or addressed for the first time. Many of the postings eventually become Microsoft KB articles or go into much more detail than existing KB articles. Additional topics covered include UC devices, sample scripts, and usage best practices.

If you are still not able to isolate the root cause, some external support options are available to you. You can contact one of the Office Communications Server 2007 R2 certified partners (http://go.microsoft.com/fwlink/?LinkID=133699) or Microsoft Support Services (http://go.microsoft.com/fwlink/?LinkId=136417).

The benefit of taking either of these options is that they enable you to work directly with engineers and architects who have an extensive understanding of Office Communications Server 2007 R2. In some rare situations, based on the criticality of the problem and time needed to resolve the problem, it might be beneficial to immediately engage with an outside support organization, whether that organization is a UC certified partner or Microsoft Support Services.

When working with Microsoft Support Services or a UC certified partner, here are several things to keep in mind to help expedite the process:

  • Ensure that you have the necessary permissions to access the resources needed to troubleshoot the environment.

  • Provide as much detail as possible about the environment, such as infrastructure diagrams, BPA logs, and server and client logs (see the section titled "Collecting Information" earlier in this chapter). This helps the certified partner or Microsoft Support resolve the problem quickly.

  • Provide the team with your documented symptoms, as well as steps that were previously taken in attempting to resolve the problem.

The goal of this chapter is to provide you with the information and guidance you need to avoid relying on external support; however, it is always good to know what to do if the need arises.

Resolving the Issue

Once you have identified the root cause, you must determine how to resolve the problem. Prior to implementing a change, here are a few best practices to keep in mind:

  • Perform a backup of the system you’ll be modifying or ensure you have a recent backup.

  • Implement one change at a time. While troubleshooting an issue, you will often identify multiple solutions to a problem, and it is natural to want to implement all of them at the same time. However, this is not advisable: you will not be able to determine which change resolved the problem, or, worse, one of the changes might introduce a new issue that causes further frustration. By implementing only one change at a time, you can better determine the effects of each change and quickly back it out if it has an adverse result.

  • Once the change is implemented, test the entire environment to ensure not only that the problem is resolved but also that no new problems have been created.

  • Once you have tested the environment and verified that the problem is resolved and no new issues have appeared, back up the entire environment. This step helps ensure that you have a valid backup of the working configuration.

  • As a final step, document the implemented changes and update any recorded baselines. This provides a record you can follow if you encounter a similar situation in the future.
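The one-change-at-a-time discipline described in the list above can be expressed as a simple loop. The following is a minimal sketch, assuming each candidate fix is supplied with callables that apply and revert it, and that a single test function reports whether the problem is resolved with no new problems; the `apply_one_at_a_time` function and its parameters are hypothetical. As a design choice, this sketch also backs out any change that does not resolve the problem, so that only the successful change remains in place.

```python
def apply_one_at_a_time(changes, test_environment):
    """Apply candidate fixes one at a time, testing after each.

    `changes` is a list of (name, apply, revert) tuples, where apply and
    revert are callables. `test_environment` returns True when the problem
    is resolved and no new problems are detected. Returns the name of the
    change that resolved the problem, or None if none did.
    """
    for name, apply, revert in changes:
        apply()
        if test_environment():
            return name  # this change alone resolved the problem
        revert()         # back out before trying the next candidate
    return None


state = set()
candidates = [
    ("A", lambda: state.add("A"), lambda: state.discard("A")),
    ("B", lambda: state.add("B"), lambda: state.discard("B")),
]
print(apply_one_at_a_time(candidates, lambda: "B" in state))  # B
```

Because each change is tested in isolation, the returned name identifies which fix to document when updating your baselines.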
