Chapter 21. Troubleshooting Problems

The Microsoft Office Communications Server 2007 R2 product suite builds on the functionality and features of Microsoft Office Communications Server 2007. However, to enable these new functions and feature sets, the product requires many complex and sophisticated technologies distributed across several machines on a variety of networks with different software components installed on each. Inconsistent local and global settings, network connectivity, and external dependency failures can cause problems that can be difficult to troubleshoot. The goal of this chapter is to provide a systematic troubleshooting process to guide architects and administrators through hurdles that may be encountered during the initial deployment. It is also intended to provide a framework that experts can use to explore more complex issues.

On the Companion Media

You can find links to additional resources on the topics addressed in this chapter and book on this book’s companion CD.

Troubleshooting Process

This chapter identifies common problems and illustrates effective ways to diagnose a problem’s source. This troubleshooting framework is engineered to isolate the component(s) responsible for the problem so that the steps needed to find a resolution may be properly researched. Guidelines for the safe implementation of the changes required for resolution are also included.

The remainder of the chapter focuses on problem scenarios and includes specific ways to gather information to resolve the most common issues. For reference, most of the tools and resources that are used to work through these problems have been described in Chapter 20, which also includes information on where to locate them.

Determining the Root Cause

The purpose of troubleshooting an issue is to determine the root cause, which starts by understanding the issue’s symptoms. Often the symptoms reported can mask a deeper issue; an analysis of the symptoms will help isolate the components causing the problem. Once the faulty component is isolated, collect the relevant logs and use them to research the root cause. It is important to isolate the troubled component prior to starting the data collection process—this prevents information overload. If the problem appears complex or involves multiple components, the collection of preliminary data may be necessary to assist with root cause determination.

The next few sections outline the various steps of root cause determination.

Understanding the Symptoms

The first step in the troubleshooting process is to document and understand the symptoms. Common examples of symptoms are an inability to connect to a Web conference, a failed desktop sharing attempt, or a less noticeable issue such as incorrect presence. This step can lead an architect or administrator to the wrong conclusion if the symptoms first reported were incorrect or inconsistent. In resolving problems in a complex, distributed environment, understanding the full nature of the symptoms is key to reducing the time and effort spent determining the next steps in resolving the problems.

During symptom review, start to look for underlying relationships. Common correlations to look for:

Determine which users are impacted. Are they in the same Active Directory organizational unit (OU)? Do they belong to the same security group? Are they not enabled for a particular functionality? Are they in the same physical location?
Determine when the issue occurs. Does it occur the same time every day? Does the issue happen immediately after another process, such as a backup, completes? Does the issue occur during peak hours?
Determine whether the issue is related to a specific server. Are the users who experience the problem homed on the same pool? Are the failed calls routed through the same Mediation Server?

Finding these correlations is often key to determining the problem component(s). Remember to keep track of your findings via documentation to prevent duplication of work and to facilitate the sharing of information with team members.
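The correlation questions above can be sketched in a few lines of Python. This is only an illustration: the Report fields, the sample users, and the 80 percent threshold are assumptions made for the example, not part of any Office Communications Server tool.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    """One documented symptom report (all fields are hypothetical)."""
    user: str
    ou: str        # Active Directory organizational unit
    pool: str      # pool on which the user is homed
    location: str  # physical site
    hour: int      # hour of day (0-23) when the issue occurred


def shared_attributes(reports, threshold=0.8):
    """Return {field: (value, share)} for values common to most reports."""
    findings = {}
    if not reports:
        return findings
    for name in ("ou", "pool", "location", "hour"):
        counts = Counter(getattr(r, name) for r in reports)
        value, hits = counts.most_common(1)[0]
        share = hits / len(reports)
        if share >= threshold:
            findings[name] = (value, share)
    return findings


# Made-up reports: the only attribute all three users share is the pool.
reports = [
    Report("alice", "Sales", "pool01", "Redmond", 9),
    Report("bob", "Engineering", "pool01", "Dublin", 14),
    Report("carol", "Finance", "pool01", "Redmond", 11),
]
print(shared_attributes(reports))
```

A shared value is a lead to investigate, not proof of the cause; a pool that hosts most users will appear in most reports regardless of where the fault lies.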

Collecting Information

Once the symptoms are fully understood, begin collecting relevant data to further isolate the components by using tools such as installation logs, event logs, validation wizards, the Best Practices Analyzer (BPA), Client Logging (UCCAPI logs), and Server Logging (Office Communications Server logger). Detailed information regarding these tools, including their location and how to use them, can be found in Chapter 20.

Installation Logs

During installation, the Office Communications Server 2007 R2 setup process generates two sets of logs that can be used to resolve various problems related to setup and installation. These setup logs are located under the %temp% directory and have the .log or .html extension.

Event Logs

Office Communications Server reports a lot of information in the event logs that is extremely helpful when locating and resolving problems. With the addition of the MS-Diagnostic headers introduced in Office Communications Server 2007, the event logs can help pinpoint which component has failed and on which server. If the event logs cannot isolate the exact component that is experiencing the issues, they should at least isolate the server on which the error was generated, reducing the amount of information that needs to be collected.

Following is a list of common event logs used to troubleshoot Office Communications Server 2007 R2 issues and the types of problems they help resolve.

System logs:
- Certificate failures
- Logon failures
- Distributed Component Object Model (DCOM) (Microsoft Management Console [MMC] administration errors)
Application logs:
- Back-end database issues
Office Communications Server logs:
- Service failing to start
- Certificate problems
- Domain Name System (DNS) issues
- Server Trust failures

Direct from the Source: Event Log Tips

Nirav Kamdar

Senior Development Lead, Office Communications Server

Office Communications Server services usually generate more than one event log. Generally speaking, the first event log contains the most information, but it is beneficial to go through all of the error event logs. I often see people look at the top-most event log and give up because it is a general event log that might not point out enough details on its own to completely identify the problem.

Validation Wizards

The validation wizards are built directly into the setup process as well as the Office Communications Server 2007 R2 Management Console (Admin Tools). They are best used for resolving issues with initial installation as well as validating that the current environment is operating correctly.

Best Practices Analyzer

The Office Communications Server 2007 BPA is a diagnostic tool that gathers configuration information from an Office Communications Server 2007 and R2 environment and determines whether the configuration abides by Microsoft best practices. Rule updates for this tool are being released with Office Communications Server 2007 R2 to accommodate the new server roles and functionality found in this release. This tool is often one of the default tools used to collect information needed to resolve issues in an established Office Communications Server 2007 R2 environment. Here are some troubleshooting areas in which the BPA can assist:

Perform health checks proactively, verifying that the configuration is set according to recommended best practices
Generate a list of issues, such as suboptimal, unsupported, or ill-advised configuration settings
Judge the general health of a system
Help troubleshoot specific problems
Prompt you to download updates if they are available
Provide online and local documentation about reported issues, including troubleshooting tips
Generate configuration information that can be captured for later review

Direct from the Source: Diagnosing Active Directory Errors

Yong Zhao

Software Developer on the Office Communications Server Team

When collecting data, if you receive an Active Directory error (that has a symbolic name starting with ERROR_DS_*, for example, ERROR_DS_NO_SUCH_OBJECT) during deployment, the first thing to check is which account credentials are being used. You can accomplish this by using the following Windows command-line tool (Whoami.exe is included in Windows Server 2008 and can be found in the Support Tools of earlier versions of Windows):

whoami /groups

Using the output of the tool, verify that the RTCUniversalServerAdmins group shows up in the list. If it does, use the tools Nltest.exe and NetDiag.exe to rule out problems with domain controllers (DCs) and the local machine network. (These tools are included in Microsoft Windows Vista and Windows Server 2008 and can be found in the Resource Kit of earlier versions.)
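If you run this check on many machines, the group test can be scripted. The following Python sketch assumes only that the group name appears somewhere on a line of the whoami /groups output; the function names and the sample text are hypothetical, and the substring match is deliberately loose.

```python
import subprocess


def group_listed(whoami_output, group="RTCUniversalServerAdmins"):
    """Return True if any output line mentions the group (case-insensitive)."""
    needle = group.lower()
    return any(needle in line.lower() for line in whoami_output.splitlines())


def current_account_in_group(group="RTCUniversalServerAdmins"):
    """Run `whoami /groups` (Windows only) and check its output."""
    result = subprocess.run(
        ["whoami", "/groups"], capture_output=True, text=True, check=True
    )
    return group_listed(result.stdout, group)


# Illustrative text only; it was not captured from a real system.
sample = "LITWAREINC\\Domain Users\nLITWAREINC\\RTCUniversalServerAdmins\n"
print(group_listed(sample))
```

Because the match is a simple substring test, a differently named group that merely contains the same text would also match; tighten the comparison if that matters in your environment.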

Nltest.exe and NetDiag.exe are Windows server tools that allow for testing of basic functionality of the operating system. Nltest provides a large number of command-line functions, such as retrieving the names of servers that provide a given role (global catalog, PDC emulator). NetDiag can test and diagnose network connectivity issues, narrowing down where the problem might be in a given failure state.

Their usage is shown in the following examples:

Nltest.exe /DsGetDC:[Root Domain FQDN] /GC – Outputs the GC of the root domain
Nltest.exe /DsGetDC:[Root Domain FQDN] /PDC – Outputs the PDC of the root domain
Nltest.exe /DsGetDC:[Local Domain FQDN] /PDC – Outputs the PDC of the local domain
NetDiag.exe – runs through all tests that NetDiag can perform on the local system

The placeholders [Root Domain FQDN] and [Local Domain FQDN] in the preceding examples must be replaced with the fully qualified domain name (FQDN) of the root domain and the FQDN of the domain from which you are running the commands. For example, with litwareinc.com as the root domain and eng.litwareinc.com as the local domain, the command sequence is:

Nltest.exe /DsGetDC:litwareinc.com /GC
Nltest.exe /DsGetDC:litwareinc.com /PDC
Nltest.exe /DsGetDC:eng.litwareinc.com /PDC
NetDiag.exe
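To avoid retyping the sequence for each domain, the command lines can be generated. This Python sketch uses only the switches shown above; the function names are hypothetical, and run_checks assumes a Windows machine on which both tools are installed and on the path.

```python
import subprocess


def dc_check_commands(root_domain, local_domain):
    """Build the command lines shown above for the given domain names."""
    return [
        ["Nltest.exe", f"/DsGetDC:{root_domain}", "/GC"],
        ["Nltest.exe", f"/DsGetDC:{root_domain}", "/PDC"],
        ["Nltest.exe", f"/DsGetDC:{local_domain}", "/PDC"],
        ["NetDiag.exe"],
    ]


def run_checks(commands):
    """Run each command in order and collect (command, output) pairs."""
    results = []
    for argv in commands:
        completed = subprocess.run(argv, capture_output=True, text=True)
        results.append((" ".join(argv), completed.stdout))
    return results


# Print the sequence for the example domains without running anything.
for argv in dc_check_commands("litwareinc.com", "eng.litwareinc.com"):
    print(" ".join(argv))
```

Keeping the builder separate from the runner lets you review the exact commands before anything is executed.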

Client Logging (UCCAPI Logs)

When a server is experiencing a high volume of traffic, it is difficult to collect and analyze server logs pertaining to the failure of a small number of clients. When the issue involves a modest client population, client-side logs available in Microsoft Office Communicator, Office Communicator 2007 R2 Attendant, Group Chat Console, and the Microsoft Office Live Meeting client are a great place to start. For details on how to configure logging, see Chapter 20.

Direct from the Source: Extracting Errors from Communicator Logs

Joel Schaeffer

Escalation Engineer, Product Support Services

If you are using Office Communicator logging to troubleshoot issues, you can extract "error" entries quickly by using FINDSTR. The syntax is as follows:

findstr /I error %userprofile%\tracing\<log file> > <output file>

For example, you might use the following command:

findstr /I error %userprofile%\tracing\Communicator-uccp-0.uccplog > errors.txt

The errors.txt file will contain all log entries tagged as ERROR. You can then find the entry in the source log file and place the error in its proper context. You should then be able to discern what happened before and after the error was thrown.
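Finding each entry again in the source log can be tedious. As a rough alternative, this Python sketch returns every matching line together with the lines around it. The log line format in the sample is made up, and the file encoding is an assumption (unreadable bytes are replaced rather than causing a failure).

```python
from pathlib import Path


def errors_with_context(lines, keyword="error", before=2, after=2):
    """Yield (line_number, context_lines) for each line containing keyword."""
    needle = keyword.lower()
    for index, line in enumerate(lines):
        if needle in line.lower():
            start = max(0, index - before)
            end = min(len(lines), index + after + 1)
            yield index + 1, lines[start:end]


def scan_log(path, **kwargs):
    """Read a log file and return all matches with their context."""
    # The encoding is an assumption; adjust it to match your log files.
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return list(errors_with_context(text.splitlines(), **kwargs))


# Made-up log lines, used only to show the shape of the result.
sample = [
    "INFO :: connecting",
    "INFO :: sending request",
    "ERROR :: request failed",
    "INFO :: retrying",
]
for number, context in errors_with_context(sample, before=1, after=1):
    print(number, context)
```

Like findstr /I, the match is a case-insensitive substring test, so any line that merely contains the word "error" is reported.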

Server Logging (Office Communications Server Logger)

Server Logging is valuable when troubleshooting problems because it provides the most detailed information in a single place. However, because servers often process requests for thousands of users at a time, the logs expand quickly and it can be difficult to find the specific information that you are looking for. Depending on the problem and the volume of traffic on the server, going straight to Server Logging can be like looking for a needle in a haystack. Some data can be collected only at the server, so if the appropriate information cannot be collected from other easier-to-review sources, you can use Office Communications Server Logger and Snooper.

When reviewing a large server log, remember to also collect the logs from affected client(s) at the same time. Use the smaller client log to find the information specific to the issue, then filter the server log accordingly to obtain the detailed information you need.
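One way to apply this technique is to take an identifier from the client log and use it to filter the server log. The Python sketch below uses the SIP Call-ID header for that purpose. It assumes that both logs contain SIP messages as plain text, with the Call-ID header at the start of a line and a blank line between messages; real log layouts may differ, and the function names are hypothetical.

```python
import re

# Matches a SIP Call-ID header at the start of a line.
CALL_ID = re.compile(r"^Call-ID:[ \t]*(\S+)", re.IGNORECASE | re.MULTILINE)


def call_ids(log_text):
    """Return the set of Call-ID values found in the text."""
    return set(CALL_ID.findall(log_text))


def filter_server_log(server_text, wanted_ids, separator="\n\n"):
    """Keep only the server-log blocks that mention a wanted Call-ID."""
    blocks = server_text.split(separator)
    return [block for block in blocks if call_ids(block) & wanted_ids]


# Made-up fragments that illustrate the assumed layout.
client_log = "INVITE sip:bob@litwareinc.com SIP/2.0\nCall-ID: abc123\n"
server_log = (
    "INVITE sip:carol@litwareinc.com SIP/2.0\nCall-ID: zzz999\n\n"
    "INVITE sip:bob@litwareinc.com SIP/2.0\nCall-ID: abc123\n\n"
    "BYE sip:bob@litwareinc.com SIP/2.0\nCall-ID: abc123"
)
matching = filter_server_log(server_log, call_ids(client_log))
print(len(matching))
```

Snooper provides its own search and filtering; a script like this is useful mainly when you want to pre-filter very large text logs before opening them.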

For more information on how to install and use the Office Communications Server Logger and Snooper tools, refer to Chapter 20.

Direct from the Source: Resolving Mysterious Error Codes

Jason Epperly

Escalation Engineer, Product Support Services

As you are troubleshooting any product, you are inevitably confronted by a mysterious error code. What do you do with this mysterious error code? Microsoft has published a tool named Microsoft Exchange Server Error Code Look-up (err.exe) that is available for download at http://go.microsoft.com/fwlink/?LinkID=133755.

First, disregard the fact that the tool states that it is for use with Microsoft Exchange. This message is included only because the Exchange team published the tool. The tool is not Exchange-specific. At its most basic level, Err.EXE maps return codes (for example, 0x54F) to symbolic names (such as ERROR_INTERNAL_ERROR).

ERR takes into account that you don’t always know whether your input value is hexadecimal or decimal. If there’s any ambiguity in how the error code is specified, ERR returns multiple results to show the hexadecimal (base 16) and decimal (base 10) error code symbolic names. For example, if the command err.exe 10 was run, the parameter ‘10’ could represent a decimal value of 10 (ERROR_BAD_ENVIRONMENT) or a hexadecimal value of 0x10 (ERROR_CURRENT_DIRECTORY), so running ERR 10 produces both errors.

Mapping between symbolic names and error codes goes both ways—you can search for an error code by name as well as by ID, which can be useful when you need an error code for a script or cannot remember the exact name of a symbolic constant. To find an error code by its exact name, preface that name with an equal sign on the command line (err =ERROR_BAD_ENVIRONMENT). The search is case insensitive. ERR doesn’t call the operating system (such as via FormatMessage) to look up errors in any way because all of its error codes are kept in internal tables within the binary. This keeps ERR from depending on one operating system or another to produce the right results. In most cases, the symbolic name that is relative to Windows error codes represents the errors associated with the winerror.h table.
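The ambiguity handling and the name search described above can be illustrated with a small Python sketch. It is not how Err.exe is implemented; the table holds only the three codes mentioned in this sidebar, and the lookup function is hypothetical.

```python
# Tiny excerpt of an error table: only the codes mentioned in the text.
CODES = {
    10: "ERROR_BAD_ENVIRONMENT",
    16: "ERROR_CURRENT_DIRECTORY",
    0x54F: "ERROR_INTERNAL_ERROR",
}
NAMES = {name.upper(): code for code, name in CODES.items()}


def lookup(argument):
    """Return (code, name) matches for a number or an =NAME argument."""
    if argument.startswith("="):
        # Exact-name search, case insensitive.
        name = argument[1:].upper()
        return [(NAMES[name], name)] if name in NAMES else []
    candidates = set()
    for base in (10, 16):
        # Try the input as decimal and as hexadecimal.
        try:
            candidates.add(int(argument, base))
        except ValueError:
            pass
    return [(code, CODES[code]) for code in sorted(candidates) if code in CODES]


print(lookup("10"))
print(lookup("0x54F"))
print(lookup("=error_bad_environment"))
```

An input such as 10 is valid in both bases, so both interpretations are returned, whereas 0x54F can only be hexadecimal and yields a single result.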

Reducing Complexity

Often while working through issues in complex deployments of Office Communications Server 2007 R2, it becomes difficult to find the relevant data to be able to troubleshoot the problem due to the number of servers and components involved. In these situations, it is necessary to reduce the complexity of the environment by removing redundant or unnecessary server roles. Examples of reducing complexity are:

When working on resolving intermittent nondelivery reports for sent instant messages against a pool with two servers behind a load balancer, reducing complexity could mean turning one of the servers off or changing the client to manual configuration and connecting directly to one of the servers.
For failed external communication or remote access in an environment containing a Director, reducing complexity could be configuring the Edge Server to route directly to the pool, thus reducing the number of servers in the communication path.

Reducing complexity is a temporary diagnostic measure; it should not put the scalability or redundancy of your Office Communications Server 2007 R2 infrastructure at risk. Once you have resolved the problem, restore any modifications that you made to the infrastructure for the purpose of isolating the problem to their original state. Make modifications in a controlled manner and record them in detail so that no new problems are introduced into the environment and your steps can be retraced.

Isolating the Component

Now that we have collected relevant data concerning the problem, the next step in the process is to use this information to identify the actual Office Communications Server component that is failing. Success in this step requires an understanding of how the individual components interact with each other.

Knowing how the components work when everything is operating optimally makes it easier to identify failing components. One way of building a better understanding of this interaction is to collect detailed logging from the Office Communications Server environment while performing tasks, such as joining a Web conference. Reviewing these logs thoroughly helps build experience and knowledge of how the different components communicate and interact.

Issues can occur in a variety of components. The following list can help you narrow down the scope of components where the root cause might be occurring. This will help to further simplify your investigation. This list is not comprehensive.

Client issues
- Authentication
- Connectivity
- Certificate validation
Server issues
- Certificates
- Active Directory
- Front-end services
- Address Book services
- Web Components
- Data conferencing
- Enterprise Voice
- Server applications
Networking issues
- Blocked ports
- Improper network load balancer configuration
- Missing or incorrect DNS records
Enterprise Voice
- Public Switched Telephone Network (PSTN) gateways
- IP Private Branch eXchange (PBX) interoperability
- Number normalization
- Inbound/outbound routing

Researching Your Findings

Based on the information collected, use keywords from the logs to research the common body of knowledge for this problem. This research can take multiple paths such as reading blogs, searching Knowledge Bases (KBs), and leveraging search engines, such as http://www.live.com. The objective is to determine the root cause of your issue and possible solutions to resolve it.

Over the years, a large open online community has developed around the Office Communications suite of products and is available to help you identify similar problems and offer solutions that others have used. This support community can save you a considerable amount of time, effort, and money by providing information found in real-world scenarios that goes beyond the technical documentation.

Following is a list of some of the most active and valuable online resources that can be used for research:

The Microsoft Support Web site, located at http://go.microsoft.com/fwlink/?LinkId=136417, is the best resource for finding commonly reported problems and Microsoft-supported resolutions. Microsoft KB articles are a public compilation of known issues, causes, workarounds, patches, and hotfixes known to Microsoft support. Note that some hotfixes require you to contact Microsoft Support before you can download the files. These additional steps enable Microsoft to track all customers that are using the fix and also ensure that the problem you are experiencing will indeed be resolved by the requested hotfix.
The TechNet Unified Communications Forum, located at http://go.microsoft.com/fwlink/?LinkID=133756, is an excellent resource for getting assistance with Office Communications Server 2007 R2 and Office Communicator 2007 R2. This forum assists in all aspects of Microsoft Unified Communications (UC), including server deployment, client deployment, troubleshooting, telephony, customization, administration, and monitoring.
The Microsoft Office Communications Server Team Blog, located at http://go.microsoft.com/fwlink/?LinkID=133634, is a resource from the product group members for many topics that are being discussed or addressed for the first time. Many of the articles posted there eventually become Microsoft KB articles or provide a more thorough explanation than a KB article allows.
The Microsoft Office Communicator Team Blog, located at http://go.microsoft.com/fwlink/?LinkID=133635, is an excellent resource from the product group members for many topics that are being discussed or addressed for the first time. You will find many of the postings will become Microsoft KB articles or go into much more detail than existing KB articles. Additional topics covered are UC devices, sample scripts, and usage best practices.

If you are still not able to isolate the root cause, some external support options are available to you. You can contact one of the Office Communications Server 2007 R2 certified partners (http://go.microsoft.com/fwlink/?LinkID=133699) or Microsoft Support Services (http://go.microsoft.com/fwlink/?LinkId=136417).

The benefit of taking either of these options is that they enable you to work directly with engineers and architects who have an extensive understanding of Office Communications Server 2007 R2. In some rare situations, based on the criticality of the problem and time needed to resolve the problem, it might be beneficial to immediately engage with an outside support organization, whether that organization is a UC certified partner or Microsoft Support Services.

When working with Microsoft Support Services or a UC certified partner, here are several things to keep in mind to help expedite the process:

Ensure that you have the necessary permissions to access the resources needed to troubleshoot the environment.
Provide as much detail as possible concerning the environment (see the section titled "Collecting Information" earlier in this chapter), such as infrastructure diagrams, BPA logs, and server and client logs. This detail helps the certified partner or Microsoft Support resolve the problem quickly.
Provide the team with your documented symptoms, as well as steps that were previously taken in attempting to resolve the problem.

The goal of this chapter is to provide you with information and guidance to avoid the need to rely on external support; however, it is always good to know what to do if the need arises.

Resolving the Issue

Once you have identified the root cause, you must determine how to resolve the problem. Prior to implementing a change, here are a few best practices to keep in mind:

Perform a backup of the system you’ll be modifying or ensure you have a recent backup.
Implement one change at a time. Often in troubleshooting an issue, you will identify multiple solutions to a problem, and it is only normal to want to implement all of the changes at the same time. However, this is not advisable because it will not be possible to determine which solution resolved the problem or, worse, one of the steps might introduce a new issue that can cause further frustration. By implementing only one change at a time, you can better determine the effects of the change and quickly back out the change if it creates an adverse result.
Once the change is implemented, test the entire environment to ensure that not only is the problem resolved, but no new problems are created.
Once you have tested the environment and have verified that the problem is resolved and no new issues have appeared, back up the entire environment. This step helps ensure you have a valid backup.
As a final step, document the implemented changes and update any necessary recorded baselines. This helps provide documentation for the future in the event that you need to repeat the steps if you encounter a similar situation.
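The practice of implementing and documenting one change at a time can be backed by a simple record. The following Python sketch is a hypothetical change journal, not a feature of Office Communications Server: it refuses to accept a new change until the outcome of the previous one has been recorded.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Change:
    """One change to the environment and how to undo it."""
    description: str
    rollback: str
    applied_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    outcome: str = "pending"  # pending, kept, or rolled back


class ChangeJournal:
    """Records changes and allows only one untested change at a time."""

    def __init__(self):
        self.changes = []

    def apply(self, description, rollback):
        if self.changes and self.changes[-1].outcome == "pending":
            raise RuntimeError("Test and record the previous change first")
        change = Change(description, rollback)
        self.changes.append(change)
        return change

    def record(self, change, resolved):
        """Mark a tested change as kept or rolled back."""
        change.outcome = "kept" if resolved else "rolled back"


journal = ChangeJournal()
first = journal.apply("Replace expired certificate", "Reassign old certificate")
journal.record(first, resolved=False)
second = journal.apply("Add missing DNS record", "Delete the DNS record")
print([c.outcome for c in journal.changes])
```

The journal itself becomes the documentation described in the final step, including the rollback instruction for each change.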
