Using Event Logs

The Office Communications Server product line has always provided event logs to help identify problems and point out irregularities in the system. In Office Communications Server 2007, there is even a filtered view of server warnings and errors presented in the MMC snap-in as a tabbed overview for each server. Office Communicator 2007 has added the capability to create event logs that can be enabled from the Tools | Options menu item on the General tab by selecting Turn On Windows Event Logging For Communicator, as shown in Figure 16-1. When issues arise, the event log is the first place to turn for high-level guidance on problems that the server or client might have already identified. In general, warnings and errors will be of interest, but informational event logs will not.

Enabling Windows Event Logging For Communicator in Office Communicator

Figure 16-1. Enabling Windows Event Logging For Communicator in Office Communicator

Because the event log has the capability to filter and sort itself, it is not necessary to spend time hunting through the Application Event Log. All events logged by the client show up in the Application Event Log, and all events from servers show up in the Office Communications Server Event Log. The View menu provides the ability to filter by event source; therefore, filtering for Communicator events or a particular server role is possible, as is filtering for only warnings and errors.

Understanding Communicator Events

The following example presents a normal Office Communicator event log along with examples of problematic logs that may show up. Additional information and advice on how to resolve problems are presented with each log.

Example of an Expected Event

Normal event log entries are provided here simply to point out expected log entries so that they can be ignored during troubleshooting. The first log shown is created during every normal Office Communicator login when event logging is enabled.

Type: Informational EventID: 1020 Source: Communicator
 Description:
Communicator has enabled event logging.
Information about failed calls will be sent to the Windows event log.

Event ID 1020 is useful only for identifying that client event logs are turned on when looking for problematic warnings and errors in the event log.

Examples of Problem Events and Troubleshooting Steps

The detailed tracing log (described in more detail in the following section) is a primary means of understanding the details of problems when they occur, and if event logs are enabled in Office Communicator, it will also create event log entries when there are failures to provide diagnostic information. These entries are incredibly useful for diagnosing login problems. The following examples clearly explain the problem, point at data related to the problem, and explain the steps involved for solving the problem.

This first sample event log message (1007) occurs because Communicator is configured with an invalid server name, IConfiguredAnInvalidServerName.contoso.com. The message explains that Communicator cannot find a server with that name when attempting to resolve the hostname specified using the DNS. As is typical for most events, the actions to be taken next are clearly laid out in the event log Resolution, which greatly aids in resolving the problem.

Type: Error EventID: 1007 Source: Communicator
 Description:
Communicator was unable to resolve the DNS hostname of the login server
IConfiguredAnInvalidServerName.contoso.com.

Resolution:
If you are using manual configuration for Communicator, please check that
the server name is typed correctly and in full.  If you are using automatic configuration,
the network administrator will need to double-check the DNS
A record configuration for IConfiguredAnInvalidServerName.contoso.com
because it could not be resolved.

The second sample event log error (1015) occurs because Communicator is configured to connect to an invalid port (9999) instead of a valid port (for example, 5061) on the server. The message explains that Communicator is unable to connect (10065 is the code for Windows WSAEHOSTUNREACH error, which means that the host is unreachable on the network). Again, the steps to resolve the problem are laid out in the Resolution section in the event log text.

Type: Error EventID: 1015 Source: Communicator
 Description:
Communicator failed to connect to server srv.contoso.com (192.168.3.1)
on port 9999 due to error 10065.  The server is not listening on the
port in question, the service is not running on this machine, the
service is not responsive, or network connectivity doesn't exist.

Resolution:
Please make sure that your workstation has network connectivity.  If you
are using manual configuration, please double-check the configuration.
The network administrator should make sure that the service is running on
port 9999 on server srv.contoso.com (192.168.3.1).

The third sample event log error occurs because Communicator cannot validate the certificate credentials presented by the server to which it is connecting. The message explains that Communicator attempted to connect to sipfed.contoso.com, but received a certificate with a Subject Name (SN) that didn't match. This typically occurs because the wrong certificate is installed on the server or an unsupported name is published in the DNS and is used to connect to the server. The Resolution section offers the appropriate steps to take to move forward.

Type: Error EventID: 1008 Source: Communicator
 Description:
Communicator could not connect securely to server sipfed.contoso.com
because the certificate presented by the server did not match the expected
hostname (sipfed.contoso.com).

Resolution:
If you are using manual configuration with an IP address or a NetBIOS
shortened server name, a fully qualified server name will be required.
If you are using automatic configuration, the network administrator will
need to make sure that the published server name in DNS is supported by
the server certificate.

Some event log messages do not clearly state what needs to happen next, which can occur when the server infrastructure identifies a problem and Office Communicator simply reports the raw data that it receives from the server. Event ID 1000 is used for reporting exactly these types of scenarios, and examples of logs with this event as well as additional information about them are provided in the samples that follow. For event ID 1000, the first number in the technical data (1011 for this example) and the reason text ("Ms-Diagnostics header not provided by previous hop") are linked in that the number is simply an identifier that matches the reason text. It is not another form of error code to be analyzed.

The first sample event log message displays an unexpected failure (diagnostic event 1011 passed back from the server) during federation routing. An error is returned from the remote organization (woodgrovebank.com) by the server sip.woodgrovebank.com, but no diagnostics information is provided. Therefore, the server sipfed.contoso.com adds the ms-diagnostics header to identify the source of the failure. The problem might be transitory and not occur again later (e.g., it is handled by the client and the infrastructure by retrying the request, or a network outage will be resolved at some point), it might be a software bug in the remote implementation, or it might be an invalid request that can't be handled. In the latter case, looking at the transaction details in the Office Communicator tracing logs is the best approach to fully identify the issue.

Type: Warning EventID: 1000 Source: Communicator
 Description:
Communicator failed in an unexpected way.  More information is contained
in the following technical data:

1011;reason="Ms-Diagnostics header not provided by previous
hop";source="sipfed.contoso.com";Domain="woodgrovebank.com";PeerServer=
"sip.woodgrovebank.com"

Resolution:
If this error continues to occur, please contact your network
administrator.  The network administrator can use a tool like
winerror.exe from the Windows Resource Kit or lcserror.exe from
the Office Communications Server Resource Kit in order to interpret
any error code listed above.

The second sample event log message shows an unexpected failure (diagnostic event 1004) during internal routing. An error is returned by server5.contoso.com because it identifies a problem at the protocol level. In this case, a messaging session or some other interaction must have been in place long enough so that the server signature used to protect and validate the routing information is no longer valid. Servers rotate these keys regularly, and extremely long sessions can come up against this error, but Office Communicator should handle this error by re-establishing the session automatically.

Type: Warning EventID: 1000 Source: Communicator
 Description:
...
1004;reason="Route set is no longer valid";source="server15.contoso.com";ErrorType="A key
that was used to
sign the route set is no longer valid"
...

The third sample event log message (1013) occurs because the client and the server are not in sync on the current (universal) time. This can happen when the time is set properly but the time zone is set improperly because the system time has not been set to take daylight savings time or simply because the clock is set to the incorrect time. Authentication using Kerberos relies on clock accuracy, so the solution is to check the time and the time zones on the client and then on the server.

Type: Warning EventID: 1000 Source: Communicator
 Description:
...
1013;reason="Significant time skew detected during
authentication";source="server6.contoso.com"
...

The fourth sample event log message (1010) occurs due to certificate validation problems between two servers. This can happen because the servers are in different organizations and do not have common trust for each other's certificates, the other server's certificate isn't valid due to its name, the certificate has expired, or the remote server isn't using a server certificate. In this case, the problem is actually listed in the ErrorType field, which explains that Transport Layer Security (TLS) or Mutually Authenticated TLS (MTLS) was not able to be negotiated quickly enough. This is likely due to some networking problems (slow networks or firewall issues) or because the server went offline during the connection. Steps to resolve the problem would be to explore the remote server and the network path to that server to understand what caused the delay.

Type: Warning EventID: 1000 Source: Communicator
 Description:
...
1013;reason="Certificate trust with next-hop server could not be
established";source="sipfed.contoso.com";ErrorType="The peer did
not respond to TLS or MTLS negotiation in a timely manner"
...

The fifth sample event log message shows an unexpected failure (diagnostic event 2) with an error code that doesn't provide much explanation. The Windows calculator (calc.exe) can often be used to convert error codes into hexadecimal or decimal numbers so that a tool, such as winerror.exe or lcserror.exe, can provide a descriptive error message. Sometimes, error codes are printed in decimal numbers and need to be converted to hexadecimal numbers to display properly. At other times, a Windows system error message is wrapped with decorations and presented as a hexadecimal error code. If the tools are unable to present an error name and text, try using only the last four hexadecimal digits because it can take a bit of fooling around with calc.exe in scientific mode to back out some error codes. However, for this example, converting the error isn't necessary because it can be interpreted when passed directly to lcserror.exe: "SIPPROXY_E_AUTHENTICATION_LEG <--> The request processing was stopped to continue authentication exchange with the client through challenge response." This error message explains that sometimes the client and server get out of sync with authentication and, though the client thinks its current credentials are still valid, the server needs to prompt for authentication and cannot continue processing the request. In general, this is a harmless warning because the client will establish new credentials and the user is unlikely to even notice a problem.

Type: Warning EventID: 1000 Source: Communicator
 Description:
...
2;reason="See response code and reason
phrase";source="sipfed.contoso.com";HRESULT="C3E93D81 (SIPPROXY_E_AUTHENTICATION_LEG)"
...

Understanding Server Events

Examples of normal Office Communications Server event logs are presented to avoid confusion, along with examples of problematic logs that might arise. Additional advice on how to resolve problems is presented with each log.

When working through problems in the topology or on specific servers, a good way to minimize the scope or to quickly identify errors is to use the Validation Wizard, which is available as a tool in the Office Communications Server 2007 administration MMC snap-in. This tool checks configuration against connectivity and does the basic validation checks to point out and avoid misconfigurations in Office Communications Server 2007 settings or in network and certificate configurations that relate to the server.

Examples of Normal Events

For anyone who has looked at the number of events created in the event logs on a server as part of starting up, it can be an overwhelming experience to understand what information is actually useful and what can be ignored. In general, all informational logs can safely be ignored, but examples are provided here to explain some of the detailed information available within them. Warning and error messages are also commonly seen, and some are explained here to alleviate administrator concerns about warnings or errors that aren't fully understood. All server logs will show up in the Office Communications Server Event Log, which can be loaded as a snap-in in the MMC or launched directly as eventvwr.exe, as shown in Figure 16-2.

Office Communications Server events in Event Viewer

Figure 16-2. Office Communications Server events in Event Viewer

Examples of Normal Informational Events The following startup event log message (Event ID 14426) simply identifies that the registry key was enabled to log message bodies, as shown in Figure 16-3.

Viewing event log entries in Office Communications Server

Figure 16-3. Viewing event log entries in Office Communications Server

This isn't a great cause for concern. However, depending on your corporate security policy, it might be a privacy violation to log this type of information in tracing logs; therefore, this type of logging should be used only for short intervals during detailed diagnostic sessions if message bodies are actually required (typically, they are not). Additional information about enabling message body tracing is available in the Using Client and Server Trace Logs section later in this chapter.

Type: Informational EventID: 14426 Source: OCS Protocol Stack
 Description:
Configuration information was loaded from the registry. The registry
value is EnableMessageBodyTracing with a DWORD value of 1.

The next event log message (56001) is similar to several of the first messages in the 56000s, which identify that default settings are being used because they weren't actually stored in the Windows Management Instrumentation (WMI) store. None of these messages are of much concern other than to record the current settings that the server will be using in case of future debugging or to work with Product Support Services (PSS).

Type: Informational EventID: 56001 Source: OCS WMI Event Provider
 Description:
The WMI event provider parameter 'initial due time' was not set or
failed to be retrieved. Use the default value.

Initial due time: 300000 milliseconds

Several additional settings are shown in other server events during startup, and these are included in the following examples to demonstrate some of the configuration information available and identify the default settings for some potentially interesting detailed configurations. The first event log message shows some of the detailed default configuration used by the SIP stack.

Type: Informational EventID: 14413 Source: OCS Protocol Stack
 Description:
Proxy configuration changes were applied successfully.
Request compression on outgoing server to server connections
    [1=yes,0=no]: 0
Accept compression requests from server connections [1=yes,0=no]: 1
Accept compression requests from client connections [1=yes,0=no]: 1
Maximum number of server-to-server connections: 1024
Maximum number of client-to-server connections: 5000
Maximum outgoing TLS connections to the same server: 4
Maximum size of content body for client to server connections: 131072 bytes
Maximum size of content body for server to server connections: 5120000 bytes

The second event log message shows some of the detailed default configuration used by the User Services SIP registrar.

Type: Informational EventID: 30926 Source: OCS User Services
 Description:
User Services Global Settings configuration applied:
MinRegistrationExpiry: 300.
DefRegistrationExpiry: 600.
MaxRegistrationExpiry: 900.
MinPresenceSubscriptionExpiry: 1200.
DefPresenceSubscriptionExpiry: 28800
MaxPresenceSubscriptionExpiry: 43200
MinRoamingDataSubscriptionExpiry:900
DefRoamingDataSubscriptionExpiry: 43200
MaxRoamingDataSubscriptionExpiry: 86340
NumOfDevicesPerUser: 8
MaxSubscriptionPerUser: 200
AllowPollingForPresence: true
EnableBENotify: true
UserDomainList: false

The third event log message shows additional default configuration used by User Services, with the most interesting probably being the maximum contacts that a given user is allowed to maintain.

Type: Informational EventID: 30934 Source: OCS User Services
 Description:
Server specific User Services Settings configuration applied:
RedirectMethods: AsAppropriate
MaintenanceHourOfDay: 2
MaxContactsPerUser: 150.

As previously mentioned, most informational logs are useful only for backtracking configuration settings of the server after problems are detected or for determining whether the configuration is the same as it was expected to be (based on setting registry keys, WMI configuration, or changes in the user interface [UI]).

Examples of Normal Warning and Error Events Several warning and error event log messages can show up during initial startup of the server, which isn't really a cause for concern. The first event log message warns that a critical application or service has not yet registered with the server. This can occur due to longer startup processing times when the server is booted. As long as the server starts successfully and a subsequent information log identifies that the application finished startup, this log is not of much concern.

Type: Warning EventID: 30232 Source: OCS Applications Module
 Description:
Office Communications Server startup is pending.

Some configured critical applications have not yet registered.
Resolution:
For script only applications ensure that the application is available
in the path specified in the MMC, and that no errors are reported by the
Office Communications Server Script-Only Applications Service. For
non-script only critical applications ensure that they are configured to
register on server startup.

The next event log message warns that no Exchange Unified Messaging (UM) servers could be found in the domain. This message is a concern only if servers were installed, but this is not a mandatory part of the installation.

Type: Warning EventID: 44028 Source: OCS Exchange Unified Messaging Routing
 Description:
Exchange Unified Messaging Routing Application did not find any Exchange
UM servers.

Cause: No Exchange UM servers are configured for SIP traffic or the
RTCSRV service account does not have permission to read Exchange objects
in Active Directory.
Resolution:
Configure one or more Exchange UM servers to handle SIP traffic.

The following event log message has an error due to the failure of a service (in this case, the Address Book Server [ABS]) to start up and respond within an expected time frame. Again, this is something that can happen while the server is starting up shortly after the machine boot process when many processes are loading or on slower servers on which a lot of programs are running. As long as subsequent event log messages show that the service eventually started, this message isn't of concern.

Type: Error EventID: 12330 Source: OCS Server
 Description:
Failed starting a worker process.

Process: 'C:Program FilesOffice Communications Server 2007ServerCoreABServer.exe'
Exit Code: C3E8302D (The worker process
failed to initialize itself in the maximum allowable time.).
Cause: This could happen due to low resource conditions or insufficient
privileges.
Resolution:
Try restarting the server. If the problem persists contact Product Support
Services.

...

The next event log message warns that delta files aren't being generated. However, this is normally the case for small servers or test systems because there just aren't enough users. This message can largely be ignored and treated as an informational event log.

Type: Warning EventID: 21012 Source: OCS Address Book Server
 Description:
A delta file would be too large in relation to the full file it is based
on.  The delta file will not be generated.

Path: C:Program FilesOffice Communications Server 2007Web Components
Address Book FilesD-0909-0926.lsabs
Full File Records:  5
Delta File Records: 3
Cause: A delta file is generated in memory.  If the number of new/updated
contacts plus the number of deleted contacts is greater than a configured
percentage of the number of contacts in the base file, the delta file is
not generated because it would be quicker for the client to just download
the full file.
Resolution:
None needed.

Examples of Problem Events and Troubleshooting Steps

Many possible ways exist in which server configurations, network conditions, and invalid requests can create error conditions for the server. A few examples are included in this section both for reference and to explain what is happening.

This first error event occurs because the server didn't have a static IP address. Office Communications Server must be run on a static IP address because doing so will provide the most reliable service if its IP address doesn't change. The SIP records and uses IP addresses during routing, and a change in this address can cause failures for messaging or other active sessions. Under normal conditions, the Dynamic Host Configuration Protocol (DHCP) server (which is in charge of handing out IP addresses) does not change the IP address that it originally assigned to the Office Communications Server, but it can happen, especially across reboots or service windows. The following log helps identify that the IP address on which the server was listening no longer exists, which is an indicator that DHCP was in use.

Type: Error EventID: 14336 Source: OCS Protocol Stack Description:
A configured transport has failed to start.

Transport TLS has failed to start on local IP address 192.168.3.104 at
port 5061.Cause: This can occur due to a configuration error, low system resources
or other programs using the specified port. It can also happen if the IP
address specified has become invalid.
Resolution:
Ensure that the IP address specified is valid and that no other program
is listening on the specified port.

The second event shows a warning that is logged to alert the administrator that a server certificate (used by Office Communications Server for authentication and encryption with TLS) is close to expiring. Server certificates are valid only for a certain period of time after they are issued and then need to be replaced by a newly issued certificate. This warning helps administrators act before the certificate expires so that an unplanned outage and diagnostics can be avoided.

Type: Warning EventID: 14342 Source: OCS Protocol Stack
 Description:
The certificate configured for secure transport will expire soon.

Transport TLS on 192.168.1.103:5061 will expire on Thursday, June 05, 2007
at 09:35 Local Time.  The certificate serial number is attached for
reference.

<attached binary data shows the certificate's serial number>

The third event shows an error that will be logged if the administrator doesn't update the certificate, which will prevent the server from starting successfully. The solution for this and the previous log are the same—install a valid certificate for use by the server.

Type: Error EventID: 14341 Source: OCS Protocol Stack
 Description:
The certificate configured for secure transport has expired or is not yet
valid.

Transport TLS on 192.168.1.103:5061 will not start.  The certificate
serial number is attached for reference.
Resolution:
Renew the certificate or replace the transport with a new one.

<attached binary data shows the certificate's serial number>

Many possible types of events could arise, and all of them cannot be presented here. Use of winerror.exe and/or lcserror.exe from the Windows Server 2003 or Office Communications Server 2007 Resource Kit Tools can help explain any error codes that you come across. However, almost all of these events contain relatively detailed information to help point out the problem, identify specific data related to the problem, and offer solutions or items to investigate.

Note

Please see the Additional Resources section at the end of this chapter for instructions on obtaining the Windows Server 2003 Resource Kit Tools.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset