For fault management in switches, you will depend mainly on SNMP traps and syslog messages, rather than on active polling of MIB objects, to tell you when hardware issues arise in the network. However, this section looks at some MIB objects that you may want to poll, either routinely or in response to a correlated event such as a syslog message, an SNMP trap, or an exceeded RMON threshold.
From the CISCO-STACK-MIB, the following variables are relevant to switch failures:
chassisMinorAlarm: A minor alarm varbind within an SNMP trap message.
chassisMajorAlarm: A major alarm varbind within an SNMP trap message.
From the moduleTable within the CISCO-STACK MIB:
moduleStatus: The status of a module within the switch chassis.
moduleTestResult: The result of a power-on self test for a module within the switch chassis.
Only two of these MIB objects are worth polling: moduleStatus and moduleTestResult. Even these need to be polled only after a relevant SNMP trap or syslog message has been seen. The other two MIB objects, chassisMajorAlarm and chassisMinorAlarm, are varbinds within the SNMP traps chassisAlarmOn and chassisAlarmOff.
When the system LED status turns to red, a chassisMajorAlarm is generated. When the system LED status turns orange, a chassisMinorAlarm is generated. The trap generated will be a chassisAlarmOn trap. Included with the traps are variables that indicate whether the trap is from a chassisTempAlarm, a chassisMinorAlarm, or a chassisMajorAlarm. Decoding the trap indicates what kind of alarm generated the trap.
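As a rough illustration of decoding such a trap, the following Python sketch assumes the trap receiver has already translated the varbinds into a dictionary keyed by object name (a real receiver delivers OIDs, and that translation step is not shown). The function names are hypothetical. The off(1) and on(2) values are the states named in the trap descriptions later in this section.

```python
# Alarm states as named in the trap descriptions: off(1), on(2).
ALARM_OFF, ALARM_ON = 1, 2

# The three alarm objects that a chassisAlarmOn trap can carry.
ALARM_OBJECTS = ("chassisTempAlarm", "chassisMinorAlarm", "chassisMajorAlarm")


def decode_chassis_alarm(varbinds):
    """Return the names of the alarm objects that are in the on(2) state.

    `varbinds` is assumed to be a dict of object name -> integer value.
    """
    return [name for name in ALARM_OBJECTS if varbinds.get(name) == ALARM_ON]


def expected_led_color(active_alarms):
    """Map active alarms to the system LED color described in the text."""
    if "chassisMajorAlarm" in active_alarms:
        return "red"
    if "chassisMinorAlarm" in active_alarms:
        return "orange"
    return None  # no major or minor alarm active
```

For example, a trap whose only on(2) varbind is chassisMinorAlarm decodes to a minor alarm, and the system LED would be expected to show orange.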
A chassisMajorAlarm exhibits one of the following conditions:
Any voltage failure
Simultaneous Temp and Fan failure
100 percent power supply failure (2 out of 2 or 1 out of 1)
EEPROM failure
NVRAM failure
MCP communication failure
NMP status “unknown”
A chassisMinorAlarm exhibits one of the following conditions:
Temp alarm
Fan failure
Partial power supply failure (1 out of 2)
Two power supplies of incompatible types
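The two condition lists above can be summarized as a small classification routine. The following Python sketch is illustrative only: the condition labels are hypothetical names invented for this example, not identifiers from the switch or the MIB.

```python
# Hypothetical labels for the conditions listed in the text.
MAJOR_CONDITIONS = {
    "voltage_failure",
    "temp_and_fan_failure",
    "total_power_supply_failure",   # 2 out of 2, or 1 out of 1
    "eeprom_failure",
    "nvram_failure",
    "mcp_communication_failure",
    "nmp_status_unknown",
}
MINOR_CONDITIONS = {
    "temp_alarm",
    "fan_failure",
    "partial_power_supply_failure",  # 1 out of 2
    "incompatible_power_supplies",
}


def classify_alarm(conditions):
    """Return the alarm object implied by a set of active conditions."""
    active = set(conditions)
    # A temperature alarm and a fan failure at the same time count as major.
    if {"temp_alarm", "fan_failure"} <= active:
        active.add("temp_and_fan_failure")
    if active & MAJOR_CONDITIONS:
        return "chassisMajorAlarm"
    if active & MINOR_CONDITIONS:
        return "chassisMinorAlarm"
    return None
```

Note how a fan failure alone is classified as minor, while a fan failure combined with a temperature alarm is escalated to major.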
Based on appropriate syslog messages or SNMP traps received on your Network Management console, you can determine when you need to actively poll these MIB objects.
The following are show commands that can be used to get the same type of data points as the MIB objects mentioned previously for switch health.
For this section, the focus of the show system command is the system status (Sys-Status) field in the output. The output also reports power supply, fan, and temperature status. The normal system status is “ok”. The only other value is “faulty,” which indicates that a major or minor alarm has been triggered.
Example 10-13 shows output from show system, with emphasis on information regarding switch health.
Switch>show system
PS1-Status PS2-Status Fan-Status Temp-Alarm Sys-Status Uptime d,h:m:s Logout
---------- ---------- ---------- ---------- ---------- -------------- ---------
ok none ok off ok A 4,23:06:16 20 min
PS1-Type PS2-Type Modem Baud Traffic Peak Peak-Time
---------- ---------- ------- ----- ------- ---- -------------------------
WS-C5508 none disable 9600 0% 0% Wed Apr 21 1999, 15:57:24
System Name System Location System Contact
------------------------ ------------------------ ------------------------
“Sys-Status” (A) displays the current state of the switch based on the “health” of the processor. If there are any alarms triggered that are power-, temperature- or fan-related, the Sys-Status would be affected, in addition to the other variables. Think of the Sys-Status as the main reporting mechanism for the switch as a whole.
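If the show system output is collected by a script, the status fields can be pulled out with a few lines of Python. This is a minimal sketch that assumes the first table has the layout shown in Example 10-13 (a header line beginning with PS1-Status, a line of dashes, then one line of values). The function names are hypothetical. Only the first five columns are read, so a callout letter such as the “A” printed in the example does not disturb the result.

```python
def parse_show_system(output):
    """Return the first five status fields of 'show system' as a dict."""
    lines = [ln for ln in output.splitlines() if ln.strip()]
    for i, line in enumerate(lines):
        tokens = line.split()
        # Header line, followed by a dashes line, followed by the values.
        if tokens[0] == "PS1-Status" and i + 2 < len(lines):
            values = lines[i + 2].split()
            return dict(zip(tokens[:5], values[:5]))
    return {}


def system_is_healthy(status):
    """True only when Sys-Status is reported as 'ok'."""
    return status.get("Sys-Status") == "ok"
```

A caller would pass the captured command output to parse_show_system and raise an alert whenever system_is_healthy returns False.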
The show module command allows you to see what kind of card is installed in each slot of the switch chassis and the status of the line cards and supervisor cards. The output also gives the module number, number of ports, card model, serial number, hardware version, firmware version, and software version. In addition, it lists any submodules installed on the supervisor card, such as the NetFlow Feature Card (NFFC) or the uplink modules. This data is especially relevant on the newer Supervisor cards (Supervisor III or WS-X5530).
The focus for Example 10-14 is on the individual module Status column.
Switch> sh module
Mod Module-Name Ports Module-Type Model Serial-Num Status
--- ------------------- ----- --------------------- --------- --------- -------
1 2 100BaseFX MMF Supervi WS-X5530 011437543 ok A
2 2 MM MIC FDDI WS-X5101 003397731 ok A
6 12 10/100BaseTX Ethernet WS-X5213 003974709 ok A
Mod MAC-Address(es) Hw Fw Sw
--- -------------------------------------- ------ ---------- -----------------
1 00-e0-4f-73-8e-00 to 00-e0-4f-73-91-ff 2.0 3.1.2 4.5(1)
2 00-60-3e-cd-55-6c 1.1 1.1 3.1(1)
6 00-60-83-5d-8a-ec to 00-60-83-5d-8a-f7 1.0 1.4 4.5(1)
Mod Sub-Type Sub-Model Sub-Serial Sub-Hw
--- -------- --------- ---------- ------
1 NFFC WS-F5521 0011437958 1.1
1 uplink WS-U5533 0008588482 1.0
Mod SMT User-Data T-Notify CF-St ECM-St Bypass
--- -------------------------- -------- -------- --------- -------
2 WorkGroup Stack 30 c-Wrap-B in absent
The “Status” column (A) shows you the current state of the module. It can be one of the following values: ok, disable, faulty, other, standby, or error. If there is a “faulty” condition on the module, you can issue the show log or show test [mod_num] command to see why it is faulty.
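The Status column can be extracted in a similar scripted way. The sketch below is an assumption-laden example rather than a complete parser: it looks only at the first table of show module (the one whose header ends in Status), treats the first token of each row as the module number, and scans from the right for one of the status values listed above. The function name is hypothetical.

```python
# Status values listed in the text for the show module Status column.
KNOWN_STATUSES = {"ok", "disable", "faulty", "other", "standby", "error"}


def parse_module_status(output):
    """Return {module_number: status} from the first 'show module' table."""
    statuses = {}
    in_status_table = False
    for line in output.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "Mod":
            # Only the table whose header ends in 'Status' is of interest.
            in_status_table = tokens[-1] == "Status"
            continue
        if not in_status_table or not tokens[0].isdigit():
            continue  # skips dashes lines and rows of other tables
        # Scan from the right so a trailing callout letter is ignored.
        for token in reversed(tokens[1:]):
            if token in KNOWN_STATUSES:
                statuses[int(tokens[0])] = token
                break
    return statuses
```

Any module whose status is not “ok” is a candidate for the show log or show test follow-up described above.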
Among the traps defined in the CISCO-STACK-MIB, several are relevant to switch failure:
chassisAlarmOn
chassisAlarmOff
moduleDown
moduleUp
A chassisAlarmOn trap signifies that the agent entity has detected that the chassisTempAlarm, chassisMinorAlarm, or chassisMajorAlarm object in this MIB has transitioned to the on(2) state. The generation of this trap can be controlled by the sysEnableChassisTraps object in this MIB or by using the CLI command set snmp trap enable chassis.
A chassisAlarmOff trap signifies that the agent entity has detected that the chassisTempAlarm, chassisMinorAlarm, or chassisMajorAlarm object in this MIB has transitioned to the off(1) state. The generation of this trap can be controlled by the sysEnableChassisTraps object in this MIB or by using the CLI command set snmp trap enable chassis.
A moduleDown trap signifies that the agent entity has detected that the moduleStatus object in this MIB has transitioned out of the ok(2) state for one of its modules. The generation of this trap can be controlled by the sysEnableModuleTraps object in this MIB or by using the CLI command set snmp trap enable module.
Refer to the Chassis Alarm MIBs previously discussed for an explanation of when a certain trap would be seen.
A moduleUp trap signifies that the agent entity has detected that the moduleStatus object in this MIB has transitioned to the ok(2) state for one of its modules. The generation of this trap can be controlled by the sysEnableModuleTraps object in this MIB or by using the CLI command set snmp trap enable module.
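The relationship between the moduleStatus transitions and the module traps, and between each trap and the setting that enables it, can be sketched in Python as follows. The function and table names are hypothetical. The sketch assumes that both module traps are governed by the module trap setting, as the sysEnableModuleTraps object suggests.

```python
MODULE_STATUS_OK = 2  # ok(2), as named in the trap descriptions

# Trap name -> (controlling MIB object, CLI command that enables the trap).
TRAP_CONTROLS = {
    "chassisAlarmOn": ("sysEnableChassisTraps", "set snmp trap enable chassis"),
    "chassisAlarmOff": ("sysEnableChassisTraps", "set snmp trap enable chassis"),
    "moduleDown": ("sysEnableModuleTraps", "set snmp trap enable module"),
    "moduleUp": ("sysEnableModuleTraps", "set snmp trap enable module"),
}


def trap_for_transition(old_status, new_status):
    """Return the trap implied by a moduleStatus change, or None."""
    if old_status == MODULE_STATUS_OK and new_status != MODULE_STATUS_OK:
        return "moduleDown"   # transitioned out of ok(2)
    if old_status != MODULE_STATUS_OK and new_status == MODULE_STATUS_OK:
        return "moduleUp"     # transitioned into ok(2)
    return None


def enable_commands(traps):
    """Return the distinct CLI commands needed to enable the given traps."""
    commands = []
    for trap in traps:
        command = TRAP_CONTROLS[trap][1]
        if command not in commands:
            commands.append(command)
    return commands
```

Two CLI commands are therefore enough to enable all four traps discussed here.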
The syslog functionality was first introduced to the Catalyst series switches in software release 2.4. Table 10-5 summarizes only those messages that apply to hardware and to the variables already discussed in this section.
TIP
Turn on timestamps for log messages so that you can correlate events with issues in the network. The command set logging timestamp enable turns on timestamps for the log messages.
Message | Explanation |
---|---|
SYS-3-MOD_FAILREASON: Module [dec] failed due to [chars][chars][chars] [chars] | This message indicates that the module [dec] has failed because of [chars]. [dec] is the module number and [chars] is one of the following: CPU Initialization Error, Memory Test Failed, Boot Checksum Verification Failed, SPROM Checksum Verification Failed, EOBC Loopback Test Failed, LTL-A Error, Flash Erase/Write Error, Pinnacle CBL Error, Pinnacle Packet Buffer Error, Pinnacle TLB Error, or Unknown or Undocumented Error. The first [chars] line is Ports disabled if the module is a non-ATM/Route Switch Module (RSM) (non-IOS). The second [chars] line is a description of the module type configured in NVRAM. The third [chars] line is a description of the module type inserted in the slot. Execute the CLI command show test [mod_num] to see what specifically failed. |
SYS-3-MOD_MINORFAIL: Minor problem in module [dec] | This message indicates that a module [dec] failed the self-test; [dec] is the module number. Execute the CLI command show test [mod_num] to see what specifically failed. |
SYS-3-MOD_FAIL: Module [dec] failed to come online | This message indicates that module [dec] failed to come online; [dec] is the module number. Execute the CLI command show module to see the status of the module. |
SYS-5-MOD_INSERT: Module [dec] has been Inserted | This message indicates that module [dec] was inserted; [dec] is the module number. This message is provided for information only. If a module is inserted and the message does not appear, this might indicate a problem. Enter the show module or show port [mod_num/port_num] command to verify that the system has acknowledged the module and brought it online. |
SYS-5-MOD_REMOVE: Module [dec] has been Removed | This message indicates that module [dec] was removed; [dec] is the module number. This message is provided for information only. If a module is removed and the message does not appear, this might indicate a problem. Enter the show port [mod_num/port_num] command to query the module. The system should respond as follows: Module n is not installed. |
SYS-5-SYS_RESET: System reset from [chars] | This message indicates that the system was reset from [chars]; [chars] is a console number if the request is from a console session or IP address if the request is from a Telnet session or SNMP. |
SYS-5-MOD_OK: Module [dec] is online | This message indicates that module [dec] passed diagnostic self-test and is online; [dec] is the module number. Usually seen after the SYS-5-SYS_RESET message occurs if modules are working properly. |
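A network management station that receives these messages can split them into facility, severity, and mnemonic, and then suggest the follow-up command named in the table. The Python sketch below is one possible approach under stated assumptions: the message body is assumed to follow the FACILITY-SEVERITY-MNEMONIC: text pattern shown in the table, the regular expression and function names are hypothetical, and only the follow-up commands that need no port number are mapped.

```python
import re

# FACILITY-SEVERITY-MNEMONIC: text, with an optional leading '%'.
MESSAGE_RE = re.compile(
    r"%?(?P<facility>[A-Z]+)-(?P<severity>\d)-(?P<mnemonic>[A-Z_]+):\s*(?P<text>.*)"
)
MODULE_RE = re.compile(r"[Mm]odule (\d+)")

# Follow-up commands taken from the explanations in the table.
FOLLOW_UP = {
    "MOD_FAILREASON": "show test {module}",
    "MOD_MINORFAIL": "show test {module}",
    "MOD_FAIL": "show module",
    "MOD_INSERT": "show module",
}


def parse_syslog(line):
    """Split a log line into its parts, or return None if it does not match."""
    match = MESSAGE_RE.search(line)
    if match is None:
        return None
    module = MODULE_RE.search(match.group("text"))
    return {
        "facility": match.group("facility"),
        "severity": int(match.group("severity")),
        "mnemonic": match.group("mnemonic"),
        "module": int(module.group(1)) if module else None,
        "text": match.group("text"),
    }


def follow_up(line):
    """Return the suggested CLI command for a message, or None."""
    parsed = parse_syslog(line)
    if parsed is None or parsed["mnemonic"] not in FOLLOW_UP:
        return None
    return FOLLOW_UP[parsed["mnemonic"]].format(module=parsed["module"])
```

Because the pattern is located with a search rather than anchored at the start of the line, a timestamp in front of the message does not prevent a match.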