Sensors are physical probes that check the health and status of hardware. Manufacturers have put more and more sensors in hardware, providing low-level hardware information to the operating system. OpenBSD supports a wide variety of hardware sensors, and uses the sensorsd daemon to query them and act upon error states.
Resolving many hardware errors requires shutting down the machine, but advance warning that a component has stopped working changes a hardware failure from an unexpected middle-of-the-day catastrophe to an after-hours annoyance. Some hardware, such as hot-swappable hard drives, can be replaced without interrupting service once you know the hardware has failed.
Each physical sensor has a device driver. The device driver extracts information from the hardware and publishes it in a sysctl (discussed in Chapter 18). sensorsd reads the sysctl values and can act when they change or cross critical values. For example, here are the sensor-related sysctl values from my laptop:
$ sysctl hw.sensors
hw.sensors.acpitz0.temp0=67.00 degC (zone temperature)
hw.sensors.acpiac0.indicator0=On (power supply)
hw.sensors.acpibat0.volt0=11.10 VDC (voltage)
hw.sensors.acpibat0.volt1=12.35 VDC (current voltage)
hw.sensors.acpibat0.power0=0.00 W (rate)
hw.sensors.acpibat0.watthour0=2.61 Wh (last full capacity)
hw.sensors.acpibat0.watthour1=0.30 Wh (warning capacity)
hw.sensors.acpibat0.watthour2=0.06 Wh (low capacity)
hw.sensors.acpibat0.watthour3=9.57 Wh (remaining capacity), OK
hw.sensors.acpibat0.raw0=2 (battery full), OK
hw.sensors.cpu0.temp0=81.00 degC
This comparatively simple and generic hardware has two temperature sensors and all kinds of power sensors. You can get hundreds of lines of sensor output, depending on your hardware.
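When a script needs only the number from one sensor, sysctl's -n flag prints the value without the variable name, and the shell can strip the unit. The following is a minimal sketch; the function name sensor_number is made up for illustration, and the sample string is copied from the listing above.

```sh
#!/bin/sh
# Strip the unit and any trailing description from a sensor value string,
# for example "81.00 degC" becomes "81.00".
# sensor_number is a hypothetical helper name, not part of OpenBSD.
sensor_number() {
	printf '%s\n' "${1%% *}"
}

# On a live system you might call it like this:
#   sensor_number "$(sysctl -n hw.sensors.cpu0.temp0)"
# Here it runs against a sample value from the listing above.
sensor_number "67.00 degC (zone temperature)"
```

This only works for sensors that report a leading number. Sensors such as indicator or drive report words like On or online, which this helper passes through unchanged.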
Many RAID controllers have their own sensors, and will report when an array has failed. Here, we see three virtual disks provided by an AMI RAID controller:
hw.sensors.ami0.drive0=online (sd0), OK
hw.sensors.ami0.drive1=degraded (sd1), WARNING
hw.sensors.ami0.drive2=failed (sd2), CRITICAL
If you didn’t have sensors, you would need to look at the blinking lights on the drive enclosure. Or you could listen for the really annoying “beep, beep, beep,” which is so easy to hear over the roar of 5,000 server fans, the air conditioners, and someone else’s hardware that has been beeping every time you’ve come in for the last six months.
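If you only want a quick look at which sensors are unhappy, you can filter the sysctl output on the status word at the end of each line. This is a rough sketch that assumes the status always appears as a trailing ", WARNING" or ", CRITICAL", as in the RAID output above; the function name failing_sensors is invented for this example.

```sh
#!/bin/sh
# Print only the sensor lines whose reported status is WARNING or CRITICAL.
# Reads "sysctl hw.sensors"-style lines on standard input.
# failing_sensors is a hypothetical helper name.
failing_sensors() {
	grep -E ', (WARNING|CRITICAL)$'
}

# On a live system: sysctl hw.sensors | failing_sensors
# Here it runs against the sample RAID output shown above.
failing_sensors <<'EOF'
hw.sensors.ami0.drive0=online (sd0), OK
hw.sensors.ami0.drive1=degraded (sd1), WARNING
hw.sensors.ami0.drive2=failed (sd2), CRITICAL
EOF
```

A filter like this is fine for an interactive check, but it only tells you something when you remember to run it, which is the gap sensorsd fills.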
Some sensors require the Intelligent Platform Management Interface (IPMI). This is a kernel feature that’s disabled by default in OpenBSD, because it makes some machines behave really badly. Chapter 18 discusses enabling IPMI.
The device drivers attach to sensors automatically, and the values get into the kernel automatically, but to do anything with these results in an automated manner, you need sensorsd(8), or you need to configure an external SNMP-based management system and use snmpd(8). We’ll look at using sensorsd(8) here. Using snmpd(8) is discussed in Chapter 16.
The sensors daemon sensorsd(8) watches sensor monitoring data. It logs changes and can execute commands if needed. Because all hardware is different and all environments are different, by default sensorsd only notices and logs changes in sensor readings. To take action, you must configure sensorsd in /etc/sensorsd.conf.
OpenBSD supports many types of sensors, as listed in Table 15-2.
Name | Function
temp | Temperature (C)
fan | Fan speed (RPM)
volt | DC voltage
acvolt | AC voltage
resistance | Ohms resistance
power | Wattage
current | Amperage
watthour | Power capacity
amphour | Power capacity
indicator | Device-dependent yes/no
raw | Device-dependent value
percentage | Device-dependent percentage
illuminance | Lighting
drive | Hard drives
timedelta | Time difference between operating system and hardware
humidity | Percent humidity
frequency | Microhertz
angle | Microdegrees
You’ll need to check your hardware manual in order to learn how to use some of these sensors effectively.
Some sensors appear to overlap. For example, why does OpenBSD have all those separate values for power, when you could probably do some math and get a common power gauge? The reason is that these are the values that the actual sensors report, and the developers would prefer to give you the actual measurements. OpenBSD does perform some data rationalization, but only for simple data; all temperature sensors are normalized to degrees Celsius, for example.
The file sensorsd.conf has example entries, but because environments differ so widely, they’re all commented out. It uses a termcap-style configuration syntax, much like /etc/remote (see Chapter 5) or /etc/login.conf (see Chapter 6), with colons separating the terms in an entry. Each entry starts with the sensor to be measured, followed by attribute names and settings.
For example, here’s an entry for a temperature sensor in the default sensorsd.conf:
hw.sensors.lm0.temp0:high=50C
For the sensor lm0.temp0, the attribute high is set to 50C.
sensorsd supports four attributes:
high. An upper limit.
low. A lower limit.
command. A command to run when a limit is crossed or a state changes.
istatus. Ignore this status.
The values reported for a sensor type depend on what makes sense. Where high and low limits make sense for temperature and voltage, some sensors report specific states instead. The RAID controller shown earlier reports drives as online, degraded, or failed. A hard-drive sensor that reported a scalar value wouldn’t be useful, as you want to know if a RAID container is healthy or if drives have failed. There’s no middle ground.
You can have both high and low values for a single sensor. For example, whereas temperature might not have a low value in most data centers, voltage certainly will. I work in all sorts of weird places, and not all of them have clean power.
hw.sensors.acpibat0.volt0:low=11.0V:high=13.0V
With a line like this, if the electricity supply to my laptop drops below 11 volts or goes above 13 volts, I will know.
Some systems might have dozens of sensors of a given type, which could make configuration tricky. If my motherboard has 15 temperature sensors, I don’t want to configure each separately. Fortunately, you can configure sensors en masse by type, and since I don’t care which temperature sensor goes above 80 degrees Celsius (if any of them do, I want an alarm), that works.
temp:high=80C
When this rule is applied, sensorsd first looks for a configuration item for a specific sensor. If it doesn’t find that specific rule, it looks for a general rule. You can have one rule for most of your temperature sensors, and then override it for specific sensors, like this:
hw.sensors.lm0.temp5:high=90C
temp:high=80C
This rule says that most of my temperature sensors alarm at 80 degrees, but one specific sensor doesn’t alarm until 90 degrees.
I care about temperature, but I don’t care if my fancy keyboard sees that there’s no light and wants to trigger its back lighting. You can ignore a sensor, or a type of sensor, with the istatus keyword.
illuminance:istatus
You might categorically ignore certain types of alarms, depending on your environment and gear. Make up your own mind.
Having an entry in /var/log/daemon for when a hard drive fails is nice, but it would be better if the system would send email, page you, or trigger your monitoring system. It should do something, anything, that doesn’t require you to log in and look at a log file. Fortunately, sensorsd can run arbitrary commands upon detecting a problem or crossing a threshold, using the command attribute.
Because of the wide variety of sensors and their possible error states and conditions, sensorsd doesn’t have a fine-grained “run this command for an error, but run that other command for recovery.” There are too many possible error states and conditions for this to make any sense. Instead, sensorsd runs a single command upon crossing any threshold or upon any state change, including when it starts up and the state of an individual sensor goes from “unknown” to whatever it starts at.
Consider this sensorsd.conf entry:
temp:high=80C:command=/sbin/reboot
At first glance, this reads “If the temperature is high, reboot the machine.” You might think that will unquestionably kill whatever runaway process is saturating your heat-generating CPU (completely setting aside the fact that hardware other than CPUs generates heat), but sensorsd will run the command whenever the temperature state changes. The state changes at boot time, when the first temperature reading is taken, which means that your system will boot, and then immediately reboot. Your script needs intelligence.
To make scripting easier, sensorsd has a set of variables it can pass to a script:
%l (a lowercase L). Is the value within the limit set in sensorsd.conf? This can be one of below, above, within, invalid, or uninitialized.
%n. Sensor number.
%s. Sensor status.
%x. Which device the sensor sits on.
%t. Sensor type.
%2. Sensor’s current value.
%3. Sensor’s low limit.
%4. Sensor’s high limit.
You might run a temperature command like this:
temp:high=80C:command=/usr/local/script/temp %l %2 %n
Your script /usr/local/script/temp would take three arguments: the limit status, the temperature, and the sensor number. Your script would check these values and decide whether a reboot is warranted.
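A handler along these lines might look like the following sketch. It assumes the limit status arrives as the first argument and treats everything after it as descriptive detail, since the exact way the value is split into arguments is not something this sketch relies on. The function name decide and the messages are invented for illustration, and the script prints what it would do rather than doing it.

```sh
#!/bin/sh
# Sketch of a handler such as /usr/local/script/temp.
# Assumed invocation: temp <limit-status> <current value> <sensor number>
# It prints the action it would take instead of performing it.
# decide is a hypothetical function name.
decide() {
	status=${1:-unknown}
	[ "$#" -gt 0 ] && shift
	detail=$*	# remaining arguments, kept only for the message
	case "$status" in
	above)
		# Replace this echo with a real action (mail, page, shutdown).
		echo "alert: temperature over limit ($detail)"
		;;
	within|below)
		echo "ok: temperature not over limit ($detail)"
		;;
	*)
		# invalid, uninitialized, or anything unexpected: never act.
		# This is what keeps the startup state change from triggering.
		echo "ignore: status '$status' ($detail)"
		;;
	esac
}

# Run only when arguments were supplied, as sensorsd would do.
if [ "$#" -gt 0 ]; then
	decide "$@"
fi
```

The important design choice is the default branch: any status the script doesn’t explicitly recognize, including the state reported when sensorsd first starts, results in no action at all.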
With sensorsd, proper timekeeping, and log file management, your OpenBSD system can largely look after itself.
In the next chapter, we’ll look at how OpenBSD can take care of other hosts.