Understanding load average

For a Linux administrator, load average is one of the most important concepts you'll ever learn. You may already know that this number represents how much load your system is experiencing, but it also shows how performance is trending. Using it, you can determine whether your system is being overwhelmed or is recovering and calming down. The load average consists of three numbers, each representing the average load of the system over a specific time frame: the first covers the last minute, the second the last five minutes, and the third the last 15 minutes. There are many ways to view your load average, and most system monitors available for Linux display it. One quick way to view it is to execute the following command:

cat /proc/loadavg

Viewing the load average
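
The same file can also be read from a script. The following is a minimal Python sketch, assuming the usual /proc/loadavg layout in which the first three whitespace-separated fields are the 1-, 5-, and 15-minute averages. The function name parse_loadavg and the sample line are illustrative only; the sample is not output from a real system.

```python
from pathlib import Path


def parse_loadavg(text):
    """Return the 1-, 5-, and 15-minute load averages as floats."""
    fields = text.split()
    # Only the first three fields are load averages; the rest are ignored here.
    one, five, fifteen = (float(value) for value in fields[:3])
    return one, five, fifteen


# Illustrative sample line; the last two fields are made up for this sketch.
sample = "0.63 0.72 0.71 1/234 5678"
print(parse_loadavg(sample))

# On a Linux system, the live values can be read the same way.
proc_file = Path("/proc/loadavg")
if proc_file.exists():
    print(parse_loadavg(proc_file.read_text()))
```

Python's standard library also offers os.getloadavg(), which returns the same three values on Unix-like systems without any parsing.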

A simpler technique is to use the uptime command. Though the main purpose of the uptime command is to view how long your system has been up, it displays the system's load average as well.

The output of the uptime command

So, how does one properly interpret this information? In the screenshot of the uptime command shown in this section, we see the following numbers:

0.63 0.72 0.71

As mentioned, these three numbers represent the system's load over the last 1, 5, and 15 minutes, respectively. The load being referred to is the average number of processes that were waiting on, or currently utilizing, the CPU during each time frame; on Linux, it also counts processes in uninterruptible sleep, which are typically waiting on disk I/O. On the system used in this example, we can see that the load is relatively low. We can also see trends in the load average. Because the one-minute figure is the most recent, a one-minute value lower than the other two means the load is easing. On the example system, the load is trending downward, but just by a bit.
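
Because the one-minute figure covers the most recent window and the 15-minute figure the longest, comparing the two gives a rough sense of direction. The sketch below is one way to express that comparison in Python; the function name load_trend, the tolerance of 0.05, and the sample values are all assumptions chosen for illustration, not a standard.

```python
def load_trend(one_min, fifteen_min, tolerance=0.05):
    """Classify the trend by comparing the newest window with the oldest."""
    difference = one_min - fifteen_min
    if difference > tolerance:
        return "rising"
    if difference < -tolerance:
        return "falling"
    return "steady"


# Hypothetical values, chosen only to exercise each branch.
print(load_trend(2.0, 0.5))   # 1-minute average above the 15-minute average
print(load_trend(0.5, 2.0))   # 1-minute average below the 15-minute average
print(load_trend(1.0, 1.02))  # difference is within the tolerance
```

A small tolerance keeps tiny fluctuations from being reported as a trend; the right value depends on how noisy a given system is.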

Generally speaking, the lower the load averages, the better. But that's not always the case; low numbers can be disturbing too. For example, if you have a server that's supposed to be doing a lot of work and its load average drops below one, that may be a cause for alarm. If the load is that low, the server clearly isn't busy, which might mean that a process that is supposed to be running has failed. If you have a MySQL server that normally sees hundreds of queries at a time, it would definitely be odd to see that the server was suddenly bored. On the flip side, a server with a load average in the hundreds would be so busy that it probably couldn't even process a login request for you to access the system!

Let's take a look at another load average. Here's one from a busier system on a network that I help manage:

9.75 8.96 5.94

Here, we can see that the load on this system is much higher than in the previous example. This might be something I'll want to look into. But one confusing thing about a system's load average is that the number itself isn't enough to justify cause for alarm. If that system had ten cores, I wouldn't be so worried. Despite the load average being over nine, there would be plenty of CPUs to handle the workload in that case. However, the system I took that output from has only four cores, so it's a cause for alarm. It means that during each of the three time windows, there were, on average, more processes waiting for or using CPU time than the system has cores. That's not good. Worse, the load is trending upward: the one-minute average (9.75) is higher than the five-minute average (8.96), which in turn is higher than the 15-minute average (5.94), so the system isn't recovering yet. In this case, I won't panic, but I'll certainly want to keep my eye on it to see whether it begins to recover. I'll also investigate the system to find out what exactly caused the load to spike so high. Perhaps the server is in the middle of a really big job, but it's worth looking into.
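
The comparison between load and core count can be reduced to a single figure by dividing one by the other. The following Python sketch does that for the numbers discussed above; load_per_core is a hypothetical helper name, and note that os.cpu_count() reports logical CPUs, which may differ from the number of physical cores.

```python
import os


def load_per_core(load, cores):
    """Divide a load average by the core count to get the load per core."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load / cores


# The one-minute figure from the text, on four cores and on ten cores.
print(load_per_core(9.75, 4))   # above 1.0: more work than cores
print(load_per_core(9.75, 10))  # below 1.0: cores to spare

# For the machine running this script (cpu_count() can return None).
cores = os.cpu_count() or 1
try:
    one_min = os.getloadavg()[0]
    print(round(load_per_core(one_min, cores), 2))
except (AttributeError, OSError):
    print("load average not available on this platform")
```

A result above 1.0 means that, on average, there was more work queued than there were cores to run it.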

As a general rule of thumb, it's a good idea to record a baseline of your systems when they are under their normal, expected load. Each system on your network has a designated purpose, and each has a certain load you can reasonably expect it to face at any one time. If a system's load average dips too far below or climbs too far above the baseline, take a look and find out what's going on. If the load reaches a level where there are more processes than you have cores to handle them, that's cause for alarm.
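
A baseline check along these lines could be scripted as follows. This is only a sketch under stated assumptions: the function name check_against_baseline, the 50 percent tolerance, and the sample baseline of 2.0 are invented for illustration, and sensible values will differ for every system.

```python
def check_against_baseline(current, baseline, cores, tolerance=0.5):
    """Return a short status for a load average compared with its baseline."""
    if current > cores:
        # More runnable processes than cores, regardless of the baseline.
        return "over capacity"
    if current > baseline * (1 + tolerance):
        return "above baseline"
    if current < baseline * (1 - tolerance):
        return "below baseline"
    return "normal"


# Hypothetical four-core server with a recorded baseline load of 2.0.
for load in (9.75, 3.5, 2.2, 0.4):
    print(load, check_against_baseline(load, baseline=2.0, cores=4))
```

The capacity test comes first so that an overloaded system is always flagged, even if its baseline happens to be high.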
