Chapter 5. Monitoring System Resources

As the needs of your organization expand, your network will grow and change in order to match the growth. Keeping track of the resources on each node is extremely important for stability. While Linux handles resources exceptionally well, it can only do so much. CPUs can be overutilized, disks become full, and excessive input/output can halt even the strongest of servers. Keeping an eye on these things is very important, especially when systems are used in production and depended upon by others.

In this chapter, we'll look at ways to inspect what's running on your Linux systems and manage their resources to help ensure your nodes are good citizens on your network.

In this chapter, we will cover:

  • Inspecting and managing processes
  • Understanding load average
  • Checking available memory
  • Using shell-based resource monitors
  • Checking disk space
  • Scanning used storage
  • Introduction to logging
  • Maintaining log size with logrotate
  • Understanding the systemd init system
  • Understanding the systemd journal

Inspecting and managing processes

In a typical troubleshooting scenario, you might have a process that is misbehaving or needs an action performed against it. If you're using a graphical desktop environment on a workstation, you might use a tool such as the GNOME System Monitor to investigate the processes running on your system, and then kill the problem child. In most cases though, you won't have a desktop environment (at least not on servers), so you would use a command such as kill to get rid of whatever process is misbehaving. But before you can kill a process, you'll need to know its process identifier (PID). One method of finding the PID of a process that works on all Linux systems is to open a terminal and use the ps command. Here's an example of its usage:

ps aux

Along with ps, it's common to use grep if you happen to already know the name of the process. In that case, you can pipe the output of ps aux into grep and then search for a process.

ps aux | grep httpd

The ps command will give you a list of running processes. If you used grep, the output is narrowed down to the processes matching the search term (the grep command itself may also show up in the results, since its own command line contains that term). You'll find the PID of each process in the second column. The third column shows how much CPU the process is consuming (%CPU), and the fourth column shows its memory usage (%MEM), both as percentages.

Output of ps aux on a Debian system

USER, STAT, START, TIME, and COMMAND are additional columns we can see from this output. While USER is self-explanatory, here's a short description of the other column headers:

  • STAT: This field identifies the state the process is currently in, using a short code whose first character is the main state. For example, S means that the process is sleeping while it waits for some event to complete, while D is an uninterruptible sleep state, typically related to I/O. To view a complete list, check out the manual page on ps.
  • START: This field refers to the time at which the process began running.
  • TIME: This indicates the cumulative CPU time the process has consumed since it started. It is not the wall-clock time the process has been running; a process that sits idle most of the time accumulates very little here.
  • COMMAND: This displays the command that the current process is running.
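
The column layout described above can be put to work with a small filter. The following is a minimal sketch, not a standard tool: the function name ps_summary is made up for illustration, and it assumes the usual 11-column ps aux layout with the command starting in the eleventh column. It reads ps aux-style text on standard input and prints the PID, %CPU, %MEM, and program name of every process whose program name contains the search term:

```shell
# ps_summary PATTERN
# Hypothetical helper: reads "ps aux"-style text on stdin and prints
# "PID %CPU %MEM COMMAND" for each process whose program name (column 11)
# contains PATTERN.
ps_summary() {
    awk -v pat="$1" '
        NR == 1 { next }                          # skip the header line
        index($11, pat) { print $2, $3, $4, $11 } # PID, %CPU, %MEM, program
    '
}
```

On a live system you would run it as ps aux | ps_summary httpd. Because it looks only at the program name rather than the whole command line, helper commands that merely mention the search term in their arguments (such as grep httpd) are not reported.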

Now that you know how to find the PID of a process, we can take a look at the kill command, which is useful when you need to close a program that won't close by normal means. For example, if you are running a script with a process ID of 25787, you could kill it by executing the following command:

# kill 25787

The kill command works by sending a signal to a PID. If you run kill against a process without specifying a signal (as we did in our last example), it sends signal 15 (SIGTERM) by default, which basically asks politely for the process to close down. Linux defines 31 standard signals, plus a range of real-time signals, which you can read about in the manual pages. For the sake of our discussion here, SIGINT, SIGTERM, and SIGKILL are the ones you'll likely use the most. You can view a list of these signals, as well as their meanings, by executing the following command:

man 7 signal
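
If you only need to map a signal number to its name, the kill command built into the shell can do that without opening the manual page. As a small sketch, the -l option followed by a number prints the matching signal name (without the SIG prefix):

```shell
# Translate the three signal numbers discussed here into their names.
kill -l 15   # prints TERM
kill -l 2    # prints INT
kill -l 9    # prints KILL
```

Running kill -l with no number lists every signal the shell knows about, although the layout of that list differs between shells.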

To send a specific signal, type a hyphen after the kill command followed by the signal you wish to send. Since kill by itself sends signal 15, you can do the same thing by executing the following command:

# kill -15 25787

To send a different signal, such as 2 (SIGINT), type the following command:

# kill -2 25787

If you're very desperate, you could send signal 9 (SIGKILL) to the process:

# kill -9 25787

However, SIGKILL should be used only if you've already exhausted all your other options and cannot get the process to close despite your best efforts. SIGKILL terminates the process immediately and cannot be caught or ignored, so the process gets no chance to clean up after itself. This may leave stale temporary files and open socket connections on your system. Worse, it can actually damage databases and configuration files that were being written at the time. I cannot stress this enough: kill -9 should be the very last thing you try if you can't get a process to close out gracefully. Try every method you know to close a process gracefully first, and then make several more attempts before considering it.
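
The "ask politely first, force only as a last resort" approach can be sketched as a small shell function. This is an illustration under stated assumptions rather than a standard utility: the name stop_gracefully and the 10-second default grace period are invented for the example. It sends SIGTERM, polls with kill -0 (which delivers no signal and only checks whether the PID still exists), and falls back to SIGKILL only when the grace period runs out:

```shell
# stop_gracefully PID [SECONDS]
# Hypothetical helper: returns 0 if the process exited after SIGTERM,
# 1 if SIGKILL had to be sent, 2 if the process could not be signalled.
stop_gracefully() {
    pid=$1
    timeout=${2:-10}                        # grace period (assumed default)
    kill -TERM "$pid" 2>/dev/null || return 2   # no such PID, or not permitted
    waited=0
    while kill -0 "$pid" 2>/dev/null; do    # still there?
        if [ "$waited" -ge "$timeout" ]; then
            kill -KILL "$pid" 2>/dev/null   # last resort
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

For example, stop_gracefully 25787 30 would give the script from the earlier examples up to 30 seconds to shut down on its own before it is killed outright.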

Another command that can be used to kill processes is the killall command. The killall command allows you to kill all the processes on your system that match a specific name. Like kill, it sends SIGTERM unless you tell it otherwise. For example, let's say you have multiple Firefox windows open and the program stops responding. To kill all instances of Firefox running on your system at once, simply execute the following command:

killall firefox

And just like that, every Firefox window on your system will instantly vanish. The killall command can be used to close down multiple processes that all share the same name, and it can be very useful on servers which run multiple instances of a single unresponsive program or script.
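
Because killall acts on every process with a matching name, it can be worth previewing what it would hit before running it on a server. The following is a rough sketch: pids_named is a made-up helper name, and it assumes its input is a list of "PID name" pairs, which is the format produced by ps -e -o pid= -o comm=. It prints the PID of every process whose name is exactly the one given:

```shell
# pids_named NAME
# Hypothetical helper: reads "PID COMMAND" pairs on stdin and prints the
# PIDs whose command name is exactly NAME.
pids_named() {
    awk -v name="$1" '$2 == name { print $1 }'
}
```

Running ps -e -o pid= -o comm= | pids_named firefox first shows which PIDs share that name, so you can confirm that nothing unexpected is on the list before you run killall firefox.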

That's pretty much all there is to using the kill and killall commands. Sure, there are more options and the man pages will give you more information. But in a nutshell, those are the variations you'll actually use. In a perfect world, you should never need to use kill and all processes running on your servers will obey you without question. Unfortunately, we don't live in a perfect world and you'll probably use these commands more often than you'd like.

