Performance metrics

When a microservice eats 100% of the server's memory, bad things will happen. On Linux, the kernel's infamous out-of-memory killer (OOM killer) will simply kill the greedy process.

Using too much RAM can happen for several reasons:

  • The microservice has a memory leak and steadily grows, sometimes at a very fast pace. A common mistake in Python C extensions is forgetting to decrement an object's reference count, leaking that object on every call.
  • The code uses memory without care. For example, a dictionary that's used as an ad hoc memory cache can grow indefinitely over the days unless it has an upper limit by design (see the sketch after this list).
  • There's simply not enough memory allocated to the service--the server is getting too many requests or is too weak for the job.
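
One way to put such a limit in place is an LRU eviction policy. The following is a minimal sketch (the class name and default limit are arbitrary, not taken from this chapter's code) of a size-bounded dictionary cache built on collections.OrderedDict:

```python
from collections import OrderedDict


class BoundedCache:
    """Ad hoc dictionary cache with an upper limit on the number of entries.

    When the limit is reached, the least recently used entry is evicted,
    so the cache cannot grow indefinitely over time.
    """

    def __init__(self, max_entries=10_000):
        self.max_entries = max_entries
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key in self._data:
            # Mark the entry as recently used.
            self._data.move_to_end(key)
            return self._data[key]
        return default

    def set(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        # Evict the oldest entry once the limit is exceeded.
        if len(self._data) > self.max_entries:
            self._data.popitem(last=False)
```

For caching function results, the standard library's functools.lru_cache(maxsize=...) provides the same kind of protection out of the box.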

It's important to be able to track memory usage over time to find out about these issues before they impact users.

Reaching 100% CPU usage in production is also problematic. While it's desirable to make good use of the CPU, a server that is already saturated when new requests come in will not be responsive.

Lastly, knowing when the server's disk is almost full helps prevent the service from crashing once it runs out of space.

Ideally, most of these problems can be discovered with a load test before the project goes to production. A load test is a good way to determine how much load a server can handle, both during the test and over time, and to tweak the CPU/RAM resources according to the expected load.
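
As a rough illustration of the idea, here is a minimal load-test sketch using a thread pool and the third-party requests library; the URL, concurrency, and request count are placeholder values, and dedicated tools such as Locust or ab are better suited to serious load testing:

```python
import time
from concurrent.futures import ThreadPoolExecutor

import requests

# Hypothetical endpoint of the service under test.
URL = "http://localhost:8000/api"


def call_service(_):
    # Return the HTTP status and elapsed time for a single request.
    start = time.monotonic()
    response = requests.get(URL, timeout=5)
    return response.status_code, time.monotonic() - start


def run_load_test(concurrency=20, total_requests=1000):
    with ThreadPoolExecutor(max_workers=concurrency) as executor:
        results = list(executor.map(call_service, range(total_requests)))
    errors = sum(1 for status, _ in results if status >= 500)
    latencies = sorted(elapsed for _, elapsed in results)
    p95 = latencies[int(len(latencies) * 0.95)]
    print(f"errors: {errors}, p95 latency: {p95:.3f}s")


if __name__ == "__main__":
    run_load_test()
```

Watching the server's CPU, RAM, and disk usage while such a script runs is what reveals whether the allocated resources match the expected load.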

To do this, let's instrument our service to monitor system resources continuously.
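
One way to do this (a sketch, assuming the third-party psutil package is installed; the sampling interval and metric names are arbitrary) is to periodically sample CPU, memory, and disk usage and hand the values to whatever metrics backend the service uses:

```python
import time

import psutil  # third-party package: pip install psutil


def collect_system_metrics():
    """Return a snapshot of CPU, memory, and disk usage as percentages."""
    return {
        # cpu_percent(interval=None) compares against the previous call,
        # so the very first value after startup should be ignored.
        "cpu_percent": psutil.cpu_percent(interval=None),
        "memory_percent": psutil.virtual_memory().percent,
        "disk_percent": psutil.disk_usage("/").percent,
    }


if __name__ == "__main__":
    # Naive monitoring loop; in a real service this would run in a background
    # thread and push values to a metrics backend instead of printing them.
    while True:
        print(collect_system_metrics())
        time.sleep(5)
```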
