Another way of debugging a misbehaving application is to inspect its log files. Log files are special files in which an application writes down all sorts of things, normally related to what's going on inside of it. If an important procedure is started, I would typically expect a corresponding line in the logs. It is the same when it finishes, and possibly for what happens inside of it.
Errors need to be logged so that when a problem happens, we can inspect what went wrong by taking a look at the information in the log files.
There are many different ways to set up a logger in Python. Logging is very malleable and you can configure it. In a nutshell, there are normally four players in the game: loggers, handlers, filters, and formatters:
- Loggers: Expose the interface that the application code uses directly
- Handlers: Send the log records (created by loggers) to the appropriate destination
- Filters: Provide a finer-grained facility for determining which log records to output
- Formatters: Specify the layout of the log records in the final output
Logging is performed by calling methods on instances of the Logger class. Each line you log has a level. The levels normally used are: DEBUG, INFO, WARNING, ERROR, and CRITICAL. You can import them from the logging module. They are in order of severity and it's very important to use them properly because they will help you filter the contents of a log file based on what you're searching for. Log files usually become extremely big so it's very important to have the information in them written properly so that you can find it quickly when it matters.
You can log to a file but you can also log to a network location, to a queue, to a console, and so on. In general, if you have an architecture that is deployed on one machine, logging to a file is acceptable, but when your architecture spans over multiple machines (such as in the case of service-oriented or microservice architectures), it's very useful to implement a centralized solution for logging so that all log messages coming from each service can be stored and investigated in a single place. It helps a lot, otherwise trying to correlate giant files from several different sources to figure out what went wrong can become truly challenging.
Microservice architectures are an evolution of SOAs, but follow a different set of architectural patterns.
Here, I will present you with a very simple logging example. We will log a few messages to a file:
# log.py
import logging
logging.basicConfig(
filename='ch11.log',
level=logging.DEBUG, # minimum level capture in the file
format='[%(asctime)s] %(levelname)s: %(message)s',
datefmt='%m/%d/%Y %I:%M:%S %p')
mylist = [1, 2, 3]
logging.info('Starting to process `mylist`...')
for position in range(4):
try:
logging.debug(
'Value at position %s is %s', position, mylist[position]
)
except IndexError:
logging.exception('Faulty position: %s', position)
logging.info('Done parsing `mylist`.')
Let's go through it line by line. First, we import the logging module, then we set up a basic configuration. In general, a production-logging configuration is much more complicated than this, but I wanted to keep things as easy as possible. We specify a filename, the minimum logging level we want to capture in the file, and the message format. We'll log the date and time information, the level, and the message.
I will start by logging an info message that tells me we're about to process our list. Then, I will log (this time using the DEBUG level, by using the debug function) which is the value at some position. I'm using debug here because I want to be able to filter out these logs in the future (by setting the minimum level to logging.INFO or more), because I might have to handle very big lists and I don't want to log all the values.
If we get IndexError (and we do, since I'm looping over range(4)), we call logging.exception(), which is the same as logging.error(), but it also prints the traceback.
At the end of the code, I log another info message saying we're done. The result is this:
# ch11.log
[05/06/2018 11:13:48 AM] INFO:Starting to process `mylist`...
[05/06/2018 11:13:48 AM] DEBUG:Value at position 0 is 1
[05/06/2018 11:13:48 AM] DEBUG:Value at position 1 is 2
[05/06/2018 11:13:48 AM] DEBUG:Value at position 2 is 3
[05/06/2018 11:13:48 AM] ERROR:Faulty position: 3
Traceback (most recent call last):
File "log.py", line 15, in <module>
position, mylist[position]))
IndexError: list index out of range
[05/06/2018 11:13:48 AM] INFO:Done parsing `mylist`.
This is exactly what we need to be able to debug an application that is running on a box, and not on our console. We can see what went on, the traceback of any exception raised, and so on.
Logging is an art. You need to find a good balance between logging everything and logging nothing. Ideally, you should log anything that you need to make sure your application is working correctly, and possibly all errors or exceptions.