Debug levels

The constellation of messages that Ceph OSDs can log is huge and changes with each release. Fully interpreting those logs could easily fill a book of its own, but we'll end this section with a brief note on the logging levels summary shown in the excerpt above.

--- logging levels ---
0/ 5 none
0/ 0 lockdep
0/ 0 context

Ceph allows us to finely control the verbosity with which it logs events, status, and errors. Levels may be set independently for each subsystem, and each subsystem has separate verbosity levels for information kept in process memory and for messages sent to log files or the syslog service. Higher numbers mean greater verbosity. For example, if our MONs are proving unruly, we might add lines like the below to ceph.conf.

[global]
debug ms = 1
    
[mon]
debug mon = 15/20
debug paxos = 20
debug auth = 20

Note how for the mon subsystem we have specified different levels for output and memory logs by separating the two values with a slash character. If only a single value is given, Ceph applies it to both.
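For instance, the two ceph.conf lines below request the same behavior; the first is shorthand for the second, and the level of 5 here is purely illustrative.

debug osd = 5
debug osd = 5/5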

Recall from earlier in this chapter that Ceph daemons only read configuration files at startup. This means that to effect the above changes, one would need to make them on each MON node and perform a rolling restart of all MON daemons. This is tedious and awkward; in times of crisis, it can also be precarious to stir the pot by bouncing services.
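On a systemd-managed, Jewel-era deployment, that rolling restart might look like the below, run on each MON node in turn. Unit names and service managers vary by release and distribution, so treat this as a sketch; check that quorum has been re-established after each restart before moving on to the next MON.

# systemctl restart ceph-mon@$(hostname -s)
# ceph -s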

This is a perfect example of how the admin socket and injection mechanisms described earlier in this chapter earn their keep.

Say yesterday osd.666 and osd.1701 were being squirrelly, so we elevated the logging levels of the filestore and osd modules. Today we find that the increased log flow has half-filled the OSD node's /var/log filesystem and need to back off on the verbosity. After logging into osd.666's node we check the settings that are currently active in the daemon's running config.

# ceph daemon osd.666 config show | grep debug
    "debug_none": "0/5",
    "debug_lockdep": "0/0",
    "debug_context": "0/0",
    "debug_crush": "0/0",
    "debug_mds": "1/5",
...
    "debug_osd": "10/10",
    "debug_optracker": "0/0",
    "debug_objclass": "0/0",
    "debug_filestore": "10/10",
...

On Ceph's Jewel 10.2.6 release this matches no fewer than 98 individual settings, hence the ellipses for brevity. Now log into an admin or mon node and inject those puppies into submission.

# ceph tell osd.666 injectargs '--debug-filestore 0/0 --debug-osd 0/0'
debug_filestore=0/0 debug_osd=0/0

Boom. Instant satisfaction. Since we can inject these values on the fly, we can keep them low in ceph.conf, inject temporary increases while troubleshooting, and dial them back down when we're through. In fact, if the log file has grown so large that there isn't even space to compress it, we can remove it and have the daemon reopen a fresh one without having to restart the OSD daemon.

# rm /var/log/ceph/ceph-osd.666.log
# ceph daemon osd.666 log reopen
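The same trick works in the other direction: when trouble strikes we can crank a daemon's verbosity up for a while, reproduce the problem, then restore the quieter settings. The levels below are only illustrative.

# ceph tell osd.666 injectargs '--debug-osd 10/10 --debug-filestore 10/10'
# ceph tell osd.666 injectargs '--debug-osd 0/0 --debug-filestore 0/0'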

To quote my high school chemistry teacher, Isn't this FUN!? NOW we're cooking with gas!

Oh wait, we forgot about osd.1701. And maybe we also elevated osd.1864 and ... we don't remember. It was late and we were groggy due to garlic fries deprivation. This morning, after waking up with a soy chai, we remember that we can inject values into the running state of all OSDs with a single command.

# ceph tell osd.* injectargs '--debug-filestore 0/0 --debug-osd 0/0'
osd.0: debug_filestore=0/0 debug_osd=0/0
osd.1: debug_filestore=0/0 debug_osd=0/0
osd.2: debug_filestore=0/0 debug_osd=0/0
...

Better safe than sorry. Injecting values that may be identical to those already set is safe and idempotent, and sure beats nuking the site from orbit.
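If we want belt and suspenders, we can also confirm the running values afterward through the admin socket, just as we did before injecting; the command below assumes we are logged into osd.666's node.

# ceph daemon osd.666 config show | grep -E 'debug_(osd|filestore)'
    "debug_osd": "0/0",
    "debug_filestore": "0/0",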

Analysis of Ceph logs can involve lots of time and gnarly awk/sed pipelines. A singularly useful set of tools for identifying patterns in log files can be found here: https://github.com/linuxkidd/ceph-log-parsers

By slicing and dicing Ceph log files and emitting CSV files ready for importing into a spreadsheet, these scripts help collate thousands of messages across hours or days so that we can discern patterns. They are especially useful for zeroing in on nodes or OSDs that are loci of chronic slow / blocked requests.
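Even without those scripts, a quick-and-dirty pipeline can surface the worst offenders. The one-liner below is only a sketch: the exact wording of slow request messages, and which OSD IDs appear in them, vary between Ceph releases, so adjust the patterns to match your logs.

# grep 'slow request' /var/log/ceph/ceph.log | grep -oE 'osd\.[0-9]+' | sort | uniq -c | sort -rn | head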
