Let's see how to profile memory use for the Ceph daemons running on our nodes:
- Start the memory profiler on a specific daemon:
# ceph tell osd.2 heap start_profiler
To start the profiler automatically as soon as the Ceph OSD daemon starts, set the environment variable CEPH_HEAP_PROFILER_INIT=true.
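For daemons managed by systemd, one way to set this variable is through a drop-in override; the unit name and file path below are assumptions based on a typical package install (for example, /etc/systemd/system/ceph-osd@.service.d/heap-profiler.conf):

```
[Service]
Environment=CEPH_HEAP_PROFILER_INIT=true
```

After creating the drop-in, run systemctl daemon-reload and restart the OSD for the variable to take effect.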
It's a good idea to keep the profiler running for a few hours so that it can collect as much information about the daemon's memory footprint as possible. While it runs, you can also generate some load on the cluster.
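As a sketch, the built-in rados bench tool can generate write load while the profiler runs; the pool name (rbd) and 60-second duration here are just example values:

```shell
# Write 4 MB objects to the example pool for 60 seconds, keeping the
# objects around in case you also want to benchmark reads afterwards:
rados bench -p rbd 60 write --no-cleanup
# Remove the benchmark objects once profiling is done:
rados -p rbd cleanup
```

These commands require a live cluster and a pool you can safely write to, so run them against a test pool rather than production data.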
- Next, print heap statistics about the memory footprint that the profiler has collected:
# ceph tell osd.2 heap stats
- You can also dump heap stats to a file for later analysis; by default, the dump file is created as /var/log/ceph/osd.2.profile.0001.heap:
# ceph tell osd.2 heap dump
- To read this dump file, you will need google-perftools:
# yum install -y google-perftools
Refer to http://goog-perftools.sourceforge.net/doc/heap_profiler.html for additional details.
- To view the profiler logs:
# pprof --text {path-to-daemon} {log-path/filename}
# pprof --text /usr/bin/ceph-osd /var/log/ceph/osd.2.profile.0001.heap
- For a granular comparison, generate several profile dump files for the same daemon, and use the Google profiler tool to compare them:
# pprof --text --base /var/log/ceph/osd.2.profile.0001.heap /usr/bin/ceph-osd /var/log/ceph/osd.2.profile.0002.heap
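This comparison workflow can be scripted; in the sketch below, the 10-minute interval, daemon id, and dump file paths are assumptions (Ceph numbers the dump files sequentially). It takes a dump every 10 minutes for an hour and then diffs the last dump against the first to show what grew:

```shell
# Collect six heap dumps from osd.2, one every 10 minutes:
for i in $(seq 1 6); do
    ceph tell osd.2 heap dump
    sleep 600
done
# Compare the last dump against the first to see which call sites grew:
pprof --text --base /var/log/ceph/osd.2.profile.0001.heap \
    /usr/bin/ceph-osd /var/log/ceph/osd.2.profile.0006.heap
```

This requires a live cluster with the profiler already started on osd.2, and google-perftools installed on the node where you run pprof.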
- Release memory that TCMalloc has allocated but is not being used by Ceph:
# ceph tell osd.2 heap release
- Once you are done, stop the profiler, as you do not want to leave it running in a production cluster:
# ceph tell osd.2 heap stop_profiler
The Ceph daemon code has matured considerably, and you might not need a memory profiler for analysis unless you encounter a bug that causes a memory leak. If you do, the procedure described here can help you pin down memory issues in the Ceph daemons.