Chapter 7. Memory Debugging Tools

Although C is undisputedly the standard programming language on Linux systems, C has a number of features that lead programmers into writing code with subtle bugs that can be very hard to debug. Memory leaks (in which malloc() ed memory is never free() ed) and buffer overflows (writing past the end of an array, for example) are two of the most common and difficult-to-detect program bugs; buffer underruns (writing before the beginning of an array, for example) are much less common but usually even harder to track down. This chapter presents a few debugging tools that greatly simplify the detection and isolation of such problems.

Buggy Code

 1: /* broken.c */
 2:
 3: #include <stdlib.h>
 4: #include <stdio.h>
 5: #include <string.h>
 6:
 7: char global[5];
 8:
 9: int broken(void) {
10:     char * dyn;
11:     char local[5];
12:
13:     /* First, overwrite a buffer just a little bit */
14:     dyn = malloc(5);
15:     strcpy(dyn, "12345");
16:     printf("1: %s\n", dyn);
17:     free(dyn);
18:
19:     /* Now overwrite the buffer a lot */
20:     dyn = malloc(5);
21:     strcpy(dyn, "12345678");
22:     printf("2: %s\n", dyn);
23:
24:     /* Walk past the beginning of the malloced buffer */
25:     *(dyn - 1) = '\0';
26:     printf("3: %s\n", dyn);
27:     /* note we didn't free the pointer! */
28:
29:     /* Now go after a local variable */
30:     strcpy(local, "12345");
31:     printf("4: %s\n", local);
32:     local[-1] = '\0';
33:     printf("5: %s\n", local);
34:
35:     /* Finally, attack global data space */
36:     strcpy(global, "12345");
37:     printf("6: %s\n", global);
38:
39:     /* And write over the space before the global buffer */
40:     global[-1] = '\0';
41:     printf("7: %s\n", global);
42:
43:     return 0;
44: }
45:
46: int main(void) {
47:     return broken();
48: }

Throughout this chapter, we look for the problems in this code segment. This code corrupts three types of memory regions: memory allocated from the dynamic memory pool (the heap) via malloc(); local variables allocated on the program’s stack; and global variables, which are stored in a separate area of memory that is statically allocated when the program starts.[1] For each of these memory classes, this test program writes over the end of the reserved area of memory (usually, by a single byte) and stores a byte immediately before the allocated area as well. In addition, the code includes a memory leak to show how various tools can help track down leaks.

Although this code has many problems, it actually runs just fine. Does that mean these problems are not important? Not by any means. Buffer overflows tend to cause a program to misbehave long after the actual overflow, and memory leaks in programs that run for a length of time waste a computer’s resources. Furthermore, buffer overflows are a classic source of security vulnerabilities, as discussed in Chapter 22. For reference, here is what the program looks like when it is executed.

$ gcc -Wall -o broken broken.c
$ ./broken
1: 12345
2: 12345678
3: 12345678
4: 12345
5: 12345
6: 12345
7: 12345
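For comparison, here is a minimal sketch, not taken from the original program, of how each class of memory that broken.c abuses can be written to safely: every buffer is sized to include the terminating NUL byte, copies into fixed-size buffers are bounded with snprintf(), nothing is written before the start of any buffer, and the heap copy is handed back to the caller to free() exactly once. The function names heap_copy(), stack_copy_len(), and global_copy() are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Global buffer sized for "12345" plus the terminating NUL byte. */
static char global_fixed[6];

/* Heap: size the allocation from the string itself (+1 for the NUL).
 * The caller owns the result and must free() it exactly once. */
char *heap_copy(const char *s)
{
    size_t len = strlen(s) + 1;
    char *p = malloc(len);

    if (p != NULL)
        memcpy(p, s, len);
    return p;
}

/* Stack: snprintf() never writes more than sizeof(local) bytes and
 * always NUL-terminates, so long input is truncated, not overflowed. */
size_t stack_copy_len(const char *s)
{
    char local[6];

    snprintf(local, sizeof(local), "%s", s);
    return strlen(local);
}

/* Global: the same bounded copy into statically allocated storage. */
const char *global_copy(const char *s)
{
    snprintf(global_fixed, sizeof(global_fixed), "%s", s);
    return global_fixed;
}
```

A caller would write char *p = heap_copy("12345678"); and later free(p); the leak in broken.c comes precisely from skipping that final free().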

Memory-Checking Tools Included in glibc

The GNU C Library (glibc) includes three simple memory-checking tools. The first two, mcheck() and MALLOC_CHECK_, enforce heap data structure consistency checking, and the third, mtrace(), traces memory allocation and deallocation for later processing.

Finding Memory Heap Corruption

When memory is allocated from the heap, the memory management functions need someplace to store information about the allocations. That place is the heap itself, so the heap is composed of alternating areas of memory that are used by the program and by the memory management functions themselves. As a result, buffer overflows or underruns can damage the data structures that the memory management functions use to keep track of what memory has been allocated. When this happens, all bets are off, except that it is a pretty good bet that the memory management functions will eventually cause the program to crash.
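To make the idea concrete, the following toy allocator (hypothetical, and far simpler than glibc's malloc()) carves blocks out of a static arena and stores a small header in front of each one. Because the headers sit between user blocks, overflowing one block overwrites the bookkeeping for the next. The arena size, header layout, and magic value are arbitrary assumptions chosen for illustration.

```c
#include <stddef.h>
#include <string.h>

#define ARENA_SIZE   256
#define HEADER_MAGIC 0x5AFEu     /* arbitrary marker the allocator checks */

/* Bookkeeping stored immediately before every user block. */
struct header {
    unsigned magic;
    size_t size;
};

static unsigned char arena[ARENA_SIZE];
static size_t arena_used;

/* Hand out blocks back to back: header, data, header, data, ... */
void *toy_alloc(size_t size)
{
    struct header h = { HEADER_MAGIC, size };
    void *user;

    if (size > ARENA_SIZE - arena_used ||
        sizeof h > ARENA_SIZE - arena_used - size)
        return NULL;
    memcpy(arena + arena_used, &h, sizeof h);  /* memcpy sidesteps alignment */
    user = arena + arena_used + sizeof h;
    arena_used += sizeof h + size;
    return user;
}

/* Return nonzero if the header in front of a block is still intact. */
int toy_header_ok(const void *user)
{
    struct header h;

    memcpy(&h, (const unsigned char *)user - sizeof h, sizeof h);
    return h.magic == HEADER_MAGIC;
}
```

Copying "12345678" into a 5-byte block from toy_alloc(), much as the second strcpy() in broken.c does, runs into the header of the block allocated right after it, and toy_header_ok() on that second block then returns 0. Inside the arena this is well-defined C, which lets the sketch show the effect safely; with a real malloc() the same write is undefined behavior.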

If you set the MALLOC_CHECK_ environment variable, a different and somewhat slower set of memory management functions is chosen that is more tolerant of errors and can check for calling free() more than once on the same pointer and for single-byte buffer overflows. If MALLOC_CHECK_ is set to 0, the memory management functions are simply more tolerant of error but do not give warnings. If MALLOC_CHECK_ is set to 1, the memory management functions print out warning messages on standard error when they notice problems. If MALLOC_CHECK_ is set to 2, the memory management functions call abort() when they notice problems.

Setting MALLOC_CHECK_ to 0 may be useful if you are prevented from finding one memory bug by another that is not convenient to fix at the moment; it might allow you to use other tools to chase down the other memory bug. It may also be useful if you are running code that works on another system but not on Linux and you want a quick workaround that may allow the code to function temporarily, before you have a chance to resolve the error.

Setting MALLOC_CHECK_ to 1 is useful if you are not aware of any problems and just want to be notified if any problems exist.

Setting MALLOC_CHECK_ to 2 is most useful from inside the debugger, because it allows you to get a backtrace as soon as the memory management functions discover the error, which will get you closest to the point at which the error has happened.

$ MALLOC_CHECK_=1 ./broken
malloc: using debugging hooks
1: 12345
free(): invalid pointer 0x80ac008!
2: 12345678
3: 12345678
4: 12345
5: 12345
6: 12345
7: 12345
$ MALLOC_CHECK_=2 gdb ./broken
...
(gdb) run
Starting program: /usr/src/lad/code/broken
1: 12345
Program received signal SIGABRT, Aborted.
0x00c64c32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) where
#0  0x00c64c32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x00322969 in raise () from /lib/tls/libc.so.6
#2  0x00324322 in abort () from /lib/tls/libc.so.6
#3  0x0036d9af in free_check () from /lib/tls/libc.so.6
#4  0x0036afa5 in free () from /lib/tls/libc.so.6
#5  0x0804842b in broken () at broken.c:17
#6  0x08048520 in main () at broken.c:47

Another way to ask glibc to do heap consistency checking is with the mcheck() function:

#include <mcheck.h>

typedef void (*mcheckCallback)(enum mcheck_status status);
int mcheck(mcheckCallback cb);

When the mcheck() function has been called, malloc() places known byte sequences before and after the returned memory region in order to make it possible to spot buffer overflow and buffer underrun conditions. free() looks for those signatures, and if they have been disturbed, it calls the function pointed to by the cb parameter. If cb is NULL, the library exits instead. Running a program linked against mcheck() through gdb can show you exactly which memory regions have been corrupted, as long as those regions are properly free() ed. However, the mcheck() method does not pinpoint exactly where the corruption occurred; it is up to the programmer to figure that out based on an understanding of the program flow.

Linking our test program against the mcheck library yields the following results:

$ gcc -ggdb -o broken broken.c -lmcheck
$ ./broken
1: 12345
memory clobbered past end of allocated block

Because mcheck merely complains and exits, this does not really pinpoint the error. To pinpoint the error, you need to run the program inside gdb and tell mcheck to abort() when it notices a problem. You can simply call mcheck() from within gdb, or you can call mcheck() yourself, passing a callback that calls abort(), as the first statement of your program (before you ever call malloc()). (Note that you can call mcheck() from within gdb without linking your program against the mcheck library!)

$ rm -f broken; make broken
$ gdb broken
...
(gdb) break main
Breakpoint 1 at 0x80483f4: file broken.c, line 14.
(gdb) command 1
Type commands for when breakpoint 1 is hit, one per line.
End with a line saying just "end".
>call mcheck(&abort)
>continue
>end
(gdb) run
Starting program: /usr/src/lad/code/broken
Breakpoint 1, main () at broken.c:14
47          return broken();
$1 = 0
1: 12345
Program received signal SIGABRT, Aborted.
0x00e12c32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) where
#0  0x00e12c32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x0072c969 in raise () from /lib/tls/libc.so.6
#2  0x0072e322 in abort () from /lib/tls/libc.so.6
#3  0x007792c4 in freehook () from /lib/tls/libc.so.6
#4  0x00774fa5 in free () from /lib/tls/libc.so.6
#5  0x0804842b in broken () at broken.c:17
#6  0x08048520 in main () at broken.c:47

The important part of this is where it tells you that the problem was detected in broken.c at line 17. That lets you see that the error was detected during the first free() call, which indicates the problem was in (or more precisely, bordering) the dyn memory region. (freehook() is just the hook that mcheck uses to do its consistency checks.)

mcheck does not help you to find overflows or underruns in local or global variables, only in malloc() ed memory regions.
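When you would rather install the check in the program itself than from gdb, a minimal sketch passes mcheck() a callback that reports the problem and then calls abort(), so that a debugger stops with the offending free() on the stack. It relies on the enum mcheck_status values declared in <mcheck.h> (MCHECK_DISABLED, MCHECK_OK, MCHECK_HEAD, MCHECK_TAIL, and MCHECK_FREE); the helper names are hypothetical. Because mcheck() returns -1 when it is called after the first allocation, the sketch passes that return value back to the caller. Depending on the glibc version, these debugging hooks may also need glibc's separate malloc debugging library to be preloaded before they have any effect.

```c
#include <mcheck.h>
#include <stdio.h>
#include <stdlib.h>

/* Translate an mcheck status code into a readable message. */
const char *mcheck_status_name(enum mcheck_status status)
{
    switch (status) {
    case MCHECK_DISABLED: return "checking was not enabled before the first malloc()";
    case MCHECK_OK:       return "block is consistent";
    case MCHECK_HEAD:     return "memory just before the block was clobbered";
    case MCHECK_TAIL:     return "memory just after the block was clobbered";
    case MCHECK_FREE:     return "block was already freed";
    }
    return "unknown mcheck status";
}

/* Callback for mcheck(): report, then abort() so that a debugger
 * stops with the free() that found the damage still on the stack. */
static void report_and_abort(enum mcheck_status status)
{
    fprintf(stderr, "mcheck: %s\n", mcheck_status_name(status));
    abort();
}

/* Call as the first statement of main(), before anything calls
 * malloc(); mcheck() returns -1 if it is called too late. */
int enable_heap_checking(void)
{
    return mcheck(report_and_abort);
}
```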

Using mtrace() to Track Allocations

A simple way to find all of a program’s memory leaks is to log all its calls to malloc() and free(). When the program has completed, it is straightforward to match each malloc() ed block with the point at which it was free() ed, or report a leak if it was never free() ed.
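The matching step can be sketched as follows. The log format here (an array of hypothetical alloc_event records) is an assumption for illustration, not the format that mtrace() or mpr actually writes: for each allocation, look ahead for the next event at the same address; if it is a free(), the block was released, and otherwise it leaked.

```c
#include <stddef.h>

/* One record from a hypothetical allocation log (not mtrace's or mpr's
 * real format): a malloc() of `size` bytes at `addr`, or a free(). */
struct alloc_event {
    int is_free;          /* 0 for malloc(), 1 for free() */
    unsigned long addr;   /* address involved in the call */
    size_t size;          /* bytes requested (unused for free()) */
};

/* Count allocations that are never matched by a later free() of the
 * same address, and total their sizes in *leaked_bytes. */
size_t count_leaks(const struct alloc_event *events, size_t n,
                   size_t *leaked_bytes)
{
    size_t leaks = 0;

    *leaked_bytes = 0;
    for (size_t i = 0; i < n; i++) {
        int freed = 0;

        if (events[i].is_free)
            continue;
        /* The next event at this address decides the block's fate. */
        for (size_t j = i + 1; j < n; j++) {
            if (events[j].addr == events[i].addr) {
                freed = events[j].is_free;
                break;
            }
        }
        if (!freed) {
            leaks++;
            *leaked_bytes += events[i].size;
        }
    }
    return leaks;
}
```

Fed a log modeled on broken.c (a 5-byte block that is freed, then a second 5-byte block at the same address that never is), count_leaks() reports one leak of 5 bytes.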

Unlike mcheck(), mtrace() has no library against which you can link to enable mtrace(). This is no great loss; you can use the same technique with gdb to start tracing. However, for mtrace() to enable tracing, the environment variable MALLOC_TRACE must be set to a valid filename: either an existing file that the process can write to (in which case it is truncated) or a filename that the process can create and write to.

$ MALLOC_TRACE=mtrace.log gdb broken
...
(gdb) break main
Breakpoint 1 at 0x80483f4: file broken.c, line 14.
(gdb) command 1
Type commands for when breakpoint 1 is hit, one per line.
End with a line saying just "end".
>call mtrace()
>continue
>end
(gdb) run
Starting program: /usr/src/lad/code/broken
Breakpoint 1, main () at broken.c:47
47          return broken();
$1 = 0
1: 12345
2: 12345678
3: 12345678
4: 12345
5: 12345
6: 12345
7: 12345
Program exited normally.
(gdb) quit
$ ls -l mtrace.log
-rw-rw-r--   1 ewt      ewt           220 Dec 27 23:41 mtrace.log
$ mtrace ./broken mtrace.log
Memory not freed:
-----------------
   Address     Size     Caller
0x09211378      0x5  at /usr/src/lad/code/broken.c:20

Note that the mtrace program has found the memory leak exactly. The mtrace program can also find memory that is free() ed without ever having been allocated, if that case shows up in the log file, but in practice it will not find it there, because the program should crash immediately when attempting to free() the unallocated memory.
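Instead of calling mtrace() from gdb, a program can call mtrace() and muntrace() (both declared in <mcheck.h>) around the code it wants to trace. The sketch below assumes the log goes wherever MALLOC_TRACE points; if that variable is not set, mtrace() does nothing and the code simply runs untraced. The traced_work() helper is hypothetical.

```c
#include <mcheck.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical example: trace only the allocations made between
 * mtrace() and muntrace(). Nothing is logged unless MALLOC_TRACE
 * names a file the process can create or write. */
int traced_work(void)
{
    char *buf;
    int len;

    mtrace();
    buf = malloc(16);
    if (buf == NULL) {
        muntrace();
        return -1;
    }
    strcpy(buf, "traced");
    len = (int)strlen(buf);
    free(buf);          /* without this free(), mtrace should report a leak */
    muntrace();
    return len;
}
```

The resulting log is processed exactly as before: run the program with MALLOC_TRACE=mtrace.log set, then run mtrace with the program name and mtrace.log as arguments.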

Finding Memory Leaks with mpr

The mtrace() facility in glibc is good, but the mpr memory allocation profiler[2] is in some ways easier to use and has some more sophisticated scripts for processing its logfile output.

The first step in using mpr (after building the code with debug information enabled[3]) is to set an environment variable, MPRFI, that tells mpr what command it should pipe the log through (if it is not set, no log is generated). For small programs, MPRFI should be set to something like cat > mpr.log. For larger programs, it can save a significant amount of space to compress the log file while writing it by setting MPRFI to gzip -1 >mpr.log.gz.

The easiest way to do this is to use the mpr script to run your program; if MPRFI is not already set, it sets MPRFI to gzip -1 >log.%p.gz, which creates a logfile with the process ID of the program being debugged and preloads the mpr library so that you do not have to rebuild your program at all. Here is what we did to create a log file for a fixed version of our test program:

$ MPRFI="cat >mpr.log" mpr ./broken
1: 12345
2: 12345678
3: 12345678
4: 12345
5: 12345
6: 12345
7: 12345
$ ls -l mpr.log
-rw-rw-r--   1 ewt         ewt          142 May 17 16:22 mpr.log

Once the log file has been created, there are a number of tools available for analyzing it. All of these programs expect an mpr log on standard input. If the output from these tools has numbers where you expect function names (probably with a warning like “cannot map pc to name”), the problem may be the version of the awk utility that mpr is using. The mpr documentation suggests exporting the MPRAWK environment variable to choose the mawk version of awk for best results: export MPRAWK='mawk -W sprintf=4096'. In addition, the stack randomization provided by the kernel “Exec-shield” functionality can confuse mpr; you can ameliorate this by using the setarch command to disable Exec-shield while running the program under investigation and while running the mpr filters: for example, setarch i386 mpr program and setarch i386 mprmap ....

You may still end up with a few stack frames that mpr cannot find a textual symbol name for; you can generally ignore them.

mprmap program

This converts the program addresses in an mpr log into function names and source code locations. The executable file name that generated the log must be given as an argument. To see all the allocations that occurred in a program, along with the function call chain that led to the allocations, you could use mprmap program < mpr.log. By default, this program displays the function name. Using the -f flag causes it to display the file names, as well, and -l displays the line number within the file. The -l argument implies the -f argument.

 

The output of this program is considered a valid mpr log file, and as such it can be piped through any of the other mpr utility programs.

mprchain

This converts the log into output grouped by call chain. A function call chain is a list of all the functions that are currently active at some point in a program. For example, if main() calls getargs(), which then calls parsearg(), the active call chain while parsearg() is running is displayed as main:getargs:parsearg. For each unique call chain that allocated memory during a program’s execution, mprchain displays the number of allocations made by that chain[4] and the total bytes allocated by that chain.

mprleak

This filter examines the log file for all of the allocated regions that were never freed. A new log file, which consists of only those allocations that caused leaks, is generated on standard out.

 

The output of this program is considered a valid mpr log file, and as such it can be piped through any of the other mpr utility programs.

mprsize

This filter sorts memory allocations by size. To see memory leaks by size, use the output from mprleak as the input to mprsize.

mprhisto

This filter displays a memory allocation histogram.

[4] More specifically, it is the sum of all the allocations made by the final function in the chain when it was invoked through that particular chain.

Now that we know about the log analyzers, it is easy to find the memory leak in our test program: Merely use the command mprleak mpr.log | mprmap -l ./broken (which is equivalent to mprmap -l ./broken mpr.log | mprleak) to find the memory leak on line 20.

$ mprleak mpr.log | mprmap -l ./broken
m:broken(broken.c,20):main(broken.c,47):5:134518624
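The grouping that mprchain performs can be pictured as a small table keyed by call-chain string. The sketch below is hypothetical (it is not mpr's code, and it uses a linear search where a real tool would more likely use a hash table); it accumulates the allocation count and total bytes for each distinct chain, which are the two figures mprchain reports.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-chain totals, modeled on what mprchain reports. */
struct chain_total {
    const char *chain;      /* e.g. "main:getargs:parsearg" */
    size_t allocations;     /* number of allocations via this chain */
    size_t bytes;           /* total bytes allocated via this chain */
};

/* Record one allocation of `size` bytes made through `chain`.
 * `table` holds *used entries and has room for `cap`. Returns 0 on
 * success, or -1 if a new chain would not fit. */
int record_allocation(struct chain_total *table, size_t *used, size_t cap,
                      const char *chain, size_t size)
{
    for (size_t i = 0; i < *used; i++) {
        if (strcmp(table[i].chain, chain) == 0) {
            table[i].allocations++;
            table[i].bytes += size;
            return 0;
        }
    }
    if (*used == cap)
        return -1;
    table[*used].chain = chain;
    table[*used].allocations = 1;
    table[*used].bytes = size;
    (*used)++;
    return 0;
}
```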

Investigating Memory Errors with Valgrind

Valgrind[5] is an Intel x86-specific tool that emulates an x86-class CPU to watch all memory accesses directly and analyze data flow (for example, it can recognize reads of uninitialized memory, but it also recognizes that moving one uninitialized value into another location that is never read does not actually constitute an uninitialized read). It has many other capabilities, including investigating cache use and looking for race conditions in threaded programs, and in fact has a general facility for adding more capabilities based on its CPU emulator. However, for our purposes, we only briefly introduce its aggressive memory error checking, which is its default behavior.

Valgrind does not require that a program be recompiled, although its analysis, like that of all debugging tools, is enhanced by compiling the program with debugging information included.

$ valgrind ./broken
==30882== Memcheck, a.k.a. Valgrind, a memory error detector for x86-linux.
==30882== Copyright (C) 2002-2003, and GNU GPL'd, by Julian Seward.
==30882== Using valgrind-2.0.0, a program supervision framework for x86-linux.
==30882== Copyright (C) 2000-2003, and GNU GPL'd, by Julian Seward.
==30882== Estimated CPU clock rate is 1547 MHz
==30882== For more details, rerun with: -v
==30882==
==30882== Invalid write of size 1
==30882==    at 0xC030DB: strcpy (mac_replace_strmem.c:174)
==30882==    by 0x8048409: broken (broken.c:15)
==30882==    by 0x804851F: main (broken.c:47)
==30882==    by 0x802BAE: __libc_start_main (in /lib/libc-2.3.2.so)
==30882==    Address 0x650F029 is 0 bytes after a block of size 5 alloc'd
==30882==    at 0xC0C28B: malloc (vg_replace_malloc.c:153)
==30882==    by 0x80483F3: broken (broken.c:14)
==30882==    by 0x804851F: main (broken.c:47)
==30882==    by 0x802BAE: __libc_start_main (in /lib/libc-2.3.2.so)
==30882==
==30882== Conditional jump or move depends on uninitialised value(s)
==30882==    at 0x863D8E: __GI_strlen (in /lib/libc-2.3.2.so)
==30882==    by 0x83BC31: _IO_printf (in /lib/libc-2.3.2.so)
==30882==    by 0x804841C: broken (broken.c:16)
==30882==    by 0x804851F: main (broken.c:47)
1: 12345
==30882==
==30882== Invalid write of size 1
==30882==    at 0xC030D0: strcpy (mac_replace_strmem.c:173)
==30882==    by 0x804844D: broken (broken.c:21)
==30882==    by 0x804851F: main (broken.c:47)
==30882==    by 0x802BAE: __libc_start_main (in /lib/libc-2.3.2.so)
==30882==    Address 0x650F061 is 0 bytes after a block of size 5 alloc'd
==30882==    at 0xC0C28B: malloc (vg_replace_malloc.c:153)
==30882==    by 0x8048437: broken (broken.c:20)
==30882==    by 0x804851F: main (broken.c:47)
==30882==    by 0x802BAE: __libc_start_main (in /lib/libc-2.3.2.so)
==30882==
==30882== Invalid write of size 1
==30882==    at 0xC030DB: strcpy (mac_replace_strmem.c:174)
==30882==    by 0x804844D: broken (broken.c:21)
==30882==    by 0x804851F: main (broken.c:47)
==30882==    by 0x802BAE: __libc_start_main (in /lib/libc-2.3.2.so)
==30882==    Address 0x650F064 is 3 bytes after a block of size 5 alloc'd
==30882==    at 0xC0C28B: malloc (vg_replace_malloc.c:153)
==30882==    by 0x8048437: broken (broken.c:20)
==30882==    by 0x804851F: main (broken.c:47)
==30882==    by 0x802BAE: __libc_start_main (in /lib/libc-2.3.2.so)
==30882==
==30882== Invalid read of size 4
==30882==    at 0x863D50: __GI_strlen (in /lib/libc-2.3.2.so)
==30882==    by 0x83BC31: _IO_printf (in /lib/libc-2.3.2.so)
==30882==    by 0x8048460: broken (broken.c:22)
==30882==    by 0x804851F: main (broken.c:47)
==30882==    Address 0x650F064 is 3 bytes after a block of size 5 alloc'd
==30882==    at 0xC0C28B: malloc (vg_replace_malloc.c:153)
==30882==    by 0x8048437: broken (broken.c:20)
==30882==    by 0x804851F: main (broken.c:47)
==30882==    by 0x802BAE: __libc_start_main (in /lib/libc-2.3.2.so)
==30882==
==30882== Invalid read of size 1
==30882==    at 0x857A21: _IO_file_xsputn@@GLIBC_2.1 (in /lib/libc-2.3.2.so)
==30882==    by 0x835309: _IO_vfprintf_internal (in /lib/libc-2.3.2.so)
==30882==    by 0x83BC31: _IO_printf (in /lib/libc-2.3.2.so)
==30882==    by 0x8048460: broken (broken.c:22)
==30882==    Address 0x650F063 is 2 bytes after a block of size 5 alloc'd
==30882==    at 0xC0C28B: malloc (vg_replace_malloc.c:153)
==30882==    by 0x8048437: broken (broken.c:20)
==30882==    by 0x804851F: main (broken.c:47)
==30882==    by 0x802BAE: __libc_start_main (in /lib/libc-2.3.2.so)
==30882==
==30882== Invalid read of size 1
==30882==    at 0x857910: _IO_file_xsputn@@GLIBC_2.1 (in /lib/libc-2.3.2.so)
==30882==    by 0x835309: _IO_vfprintf_internal (in /lib/libc-2.3.2.so)
==30882==    by 0x83BC31: _IO_printf (in /lib/libc-2.3.2.so)
==30882==    by 0x8048460: broken (broken.c:22)
==30882==    Address 0x650F061 is 0 bytes after a block of size 5 alloc'd
==30882==    at 0xC0C28B: malloc (vg_replace_malloc.c:153)
==30882==    by 0x8048437: broken (broken.c:20)
==30882==    by 0x804851F: main (broken.c:47)
==30882==    by 0x802BAE: __libc_start_main (in /lib/libc-2.3.2.so)
2: 12345678
==30882==
==30882== Invalid write of size 1
==30882==    at 0x8048468: broken (broken.c:25)
==30882==    by 0x804851F: main (broken.c:47)
==30882==    by 0x802BAE: __libc_start_main (in /lib/libc-2.3.2.so)
==30882==    by 0x8048354: (within /usr/src/d/lad2/code/broken)
==30882==    Address 0x650F05B is 1 bytes before a block of size 5 alloc'd
==30882==    at 0xC0C28B: malloc (vg_replace_malloc.c:153)
==30882==    by 0x8048437: broken (broken.c:20)
==30882==    by 0x804851F: main (broken.c:47)
==30882==    by 0x802BAE: __libc_start_main (in /lib/libc-2.3.2.so)
==30882==
==30882== Invalid read of size 4
==30882==    at 0x863D50: __GI_strlen (in /lib/libc-2.3.2.so)
==30882==    by 0x83BC31: _IO_printf (in /lib/libc-2.3.2.so)
==30882==    by 0x804847A: broken (broken.c:26)
==30882==    by 0x804851F: main (broken.c:47)
==30882==    Address 0x650F064 is 3 bytes after a block of size 5 alloc'd
==30882==    at 0xC0C28B: malloc (vg_replace_malloc.c:153)
==30882==    by 0x8048437: broken (broken.c:20)
==30882==    by 0x804851F: main (broken.c:47)
==30882==    by 0x802BAE: __libc_start_main (in /lib/libc-2.3.2.so)
==30882==
==30882== Invalid read of size 1
==30882==    at 0x857A21: _IO_file_xsputn@@GLIBC_2.1 (in /lib/libc-2.3.2.so)
==30882==    by 0x835309: _IO_vfprintf_internal (in /lib/libc-2.3.2.so)
==30882==    by 0x83BC31: _IO_printf (in /lib/libc-2.3.2.so)
==30882==    by 0x804847A: broken (broken.c:26)
==30882==    Address 0x650F063 is 2 bytes after a block of size 5 alloc'd
==30882==    at 0xC0C28B: malloc (vg_replace_malloc.c:153)
==30882==    by 0x8048437: broken (broken.c:20)
==30882==    by 0x804851F: main (broken.c:47)
==30882==    by 0x802BAE: __libc_start_main (in /lib/libc-2.3.2.so)
==30882==
==30882== Invalid read of size 1
==30882==    at 0x857910: _IO_file_xsputn@@GLIBC_2.1 (in /lib/libc-2.3.2.so)
==30882==    by 0x835309: _IO_vfprintf_internal (in /lib/libc-2.3.2.so)
==30882==    by 0x83BC31: _IO_printf (in /lib/libc-2.3.2.so)
==30882==    by 0x804847A: broken (broken.c:26)
==30882==    Address 0x650F061 is 0 bytes after a block of size 5 alloc'd
==30882==    at 0xC0C28B: malloc (vg_replace_malloc.c:153)
==30882==    by 0x8048437: broken (broken.c:20)
==30882==    by 0x804851F: main (broken.c:47)
==30882==    by 0x802BAE: __libc_start_main (in /lib/libc-2.3.2.so)
3: 12345678
4: 12345
==30882==
==30882== Invalid write of size 1
==30882==    at 0x80484A6: broken (broken.c:32)
==30882==    by 0x804851F: main (broken.c:47)
==30882==    by 0x802BAE: __libc_start_main (in /lib/libc-2.3.2.so)
==30882==    by 0x8048354: (within /usr/src/d/lad2/code/broken)
==30882==    Address 0xBFF2D0FF is just below %esp.  Possibly a bug in GCC/G++
==30882==    v 2.96 or 3.0.x.  To suppress, use: --workaround-gcc296-bugs=yes
5: 12345
6: 12345
7: 12345
==30882==
==30882== ERROR SUMMARY: 22 errors from 12 contexts (suppressed: 0 from 0)
==30882== malloc/free: in use at exit: 5 bytes in 1 blocks.
==30882== malloc/free: 2 allocs, 1 frees, 10 bytes allocated.
==30882== For a detailed leak analysis,  rerun with: --leak-check=yes
==30882== For counts of detected errors, rerun with: -v

Note that Valgrind found everything except the global overflow and underrun and the single-byte overflow of local on line 30, and it pinpointed the errors it did find more specifically than any other tool we have described here.

The --leak-check=yes option turns on an aggressive form of leak checking in which the program’s memory is searched to determine, for each memory allocation, whether any accessible pointers still hold a reference to that memory. This is more accurate than simply asking whether memory has been free() ed, because it is reasonably common to allocate some memory that is held for the lifetime of the program and not to free() it, since it returns to the operating system when the program exits anyway.

$ valgrind --leak-check=yes ./broken

...
==2292== searching for pointers to 1 not-freed blocks.
==2292== checked 5318724 bytes.
==2292==
==2292== 5 bytes in 1 blocks are definitely lost in loss record 1 of 1
==2292==    at 0xEC528B: malloc (vg_replace_malloc.c:153)
==2292==    by 0x8048437: broken (broken.c:20)
==2292==    by 0x804851F: main (broken.c:47)
==2292==    by 0x126BAE: __libc_start_main (in /lib/libc-2.3.2.so)
==2292==
==2292== LEAK SUMMARY:
==2292==    definitely lost: 5 bytes in 1 blocks.
==2292==    possibly lost:   0 bytes in 0 blocks.
==2292==    still reachable: 0 bytes in 0 blocks.
==2292==         suppressed: 0 bytes in 0 blocks.
==2292== Reachable blocks (those to which a pointer was found) are not shown.
==2292== To see them, rerun with: --show-reachable=yes
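The distinction Valgrind draws can be seen in a small sketch with hypothetical function names: a block whose address is kept in a global variable for the life of the program is one that the leak checker would treat as still reachable, while a block whose only pointer disappears when a function returns is definitely lost, just like the second dyn buffer in broken.c.

```c
#include <stdlib.h>
#include <string.h>

/* Kept for the whole run: the global still points at the block when the
 * program exits, so the leak checker can find it ("still reachable"). */
static char *config_cache;

int load_config(void)
{
    if (config_cache == NULL) {
        config_cache = malloc(64);
        if (config_cache == NULL)
            return -1;
        strcpy(config_cache, "verbose=1");
    }
    return 0;
}

/* Deliberately leaky: the only pointer to the copy is lost on return,
 * so the block is "definitely lost", like the second dyn in broken.c. */
size_t leaky_length(const char *s)
{
    char *copy = malloc(strlen(s) + 1);

    if (copy == NULL)
        return 0;
    strcpy(copy, s);
    return strlen(copy);      /* copy is never freed */
}
```

At high optimization levels a compiler may be able to remove the short-lived allocation in leaky_length() entirely, so build without optimization when using it to exercise the leak checker.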

Valgrind comes with fairly detailed documentation on its capabilities (which Valgrind calls skins) and has many command-line options for modifying its behavior.

Because Valgrind uses a CPU emulator, it runs many times slower than a program running natively on the system. Exactly how much slower depends on the program, but Valgrind is intended to be at least usable for running interactive programs.

There are occasional subtle issues that can confuse Valgrind when you compile with high levels of optimization. If you get a report of a memory error that does not seem to make sense, try compiling with -O rather than -O2 (or higher) and see if the report changes.

Electric Fence

The next tool we look at is Electric Fence.[6] While it makes no attempt to find memory leaks, it does a nice job of helping programmers isolate buffer overflows. Every modern computer (including all the computers that Linux runs on) provides hardware memory protection. Linux takes advantage of this to isolate programs from each other (your vi session cannot access the memory of my gcc invocation, for example) and to share code safely among processes by making it read-only. Linux’s mmap()[7] system call allows processes to take advantage of hardware memory protection, as well.

Electric Fence replaces the C library’s normal malloc() function with a version that allocates the requested memory and (usually) allocates a section of memory immediately after this, which the process is not allowed to access! Although it may seem odd for a process to allocate memory that it is not allowed to access, doing so causes the kernel to halt the process immediately with a segmentation fault if the program tries to access the memory. By allocating the memory in this manner, Electric Fence has arranged things so that your program will be killed whenever it tries to read or write past the end of a malloc() ed buffer. For full details on using Electric Fence, consult its man page (man libefence), which is detailed.
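The mechanism can be sketched with mmap() and mprotect(). This is an illustration under simplifying assumptions, not Electric Fence’s actual implementation: each request gets its own mapping, one page beyond the data is made inaccessible with PROT_NONE, and the returned pointer is placed so that the block ends exactly where that page begins. No alignment padding is applied, which corresponds to setting EF_ALIGNMENT to 1. The function names are hypothetical.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS in strict -std= modes */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Bytes of whole pages needed to hold `size` bytes (at least one page). */
static size_t data_bytes_for(size_t size, size_t page)
{
    size_t pages = (size + page - 1) / page;
    return (pages ? pages : 1) * page;
}

/* Electric Fence style allocation (simplified sketch): map the data
 * pages plus one extra page, make the extra page inaccessible, and
 * place the block so that it ends exactly where that page begins. */
void *guarded_alloc(size_t size)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t data = data_bytes_for(size, page);
    unsigned char *base = mmap(NULL, data + page, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (base == MAP_FAILED)
        return NULL;
    if (mprotect(base + data, page, PROT_NONE) != 0) {   /* guard page */
        munmap(base, data + page);
        return NULL;
    }
    return base + data - size;   /* p[size] is the first guarded byte */
}

/* Release a block from guarded_alloc(); `size` must match the request. */
void guarded_free(void *p, size_t size)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t data = data_bytes_for(size, page);
    unsigned char *guard = (unsigned char *)p + size;

    munmap(guard - data, data + page);
}
```

With p = guarded_alloc(5), writes to p[0] through p[4] succeed, while a write to p[5] (where the first strcpy() in broken.c stores its terminating NUL) lands in the guard page and raises SIGSEGV.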

Using Electric Fence

One of the nicest things about Electric Fence is that it is easy to use. Simply link a program against libefence.a by running the final link step with -lefence as the final argument, and the code is ready to be debugged. Let’s see what happens when we run our test program against Electric Fence:

$ ./broken
  Electric Fence 2.2.0 Copyright (C) 1987-1999 Bruce Perens.
1: 12345
Segmentation fault (core dumped)

Although Electric Fence does not tell us exactly where the problem occurred, it does make the problem itself much more obvious. Pinpointing where the problem occurred is easily done by running the program under a debugger, such as gdb. To use gdb to pinpoint the problem, build the program with debugging information by using gcc’s -g flag, run gdb and tell it the name of the executable to debug, and run the program. When the program is killed, gdb shows you exactly what line caused the problem. Here is what this procedure looks like:

$ gcc -ggdb -Wall -o broken broken.c -lefence
$ gdb broken
...
(gdb) run
Starting program: /usr/src/lad/code/broken
  Electric Fence 2.2.0 Copyright (C) 1987-1999 Bruce Perens.
1: 12345
Program received signal SIGSEGV, Segmentation fault.
0x007948c6 in strcpy () from /lib/tls/libc.so.6
(gdb) where
#0  0x007948c6 in strcpy () from /lib/tls/libc.so.6
#1  0x08048566 in broken () at broken.c:21
#2  0x08048638 in main () at broken.c:47
(gdb)

Thanks to Electric Fence and gdb, we know there is a problem in broken.c at line 21, which is the second time strcpy() is called.

Memory Alignment

While Electric Fence did a fine job of finding the second problem in the code—namely, the strcpy() that overwrote its buffer by a large amount—it did not help us at all in finding the first buffer overflow.

The problem here has to do with memory alignment. Most modern CPUs require multibyte objects to start at particular offsets in the system’s RAM. For example, Alpha processors require that an 8-byte long begin at an address that is evenly divisible by eight. This means that a long may appear at address 0x1000 or 0x1008, but not at 0x1005.[8]

Because of this consideration, malloc() implementations normally return memory whose first byte is aligned on at least the processor’s word size (4 bytes on 32-bit processors and 8 bytes on 64-bit processors) to ensure the caller can store whatever data it likes into the memory. By default, Electric Fence attempts to mimic this behavior by providing a malloc() that returns only addresses that are an even multiple of sizeof(int).
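Checking alignment is just a remainder test. The sketch below (is_aligned() and malloc_result_aligned() are hypothetical helpers) applies it to the Alpha example above and to an address returned by malloc(), which on glibc systems we expect to be a multiple of at least sizeof(int).

```c
#include <stdint.h>
#include <stdlib.h>

/* Nonzero if `addr` is an exact multiple of `align` (align > 0). */
int is_aligned(uintptr_t addr, uintptr_t align)
{
    return addr % align == 0;
}

/* Allocate `size` bytes and report whether the address malloc() chose
 * is a multiple of `align`; returns -1 if the allocation fails. */
int malloc_result_aligned(size_t size, uintptr_t align)
{
    void *p = malloc(size);
    int ok;

    if (p == NULL)
        return -1;
    ok = is_aligned((uintptr_t)p, align);
    free(p);
    return ok;
}
```

For an 8-byte long, is_aligned(0x1000, 8) and is_aligned(0x1008, 8) are true, while is_aligned(0x1005, 8) is false.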

In most programs, such alignment is not all that important, because memory allocations are either done in increments that are already based on the machine’s word size or are for simple character strings, which do not have any alignment requirements (as each element is only one byte long).

In the case of our test program, the first malloc() call allocated five bytes. For Electric Fence to meet its alignment restrictions, it must treat the allocation as a request for eight bytes and set up the memory with an extra three bytes of accessible space after the malloc() ed region! Small buffer overflows in this region are not caught because of this.

As malloc() alignment concerns can normally be ignored and the alignment can allow buffer overflows to remain undetected, Electric Fence lets you control how alignment works through the EF_ALIGNMENT environment variable. If it is set, all malloc() results are aligned according to its value. For example, if it is set to 5, all malloc() results will be addresses evenly divisible by 5 (this probably is not a very useful value, however). To turn off memory alignment, set EF_ALIGNMENT to 1 before running your program. Under Linux, improperly aligned accesses are fixed in the kernel anyway, so although this may slow your program down substantially, it should function properly—unless it has slight buffer overflows!
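The arithmetic behind the missed overflow can be written out directly. This sketch (the helper names are hypothetical, and it models only the rounding, not Electric Fence’s code) rounds a request up to the next multiple of the alignment and reports how many accessible bytes are left between the end of the request and the guard page. With sizeof(int) equal to 4, a 5-byte request becomes 8 bytes and leaves 3 bytes of undetected slack; with EF_ALIGNMENT set to 1, the slack drops to zero.

```c
#include <stddef.h>

/* Round a request up to the next multiple of `align` (align >= 1). */
size_t padded_size(size_t size, size_t align)
{
    return (size + align - 1) / align * align;
}

/* Accessible bytes past the end of the request, i.e. how far a buffer
 * can be overrun before the write reaches the guard page. */
size_t undetected_slack(size_t size, size_t align)
{
    return padded_size(size, align) - size;
}
```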

Here is how our test program linked against Electric Fence behaved when we set EF_ALIGNMENT to 1:

$ export EF_ALIGNMENT=1
$ gdb broken
...
(gdb) run
Starting program: /usr/src/lad/code/broken
  Electric Fence 2.2.0 Copyright (C) 1987-1999 Bruce Perens.
Program received signal SIGSEGV, Segmentation fault.
0x002a78c6 in strcpy () from /lib/tls/libc.so.6
(gdb) where
#0  0x002a78c6 in strcpy () from /lib/tls/libc.so.6
#1  0x08048522 in broken () at broken.c:15
#2  0x08048638 in main () at broken.c:47

This time it found the first buffer overflow that occurred.

Other Features

Not only does Electric Fence help detect buffer overflows, but it also can detect buffer underruns (accessing the memory before the start of a malloc() ed buffer) and accesses to memory that has already been free() ed. If the EF_PROTECT_BELOW environment variable is set to 1, Electric Fence traps buffer underruns instead of overflows. It does this by placing an inaccessible memory region immediately before the valid memory region returned by malloc(). When it does this, it can no longer detect overflows because of the memory paging layout of most processors. The memory alignment concerns that make overflows tricky do not affect underruns, however, as in this mode, Electric Fence’s malloc() always returns a memory address at the beginning of a page, which is always aligned on a word boundary.
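The layout used for EF_PROTECT_BELOW can be sketched the same way, again as a simplified illustration rather than Electric Fence’s code and with hypothetical function names: the inaccessible page comes first and the block starts at the beginning of the following page, so p[-1] faults, while a small overrun past the end falls in the unused remainder of the last data page and goes unnoticed.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS in strict -std= modes */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Bytes of whole pages needed to hold `size` bytes (at least one page). */
static size_t data_bytes_for(size_t size, size_t page)
{
    size_t pages = (size + page - 1) / page;
    return (pages ? pages : 1) * page;
}

/* Guard page first, then the data pages; the returned block begins on a
 * page boundary, which also satisfies any word-alignment requirement. */
void *guarded_alloc_below(size_t size)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t data = data_bytes_for(size, page);
    unsigned char *base = mmap(NULL, page + data, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (base == MAP_FAILED)
        return NULL;
    if (mprotect(base, page, PROT_NONE) != 0) {   /* guard page below */
        munmap(base, page + data);
        return NULL;
    }
    return base + page;
}

/* Release a block from guarded_alloc_below(); `size` must match. */
void guarded_free_below(void *p, size_t size)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    munmap((unsigned char *)p - page, page + data_bytes_for(size, page));
}
```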

If EF_PROTECT_FREE is set to 1, free() makes the memory region passed to it inaccessible rather than returning it to the free memory pool. If the program later tries to access that memory, the kernel detects the illegal access and delivers a segmentation fault (SIGSEGV). Setting EF_PROTECT_FREE makes it easy to ensure that your code never uses free()ed memory.

Limitations

While Electric Fence does a nice job of finding overflows of malloc() ed buffers, it does not help at all with tracking down problems with either global or locally allocated data. It also does not make any attempt to find memory leaks, so you have to look elsewhere for help with those problems.

Resource Consumption

Although Electric Fence is powerful, easy to use, and fast (because all the access checks are done in hardware), it does exact a price. Most processors allow the system to control access to memory only one page at a time. On Intel 80x86 processors, for example, each page is 4,096 bytes in size. Electric Fence has malloc() set up two different regions for each call, one allowing access and the other allowing no access. As a result, each call to malloc() consumes at least a full page of memory, or 4K![9] If the code being tested allocates many small areas, linking it against Electric Fence can easily increase the program's memory usage by two or three orders of magnitude! Using EF_PROTECT_FREE makes this even worse, because that memory is never freed.

Suppose you are hunting for the source of one specific instance of corruption, and the system has plenty of memory relative to the size of the program being debugged. In that case, Electric Fence may be faster than Valgrind. However, if you need to enable a gigabyte of swap space just to make Electric Fence work, Valgrind will probably be much faster, even though it runs your program on an emulated CPU rather than the native one.



[1] Unfortunately, none of the tools discussed in this chapter is capable of tracking memory errors with global variables; this requires help from the compiler. In the first edition of Linux Application Development, we discussed a tool called Checker that was a modified version of the gcc compiler, but it is no longer maintained. A new technology called “mudflap” is in the process of being added to the official gcc compiler, but it is not yet integrated as we write this. If you are looking for errors involving global variables, you may wish to check the current state of mudflap technology in gcc by reading the current gcc manual. However, since overuse of global variables tends to be a sign of bad program design, you might also consider design changes that eliminate or reduce your use of global variables, which might give other benefits as well.

[2] Available from http://www3.telus.net/taj_khattra/mpr.html as well as with many Linux distributions.

[3] For portability, most of the mpr log analysis tools use gdb to relate addresses to their location in the source code. For this to work, the program must include debugging information.

[6] Available from ftp://sunsite.unc.edu/pub/Linux/devel/lang/c as well as with many distributions.

[7] See Chapter 13 for details on mmap().

[8] Most traditional Unix systems deliver a bus error (SIGBUS) to a process that attempts to use misaligned data. The Linux kernel actually handles unaligned accesses so that the process can continue normally, but at a large performance penalty.

[9] On Linux/Intel and Linux/SPARC systems anyway. The page size depends on the underlying hardware architecture, and may be 16K or even larger on some systems.
