Finding Run-time Memory Errors

A sophisticated tool for the thorniest of bugs

Probably the most insidious bug infesting C programs is the array-bounds violation. In its more subtle forms, it merely leads to slightly incorrect results. Virulent strains can cause stack corruption, segmentation violations, and ultimately, programmer insanity. A wide variety of free and commercial malloc() debugging packages are available to help combat this plague. Unfortunately, they're awkward to use and only address part of the problem.

However, a number of "modern" tools designed to detect memory-related errors are now available. These tools, such as Purify from Pure Software, Insight from Parasoft, MemCheck from StratosWare, and Sentinel from Virtual Technologies, are modern in that they perform sophisticated checking and generate detailed reports on C/C++ programs at run time. In this article, I'll focus on Purify 2.0.

Besides array-bounds violations, Purify 2.0 detects memory leaks and the use of uninitialized memory, NULL pointers, and free()ed memory. (Figure 1 lists the errors Purify can catch.) Currently, Purify is available only on Sun SPARCstations, but HP 9000 support is forthcoming. [HP and SGI are now supported. Ed.] It supports C, C++, and Fortran 77. I've used it with cc and CC from Sun, lcc from Lucid, and GNU gcc and g++ from the Free Software Foundation. Pure Software supports a number of other compilers as well.

Innocence Lost

Although I write software for different platforms, my primary project has been OPAL, a PLD programming package consisting of 20 programs and 80,000 lines of code, which supports both DOS and Sun UNIX environments. Since DOS, which has no memory protection, silently tolerates memory errors, we decided to focus quality assurance on the UNIX platform, where a memory error is likely to cause a segmentation violation.

The result was a brute-force malloc() replacement (written by Andy Valencia, of Sequent, and myself) that could detect array-bounds violations of malloc()ed memory. Unfortunately, it used enormous amounts of memory (a minimum of two virtual pages, typically 8K, for each malloc() call), which usually thrashed the system and frequently exhausted all available virtual memory. I first ran across Purify at the Winter 1992 USENIX conference. Since then, I've used Purify on every program I've written, as well as on the public-domain packages to which I contribute.
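The code for that replacement isn't shown here. As a rough illustration of why such a scheme costs at least two virtual pages per call, here is a hypothetical guard-page allocator; the names guardMalloc() and guardFree() are invented for this sketch, and it assumes a POSIX system providing mmap(), mprotect() and sysconf(). Each block is placed so that it ends exactly where an inaccessible page begins, so the first byte written past the end faults at the offending instruction.

```c
/* Hypothetical guard-page allocator, not the article's actual tool.
 * Assumes a POSIX system with mmap(), mprotect() and sysconf(). */
#define _DEFAULT_SOURCE            /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Number of whole pages needed to hold size bytes (at least one). */
static size_t dataPagesFor(size_t size, size_t page)
{
    size_t pages = (size + page - 1) / page;
    return pages == 0 ? 1 : pages;
}

/* Allocate size bytes ending exactly at a PROT_NONE guard page.
 * Returns NULL on failure. */
void *guardMalloc(size_t size)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t pages = dataPagesFor(size, page);
    size_t total = (pages + 1) * page;      /* data pages + guard page */

    unsigned char *base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* Any access to the last page now faults (SIGSEGV or SIGBUS). */
    if (mprotect(base + pages * page, page, PROT_NONE) != 0) {
        munmap(base, total);
        return NULL;
    }
    return base + pages * page - size;      /* block ends at the guard */
}

/* Release a block; size must match the original request. */
void guardFree(void *ptr, size_t size)
{
    if (ptr == NULL)
        return;

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t pages = dataPagesFor(size, page);
    unsigned char *base = (unsigned char *)ptr + size - pages * page;

    munmap(base, (pages + 1) * page);
}
```

The sketch makes two trade-offs worth noting: a block pushed flush against the guard page is only suitably aligned when its size is a multiple of the required alignment (padding it would let small overruns slip into the padding), and underruns before the block go undetected.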

Overview

Purify can be used almost immediately after installation. To illustrate how to use it, I've written a short program that contains a number of errors. While the example is admittedly contrived, all the errors are familiar to C programmers. For the sake of illustration, imagine that the code is tens of thousands of lines long, so that you would have only a small chance of spotting the errors by eye.

Purify can be run either from the command line or from within a makefile by prefixing your normal link command with the command purify. I'm compiling with the -g debugging option; Purify doesn't require it, but with -g it produces more readable error messages. Since there's only one source module, I'll skip the makefile and type purify gcc -g -o string string1.c.

Purify first lets gcc compile string1.c, intercepting the link step. It then modifies the object file string1.o and the standard library libc.a to insert all of its error-checking code, and finally uses its own incremental linker to produce the executable file.

Uninitialized Memory Read

When the string program starts, Purify first verifies that you have a license. If you do, the program runs normally, but whenever an error is detected, Purify prints a message. Since the messages are sent to stderr, they are easy to separate from normal program output.

The first reported error is an "uninitialized memory read" of the local variable string on line 13 of string1.c, the stringCopy() function call. Returning to the source file, you'll notice I passed string as an argument without initializing it to point to allocated memory. Knowing a bit about pointers, I suspect that the garbage pointer is what led to the second error, a segmentation violation. Purify specified the stack frame and line number of both errors; the familiar UNIX message "segmentation fault" conveys none of this information.

The first error is easy enough to fix, and the modified version of main() is then recompiled. Compiling is much faster this time, since the Purify-ed version of libc.a has been cached. Nevertheless, when I run my program, I still get an uninitialized memory read, this time due to the local variable length in stringCopy(). Purify shows the entire stack frame and the size of the offending access. Also, since the program didn't crash this time, a summary is printed at the end.

Once again, the error is simple to fix by initializing length to 0. After I compile and run this version, Purify reports no errors. Feeling overjoyed, I decide to exercise my program a bit with different strings.
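The article's program listings aren't reproduced in this reprint, so the following is only a hypothetical reconstruction of the two fixes just described. The name stringCopy() comes from the text, but its signature and body are assumptions, as is the helper newCopy(). The caller now hands stringCopy() a pointer to allocated memory, and the counter length starts at 0 instead of holding whatever garbage was on the stack.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical signature. Copies source, including its terminating NUL,
 * into destination and returns the number of characters before the NUL. */
int stringCopy(char *destination, const char *source)
{
    int length = 0;   /* the second fix: this counter was left uninitialized */

    while ((destination[length] = source[length]) != '\0')
        length++;
    return length;
}

/* Hypothetical helper mirroring the first fix: allocate memory for the
 * destination before handing the pointer to stringCopy(). Returns NULL if
 * the allocation fails; the caller must free() the result. */
char *newCopy(const char *source)
{
    char *string = malloc(strlen(source) + 1);

    if (string != NULL)
        stringCopy(string, source);
    return string;
}
```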

Figure 1: The errors caught by Purify

Array Bounds Violations and NULL Pointers

When I first run the new version, it produces a long list of errors. To shorten this list, I use Purify's batch mode by entering purify -batch gcc -g -o string string3.c. This option consolidates all error messages of the same type that occur on the same line. The report shows an "array bounds write" error: I wrote past the end of my malloc()ed memory. The report identifies where the error occurs, where that memory was allocated, and how much memory was allocated.

When I examine my code, I realize I neglected to allocate enough memory to store each of my strings, specifically the second. I can fix this either by allocating more memory or by passing an additional parameter to stringCopy(). Since the latter is more general, I go with that alternative. (Lack of array size testing is one of the programming mistakes that the Internet Worm used to its advantage.)

The report also identifies the obvious use of a NULL pointer. At first, you may not think that this feature is special; after all, UNIX "reports" the error with a segmentation violation. The advantage is that Purify identifies the line number and stack frame where the error occurred. This is a trivial error to fix by checking that neither "source" nor "destination" is NULL. In addition to the two bug fixes, I decide to get a bit fancier with my testing by adding a loop.
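Again as a hypothetical sketch (the article doesn't give the new signature), here is a stringCopy() that takes the destination's size as a third parameter and rejects NULL pointers. It copies at most size - 1 characters and always NUL-terminates the result, so an over-long source is truncated instead of overrunning the buffer. The return value of -1 for bad arguments is likewise an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical signature: size is the capacity of destination in bytes.
 * Returns the number of characters copied, or -1 if either pointer is
 * NULL or size is 0. */
int stringCopy(char *destination, const char *source, size_t size)
{
    size_t length = 0;

    if (destination == NULL || source == NULL || size == 0)
        return -1;

    /* Leave room for the terminating NUL. */
    while (length + 1 < size && source[length] != '\0') {
        destination[length] = source[length];
        length++;
    }
    destination[length] = '\0';
    return (int)length;
}
```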

Memory Leaks

In the newest version, all of my access errors have been fixed. The report now identifies 300 bytes of leaked memory. A memory leak is a block of allocated memory to which no pointer points. Purify also identifies potential memory leaks: blocks that are still referenced, but only by a pointer into their interior rather than to their first byte. Usually, potential leaks are due to incrementing a pointer across a string and never freeing the block. However, they sometimes hint at a variety of other problems.
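The difference between a leak and a potential leak comes down to where the surviving pointer points. In this hypothetical example (countSpaces() is invented for illustration), a separate cursor walks across a heap copy of a string while copy keeps pointing at the first byte, so the block can still be freed. Had the loop advanced copy itself, the only pointer left would point into the middle of the block, the pattern Purify flags as a potential leak; once the function returned without freeing it, the block would become an outright leak.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical example: counts the spaces in a heap copy of text, or
 * returns -1 if the copy can't be allocated. */
int countSpaces(const char *text)
{
    char *copy = malloc(strlen(text) + 1);
    int spaces = 0;

    if (copy == NULL)
        return -1;
    strcpy(copy, text);

    /* Walk a separate cursor across the string; copy itself keeps
     * pointing at the first byte of the block. */
    for (const char *p = copy; *p != '\0'; p++)
        if (*p == ' ')
            spaces++;

    /* Because copy still holds the block's start address, it can be
     * freed. Advancing copy instead of p would leave only an interior
     * pointer (a potential leak) and make this free() invalid. */
    free(copy);
    return spaces;
}
```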

Many C programmers have been conditioned not to worry about memory leaks, since the operating system reclaims the memory when the program exits. However, neglecting to free() memory that was needed only early in a program's execution can cause large programs to page fault unnecessarily or run out of virtual memory. The X Window System and programs built on top of it are notorious for this type of error, which matters all the more because some of us remain logged in for months at a time.

In this case, since Purify points out exactly where the memory was allocated, I realize that I forgot to free() my allocated memory. The leak is easy to fix by placing free(string); at the end of the loop. After this change, the program runs as expected, and Purify reports no errors.
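The same fix in sketch form, using a hypothetical loop that allocates a fresh buffer on each pass; totalLength() and its parameters are invented here, since the article's own test loop isn't shown. The free(string) at the bottom of the loop body is what keeps each iteration's buffer from leaking.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical test driver: returns the total length of count words,
 * working on a heap copy of each one. Stops early if malloc() fails. */
size_t totalLength(const char *words[], size_t count)
{
    size_t total = 0;

    for (size_t i = 0; i < count; i++) {
        char *string = malloc(strlen(words[i]) + 1);
        if (string == NULL)
            break;
        strcpy(string, words[i]);
        total += strlen(string);
        free(string);   /* the fix: release this pass's buffer */
    }
    return total;
}
```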

Purify's API

Purify provides a large number of functions that can be called from a debugger (such as dbx or gdb) or from within the program itself. Functions are provided to control batch mode, print to the log file, report on the state of memory leaks, and print detailed information about memory locations. Many of these are useful within an assert() statement to ensure that previously fixed problems do not return.

In addition, watchpoints are provided to break on a read, write, allocation, free, entry, or exit of a specific memory location or range of locations. From within a debugger, it's also possible to break on any detected error.

Error Suppression

Purify will sometimes detect a violation that you know is acceptable. Usually it is a known, harmless error, an error in a library over which you have no control, or one that simply hasn't been fixed yet. Occasionally, it is due to a bit of strange code that confuses Purify.

Purify messages can be suppressed by using a .purify file. This file can exist in your home directory (for general problems) or in the current directory (for project-specific problems). My home directory contains a .purify file with the entry "suppress abw tzload", where abw stands for "array bounds write". This suppresses reports of a bug in the SunOS 4.1.1 standard library version of tzload(), a routine called by many of the time functions, which writes one byte past the end of allocated memory. I am [highlighting] this code not to point fingers at Sun, but to show that these types of errors can crop up in extremely reliable commercial code.

It's also possible to suppress errors based on a certain stack frame. For example, instead of ignoring all "array bounds writes" in tzload(), I may wish to suppress them only when tzload() is called from tzsetwall(). To do this, my .purify file would contain "suppress abw tzload; tzsetwall". Wildcards are also allowed.

Purify's Innards

The first phase of Purify runs after compilation and before linking. It takes each object file and library and inserts a special function call before every memory access. It inserts additional code during stack frame creation, and within malloc() and free(). Since it is the object files that are modified, Purify can detect errors in all aspects of the program, even hand-optimized assembly code and commercial libraries for which no source is available.

The second phase starts at run time, when the additional code is executed. This code maintains and checks a two-bit entry for every byte in the heap, stack, data, and bss sections. The entry indicates the state of the byte: unallocated, allocated but uninitialized, or allocated and initialized. By checking this state on every access, Purify can easily detect the use of stray pointers (unallocated memory), memory that has been free()ed, and uninitialized memory.
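To make the bookkeeping concrete, here is a much-simplified model of the scheme just described. It is not Purify's implementation: the "memory" is a hypothetical 64-byte arena addressed by offset, all names are invented, and the checks are ordinary function calls standing in for the code Purify inserts before each load and store. Each byte's state occupies two bits of a shadow array, so four bytes share one shadow byte. Writes to allocated bytes mark them initialized; reads are errors unless the byte is both allocated and initialized.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical model of per-byte state tracking. */
enum byteState { UNALLOCATED = 0, ALLOCATED_UNINIT = 1, INITIALIZED = 2 };
enum checkResult { CHECK_OK, CHECK_UNALLOCATED, CHECK_UNINIT_READ };

#define ARENA_SIZE 64                      /* bytes being tracked */
static uint8_t shadow[ARENA_SIZE / 4];     /* 2 bits per byte: 16 bytes */

/* All functions below assume addr < ARENA_SIZE. */
static enum byteState getState(size_t addr)
{
    unsigned shift = (unsigned)(addr % 4) * 2;
    return (enum byteState)((shadow[addr / 4] >> shift) & 3u);
}

static void setState(size_t addr, enum byteState s)
{
    unsigned shift = (unsigned)(addr % 4) * 2;
    shadow[addr / 4] = (uint8_t)((shadow[addr / 4] & ~(3u << shift))
                                 | ((unsigned)s << shift));
}

/* Hooks standing in for the code added to malloc() and free(). */
void shadowAlloc(size_t addr, size_t n)
{
    for (size_t i = 0; i < n; i++)
        setState(addr + i, ALLOCATED_UNINIT);
}

void shadowFree(size_t addr, size_t n)
{
    for (size_t i = 0; i < n; i++)
        setState(addr + i, UNALLOCATED);   /* freed memory is off-limits */
}

/* Check performed before each store: a valid store initializes the byte. */
enum checkResult checkWrite(size_t addr)
{
    if (getState(addr) == UNALLOCATED)
        return CHECK_UNALLOCATED;
    setState(addr, INITIALIZED);
    return CHECK_OK;
}

/* Check performed before each load. */
enum checkResult checkRead(size_t addr)
{
    switch (getState(addr)) {
    case UNALLOCATED:      return CHECK_UNALLOCATED;
    case ALLOCATED_UNINIT: return CHECK_UNINIT_READ;
    default:               return CHECK_OK;
    }
}
```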

However, since Purify works by instrumenting the program's instructions rather than by using debugging modes built into the microprocessor itself, as some products do, it cannot detect stray instruction fetches, that is, a runaway program counter. On the other hand, this type of error is rare and is usually rewarded quickly with a segmentation violation on UNIX. Purify is also likely to detect the cause of the invalid program counter, which is often a stack corrupted by an out-of-bounds write to a local array.

Unfortunately, Purify will also overlook uninitialized bits within a byte, as long as at least one bit in that byte has been initialized. This can result from the use of an operator assignment expression; see Listing Ten, page 92. Pure Software did this intentionally, for subtle reasons.

Array-bounds violations are detected in a similar manner: Purify allocates extra space both before and after the requested memory and marks that space as unallocated. This is done for both malloc()ed and static arrays. An extremely bad array-bounds violation, for example array[1000] on a 10-byte array, has some small chance of ending up in the valid section of another array, but I've never seen this happen in practice.
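The red-zone idea can be sketched the same way. In this hypothetical model (the arena size, the 8-byte red-zone width, and all names are assumptions), each block is bracketed by bytes that are never marked addressable. An index just past the end of a 10-byte block lands in a red zone and is caught, but an index large enough to jump over the red zones lands inside the next block and passes the check, which is the array[1000] case described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of red zones; sizes and names are illustrative assumptions. */
#define ARENA_SIZE 256
#define REDZONE    8
#define RZ_FAIL    ((size_t)-1)

static bool addressable[ARENA_SIZE];   /* false = red zone or unallocated */
static size_t nextFree = 0;

/* Returns the arena offset of a new block, or RZ_FAIL when full. */
size_t rzAlloc(size_t size)
{
    size_t start = nextFree + REDZONE;           /* leading red zone */

    if (start + size + REDZONE > ARENA_SIZE)
        return RZ_FAIL;
    for (size_t i = 0; i < size; i++)
        addressable[start + i] = true;
    nextFree = start + size + REDZONE;           /* trailing red zone */
    return start;
}

/* The check an instrumented access to base[index] would perform. */
bool accessIsValid(size_t base, size_t index)
{
    size_t addr = base + index;
    return addr < ARENA_SIZE && addressable[addr];
}
```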

Performance Issues

It's no surprise that Purify affects program performance. However, given the poor performance of my previous home-grown tools and Purify's many additional features, I'm satisfied with the cost.

Link-time performance can be poor. The first link of a program will take about ten times longer than usual. After that, though, it will only take a few times as long because Purify caches the modified object files, and uses an incremental linker by default. Although these cached files tend to clutter your project directory, the documentation does show exactly how to set up a crontab entry to remove old ones.

Run-time memory use is about 50 percent more than usual. About half of this increase is due to the two-bit state of each byte, and the remainder is due to the larger executable file. The executable file is typically about three times larger than normal, due to the inserted Purify code. With large applications, the extra memory usage becomes a concern due to the performance degradation caused by page swapping.
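As a rough consistency check on these figures (the breakdown is simple arithmetic, not a figure from Pure Software): if D is the number of bytes in the heap, stack, data, and bss sections, a two-bit state for each eight-bit byte adds

```latex
S_{\text{state}} = D \times \frac{2\ \text{bits}}{8\ \text{bits}} = 0.25\,D
```

So when those sections dominate a program's memory use, the state map alone accounts for roughly a 25 percent increase, about half of the 50 percent total quoted above.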

Programs generally run about three times slower, plus a few seconds of overhead for the license check. For the types of programs I typically write this isn't a hindrance, so I use Purify all the time. Purify-ed programs that make use of a GUI, particularly those built on top of the X Window System, could quickly annoy the user. This is especially true if the machine has little real memory (less than 16 Mbytes). Programs that perform a lot of memory allocation and freeing also slow down, since freed memory is not reused immediately.

Credits

From an article by Taed Nelson in DDJ MAGAZINE, November 1993.

Taed is a senior software engineer at National Semiconductor Corporation. He can be reached via the Internet at nelson@berlioz.nsc.com.

Reprinted with permission from DDJ MAGAZINE November, 1993, Vol 18, Issue 12 (C) Copyright 1993 Miller Freeman, Inc. ALL RIGHTS RESERVED.


Copyright 1995 Pure Software Inc. 1309 South Mary Avenue, Sunnyvale, CA 94087 USA. All rights reserved.