Best Practices: A Road to Peril

In the days of old when knights were bold ... many of us were taught to be disciplined about initializing variables and memory. Compilers and operating systems did very little on the application's behalf when allocating memory. If your algorithm depended on a counter variable starting at zero, you were well served to set it to zero explicitly in your code. Failure to do so leads to subtle, intermittent errors that are hard to predict and costly to track down and fix.

The typical pattern that emerged was to simply initialize all variables to zero or null as required and to set blocks of memory to zero right after allocation. For example, this code is often encountered:

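#include <stdlib.h>
#include <string.h>

void typical_function() {
    int x = 0;                            /* automatic variable, explicitly zeroed */
    int *xblock = (int *)malloc(1500);    /* 1500 bytes of uninitialized heap memory */
    if (xblock == NULL) return;
    memset(xblock, 0, 1500);              /* set every byte to zero */
    /* ... use x and xblock ... */
    free(xblock);
}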

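The above function allocates a so-called 'automatic' integer variable 'x' on the stack and gives it an initial value of zero. Without that initialization, there is no guarantee what value x would hold: it may contain whatever bytes were left on the stack, and reading it before assigning a value is undefined behavior in C++. Some compilers can be told to initialize such variables on the programmer's behalf (GCC and Clang, for example, accept -ftrivial-auto-var-init=zero), but that is not the default, so portable code should not rely on it.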

The next line allocates 1500 bytes from the heap and returns a pointer to them. malloc() does not initialize that memory in any way, so the programmer must follow it with another library call, memset(), to explicitly set the whole block to zero. (Depending on the program's requirements, it is not unusual to initialize the memory to some value other than zero.)
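When the initial value is zero, the C standard library's calloc() folds the allocation and the clearing into a single call. When the initial value is not zero, memset() is the wrong tool for anything wider than a byte, because it writes the same byte value into every byte. The following C++ sketch illustrates both points; make_zeroed() and make_filled() are hypothetical helper names, and the 0x01010101 figure assumes a 4-byte int:

```cpp
#include <algorithm>  // std::fill_n
#include <cstddef>    // std::size_t
#include <cstdlib>    // std::malloc, std::calloc, std::free

// Allocate `count` ints that are already zero: calloc() allocates and clears in one call.
int* make_zeroed(std::size_t count) {
    return static_cast<int*>(std::calloc(count, sizeof(int)));
}

// Allocate `count` ints all set to `value`. memset() writes one byte value into every
// byte, so memset(p, 1, n) would make each 4-byte int 0x01010101, not 1; assigning
// element by element avoids that.
int* make_filled(std::size_t count, int value) {
    int* p = static_cast<int*>(std::malloc(count * sizeof(int)));
    if (p != nullptr) {
        std::fill_n(p, count, value);
    }
    return p;
}
```

Memory obtained from either helper is released with std::free(), just like the malloc() block in typical_function().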

In many circumstances, the above pattern can still be a good practice. Now let's consider a more modern case that is quite common in an object-oriented architecture.

First, what if the variable 'x' in typical_function() were a complex data type, such as this structure:

typedef struct { int x; int y; int z; } TypeXYZ;

Let's rewrite our typical_function() using our new TypeXYZ construct.

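void new_typical() {
    TypeXYZ x;                                        /* members of x are NOT initialized */
    TypeXYZ *xblock = (TypeXYZ *)malloc(1500 * sizeof(TypeXYZ));
    if (xblock == NULL) return;
    memset(xblock, 0, 1500 * sizeof(TypeXYZ));        /* every member of every element is 0 */
    /* ... */
    free(xblock);
}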

Now we start having problems. First of all, 'x' is no longer initialized, so each of its members may well contain garbage. On the other hand, 'xblock' still points to a block of memory large enough to hold 1500 TypeXYZ objects, all of whose members are set to zero. For example, the following is true:

xblock[2].x == 0
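If all that is wanted is for 'x' to start out zeroed, the language can do it without memset(). A minimal sketch, assuming the plain three-int TypeXYZ above (make_zeroed_xyz() is a hypothetical name): in C, an initializer such as = {0} zeroes every member, and in C++, empty braces value-initialize the object. One advantage of the C++ form is that if a constructor is later added to the type, the same empty braces call that constructor instead of overwriting its work.

```cpp
// Same plain structure as above: three ints, no constructor.
typedef struct { int x; int y; int z; } TypeXYZ;

// Returns a TypeXYZ with every member set to zero, without calling memset().
TypeXYZ make_zeroed_xyz() {
    TypeXYZ x{};   // C++ value-initialization zeroes a plain struct (C spelling: TypeXYZ x = {0};)
    return x;
}
```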

Some less thoughtful or disciplined programmers might be tempted to also use memset() to initialize 'x' like so:

void bad_practice() {
    TypeXYZ x;
    memset(&x, 0, sizeof(TypeXYZ));                   /* zero x byte by byte */
    TypeXYZ *xblock = (TypeXYZ *)malloc(1500 * sizeof(TypeXYZ));
    if (xblock == NULL) return;
    memset(xblock, 0, 1500 * sizeof(TypeXYZ));
    /* ... */
    free(xblock);
}

This will appear to work fine given what we have done so far, but I can attest that it will eventually rise up and bite the maintainer who comes along later. I encountered this very situation recently in a project: another team member had written the code above, using memset() to initialize the complex TypeXYZ object.

The conflict arose when TypeXYZ had to be modified to support a new feature. As it happens, TypeXYZ might be used in many different parts of an application, or even in separate applications altogether. The problem with the memset() initialization method is that it necessarily assumes that the complex TypeXYZ class/structure contains ONLY primitive data types (in C++ terms, that it is trivially copyable), an assumption that will not always hold true. What if we need to modify it like so:

#include <string>

typedef struct xyz {
    int x;
    int y;
    int z;
    std::string name;                             // a non-trivial C++ member
    xyz() : x(1), y(2), z(3), name("Betty") {}    // constructor initializes every member
} TypeXYZ;

We've aggregated another complex type (std::string) into our structure, and we've added a simple constructor that explicitly initializes every member. Not only does this modification improve the definition and dependability of TypeXYZ, it remains backward compatible with all well-behaved existing C++ code that does its own member initialization and uses only the old x, y, and z members. This is a good practice.

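But the bad_practice() function is going to fail miserably. Not only does the first memset() undo the work of the constructor, resetting x, y, and z to zero, it also mercilessly overwrites the internals of the 'name' string. Zeroing a std::string this way is undefined behavior: its internal pointer and size are replaced with zeros, so any later use of 'name', possibly including its destructor when 'x' goes out of scope, can dereference a null pointer and crash with a segmentation fault and core dump. The malloc()'d 'xblock' is no better off: malloc() never runs constructors, so none of its 1500 'name' strings has ever been constructed, and zeroing them does not make them usable.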

The problem is that a seemingly safe modification in one part of the code causes unexpected consequences in another, completely unrelated part, and those consequences may not become apparent until just the right circumstances arise. As these kinds of practices creep into the code base, later maintenance makes the code increasingly fragile. At some point someone will make the case for throwing out this 'old' code entirely in favor of the very expensive proposition of rewriting it from scratch.
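One way to keep this kind of breakage from slipping through silently is to route raw zero-filling through a helper that refuses to compile for types where it is unsafe. GCC's -Wclass-memaccess warning (enabled by -Wall) already flags many direct memset() calls on such types; the sketch below goes further with the standard type traits std::is_trivially_copyable and std::is_trivially_default_constructible. The names zero_fill, OldXYZ, and NewXYZ are hypothetical, chosen for illustration rather than taken from the project:

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical helper: memset-style zeroing, allowed only for types with no constructor
// to bypass and whose bytes may legitimately be copied or overwritten.
template <typename T>
void zero_fill(T* p, std::size_t count) {
    static_assert(std::is_trivially_copyable<T>::value &&
                  std::is_trivially_default_constructible<T>::value,
                  "zero_fill: T has constructors or non-trivial members; use them instead");
    std::memset(static_cast<void*>(p), 0, count * sizeof(T));
}

// The original plain struct: zero_fill compiles for it.
struct OldXYZ { int x; int y; int z; };

// The modified struct: any call to zero_fill on it fails to compile.
struct NewXYZ {
    int x, y, z;
    std::string name;
    NewXYZ() : x(1), y(2), z(3), name("Betty") {}
};
```

With this guard in place, a call such as zero_fill(&x, 1) on a NewXYZ object is rejected by the compiler, so whoever changes the struct hears about the problem at build time instead of from a core dump.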

If dependable initialization of a complex type is important, then the best practice to employ is to define TypeXYZ as we just have, including the constructor, and then write our function like this:

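#include <vector>

void best_practice() {
    TypeXYZ x;                           // the constructor initializes every member
    std::vector<TypeXYZ> xblock(1500);   // 1500 constructed elements; no new, delete, or memset
}                                        // elements destroyed and storage freed automatically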

Crisp, clean, and no caffeine!
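To check that every element really comes out of the constructor, here is a small usage sketch. It assumes std::vector as the container, since the original block was an indexable array (std::list constructs its 1500 elements the same way but has no operator[]), and it repeats the constructor-based TypeXYZ so that it compiles on its own. all_constructed() is a hypothetical helper:

```cpp
#include <string>
#include <vector>

typedef struct xyz {
    int x;
    int y;
    int z;
    std::string name;
    xyz() : x(1), y(2), z(3), name("Betty") {}
} TypeXYZ;

// Hypothetical helper: true if every element holds the values set by the constructor.
bool all_constructed(const std::vector<TypeXYZ>& block) {
    for (const TypeXYZ& e : block) {
        if (e.x != 1 || e.y != 2 || e.z != 3 || e.name != "Betty") return false;
    }
    return true;
}
```

Where the old code checked xblock[2].x == 0 after a memset(), the vector version gives xblock[2].x == 1 and xblock[2].name == "Betty" straight from the constructor, with nothing to free afterwards.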