Code for Speed

I'm not the fastest coder in the world, but the code I write tends to take performance into account where practical. This is not to say that all my code runs at lightning speed; I also place a high value on maintainability. Obscure performance tricks that don't benefit the application in a meaningful way often come at the expense of clarity, so I tend to avoid them. However, it's important to understand what is possible so that the trade-offs can be properly weighed and a balanced solution implemented.

There are many places to pick up speed in code. Some typical rules of thumb will be familiar to C and C++ coders:

pre-increment loop counters
avoid mathematical division operations
understand that string operations are often CPU intensive
keep function calls returning constant values out of loop conditions

There are many more of these types of rules. Much of this comes from an in-depth understanding of processors and instruction sets as well as a fair amount of experience writing assembly language code.
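
The last rule in the list above carries over to Ruby more or less unchanged. Here is a minimal sketch of it; the Limit class and the two sum methods are hypothetical names made up for illustration, and the call counter exists only to make the difference visible.

```ruby
# Hypothetical stand-in for any call whose result does not change while
# the loop runs. It counts how many times it has been invoked.
class Limit
  attr_reader :calls

  def initialize(value)
    @value = value
    @calls = 0
  end

  def fetch
    @calls += 1
    @value
  end
end

# The call sits in the loop condition, so it runs on every test.
def sum_slow(limit)
  total = 0
  i = 0
  while i < limit.fetch
    total += i
    i += 1
  end
  total
end

# The call is hoisted out of the loop and runs exactly once.
def sum_fast(limit)
  total = 0
  i = 0
  n = limit.fetch
  while i < n
    total += i
    i += 1
  end
  total
end

slow = Limit.new(1_000)
fast = Limit.new(1_000)
puts "slow: sum=#{sum_slow(slow)} calls=#{slow.calls}"
puts "fast: sum=#{sum_fast(fast)} calls=#{fast.calls}"
```

Both versions produce the same sum. Whether the saving matters depends entirely on how expensive the hoisted call is, which is exactly the kind of weighing described above.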

Another often-overlooked source of performance improvement is the central algorithm employed by the program. Consider whether you are using the quickest sort method for your particular needs. Do the SQL queries in your code result in multiple full-table scans that could be eliminated with a different table schema or a better index? Should you be using a memory pool of pre-allocated objects instead of constructing and destroying objects on the fly?

Here's one real-world example from my archives that is very instructive. One of my neighbors gave me a "Scramble Squares" puzzle to solve. I spent the next six hours sitting on my floor trying my best to arrange the 9 pieces correctly, but failed. I was finally persuaded to write a program for it. I was trying to improve my Ruby coding at the time, so this seemed like a good practice task.

My first draft employed a brute-force approach. The turn-of-the-century PC on which I initially developed it could crunch through about half a million potential solutions per minute. However, there are over 95 billion ways to put this puzzle together (9! orderings of the pieces times 4^9 combinations of rotations)! At that rate, it takes more than four months of CPU time to go through all of them, so it would be reasonable to expect a solution to take many weeks, if we're lucky.
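
For anyone who wants to check the arithmetic, here is a small sketch of where the 95 billion comes from. It assumes the brute-force search treats every ordering of the 9 pieces and every one of the 4 rotations of each piece as a distinct candidate. The method names and the per-minute rate passed in at the bottom are illustrative; the rate is only the rough figure quoted above.

```ruby
# Size of the brute-force search space for a 3x3 puzzle of square pieces:
# every ordering of the pieces, times every rotation of every piece.
PIECE_COUNT = 9
ROTATIONS_PER_PIECE = 4

def factorial(n)
  (1..n).reduce(1, :*)
end

def search_space(pieces = PIECE_COUNT, rotations = ROTATIONS_PER_PIECE)
  factorial(pieces) * rotations**pieces
end

# Days needed to test every arrangement at a given rate per minute.
def days_to_exhaust(total, per_minute)
  total.to_f / per_minute / 60 / 24
end

total = search_space
puts "arrangements: #{total}"
puts "days at 500,000 per minute: #{days_to_exhaust(total, 500_000).round(1)}"
```

The time estimate scales directly with the rate, so a machine that manages only half the throughput needs twice as long.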

Here's the output of the program running on my speedier 2012 machine. Each pair in the solution identifies a piece of the puzzle and its rotation factor; the run time is the elapsed clock time.

$ ssquares.rb brute_force
solution: [[1, 1], [6, 0], [4, 0], [7, 2], [2, 0], [8, 0], [5, 0], [0, 0], [3, 0]]
run time: 27055.505 seconds

Wow! That was fast! The effect of throwing hardware at the problem is quite remarkable in itself, as a solution was found in less than 8 hours!

So what happens if we change the algorithm? Clearly we should be able to tell that if two adjacent pieces of the puzzle aren't correct, then simply rotating and shuffling all the other pieces isn't going to reveal a valid solution either, so why bother taking the time to test them? We should be able to eliminate billions and billions of iterations by not checking each possible solution when we already know in advance that it will fail. Since a human can very likely solve the puzzle, and do it in less than a lifetime, we need to make the program approach the problem more like a human.
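
One common way to get this kind of early rejection is a backtracking search: place the pieces one at a time and give up on a partial layout the moment an edge fails to match. The sketch below is not the program discussed in this article. The edge encoding, the sample pieces, and all of the names in it are assumptions made up for illustration.

```ruby
# Each piece lists its four edges clockwise from the top:
# [top, right, bottom, left]. A positive number k stands for one half of
# picture k and -k for the other half, so two touching edges fit when
# they add up to zero. These nine pieces are made-up sample data, not
# the real puzzle.
PIECES = [
  [-1, 2, -4, 4], [-1, -1, 4, -2], [4, 1, 2, 3],
  [2, -3, 4, -1], [-3, 1, -3, 2], [3, 3, -2, 1],
  [3, -4, -2, -1], [1, -1, 3, -4], [-3, 4, -2, 1]
].freeze

SIDE = 3
TOP = 0
RIGHT = 1
BOTTOM = 2
LEFT = 3

# Edges of a piece after `turns` clockwise quarter turns.
def rotated(piece, turns)
  piece.rotate(-turns)
end

# Does `edges` fit at `pos` against the pieces already placed to its
# left and above? Positions are filled row by row.
def fits?(placed, pos, edges)
  if pos % SIDE > 0
    return false unless placed[pos - 1][RIGHT] + edges[LEFT] == 0
  end
  if pos >= SIDE
    return false unless placed[pos - SIDE][BOTTOM] + edges[TOP] == 0
  end
  true
end

# Depth-first search that abandons a partial layout as soon as one edge
# fails to match. Returns [[piece_index, turns], ...] or nil.
def solve(pieces, placed = [], chosen = [], stats = { tried: 0 })
  pos = chosen.size
  return chosen.dup if pos == pieces.size

  pieces.each_with_index do |piece, index|
    next if chosen.any? { |used_index, _turns| used_index == index }

    4.times do |turns|
      edges = rotated(piece, turns)
      stats[:tried] += 1
      next unless fits?(placed, pos, edges)

      placed.push(edges)
      chosen.push([index, turns])
      solution = solve(pieces, placed, chosen, stats)
      return solution if solution

      placed.pop
      chosen.pop
    end
  end
  nil
end

stats = { tried: 0 }
solution = solve(PIECES, [], [], stats)
puts "solution: #{solution.inspect}"
puts "placements tried: #{stats[:tried]}"
```

The counter of placements tried is there to compare against the size of the full search space. A mismatch in the first row prunes every layout that would have been built on top of it, which is where the savings come from.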

Changing from the brute-force approach to one that is more intelligent is a puzzle in itself. Of course, it's also possible to find methods for solving the puzzle online. I implemented one such method and called it the "human" algorithm. Here's the output from running that:

$ ssquares.rb human
solution: [[1, 1], [6, 0], [4, 0], [7, 2], [2, 0], [8, 0], [5, 0], [0, 0], [3, 0]]
run time: 0.033 seconds

Wow! That was fast! Indeed, given the same starting point, the modified Ruby program using the human algorithm on my 2012 PC finds the same solution in well under a tenth of a second. That is more than 800,000 times faster than the brute-force run on the same machine, a mind-boggling performance improvement based simply on the choice of algorithm!

Of course, the program would surely run even faster if rewritten in C, but is that necessary? Or worth the effort? Depending on the circumstances, solving the problem in 33 milliseconds may be good enough. Implementing this solution in well-tuned C might reduce the time, maybe even by a factor of 10, but even then we're only saving about 30 milliseconds. A good engineer will have to determine whether the savings are worth the effort required for the rewrite. In many cases, probably not.

And that is why good engineers are well worth their expense to the businesses that employ them.