Optimize loops
Loops make programs run slowly. A few unoptimal lines can make an app run at a snail's pace. This article presents performance tricks for squeezing the max speed out of your code. The focus is on processing data arrays in loops. We restructure loops, rebuild function calls, fine-tune conditionals, choose fast operators, pre-calculate values and access arrays the proper way.
This article features tips for the optimization of numeric computation, repetitive execution, recursion and arrays. If your program processes data arrays too slowly, these ideas can give your app a real performance boost. On the other hand, if your application is slow at processing strings, read Optimized string handling instead.
The optimization tips are generic and apply to many programming languages. The code examples are in Visual Basic.
Speed up the loops
Loops are often the performance bottlenecks of an application. The key to speed up the program is to make the loops run faster. Nested loops are often where the problems lie in.
Introduction
Consider an imaging application as an example. It stores pixel images in simple arrays such as Data(x,y) or data[x,y]. To read, process and save the images, it needs to process the arrays element by element, pixel by pixel. This is done in two nested loops: the y direction and the x direction. A typical piece of code looks like this:
For y = 0 to Height-1
' Iterate through y axis
For x = 0 to Width-1
' Iterate through x axis
process the pixel at (x,y)
...
Next x
Next y
The inner code runs through each pixel, a total of x*y times, to process the pixels one after the other. As it happens, the program runs at a snail's pace. What can we do about it?
Focus on the inner loop
With nested loops, put your efforts on the innermost loop. It's the one that executes the largest number of times. This is where you usually get the most out of your efforts. Each statement and expression really makes a difference as it executes a million times. Here you can make miracles by eliminating unnecessary statements, choosing the right operators and picking the fastest data types.
Move loop condition to end-of-loop
Most programming languages have at least two types of loops. One is the pre-condition loop where your loop condition is at the start of the loop. The other is the post-condition loop where the condition is at the end.
1. Pre-condition loop | 2. Post-condition loop |
---|---|
|
|
Loop executes zero or more times (n≥0) | Loop executes one or more times (n≥1) |
Condition evaluated n+1 times | Condition evaluated n times |
Note on loop #2: In some languages, such as Pascal, the post-condition loop must
be written with an "until" condition such as repeat..until condition=false
. As far as optimization is concerned, the idea is the same.
Here's the optimization trick: If the loop should execute at least once, use a post-condition loop. What's the deal? You save one evaluation! Assume the loop runs n times. Loop #1 evaluates its condition n+1 times, but loop #2 evaluates only n times.
When to use this trick? It makes sense if the loop is nested inside another loop and execution enters the loop a large number of times. It also makes sense if evaluating the loop condition takes a considerable amount of time.
Be sure not to overuse this technique. You can't use a post-condition loop if the loop should be skipped sometimes (executed zero times). Then you must use the pre-condition loop.
Move loop invariants out of the loop
A loop invariant is an expression whose value doesn't change during the loop. In the following example, y*Width is evaluated on every iteration of the inner loop, even though its value is always the same. Width doesn't change during the loop, nor does y change within the inner loop. y does change at each iteration of the outer loop, but not before.
For y = 0 to Height-1
For x = 0 to Width-1
' y*Width is invariant
i = y*Width + x
Process i
Next x
Next y
To make the loop faster, move the invariant out of the inner loop:
For y = 0 to Height-1
Temp = y*Width
For x = 0 to Width-1
i = Temp + x
Process i
Next x
Next y
Your compiler might optimize loop invariants for you. Check to see if it really does. Compile both examples and measure the performance. If the performance is the same, you're lucky since you don't need to optimize this manually.
Use pre-calculated values
This technique is a special case of loop invariants. Replace invariant expressions with pre-calculated values (constants). Consider this simple innocent-looking expression:
x = y And Not 192
What's wrong? You possibly evaluate Not 192
on every execution. You can pre-calculate the value as a constant like this:
Const z = Not 192
x = y And z
Again, your compiler might do this for you, so check it out.
Use lookup tables
If you don't have a single constant value, but a set of alternating values, you can't use the loop invariant techniques described above. You can, however, use lookup tables for the same effect. Consider this code:
x = 2 ^ (16 - (y And 15))
We assume y is anything from 0 up to the maximum integer. As you can
see, x is calculated as a of power 2, the exponent varying from 1 to 16.
(y And 15)
varies between 0..15, so the exponent is between 1..16. This
expression is relatively slow since it contains the slow exponential
operator ^
. It is a great case for a lookup table. You pre-calculate the
values and store them in an array. Reading from the array is faster than
re-evaluating the same 16 values over and over again.
Here's how you build the lookup table:
(y And 15)
varies from 0 to 15.
16 - (y And 15)
varies from 16 to 1.
- Thus, we need to calculate the powers of 2 from 216 to 21.
' Powers of 2 (from 2^16 to 2^1)
Dim Pow(0 To 15) As Long
For i = 0 to 15
Pow(i) = 2 ^ (16 - i)
Next
...
' Use the lookup table
x = Pow(y And 15)
' same as x = 2 ^ (16 - (y And 15))
If you know in advance the values that y can take, you can
expand the lookup table. Instead of (y And 15)
, calculate the
table for (y)
. In your application, y might take
values between 0..255 or 0..65535. In this case it could make sense to
pre- calculate a larger table, such as Pow(0 To 255)
or Pow(0 to 65535)
.
This way you would get rid of the And 15
part and save some calculation
in your loops.
Expand loops
Some loops run faster if you write them out. Consider this simple loop:
For y = 0 to 7
i(y) = z And 2^y
Next
You can expand this short loop as:
i(0) = z And 1
i(1) = z And 2
i(2) = z And 4
i(3) = z And 8
i(4) = z And 16
i(5) = z And 32
i(6) = z And 64
i(7) = z And 128
This is a speed vs. code size issue. It can make a big difference if the formula is slow. Rewriting it "open" (instead of re-evaluating the formula over and over) can make the code a lot faster. The expression above may be too simple to justify expansion. To make the point clear, consider this artificial example:
For y = 0 to 7
i(y) = j(0) And 2^y _
+ j(1) And 2^(y-1) _
+ j(2) And 2^(y-2) _
+ j(3) And 2^(y-3)
Next
Executing this line in VB6 will take a long time. Expanding it using
the above technique will boost the performance considerably. You expand
each of 2^y
, 2^(y-1)
, 2^(y-2)
and 2^(y-3)
. Since you know these values
in advance, you don't need to re-calculate them over and over.
Speed up operations on data
Eliminate unnecessary temporary variables
Programs often store intermediate results in a temporary variable before using it further. Here is a typical example:
For y = 0 to Height-1
For x = 0 to Width-1
temp = calculate(x, y)
Process temp
Next x
Next y
Many times you can make the program faster by eliminating the temporary variable. You save the effort of unnecessarily storing the value:
For y = 0 to Height-1
For x = 0 to Width-1
Process calculate(x, y)
Next x
Next y
The benefits of doing this vary by the language. Some compilers automatically optimize this for you, so removing temp manually makes no difference.
Another typical example is this:
For y = 0 to Height-1
For x = 0 to Width-1
result = calculate(x, y)
Next x
Next y
Here you don't actually use the value of result for anything. For all your purposes, result is effectively dead. It is only consuming system resources. Again, some compilers will eliminate such an unused variable on your behalf. For languages that don't have this feature, such as the good old Visual Basic, you can use a code analyzer (see Project Analyzer) to locate the dead variables for you.
Choose faster operators
Some mathematical operators take more to run than others. Testing the actual performance in your environment and with your data types is recommended. Here are some examples in VB6:
Operator | Slower | Faster | Comment |
---|---|---|---|
Exponential | x ^ 2 | x * x | The exponential operator is often slow. |
Modulo | x Mod 16 | x And 15 | You can rewrite x Mod y as x And (y-1) when x≥0 and y is a power of 2. What happens when x<0 is left as an exercise for the reader. |
Division | x / y | x \ y | Integer division can run faster than floating point division, depending on the arguments. |
Note: You may need to write x*x*x*x
as temp=x*x : result=temp*temp
. It depends on how well your compiler optimizes it out.
Choose faster data types
Picking the right data type can make a huge difference. One might assume, for example, that a smaller data type (such as Byte) is faster than a larger data type (such as 32-bit Long integer). Well, this isn't the case. In VB6, Long is the fastest integer data type. It's also the native data type of the processor. Even if the value fits in a byte, put it in a Long if maximum speed is what you are after. (This advice doesn't necessarily work for large arrays, where there is the tradeoff between memory and speed.)
If you can, use integer data instead of floating point. Unnecessary use of floating point values makes a big performance hit.
With data types, there is a classic tradeoff between memory usage and speed. The article Save memory discusses variable data types and their memory consumption.
Optimize the control flow
Control flow optimization is an important part of giving your programs a performance boost. Sometimes you need to twist your code so that it's harder to read but runs faster.
If..Else..Then
The order of the If..Then..Else
branches is not
insignificant. The usual way to code is to make the code look natural to
humans. This isn't always the fastest way. Did you know the
Then
block runs faster than the Else
block (in
VB6)? It makes a difference if the other block executes more often than
the other. Consider this code:
If condition Then
rare_block
Else
usual_block
End If
Let's assume you have noticed that the program usually jumps to the usual_block. The rare_block executes only a few times. Unfortunately, jumping down to the usual_block takes too much time. Rewrite this as:
If Not condition Then
usual_block
Else
rare_block
End If
Now the usual_block is on top and you just saved someone from getting fed up with your slow program.
The same idea works also for Select Case
. Put the most
frequently used Case
block at the top. Again, this rule works so
for VB6. Check it out with your language before optimizing.
Short-circuit conditionals
Short-circuiting speeds up complex conditional statements. You can use it for conditionals with AND/OR operators.
In languages where operators AND/OR are not short-circuited, executing an expression such as "x And y" and "x Or y" evaluates both x and y before deciding what the outcome is. If the operators are short-circuited, y is not necessarily evaluated at all. The following table describes this in detail.
Op. | Not short-circuited | Short-circuited | Flow chart |
---|---|---|---|
x And y |
|
|
|
x Or y |
|
|
For optimization purposes the benefit of short-circuiting is that you sometimes get away without evaluating the latter operand (y). If y happens to be a time-intensive operation, such as a function call, your program executes faster.
The potential pitfall with short-circuiting is that y is not always evaluated (function not called). This could introduce a bug that is hard to notice.
Languages differ in whether their operators are short-circuited or
not. In Visual Basic 6.0 there is no short-circuiting. In C, on the
other hand, the operators && and || are always short-circuited.
In VB.NET, you have a choice. And
is not short-circuited,
AndAlso
is. Or
is not short-circuited, OrElse
is.
In languages that don't have short-circuited operators, you can simulate them with the following constructs:
Not short-circuited | Short-circuited | |
---|---|---|
AND |
|
|
OR |
|
|
Inline function calls
Repeated function calls add overhead. Each time you call a function, you spend a little time for things like passing parameters and saving the function return value.
Inlining means you copy & paste the called function into the caller. [Well, a simple copy & paste is not always enough, you need to take the parameters and local variables into account, too.] It works best if the inlined function is small and not very complex. If the function to inline is a complex one, the amount of rework (refactoring) may be considerable.
Inlining is easier with languages that support macros or special inlined functions. Macros and inlined functions are automatically inlined for you at compile-time, so you don't need to do it manually.
Remove recursion
Recursion is a nice technique where a function calls itself repeatedly until the work is done. It's often the most elegant way to code some algorithm.
Unfortunately, recursion also adds the overhead of a function call. If the recursion tree gets deep, you spend the time with the calls instead of the real computation.
You remove recursion by replacing it with a loop. Sometimes it is enough to use GoTo to jump to the start of the function. Sometimes you need to do more, like arrange parameter values and local variables.
Visual Basic 6.0 optimization tricks
The following lists a number of VB6 specific optimization tricks. Other than VB6 coders can skip them.
VB6 file access tricks
There is no need to read/write files byte by byte. You can read/write large blocks at a time.
Read byte array from file
' Allocate space for count bytes
ReDim array(1 to count) As Byte
' Read count bytes from file
Get #filenum,, array()
Read entire file to byte array
' Allocate space for entire file
ReDim array(1 to LOF(filenum)) As Byte
' Read entire file
Get #filenum, 1, array()
Save byte array to file
Put #filenum,, array()
VB6 CopyMemory
Copying large data blocks in VB6 is slow. You can use a Windows system call to do that. CopyMemory copies a given number of bytes from one memory location to the other.
Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" _
(Destination As Any, Source As Any, ByVal Length As Long)
CopyMemory is a bit tricky to get right the first time. Here's how you use the function.
CopyMemory Destination, Source, numbytes
- Destination must refer to the first byte of the destination data. In an array, it is the first array element to copy to. Often it is Destination(0), the first array element.
- Source must refer to the first byte of the source data. You use it the same way as Destination.
- numbytes is the number of bytes to copy. Make sure it is the exact number of bytes, not the number of array elements, and not 1+number of bytes or something like that. If you copy too many bytes, your program will crash.
CopyMemory: Copy an array
' Source data is numbytes in size
ReDim Source(0 to numbytes-1)
' Allocate memory for destination array
ReDim Destination(0 to numbytes-1)
CopyMemory Destination(0), Source(0), numbytes
It is important that there are at least numbytes bytes of data in both Destination() and Source(). Otherwise your program will crash. The equivalent VB code for array copy is:
Destination() = Source()
This the safer way to copy an array, but it's also the slower way.
CopyMemory: Assign a byte array to UDT
You can copy a byte array over a user-defined type (UDT) variable. This way you can access the various fields of the UDT directly without having to interpret the data from the raw bytes. Here is how you do it:
' Target variable
Dim UDT as MyUDT
' Source data in byte array, size LenB(UDT) bytes
ReDim Source(0 to LenB(UDT)-1)
' Copy source to target
CopyMemory UDT, Source(0), LenB(UDT)
In plain English, you copy exactly the UDT size bytes from the byte
array over the UDT. The call LenB(UDT)
returns the number of bytes
required. The data will automatically arrange itself into the various
fields of the UDT. — Note how you need to call LenB instead of
Len. Keep in mind that Len ≤ LenB. UDTs can contain extra alignment
bytes or Unicode Strings, which occupy 2 bytes per character. Len
doesn't count the extra bytes. The effect is that you get an incomplete
copy with the end bytes not being copied at all.
You can also do it vice versa. Once you have populated the UDT fields, copy it to a byte array.
CopyMemory and type conversions
You can do many other unorthodox type conversions with CopyMemory. You can, for example, copy a 2-dimensional array over a 1-dimensional array. This way you can convert a 2D (x,y) image into a simple list of bytes for further processing.
VB6 ReDim Preserve: Use sparingly
When you want to reallocate more (or less) space for an array, you
use ReDim Preserve
. Unfortunately, this is a slow command.
The problem is worse if you call ReDim Preserve
repeatedly during
processing.
You are better off allocating enough space first. When the processing
is done, deallocate the extra by a single ReDim Preserve
.
Read more: Allocate arrays wisely
VB6 With..End With block
You may have heard that the With..End With
block saves typing and also time. Did you know it's a slow statement? You don't want to use it just to access a single field. Consider the With block below:
With array(x)
If .a = 0 Then
.a = 1
.b = 2 ...
If the condition .a=0
is rarely met, the With
statement
may take a considerable amount of extra time. It may run faster rewritten
as:
If array(x).a = 0 Then
With array(x)
.a = 1
.b = 2 ...
VB6 Pass initial values as parameters
You can save time by passing initial variable values as parameters. Consider the following example procedure, which uses a local variable x with an initial value of 2:
Sub Test()
Dim x As Long
x = 2
...
End Sub
This doesn't look too bad, really. But wait... you can make it faster! You can pass the initial value as a parameter:
Sub Test(ByVal x as Long)
...
End Sub
' Caller:
Test 2
The last line calls Test giving x=2
as the initial value, the way we wanted. As strange as it may look, this actually runs faster. (By the way, make it ByVal, not ByRef, unless you're passing a String.)
The trick works because VB6 doesn't have a variable initializer clause. Integer variables are initialized to zero. When you want a non-zero initial value, you need to execute an assignment statement (x=2
). When you pass it as a parameter, there is no assignment statement – and your program runs faster!
This is nowhere near a good programming practice. After all, callers must pass 2 as the parameter, otherwise the program will fail. On the other hand, end-users will surely value faster execution over nice coding. Just document it well.
VB6 Use zero initial values
As an alternative to the above trick, you can pick local variables so that their initial values are zero. Instead of a 1-based array index use a 0-based index, for example.
VB6 variables and arrays
Static variables are soooo static
Avoid the use of Static
local variables. They are far
slower than regular Dim
variables. Avoid the use of the
Static
keyword altogether. Use module-level variables as a
replacement.
Module-level or local variables?
Oddly enough, the speed difference between accessing a module-level variable (class variable) vs. a local variable is not constant. Sometimes it's faster to read/write a local, sometimes a module-level variable. Experiment.
You can replace a module-level variable with a local copy. Use a local temporary variable and write the changes back at the end of the procedure. If this makes sense, that is.
Dynamic array is faster than fixed-size array
Dynamic arrays are faster to access than static arrays. Consider the following alternatives:
Static array | Dynamic array |
---|---|
Fixed size array. | Array size can change during execution. |
Dim a(10000) As Byte |
Dim a() As Byte |
It is faster to access the array you allocated with
Dim+ReDim
. Note that you do need the Dim
before ReDim
!
ReDim
alone is not enough. Use Dim
to define the data type and ReDim
to
define the size. ReDim
cannot change the data type of an array, so there
is no need to put the data type in the ReDim
statement. Putting it there
will do no harm, though.
(End of Visual Basic 6.0 specific tricks)
General considerations
Don't process the data unless you have to. This sounds obvious, but sometimes you see code that runs in vain. Consider a function that arranges an array of bytes. If the bytes are already in the correct arrangement, there is no reason to rearrange. An example is a sort routine that sorts data that was already sorted, or an imaging application that optimizes the image palette even if it was optimal already.
Process as little data as you can. Consider an imaging application that compresses image bytes. The fewer bytes there are to compress, the faster it runs. To keep the amount of data down, use the lowest color depth (bits per pixel) possible. The image can be represented with fewer bytes and the compression runs faster.
Pre-process the data. If your code spends its time doing lookups on data, store the data so that the lookups are faster. As an example, you can sort the data and do the lookups in the sorted data, which is generally much faster than searching in unsorted data.
Use available system functions. They often run faster than what you can possibly code. Learn the various functions of your programming language, framework and the operating system. As an example, searching for a character in a string in VB6 is faster done with the built-in InStr function than reading through the string character by character.
Optimization requires documentation
Code optimization can make the program look like spaghetti. The more
you optimize, the harder the code gets to read. When you replace
recursion by a GoTo
, inline function calls, rearrange
If..Then..Else
blocks, replace clear formulas with
mysterious optimized versions, the code gets more and more complex for
the reader. This makes it a pain to maintain.
Therefore, remember to document your work. Even if the original code was clear enough to go without comments, the optimized version probably needs them badly. Comment out the old code and write an explanation what was changed and why. If the optimized code is too hard to understand, the original lines can make it clear how it used to work. Explain why you chose And instead of Mod and what the reason was.
Keep in mind that optimizing a piece of code could change the
requirements of calling it. As an example, replacing x Mod y
with x And
(y-1)
requires that x ≥ 0 and y is a power of 2. Similarly, replacing floating
point division with integer division could make a difference with some
special input values (due to rounding). Moving the loop condition to
end-of-loop absolutely requires the loop must execute at least once.
Document the changed requirements as you optimize. It will make life
much easier later on.
Finding the bottlenecks
Most of the performance gains are often due to a limited number of changes in certain key locations in your code. What are these locations and how can you find them?
A critical location is executed thousands or hundreds of thousands of times. It may be inside a loop or a recursive algorithm. The location may consist of a handful of procedures or certain lines of code inside them.
It's not always possible to tell a bottleneck by just looking at the code searching for loops or recursion. A code profiler, such as VB Watch, is useful for finding the critical locations. It logs the execution times as you run your program. When ready, it tells you which procedures or lines were executed the highest number of times, and which ones took the longest time to execute. These places are the best candidates for manual optimization work.
Optimize loops
URN:NBN:fi-fe20061459