When people develop and maintain software, they usually try to make sure that their code not only works well, but also that it runs fast. There are many different ways to make code perform better. It can be as simple as choosing reasonable compiler options, or as hard as rewriting some code in assembly.

If you see some code that takes long to run, you should not start optimizing by looking into the little details though. The first questions that you should ask is: are we actually using the right algorithm for the job? Are there any design mistakes that make things much worse than they need to be? This is especially important if you don’t know in advance how much input the code will have to handle. It should be obvious that it makes a difference if a list that needs to be sorted contains 1000 or 1 million items. However, how big that difference actually is depends on the choice of the algorithm and its correct implementation.

If you are not interested in mathematics or theoretical computer science, feel free to skip the next sections. Maybe the example programs at the end of the post will be interesting for you nonetheless, at least if you are fairly familiar with programming.

**Big O notation**

The exact time f(N), measured in milliseconds, CPU cycles, or some other suitable unit, that an algorithm requires if it is given an input of size N (N could be the length of a list or the length of a string, for example) is often less interesting than the growth rate, i.e., how fast that time grows if N becomes larger and larger. Is it constant, or does it increase logarithmically, linearly, or even worse?

To classify the so-called run-time complexity of an algorithm, one often uses the big-O notation. It can be used to provide an upper limit for f(N) in a rather simple way, even if writing down f(N) exactly would yield a very complex expression. The idea is to use a simple function g(N), and say that

if there is a constant c such that

for any N, i.e., if this constant times g(N) is an upper bound for f(N). Simple example: if

we could write f(N)=O(N) because the linear term 4 N is the one which grows fastest, such that the total value f(N) is always smaller than some constant times N.

Typical orders (of growth) that often occur in algorithms in the software that we use include O(1), O(log N), O(N), and O(N log N).

I’ll give some examples for situations where these occur:

constant: a simple example in the Dolphin context would be showing the files and folders on the screen. Even if you have 100,000 items in a directory, only a small, constant number of them can be shown at the same time, so the effort required to draw them should not depend on N at all. | |

logarithmic: the classical example is binary search. If you are given a sorted list of N items, and you have to find a particular item, you might start by looking at the element in the middle. This tells you if you have to look at the first or in the second half of the list. It is easy to show that applying this procedure repeatedly will yield the right item in steps. | |

linear: a Dolphin example would be loading the data for all files in a directory (like the name, size, and other things that we might want to show in the view). | |

If a good sorting algorithm is used to sort a list of N items, this is the typical number of comparisons of two items that are required, and hence, also a measure of the time that will be needed to sort the list. | |

quadratic: this is something that you really want to avoid, unless you can guarantee that you will never have to handle really many items, or there really is no better solution for your problem. If you are not sure if a better solution exists, you should look for one. Really. |

**Best case, average case, and worst case**

Things are often a bit more complicated: the run-time complexity of an algorithm can depend on the actual input data. For example, certain sorting algorithms might work better if the input data are (almost) sorted, and worse for certain other kinds of data.

Therefore, one often states the best case, average case, and worst case run-time complexity of an algorithm. Which of these is most important depends strongly on what you know in advance about the input data. If they are always in a certain form, an algorithm that has a good best-case performance just for that kind of data might be preferable. On the other hand, if the input data might come from an untrusted source, ensuring a reasonable worst-case performance should have the highest priority if you want to prevent that you and your users will be the target of algorithmic complexity attacks.

**Why am I telling you all this?**

This is an obvious question that you might ask now. Everyone who develops software should be familiar with those things already, right? Well, it turns out that a lot of real-world code does have a worst-case (or, sometimes, even an average) run-time complexity that is worse than it needs to be, either because the wrong algorithm has been chosen, or because a tiny mistake in the implementation makes things go terribly wrong if a certain kind of input is fed into the algorithm. This is why I thought it might be a good idea to write a series of blog posts to raise people’s awareness of these issues.

**Example**

Let’s start with a simple example, and assume that we have to write a function that replaces all occurrences of a certain character in a plain C string by another character. E.g., replace ‘u’ by ‘o’ in the string ‘algurithm’ to obtain the right spelling ‘algorithm’. Let us also ignore that we don’t actually use plain null-terminated C strings much in KDE/Qt code, and that solutions for this problem exist already.

Any algorithm to solve this problem will have to go through all N characters of the string once, and look at and possibly replace each character. So we would expect that the run-time complexity is O(N).

This is a possible solution that we could come up with:

void replace1(char* text, char oldChar, char newChar) { for (unsigned int i = 0; i < strlen(text); ++i) { if (text[i] == oldChar) { text[i] = newChar; } } }

Some will already see the flaw in this implementation, but I assume that many would just verify that this function works nicely for a few examples and then happily use it. It might even work OK in real world applications for some time, that is, until someone uses this function to handle a very long string.

A small benchmark that tests this function for strings with lengths between 100,000 and 800,000 characters can be found in my personal KDE scratch repository. The following plot shows the time in milliseconds as function of the input size (plots were generated with qtestlib-tools).

When you see this plot, you immediately realize that the time depends quadratically on the input size. So our algorithm has a run-time complexity of O(N^2), rather than the expected O(N). This results in an unacceptable run time of 8 seconds just for going through 800,000 characters and replacing some of them! If you are doing something like this in the main thread of an application, the GUI will be completely frozen during that time.

**What went wrong?**

The problem is the condition ‘i < strlen(text)’ in the for loop. This condition is evaluated after each execution of the loop, i.e., N times. Moreover, strlen(text) must go through the entire string, because it looks for the terminating null character, which defines the end of the string. This means that the O(N) function strlen(const char*) is executed N times. Now it is obvious why our function is O(N^2).

**Correcting the mistake**

Here is a better version of the function, and benchmark results that show that it needs only linear time. Note that the time to handle 800,000 characters has been reduced from 8 seconds to 1.5 milliseconds:

void replace2(char* text, char oldChar, char newChar) { const unsigned int length = strlen(text); for (unsigned int i = 0; i < length; ++i) { if (text[i] == oldChar) { text[i] = newChar; } } }

**What about the compiler?**

Maybe some readers will now ask if an optimizing compiler shouldn’t be able to see that calculating the length of the string repeatedly is unnecessary, and thus optimize the first version of our code such that it becomes O(N). It turns out that such an optimization is impossible because using our function to replace any char by 0 will actually change the length of the string inside the loop.

**Do professional developers really write code which has such problems?**

Yes, they do. According to an article that I once read on the web (I cannot find it at the moment, unfortunately – if you do, please add a comment!), the “strlen in for loop condition” issue once caused huge problems in the computer system of a bank. Moreover, there is still a lot of code out there which has similar problems, even if they are sometimes less obvious. In the next posts in this series, I’ll tell you about some further examples, which will be fixed in KDE 4.11.

Daniel StöckelHi, a very good read! I agree that many developers are not aware of issues like this. I see equivalents of the strlen code almost every day in student code. Part of the problem here is also missing runtime specifications in basically every API documentation. I shudder when I think about how much computing time and energy has been wasted because of a missing: “This function is expensive: do not call in tight loops”.

But I also have some nitpicking to do:

– Actually O(g) defines a set of functions. So actually writing f = O(g) is at the very best a (common) abuse of notation (which is unfortunately used and explained very late in the Wikipedia article). In my opinion using proper set notation is clearer and has the same complexity as typing ‘=’ 😉

– The f(N) = than some (finite) N_0, meaning, that f can do some pretty crazy stuff for (not so) small N.

– Not really a nitpick, but something to take into consideration: O(g(n)) is an upper bound. Thus

for a linear function f you can also write f \in O(n^2). If one really wants to make clear that the bound is tight (both from below and from above) one should use the the theta notation. Big-O is usually sufficient though.

Daniel StöckelHmm, writing less than and greater then in the same line is getting stripped. Point 2 was meant to read:

– The f(N) less or equal than c g(N) only needs to hold for all N greater than some (finite) N_0 …

freininghausPost authorThanks for your comments, which are greatly appreciated!

“Actually O(g) defines a set of functions”:

You are right, the set notation is certainly more precise. Maybe I should change it.

“The f(N) less or equal than c g(N) only needs to hold for all N greater than some (finite) N_0”:

You are right, but since we are dealing with positive integers N and positive functions f and g here, doesn’t this actually imply that the relation holds for all N?

The maximum f_max of f(1), f(2), … f(N_0) is a well-defined positive number, and so is g_min, the minimum of g(1), g(2), … g(N_0). If I define

c’=max(c, f_max/g_min)

then I think that we can show that f(N) is less or equal than c’ g(N) for all N.

“Not really a nitpick, but something to take into consideration: O(g(n)) is an upper bound.”:

Yes, I am aware of that, but I didn’t want to make the theory part of this post too complicated for people who never heard about these things 🙂

Daniel StöckelYes, N_0 condition is really seldomly needed when dealing with algorithmic complexity. You are also right: if f and g (especially g) are positive it is not necessary. g being zero at some point would be a massive show stopper though.

“Yes, I am aware of that, but I didn’t want to make the theory part of this post too complicated for people who never heard about these things”

Agreed, it does not really help to bring the point across 😀

PeteExcellent article! I’m a programming beginner myself and your article really helps to avoid such mistakes in the future. I’m already looking forward to your next real world examples 🙂

Greetings

ProgrammerYour improvement putting strlen out of the for loop is actually a good idea. But you could still gain a factor of 2 by not calculating strlen at all. You just stop your loop when text[i] is the null character.

When using string classes, usually size is O(1)… 🙂

freininghausPost authorGood observation, the call to strlen is indeed unnecessary here 🙂

Note that I do not suggest to use the functions here in applications. I was just trying to find a minimal example where a small change changes O(N^2) behavior to O(N).

InkaneJust as a note: Just because an algorithm performs better in big O notation doesn’t necessarily mean that it’s implementation will perform better in your application, as the constant c can get so big that for any expected workload it will dominate the runtime (e.g. for selecting, normally a partition-based general selection algorithm is used instead of median of medians, even though the later has a better worst case performance).

freininghausPost authorYou’re certainly right! The point I’m trying to make here is just that you should never put up with an algorithm that has a bad run-time complexity if a better option (with a small constant c) is available.

aliancemdI am student. And I always see this kind of problems in my colleagues code, I tell them but their reaction is “it doesn’t matter, it doesn’t change that much” or “no it doesn’t”.

I remember reading an article, I think in MIT OpenCourseWare with this kind of optimizations, can’t find it right now(I’ll search more and if I find it i’ll post the link)…

There also were things like:

for (…) {

int something = pow(constValue1, 2) + cos(constValue2) * pow(constValue3, 2);

exampleGlobalVariable[i] = something * something;

}

into ->

const int something = pow(constValue1, 2) + cos(constValue2) * pow(constValue3, 2);

for (…) {

exampleGlobalVariable[i] = something * something;

}

and then for better optimisation something like ->

const int something = pow(constValue1, 2) + cos(constValue2) * pow(constValue3, 2);

const int somethingPow2 = something * something;

for (…) {

exampleGlobalVariable[i] = somethingPow2;

}

notzedI know it’s only a toy example, but strlen returns a size_t type (at least, in C).

I’m sure this kind of small problem crops up everywhere (i don’t mean the type, your example). I think most programmers are glad they have something that works even if they don’t particularly understand why. There is quite a lot that needs to be understood though – and for example the man page for strlen says nothing about big O performance, you’re just expected to know it for C strings.

I also think few programmers even have a good idea of how ‘fast’ a computer should even be – I got into a big argument a decade ago because someone wrote their own sort to sort items in an interactive list. It took something like 30 seconds to sort 1000 items and I was dumbfounded that they thought something a couple of orders of magnitude too slow was possibly acceptable (calling qsort was a few ms, from memory). The excuse was that qsort() wasn’t guaranteed to be stable, needless of the fact we didn’t even need a stable sort. If you don’t even know you have problems, you wont even start to look.

However there certainly seems to have been a push to ‘dumb down’ programming over the last 30 odd years; pushing decisions to the compiler or taking them away through high level languages. Whilst it’s great in a RAD environment not to care about such details, it doesn’t always result in scalable efficient ‘production quality’ code. A good example might be container libraries – some hide internal details completely which is fine until you need to know the behavioural characteristics. Others expose implementation details, which then forces the programmer to make a choice, and it is a trade-skill to make an appropriate one for a given set of trade-offs (i.e. why you’re being paid).

However, I would hope that big-O would still be in the first 1-2 lectures of any data structures & algorithms course as it’s pretty fundamental.

BTW I think your post should be titled “algorithmic efficiency”; complexity is a separate issue. Generally one wants to keep things as simple as possible, but not so simple that efficiency is affected. For many cases they both coincide – fortunately.

freininghausPost authorThanks for your comments!

BTW, I think that requiring a stable sorting is by no means an excuse for using an inefficient custum sort function (if sorting 1000 items takes 30 seconds, it was probably bubble sort-like?). There are good stable sort algorithms with O(N*log N) behavior.

But I guess that these things happen because people just don’t care. I remember that I once saw some C++ code in a free software project where the author used a self-made bubble sort function, even though using std::sort or std::stable_sort would have been much simpler.