Monthly Archives: August 2013

Some thoughts on algorithmic complexity, part 3: How to randomize an array

This is the third part of a series of blog posts. You can find the previous parts here:

Introduction

A task that many applications have to perform at some point is to randomize an array. Simple example: a card game like KPat should make sure that the order of the cards is random before the game starts. In this post, I will tell you how this can be done correctly and efficiently.

The wrong way

According to an interesting blog post that I stumbled across some time ago on Planet GNOME and another article that is linked from there, this is sometimes implemented like this (I’ve tried to translate the JavaScript version to C++):

#include <algorithm>
#include <ctime>
#include <cstdlib>
#include <vector>

bool randomLessThan(int a, int b)
{
    return (rand() % 2) == 1;
}

void randomize(std::vector<int>& v)
{
    std::srand(std::time(0));
    std::sort(v.begin(), v.end(), randomLessThan);
}

Anyone who knows a bit about how sorting algorithms work might expect that this will go terribly wrong. If you don’t, follow the links above and keep in mind that most sorting algorithms expect that the “lessThan” function or the operator “<” that is used for comparing items yields self-consistent results, i.e., that a<b and b<c implies a<c, for example.

Note that the C++ code above may cause even worse trouble than the JavaScript version. I found out about that only when this post was mostly ready to be published, and I thought that testing the code to find out if I made any stupid mistakes might be a good idea. The results were quite surprising. I started to write a new section that explains what happens and why, but then this post got a bit too long for my taste, and this is about a different topic anyway (“undefined behavior” is the keyword). So I copied that part to a new post which I will publish soon.

The slow way

I became curious and investigated how KDE applications randomize arrays. We have a class KRandomSequence with a member function that looked like this:

template<typename T> void randomize(QList<T>& list) {
    if (!list.isEmpty()) {
        QList<T> l;
        l.append(list.takeFirst());
        while (list.count())
            l.insert(int(getLong(l.count()+1)), list.takeFirst());
        list = l;
    }
}

Note that the function

unsigned long KRandomSequence::getLong(unsigned long max)

returns a random number x in the range 0 <= x < max.

Fortunately, this algorithm does not suffer from the problems of the sorting-based implementation, such that the result is a truly randomized QList. But if you read my last post in this series, you might have noticed the problem already. Inside the loop, the function inserts an item into a QList at a random position. Therefore, the function has O(N^2) complexity. My investigations with a simple benchmark showed that a QList with 100,000 ints is enough to keep my CPU busy for almost 4 seconds.

KRandomSequence:randomize(QList<T>&) is used by various games. Maybe the arrays that they randomize are never so big that the delay becomes noticeable. However, also the Plasma picture frame uses that function, and considering that I sometimes see Dolphin bug reports and forum posts from users who have tens or even hundreds of thousands of pictures in one directory, I would be surprised if nobody ever tried to set up a random slide show with 100,000 or more images. Probably the poor users thought that the computer was doing something useful, like examining the image files on disk, while they were waiting.

The right way

The Fisher-Yates shuffle algorithm, invented in 1938, randomizes an array in O(N) steps. For kdelibs 4.11, I have replaced the previous version of KRandomSequence::randomize() with an implementation based on this algorithm. It looks like this:

template<typename T> void randomize(QList<T>& list) {
    // Fisher-Yates algorithm
    for (int index = list.count() - 1; index > 0; --index) {
        const int swapIndex = getLong(index + 1);
        qSwap(list[index], list[swapIndex]);
    }
}

Randomizing a QList with 100,000 items only takes 10 milliseconds on my machine now 🙂

Conclusion

If you are trying to implement a solution for a non-trivial problem, always do some research and look for existing solutions! If you don’t, you might end up reinventing the wheel and making mistakes that cause a lot of trouble. Also consider checking if a library like the C++ STL already has an implementation that you can use.