Some thoughts on algorithmic complexity, part 3: How to randomize an array

This is the third part of a series of blog posts. You can find the previous parts here:

Introduction

A task that many applications have to perform at some point is to randomize an array. Simple example: a card game like KPat should make sure that the order of the cards is random before the game starts. In this post, I will tell you how this can be done correctly and efficiently.

The wrong way

According to an interesting blog post that I stumbled across some time ago on Planet GNOME and another article that is linked from there, this is sometimes implemented like this (I’ve tried to translate the JavaScript version to C++):

#include <algorithm>
#include <ctime>
#include <cstdlib>
#include <vector>

bool randomLessThan(int a, int b)
{
    return (rand() % 2) == 1;
}

void randomize(std::vector<int>& v)
{
    std::srand(std::time(0));
    std::sort(v.begin(), v.end(), randomLessThan);
}

Anyone who knows a bit about how sorting algorithms work might expect that this will go terribly wrong. If you don’t, follow the links above and keep in mind that most sorting algorithms expect that the “lessThan” function or the operator “<” that is used for comparing items yields self-consistent results, i.e., that a<b and b<c implies a<c, for example.

Note that the C++ code above may cause even worse trouble than the JavaScript version. I found out about that only when this post was mostly ready to be published, and I thought that testing the code to find out if I made any stupid mistakes might be a good idea. The results were quite surprising. I started to write a new section that explains what happens and why, but then this post got a bit too long for my taste, and this is about a different topic anyway (“undefined behavior” is the keyword). So I copied that part to a new post which I will publish soon.

The slow way

I became curious and investigated how KDE applications randomize arrays. We have a class KRandomSequence with a member function that looked like this:

template<typename T> void randomize(QList<T>& list) {
    if (!list.isEmpty()) {
        QList<T> l;
        l.append(list.takeFirst());
        while (list.count())
            l.insert(int(getLong(l.count()+1)), list.takeFirst());
        list = l;
    }
}

Note that the function

unsigned long KRandomSequence::getLong(unsigned long max)

returns a random number x in the range 0 <= x < max.

Fortunately, this algorithm does not suffer from the problems of the sorting-based implementation, such that the result is a truly randomized QList. But if you read my last post in this series, you might have noticed the problem already. Inside the loop, the function inserts an item into a QList at a random position. Therefore, the function has O(N^2) complexity. My investigations with a simple benchmark showed that a QList with 100,000 ints is enough to keep my CPU busy for almost 4 seconds.

KRandomSequence:randomize(QList<T>&) is used by various games. Maybe the arrays that they randomize are never so big that the delay becomes noticeable. However, also the Plasma picture frame uses that function, and considering that I sometimes see Dolphin bug reports and forum posts from users who have tens or even hundreds of thousands of pictures in one directory, I would be surprised if nobody ever tried to set up a random slide show with 100,000 or more images. Probably the poor users thought that the computer was doing something useful, like examining the image files on disk, while they were waiting.

The right way

The Fisher-Yates shuffle algorithm, invented in 1938, randomizes an array in O(N) steps. For kdelibs 4.11, I have replaced the previous version of KRandomSequence::randomize() with an implementation based on this algorithm. It looks like this:

template<typename T> void randomize(QList<T>& list) {
    // Fisher-Yates algorithm
    for (int index = list.count() - 1; index > 0; --index) {
        const int swapIndex = getLong(index + 1);
        qSwap(list[index], list[swapIndex]);
    }
}

Randomizing a QList with 100,000 items only takes 10 milliseconds on my machine now :-)

Conclusion

If you are trying to implement a solution for a non-trivial problem, always do some research and look for existing solutions! If you don’t, you might end up reinventing the wheel and making mistakes that cause a lot of trouble. Also consider checking if a library like the C++ STL already has an implementation that you can use.

About these ads

8 thoughts on “Some thoughts on algorithmic complexity, part 3: How to randomize an array

  1. Cool stuff. Maybe you could make a patch that implements a qRandomShuffle for any Qt container? ;)

    • As I wrote in the conclusion, the STL already has such a function which works for any Qt or non-Qt container (I only found out about this after my kdelibs patch though – I had thought that C++11 is required for this, but it seems that this is not the case):

      std::random_shuffle(container.begin(), container.end());

      Traditionally, Qt has reimplemented many of the STL algorihms with a Qt-ish API, but I think that these days are over, see, e.g., http://www.kdab.com/kdab-contributions-to-qt-5-0-part-4/, especially the comments section.

  2. Nice post and inspiring indeed. There are times where in shuffle can take hell lot of time for long list. Now this will make me look into PMC(Plasma-mediacenter) where I had used a shuffling mechanism for playlist.

    • Thanks for your comment! I just had a quick look at

      plasma-mediacenter/libs/mediacenter/playlistmodel.cpp,

      and it looks indeed like

      void PlaylistModel::shuffle()

      has the very same problem (note that QList::takeAt(int) is O(N), and thus your complete function O(N^2)). Replace it with std::random_shuffle or KRandomSequence::randomize (provided you use kdelibs >= 4.11), and I bet that shuffling even a long playlist will be done in no time at all :-)

  3. If getLong returns a number x in the range 0 <= x < max, than by requesting getLong(index + 1) isn't there a chance that you will actually swap the same element? x can be < index + 1, which means it could be x = index.

    • Yes, there is a chance that it will swap an element with itself. If that wasn’t the case, we would not get all permutations of the original sequence with equal probability.

      Simple example:

      Consider an array with two elements: [1, 2]. If you look at the code, you’ll see that the code inside the loop is executed only once in that case. Now we either swap 2 with 1 or with itself -> we’ll get both [2, 1] and [1, 2] with a probability of 50% each.

      If we never swapped an element with itself, we would always have to swap 2 with 1 -> the “shuffled” array would always be [2, 1]. I hope you agree that this is not really “random” ;-)

  4. Thank you for this series of blog posts. I really enjoyed reading them, and I’ll definitely pay better attention to stuff like that.

Comments are closed.