Some thoughts on algorithmic complexity, part 2: Be careful if you insert into or remove from a std::vector, QVector, or QList in a loop

This is the second part of a series of blog posts. Like in part 1, most or all of what I write here will be obvious to many readers. However, I have seen problems which are equivalent to those that I show here more than once in widely-used code, so I think it makes sense to make people aware of these issues and possible solutions.

If you are familiar with containers like std::vector already, feel free to skip the introduction.

Introduction

Different kinds of data structures, also called containers, are a core part of most applications. Depending on the programming language, various containers are provided by the language itself, by libraries which are shipped with the compiler (like the STL in the case of C++), or by libraries like Qt. Containers like vectors, linked lists, sets, hashes, etc., have quite different properties. Each container has advantages in some situations and disadvantages in others. If you want to know more about these, you might want to have a look at the relevant page in the Qt docs or the “Containers” page at cppreference.com for a general overview, or a Qt Quarterly article and an excellent article by Marc Mutz for some very interesting details.

QVector, QList, and std::vector

In this post, I will focus on containers which store their elements in one contiguous block of memory. These have the following useful properties:

  • The items have a well-defined order (this is not the case, e.g., in a QSet).
  • Each item has an integer index: the item stored at the beginning of the block has the index 0, the next one has the index 1, and so on. Moreover, accessing an item with a certain index is very fast, no matter how many items there are. It takes O(1), or constant, time, in contrast to linked lists, which also have a well-defined order, but require O(N) time to access a particular element.
  • Appending items at the end also needs constant (amortized) time. The containers reserve some free space at the end to make sure that appending many items one by one will not require an expensive memory reallocation every time. Removing the last item also needs constant time.

However, inserting or removing items at a random position takes O(N) time because all items after the inserted/removed item (on average, that is N/2 items) have to be moved.
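
To make this concrete, here is a small sketch (not code from the original benchmark) that shows the typical cost of these operations on a QVector; the complexities are noted in the comments:

#include <QVector>
#include <QDebug>

int main()
{
    QVector<int> v;
    v << 10 << 20 << 30 << 40 << 50;

    qDebug() << v.at(2);   // index access: O(1)

    v.append(60);          // append at the end: amortized O(1)
    v.pop_back();          // remove the last item: O(1)

    v.insert(2, 25);       // insert in the middle: O(N), the items 30..50 are moved
    v.remove(1);           // remove at a random position: O(N) as well

    qDebug() << v;         // QVector(10, 25, 30, 40, 50)
    return 0;
}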

Special case: QList

One important difference between QList and the other two containers is that it does not store the items themselves in the contiguous block of memory if an item is larger than a pointer; in that case it stores pointers to the actual items in the contiguous block instead. Moreover, it never uses less than the size of a pointer (4 or 8 bytes on a 32-bit or 64-bit platform, respectively) per item. These properties can be beneficial in some situations (for example, they enable QLists with different data types to share much of their code in the binary, and they make moving large items inside the list faster), but storing many bools in a QList might waste a lot of memory.

Another difference is that QList can even add items at the beginning or remove the first item in O(1) time because it also reserves some memory at the beginning. When inserting/removing at a random position, it checks whether there are fewer items to the left or to the right and moves the smaller part of the memory block. Therefore, only N/4 items have to be moved on average, but that is still O(N).
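
A quick way to see this difference is a rough timing sketch like the following (this is not the benchmark used for the plots below, and the exact numbers will of course vary):

#include <QList>
#include <QVector>
#include <QElapsedTimer>
#include <QDebug>

int main()
{
    const int n = 100000;
    QElapsedTimer timer;

    QList<int> list;
    timer.start();
    for (int i = 0; i < n; ++i) {
        list.prepend(i);     // O(1): QList also reserves free space at the front
    }
    qDebug() << "QList::prepend:  " << timer.elapsed() << "ms";

    QVector<int> vector;
    timer.restart();
    for (int i = 0; i < n; ++i) {
        vector.prepend(i);   // O(N) per call: all existing items are moved every time
    }
    qDebug() << "QVector::prepend:" << timer.elapsed() << "ms";

    return 0;
}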

Removing many items can be very slow if implemented incorrectly

Let’s assume that we have stored some items of a type T in a QVector, and we have to remove all items for which a function filter(T) returns true.

Simple example: remove all odd numbers from a QVector with many numbers.

[Figure: removing all odd numbers from the vector]

The bad solution

A possible solution would be to go through all numbers and remove any odd number that we find:

template<class Container, class Filter>
void remove1(Container& c, Filter filter)
{
    auto it = c.begin();

    while (it != c.end()) {
        if (filter(*it)) {
            // erase() returns an iterator to the item after the removed one
            it = c.erase(it);
        } else {
            ++it;
        }
    }
}
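
For completeness, a small usage sketch (assuming the remove1 template from above is in scope; the predicate and the test values are just made up for illustration, the actual benchmark code is not shown here):

#include <QVector>
#include <QDebug>

int main()
{
    QVector<int> numbers;
    for (int i = 1; i <= 10; ++i) {
        numbers.append(i);
    }

    // Remove all odd numbers using the remove1() template from above.
    remove1(numbers, [](int x) { return x % 2 != 0; });

    qDebug() << numbers;   // QVector(2, 4, 6, 8, 10)
    return 0;
}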

To test the efficiency of the function, I wrote a small benchmark that creates a sorted QVector containing the integers between 1 and N, and then removes either the first half or the second half of these numbers, or all odd numbers. The results are shown in this plot (the y-axis shows the time in milliseconds):

[Plot: running time of remove1 in milliseconds for the three test cases]

It turns out that the function is quite inefficient and has O(N^2) complexity in all test cases. The reason should be clear: every time an item is removed, all items after that one are moved one position to the front. In other words, O(N) items are moved O(N) times, so we get O(N^2) behaviour. What actually happens is shown here:

[Figure: remove1 moves all following items to the front after every single removal]

Note that we could fix the “remove many items from the end” case by going through the items in reverse order (remember that removing the last item is fast). However, this would not help much in the other cases.

Good solution 1

Rather than moving all following items one position to the front when an item is removed, we could just replace it with the last item. This guarantees that removing a single item will always cause at most one other item to be moved:

#include <utility>  // std::move

template<class Container, class Filter>
void remove2(Container& c, Filter filter)
{
    auto it = c.begin();
    auto end = c.end();

    while (it != end) {
        if (filter(*it)) {
            // Overwrite the current item with the last one and remove
            // the (moved-from) last item, which is an O(1) operation.
            --end;
            *it = std::move(*end);
            c.erase(end);
        } else {
            ++it;
        }
    }
}

The new function has linear complexity for all types of test cases:

[Plot: running time of remove2 in milliseconds for the three test cases]

The obvious disadvantage is that the relative order of the items is not preserved. However, if the order does not matter, then this solution might be worth considering, in particular if only a few items have to be removed (note that removing a single item is always O(1) with this solution).
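
Here is the same usage sketch as above with remove2 (again assuming the template from above is in scope); it shows the reordering caveat: the even numbers all survive, but items from the end have been moved into the holes.

#include <QVector>
#include <QDebug>

int main()
{
    QVector<int> numbers;
    for (int i = 1; i <= 10; ++i) {
        numbers.append(i);
    }

    remove2(numbers, [](int x) { return x % 2 != 0; });

    // All even numbers are still there, but their relative order has changed:
    qDebug() << numbers;   // QVector(10, 2, 8, 4, 6)
    return 0;
}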

Good solution 2

We can achieve linear complexity and still preserve the relative order of the items. The solution is to make sure that every item is moved at most once: we move an item to the front only when its final position is known.

[Figure: every kept item is moved at most once, directly to its final position]

Implementing this idea is quite straightforward, but it’s not even necessary, because this is just what std::remove_if does.

#include <algorithm>  // std::remove_if

template<class Container, class Filter>
void remove3(Container& c, Filter filter)
{
    // std::remove_if moves the items that are kept to the front and
    // returns an iterator to the new logical end of the container.
    const auto oldEnd = c.end();
    const auto newEnd = std::remove_if(c.begin(), oldEnd, filter);
    c.erase(newEnd, oldEnd);
}
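
If you prefer not to have a helper function at all, the same thing can be written inline with the classic erase-remove idiom; a minimal, self-contained sketch using std::vector:

#include <algorithm>
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> numbers;
    for (int i = 1; i <= 10; ++i) {
        numbers.push_back(i);
    }

    // remove3() written out inline: shift the kept items to the front,
    // then erase the leftover tail in one go.
    numbers.erase(std::remove_if(numbers.begin(), numbers.end(),
                                 [](int x) { return x % 2 != 0; }),
                  numbers.end());

    for (int x : numbers) {
        std::cout << x << ' ';   // prints: 2 4 6 8 10 (order preserved)
    }
    std::cout << '\n';
    return 0;
}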

In our test cases, this is faster than the “replace by the last item” solution by a factor of 2:

[Plot: running time of remove3 compared to remove2]

Final remarks

  • If you insert many items into an existing std::vector, QVector, or QList, similar considerations apply (see the sketch after this list).
  • Such problems really do exist in the software you use, even though they might not cause any serious trouble as long as you don’t let applications handle a large data set that triggers the worst-case behaviour. I’ve fixed a few such problems in Dolphin recently, and I will tell you about more problems of this kind in the next post.
  • Thanks for all the interesting feedback that I got after my last post! In particular, I would like to mention Emilio Coppa from the University of Rome here. He works on aprof, a Valgrind-based tool that can help to find inefficient functions like those that I presented here. Just run your program inside this tool and make sure that all interesting functions are fed with various amounts of input data. I ran my benchmark in aprof (I just made it consider a larger set of different vector sizes to make the output look nicer and decreased the overall size of the vectors to reduce the runtime on Valgrind’s simulated CPU). Here is the visualization of the output for the function remove1:
    [aprof visualization of the cost of remove1 as a function of the input size]
    And the same for remove2:
    [aprof visualization of the cost of remove2 as a function of the input size]
    Note that you don’t have to write a benchmark at all to get such results with aprof. You can just run your application normally, make sure that it handles inputs of different sizes, and then you can spot the inefficient functions quite quickly (provided that you trigger the inefficient behaviour of these functions). If you want to know how the tool works, have a look at this paper. It found a bug in the word frequency counter “wf” that was basically just the strlen issue that I discussed in my last post.
  • Just to prevent misunderstandings: I’m not saying that algorithms with O(N^2) worst-case complexity are always evil. If it is guaranteed that N will never get really large or that the input data will never trigger a bad worst-case behaviour, then there may be reasons why such an algorithm might be preferable.
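
To illustrate the first remark above: the helper names insertSlow and insertFast below are made up, but they show the difference between inserting many items one by one and inserting them all at once with a single range insert.

#include <vector>

// Hypothetical helpers, just for illustration.

// Slow: every single insert() moves all of the original items one position
// to the back, so inserting M items into a vector of size N costs O(N * M).
void insertSlow(std::vector<int>& target, const std::vector<int>& newItems)
{
    auto pos = target.begin();
    for (int item : newItems) {
        pos = target.insert(pos, item) + 1;   // keep the order of newItems
    }
}

// Fast: a single range insert moves the existing items only once, O(N + M).
void insertFast(std::vector<int>& target, const std::vector<int>& newItems)
{
    target.insert(target.begin(), newItems.begin(), newItems.end());
}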

2 thoughts on “Some thoughts on algorithmic complexity, part 2: Be careful if you insert into or remove from a std::vector, QVector, or QList in a loop”

  1. Very nice blog post, thanks for that! Finally someone else who is not afraid of raw numbers and can make sense out of statistics :)

    And I was also recently amazed by the performance of the STL: a straightforward sum implementation using a range-based for loop was afaik ~20% or so slower than using std::accumulate to do the job.

    Really, everyone should know the ins and outs of STL and use it wherever possible!

    Cheers

  2. Frank, thank you for citing aprof :)

    If any of you have any feedback/comment about our tool, feel free to contact me!

    Cheers,
    Emilio
