Since I took over Dolphin’s maintainership almost a year and a half ago, we have not only fixed lots of bugs and improved many details which make Dolphin more pleasant to use, but also made quite a few changes under the hood to make Dolphin faster and less memory-hungry. The other day, I compiled a few different Dolphin versions and ran some benchmarks to see how much has changed during the past 18 months.
The most straightforward way to get an idea of an application’s performance is to see how it can handle a large amount of data, so I created 100,000 empty files in a new directory using
I then built the tags v4.9.0, v.4.10.0, v.4.11.0, and the KDE/4.12 branch (to be precise, commit f863968b7) of the kde-baseapps repository in release mode, and checked how long it takes to open the directory with my rather old computer (Athlon 64 X2 5000+ CPU) in all view modes: Icons, Compact, and Details. For all my tests, I used the same recent KDE/4.12 checkout of kdelibs (commit 3028cc4b6). I opened the directory in Dolphin once and then closed the application before starting the measurements, to ensure that the files are cached by the kernel, and that slow disk reads with non-reproducible access times do not add noise to the results.
The time required to load the directory is measured and written to the shell that Dolphin is run from with a simple patch. After the initial loading, I pressed F5 eight times and calculated the mean value of all results to reduce random fluctuations. Here are the results:
Moreover, I measured with KSysGuard how much memory the Dolphin process uses. The memory usage fluctuates less than the time which is required for loading the directory, but still, I started Dolphin five times and calculated the mean value.
I hope that you like these results as much as I do! All Dolphin users should see similar relative savings, but the absolute values might obviously be different. E.g., the memory usage will be much lower on a 32-bit system because all pointers need only 4 bytes instead of 8.
I will tell you in a minute how these savings were achieved, but let me first make some remarks, which will also answer the questions that some readers might already have.
- The timer whose measurements the first plot is based on is started just before the KDirLister’s openUrl(const KUrl&) method is called, and stopped when the files are shown on the screen. In v4.9.0 and v4.10.0, the CPU and disk were still busy for quite some time after that, because Dolphin versions before 4.11.0 loaded the icons for all files in the background (some details can be found in a recent blog post). In v4.10.0, it took more than two minutes on my system until this process was completed. During these two minutes, the memory consumption grew very slowly, but constantly, until it reached the final value which is shown in the second plot. In v4.9.0, that took even much longer due to some inefficiencies which were fixed before v4.10.0 (for example this one). Therefore, it was not really possible to perform detailed memory usage measurements for v4.9.0. I just took a single data point and found that there is not a huge difference between v4.9.0 and v4.10.0.
- The reason why Details View often requires more memory than the other view modes is that the correct handling of expanded folders requires some additional data, and earlier Dolphin versions initialized the corresponding data structures even if it was not clear yet if these data would be needed or not. This is also the reason why Details View was the slowest view mode in v4.9.0 and v4.10.0.
- In Icons and Compact View, more time is spent on the item layout because the height of a row (in Icons View) or the width of a column (in Compact View) depends on the file names. In Icons View, we calculate how many lines of text will be shown under each icon, and in Compact View, we calculate the width of the text (which is slightly cheaper because no line breaks have to be taken into account). This information is needed to initialize the scroll bar correctly. We might consider using guessed values for the files far from the ones which are visible on the screen, but ensuring that this works correctly in all cases when scrolling in the view, and no subtle bugs occur when items which had their on-screen size guessed suddenly become visible, and the maximum value of the scroll bar needs to be updated, is not trivial.
Now I will summarize some of the main ideas that were used to achieve these results. Maybe some of them are also interesting for other applications and libraries.
Dolphin 2.2/KDE SC 4.10.0
- Emmanuel Pescosta speeded up sorting by modifying our Merge Sort implementation such that it makes use of multiple CPU cores.
- He also modified a function that is responsible for loading icons: all recently loaded icons are now cached, which reduces the CPU and memory usage if there are many items shown on the screen which share the same icon.
- We no longer load icons and previews for all files in the current directory.
- The amount of data which are stored in advance for the “expanded folders” bookkeeping in Details View was reduced.
- We ensure that all QByteArrays with equal contents share the same memory (for details, see my recent post about Dolphin 4.11).
- We use QVector, rather than the wasteful QList, for storing items which need more space than a pointer. In that case, QList would not store the items directly, but allocate space for them on the heap, and store pointers to them. Therefore, there is an overhead of one pointer for each item, plus any overhead that the memory allocator adds.
- Some functions which add or remove items from the view have been made faster, see the Dolphin 4.11 post for details.
Dolphin 4.12.0 (not yet released)
- The view receives all data for each item from the class KFileItemModel in a QHash<QByteArray, QVariant>. The QByteArray key is something like “text”, “size”, “iconName”, “iconPixmap”, etc., and QVariant is a very flexible data type that can wrap, e.g., a QString or a QPixmap. In earlier Dolphin versions, KFileItemModel initialized this QHash for every item in the current directory before anything was shown in the view. This took quite a bit of time and memory – there is a Qt Quarterly article which visualizes the overhead that QHash adds to the actual data that it stores. In most cases, we now initialize this QHash lazily, i.e., only just before the item that it belongs to is really shown on the screen. The corresponding patch could only work because some code that relied on the QHash always being initialized had also been modified (1, 2, 3).
- Even if we have to initialize the QHash for an item, we do not store “default” values any more. One example: the QHash stores, among other things, the information if the item is “expandable”, i.e., if it is a non-empty folder that can be expanded in Details View. However, if we have a
QHash<QByteArray, QVariant> hash
that does not contain the key “isExpandable”, then
will return false. Therefore, it is not necessary to store the value “false” at all.
Similarly, storing other kinds of data in the QHash which are equal to the default values returned by QHash::value(key) is not necessary.
- Sorting the files “naturally”, i.e., sorting them such that the order in the view is
file1.txt file2.txt ... file9.txt file10.txt
file1.txt file10.txt file2.txt ...
is quite time-consuming because the string comparison function
KStringHandler::naturalCompare(const QString &a, const QString &b)
essentially has to chop the two strings a and b into little “number” and “text” chunks, which are then compared in a locale-aware way.
Qt 5 provides a class QCollator (public API since Qt 5.2 thanks to the awesome work of Aleix Pol), which can be used to pre-compute a “sort key” for every QString in advance. Comparing two “sort keys” is much faster than comparing the QStrings in a natural and locale-aware fashion. This means that O(N log(N)) expensive comparisons are replaced by O(N) “sort key” calculations, which might be expensive, and O(N log(N)) cheap comparisons.
However, it turned out that we can improve the sorting performance even while we are still using Qt 4.8. The key observations are:
- Sorting the items with QString’s operator “<” as the comparison function is very fast.
- After this fast sorting, the order of the items is not totally correct, but most of the time, the order is quite close to the correct one.
- The number of comparisons that our Merge Sort implementation needs is greatly reduced if the order of the items is “mostly correct”.
Therefore, we now sort the items twice: first using QString’s operator “<”, and then using KStringHandler::naturalCompare. This simple change reduced the time required for sorting by 63% in my tests.
Some readers will probably have noted that there is a sorting algorithm which is designed to make use of input data which are “mostly sorted” already, and which might therefore improve the performance even further: Timsort, which is used, e.g, as the default sort algorithm in Python (1, 2). Maybe we will use it in future Dolphin releases.
- In the class KItemListViewLayouter, which is responsible for the layout of the items in the view, we calculated and stored some redundant data for all files in the current directory: both the size of an item (i.e., how much space the icon and text need on the screen, that is 2 numbers for width and height), and the rectangle that contains the item (storing a rectangle requires 4 numbers). Before an item was shown on the screen, the rectangle was adjusted according to the current value of the scroll bar(s). Now we only store the size and the position of the top-left corner of the rectangle. The final rectangle can easily be constructed from these once it is necessary.
I still have some ideas for improvements inside Dolphin, but as Mark Gaiser pointed out in his comment to a review request, the major remaining consumer of huge amounts of memory is the class UDSEntry from kdelibs. I assume that few people know this class, but it’s being used indirectly by every application that uses the KIO framework to access directory contents either directly or indirectly (e.g., via the “Open/Save File” dialog).
UDSEntry is used to transmit all kinds of information about files from kioslaves (which are the helper applications that read files from disk, or access remote files via sftp, smb, etc.) to the applications. A first optimization which is already in the master branch of kdelibs is to make use of Milian Wolff’s implicit sharing trick once more. The idea is that the user name and the group are very often the same for many files in a directory. Therefore, the function that receives the UDSEntries from the kioslave now checks if, e.g., the user name for a file is the same as the one for the previous file. If that is the case, the same implicitly shared QString is used for both. If all files in a directory belong to the same user, the same implicitly shared QString will be used for all.
In the benchmarks I discussed at the beginning of this post, there were 100,000 copies of my user name, and 100,000 copies of the group name “users” in memory – it should be obvious that this is not really efficient.
I will discuss further possibilities to optimize UDSEntry in a future blog post. For the time being, here are the raw data for the plots above, including the benchmark results for the master branches of kde-baseapps and kdelibs, which include the first UDSEntry optimization:
I hope that I made clear what the key ideas are, and that these might be useful also for some readers of this post:
- Do not determine large amounts of data in advance if it is likely that most of these data will never be needed.
- Do not store redundant data: think twice if everything that is stored in memory is really needed, or if there is another solution that can achieve the same with less memory.
- Make use of implicit sharing: if there are many QStrings, QByteArrays, or other implicitly shared data in an application, many of which are likely to be equal, check if they can share their memory.
- If some calculation takes very long, try to find a more efficient algorithm.
Finally, I would like to thank everyone who helped to make these improvements possible. In particular, I would like to mention Mark Gaiser, who loves to stress-test file managers with half a million files and share his findings on our mailing list. Without his challenges, some of the work presented here might not have happened.