Recent Dolphin bug fixes

Some time ago, I wrote a blog post about changes in Dolphin 4.12.0 and earlier versions that reduced the memory usage and improved the performance. These improvements were only a small part of our efforts to make Dolphin more pleasant to use, however.

Here is a summary of all bugs that were fixed in Dolphin during the past months:

Dolphin 4.12.4

  • Bug 332143: When searching files by content, and Nepomuk is disabled, search in all plain text files. Before this fix, we only searched in files whose MIME type begins with “text/”, which excludes, e.g., shell scripts. See git commit c52ba944, review request 116805.

Dolphin 4.12.3

  • Bug 330047: Restore the URLs of both views correctly when restoring a session. See git commit be29aed5, review request 115406.
  • Bug 330605: Fix the problem that the Dropbox plugin prevents the git plugin from working. See git commit 2a6a1f5a, review request 116019.
  • Bug 305694: Show the correct icon size in the tool tip for the “zoom slider” in the status bar. See git commit 885d260c, review request 111197.

Dolphin 4.12.2

  • Bug 330126: Do not show tooltips while renaming a file. See git commit 8007143f, review request 115146.
  • Bug 330001: Always enable the “Create New…” menu if the URL is writable. Before this commit, this did not work in some special cases. See git commit 48653030, review request 115405.

Dolphin 4.12.1

  • Bug 328791: When adding columns in Details View, also update items that are currently filtered. See git commit 2260d70e, review request 114266.
  • Bug 329118: If a file is renamed outside Dolphin, update it even if it is filtered at the moment. See git commit c0a85189, review request 114459.
  • Bug 294054: Disable the “Create folder” action in read-only directories. See git commit 67bb99c5, review request 114560.
  • Bug 250787: Kill any running preview jobs before starting a new one. This fixes a race condition that could make the Information Panel show an incorrect preview image in some situations. See git commit 8ed499f2, review request 114561.

Dolphin 4.12.0

  • Bug 302703: Fix layout issues in the view when switching from Details View (without expandable folders) to Icons View. See git commit 69c9100f, review request 111632.
  • Bug 288629, bug 322299, bug 322812: Do not allow that panels are dragged out of the main window. This feature was not extremely useful, but it caused some serious bugs. See git commit 5583fc63, review request 111692.
  • Bug 321577: Do not enable the “Create New…” menu when a search is finished. See git commit 8325140a, review request 111805.
  • Bug 260717: Show the full information for a file in the status bar if only one file is selected. See git commit ba56ec86, review request 111934.
  • Bug 260717: Show the full status bar information also for hovered folders. See git commit 8941745b, review request 112106.
  • Bug 318518: Count the items inside subfolders (e.g., for the “Size” column in Details View) in another thread. This can prevent a blocking of the user interface while counting many files on a slow device. See git commit 81a6f33a, review request 111920.
  • Bug 324371, bug 325359: Make the code that removes items from KFileItemModel more robust. This fixes two bugs, including a crash. See git commit 84b40da8, review request 113070.
  • Bug 325543: Make it easier to expand folders in Details View with a large icon size. See git commit 4873685e, review request 113169.
  • Bug 323181: Abort loading the current URL if the user presses Escape. See git commit 1a997903, review request 113234.
  • Bug 304363: If an expanded folder with subfolders which are also expanded is collapsed and then re-expanded in Details View, ensure that the full expanded directory tree is restored. See git commit 07f0d125, review request 113293.
  • Bug 319282: Update the Places Panel entries when switching the language. See git commit 6dd2ae4e, review request 113850.
  • Store the selected items in a more efficient way. Rather than storing all selected indexes in a QSet, which requires a lot of memory and CPU time when pressing Ctrl+A in a huge directory, we now store the beginning and length of each contiguous interval in the selection in a sorted list. See git commit 5c5d87fe, review request 113488.
  • Simplify the relationship between DolphinMainWindow and DolphinNewFileMenu. See git commit d0a9410e, review request 111989.
  • Replace a loop that resets all items in a QVector to the default value by a call to QVector::fill(). See git commit 38adcc0c, review request 112179.
  • Simplify error handling of the “Create New…” menu. See git commit dd16a11d, review request 112178.

Dolphin 4.11.5

  • Bug 328262: When canceling a rename operation because a file with the new name exists already, do not change the file name in the view. See git commit 385e5fef, review request 114228.

Dolphin 4.11.4

  • Bug 287983: Do not truncate the text in tool tips for files with very long names. See git commit 1af756f1, review request 113101.
  • Bug 327224: Fix a regression that broke opening the trash by clicking the trash widget on the desktop. See git commit 1c856e44.
  • Bug 327412: When going back by clicking the “back” mouse button in the empty space of the view, do not select any items in the previous directory if the view is scrolled down. See git commit 41ece8e9.
  • Bug 306631: Fix incorrect scrollbar spacing when using the QtCurve style. See git commit 39e7ba46, review request 113902.
  • Bug 327709: Fix incorrect geometry updates of the view after resizing the window quickly. See git commit b3322111, review request 113939.

Dolphin 4.11.3

  • Bug 325344: Remove all children of expanded subfolders when switiching from Details View to Icons View, including the children which are filtered. See git commit befa646f, review request 112962.
  • Bug 267171: Show the right version control states for expanded items. See git commit bbbfeb28, review request 112980.
  • Bug 324479: Make it possible to select file names containing Space with the keyboard serch. See git commit c802f3d2, review request 113071.
  • Bug 161385: Reload the view if a previously unmounted device is mounted again. See git commit 7f8dca1b.
  • Bug 325517: Fix crash when triggering the “Compare files” action via D-Bus. Note that the new code is much simpler than the buggy version! See git commit 42c26b15.

Dolphin 4.11.2

  • Bug 286459: Fix colors in the “Services” section of the settings dialog. See git commit 91a2e523, review request 112483.
  • Bug 296970: Fix unwanted interactions between split views when searching. See git commit 576481d1, review request 112534.
  • Bug 311099: Scroll the view to the bottom when pressing “Page down” repeatedly. Before this fix, the scrolling stopped a few pixels above the bottom. See git commit 6566f757, review request 112678.
  • Bug 324713: If the view is sorted by “Size”, and there are some items with the same size, then these are sorted by their names. Ensure that the sort order is updated if one of these files is renamed outside Dolphin. See git commit be391bda, review request 112561.
  • Remove useless “Copy text” action from the status bar context menu. See git commit 4c17ce2c, review request 112355.
  • Bug 322093: Make preview loading faster when scrolling. See git commit bf2a0d69, review request 112580.

Dolphin 4.11.1

  • Bug 323248: Fix possible crash when disabling “Show in groups”. See git commit 292e11fc, review request 111919.
  • Bug 310662, bug 314339: Fix slow scrolling when hidden files or symbolic links are shown. See git commit 381b1796, review request 111956.
  • Bug 323518: Make sure that the sort order is correct after renaming – similar to bug 324713, which was fixed later in Dolphin 4.11.2, but for the case that the file is renamed in Dolphin. See git commit 6b375d2e, review request 111721.
  • Bug 314544: Fix crash when failing to get block device for audio CD. See git commit ae81a800, review request 112117.
  • Bug 323789: Prevent repeated expensive resortings if many files are renamed at the same time. See git commit 9cbca724, review request 111195.
  • Bug 322969: Fix possible crash after renaming files. See git commit 85f29746, review request 111988.
  • Bug 321710: Show the mime type “Folder” in the view also for subfolders which have not been accessed yet on disk. See git commit ab8ee1a6, review request 111830.
  • Bug 310412: Adjust the size and position of the selection toggle if the icon size is changed. See git commit f3ca9435, review request 112250.
  • Bug 304558, bug 321882: Fix filename trucation issues in Icons View if a maximum number of lines is set: sometimes file names were truncated too early, and sometimes, it was not indicated that a file name is truncated. See git commit 82d42b8d, review request 112265.
  • Bug 323946: When pressing the left or right arrow keys while a part of a file name which is being renamed is selected, move the cursor to the beginning or the end of the selection, respectively. See git commit d5521168, review request 112256.

Thanks!

These improvements were made possible by

  • Emmanuel Pescosta, who worked on an impressive number of bugs,
  • Christoph Feck, who not only handles a large number of incoming bug reports, but also contributed quite a few patches,
  • everyone who fixed bugs, provided advice, or tested patches: Albert Astals Cid, Alex Levkovich, Burkhard Lück, David Rosca, Grigoriadis Grigoris, Kai Uwe Broulik, Mark Gaiser, Phil Schaf, Wolfgang Bauer, and Yichao Yu,
  • sysadmins, packagers, translators, and many others who help to improve our software and get it to our users.

A brief history of Dolphin’s performance and memory usage

Since I took over Dolphin’s maintainership almost a year and a half ago, we have not only fixed lots of bugs and improved many details which make Dolphin more pleasant to use, but also made quite a few changes under the hood to make Dolphin faster and less memory-hungry. The other day, I compiled a few different Dolphin versions and ran some benchmarks to see how much has changed during the past 18 months.

The most straightforward way to get an idea of an application’s performance is to see how it can handle a large amount of data, so I created 100,000 empty files in a new directory using

touch a{1..100000}.txt

I then built the tags v4.9.0, v.4.10.0, v.4.11.0, and the KDE/4.12 branch (to be precise, commit f863968b7) of the kde-baseapps repository in release mode, and checked how long it takes to open the directory with my rather old computer (Athlon 64 X2 5000+ CPU) in all view modes: Icons, Compact, and Details. For all my tests, I used the same recent KDE/4.12 checkout of kdelibs (commit 3028cc4b6). I opened the directory in Dolphin once and then closed the application before starting the measurements, to ensure that the files are cached by the kernel, and that slow disk reads with non-reproducible access times do not add noise to the results.

The time required to load the directory is measured and written to the shell that Dolphin is run from with a simple patch. After the initial loading, I pressed F5 eight times and calculated the mean value of all results to reduce random fluctuations. Here are the results:

performanceMoreover, I measured with KSysGuard how much memory the Dolphin process uses. The memory usage fluctuates less than the time which is required for loading the directory, but still, I started Dolphin five times and calculated the mean value.

memoryI hope that you like these results as much as I do! All Dolphin users should see similar relative savings, but the absolute values might obviously be different. E.g., the memory usage will be much lower on a 32-bit system because all pointers need only 4 bytes instead of 8.

I will tell you in a minute how these savings were achieved, but let me first make some remarks, which will also answer the questions that some readers might already have.

  • The timer whose measurements the first plot is based on is started just before the KDirLister’s openUrl(const KUrl&) method is called, and stopped when the files are shown on the screen. In v4.9.0 and v4.10.0, the CPU and disk were still busy for quite some time after that, because Dolphin versions before 4.11.0 loaded the icons for all files in the background (some details can be found in a recent blog post). In v4.10.0, it took more than two minutes on my system until this process was completed. During these two minutes, the memory consumption grew very slowly, but constantly, until it reached the final value which is shown in the second plot. In v4.9.0, that took even much longer due to some inefficiencies which were fixed before v4.10.0 (for example this one). Therefore, it was not really possible to perform detailed memory usage measurements for v4.9.0. I just took a single data point and found that there is not a huge difference between v4.9.0 and v4.10.0.
  • The reason why Details View often requires more memory than the other view modes is that the correct handling of expanded folders requires some additional data, and earlier Dolphin versions initialized the corresponding data structures even if it was not clear yet if these data would be needed or not. This is also the reason why Details View was the slowest view mode in v4.9.0 and v4.10.0.
  • In Icons and Compact View, more time is spent on the item layout because the height of a row (in Icons View) or the width of a column (in Compact View) depends on the file names. In Icons View, we calculate how many lines of text will be shown under each icon, and in Compact View, we calculate the width of the text (which is slightly cheaper because no line breaks have to be taken into account). This information is needed to initialize the scroll bar correctly. We might consider using guessed values for the files far from the ones which are visible on the screen, but ensuring that this works correctly in all cases when scrolling in the view, and no subtle bugs occur when items which had their on-screen size guessed suddenly become visible, and the maximum value of the scroll bar needs to be updated, is not trivial.

Now I will summarize some of the main ideas that were used to achieve these results. Maybe some of them are also interesting for other applications and libraries.

Dolphin 2.2/KDE SC 4.10.0

  • Emmanuel Pescosta speeded up sorting by modifying our Merge Sort implementation such that it makes use of multiple CPU cores.
  • He also modified a function that is responsible for loading icons: all recently loaded icons are now cached, which reduces the CPU and memory usage if there are many items shown on the screen which share the same icon.

Dolphin 4.11.0

Dolphin 4.12.0 (not yet released)

  • The view receives all data for each item from the class KFileItemModel in a QHash<QByteArray, QVariant>. The QByteArray key is something like “text”, “size”, “iconName”, “iconPixmap”, etc., and QVariant is a very flexible data type that can wrap, e.g., a QString or a QPixmap. In earlier Dolphin versions, KFileItemModel initialized this QHash for every item in the current directory before anything was shown in the view. This took quite a bit of time and memory – there is a Qt Quarterly article which visualizes the overhead that QHash adds to the actual data that it stores. In most cases, we now initialize this QHash lazily, i.e., only just before the item that it belongs to is really shown on the screen. The corresponding patch could only work because some code that relied on the QHash always being initialized had also been modified (1, 2, 3).
  • Even if we have to initialize the QHash for an item, we do not store “default” values any more. One example: the QHash stores, among other things, the information if the item is “expandable”, i.e., if it is a non-empty folder that can be expanded in Details View. However, if we have a
    QHash<QByteArray, QVariant> hash

    that does not contain the key “isExpandable”, then

    hash.value("isExpandable").toBool()

    will return false. Therefore, it is not necessary to store the value “false” at all.

    Similarly, storing other kinds of data in the QHash which are equal to the default values returned by QHash::value(key) is not necessary.

  • Sorting the files “naturally”, i.e., sorting them such that the order in the view is
    file1.txt
    file2.txt
    ...
    file9.txt
    file10.txt

    rather than

    file1.txt
    file10.txt
    file2.txt
    ...

    is quite time-consuming because the string comparison function

    KStringHandler::naturalCompare(const QString &a, const QString &b)

    essentially has to chop the two strings a and b into little “number” and “text” chunks, which are then compared in a locale-aware way.

    Qt 5 provides a class QCollator (public API since Qt 5.2 thanks to the awesome work of Aleix Pol), which can be used to pre-compute a “sort key” for every QString in advance. Comparing two “sort keys” is much faster than comparing the QStrings in a natural and locale-aware fashion. This means that O(N log(N)) expensive comparisons are replaced by O(N) “sort key” calculations, which might be expensive, and O(N log(N)) cheap comparisons.

    However, it turned out that we can improve the sorting performance even while we are still using Qt 4.8. The key observations are:

    1. Sorting the items with QString’s operator “<” as the comparison function is very fast.
    2. After this fast sorting, the order of the items is not totally correct, but most of the time, the order is quite close to the correct one.
    3. The number of comparisons that our Merge Sort implementation needs is greatly reduced if the order of the items is “mostly correct”.

    Therefore, we now sort the items twice: first using QString’s operator “<”, and then using KStringHandler::naturalCompare. This simple change reduced the time required for sorting by 63% in my tests.

    Some readers will probably have noted that there is a sorting algorithm which is designed to make use of input data which are “mostly sorted” already, and which might therefore improve the performance even further: Timsort, which is used, e.g, as the default sort algorithm in Python (1, 2). Maybe we will use it in future Dolphin releases.

  • In the class KItemListViewLayouter, which is responsible for the layout of the items in the view, we calculated and stored some redundant data for all files in the current directory: both the size of an item (i.e., how much space the icon and text need on the screen, that is 2 numbers for width and height), and the rectangle that contains the item (storing a rectangle requires 4 numbers). Before an item was shown on the screen, the rectangle was adjusted according to the current value of the scroll bar(s). Now we only store the size and the position of the top-left corner of the rectangle. The final rectangle can easily be constructed from these once it is necessary.

Outlook

I still have some ideas for improvements inside Dolphin, but as Mark Gaiser pointed out in his comment to a review request, the major remaining consumer of huge amounts of memory is the class UDSEntry from kdelibs. I assume that few people know this class, but it’s being used indirectly by every application that uses the KIO framework to access directory contents either directly or indirectly (e.g., via the “Open/Save File” dialog).

UDSEntry is used to transmit all kinds of information about files from kioslaves (which are the helper applications that read files from disk, or access remote files via sftp, smb, etc.) to the applications. A first optimization which is already in the master branch of kdelibs is to make use of Milian Wolff’s implicit sharing trick once more. The idea is that the user name and the group are very often the same for many files in a directory. Therefore, the function that receives the UDSEntries from the kioslave now checks if, e.g., the user name for a file is the same as the one for the previous file. If that is the case, the same implicitly shared QString is used for both. If all files in a directory belong to the same user, the same implicitly shared QString will be used for all.

In the benchmarks I discussed at the beginning of this post, there were 100,000 copies of my user name, and 100,000 copies of the group name “users” in memory – it should be obvious that this is not really efficient.

I will discuss further possibilities to optimize UDSEntry in a future blog post. For the time being, here are the raw data for the plots above, including the benchmark results for the master branches of kde-baseapps and kdelibs, which include the first UDSEntry optimization:results

Conclusion

I hope that I made clear what the key ideas are, and that these might be useful also for some readers of this post:

  • Do not determine large amounts of data in advance if it is likely that most of these data will never be needed.
  • Do not store redundant data: think twice if everything that is stored in memory is really needed, or if there is another solution that can achieve the same with less memory.
  • Make use of implicit sharing: if there are many QStrings, QByteArrays, or other implicitly shared data in an application, many of which are likely to be equal, check if they can share their memory.
  • If some calculation takes very long, try to find a more efficient algorithm.

Finally, I would like to thank everyone who helped to make these improvements possible. In particular, I would like to mention Mark Gaiser, who loves to stress-test file managers with half a million files and share his findings on our mailing list. Without his challenges, some of the work presented here might not have happened.

Posted in KDE

Dolphin 4.11

Dolphin 4.11.0 has been released some time ago, and since we are approaching the 4.12 release already, I guess it’s about time that I summarize the most important changes.Dolphin-4-11

The screenshot shows a new feature which can come in handy in many situations: the “Filter Bar”, which you can enable, e.g., by pressing Control+I, can now be locked, such that the filter string remains unchanged when visiting different folders. Thanks to Stuart Citrin for this contribution!

Here is a summary of the other changes which make Dolphin more pleasant to use:

“Tree View” improvements

Emmanuel Pescosta, who has contributed a lot of bug fixes to recent Dolphin releases, greatly improved the handling of expanded folders in Details View. The code is much more robust now, and lots of bugs are fixed: Bug 304565, bug 311912, bug 312890, bug 315593, bug 306219, bug 320335.

Reduced memory usage

I already told you some time ago that we do not load icons and previews for all files in the current folder in advance any more. A nice side effect of this change is that the memory which would be needed to store this information is now saved.

There are a few more changes in Dolphin 4.11 which help to reduce the memory consumption:

  • For each file or folder, all relevant information is stored in a QHash<QByteArray, QVariant>. The QByteArray keys are, e.g., “text”, “date”, “size”, “iconName”, etc. In previous Dolphin releases, all these keys were stored N times in memory for a folder with N items. Since Dolphin 4.11.0, we make use of the fact that QByteArray is one of Qt’s implicitly shared classes, and let all QByteArrays with equal contents use the same memory location for their data. The motivation for this change, which can reduce the overall memory consumption by about 10%, was a blog post by Milian Wolff. See git commit 7e532911, review request 110686.
  • In a few cases, we used a QList to store items of a data type which requires more size than a pointer. In that case, QList does not store the items directly, but allocates space for them on the heap, and stores pointers to them. Therefore, there is an overhead of one pointer for each item, plus any overhead that the memory allocator adds.
    Simply replacing QList by QVector therefore saves a lot of memory in these cases. See git commit 8aea59bb, review request 111304.

How much the memory consumption decreases when upgrading to Dolphin 4.12 depends on the number of files in the current folder, the view mode, and some other factors. For large folders, it is not uncommon to see a 30% reduction.

Since allocating a lot of memory in many small chunks is a rather expensive operation, these changes also improve Dolphin’s performance.

Performance improvements

Some time ago, I wrote a post about the O(N²) run time behavior that can occur when using a container such as QList, QVector, or std::vector, and adding or removing many items in an inefficient way. Dolphin also had a few functions with quadratic worst-case complexity, which could be triggered by creating, deleting, or filtering many files with a particular pattern. These functions were rewritten such that they have O(N) worst-case complexity, and are even a bit faster in all cases:

  • KFileItemModel::removeItems(). See git commit ff3267c6, review request 108540.
  • KFileItemModel::insertItems(). See git commit fba053e9, review request 110355.
  • KItemListView::slotItemsRemoved(). See git commit 6c157fcb, review request 111398.

There are a few more changes which can save many CPU cycles in some situations:

  • Bug 299565: Avoid an unnecessary resorting when items are changed. See git commit d70a4811, review request 111146.
  • When switching from, e.g., “Sort by Name” to “Sort by Date”, or changing the sort order, we accidentally converted all URLs to a QString and then back to a KUrl, which was quite expensive. This is now fixed. See git commit bf2618d7, review request 111700.
  • Moreover, the performance is also improved by not loading icons and previews for all files. In earlier Dolphin versions, one could notice that the CPU and disk remain busy for a very long time after entering a large directory.

Preparations for porting to KDE Frameworks

There is no KDE Framworks port of Dolphin yet, but a few changes were made to make the porting easier:

  • Remove use of KIO::SlaveConfig. See git commit e45d4310, review request 107421.
  • Handle Shift key presses in the context menu without KModifierKeyInfo. See git commit 58fc982e, review request 110303.

Bug fixes in the view engine

  • Bug 301800: Fix “truncated header” in Details View with non-Oxygen styles. See git commit 25f208eb, review request 111608.
  • Bug 322212: Fix incorrect scroll position after changing the URL. See git commit 02e41237, review request 111557.
  • Bug 302373: Prevent that removing items can cause icons to overlap. See git commit 6e1a8774, review request 111630.
  • Bug 319951: Prevent that items may disappear from Details View after deleting other items. See git commit ba2c5c71, review request 111486.

Improvements concerning the “Move to Trash”/”Delete” actions in the context menu

Handling these actions is less trivial than it seems – note that pressing “Shift” while Dolphin’s context menu is open transforms the “Move to Trash” action into “Delete” (at least if the latter is not enabled permanently in the settings dialog). The behavior of the context menu of the DolphinPart, which can be used in Konqueror, was different before KDE SC 4.11, but both implementations still shared some of the code for these actions, which made the code a bit buggy and the maintenance complicated.

This is only one of the things concerning “Move to Trash” and “Delete” which was improved greatly in Dolphin 4.11:

  • Factored out the “Delete/Move To Trash” action into its own class. See git commit 8e023ae9, review request 108802.
  • Bug 307254: Do not delete files if the user clicks “Move to Trash” in the context menu while Shift is pressed. See git commit fda03a10, review request 107509.
  • Bug 294013bug 297510: Disable “Move to Trash” and “Delete” inside archives, which are read-only from Dolphin’s point of view. See git commit 4cd23183, review request 111160.
  • Bug 261762: Do not show the “Move To Trash” action in context menu for remote URLs. See git commit e688a52e, review request 111206.
  • Make it clear that the “Trash/Delete” confirmation applies to all KDE applications. See git commit ff712cbb, review request 111324.

Usability improvements

  • Bug 157593: Switch location bar back to breadcrumb mode when pressing Enter if “Editable location bar” is not enabled in the settings. See git commit e74dfee5, review request 107748.
  • Bug 297140: Return the focus from the filter bar to the view when pressing Enter. See git commit 2f011473, review request 109020.
  • Bug 301276: Move “Bluetooth” to the “Devices” section of the Places Panel. See git commit 05fb8186, review request 109622.
  • Bug 315722: Scroll the view to a new file after it has been pasted. See git commit e92b4ba2, review request 109950.
  • Bug 312872: If the URL changes, hide the widget that shows error messages. See git commit 5324fcd3, review request 110369.
  • Bug 310049: Make it easier to drag and drop items in the Places Panel. See git commit 6bcc70a1, review request 110342.
  • Bug 304775: Remove the confusing “decoration icon” in the KMessageWidget. See git commit c8264896, review request 110327.
  • Bug 312296: Allow to open multiple selected folders in new tabs. See git commit 05d9210e, review request 110371.
  • Do not allow drops on the “Search”/”Recent” entries in the Places Panel. See git commit 27553914, review request 110348.
  • Do not allow creating entries for files in the Places Panel. See git commit 2b700c50, review request 110347.
  • Bug 196035: Open archives in a new tab on middle click. See git commit afcf8961, review request 110487.
  • Focus the view where items are dropped. See git commit 08eae43a, review request 110167.
  • Bug 319373: Do not use a fixed width for the zoom slider and the “free space” widget in the status bar. See git commit 3e172053, review request 110966.
  • Use the Nepomuk2::FileMetaDataConfigWidget for configuring what kind of data is shown in the Information Panel. See git commit 626f4cb6, review request 111294.

Bug fixes

  • Bug 233335: Select the correct items if pasted/dropped files have to be renamed because of a name conflict. See git commit fd65a97b, review request 107351.
  • Bug 181337: Make grouping by “Date” more consistent. See git commit d7845775, review request 108667.
  • Make the handling of the view’s palette more robust. See git commit 71be3439, review request 110505.
  • Bug 318442: Fix searching in hidden folders. See git commit 85ea7bda, review request 110697.
  • Bug 319119: Fix incorrect item name in the view if a “Rename” operation fails. See git commit 0d0b9583, review request 110922.
  • Make sure that the “drop” indicator is hidden if an item is not hovered any more. See git commit 92854f90, review request 111037.
  • Bug 321286: Make it possible to select files like “a_b” using keyboard search. See git commit e401d892, review request 111102.
  • Bug 321299: Do not try to connect to Nepomuk if it is not running. See git commit 44791e7f.
  • Bug 299646: Enable KIO error reporting when using non-inline renaming. See git commit cafbdb59, review request 111111.
  • Bug 318534: Make sure that changing the view works even if the settings file is not writable. See git commit 4e80918a, review request 111120.
  • Bug 321359: Handle music files with more than one “Artist” correctly. See git commit 4b0498a4.
  • Bug 192139, bug 256338, bug 293220, bug 309076: Fix crash when dropping URLs on the URL navigator’s drop down menus. See git commit 669ee325, review request 111273.
  • Bug 322348: Make sure that removed tags are also removed in the view. See git commit 5b81abea, review request 111505.
  • Bug 283475, bug 318217: Don’t try to open .desktop files with http/https URLs inside Dolphin,. See git commit f14352f1, review request 111674.
  • Bug 321778: Escape HTML-like file names in the status bar, which prevents a freeze of the application if there is a file with a malicious name. See git commit 4450f844, review request 111746.
  • Bug 323170: Escape HTML-like file names also in the tool tip for the status bar. See git commit d9b111b0, review request 111836.
  • Bug 299156: Handle ramfs mounts correctly so copying files to them works (this was a bug in kdelibs). See git commit af114cc9, review request 111115.

Code simplifications

Removing code without loss of functionality or replacing complex code by a shorter and simpler version is always very rewarding because it makes future maintenance easier. Sometimes, the simplification itself is even sufficient to fix bugs.

  • Simplifcations in the version control code. See git commits 173efe0b and cb6d1080, review request 107656.
  • Simplifications in the sorting code. See git commit ccbc9916, review request 108386.
  • Bug 294616, bug 311947: Replace the code which compares expanded items while sorting, which was quite complex and caused some crashes which were hard to reproduce, by a simple, robust and faster solution. See git commit b80384ca, review request 108766.

Thanks!

Many people provided patches or advice, helped with managing the endless stream of incoming bug reports, or used the master branch for their daily work and reported any regressions before they had the chance to cause serious trouble:

Aurélien Gâteau, Christoph Feck, Daniel Kreuter, David Faure, Dawit Alemayehu, Emmanuel Pescosta, Fabio D’Urso, Hrvoje Senjan, Jekyll Wu, Jens Rutschmann, Kai Uwe Broulik, Romário Rios, Simeon Bird, Stuart Citrin, Thomas Lübking, Vishesh Handa and Weng Xuetian. If I forgot anyone, please let me know.

Also the contributions by the people who work on documentation, translation, preparation of the released tarballs, packaging, and system administration are greatly appreciated! Without their help, it would not be possible to run a project like Dolphin successfully.

Posted in KDE

Some thoughts on algorithmic complexity, part 3: How to randomize an array

This is the third part of a series of blog posts. You can find the previous parts here:

Introduction

A task that many applications have to perform at some point is to randomize an array. Simple example: a card game like KPat should make sure that the order of the cards is random before the game starts. In this post, I will tell you how this can be done correctly and efficiently.

The wrong way

According to an interesting blog post that I stumbled across some time ago on Planet GNOME and another article that is linked from there, this is sometimes implemented like this (I’ve tried to translate the JavaScript version to C++):

#include <algorithm>
#include <ctime>
#include <cstdlib>
#include <vector>

bool randomLessThan(int a, int b)
{
    return (rand() % 2) == 1;
}

void randomize(std::vector<int>& v)
{
    std::srand(std::time(0));
    std::sort(v.begin(), v.end(), randomLessThan);
}

Anyone who knows a bit about how sorting algorithms work might expect that this will go terribly wrong. If you don’t, follow the links above and keep in mind that most sorting algorithms expect that the “lessThan” function or the operator “<” that is used for comparing items yields self-consistent results, i.e., that a<b and b<c implies a<c, for example.

Note that the C++ code above may cause even worse trouble than the JavaScript version. I found out about that only when this post was mostly ready to be published, and I thought that testing the code to find out if I made any stupid mistakes might be a good idea. The results were quite surprising. I started to write a new section that explains what happens and why, but then this post got a bit too long for my taste, and this is about a different topic anyway (“undefined behavior” is the keyword). So I copied that part to a new post which I will publish soon.

The slow way

I became curious and investigated how KDE applications randomize arrays. We have a class KRandomSequence with a member function that looked like this:

template<typename T> void randomize(QList<T>& list) {
    if (!list.isEmpty()) {
        QList<T> l;
        l.append(list.takeFirst());
        while (list.count())
            l.insert(int(getLong(l.count()+1)), list.takeFirst());
        list = l;
    }
}

Note that the function

unsigned long KRandomSequence::getLong(unsigned long max)

returns a random number x in the range 0 <= x < max.

Fortunately, this algorithm does not suffer from the problems of the sorting-based implementation, such that the result is a truly randomized QList. But if you read my last post in this series, you might have noticed the problem already. Inside the loop, the function inserts an item into a QList at a random position. Therefore, the function has O(N^2) complexity. My investigations with a simple benchmark showed that a QList with 100,000 ints is enough to keep my CPU busy for almost 4 seconds.

KRandomSequence:randomize(QList<T>&) is used by various games. Maybe the arrays that they randomize are never so big that the delay becomes noticeable. However, also the Plasma picture frame uses that function, and considering that I sometimes see Dolphin bug reports and forum posts from users who have tens or even hundreds of thousands of pictures in one directory, I would be surprised if nobody ever tried to set up a random slide show with 100,000 or more images. Probably the poor users thought that the computer was doing something useful, like examining the image files on disk, while they were waiting.

The right way

The Fisher-Yates shuffle algorithm, invented in 1938, randomizes an array in O(N) steps. For kdelibs 4.11, I have replaced the previous version of KRandomSequence::randomize() with an implementation based on this algorithm. It looks like this:

template<typename T> void randomize(QList<T>& list) {
    // Fisher-Yates algorithm
    for (int index = list.count() - 1; index > 0; --index) {
        const int swapIndex = getLong(index + 1);
        qSwap(list[index], list[swapIndex]);
    }
}

Randomizing a QList with 100,000 items only takes 10 milliseconds on my machine now :-)

Conclusion

If you are trying to implement a solution for a non-trivial problem, always do some research and look for existing solutions! If you don’t, you might end up reinventing the wheel and making mistakes that cause a lot of trouble. Also consider checking if a library like the C++ STL already has an implementation that you can use.

Posted in KDE

No more “unknown” icons

UnknownIconsIn recent versions of Dolphin, the view sometimes looked like this just after entering a directory.

Some of the files and sub-directories have “unknown” icons, which are replaced by the correct icons later.

This will not happen any more in Dolphin 4.11.

Why did we show you “unknown” icons at all in the first place?

Dolphin can show you a lot of information about your files. Some of this information can be obtained quite easily, like, e.g., the name, the file size, and the modification time.

However, some other properties of a file or directory might be more expensive to determine. Examples are:

  • The type of a file: if it cannot be determined unambiguously from the file name, we have no choice but to read the contents of the file.
  • The icon that has to be shown for a file depends on the type. A directory icon, on the other hand, can be chosen by the user in the “Properties” dialog. The chosen custom icon is then stored in a file named “.directory” inside the directory, so determining the icon for a directory always requires a disk access (because we don’t know in advance if there is a .directory file or not).
  • If previews are enabled, the preview for each item has to be retrieved from the .thumbnails directory in your home folder. If no cached preview can be found there, it gets worse, and the (possibly large) file must be read from disk to generate a new preview image.
  • Counting the number of files inside a directory also requires a disk access. It is shown, e.g., in the “Size” column in Details View.
  • Meta data provided by Nepomuk.

To make things more complicated, some of these properties can have “preliminary” and “final” values. For example, the preliminary icon for any directory is the generic “folder” icon, but the final icon is only known after checking the contents of the corresponding .directory file.

We obviously cannot determine the final values of all these properties synchronously, i.e., blocking the application’s main thread, for all files and directories. In large directories, it might take very long until this process is finished. Therefore, we have to find some compromise between “show the items in the view as quickly as possible” and “only show any kind of information if we know what its final state is”.

What happened in earlier Dolphin versions

We have a class called KFileItemModelRolesUpdater (.h, .cpp) which is responsible for determining the data that are possibly expensive to retrieve.

When you enter a directory, it tries to determine the final icons for the visible items for up to 200 milliseconds. It then then returns to the event loop, such that the view can be painted, mouse events can be processed, etc. The remaining icons and other data are determined asynchronously – first for the visible items in ascending order, then for all invisible ones in random order. We still do it in the main thread, but only for one item at a time, such that it never blocks for too long until the next events can be processed. Moreover, the asynchronous processing of the remaining items is suspended if the view is scrolled. This ensures that scrolling feels smooth.

This approach has two downsides:

  1. Some “unknown” icons can remain if 200 milliseconds are not sufficient to determine the final icons for all visible items. This can happen easily if there are many directories in the view, whose final icons can only be determined by accessing the contents of the directory.
  2. It wastes resources when viewing large directories. Loading previews for thousands of images (I know from user feedback at bugs.kde.org and forum.kde.org that it is not uncommon to have hundreds of thousands of images in one directory) can easily make the hard drive and the CPU busy for a few minutes, even though it is very unlikely that the user will ever scroll through all these images in the view. Moreover, the previews need a lot of memory.

Improvements in Dolphin 4.11

KFileItemModelRolesUpdater still only tries to load the final icons for 200 milliseconds. However, just before an item is shown on the screen, Dolphin will check if its final icon has been determined already. If that is not the case, it will use the “preliminary” icon, which can be determined very fast. In many cases, the preliminary icons are actually equal to the final ones. Notable exceptions are directories with custom icons, and most of the files in /usr/bin.

Some changes in kdelibs were required to make this possible, see this discussion on the mailing list for details about this, and how “Poor Man’s Profiling” helped to solve the problem. Many thanks to David Faure for his help!

Thanks to this modification, Dolphin will never show you “unknown” icons again for items which do have a valid icon.

Moreover, Dolphin will not try to load icons/previews/etc. for all items in the directory asynchronously any more. It will just do that for the items which are currently visible, and those which can be reached quickly by scrolling up or down a few pages or to the top or bottom of the view (up to 500 at a time).

This greatly reduces the amount of memory and CPU cycles that we use to load and show a directory.

Some more information about the code changes that were required for this can be found in the corresponding review requests: 1, 2, 3, 4, 5, 6.

Posted in KDE

Some thoughts on algorithmic complexity, part 2: Be careful if you insert into or remove from a std::vector, QVector, or QList in a loop

This is the second part of a series of blog posts. Like in part 1, most or all of what I write here will be obvious to many readers. However, I have seen problems which are equivalent to those that I show here more than once in widely-used code, so I think it makes sense to make people aware of these issues and possible solutions.

If you are familiar with containers like std::vector already, feel free to skip the introduction.

Introduction

Different kinds of data structures, also called containers, are a core part of most applications. Depending on the programming language, various containers are provided by the language itself, by libraries which are shipped with the compiler (like the STL in the case of C++), or by libraries like Qt. Containers like vectors, linked lists, sets, hashes, etc., have quite different properties. Each container has advantages in some situations and disadvantages in others. If you want to know more about these, you might want to have a look at the relevant page in the Qt docs or the “Containers” page at cppreference.com for a general overview, or a Qt quarterly article and an excellent article by Marc Mutz for some very interesting details.

QVector, QList, and std::vector

In this post, I will focus on containers which store their elements in one contiguous block of memory. These have the following useful properties:

  • The items have a well-defined order (this is not the case, e.g., in a QSet).
  • Each item has an integer index: the item stored at the beginning of the block has the index 0, the next one has the index 1, and so on. Moreover, accessing an item with a certain index is very fast, no matter how many items there are. It takes O(1), or constant, time, in contrast to linked lists, which also have a well-defined order, but require O(N) time to access a particular element.
  • Appending items at the end also needs constant time. The containers reserve some free space at the end to make sure that appending many items one by one will not require an expensive memory reallocation every time. Removing the last item also needs constant time.

However, inserting or removing items at a random position takes O(N) time because all items after the inserted/removed item (on average, that is N/2 items) have to be moved.

Special case: QList

One important difference between QList and the other two containers is that it does not store the items themselves in the contiguous block of memory if an item requires more size than a pointer. It rather stores pointers to the actual items in the contiguous block then. Moreover, it never uses less than the size of a pointer (4 or 8 bytes on a 32-bit or 64-bit platform, respectively) for an item. These properties can be beneficial in some situations (for example, it enables QLists with different data types to share much of their code in the binary, and it makes moving large items inside the list faster), but storing many bools in a QList might waste a lot of memory.

Another difference is that QList can even add items at the beginning or remove the first item in O(1) time because it also reserves some memory at the beginning. When inserting/removing at a random position, it checks if there are less items to the left or to the right and moves the smaller part of the memory block. Therefore, only N/4 items have to be moved on average, but that is still O(N).

Removing many items can be very slow if implemented incorrectly

Let’s assume that we have stored some items of a type T in a QVector, and we have to remove all items for which a function filter(T) returns true.

Simple example: remove all odd numbers from a QVector with many numbers.

remove

The bad solution

A possible solution would be to go through all numbers and remove any odd number that we find:

template<class Container, class Filter>
void remove1(Container& c, Filter filter)
{
    auto it = c.begin();

    while (it != c.end()) {
        if (filter(*it)) {
            it = c.erase(it);
        } else {
            ++it;
        }
    }
}

To test the efficiency of the function, I wrote a small benchmark that creates a sorted QVector containing the integers between 1 and N, and then removes either the first half or the second half of these numbers, or all odd numbers. The results are shown in this plot (the y-axis shows the time in milliseconds):

plot1

It turns out that the function is quite inefficient and has O(N^2) complexity in all test cases. The reason should be clear: every time an item is removed, all items after that one are moved one position to the front. In other words, O(N) items are moved O(N) times, so we get O(N^2) behaviour. What actually happens is shown here:

removeBad

Note that we could fix the “remove many items from the end” case by going through the items in reverse order (remember that removing the last item is fast). However, this would not help much in the other cases.

Good solution 1

Rather than moving all following items one position to the front when an item is removed, we could just replace it with the last item. This guarantees that removing a single item will always cause at most one other item to be moved:

template<class Container, class Filter>
void remove2(Container& c, Filter filter)
{
    auto it = c.begin();
    auto end = c.end();

    while (it != end) {
        if (filter(*it)) {
            --end;
            *it = std::move(*end);
            c.erase(end);
        } else {
            ++it;
        }
    }
}

The new function has linear complexity for all types of test cases:

plot2

The obvious disadvantage is that the relative order of the items is not preserved. However, if the order does not matter, then this solution might be worth considering, in particular if only a few items have to be removed (note that removing a single item is always O(1) with this solution).

Good solution 2

We can achieve linear complexity and still preserve the sort order. The solution is to make sure that every item is moved at most once: we move an item to the front only when its final position is known.

removeGood

Implementing this idea is quite straightforward, but it’s not even necessary, because this is just what std::remove_if does.

template<class Container, class Filter>
void remove3(Container& c, Filter filter)
{
    const auto oldEnd = c.end();
    const auto newEnd = std::remove_if(c.begin(), oldEnd, filter);
    c.erase(newEnd, oldEnd);
}

In our test cases, this is faster than the “replace by the last item” solution by a factor of 2:

plot3

Final remarks

  • If you insert many items into an existing std::vector, QVector, or QList, similar considerations apply.
  • Such problems really do exist in the software you use, even though they might not cause any serious trouble as long as you don’t let applications handle a large data set that triggers the worst case behaviour. I’ve fixed a few such problems in Dolphin recently, and I will tell you about more problems of this kind in the next post.
  • Thanks for all the interesting feedback that I got after my last post! In particular, I would like to mention Emilio Coppa from the University of Rome here. He works on aprof, a Valgrind-based tool that can help to find inefficient functions like those that I presented here. Just run your program inside this tool and make sure that all interesting functions are fed with various amounts of input data. I ran my benchmark in aprof (I just made it consider a larger set of different vector sizes to make the output look nicer and decreased the overall size of the vectors to reduce the runtime on Valgrind’s simulated CPU). Here is the visualization of the output for the function remove1:
    aprof-remove1
    And the same for remove2:
    aprof-remove2
    Note that you don’t have to write a benchmark at all to get such results with aprof. You can just run your application normally, make sure that it handles inputs with different sizes, and then you can spot the inefficient functions quite quickly (provided that you trigger the inefficient behavior of these functions). If you want to know how the tool works, have a look at this paper. It found a bug that was basically just the strlen issue that I discussed in my last post in the word frequency counter “wf”.
  • Just to prevent misunderstandings: I’m not saying that algorithms with O(N^2) worst-case complexity are always evil. If it is guaranteed that N will never get really large or that the input data will never trigger a bad worst-case behaviour, then there may be reasons why such an algorithm might be preferable.

Dolphin bugs fixed in June 2013

The last bug fix update for Dolphin 2.2 has been shipped recently. Here is a summary of the changes that you can enjoy if you upgrade:

  • Bug 320823: Rename the right item when items are added to the view while renaming. See git commit 4de9a233, review request 110909.
  • Bug 319912: Do not rename files unexpectedly when changing the URL while renaming. See git commit 5948dc0f, review request 110908.
  • Bug 310705, bug 312014: Ensure that switching to “Sort by Type” takes effect without reloading the view. See git commit 735c046d, review request 111004.
  • Bug 320897: Prevent that the rubber band selection rectangle gets a width or height of zero pixels. See git commit 22d93e8f, review request 111144.
  • Bug 321234: Allow renaming multiple files without providing one or more ‘#’ as a placeholder for a number if all extensions are different. This was an “accidental” feature that we were not aware of at all until a user reported that it had been broken by a recent bug fix. See git commit 6f2e1403, review request 111079.
  • Bug 316209: Prevent possible infinite recursion in the ViewProperties constructor, which caused crashes on MacOS. See git commit 04229f8f, review request 111182.
  • Bug 307336: When dragging a file from a search result to KMail, use the correct URL. See git commit c847aacf, review request 111209.
  • Bug 321198: Make searching recent files work correctly. See git commit 25a2255e.

Moreover, Dawit Alemayehu, who has fixed an amazing number of kdelibs bugs that affect file management, made two more commits last month which:

These improvements were contributed and/or tested by Dawit Alemayehu, Emmanuel Pescosta, Kurt Hindenburg, and Vishesh Handa. Thanks for your help! If I forgot anyone, please let me know.

Unfortunately, I’m unable to attend Akademy, but I hope that everyone who is in Bilbao has a good time there!

Posted in KDE