Optimize multithreading in Krita's Tile Manager
- Project Name: Optimize multithreading in Krita's Tile Manager
- Proposal: Project proposal
- Abstract: Nowadays almost everybody has a pc with multiple cores, this opens doors to parallel programming, when 2 or more tasks are executing simultaneously, allowing for better performance. And thus software must be written by taking this into account. In single threaded execution of a program there is no need to monitor shared resources, because it is guaranteed that only one thread can access resource at a given point in time. But in multithreaded program flow it is a common problem that resources must be shared between threads, futhermore, situations such as dirty read/write, etc. must be excluded for normal program behavior. So the simplest solution is to use locks on operations that must access shared resources so that only one thread can perform read/write operations on resources. This works good in low concurrent environments, but starts to show drop downs when more threads are added. To avoid that, one can use so called lock-free idiom, when no locks are held to access shared resources and this is basically the main project idea, to rewrite logic from using mutexes on read/write locks and lock free methods.
Optimize multithreading in Krita's Tile Manager Tasks
- Lock free hash table insted of old blocking one "implemented, merged" T8874
- Lock fixing in tiles data manager "implemented, merged" T9130
- Scheduler optimization "suspended" T9268
- 55 Commits total were merged/pushed in master.
Status report on each goal implementation.
Goal: The goal is to change this blocking table for a lock free one to relieve threads waiting on its methods.
Krita's canvas consists of special objects called KisTile, each has column and row, which can be negative. These objects are stored in a hash table, resembling sparse matrix and each method is guarded by read/write lock. So threads tend to wait a lot to do some action.
Current Status This is one of the trickiest parts of the project, and there were a lot of bugs related to this. There were suckh problems as: undo/redo broke, crashes etc. After merge, some people who use krita's nightly builds tested how program works with new hash table and reported several bugs that were fixed:
These bugs current status is FIXED, some were reported to happen again, but were fixed with latest commit.
TODO This feature is still to be tested thoroughly by people using krita, and there still can be some bugs. However most dangerous places were fixed.
Goal: This one is about locks in tiled manager. There are 3 hot places total that must be fixed.
Three main spots are:
- KisTiledExtentManager's blocking accesses to QMap.
- KisTiledDataStore's blocking accesses to QList.
- KisTileData's memory allocation using boost::pool to avoid memory fragmentation.
Current Status KisTileData's lock was fixed by introducing cache based on lock free stack, we push data on delete and pop on new. KisTiledDataStore's lock was protecting QList, that is not good for concurrent environment, it was changed for lock free map. KisTiledExtent manaer was the hardest one, cause due to its logic, it was hard to simply use concurrent container, so another approach was invented what helped to get rid of threads waiting. For now these features were tested for about a month and there were no major issues.
TODO I don't think that there is something to be done here.
Goal: Each multithreaded application needs a scheduler for jobs, this part is about optimizing it.
There are a lot of issues to pay attention to, better read about them here: stroke docs
Current Status Algorithm parts are deeply connected to each other, that prevents executing it concurrently. Main issues:
- QT signals use mutexes in internal work processing.
- KisUpdaterContext (class that stores information about jobs currently executing and that manages work schedule on QThreadPool) can be used only under mutex.
- KisStrokesQueue and KisSimpleUpdateQueue operate on non friendly container in concurrent environment and each item of container is also not guarded.
QT signals were fixed by simple brute force and KisUpdaterContext was rewritten to be more concurrent friendly. But issue with queues could't be solved by simply changing internal containers for lock free analogs and by the time being are not fixed.
TODO Simple hacks didn't prove to fix anything thread related here, only caused errors, so the problem itself must be fixed on a higher level by introducing new concurrent friendly and interfaces, changing jobs processing algorithm and maybe adding more job types to increase concurrent code execution.
GSoC work report chronicle
During community bounding period I still had university going on and was focused mainly on it. I didn't have any problems communicating with mentors/people on IRC though. They always answered if i had any questions.
I really started coding somewhere in the end of May, first part of my project was to port some lock free hash table on Qt's internal classes and to embed it into krita. I choose one from library called junction, written by Jeff Preshing. I have already ported it before GSoC sarted so it was easy to copy and paste it into krita's 3dparty sources. Then I begun to write a wrapper around it, so that I could easily switch interfaces. It was really tough, cause at that time my concurrent programming analyzing skills were very low and complexity was added by special object relation used in krita. I struggled with it about a month and then finally merged it into master, which leaded to some nasty bugs, described higher. Then I started to fix locks in tiled manager.
Cache in KisTileData (object that each tile references to) was the simplest one, cause krita already had lock free stack implemented, I just used it there. After that I went on fixing mutex guarding access to QList in KisTileDataStore (created and manages KisTileData). Firstly, I tried RW lock, but it didn't give any chances of success, then was made an attempt to introduce cache there too, but it failed. At last QList was changed for lock free map, with static counter indicating current data number. The hardest part was related to KisTiledExtentManager (stores and updates image extent that must be changed). The idea of this one is to manage current image extent, there were 2 QMaps that stored current used tiles, one for rows and one for columns. The when update was needed, method took min and max indexes from each map and was building extent based on their values. Concurrent map that I was using didn't have necessary methods, so I decided to go another way. I allocated an array of atomics, where each array index resembles used tile and was doing increment on add and decrement on remove. If there was no space or index was large/small to be stored there, array was growing by the power of 2. Every access was made under RW lock so that threads won't wait. there.
The last one is associated with scheduling jobs, there are 4 hot spots that had to be fixed, I managed to get rid of 2 of them, but last 2 are quite deeply connected with scheduling algorithm. As part of GSoC, I did the following tries to fix concurrency: made classes lock free and introduced RW locks, but they failed because scheduler algorithm parts are deeply connected to each other and it was quite hard to break that connection into independent pieces. There is some work on that, but last 2 spots are really tough, maybe they must be fixed on higher level by introducing new interfaces. So there is still job to be done.
Things left to do
The project is finished by 2/3, work scheduling algorithm still needs more research and testing, and it may lead to rewriting whole code itself.
Tables and graphs
Vtune output with all merged fixes