Krita/Optimization: Difference between revisions
(→Easy optimization: fix a copy-pasto in example pseudocode) |
Miabrahams (talk | contribs) |
||
(One intermediate revision by the same user not shown) | |||
Line 15: | Line 15: | ||
=== Tips === | === Tips === | ||
You can tell callgrind to focus only on the part of the code you want to optimize. This results in cleaner data. For example, you may want to only monitor the performance when drawing a stroke. Unless the thing you're trying to optimize is the program startup, you can tell valgrind to run with the logging, or instrumentation, turned off at start: | |||
valgrind --tool=callgrind --instr-atstart=no krita | |||
Instrumentation can then be activated and deactivated with callgrind_control. To begin performance monitoring: | |||
callgrind_control -i on | |||
And then to end it: | |||
callgrind_control -i off | callgrind_control -i off | ||
I usually write a few aliases in my .bashrc (or .zshrc): | |||
alias callgrind="valgrind --tool=callgrind --instr-atstart=no" | |||
valgrind | alias callgrind-on="callgrind_control --instr=on" | ||
alias callgrind-off="callgrind_control --instr=off" | |||
== Sysprof == | == Sysprof == | ||
Line 81: | Line 87: | ||
= Vector instructions = | = Vector instructions = | ||
Krita takes heavy advantage of the [Vc https://github.com/VcDevel/Vc] library to speed up its brush strokes with CPU vector instructions. If you are planning to work with that library, it is worth reading through its documentation. | |||
There are more general introductions to what vector instructions are for, and how they work here. | |||
* [http://developer.intel.com/design/archives/processors/mmx/ reference about MMX on Intel's website] | * [http://developer.intel.com/design/archives/processors/mmx/ reference about MMX on Intel's website] |
Latest revision as of 06:26, 27 October 2015
Hot Spots
- thumbnails are recalculated a lot
- the histogram docker calculates even when hidden
- brush outline seems slow
- the calculation of the mask for the autobrush is very slow and doesn't cache anything
- caching a whole row or column of tiles in the h/v line iterators should speed up things a lot
- tile engine 1 has the BKL; tile engine 2 cannot swap yet and isn't optimized yet
- projection recomposition doesn't take the visible area into account
- pigment preloads all profiles (startup hit)
- gradients are calculated on load, instead of being associated with a png preview image that is cheap to load
Tools
Valgrind
Tips
You can tell callgrind to focus only on the part of the code you want to optimize. This results in cleaner data. For example, you may want to only monitor the performance when drawing a stroke. Unless the thing you're trying to optimize is the program startup, you can tell valgrind to run with the logging, or instrumentation, turned off at start:
valgrind --tool=callgrind --instr-atstart=no krita
Instrumentation can then be activated and deactivated with callgrind_control. To begin performance monitoring:
callgrind_control -i on
And then to end it:
callgrind_control -i off
I usually write a few aliases in my .bashrc (or .zshrc):
alias callgrind="valgrind --tool=callgrind --instr-atstart=no" alias callgrind-on="callgrind_control --instr=on" alias callgrind-off="callgrind_control --instr=off"
Sysprof
mutrace
mutrace is a tool that count how much time is spend waiting for a mutex to unlock.
Easy optimization
As soon as you see slow code, try to have a look at the code to see if we aren't creating a lot of unnecesserary objects, 90% of the time slow code is caused by this (the remain 10% are often caused by a lot of access to the tilesmanager, like with random accessor)
For instance:
- Avoid:
for(whatever) { QColor c; ... }
Do:
QColor c; for(whatever) { }
It might seems insignificant, but really it's not, on a loop of a milion of iterations, this is expensive as hell.
An other example:
- avoid
for(y = 0 to height) { KisHLineIterator it = dev->createHLineIterator(0, y, width); for(whatever) { ... } }
Do:
KisHLineIterator it = dev->createHLineIterator(0, 0, width); for(y = 0 to height) { for(whatever) { ... } it.nextRow(); // or nextCol() if you are using a VLine iterator }
Vector instructions
Krita takes heavy advantage of the [Vc https://github.com/VcDevel/Vc] library to speed up its brush strokes with CPU vector instructions. If you are planning to work with that library, it is worth reading through its documentation.
There are more general introductions to what vector instructions are for, and how they work here.
* reference about MMX on Intel's website * Fundamentals of Media Processor Designs: introduction to the use of MMX/SSE instructions * Software Optimization Guide for AMD64 * STL like programming but using MMX/SSE{1,2,3} when available
Profile guided optimization
Profile guided optimization is something else though. It is a special way of compiling and linking, that the compiler and linker use profiling information to know how best to optimize the code. So code that is used a lot is compiled with -O3 (the most optimizations), while code that is not used a lot gets -Os (to take less space), and so forth. This is a very useful technique that was not available on Linux until last year, and the news today is that Firefox now builds properly with it and there is a nice noticeable speed improvement for Linux users.
source: http://linux.slashdot.org/comments.pl?sid=2117150&cid=35987784
wikipedia: http://en.wikipedia.org/wiki/Profile-guided_optimization
g++ -O3 -march=native -pg -fprofile-generate ... //run my program's benchmark g++ -O3 -march=native -fprofile-use ...
Links
- Design for Performance : great read about performance optimization (aimed at game developers, but many tricks apply for Krita)
- TCMalloc: a malloc replacement which make faster allocation of objects by caching some reserved part of the memory
- Optmizing CPP: extensive manual on writing optimized code.