Hot Spots

thumbnails are recalculated a lot
the histogram docker calculates even when hidden
brush outline seems slow
the calculation of the mask for the autobrush is very slow and doesn't cache anything
caching a whole row or column of tiles in the h/v line iterators should speed up things a lot
tile engine 1 has the BKL; tile engine 2 cannot swap yet and isn't optimized yet
projection recomposition doesn't take the visible area into account
pigment preloads all profiles (startup hit)
gradients are calculated on load, instead of being associated with a png preview image that is cheap to load
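
To illustrate the caching idea mentioned for the autobrush mask above, here is a minimal sketch. Everything in it is hypothetical: Mask, computeMask() and MaskCache are illustration names, not Krita classes, and the key is assumed to be just the diameter, whereas a real autobrush mask probably depends on more parameters.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

using Mask = std::vector<std::uint8_t>;

// Counts how often the expensive computation actually runs.
std::size_t g_masksComputed = 0;

// Hypothetical expensive computation: a hard-edged circular mask
// of diameter x diameter pixels (255 inside the circle, 0 outside).
Mask computeMask(int diameter) {
    ++g_masksComputed;
    Mask mask(static_cast<std::size_t>(diameter) * diameter, 0);
    const double center = (diameter - 1) / 2.0;
    const double radius = diameter / 2.0;
    for (int y = 0; y < diameter; ++y) {
        for (int x = 0; x < diameter; ++x) {
            const double dx = x - center;
            const double dy = y - center;
            if (dx * dx + dy * dy <= radius * radius)
                mask[static_cast<std::size_t>(y) * diameter + x] = 255;
        }
    }
    return mask;
}

// Hypothetical cache: computes each mask once and hands out the stored copy afterwards.
class MaskCache {
public:
    const Mask& get(int diameter) {
        auto it = m_cache.find(diameter);
        if (it == m_cache.end())
            it = m_cache.emplace(diameter, computeMask(diameter)).first;
        return it->second;
    }
private:
    std::map<int, Mask> m_cache; // assumed key: the diameter only
};
```

With such a cache, repeated requests for the same diameter return the stored mask instead of recomputing it. Whether this pays off depends on how often the same parameters recur while painting.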

Tools

Valgrind

Tips

Only turn on instrumentation when you need it, i.e. just before the function you want to optimize. You can use callgrind_control to control Valgrind. For instance, to stop instrumentation:

callgrind_control -i off

And then to activate it:

callgrind_control -i on

Unless you want to optimize startup, use the following startup line, which switches off instrumentation until a call to "callgrind_control -i on":

valgrind --tool=callgrind --instr-atstart=no krita

Sysprof

mutrace

mutrace is a tool that counts how much time is spent waiting for a mutex to unlock.

Easy optimization

As soon as you see slow code, look at it to see whether it creates a lot of unnecessary objects. 90% of the time, slow code is caused by this (the remaining 10% is often caused by many accesses to the tiles manager, as with the random accessor).

For instance:

Avoid:

for(whatever)
{
        QColor c;
        ...
}

Do:

QColor c;
for(whatever)
{
        ...
}

It might seem insignificant, but it really is not: in a loop of a million iterations, this gets very expensive.
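
The difference can be made visible with a small sketch. Scratch is a hypothetical stand-in for any class whose constructor does real work (here it allocates a buffer); it is not a Qt or Krita class, and the construction counter exists only for illustration.

```cpp
#include <cstddef>
#include <vector>

// Counts how many Scratch objects have been constructed.
std::size_t g_constructions = 0;

// Hypothetical class with a non-trivial constructor (it allocates).
struct Scratch {
    std::vector<int> buffer;
    Scratch() : buffer(64, 0) { ++g_constructions; }
};

// Constructs (and destroys) a Scratch on every iteration.
long sumInsideLoop(int iterations) {
    long total = 0;
    for (int i = 0; i < iterations; ++i) {
        Scratch s;
        s.buffer[0] = i;
        total += s.buffer[0];
    }
    return total;
}

// Constructs one Scratch before the loop and reuses it.
long sumHoisted(int iterations) {
    long total = 0;
    Scratch s;
    for (int i = 0; i < iterations; ++i) {
        s.buffer[0] = i;
        total += s.buffer[0];
    }
    return total;
}
```

Both functions return the same value, but the first one pays for one construction and one destruction per iteration, while the second pays for a single pair in total.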

Another example:

Avoid:

for(y = 0 to height)
{
        KisHLineIterator it = dev->createHLineIterator(0, y, width);
        for(whatever)
        {
                ...
        }
}

Do:

KisHLineIterator it = dev->createHLineIterator(0, 0, width); // start at row 0; y is not defined yet
for(y = 0 to height)
{
        for(whatever)
        {
                ...
        }
        it.nextRow(); // or nextCol() if you are using a VLine iterator
}
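
The same pattern, written as a self-contained sketch. ToyHLineIterator is a hypothetical, heavily simplified stand-in that walks a row-major std::vector; it is not Krita's KisHLineIterator, and its interface (isDone(), value(), next(), nextRow()) is an assumption made for this example.

```cpp
#include <cstddef>
#include <vector>

// Counts how many iterators have been created.
std::size_t g_iteratorsCreated = 0;

// Hypothetical horizontal line iterator over a row-major pixel buffer.
class ToyHLineIterator {
public:
    ToyHLineIterator(std::vector<int>& pixels, int x, int y, int width, int stride)
        : m_pixels(pixels), m_x0(x), m_x(x), m_y(y), m_width(width), m_stride(stride)
    {
        ++g_iteratorsCreated;
    }
    bool isDone() const { return m_x >= m_x0 + m_width; }
    int& value() { return m_pixels[static_cast<std::size_t>(m_y) * m_stride + m_x]; }
    void next() { ++m_x; }
    // Moves to the start of the next row instead of requiring a new iterator.
    void nextRow() { ++m_y; m_x = m_x0; }
private:
    std::vector<int>& m_pixels;
    int m_x0, m_x, m_y, m_width, m_stride;
};

// Creates a new iterator for every row.
void fillRecreating(std::vector<int>& pixels, int width, int height, int v) {
    for (int y = 0; y < height; ++y) {
        ToyHLineIterator it(pixels, 0, y, width, width);
        while (!it.isDone()) { it.value() = v; it.next(); }
    }
}

// Creates one iterator and advances it with nextRow().
void fillReusing(std::vector<int>& pixels, int width, int height, int v) {
    ToyHLineIterator it(pixels, 0, 0, width, width);
    for (int y = 0; y < height; ++y) {
        while (!it.isDone()) { it.value() = v; it.next(); }
        it.nextRow();
    }
}
```

Both functions fill the buffer identically; the second creates a single iterator regardless of the image height. In the toy version construction is cheap, so the saving only matters when creating the real iterator is costly.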

Vector instructions

* reference about MMX on Intel's website
* Fundamentals of Media Processor Designs: introduction to the use of MMX/SSE instructions
* Software Optimization Guide for AMD64
* STL like programming but using MMX/SSE{1,2,3} when available

Links

TCMalloc: a malloc replacement that makes allocating objects faster by caching reserved parts of memory.
Optimizing CPP: an extensive manual on writing optimized code.