Krita/Benchmarking

From KDE Community Wiki

Benchmarking Krita performance

Tile engine

Data Manager

  • Image dimensions for the tests: 4096x4096, RGB
  • I executed every test a few times and kept the result that recurred most often
  • The Qt Test Lib callgrind backend did not produce callgrind.* files, so I ran valgrind directly; the resulting profiles therefore also include the Qt Test Lib benchmarking code
  • Callgrind output: http://lukast.mediablog.sk/callgrind/DatamanagerBenchmarks.tar.gz
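The second bullet describes reporting, for each benchmark, the result that recurred most often across runs (the mode). Below is a minimal sketch of that selection. It assumes results are compared after rounding to 0.1 msec, which is the resolution used in most of the tables here. `mostFrequentResult` is a hypothetical helper and is not part of the Krita benchmark code.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <vector>

// Hypothetical helper: given the wall times (msec) from several runs of the
// same benchmark, return the value that recurs most often after rounding to
// 0.1 msec. Ties are resolved in favour of the smaller value.
double mostFrequentResult(const std::vector<double>& runsMsec)
{
    assert(!runsMsec.empty());
    std::map<long, int> counts;  // key: wall time in tenths of a millisecond
    for (double msec : runsMsec) {
        ++counts[std::lround(msec * 10.0)];
    }
    long bestKey = counts.begin()->first;
    int bestCount = 0;
    for (const auto& entry : counts) {
        // Strict '>' keeps the smallest key when several values tie.
        if (entry.second > bestCount) {
            bestKey = entry.first;
            bestCount = entry.second;
        }
    }
    return bestKey / 10.0;
}
```

For example, four runs of 38.0, 39.1, 38.0 and 40.2 msec would be reported as 38.0 msec.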
Benchmark name | Wall time | Tick counter | Mb/s
benchmarkWriteBytes | 38.0 msec per iteration (total: 380, iterations: 10) | 77,528,468.2 ticks per iteration (total: 775284683, iterations: 10) | 1333.3 Mb/s
benchmarkReadBytes | 39.3 msec per iteration (total: 394, iterations: 10) | 77,311,910.2 ticks per iteration (total: 773119103, iterations: 10) | 1628.4 Mb/s
benchmarkReadWriteBytes | 46.2 msec per iteration (total: 462, iterations: 10) | 91,198,881.7 ticks per iteration (total: 911988817, iterations: 10) | 1391.3 Mb/s
benchmarkExtent | 0.00020 msec per iteration (total: 34, iterations: 163840) | 735.0 ticks per iteration (total: 7350, iterations: 10) | N/A
benchmarkClear | 1.3 msec per iteration (total: 26, iterations: 20) | 2,542,070.2 ticks per iteration (total: 25420702, iterations: 10) | N/A
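Most of the Mb/s figures in the data manager and iterator tables can be reproduced by dividing the size of the test image by the wall time per iteration. This works if you assume 4 bytes per pixel (8-bit RGBA, because Krita's 8-bit RGB color space also stores alpha) and count a megabyte as 2^20 bytes. Under that assumption, 4096 x 4096 x 4 bytes = 64 MiB, and 64 / 0.0393 s ≈ 1628.5 matches benchmarkReadBytes. The constants and the function name below are illustrative assumptions and do not come from the benchmark source.

```cpp
#include <cassert>

// Assumption: 4096x4096 image, 4 bytes per pixel (8-bit RGBA).
constexpr double kImageBytes = 4096.0 * 4096.0 * 4.0;  // 64 MiB
constexpr double kBytesPerMiB = 1024.0 * 1024.0;

// Throughput in MiB/s for one pass over the whole image that took
// msecPerIteration milliseconds of wall time.
double throughputMiBps(double msecPerIteration)
{
    assert(msecPerIteration > 0.0);
    const double seconds = msecPerIteration / 1000.0;
    return (kImageBytes / kBytesPerMiB) / seconds;
}
```

For instance, throughputMiBps(1383.4) comes out at about 46.3, which is the figure listed for the horizontal iterator's benchmarkWriteBytes. The formula does not match every row (the filter benchmarks, for example), so treat it as a reading aid and not as the benchmark's definition of Mb/s.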

Iterators

Horizontal Iterator

Benchmark name | Wall time | Tick counter | Mb/s
benchmarkWriteBytes | 1,383.4 msec per iteration (total: 13834, iterations: 10) | 4,389,801,089.3 ticks per iteration (total: 43898010893, iterations: 10) | 46.3 Mb/s
benchmarkReadBytes | 1,443.2 msec per iteration (total: 14433, iterations: 10) | 4,461,418,645.5 ticks per iteration (total: 44614186455, iterations: 10) | 44.4 Mb/s
benchmarkConstReadBytes | 1,380.7 msec per iteration (total: 13808, iterations: 10) | 4,501,257,062.3 ticks per iteration (total: 45012570623, iterations: 10) | 46.3 Mb/s
benchmarkReadWriteBytes | 2,041.7 msec per iteration (total: 20418, iterations: 10) | 5,736,531,494.3 ticks per iteration (total: 57365314943, iterations: 10) | 31.3 Mb/s
benchmarkNoMemCpy | 655.7 msec per iteration (total: 6557, iterations: 10) | 3,025,535,970.6 ticks per iteration (total: 30255359707, iterations: 10) | 97.7 Mb/s
benchmarkConstNoMemCpy | 583.7 msec per iteration (total: 5837, iterations: 10) | 2,889,942,765.8 ticks per iteration (total: 28899427658, iterations: 10) | 109.6 Mb/s
benchmarkTwoIteratorsNoMemCpy | 1,205.7 msec per iteration (total: 12057, iterations: 10) | 3,952,530,421.5 ticks per iteration (total: 39525304215, iterations: 10) | 53.1 Mb/s


Update (state: trunk, 17 Feb 2010, 15:38)

Benchmark name | Wall time | Mb/s
benchmarkWriteBytes | 1,548.0 msec per iteration (total: 15481, iterations: 10) | 41.34 Mb/s
benchmarkReadBytes | 3,087.8 msec per iteration (total: 30878, iterations: 10) | 20.73 Mb/s
benchmarkConstReadBytes | 3,062.0 msec per iteration (total: 30620, iterations: 10) | 20.90 Mb/s
benchmarkReadWriteBytes | 3,725.0 msec per iteration (total: 37251, iterations: 10) | 17.18 Mb/s
benchmarkNoMemCpy | 2,264.4 msec per iteration (total: 22644, iterations: 10) | 28.26 Mb/s
benchmarkConstNoMemCpy | 2,316.8 msec per iteration (total: 23168, iterations: 10) | 27.62 Mb/s
benchmarkTwoIteratorsNoMemCpy | 2,950.0 msec per iteration (total: 29501, iterations: 10) | 21.69 Mb/s

Update (state: caching patch applied to trunk; speedups are relative to the 17 Feb 2010 trunk results above)

Benchmark name | Wall time | Mb/s | Speedup
benchmarkWriteBytes | 1,211.4 msec per iteration (total: 12114, iterations: 10) | 52.83 Mb/s | 1.28
benchmarkReadBytes | 1,196.2 msec per iteration (total: 11962, iterations: 10) | 53.50 Mb/s | 2.58
benchmarkConstReadBytes | 1,202.2 msec per iteration (total: 12022, iterations: 10) | 53.24 Mb/s | 2.55
benchmarkReadWriteBytes | 1,563.0 msec per iteration (total: 15631, iterations: 10) | 40.95 Mb/s | 2.38
benchmarkNoMemCpy | 389.1 msec per iteration (total: 3891, iterations: 10) | 164.48 Mb/s | 5.82
benchmarkConstNoMemCpy | 372.5 msec per iteration (total: 3725, iterations: 10) | 171.81 Mb/s | 6.22
benchmarkTwoIteratorsNoMemCpy | 670.3 msec per iteration (total: 6704, iterations: 10) | 95.48 Mb/s | 4.4
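Each speedup is the ratio of the wall time before the caching patch to the wall time after it, for the same benchmark. The sketch below recomputes the ratios from the two horizontal iterator tables. For example, benchmarkConstReadBytes gives 3062.0 / 1202.2 ≈ 2.55. The `BenchmarkPair` struct and the `speedup` function are hypothetical names used only for illustration, and the numbers are copied from the tables.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One benchmark measured before and after a change (msec per iteration).
struct BenchmarkPair {
    std::string name;
    double baselineMsec;  // trunk, 17 Feb 2010
    double patchedMsec;   // trunk with the caching patch applied
};

// Speedup = baseline time / patched time; values above 1 mean the patch helps.
double speedup(const BenchmarkPair& pair)
{
    assert(pair.baselineMsec > 0.0 && pair.patchedMsec > 0.0);
    return pair.baselineMsec / pair.patchedMsec;
}

// Wall times copied from the two horizontal iterator tables.
const std::vector<BenchmarkPair> kHorizontalIterator = {
    {"benchmarkWriteBytes", 1548.0, 1211.4},
    {"benchmarkReadBytes", 3087.8, 1196.2},
    {"benchmarkConstReadBytes", 3062.0, 1202.2},
    {"benchmarkReadWriteBytes", 3725.0, 1563.0},
    {"benchmarkNoMemCpy", 2264.4, 389.1},
    {"benchmarkConstNoMemCpy", 2316.8, 372.5},
    {"benchmarkTwoIteratorsNoMemCpy", 2950.0, 670.3},
};
```

Working from wall times avoids the rounding already present in the Mb/s column. Both columns give the same ratio because the amount of data per iteration does not change.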

Vertical Iterator

Benchmark name | Wall time | Tick counter | Mb/s
benchmarkWriteBytes | 1,541.9 msec per iteration (total: 15419, iterations: 10) | Not measured | 41.52 Mb/s
benchmarkReadBytes | 1,534.4 msec per iteration (total: 15344, iterations: 10) | Not measured | 41.7 Mb/s
benchmarkConstReadBytes | 1,460.5 msec per iteration (total: 14606, iterations: 10) | Not measured | 43.82 Mb/s
benchmarkReadWriteBytes | 2,156.3 msec per iteration (total: 21563, iterations: 10) | Not measured | 29.7 Mb/s
benchmarkNoMemCpy | 649.0 msec per iteration (total: 6490, iterations: 10) | Not measured | 98.6 Mb/s
benchmarkConstNoMemCpy | 599.3 msec per iteration (total: 5994, iterations: 10) | Not measured | 106.7 Mb/s
benchmarkTwoIteratorsNoMemCpy | 1,231.5 msec per iteration (total: 12316, iterations: 10) | Not measured | 52 Mb/s

Rectangular Iterator

Benchmark name | Wall time | Mb/s
benchmarkWriteBytes | 118.2 msec per iteration (total: 1182, iterations: 10) | 541.4 Mb/s
benchmarkReadBytes | 121.7 msec per iteration (total: 1217, iterations: 10) | 525.9 Mb/s
benchmarkConstReadBytes | 120.5 msec per iteration (total: 1205, iterations: 10) | 533.3 Mb/s
benchmarkReadWriteBytes | 167.0 msec per iteration (total: 1670, iterations: 10) | 383.2 Mb/s
benchmarkNoMemCpy | 35.7 msec per iteration (total: 358, iterations: 10) | 1792.7 Mb/s
benchmarkConstNoMemCpy | 37.7 msec per iteration (total: 377, iterations: 10) | 1697.6 Mb/s
benchmarkTwoIteratorsNoMemCpy | 65.2 msec per iteration (total: 652, iterations: 10) | 981.6 Mb/s

Random Iterator

Benchmark name | Wall time | Mb/s
benchmarkWriteBytes | 1,641.5 msec per iteration (total: 16415, iterations: 10) | 39.0 Mb/s
benchmarkReadBytes | 1,598.5 msec per iteration (total: 15985, iterations: 10) | 40.0 Mb/s
benchmarkConstReadBytes | 1,654.5 msec per iteration (total: 16545, iterations: 10) | 38.68 Mb/s
benchmarkReadWriteBytes | 2,934.8 msec per iteration (total: 29348, iterations: 10) | 21.8 Mb/s
benchmarkNoMemCpy | 971.3 msec per iteration (total: 9714, iterations: 10) | 65.9 Mb/s
benchmarkConstNoMemCpy | 938.6 msec per iteration (total: 9386, iterations: 10) | 68.2 Mb/s
benchmarkTwoIteratorsNoMemCpy | 1,929.7 msec per iteration (total: 19298, iterations: 10) | 33.2 Mb/s
benchmarkTileByTileWrite | 1,310.0 msec per iteration (total: 13101, iterations: 10) | 48.9 Mb/s
benchmarkTotalRandom | 27,999 msec per iteration (total: 27999, iterations: 1) | 2.2 Mb/s
benchmarkTotalRandomConst | 29,124 msec per iteration (total: 29124, iterations: 1) | 2.2 Mb/s

KisPainter

Composition (bitBlt)

Benchmark name | Wall time | Mb/s
benchmarkBitBlt | 5,456.8 msec per iteration (total: 54569, iterations: 10) | 234.6 Mb/s
benchmarkBitBltSelection | 5,922.8 msec per iteration (total: 59228, iterations: 10) | 216.1 Mb/s
benchmarkFixedBitBlt | 3,635.5 msec per iteration (total: 36356, iterations: 10) | 352.1 Mb/s
benchmarkFixedBitBltSelection | 5,342.1 msec per iteration (total: 53421, iterations: 10) | 239.6 Mb/s

Filters

Brightness/Contrast

Benchmark name | Wall time | Mb/s
benchmarkFilter | 1,783.5 msec per iteration (total: 17835, iterations: 10) | 14.47 Mb/s

Blur


Benchmark name | Wall time | Mb/s
benchmarkFilter | 31,674 msec per iteration (total: 31674, iterations: 1) | 0.81 Mb/s

Projection

Everything is benchmarked in one go.

Benchmark name | Wall time | Mb/s
benchmarkProjection | 834.6 msec per iteration (total: 8346, iterations: 10) | N/A

Painting strokes

  • We paint on an empty 4096x4096 paint device
  • The brush used is a 70 px pixel brush with the default autobrush
  • The benchmark can run with any paintop; only the preset needs to change
  • The first test paints the stroke shown in the preset preview box, at a different scale, on the 4096x4096 px image
  • The second test paints 20 random lines (the same 20 lines in every run) with pressure varying from 0.0 to 1.0
  • Callgrind output: http://lukast.mediablog.sk/callgrind/strokeBenchmarks.tar.gz [TODO: add bounds result]
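To paint "the same 20 lines in every run", the random number generator has to be seeded deterministically. The sketch below shows one way to do that. `makeBenchmarkLines`, the seed value, and the linear pressure ramp along each line are all hypothetical, and the real benchmark may generate its lines and pressure differently. The std::mt19937 engine is fully specified by the C++ standard. std::uniform_real_distribution is not, so the lines are identical between runs of one build but not necessarily across standard libraries.

```cpp
#include <cassert>
#include <random>
#include <vector>

struct Point { double x; double y; };
struct Line { Point start; Point end; };

// Hypothetical generator: `count` lines inside a width x height canvas.
// The fixed seed makes every call return exactly the same lines.
std::vector<Line> makeBenchmarkLines(int count = 20, double width = 4096.0,
                                     double height = 4096.0, unsigned seed = 12345)
{
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> xDist(0.0, width);
    std::uniform_real_distribution<double> yDist(0.0, height);
    std::vector<Line> lines;
    lines.reserve(count);
    for (int i = 0; i < count; ++i) {
        // Braced initialisers are evaluated left to right, so the draw order is fixed.
        Point a{xDist(rng), yDist(rng)};
        Point b{xDist(rng), yDist(rng)};
        lines.push_back({a, b});
    }
    return lines;
}

// Assumed pressure profile: a linear ramp from 0.0 at the first sample of a
// line to 1.0 at its last sample.
double pressureAt(int sample, int samples)
{
    assert(samples >= 2 && sample >= 0 && sample < samples);
    return static_cast<double>(sample) / (samples - 1);
}
```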


Benchmark name | Wall time | Mb/s
benchmarkStroke | 2,962 msec per iteration (total: 2962, iterations: 1) | N/A
benchmarkRandomLines | 18,576 msec per iteration (total: 18576, iterations: 1) | N/A

First results

Computer specification

Compiler options

gcc -Wnon-virtual-dtor -Wno-long-long -ansi -Wundef -Wcast-align -Wchar-subscripts -Wall -W -Wpointer-arith -Wformat-security -fno-exceptions -DQT_NO_EXCEPTIONS -fno-check-new -fno-common -Woverloaded-virtual -fno-threadsafe-statics -fvisibility=hidden -fvisibility-inlines-hidden -O2 -g -fPIC -Wl,--enable-new-dtags

The CMake configuration has an option called KritaDevs, which is what I used for benchmarking. The compiler flags above were obtained with make VERBOSE=1.

First optimizations

With performance fix + FastMath::atan2

Benchmark name | Wall time | Mb/s
benchmarkStroke | 650.2 msec per iteration (total: 6503, iterations: 10) | N/A
benchmarkRandomLines | 4,158.8 msec per iteration (total: 41589, iterations: 10) | N/A
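Krita's FastMath::atan2 is not reproduced here. To illustrate the kind of trade-off a fast atan2 makes, the sketch below uses a well-known polynomial approximation of atan on [-1, 1], with a maximum error of about 0.0015 rad, and extends it to all four quadrants through symmetry. This is a swapped-in technique and not Krita's implementation. The names `fastAtanUnit` and `fastAtan2` are hypothetical.

```cpp
#include <cassert>
#include <cmath>

constexpr double kPi = 3.14159265358979323846;

// Polynomial approximation of atan(z), valid for |z| <= 1:
//   atan(z) ~= pi/4 * z - z * (|z| - 1) * (0.2447 + 0.0663 * |z|)
// Maximum absolute error is roughly 0.0015 rad.
double fastAtanUnit(double z)
{
    assert(std::fabs(z) <= 1.0);
    const double a = std::fabs(z);
    return (kPi / 4.0) * z - z * (a - 1.0) * (0.2447 + 0.0663 * a);
}

// atan2 built on fastAtanUnit: reduce the argument to [0, 1], then use
// symmetry to recover the full angle.
double fastAtan2(double y, double x)
{
    const double ax = std::fabs(x);
    const double ay = std::fabs(y);
    if (ax == 0.0 && ay == 0.0) {
        return 0.0;  // direction undefined; std::atan2(0.0, 0.0) also returns 0
    }
    double angle = (ax >= ay) ? fastAtanUnit(ay / ax)               // [0, pi/4]
                              : kPi / 2.0 - fastAtanUnit(ax / ay);  // (pi/4, pi/2]
    if (x < 0.0) angle = kPi - angle;  // mirror into the left half-plane
    if (y < 0.0) angle = -angle;       // mirror into the lower half-plane
    return angle;
}
```

The approximation replaces a library call with a handful of multiplications and one division. That trade is only worthwhile when roughly 0.1 degree of angular error is invisible in the result, for example when shaping brush masks.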

Cyrille's tuning commits around lunch

Benchmark name | Wall time | Mb/s
benchmarkStroke | 533.3 msec per iteration (total: 5334, iterations: 10) | N/A
benchmarkRandomLines | 3,555.5 msec per iteration (total: 35556, iterations: 10) | N/A


Just with performance fix

Benchmark name | Wall time | Mb/s
benchmarkStroke | 683.7 msec per iteration (total: 6838, iterations: 10) | N/A
benchmarkRandomLines | 4,696.3 msec per iteration (total: 46964, iterations: 10) | N/A

Computing only 1/4 of the mask for symmetrical brushes

Benchmark name | Wall time | Mb/s
benchmarkStroke | 257.3 msec per iteration (total: 2574, iterations: 10) | N/A
benchmarkRandomLines | 1,449.2 msec per iteration (total: 14492, iterations: 10) | N/A
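When a brush mask is symmetric about both its horizontal and vertical centre lines (an unrotated circular autobrush, for example), each value in one quadrant also appears at three mirrored positions. Evaluating only one quadrant therefore needs about a quarter of the mask-function calls. The sketch below uses a hard-edged circle as a hypothetical stand-in for the autobrush mask. The function names are illustrative, and Krita's actual change may be structured differently.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an autobrush mask: a hard-edged circle filling a
// diameter x diameter square, sampled at the pixel centre. Returns 0 or 1.
double circleMaskValue(int x, int y, int diameter)
{
    const double r = diameter / 2.0;
    const double dx = x + 0.5 - r;
    const double dy = y + 0.5 - r;
    return (dx * dx + dy * dy <= r * r) ? 1.0 : 0.0;
}

// Reference version: evaluates the mask function for every pixel.
std::vector<double> computeMaskFull(int diameter)
{
    assert(diameter > 0);
    std::vector<double> mask(diameter * diameter);
    for (int y = 0; y < diameter; ++y)
        for (int x = 0; x < diameter; ++x)
            mask[y * diameter + x] = circleMaskValue(x, y, diameter);
    return mask;
}

// Evaluates only the top-left quadrant (including the centre row/column for an
// odd diameter) and mirrors each value into the other three quadrants.
// Valid only for masks symmetric about both centre lines.
std::vector<double> computeMaskQuarter(int diameter)
{
    assert(diameter > 0);
    std::vector<double> mask(diameter * diameter);
    const int half = (diameter + 1) / 2;
    for (int y = 0; y < half; ++y) {
        const int mirroredY = diameter - 1 - y;
        for (int x = 0; x < half; ++x) {
            const int mirroredX = diameter - 1 - x;
            const double v = circleMaskValue(x, y, diameter);
            mask[y * diameter + x] = v;
            mask[y * diameter + mirroredX] = v;
            mask[mirroredY * diameter + x] = v;
            mask[mirroredY * diameter + mirroredX] = v;
        }
    }
    return mask;
}
```

For the 70 px brush used in these benchmarks, the quarter version calls the mask function 35 x 35 = 1225 times instead of 4900. A rotated or otherwise asymmetric brush would still need the full computation.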