digiKam: Image Quality Sorter Improvement
Image Quality Sorter is a tool in digiKam. It helps users label images as accepted, pending, or rejected. However, the current implementation does not perform well. My proposal describes my approach to improving this functionality.
Mentors: Gilles Caulier, Maik Qualmann, Thanh Trung Dinh
GitLab development branch
Community Bonding is the time for preparation, and I am devoting my first post to this important period. Thanks to mentors Gilles Caulier, Maik Qualmann, and Thanh Trung Dinh for helping me prepare for this summer's work.
I have spent these 3 weeks reading the huge codebase of digiKam, setting up a coding environment, and studying in depth the algorithms that I am about to implement. This is the branch that will hold my work for GSoC21.
After reading enough documentation and mathematics, I started preparing a base for testing each functionality of Image Quality Sorter (IQS). I found this to be the most important task of the period: it gives me a tool to evaluate my future work without having to think again about how to test it. There are 2 types of unit tests: unit tests for each factor (for example detectionblur) and unit tests for pre-defined use cases.
Although these tests cannot pass yet, they will be useful in the future.
Blur detection (first week : June 07 - June 14)
The figure below outlines my algorithm.
As discussed in the proposal, there are 2 types of blur: motion blur and defocused blur. Since they differ in nature, each needs its own detector. Defocused blur appears when the edges of the object are fuzzy, while motion blur appears when the object or the camera moves during capture.
From observation, I found that defocused blur is the disappearance of edges, while motion blur is the appearance of many parallel edges in a small part of the image. Using the Laplacian edge detection of OpenCV, I extract an edge map of the image and use this map to detect both defocused blur and motion blur.
The advantage of the Laplacian is its sensitivity. It detects not only the main edges but also the small edges inside the object, so I can distinguish the sharp region that belongs to the object from the blurred region of the background.
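To illustrate why a Laplacian edge map separates sharp from blurred content, here is a minimal, dependency-free sketch. It is not the digiKam code, which relies on OpenCV: the 4-neighbour kernel, the function names, and the tiny test images are assumptions made for this example only.

```python
def laplacian_edge_map(gray):
    """Return |Laplacian| of a 2D list of gray levels (border pixels stay 0)."""
    height, width = len(gray), len(gray[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # 4-neighbour discrete Laplacian (assumed kernel for this sketch)
            response = (gray[y - 1][x] + gray[y + 1][x]
                        + gray[y][x - 1] + gray[y][x + 1]
                        - 4 * gray[y][x])
            edges[y][x] = abs(response)
    return edges


def edge_strength(gray):
    """Strongest edge response found in the image."""
    return max(max(row) for row in laplacian_edge_map(gray))


# A hard vertical step (sharp edge) versus the same step smeared into a ramp.
sharp = [[0, 0, 255, 255] for _ in range(4)]
blurred = [[0, 85, 170, 255] for _ in range(4)]

print(edge_strength(sharp), edge_strength(blurred))
```

The hard step produces a strong response, while the linear ramp produces none, which matches the observation that defocused blur shows up as disappearing edges.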
- Defocused detector:
- The edge map alone is not enough to detect defocused blur. To distinguish the sharp pixels of edges from blurred pixels, I apply a log filter (parameter: ordre_log_filtrer).
- Then, I smooth the image with the blur and medianBlur operators of OpenCV to get a blur map.
- Finally, I decide whether each pixel is blurred or sharp by using a threshold (parameter: filtrer_defocus).
- Motion blur detector:
- First, I keep only the clear edges from the edge map by using a threshold (parameter: edges_filtrer).
- Then, the image is divided into small parts of a pre-defined size (parameter: part_size_motion_blur).
- Next, I apply the Hough Line Transform to find every line in each part. This transform uses the parameters theta_resolution and threshold_hough to find lines in an image. Instead of the classic Hough transform (cv::HoughLines()), I use the Probabilistic Hough Line Transform (cv::HoughLinesP()). HoughLinesP uses more criteria than HoughLines to determine a line, including min_line_length, which helps to reject lines that are too short.
- Once the lines of a part are known, it is easy to calculate their angles. As mentioned in the proposal, a small part with many parallel lines is a motion-blurred part. The parameter min_nb_lines determines whether a part has enough lines to be a motion blur candidate. Then, I check whether these lines are parallel by calculating the standard deviation of their angles and comparing it to the parameter max_stddev.
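The last step of the motion blur detector can be sketched as follows. This is only an illustration in plain Python, not the digiKam implementation: the segment format (x1, y1, x2, y2) mirrors what a probabilistic Hough transform returns, and the default values of min_nb_lines and max_stddev are hypothetical.

```python
import math
import statistics


def line_angle(x1, y1, x2, y2):
    """Orientation of a segment in degrees, folded into [0, 180)."""
    return math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0


def is_motion_blurred_part(lines, min_nb_lines=3, max_stddev=5.0):
    """Decide whether one image part looks motion blurred.

    lines: list of (x1, y1, x2, y2) segments found in the part.
    The default thresholds are hypothetical values chosen for this sketch.
    """
    if len(lines) < min_nb_lines:
        return False  # too few lines to be a candidate
    angles = [line_angle(*line) for line in lines]
    # Near-parallel lines have a small spread of angles.
    # Caveat: angles close to 0 and 180 degrees are not unwrapped here.
    return statistics.pstdev(angles) <= max_stddev


parallel = [(0, 0, 10, 1), (0, 2, 10, 3), (0, 4, 10, 5), (0, 6, 10, 7)]
crossing = [(0, 0, 10, 0), (0, 0, 0, 10), (0, 0, 10, 10)]
print(is_motion_blurred_part(parallel), is_motion_blurred_part(crossing))
```

A real detector would also need to handle the wrap-around between angles near 0 and 180 degrees, which this sketch deliberately leaves out.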
I tested with the test images in the unit tests of digiKam. These images cover several use cases for blur detection. For instance, the algorithm recognizes a sharp image like rock, or even a difficult case like tree. It also succeeds in recognizing motion blur on images like rock_2 or tree_2. However, the detection is not sensitive enough to judge them as rejected, only as pending.
There are still some hard situations. The image street is easily recognized as motion blurred because of its nature. The image sky is recognized as defocused blur because of its sky and cloud regions.
The project should also cover artistic images, which focus on a specific position and blur the rest on purpose. This requires a focus region extractor based on metadata. Moreover, as mentioned above, the algorithm fails when the image has a background (sky, wall, water, ...). This is understandable because the background has no edges to detect, yet it should not be treated as a blurred object either. I will try to recognize such regions next week.
Focus point extraction and completion of blur detection (second week : June 14 - June 21)
A camera can use some points on the image (named auto focus points, or af points) to make the region around them sharper, while blurring the regions further away to create an artistic result. An af point can be pre-defined by the camera or by the user, depending on the camera model. There are only 2 states for af points: infocus and selected. Infocus points are the points sharpened by the camera, while selected points are the points that the user expects to be clear.
A point can be in both states (infocus-selected) or in neither (inactive).
Inspired by the focus point plugins for Lightroom, I implemented the class FocusPointExractor in the metaengine library of digiKam. Using ExifTool, which is already deployed in version 7.2 of digiKam, I can easily extract the auto focus information (af points). However, there is no standardized documentation, so each camera model needs its own function for extracting af points.
Currently, af points can be extracted for 4 camera models: Nikon, Canon, Panasonic, and Sony.
Each point is not just a point on the image but a rectangle. Therefore, it has 4 geometric properties: the two coordinates of the center of the rectangle, its width, and its height. The type of the point is the fifth property.
Focus points are used to recognize the focus region of the image, which is the region that the user expects to be clear. Therefore, I extract only the selected points. For each point, I define the focus region as the region around the point's rectangle and proportional to it. I consider only pixels within this region for blur detection.
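The five properties of a focus point and the derived focus region can be sketched like this. The names FocusPoint, PointType, focus_region, and the scale factor are hypothetical and chosen for this example. The sketch also assumes pixel coordinates and treats selected-infocus points as selected, neither of which is confirmed above.

```python
from dataclasses import dataclass
from enum import Enum


class PointType(Enum):
    INACTIVE = "inactive"
    INFOCUS = "infocus"
    SELECTED = "selected"
    SELECTED_INFOCUS = "selected-infocus"


@dataclass
class FocusPoint:
    x: float       # center of the rectangle (pixels, assumed)
    y: float
    width: float
    height: float
    type: PointType


def focus_region(point, image_width, image_height, scale=2.0):
    """Rectangle around the point, proportional to its size, clamped to the image.

    scale is a hypothetical proportionality factor.
    Returns (left, top, right, bottom) in pixels.
    """
    half_w = point.width * scale / 2
    half_h = point.height * scale / 2
    left = max(0, int(point.x - half_w))
    top = max(0, int(point.y - half_h))
    right = min(image_width, int(point.x + half_w))
    bottom = min(image_height, int(point.y + half_h))
    return left, top, right, bottom


def selected_regions(points, image_width, image_height, scale=2.0):
    """Focus regions of the points the user expects to be clear."""
    wanted = {PointType.SELECTED, PointType.SELECTED_INFOCUS}
    return [focus_region(p, image_width, image_height, scale)
            for p in points if p.type in wanted]


points = [FocusPoint(100, 50, 20, 10, PointType.SELECTED),
          FocusPoint(30, 30, 20, 10, PointType.INACTIVE)]
print(selected_regions(points, 200, 100))
```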
As mentioned last week, the algorithm should not consider background regions, even in images without focus points. By observation, these regions are often close to a single color. Therefore, I detect them by dividing the image into small parts and calculating the standard deviation of color in each part. If it is smaller than a threshold (parameter: mono_color_threshold), I judge the part to be background and no longer consider it. With this adjustment, the figure below outlines the latest version of my algorithm.
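The background filtering step can be sketched as below. For brevity this example works on a single gray channel instead of color, and the function name, part size, and threshold value are assumptions rather than the values used in digiKam.

```python
import statistics


def background_parts(gray, part_size, mono_color_threshold):
    """Return the (row, col) indices of parts judged to be background.

    gray: 2D list of gray levels (single channel, a simplification).
    A part is background when the standard deviation of its pixels
    is below mono_color_threshold.
    """
    height, width = len(gray), len(gray[0])
    flagged = set()
    for top in range(0, height, part_size):
        for left in range(0, width, part_size):
            pixels = [gray[y][x]
                      for y in range(top, min(top + part_size, height))
                      for x in range(left, min(left + part_size, width))]
            if statistics.pstdev(pixels) < mono_color_threshold:
                flagged.add((top // part_size, left // part_size))
    return flagged


# Left half: flat "sky". Right half: strongly textured.
image = [[200, 200, 0, 255],
         [200, 200, 255, 0],
         [200, 200, 0, 255],
         [200, 200, 255, 0]]
print(sorted(background_parts(image, part_size=2, mono_color_threshold=10)))
```

Parts returned by this function would simply be skipped by the blur detectors, so a flat sky no longer counts as defocused.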
The main problem now is fine-tuning the parameters. If I set ordre_log_filtrer and filtrer_defocus too high, a sharp image can be judged pending. However, if I set them too low, a blurred image can also be judged pending.
Noise detection (third and fourth week : June 22 - July 04)
Noise detection is implemented based on the paper noise estimation by kurtosis projection. The main idea is to model noise as a random variable that follows a Gaussian distribution. The algorithm estimates the variance of this variable as the level of noise. By using the band-pass filters of the Haar transform, I can approximate the noise of the image as Gaussian noise.
At the end of the algorithm, the result indicates the level of noise. The result is expected to lie in the interval [0 - 1], but the variance of the noise can range from 0 to infinity. Therefore, I use a sigmoid function to normalize the result. With the two parameters alpha and beta, I can define a sensitive interval of variance. An image with a variance at the beginning of the interval would be labeled accepted, while an image with a variance at the end of the interval would be labeled rejected.
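A minimal sketch of this normalization is shown below. The exact form of the sigmoid is not given above, so this assumes the common logistic form in which alpha controls the steepness and beta is the variance mapped to 0.5. The numeric values are hypothetical.

```python
import math


def normalize_noise(variance, alpha, beta):
    """Map a noise variance in [0, inf) to a level in (0, 1).

    Assumed form: 1 / (1 + exp(-alpha * (variance - beta))).
    alpha sets how narrow the sensitive interval is,
    beta is the center of that interval.
    """
    return 1.0 / (1.0 + math.exp(-alpha * (variance - beta)))


# Hypothetical parameters: the sensitive interval is roughly 5 to 35.
for variance in (5.0, 20.0, 35.0):
    print(variance, round(normalize_noise(variance, alpha=0.5, beta=20.0), 4))
```

With these parameters, variances well below the interval map close to 0 and variances well above it map close to 1, so small changes only matter inside the sensitive interval.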
To test the algorithm, I use several noise use cases:
- First, the most common cause of noise is a wrong ISO setting in the camera. I use book 1 and graffi 1 as examples of a good ISO setting, compared to book 2 and graffi 2 as examples of a bad ISO setting. The algorithm passes this test by distinguishing the two cases.
- Second, I test on images with an increasing amount of noise, from test noised 1 to test noised 9. The algorithm passes the test by labeling test noised 1 as accepted, test noised 5 as pending, and test noised 9 as rejected.
- Finally, the detector should handle different kinds of noise. Following the paper different kinds of noise, I generate noised images from a good image and test each case. For now, only salt & pepper noise and structured noise are not yet recognized by the algorithm.
Exposure detection and Compressed detection (Fifth week : July 05 - July 11)
The main idea is to use the luminance histogram of the image. I count the pixels in only 4 intervals: [0 - threshold_underexposed], [threshold_underexposed - threshold_demi_underexposed], [threshold_demi_overexposed - threshold_overexposed], and [threshold_overexposed - 255]. Each interval has a weight that represents its influence on the image. Using the following expressions, I can compute the level of overexposure and underexposure of the image:
- overexposed_level = (over_exposed_pixel * weight_over_exposure + demi_over_exposed_pixel * weight_demi_over_exposure) / (normal_pixel + over_exposed_pixel * weight_over_exposure + demi_over_exposed_pixel * weight_demi_over_exposure)
- underexposed_level = (under_exposed_pixel * weight_under_exposure + demi_under_exposed_pixel * weight_demi_under_exposure) / (normal_pixel + under_exposed_pixel * weight_under_exposure + demi_under_exposed_pixel * weight_demi_under_exposure)
Finally, I configure the parameters to cover as many use cases as possible. As an image is rarely overexposed and underexposed at the same time, I take the over/underexposed level to be the maximum of overexposed_level and underexposed_level.
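The two expressions and the final maximum can be sketched in a few lines. All threshold and weight defaults below are hypothetical, and the handling of values that fall exactly on an interval boundary is an assumption, since the intervals above share their end points.

```python
def exposure_level(luminance,
                   t_under=15, t_demi_under=30,
                   t_demi_over=225, t_over=240,
                   w_under=2.0, w_demi_under=1.0,
                   w_over=2.0, w_demi_over=1.0):
    """Over/underexposure level in [0, 1] from a flat list of luminance values.

    All default thresholds and weights are hypothetical.
    """
    under = demi_under = normal = demi_over = over = 0
    for value in luminance:
        if value <= t_under:
            under += 1
        elif value <= t_demi_under:
            demi_under += 1
        elif value < t_demi_over:
            normal += 1
        elif value < t_over:
            demi_over += 1
        else:
            over += 1

    def level(weighted):
        # weighted / (normal_pixel + weighted), guarded against 0 / 0
        total = normal + weighted
        return weighted / total if total else 0.0

    overexposed_level = level(over * w_over + demi_over * w_demi_over)
    underexposed_level = level(under * w_under + demi_under * w_demi_under)
    return max(overexposed_level, underexposed_level)


print(exposure_level([128] * 50 + [250] * 50, w_over=1.0))
```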
I tested with various situations:
- First, the algorithm clearly distinguishes overexposed and underexposed images.
- Then, the algorithm also recognizes a backlit image as a bad image (rejected label).
- Finally, in some cases of artistic work, like the sun image, the algorithm can only label the image as pending.
Studying how compression works, I found that a compressed image contains a number of pixel blocks (as in the left half of the following picture) which reduce the quality of the image.
A block is a square of a single color. My algorithm tries to detect the region covered by all pixel blocks. It detects 2 types of pixels: pixels inside a block and pixels at the edges of a block:
- Inside a block, all pixels have exactly the same color.
- I detect pixels at the edges of a block by using this algorithm
Then, I count the pixels of each type and attach a weight to each count as a parameter. Finally, I tune the parameters to fit the test cases. The images from test compressed 1 to test compressed 9 are a good example: test compressed 9 is the original image (2.2 MB) and test compressed 1 is fully compressed (214 KB). For now, the algorithm labels test compressed 9 as 'accepted' and test compressed 1 as 'pending' (with a very low quality score).
Calculation of quality and optimization (Sixth and Seventh week : July 12 - July 25)
The result of each detector only reflects the influence of one aspect on the image, whereas the quality score combines the results of diverse detectors. In addition, from the user's perspective, each aspect has a different weight in the overall quality. Therefore, I created a class `image quality calculator` to gather the result of each detector together with its corresponding weight. The calculator normalizes the weights of the detectors to ensure that the result lies in the interval 0 to 100.
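The weight normalization can be sketched as a weighted average. This assumes that each detector reports a score in [0, 1] where 1 means best quality, which may differ from the convention used inside digiKam. The function name and the example weights are hypothetical.

```python
def quality_score(results):
    """Combine detector results into a score between 0 and 100.

    results: list of (score, weight) pairs, score in [0, 1] with 1 = best
    (assumed convention), weight >= 0. Dividing by the sum of the weights
    keeps the result in the interval 0 to 100 for any choice of weights.
    """
    total_weight = sum(weight for _, weight in results)
    if total_weight == 0:
        raise ValueError("at least one detector needs a positive weight")
    weighted_sum = sum(score * weight for score, weight in results)
    return 100.0 * weighted_sum / total_weight


# Hypothetical example: a perfect blur score with weight 3,
# a worst-case noise score with weight 1.
print(quality_score([(1.0, 3), (0.0, 1)]))
```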
To test the performance of my algorithm, I use the KonIQ-10k database. Its advantage is that the quality of every image is scored with reliable ratings from many crowd workers, and the scores lie in the interval 0 to 100, which suits our case. To perform a unit test for the whole `Image quality sorter` (IQS), I created 5 test cases based on five score intervals in the database: bad images (score: 0 - 10), quite bad images (score: 10 - 30), normal images (score: 30 - 50), quite good images (score: 50 - 60), and good images (score > 60). For each test case, I took 6 images with the corresponding score from the database. Bad and quite bad images should be labeled REJECTED, normal images PENDING, and the last test case ACCEPTED. The result is quite promising, as IQS gives the right label to more than half of the test images. However, there are still various failure cases. One of the main reasons is that when one aspect of quality is extremely poor, people rate the image as bad even if the other aspects are good. I apply a trick to cover these cases: when the score of one aspect is extremely low, the final quality receives a penalty that downgrades the score. This trick works well and covers more use cases.
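The penalty trick can be illustrated with a short sketch. The size of the penalty and the limit below which an aspect counts as extremely low are not stated above, so the floor and penalty values here are purely hypothetical, as is the convention that aspect scores lie in [0, 1] with 1 being best.

```python
def penalized_score(score, aspect_scores, floor=0.1, penalty=0.5):
    """Downgrade the final score when one aspect is extremely poor.

    score: combined quality in the interval 0 to 100.
    aspect_scores: per-detector scores in [0, 1], 1 = best (assumed).
    floor and penalty are hypothetical values for this sketch.
    """
    if min(aspect_scores) < floor:
        return score * penalty
    return score


# One very noisy aspect drags down an otherwise good image.
print(penalized_score(70.0, [0.9, 0.05, 0.8]))
print(penalized_score(70.0, [0.9, 0.5, 0.8]))
```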
Finally, I ran IQS on the whole dataset to calculate the accuracy of the algorithm. By applying the thresholds of the REJECTED, PENDING, and ACCEPTED labels to the database scores, I assigned each image a reference label. Then, I compared the result of IQS to the reference. It got an accuracy of 54%, which is not a very good result. However, most of the false predictions are confusions between the REJECTED and PENDING labels or between the ACCEPTED and PENDING labels, which the user can resolve by modifying the thresholds. The serious failure is confusion between the ACCEPTED and REJECTED labels. There are only 171 images in this case (out of 10k images), which is quite a good result.
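The comparison against the reference labels can be sketched as follows. It computes the overall accuracy and counts the serious confusions between ACCEPTED and REJECTED separately. The function name and the four example labels are made up for illustration and are not taken from the KonIQ-10k run.

```python
def evaluate(predicted, reference):
    """Return (accuracy, number of ACCEPTED/REJECTED confusions)."""
    pairs = list(zip(predicted, reference))
    correct = sum(p == r for p, r in pairs)
    # A serious failure is an image accepted by one side and rejected by the other.
    severe = sum({p, r} == {"ACCEPTED", "REJECTED"} for p, r in pairs)
    return correct / len(pairs), severe


predicted = ["ACCEPTED", "PENDING", "REJECTED", "ACCEPTED"]
reference = ["ACCEPTED", "REJECTED", "REJECTED", "REJECTED"]
print(evaluate(predicted, reference))
```

Reporting the serious confusions separately makes it clear how many errors cannot be fixed by simply moving the label thresholds.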
After reaching quite good accuracy, my work continued with optimizing the run time of IQS. The main idea is to extract the processing steps common to the 4 detectors and execute them only once. Then, by running the 4 detectors in multiple threads, I can reduce the processing time further.
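The structure of this optimization can be sketched with a thread pool. This only shows the shape of the idea (shared preprocessing executed once, then detectors running concurrently) and is not the digiKam implementation. The function run_detectors and the toy detectors are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_detectors(image, preprocess, detectors):
    """Run the common preprocessing once, then all detectors in parallel.

    preprocess: function(image) -> shared data used by every detector.
    detectors: dict mapping a name to function(shared) -> result.
    """
    shared = preprocess(image)  # executed only one time
    with ThreadPoolExecutor(max_workers=len(detectors)) as pool:
        futures = {name: pool.submit(detect, shared)
                   for name, detect in detectors.items()}
        return {name: future.result() for name, future in futures.items()}


calls = []


def toy_preprocess(image):
    calls.append("preprocess")  # lets us check it ran only once
    return sum(image)


toy_detectors = {"blur": lambda shared: shared + 1,
                 "noise": lambda shared: shared * 2}
results = run_detectors([1, 2, 3], toy_preprocess, toy_detectors)
print(results, len(calls))
```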
UI for adding customized focus point and showing focus point (Eighth and ninth week : July 25 - August 9)
One of the purposes of the project is to cover artistic images with blurred backgrounds. The technique applied is to extract the auto-focus points (af points) of the camera from the metadata read by ExifTool. However, this technique cannot be applied if the image has no auto-focus point or if the camera model is not supported by the auto-focus point extractor. To get around these cases, I made a feature that allows the user to create a customized focus point on an image that does not have this information.
After adding this feature, it is reasonable to add a feature that shows all focus points of an image. As there are 4 types of focus points (inactive, selected, infocus, and selected-infocus), I distinguish them visually: a red rectangle for selected or selected-infocus points, and a black rectangle for inactive and infocus points, since the selected region is what concerns us most. A manual focus point can only be added to an image that does not have default focus points from the camera model.
Finish the project (Last week : August 09 - August 16)
I spent my last week fixing bugs and summarizing my work. Here is the list of DONE and TODO items at this point.
DONE:
- Implement new algorithms for blur detection, noise detection, and compressed image detection. Rewrite the algorithm for over/under exposure detection.
- Refactor the detector classes and run them in multiple threads.
- Implement the auto focus point extractor for 4 camera models: Nikon, Canon, Panasonic, and Sony.
- Implement a writer of focus points for images that do not come from the above models.
- Implement the feature of adding a manual focus point and showing the focus points of an image.
- Make Image Quality Sorter an action that can be executed on only the selected images in the preview panel.
- Write tests for each feature.
TODO:
- Blur detection cannot recognize water or sky as a sharp region and always reports it as blurred. This should be fixed in the future.
- Noise detection cannot recognize bandpass noise and salt & pepper noise. This should be fixed in the future.
- There are only 4 camera models in the focus point extractor. Implementations for other camera models should be added.