GSoC/2020/StatusReports/NghiaDuong

digiKam: DNN based Faces Recognition Improvements

digiKam is a well-known open-source photo management application. Its developers have put considerable effort into implementing face detection and face recognition in a module called the faces engine. This module implements different methods to scan for faces and then label them based on photos pre-tagged by users.

Since last year, as a result of Thanh Trung Dinh's project during GSoC 2019, digiKam's faces engine has adopted new CNN-based face processing methods. These methods have been shown to perform better than the traditional image processing methods implemented in digiKam. However, the current implementation of the faces engine still has some limitations. The main goals of this project are therefore to continue Thanh Trung Dinh's work and improve the performance of digiKam's faces engine.

Mentors: Gilles Caulier, Maik Qualmann, Thanh Trung Dinh

Important Links

Project Proposal

Digikam DNN based Faces Recognition Improvements

GitLab development branch

gsoc20-facesengine-recognition

Contacts

Email: [email protected]

Github: MinhNghiaD

LinkedIn: https://www.linkedin.com/in/nghia-duong-2b5bbb15a/

Project Goals

The current goals of this project are to:

Improve the accuracy of the face classifier
Optimize the memory usage of the faces engine
Decrease the storage space used by the faces engine
Improve processing speed
Restructure the faces engine architecture
Port the faces engine to a plugin architecture

Project Report

Community Bonding period (May 1 to May 31)

During this period, my main objective was to familiarize myself with the work of Thanh Trung Dinh, in order to evaluate the current implementation. After going through Thanh Trung Dinh's code and final report, I have a better understanding of the current implementation of digiKam's faces engine. Broadly, the architecture of the DNN faces engine can be divided into 3 main parts:

Face extractor is in charge of face detection. This module gives users the option to choose between 2 prominent face detection algorithms: YOLOv3 and SSD-MobileNet. The detected faces are cropped and then passed to the Face recognizer.
Face recognizer is in charge of the face recognition process. It receives a cropped face from the Face extractor, applies face alignment, and then passes the preprocessed face image through the neural network. After GSoC 2019, the CNN algorithm used by digiKam is OpenFacev1, an implementation of the FaceNet paper.
Face database is in charge of database operations for storing the functional data of digiKam's faces engine. It is the link between the faces engine and the digiKam application.
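The three parts described above can be pictured with a small sketch. This is only an illustration of how the components hand data to each other, not digiKam's actual code: the class and method names, the toy image type, and the brute-force matching are all assumptions made for the example, and the detector and the embedding network are replaced by plain callbacks.

```python
from dataclasses import dataclass, field
from math import inf
from typing import Callable, Dict, List, Optional, Tuple

Box = Tuple[int, int, int, int]   # x, y, width, height
Image = List[List[int]]           # toy grayscale image: rows of pixel values
Embedding = List[float]


def crop(image: Image, box: Box) -> Image:
    """Cut the region described by box out of the image."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]


@dataclass
class FaceExtractor:
    # Hypothetical detector callback standing in for YOLOv3 or SSD-MobileNet.
    detect: Callable[[Image], List[Box]]

    def extract(self, image: Image) -> List[Image]:
        """Detect faces and return the cropped face images."""
        return [crop(image, box) for box in self.detect(image)]


@dataclass
class FaceDatabase:
    # Maps a person label to the embeddings stored for that person.
    embeddings: Dict[str, List[Embedding]] = field(default_factory=dict)

    def add(self, label: str, embedding: Embedding) -> None:
        self.embeddings.setdefault(label, []).append(embedding)


@dataclass
class FaceRecognizer:
    # Hypothetical embedding callback standing in for the neural network.
    embed: Callable[[Image], Embedding]
    database: FaceDatabase

    def train(self, face: Image, label: str) -> None:
        """Store the embedding of a tagged face."""
        self.database.add(label, self.embed(face))

    def recognize(self, face: Image) -> Optional[str]:
        """Return the label of the closest stored embedding, or None."""
        query = self.embed(face)
        best_label, best_dist = None, inf
        for label, stored in self.database.embeddings.items():
            for emb in stored:
                dist = sum((a - b) ** 2 for a, b in zip(query, emb))
                if dist < best_dist:
                    best_label, best_dist = label, dist
        return best_label


def mean_brightness(face: Image) -> Embedding:
    """Toy one-dimensional 'embedding': the average pixel value."""
    pixels = [p for row in face for p in row]
    return [sum(pixels) / len(pixels)]
```

In this sketch the recognizer only ever sees cropped faces and only talks to the database through `add`, which mirrors the separation of responsibilities described above.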

According to bug reports on digiKam's faces engine, the implementation of this module still has some problems that need to be addressed. The main problem, reported in several bugs, is that the performance of the faces engine decreases as the data set grows. Therefore, for the rest of this period, I aimed to re-evaluate the exact state of the different components of the faces engine. Because the 3 parts of the DNN version of the faces engine are fully integrated into digiKam, it is difficult to evaluate the performance of each part without introducing additional bias. In order to benchmark each component correctly, I created replicas of digiKam's Face extractor and Face recognizer as stand-alone modules that apply the previously implemented DNN algorithms, without any link to the digiKam database or the digiKam core library. After that, I wrote a first sketch of 3 unit tests for Face detector and Face recognizer.

My plan for the first 2 weeks of the coding period is to:

Complete the unit tests for Face detector and Face recognizer.
Search for the problems that cause the decrease of performance.
Try out different methods for face classification.
Compare the performances of these different techniques.

Coding period: Phase one (June 1 to June 29)

In this phase, my work concentrated mostly on building the unit tests and applying different classification methods to the Face recognizer. These tests revealed the points that need to be improved in order to obtain a more accurate and faster face recognition module.

June 1 to June 14 (Week 1 - 2) - Report of current state of digiKam's faces engine

DONE

Unit test with GUI for Face Extractor (YOLOv3 and SSD-MobileNet).
Comparison of performance of YOLOv3 and SSD-MobileNet implementation in digiKam's faces engine.
Unit test with GUI for Face Recognizer (OpenFacev1).
Automatic unit test on large datasets to evaluate the performance of Face Recognizer.
Evaluation of current recognition methods used by the faces engine.

TODO

Add and test new recognition methods on Face recognizer in order to improve accuracy and processing speed.
Compare the performances of these different methods.

During these first weeks of GSoC 2020, I finalized my unit tests for 2 essential components of the faces engine: Face extractor and Face recognizer. The purpose of these tests is to understand the internal workings of the faces engine and the reasons for the degradation over time described in bug reports.

To verify the functionality of the Face Extractor, I built a test with a GUI that shows the image matrices after each step of the face detection process. This gives a sense of what the extractor is doing and makes it possible to evaluate its performance.
To verify the functionality of the Face recognizer, I built 2 tests. The first is a test with a GUI that reproduces the face suggestion process. The second takes a dataset as an argument, splits it into a training set and a test set, and passes them to the Face recognizer in order to evaluate its performance.
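The second recognizer test follows a standard train/test evaluation. A minimal sketch of that idea is given here, assuming the dataset is a mapping from a person label to that person's samples; the function names, the split ratio parameter, and the `predict` callback are hypothetical and are not taken from digiKam's test code.

```python
import random
import time
from typing import Callable, Dict, List, Tuple


def split_dataset(samples_by_label: Dict[str, List],
                  train_ratio: float,
                  seed: int = 0) -> Tuple[List[tuple], List[tuple]]:
    """Split each person's samples into a training part and a test part."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for label, items in samples_by_label.items():
        shuffled = list(items)
        rng.shuffle(shuffled)
        # Keep at least one training sample per person.
        cut = max(1, int(len(shuffled) * train_ratio))
        train += [(sample, label) for sample in shuffled[:cut]]
        test += [(sample, label) for sample in shuffled[cut:]]
    return train, test


def evaluate(predict: Callable[[object], str],
             test: List[tuple]) -> Tuple[float, float]:
    """Return (accuracy, average milliseconds per sample) on the test set."""
    if not test:
        raise ValueError("test set is empty")
    start = time.perf_counter()
    correct = sum(1 for sample, label in test if predict(sample) == label)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return correct / len(test), elapsed_ms / len(test)
```

Splitting per label rather than over the whole dataset ensures that every person appears in the training set, so a wrong prediction reflects the recognizer and not a missing identity.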

Face Extractor status

Currently, digiKam's faces engine employs 2 different CNN algorithms for face detection: YOLOv3 and SSD-MobileNet. Their implementations in digiKam differ in both speed and accuracy. YOLOv3 scans 10600 bounding boxes per image and therefore gives very high accuracy, but it takes about 400 ms per image on average. SSD-MobileNet, on the other hand, scans only 20 boxes and takes about 20 ms per image, but it gives lower accuracy. The default method used by the faces engine is SSD-MobileNet, because it is lightweight and fast.
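To put the two timings in perspective, the short calculation here scales them to a whole photo collection. The per-image times are the averages quoted above; the collection size of 10000 images is a hypothetical figure chosen only for illustration.

```python
YOLOV3_MS_PER_IMAGE = 400.0        # average reported above
SSD_MOBILENET_MS_PER_IMAGE = 20.0  # average reported above


def total_detection_minutes(num_images: int, ms_per_image: float) -> float:
    """Total detection time in minutes for a collection of images."""
    return num_images * ms_per_image / 1000.0 / 60.0


# Hypothetical collection size, not a figure from the report.
NUM_IMAGES = 10000

yolo_minutes = total_detection_minutes(NUM_IMAGES, YOLOV3_MS_PER_IMAGE)
ssd_minutes = total_detection_minutes(NUM_IMAGES, SSD_MOBILENET_MS_PER_IMAGE)
speed_ratio = YOLOV3_MS_PER_IMAGE / SSD_MOBILENET_MS_PER_IMAGE

print(f"YOLOv3: {yolo_minutes:.1f} min, SSD-MobileNet: {ssd_minutes:.1f} min, "
      f"ratio: {speed_ratio:.0f}x")
```

Under these assumptions, scanning the collection takes roughly an hour with YOLOv3 and a few minutes with SSD-MobileNet, which is the trade-off behind the choice of default.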

Although the implementation of SSD-MobileNet performs rather well in typical use cases, where all faces are clear and can be easily detected, it still has some limitations that need to be addressed. As an example above, I performed

Face Recognizer status

June 15 to June 29 (Week 3 - 4) - 84% accuracy and 104.804 ms/face speed on the ExtendedYaleB dataset

DONE

Apply Machine Learning classifiers on Face recognizer.
Accuracy improvement from 0% to 84% on the ExtendedYaleB dataset.
Processing speed improvement from 671.449 ms/face to 104.804 ms/face on the ExtendedYaleB dataset.
High dimensional data partitioning with KD-Tree.
Implementation of online learning in Face recognizer.
First sketch of database model for Face recognizer.
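The KD-Tree item above refers to partitioning the stored face embeddings so that the closest match can be found without comparing against every entry. A minimal sketch of a KD-tree with nearest-neighbour search is shown here. It is a generic textbook version written for illustration, not the implementation used in digiKam, and the names `Node`, `build` and `nearest` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


@dataclass
class Node:
    point: Point
    label: str
    axis: int                      # dimension used to split at this node
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build(items: List[Tuple[Point, str]], depth: int = 0) -> Optional[Node]:
    """Build a KD-tree by splitting on the median, cycling through the axes."""
    if not items:
        return None
    axis = depth % len(items[0][0])
    items = sorted(items, key=lambda item: item[0][axis])
    mid = len(items) // 2
    point, label = items[mid]
    return Node(point, label, axis,
                build(items[:mid], depth + 1),
                build(items[mid + 1:], depth + 1))


def nearest(node: Optional[Node], query: Point,
            best: Optional[Tuple[float, str]] = None
            ) -> Optional[Tuple[float, str]]:
    """Return (squared distance, label) of the stored point closest to query."""
    if node is None:
        return best
    dist = sq_dist(node.point, query)
    if best is None or dist < best[0]:
        best = (dist, node.label)
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the
    # best match found so far; otherwise that whole subtree is skipped.
    if diff ** 2 < best[0]:
        best = nearest(far, query, best)
    return best
```

The example works on points of any fixed dimension. The pruning step is what saves comparisons, although its benefit is known to shrink as the number of dimensions grows.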

TODO

Fully integrate new improvements to the faces engine.
Reorganize the databases of the faces engine.
Apply map-reduce to distribute the calculations on multiple threads.
Port the faces engine to a plug-in architecture.
Test the UMAP dimensionality reduction algorithm to gain insight into the global structure of the face embeddings.
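As a rough illustration of the map-reduce item in this list, the sketch here splits a nearest-embedding search into chunks, lets each worker thread find the best match in its chunk (map), and then keeps the overall best (reduce). It is an assumption about how such a distribution could look, not the planned digiKam design, and `parallel_nearest` is a hypothetical name. In CPython, threads do not speed up pure-Python arithmetic, so the example shows the structure only.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_in_chunk(chunk: List[Tuple[Point, str]],
                     query: Point) -> Tuple[float, str]:
    """Map step: best (squared distance, label) within one chunk."""
    return min((sq_dist(point, query), label) for point, label in chunk)


def parallel_nearest(items: List[Tuple[Point, str]], query: Point,
                     workers: int = 4) -> Tuple[float, str]:
    """Split items into chunks, search them in threads, keep the best."""
    if not items:
        raise ValueError("no stored points to search")
    size = -(-len(items) // workers)  # ceiling division
    chunks = [items[i:i + size] for i in range(0, len(items), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(lambda c: nearest_in_chunk(c, query), chunks))
    # Reduce step: the global best is the minimum of the per-chunk bests.
    return reduce(min, partial)
```

Because the minimum of per-chunk minima equals the minimum over all items, the result does not depend on how the items are divided among the workers.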