GSoC/2019/StatusReports/ThanhTrungDinh

digiKam AI Face Recognition with OpenCV DNN module

digiKam is a KDE desktop application for photo management. For a long time, the digiKam team has put a lot of effort into developing the face engine, a feature that scans user photos and suggests face tags automatically based on faces already tagged by users. However, that functionality is currently deactivated in digiKam because it is slow and not sufficiently accurate. This project therefore aims to improve the performance and accuracy of face recognition in digiKam by exploiting state-of-the-art neural network models, combined with the highly optimized OpenCV DNN module.

The project includes 2 main parts:

Improve face recognition: implementation with OpenCV DNN module
- reduce processing time while keeping high accuracy
- classify unknown faces into classes of similar faces
Improve face detection: implementation to be investigated
- detect faces across various scales (e.g. big, small, etc.), with occlusion (e.g. sunglasses, scarf, mask etc.), with different orientations (e.g. up, down, left, right, side-face etc.)

Mentors : Maik Qualmann, Gilles Caulier, Stefan Müller

Important Links

Proposal

Project Proposal

Git dev branch

gsoc19-face-recognition

Important commits

Work report

Bonding period (May 6 to May 27)

I familiarized myself with the current Deep Learning (DL) based approach to face recognition in digiKam and picked up the work of Yingjie Liu (digiKam GSoC 2017). His proposal, blog posts and status report led me to the FaceNet paper and to OpenFace, an open-source implementation of a neural network (NN) model inspired by the FaceNet paper. Both seemed very promising for my work. In addition, Liu implemented unit tests for his DL implementation and published their results (benchmark and accuracy), which I can use as a reference later.

For the rest of the bonding period, I focused on reading the FaceNet paper carefully and on looking for other promising NN models to implement when the coding period begins. I also prepared unit tests for face recognition, based on the current test programs and Liu's work.

My plan for the first 2 weeks of the coding period is to:

Finish NN model selection
Finish unit tests
Start to implement chosen NN model with OpenCV DNN

Coding period : Phase one (May 28 to June 23)

May 28 to June 11 (Week 1 - 2) - Face Recognition got an 8x speed up

DONE

NN model selection -> OpenFace pretrained model.
First draft working implementation of face recognition with OpenCV DNN and OpenFace pretrained model.
Unit test for evaluation and benchmarking new implementation (time and accuracy).
Tested and compared the new implementation with Liu's Dlib-based work, for performance and accuracy.

TODO

Improve performance (i.e. speed) and accuracy of new implementation.
Investigate the effect of better face detection on face recognition accuracy.

The OpenFace model was chosen because it suits an application like digiKam. OpenFace is a pretrained model based on the FaceNet paper; it takes a cropped, aligned face as input and outputs a 128-D vector representing that face. The vector can later be compared with the vectors of pretagged faces, or clustered into groups of similar faces. In photo management software such as digiKam, the user has an "open" face database, to which an unlimited number of faces and people can be added. Face recognition should therefore be flexible enough to extend, while keeping decent accuracy. False positives are not so critical, since the user can always correct them. For all of these reasons, OpenFace is a good choice.
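The idea of comparing a new face vector against pretagged ones in an "open" database can be sketched as follows. This is only an illustration, not the digiKam code: the names `predict_tag` and `max_distance` are hypothetical, the vectors are 3-D toy values instead of 128-D OpenFace outputs, and the threshold is arbitrary.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally sized vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_tag(embedding, tagged, max_distance):
    """Return the tag of the closest pretagged vector, or None if it is too far.

    `tagged` is a list of (tag, vector) pairs; `max_distance` is an
    illustrative threshold, not a value used by digiKam.
    """
    best_tag, best_dist = None, float("inf")
    for tag, known in tagged:
        dist = euclidean(embedding, known)
        if dist < best_dist:
            best_tag, best_dist = tag, dist
    return best_tag if best_dist <= max_distance else None


# Toy database: adding a person is just appending one more (tag, vector) pair.
tagged = [("alice", [0.0, 0.0, 1.0]), ("bob", [1.0, 0.0, 0.0])]
close_to_alice = predict_tag([0.1, 0.0, 0.9], tagged, max_distance=0.5)
unknown = predict_tag([5.0, 5.0, 5.0], tagged, max_distance=0.5)
```

Because the network itself is never retrained, extending the database does not require anything beyond storing the new vectors, which is the flexibility the paragraph above refers to.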

Liu's implementation is based on a version of the OpenFace pretrained model customized for compatibility with Dlib. It achieves a remarkable accuracy on the orl database: above 98% for tag prediction with only 20% of the images pretagged. Its drawback is performance: face recognition on 112x92 images took 8s per face on average, and it gets slower with bigger images.

My new implementation with OpenCV DNN has shown great potential. While it does not yet reach the accuracy of Liu's work, it runs much faster. In my first draft, the processing time for each image is only about 1.3s, with an accuracy of above 80% when 20% of the images are pretagged.
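To make the figure "accuracy with 20% of the images pretagged" concrete, here is a minimal sketch of such an evaluation split. It assumes the usual layout of the orl database (40 people with 10 images each); the function names and file names are hypothetical and this is not the actual digiKam unit test.

```python
def split_pretagged(images_by_person, ratio):
    """Split every person's images into a pretagged part and a test part."""
    pretagged, to_predict = [], []
    for person, images in images_by_person.items():
        n_tagged = max(1, round(len(images) * ratio))
        pretagged += [(person, img) for img in images[:n_tagged]]
        to_predict += [(person, img) for img in images[n_tagged:]]
    return pretagged, to_predict


def accuracy(predictions, truths):
    """Fraction of predicted tags that match the true tags."""
    correct = sum(p == t for p, t in zip(predictions, truths))
    return correct / len(truths)


# Hypothetical file names standing in for the orl images.
dataset = {
    f"s{i}": [f"s{i}/{j}.pgm" for j in range(1, 11)] for i in range(1, 41)
}
pretagged, to_predict = split_pretagged(dataset, ratio=0.2)
```

With this layout, 2 images per person are used as pretagged references and the remaining 8 are used to measure how often the predicted tag is correct.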

Looking at the potential of the implementation with OpenCV DNN, I was able to identify several points that can be improved:

Accuracy:
- In the prediction phase, Euclidean distance between 128-D vectors was used as the metric to decide whether a face is "similar" to another face. Other measures (e.g. cosine similarity) that are more suitable for unnormalized vectors may return better results.
- More suitable face alignment is promising for better recognition accuracy. The OpenFace GitHub repository has a Python script showing how faces should be aligned so that the output of the neural network gives the best accuracy.
Speed:
- The file containing the data used to compute face landmarks (needed to align a face before recognition) is loaded every time a face is recognized. This can easily be eliminated by loading the data once and keeping it in memory. In addition, detection is run twice on each face, once by OpenCV Haar Cascade face detection and once internally by Dlib. Both the repeated loading and the second detection are unnecessary and take a lot of time.
Modularity:
- Different NN models require different kinds of preprocessing for input images (e.g. appropriate face alignment in the case of OpenFace). Therefore, an abstraction needs to be implemented to allow other NN models to be used in the future.
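The accuracy point above, that cosine similarity can behave better than Euclidean distance on unnormalized vectors, can be illustrated with a small sketch. The 3-D vectors are made-up toy values, not OpenFace outputs: `same_direction` is `face` scaled by 2, so it points in exactly the same direction but has a different length.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance; sensitive to the length of the vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; ignores their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


face = [0.2, 0.4, 0.4]
same_direction = [0.4, 0.8, 0.8]  # face * 2: same direction, other length
other = [0.4, 0.2, 0.4]           # different direction, similar length

# Euclidean distance ranks `other` as closer to `face`,
# while cosine similarity ranks `same_direction` as more similar.
euclid_prefers_other = (
    euclidean_distance(face, other) < euclidean_distance(face, same_direction)
)
cosine_prefers_same = (
    cosine_similarity(face, same_direction) > cosine_similarity(face, other)
)
```

If the vectors were normalized to unit length first, both measures would produce the same ranking, so the difference only matters for unnormalized output.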

June 11 to June 23 (Week 3 - 4) - Face Recognition achieves above 90% and 100x speed up

Over the last 2 weeks, I completed my plan:

By loading the models only once at the beginning and keeping only Haar Cascade face detection, the average time needed is now reduced to around 80ms per image, which is more than 10x faster than my first draft and about 100x faster than the current implementation.
Using cosine similarity instead of Euclidean distance increased recognition accuracy to 90% with 20% of the images pre-tagged, which opens a new direction for me.
I also tested a new approach to face detection with OpenCV DNN and an SSD neural network model. The script was written in Python, but it already reached a recognition accuracy of above 96% with 20% of the images pre-tagged.
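The first point, loading the models only once, can be sketched with a cached loader. This is an illustration under stated assumptions, not the digiKam code: `load_model` returns a placeholder dictionary instead of a real network, and the file name is hypothetical.

```python
import functools

load_calls = []  # records every real (expensive) load, for illustration only


@functools.lru_cache(maxsize=None)
def load_model(path):
    """Stand-in for an expensive model load; runs once per distinct path."""
    load_calls.append(path)
    return {"path": path}  # placeholder for a loaded network


def recognize(face, model_path="hypothetical-model.bin"):
    """Recognize one face, reusing the cached model instead of reloading it."""
    model = load_model(model_path)
    return (model["path"], face)


# Five recognitions trigger a single load.
for face_id in range(5):
    recognize(face_id)
```

The same effect can be obtained by loading the model in a constructor and keeping it as a member, which is closer to how a long-lived recognizer object would be written.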

More details can be found in my blog post.

Since the new implementation of face recognition has now achieved some promising results, I intend to concentrate on optimization in the next coding phases:

Modularity: This should be addressed first. I will refactor and restructure the code so that it is easier to switch to new neural network models, which require different preprocessing.
Test and benchmark: I am thinking of changing to a new dataset, such as LFW, which allows a better evaluation of the accuracy of the new implementation. The test code therefore needs to be rewritten for better modularity.
Face detection optimization: Implement a neural network approach to face detection with OpenCV DNN.
Face prediction optimization: Investigate OpenCV FLANN and different distance metrics between vectors. These are expected to improve the accuracy of face recognition, and to make it possible to classify unknown faces into groups of similar faces.
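As a rough idea of what "classifying unknown faces into groups of similar faces" could look like, here is a deliberately simple greedy grouping by cosine similarity. It is a swapped-in toy technique, not OpenCV FLANN and not the planned implementation; the function name, the 2-D vectors and the threshold are all assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def group_similar(embeddings, min_similarity):
    """Greedily group vectors; the first member of a group represents it.

    Returns a list of groups, each group being a list of indices
    into `embeddings`.
    """
    groups = []
    for index, embedding in enumerate(embeddings):
        for group in groups:
            representative = embeddings[group[0]]
            if cosine_similarity(embedding, representative) >= min_similarity:
                group.append(index)
                break
        else:
            # No existing group is similar enough: start a new one.
            groups.append([index])
    return groups


unknown_faces = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.95]]
groups = group_similar(unknown_faces, min_similarity=0.9)
```

The result depends on the order of the input and on the threshold, which is one reason a proper nearest-neighbour index or clustering method is worth investigating.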

Coding period : Phase two (June 24 to July 21)

In this phase, I finished part of the checklist that I set at the end of the previous phase. Having decided to optimize face recognition as far as possible first, I concentrated on Modularity and started to implement Face detection optimization. Although changing to a new dataset may help to evaluate the implementation better, I considered it mainly useful for tuning and for collecting statistical results, so it is left to the end of GSoC.

June 24 to July 07 (Week 5 - 6) - Face Recognition is modularized for different neural network models

During these 2 weeks, I restructured the code to facilitate future development when we need to test different neural network models for face recognition.

The motivation for this stems from the fact that different models need different preprocessing techniques. Therefore, the code that loads a neural network model, together with its own preprocessing, should be isolated. In addition, this allows the neural network model to be loaded only once, which improves performance significantly. Currently, the facesengine takes only 40 to 60ms to process and recognize a face.
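The kind of isolation described above can be sketched as an abstract base class that owns model loading and preprocessing. This is a hypothetical design written for illustration, not the actual digiKam class hierarchy; `FaceEmbedder`, `ToyEmbedder` and their methods are invented names, and the "network" is a trivial stand-in.

```python
from abc import ABC, abstractmethod


class FaceEmbedder(ABC):
    """Couples one model with its own preprocessing; loads the model once."""

    def __init__(self):
        self._net = None

    @abstractmethod
    def _load(self):
        """Load and return the network (called at most once)."""

    @abstractmethod
    def preprocess(self, face):
        """Turn a cropped face into the input this model expects."""

    @abstractmethod
    def _forward(self, net, blob):
        """Run the network on the preprocessed input."""

    def embed(self, face):
        if self._net is None:  # lazy, one-time load
            self._net = self._load()
        return self._forward(self._net, self.preprocess(face))


class ToyEmbedder(FaceEmbedder):
    """Stand-in model: scales pixels to [0, 1] and sums them."""

    def __init__(self):
        super().__init__()
        self.loads = 0

    def _load(self):
        self.loads += 1
        return lambda blob: [sum(blob)]

    def preprocess(self, face):
        return [pixel / 255.0 for pixel in face]

    def _forward(self, net, blob):
        return net(blob)


embedder = ToyEmbedder()
first = embedder.embed([255, 0, 255])
second = embedder.embed([0, 0, 255])
```

Switching to another model would then mean adding one more subclass, while the calling code keeps using `embed`.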

I also deleted the old code that used Dlib, since after the restructuring nothing depends on it anymore. This is indeed a goal of my GSoC project this year, because it reduces the maintenance effort and eliminates the Dlib dependency, its compiler warnings, and the complicated compiler rules needed to build it.

After discussing face recognition with my mentors and other digiKam contributors, we all agreed that face detection is one of the key factors in improving face recognition in digiKam. The current face detection with OpenCV Haar cannot detect faces in photos with complicated lighting conditions, shadows, or non-frontal faces. When faces cannot be detected, they cannot be recognized. Moreover, the bounding boxes created by OpenCV Haar are too "small" (i.e. they lose some details of the face). Consequently, this decreases the robustness of face recognition.

So for the next weeks, my plan is to:

Study SSD (Single Shot Multibox Detector) with MobileNet model for embedded application
Implement with OpenCV DNN module

July 08 to July 21 (Week 7 - 8) - Face Detection achieves 100% accuracy on orl test set

During these 2 weeks, I studied and implemented the SSD neural network model.

SSD is one of the neural network models tested with the OpenCV DNN module. The OpenCV GitHub repository therefore has a folder supporting face detection with SSD. The model can be downloaded with scripts from this folder, which also contains examples of how to use SSD with the OpenCV DNN module.
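One step of using such a detector is turning its raw output into face rectangles. The sketch below assumes the output layout used by OpenCV's SSD detection samples as I understand it: one row per detection with `[image_id, class_id, confidence, x1, y1, x2, y2]` and corner coordinates normalized to the image size. The function name and the default threshold are hypothetical, and the rows are made-up values rather than real network output.

```python
def detections_to_boxes(rows, width, height, min_confidence=0.5):
    """Convert normalized SSD detection rows into clamped pixel boxes.

    Each row is assumed to be
    [image_id, class_id, confidence, x1, y1, x2, y2] with coordinates
    relative to the image size. Returns (left, top, right, bottom, confidence).
    """
    boxes = []
    for _image_id, _class_id, confidence, x1, y1, x2, y2 in rows:
        if confidence < min_confidence:
            continue  # drop weak detections
        # Scale to pixels and clamp to the image borders.
        left = max(0, min(width - 1, int(x1 * width)))
        top = max(0, min(height - 1, int(y1 * height)))
        right = max(0, min(width - 1, int(x2 * width)))
        bottom = max(0, min(height - 1, int(y2 * height)))
        if right > left and bottom > top:
            boxes.append((left, top, right, bottom, confidence))
    return boxes


rows = [
    [0, 1, 0.98, 0.25, 0.25, 0.75, 0.75],   # confident, inside the image
    [0, 1, 0.10, 0.0, 0.0, 1.0, 1.0],       # too weak, dropped
    [0, 1, 0.90, -0.125, 0.5, 0.5, 1.25],   # partly outside, clamped
]
boxes = detections_to_boxes(rows, width=200, height=100)
```

Clamping matters because a detector may report corners slightly outside the image, and cropping with such coordinates would otherwise fail or produce an empty face region.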

With SSD, I got a very good result on the orl test set: all faces are detected, compared with around 90% with OpenCV Haar. It also increases the accuracy of face recognition to 96% with 20% of the images pretagged. I have not yet measured the latency of face detection with OpenCV DNN, but it seems comparable to OpenCV Haar.

However, when tested on photos downloaded from the internet, the implementation did not work well: faces were either not detected or detected in the wrong places. The SSD implementation with OpenCV DNN therefore needs more study.

For the final coding phase of GSoC this year, I intend to work on:

Face detection optimization: Improve the SSD implementation so that it works on other photos.
Face prediction optimization: Investigate OpenCV FLANN and different distance metrics between vectors. In addition, unknown faces must be classified into groups of similar faces.
Test and benchmark: Change to the LFW dataset. The test code needs to be rewritten for better modularity.

Blog Posts

Contacts

Email: [email protected]

Github: TrungDinhT