
GSoC/2019/StatusReports/ThanhTrungDinh

From KDE Community Wiki

digiKam AI Face Recognition with OpenCV DNN module

digiKam is a KDE desktop application for photo management. For a long time, the digiKam team has put a lot of effort into developing the face engine, a feature that scans user photos and suggests face tags automatically, based on faces the user has already tagged. However, that functionality is currently deactivated in digiKam because it is slow and not accurate enough. This project therefore aims to improve the performance and accuracy of face recognition in digiKam by combining state-of-the-art neural network models with the highly optimized OpenCV DNN module.

The project includes 2 main parts:

  • Improve face recognition: implementation with OpenCV DNN module
    • reduce processing time while keeping high accuracy
    • classify unknown faces into classes of similar faces
  • Improve face detection: implementation to be investigated
    • detect faces at various scales (big or small), with occlusion (e.g. sunglasses, scarf, mask) and in different orientations (e.g. up, down, left, right, side-face)

Mentors : Maik Qualmann, Gilles Caulier, Stefan Müller

Important Links

Proposal

Project Proposal

Git dev branch

gsoc19-face-recognition

Contribution

Work report

Bonding period (May 6 to May 27)

During this period, I familiarized myself with the current Deep Learning (DL) based approach to face recognition in digiKam. I picked up the work of Yingjie Liu (the student who worked on this topic in 2017), studied his code, and read his proposal, blog posts and status report in order to understand clearly what he had done and what he had left open. His work led me to the FaceNet paper and to a C++ implementation of the OpenFace face recognition library. Both looked very promising for my work. Liu also reported unit test results for his DL implementation. However, those tests were conducted externally, without using any of digiKam's preprocessing features.

For the rest of the bonding period, I decided to read the FaceNet paper carefully and to investigate other neural network models, so that I could select the right model to implement once the coding period began. I also started writing a test program to benchmark the current DL implementation of face recognition in digiKam more precisely.

My plan for the first 2 weeks of the coding period is:

  • Finish the neural network model selection
  • Finish the test code
  • Start porting the current DL implementation to the OpenCV DNN module

Coding period : Phase one (May 28 to June 23)

May 28 to June 11 (Week 1 - 2) - Face Recognition got an 8x speed up

I completed my plan for these 2 weeks. I settled on the OpenFace pretrained model and produced a first working draft of the implementation with OpenCV DNN. I also finished my test and benchmarking code for face recognition, and tested the current implementation of face recognition in digiKam exhaustively, comparing it with my new implementation based on OpenCV DNN.

The current implementation with Dlib achieves an impressive accuracy on the ORL database. It reached above 98% accuracy on the 112x92 images of the ORL database with only 20% of the images pre-tagged. However, it took 8 s per image on average, which is too slow. The new implementation with OpenCV DNN did not reach that accuracy, but runs much faster: it reached more than 80% accuracy with 20% of the images pre-tagged, while needing only about 1.3 s per image.
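The report does not spell out how the "20% pre-tagged" protocol is implemented. A minimal sketch, assuming the split is done per person and accuracy is the fraction of correctly labelled test faces, could look like this. The names `split_pretagged` and `accuracy` are hypothetical and are not taken from the digiKam test code; the ORL layout (40 subjects, 10 images each) is used only to build sample identifiers.

```python
import random
from collections import defaultdict


def split_pretagged(samples, train_ratio=0.2, seed=0):
    """Split (identity, image_id) pairs into pre-tagged and test sets, per identity."""
    by_identity = defaultdict(list)
    for identity, image_id in samples:
        by_identity[identity].append(image_id)

    rng = random.Random(seed)
    train, test = [], []
    for identity in sorted(by_identity):
        images = by_identity[identity][:]
        rng.shuffle(images)
        # Keep at least one pre-tagged image per identity.
        n_train = max(1, round(len(images) * train_ratio))
        train += [(identity, img) for img in images[:n_train]]
        test += [(identity, img) for img in images[n_train:]]
    return train, test


def accuracy(predictions, ground_truth):
    """Fraction of test faces whose predicted identity matches the true one."""
    correct = sum(p == t for p, t in zip(predictions, ground_truth))
    return correct / len(ground_truth)


# ORL layout: 40 subjects with 10 images each (identifiers only, no pixel data).
samples = [(f"s{subject}", i) for subject in range(1, 41) for i in range(1, 11)]
train, test = split_pretagged(samples)
print(len(train), len(test))  # 80 320
```

With this split, each person contributes 2 pre-tagged images and 8 test images, so the recognizer is scored on 320 faces it has not seen.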

I was able to identify the following problems with the new implementation:

  • Accuracy: the loss occurs in the prediction phase, where Euclidean distance is used as the metric to decide which known face a new face is closest to. There are indications that other measures (e.g. cosine similarity), which do not require normalized vectors, may give better results.
  • Speed: the file containing the face landmark model is loaded every time a face needs to be recognized. In addition, detection runs twice on each face, once with OpenCV Haar Cascade face detection and once internally in Dlib. Both are unnecessary and cost a lot of time.
  • Modularity: different neural network models require different preprocessing of the input images.
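To illustrate the accuracy point, here is a small sketch of nearest-neighbour prediction with both metrics. It is not the digiKam code: the 2-dimensional vectors are made up for illustration (real face embeddings have many more dimensions), and `predict` is a hypothetical helper. The example is constructed so that the two metrics disagree when embeddings differ mainly in magnitude.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; depends only on the angle between the vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def predict(embedding, gallery, distance):
    """Return the label of the closest pre-tagged embedding."""
    return min(gallery, key=lambda item: distance(embedding, item[1]))[0]


# Made-up embeddings: the query points almost in alice's direction,
# but its magnitude puts it closer to bob in Euclidean terms.
gallery = [("alice", [1.0, 0.0]), ("bob", [3.0, 3.0])]
query = [4.0, 0.5]

print(predict(query, gallery, euclidean_distance))  # bob
print(predict(query, gallery, cosine_distance))     # alice
```

Because cosine distance ignores vector length, it gives the same answer whether or not the embeddings have been normalized, which matches the observation in the accuracy item.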


More details can be found in my blog post.

For the final 2 weeks of the first coding period, I intend to improve the accuracy and speed of the new implementation. I will also investigate the effect of a better face detection model on the accuracy of face recognition in digiKam. The scalability problem will be addressed later.

June 11 to June 23 (Week 3 - 4) - Face Recognition achieves above 90% and 100x speed up

Over the last 2 weeks, I completed my plan.

  • By loading the models only once at startup and keeping only Haar Cascade face detection, the average time is now reduced to around 80 ms per image, which is more than 10x faster than the previous implementation and 100x faster than the current implementation.

  • Using cosine similarity instead of Euclidean distance raised recognition accuracy to 90% for 20% pre-tagged images, which opens a new direction for me.
  • I also tested a new approach to face detection with OpenCV DNN and an SSD neural network model. The script is written in Python, but it already reached an accuracy above 96% for 20% pre-tagged images.
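The "load the models only once" idea from the first item can be sketched as follows. This is an illustration only: `get_model` and `recognize` are hypothetical names, the model path is a placeholder, and the dictionary stands in for whatever object a real model loader would return.

```python
import functools

load_count = 0  # counts how often the expensive load actually runs


@functools.lru_cache(maxsize=None)
def get_model(path):
    """Load a model file once per path and reuse the result afterwards."""
    global load_count
    load_count += 1
    # Placeholder for an expensive load (e.g. reading network weights from disk).
    return {"path": path}


def recognize(image, model_path="hypothetical_model.bin"):
    """Stand-in for recognition; only shows where the cached model is fetched."""
    model = get_model(model_path)  # served from the cache after the first call
    return (image, model["path"])


for image in ["a.jpg", "b.jpg", "c.jpg"]:
    recognize(image)

print(load_count)  # 1
```

The per-image cost then no longer includes reading the model file, which is one of the two changes the first item credits for the drop to around 80 ms.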


More details can be found in my blog post.

Since the new implementation of face recognition has now achieved some promising results, I intend to concentrate on optimization in the next coding period:

  • Modularity: this should be addressed first. I will refactor and restructure the code so that it is easier to switch to new neural network models, which require different preprocessing.
  • Test and benchmark: I am thinking of switching to a new dataset, such as LFW, which would allow a better evaluation of the accuracy of the new implementation. The test code therefore needs to be rewritten for better modularity.
  • Face detection optimization: implement the neural network approach to face detection with OpenCV DNN.
  • Face prediction optimization: investigate OpenCV FLANN and the different distance measures between vectors. These are expected to improve the accuracy of face recognition and to make it possible to classify unknown faces into groups of similar faces.
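As a rough illustration of the last item, unknown faces could be grouped by comparing their embeddings against a distance threshold. The sketch below uses a simple greedy scheme with cosine distance; it is an assumption for illustration, not the planned digiKam design, and it does not use OpenCV FLANN. The function name `group_faces`, the threshold value and the 2-dimensional vectors are all made up.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def group_faces(embeddings, threshold=0.1):
    """Greedily assign each face to the first group whose first member is close enough."""
    groups = []  # each group is a list of indices into `embeddings`
    for index, embedding in enumerate(embeddings):
        for group in groups:
            representative = embeddings[group[0]]
            if cosine_distance(embedding, representative) <= threshold:
                group.append(index)
                break
        else:
            # No existing group is close enough: start a new one.
            groups.append([index])
    return groups


faces = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.95]]
print(group_faces(faces))  # [[0, 2], [1, 3]]
```

A greedy scheme like this depends on the order of the faces and on the threshold, so it is only a starting point for experiments.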

Blog Posts

Contacts

Email: [email protected]

Github: TrungDinhT