GSoC/2019/StatusReports/ThanhTrungDinh

From KDE Community Wiki
Revision as of 22:29, 23 June 2019

digiKam AI Face Recognition with OpenCV DNN module

digiKam is a KDE desktop application for photo management. For a long time, the digiKam team has put a lot of effort into developing the face engine, a feature that scans user photos and suggests face tags automatically based on faces pre-tagged by users. However, that functionality is currently deactivated in digiKam because it is slow and not sufficiently accurate. This project therefore aims to improve the performance and accuracy of facial recognition in digiKam by exploiting state-of-the-art neural network models from AI and machine learning, combined with the highly optimized OpenCV DNN module.

The project includes 2 main parts:

  • Improve face recognition: implementation with OpenCV DNN module
    • reduce processing time while keeping high accuracy
    • classify unknown faces into classes of similar faces
  • Improve face detection: implementation to be investigated
    • detect faces at various scales (e.g. big or small), with occlusion (e.g. sunglasses, scarf or mask), and in different orientations (e.g. up, down, left, right or side-face)

Mentors : Maik Qualmann, Gilles Caulier, Stefan Müller

Important Links

Proposal

Project Proposal

Git dev branch

gsoc19-face-recognition

Contribution

Work report

Bonding period (May 6 to May 27)

Generally, I familiarized myself with the current Deep Learning (DL) based approach to face recognition in digiKam. I picked up the work of Yingjie Liu (the student who worked on this topic in 2017), studied his code, and read his proposal, blog posts and status report in order to understand clearly what he had done and what he had left unfinished. His work led me to the FaceNet paper and a C++ implementation of the OpenFace face recognition library, both of which looked very promising for my work. In addition, Liu reported the results of unit tests on his DL implementation. However, those tests were conducted externally, without using any of digiKam's preprocessing features.

For the rest of the bonding period, I decided to read the FaceNet paper carefully and to investigate other neural network models in order to select the right model to implement when the coding period begins. I also started writing a test program so that I could benchmark the current DL implementation of face recognition in digiKam more precisely.

My plan for the first 2 weeks of the coding period is:

  • Finish the neural network model selection
  • Finish the test code
  • Start porting the current DL implementation to the OpenCV DNN module

Coding period : Phase one (May 28 to June 23)

May 28 to June 11 (Week 1 - 2)

I completed my plan for these 2 weeks. I settled on the OpenFace pretrained model and produced a first working draft of the implementation with OpenCV DNN. I also finished my test code and benchmarks for face recognition, and exhaustively tested the current face recognition implementation in digiKam against my new implementation using OpenCV DNN.

The current implementation with Dlib achieves remarkable accuracy on the ORL database (https://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html): above 98% for the 112x92 images in that database, with only 20% of the images pre-tagged. However, it took 8 s per image on average, which is too slow. The new implementation with OpenCV DNN did not reach that accuracy, but it runs much faster: it achieved more than 80% accuracy with 20% of the images pre-tagged, while needing only about 1.3 s per image.
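
The evaluation protocol described above (a fraction of each person's images is pre-tagged, and the rest are used to measure accuracy and time per image) can be sketched roughly as follows. This is only an illustration in Python, not the actual digiKam test code: the names `split_pretagged`, `benchmark` and the `recognize(train, item)` callable are hypothetical stand-ins for whichever recognizer is being measured.

```python
import random
import time
from collections import defaultdict


def split_pretagged(samples, ratio=0.2, seed=0):
    """Split (label, item) pairs per identity.

    `ratio` of each identity's images become the pre-tagged set,
    the remaining images form the test set.
    """
    by_label = defaultdict(list)
    for label, item in samples:
        by_label[label].append(item)

    rng = random.Random(seed)
    train, test = [], []
    for label, items in sorted(by_label.items()):
        items = items[:]          # do not mutate the caller's data
        rng.shuffle(items)
        n_train = max(1, round(len(items) * ratio))
        train += [(label, it) for it in items[:n_train]]
        test += [(label, it) for it in items[n_train:]]
    return train, test


def benchmark(recognize, train, test):
    """Return (accuracy, mean seconds per test image).

    `recognize(train, item) -> label` is a hypothetical callable that
    wraps the implementation under test.
    """
    if not test:
        raise ValueError("test set is empty")
    correct = 0
    start = time.perf_counter()
    for label, item in test:
        if recognize(train, item) == label:
            correct += 1
    elapsed = time.perf_counter() - start
    return correct / len(test), elapsed / len(test)
```

Running the same split through both implementations keeps the comparison fair, since each one sees identical pre-tagged and test images. With the timings reported above, 8 s against 1.3 s per image works out to roughly a sixfold reduction.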

Although the OpenFace model with the OpenCV DNN implementation was not as accurate as the Dlib implementation with its own model, I was eventually able to identify the bottlenecks for both accuracy and speed:

  • Accuracy: the bottleneck is in the prediction phase, where Euclidean distance is used as the metric to decide how close one face is to another. There are indications that other measures (e.g. cosine similarity), which do not require normalized vectors, may give better results.
  • Speed: the file containing the model that computes face landmarks is loaded every time a face needs to be recognized, which is unnecessary.
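
To illustrate the accuracy point, the sketch below compares Euclidean distance with cosine distance in a nearest-neighbour lookup. It is a toy example in Python with made-up 2-D vectors, not real face embeddings and not the digiKam code. The function names and the labels "alice" and "bob" are assumptions for illustration only.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance; sensitive to the length of the vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; depends only on the angle between vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def nearest_label(query, gallery, distance):
    """Return the label of the gallery entry closest to `query`.

    `gallery` is a list of (label, embedding) pairs.
    """
    return min(gallery, key=lambda entry: distance(query, entry[1]))[0]


# Toy, unnormalized vectors: the query points almost exactly in the
# direction of "alice" but has a much larger magnitude.
gallery = [("alice", [1.0, 0.0]), ("bob", [3.0, 3.0])]
query = [4.0, 0.4]

by_euclidean = nearest_label(query, gallery, euclidean_distance)
by_cosine = nearest_label(query, gallery, cosine_distance)
```

Here `by_euclidean` is "bob" while `by_cosine` is "alice": on unnormalized vectors the two metrics can disagree, because Euclidean distance also penalizes differences in magnitude. Whether cosine distance actually helps on real embeddings has to be checked against the benchmark.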


More details can be found in my blog post.

For the final 2 weeks of the first coding period, I intend to improve the accuracy and speed of the new implementation. I will also investigate the effect of a better face detection model on the accuracy of face recognition in digiKam.
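
One possible way to remove the speed bottleneck noted above (the landmark model file being reloaded for every face) is to load the model once and reuse it. The sketch below shows the idea in Python with a cached loader. The loader body, the `recognize_face` function and the file name `landmarks.dat` are all hypothetical placeholders, not the digiKam or Dlib API.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def load_landmark_model(path):
    """Stand-in for an expensive model load (hypothetical).

    A real loader would read and parse the file at `path`; here a
    placeholder object is returned so the sketch stays self-contained.
    """
    return {"path": path}


def recognize_face(face, model_path="landmarks.dat"):
    """Hypothetical recognition step that needs the landmark model."""
    model = load_landmark_model(model_path)  # loaded once, then cached
    return (face, model["path"])


# Recognizing several faces triggers only a single model load.
results = [recognize_face(f"face_{i}") for i in range(5)]
stats = load_landmark_model.cache_info()
```

After the loop, `stats.misses` is 1 and `stats.hits` is 4, meaning the load ran once for five faces. In a C++ code base the same effect would typically come from keeping the loaded model as a member of a long-lived object rather than constructing it per call.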

June 11 to June 23 (Week 3 - 4)

Blog Posts

Contacts

Email: [email protected]

Github: TrungDinhT