GSoC/2021/StatusReports/NghiaDuong
Revision as of 19:29, 10 July 2021
digiKam: Faces engine improvements
digiKam is a well-known open-source photo management application. Its Faces engine is a tool that helps users recognize and label faces in photos. Following advances in deep learning, the digiKam development team has been working on a deep learning implementation of the Faces engine since 2018. Over the past few years, thanks to the considerable effort of digiKam developers and the strong support from users, the Faces engine has improved gradually.
Last year, during Google Summer of Code 2020, I had a chance to work on digiKam's faces engine as part of the DNN based Faces Recognition Improvements project. By the end of that project, we had finished implementing a machine-learning-based classification system for facial recognition. On top of that, we also remodeled digiKam's face database so that it is specialized for storing face embeddings. As the final result, we achieved an accuracy of 84% on facial recognition, with a processing speed of about 19 ms/face.
However, after receiving reviews and bug reports from users, we found that a few problems remain in the faces engine. Therefore, the main goals of this project are to pick up the previous work and focus on improving the accuracy of digiKam's faces engine.
Mentors : Gilles Caulier, Maik Qualmann, Thanh Trung Dinh
Important Links
Project Proposal
Digikam Faces engine improvements
GitLab development branch
Contacts
Email: [email protected]
Github: MinhNghiaD
Invent KDE: minhnghiaduong
LinkedIn: https://www.linkedin.com/in/nghia-duong-2b5bbb15a/
Project Goals
The current goals of this project are to:
- Improve the accuracy of the face classifier with outlier detection
- Improve the speed of facial recognition and detection, and improve batch processing
- Improve the organization of the face management pipeline
Project Report
Community Bonding period (May 17 - June 7)
The current version of the face classifier in the faces engine supports two algorithms: K-Nearest neighbors and linear Support vector machine (SVM). With K-Nearest neighbors, we take a certain number of the closest data points that lie inside a threshold area and use them to select the label that best matches the data. If no point satisfies the threshold, the classifier considers the data an outlier. This algorithm performs quite well on a large dataset. However, when the training data is limited, which is mostly the case for digiKam users, K-Nearest neighbors does not perform well on high-dimensional data, due to the curse of dimensionality. On the other hand, SVM suffers less from the curse of dimensionality, but it only creates boundaries for a known number of labels. When it encounters the face image of a new identity, the classifier will try to assign it to a known label, which the user will consider a wrong label.
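The thresholded K-Nearest neighbors rule described above can be sketched as follows. This is only an illustrative sketch, not the faces engine's actual implementation: the function name `knn_predict`, the Euclidean metric, and the default values of `k` and `threshold` are all assumptions made for the example.

```python
import math
from collections import Counter

def knn_predict(query, train_points, train_labels, k=3, threshold=0.6):
    """Return the majority label among the k nearest neighbors that lie
    within `threshold` of `query`, or None if no neighbor is close enough.

    The metric (Euclidean) and the default k/threshold are assumptions.
    """
    # Distance from the query embedding to every training embedding.
    distances = [
        (math.dist(query, point), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest points, then drop those outside the threshold area.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    in_range = [label for distance, label in nearest if distance <= threshold]
    if not in_range:
        return None  # no point satisfies the threshold: treat as an outlier
    # Majority vote among the remaining neighbors.
    return Counter(in_range).most_common(1)[0][0]
```

Returning `None` for a query with no neighbor inside the threshold is what lets this rule reject unknown faces. A plain SVM, by contrast, always answers with one of the known labels.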
Therefore, during the community bonding period, my main goal is to prepare for the selection of the best classification algorithm for the face embedding distribution produced by the OpenFace CNN that we currently use in the faces engine. The selection criteria are accuracy on multi-class classification and accuracy on outlier detection. This means that the selected algorithm must not only perform well when classifying between known labels, but also be able to tell whether a face is already "known" or not.
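One possible way to measure the two criteria is sketched below. The exact definitions used in the experiments are not stated here, so this is an assumption: classification accuracy is computed only over faces whose true identity is known, and outlier detection accuracy is the fraction of faces for which the classifier correctly decides known versus unknown. The function name `evaluate` and the use of `None` as the "unknown" marker are hypothetical.

```python
def evaluate(predictions, truths, unknown=None):
    """Return (classification accuracy on known faces, outlier detection accuracy).

    `unknown` is the marker used for faces that belong to no known label.
    """
    known_total = 0
    known_correct = 0
    outlier_correct = 0
    for predicted, truth in zip(predictions, truths):
        # Outlier detection: does the classifier agree on known vs. unknown?
        if (predicted is unknown) == (truth is unknown):
            outlier_correct += 1
        # Classification: for faces with a known identity, is the label right?
        if truth is not unknown:
            known_total += 1
            if predicted == truth:
                known_correct += 1
    classification_accuracy = known_correct / known_total if known_total else 0.0
    outlier_accuracy = outlier_correct / len(truths) if truths else 0.0
    return classification_accuracy, outlier_accuracy
```

Keeping the two numbers separate makes it visible when an algorithm separates known labels well but still forces unknown faces onto an existing label.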
For this phase, machine learning analysis tools such as R or Python are suitable for quick experimentation and statistical exploration. Therefore, we decided to perform model selection in R, using data extracted from the Extended Yale B dataset, which contains over 12900 face images. For this process, we split the entire dataset into 3 subsets: a validation set containing 50% of the data for the final performance evaluation of the selected model, a training set containing 40% of the data, and a test set containing 10% of the data for cross-validation. To support both classification and novelty detection, we can use algorithms that output the posterior probability that a data point belongs to a group, for example discriminant analysis and logistic regression. Alternatively, we can use a dedicated algorithm such as One-class SVM to filter out the outliers, and then perform the classification using another algorithm.
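The 50/40/10 split described above could look like the following sketch. The experiments themselves were done in R; this Python version only illustrates the proportions. The function name `split_dataset`, the fixed seed, and the plain random shuffle (rather than a split stratified by identity) are assumptions.

```python
import random

def split_dataset(samples, seed=0):
    """Shuffle `samples` and split them into (validation, training, test)
    subsets holding roughly 50%, 40% and 10% of the data."""
    shuffled = list(samples)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_validation = len(shuffled) * 50 // 100
    n_training = len(shuffled) * 40 // 100
    validation = shuffled[:n_validation]
    training = shuffled[n_validation:n_validation + n_training]
    test = shuffled[n_validation + n_training:]  # the remaining ~10%
    return validation, training, test
```

Each sample lands in exactly one subset, so the validation set used for the final evaluation never overlaps with the data used for training or cross-validation.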
Coding period : Phase one (June 7 - July 12)
In this phase, my work mostly concentrated on testing and improving the accuracy and speed of the facial recognition module of the faces engine. The work includes finding suitable classification algorithms, trying out other deep learning models for facial recognition if needed, and implementing data processing to improve general performance.
June 7 to June 21 (Week 1 - 2) - Experimentation on face embedding data
DONE
- Exploring face embedding data.
- Evaluating the performance of different classification algorithms on this dataset.
- Implementing the suitable algorithm in C++.
- Evaluating the performance of the C++ implementation.
TODO
- Try out the Keras-Facenet implementation of the Facenet model.
- Classification on the face embedding dataset
As mentioned in the section above, the first goal of this project is to select a suitable model for the face classifier of digiKam's faces engine. Currently, we are using the OpenFace pre-trained model as the neural network of the faces engine. Cropped face images are passed through the neural network, which outputs 128-dimensional face representation vectors called face embeddings. After exporting the face embedding data extracted from digiKam's faces engine, we stored it in a CSV file to make experimentation easier and to avoid the overhead of recomputing the embeddings. With the R scripting language, we can easily manipulate the data and explore some classical algorithms like LDA, QDA and SVM.
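Reading such an export back could be sketched as below. The actual column layout of the exported file is not described here, so the layout is an assumption: one face per row, with the label in the first column followed by the embedding values. The function name `load_embeddings` is hypothetical.

```python
import csv

def load_embeddings(csv_lines, dimension=128):
    """Parse rows of the assumed form `label,v1,v2,...,v128` into two
    parallel lists: labels and embedding vectors."""
    labels, embeddings = [], []
    for row in csv.reader(csv_lines):
        if not row:
            continue  # skip blank lines
        label = row[0]
        values = [float(value) for value in row[1:]]
        # Guard against truncated or malformed rows.
        if len(values) != dimension:
            raise ValueError(
                f"expected {dimension} values for {label!r}, got {len(values)}"
            )
        labels.append(label)
        embeddings.append(values)
    return labels, embeddings
```

`csv.reader` accepts any iterable of lines, so the same function works on an open file object, for example `with open(path, newline="") as f: labels, embeddings = load_embeddings(f)`, where `path` points to the exported file.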