Digikam/GSoC2019/AIFaceRecognition

=Introduction=
Hello reader, <br>
This article describes the current state of the face detection algorithms of digiKam and the desired outcome of the corresponding GSoC project. <br>
It is recommended to read [https://community.kde.org/Digikam/GSoC2019/FacesManagementWorkflowImprovements Faces Management workflow improvements], as it describes the entire face management workflow. That helps to understand the scope of these algorithms and where their structure and interfaces with other parties (code modules) need clarification.


{{under construction}}
Currently, there are four different methods, each using a corresponding algorithm, which are more or less operational. The algorithm to use can be chosen in the Face Scan dialogue. <br>
The goal is to automatically recognize faces in images which are not yet tagged, using a previous ''face tag'' registered in the face recognition database. The algorithms are complex but explained in more detail below.


=currently implemented face recognition algorithms=
<ol>
  <li>Deep Neural Network (DNN) DLib <br>
        This is an experimental implementation of a neural network to perform face recognition. <br>
        This DNN is based on the DLib code, a low-level library used by the OpenFace project. This code works, but it is slow and complex to maintain. It is rather a proof of concept than code intended for productive use. <br>
Moreover, the documentation in the source code is non-existent.


      The code of DLib is mostly the machine learning core implementation of the [http://dlib.net/ml.html DLib C++ Library] and is referenced by the projects in [https://sourceforge.net/p/dclib/wiki/Known_users the DLib users list on SourceForge].
<br> <br>
  </li>
  <li> [https://docs.opencv.org/2.4/modules/contrib/doc/facerec/facerec_tutorial.html#local-binary-patterns-histograms OpenCV] - [https://en.wikipedia.org/wiki/Local_binary_patterns Local Binary Patterns Histograms] ([http://www.scholarpedia.org/article/Local_Binary_Patterns LBPH])<br>
This is the most complete implementation of a face recognition algorithm in digiKam, and also the oldest. It is not perfect and requires at least six faces already tagged manually by the user to identify the same face in non-tagged images. <br>
This algorithm records a histogram of the face in the database, which is used later to perform the comparisons against new/non-tagged faces.  
This method uses the OpenCV backend; a good explanation is [https://towardsdatascience.com/face-recognition-how-lbph-works-90ec258c3d6b Towards Data Science - Face Recognition: Understanding LBPH Algorithm].
<br> <br>
</li> 
<li>[https://docs.opencv.org/2.4/modules/contrib/doc/facerec/facerec_tutorial.html#eigenfaces OpenCV] - [https://en.wikipedia.org/wiki/Eigenface Eigen Faces] <br>
An alternative algorithm that uses the OpenCV backend. It was introduced to have a different source of results for face recognition, making it possible to cross-check the DNN approaches.
<br> <br>
</li>
  <li> [https://docs.opencv.org/2.4/modules/contrib/doc/facerec/facerec_tutorial.html#fisherfaces OpenCV] - [http://www.scholarpedia.org/article/Fisherfaces Fisher Face] <br>
Another algorithm that uses the OpenCV backend. It was introduced for the same purposes as Eigen Faces. <br>
Reportedly, this implementation is not finalized; not all methods are implemented.
</li>
</ol>
<br>
There is a paper explaining the difference between Fisher and Eigen Faces, see [http://disp.ee.ntu.edu.tw/~pujols/Eigenfaces%20and%20Fisherfaces.pdf Eigenfaces and Fisherfaces - Harry Chao, Multimedia Analysis and Indexing course, 2010].
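To make the LBPH idea above concrete, here is a minimal pure-Python sketch of computing Local Binary Pattern codes and their histogram. It is a textbook simplification for illustration only, not digiKam's or OpenCV's actual implementation (which additionally splits the face into a grid of cells and concatenates the per-cell histograms):

```python
def lbp_code(patch):
    """Compute the 8-bit LBP code of the center pixel of a 3x3 patch.

    Each of the 8 neighbours (clockwise from top-left) contributes one
    bit: 1 if the neighbour is >= the center pixel, else 0.
    """
    center = patch[1][1]
    neighbours = [patch[0][0], patch[0][1], patch[0][2],
                  patch[1][2], patch[2][2], patch[2][1],
                  patch[2][0], patch[1][0]]
    code = 0
    for bit, value in enumerate(neighbours):
        if value >= center:
            code |= 1 << bit
    return code

def lbp_histogram(image):
    """Histogram of LBP codes over all interior pixels of a grayscale image."""
    hist = [0] * 256
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            patch = [row[x - 1:x + 2] for row in image[y - 1:y + 2]]
            hist[lbp_code(patch)] += 1
    return hist
```

Two such histograms are then compared (e.g. by chi-square distance) to decide whether two faces match, which is why the database only needs to store the histogram per known face.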


==why so many different approaches?==


: The reason why four different algorithms were implemented is simply to be able to make a comprehensive assessment of the currently available technologies applicable in digiKam and eventually choose the best one. <br>
: The student who worked on the DNN project a few years ago concluded that DNN was the best method to recognize faces with as low an error rate as possible. Unfortunately, the training and recognition process took too long and slowed down the application.


: Regardless of that setback, it is agreed that DNN is the best way to go, but the current implementation based on DLib shall not be used.
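To illustrate why a DNN-based recognizer is attractive: instead of comparing hand-crafted histograms, a network maps each face to a fixed-size embedding vector, and recognition reduces to a simple distance check between embeddings. Below is a minimal Python sketch of that final comparison step only; the vectors and the 0.6 threshold are illustrative (DLib's face recognition model popularized that value, and any real model needs its threshold tuned on validation data):

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two face embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def same_person(emb_a, emb_b, threshold=0.6):
    """Decide whether two embeddings likely belong to the same person.

    A DNN produces the embeddings; this comparison is all that remains
    at recognition time, which is why matching itself is fast even when
    training/embedding extraction is the slow part.
    """
    return euclidean_distance(emb_a, emb_b) < threshold
```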


=previous work=
#DNN
#Eigen Faces
#Fisher Face
: All above-mentioned algorithms were introduced by the student [mailto:[email protected] Yingjie Liu] during the [https://community.kde.org/GSoC/2017/Ideas GSoC 2017]. <br>
: More information is given in Liu's [https://community.kde.org/GSoC/2017/StatusReports/YingjieLiu GSoC 2017 status reports] and his papers:
:# [https://docs.google.com/document/d/123p766jocGVT9aX2O9OL7FivXfYXd73ieMAET-BJN7M/edit?usp=sharing Face Management improvements] covering Eigen Faces and Fisher Face
:# [https://docs.google.com/document/d/1A7ocCm90RNRUlbde_ywWYvuukY-VEFyqnuMMbPod8Xc/edit?usp=sharing Work Report]
:# [https://docs.google.com/document/d/1OE6w6D8Zr26VV7AzTRpZtgzyz0T9tjWBxOuHYXwDdS4/edit?usp=sharing Added the possibility to manually sort the digiKam icon view], but that was done in the [https://community.kde.org/GSoC/2018/Ideas#digiKam GSoC 2018]
<ol start="4">
<li>LBPH<br>
tba</li>
</ol>


=code=
All the low-level steps initiating the entire workflow, that is, detecting and recognizing faces and training the algorithms, are implemented in the class [https://cgit.kde.org/digikam.git/tree/core/libs/facesengine/recognitiondatabase.cpp?h=development/dplugins#n87 root/core/libs/facesengine/recognitiondatabase.cpp].
 
All the middle-level code, the subsequent actions, is multi-threaded and chained, started by the face scan dialogue. It is listed [https://cgit.kde.org/digikam.git/tree/core/utilities/facemanagement?h=development/dplugins in the directory root/core/utilities/facemanagement] for better visibility. In the past, this code was mainly written by [mailto:[email protected] Marcel Wiesweg].
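The chained, multi-threaded structure mentioned above can be sketched with a toy two-stage pipeline, where a detection stage feeds a recognition stage through queues. Everything here (stage names, payloads) is invented for illustration; digiKam's real pipeline is the C++ code in root/core/utilities/facemanagement:

```python
import queue
import threading

SENTINEL = None  # marks the end of the work stream

def stage(in_q, out_q, work):
    """Consume items from in_q, apply work(), push results to out_q."""
    while True:
        item = in_q.get()
        if item is SENTINEL:
            out_q.put(SENTINEL)  # propagate shutdown to the next stage
            break
        out_q.put(work(item))

def run_pipeline(images, detect, recognize):
    """Chain a detection stage and a recognition stage with queues."""
    q1, q2, q3 = queue.Queue(), queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=stage, args=(q1, q2, detect)),
        threading.Thread(target=stage, args=(q2, q3, recognize)),
    ]
    for t in threads:
        t.start()
    for img in images:
        q1.put(img)
    q1.put(SENTINEL)
    results = []
    while True:
        item = q3.get()
        if item is SENTINEL:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results
```

The point of the chaining is that slow stages overlap: detection of the next image can run while recognition of the previous one is still in progress, keeping the GUI thread free.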
 
=database operation=
Which kind of info is stored in the database? <br>
This depends on the recognition algorithm used: histograms, vectors, or binary data, each driving its algorithm's computation, and of course none of these formats is compatible with the others. Thus it is necessary to clear the database when you change the recognition algorithm in the Face Scan dialogue. In fact, this kind of database mechanism must be dropped once the OpenCV DNN algorithm is finalized and remains as the only one to do the job.
 
During the scan process, the following will be done (as described in [https://cgit.kde.org/digikam.git/tree/core/utilities/facemanagement/README.FACE?h=development/dplugins root/core/utilities/facemanagement/README.FACE]):
<ol>
<li> DETECTION - MARK IMAGE AS SCANNED, "IMAGE SCANNED"<br>
Assign a tag to images, indicating whether they have been scanned or not. <br>
In this case, the scanned images are tagged with the "/Scanned/Scanned for Faces" tag. <br>
This is the simplest approach, as it avoids coding a new database table. <br>
Other jobs which need to "mark" images like this can create their own "/Scanned/<Name of job>" tag.
<br><br>
<li> DETECTION - MARK THAT A FACE IS FOUND IN THE IMAGE, ''"/PEOPLE/UNKNOWN"''-TAG <br>
Initially, when any face scan is run, the ''People'' tag is added to that image, plus the subtag for ''Unknown People'', resulting in the database entry "/People/Unknown".
<br><br>
<li> DETECTION - ADD FACE LOCATION TO DATABASE <br>
Subsequently, the position of the detected face is added as a property (with the key "faceRegion") to the corresponding core database entry of the image.
The value of the property is the "region-rectangle" complying with the [https://developer.mozilla.org/en-US/docs/Web/SVG/Tutorial/Basic_Shapes rectangle shape formalism] of Scalable Vector Graphics (SVG).
<br><br>
<li> RECOGNITION  - LINK FACE TO RECOGNITION DATABASE ENTRY <br>
Each image tagged with "/People/Unknown" is scanned, but only partially: only the regions defined in a "region-rectangle" without a ''face tag'' are scanned. <br>
If a face is recognized in a region, an ID is written to the "region-rectangle" property, corresponding to the ID of the face in the recognition database (e.g. the corresponding histogram when LBPH is selected).
<br><br>
<li> RECOGNITION  - MARK THAT A FACE IS RECOGNIZED, ''"/PEOPLE/UNCONFIRMED"''-TAG <br>
As the algorithm does not assign names fully automatically, the "region-rectangle" is tagged with "/People/Unconfirmed". <br>
Moreover, the "region-rectangle" is unlinked from "/People/Unknown".
<br><br>
<li> RECOGNITION  - CONFIRM FACES, ''"/PEOPLE/<PERSON NAME>"''-TAG <br>
When the face is later identified by the user, the new tag "/People/<Person Name>" is assigned to the "region-rectangle".
Moreover, the "region-rectangle" is unlinked from "/People/Unconfirmed". <br>
In addition, the <Person Name> is added as a keyword to the metadata of the image, complying with the schema "/People/<Person Name>", in order to make it findable/filterable by means of metadata tags.
</ol>
 
 
Since the metadata shall not be flooded by this process, anything in "/Scanned/..." is not shown in digiKam's GUI and is ignored in any metadata-related process. <br>
Furthermore, only confirmed faces are taken into account in metadata-related processes.
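The tag lifecycle in the six steps above can be modelled as a tiny state machine. The class and method names below are invented for illustration (digiKam keeps this state in its core database, not in Python objects), and the unlink order follows one plausible reading of the steps:

```python
class FaceRegion:
    """Toy model of a face region's tag transitions during a scan."""

    def __init__(self, rect):
        self.rect = rect                  # SVG-style "region-rectangle"
        self.tags = {"/People/Unknown"}   # step 2: face detected
        self.recognition_id = None        # step 4: recognition DB link

    def recognize(self, db_id):
        """Steps 4-5: link to the recognition database, mark unconfirmed."""
        self.recognition_id = db_id
        self.tags.add("/People/Unconfirmed")
        self.tags.discard("/People/Unknown")

    def confirm(self, person_name):
        """Step 6: the user confirms the suggested name."""
        self.tags.add("/People/" + person_name)
        self.tags.discard("/People/Unconfirmed")
```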
 
=Expected results of this GSoC 2019 project=
digiKam core already depends on the OpenCV library to perform complex image processing. Moreover, the OpenCV >= 3.3 releases provide a new [https://docs.opencv.org/3.4.3/d2/d58/tutorial_table_of_content_dnn.html OpenCV DNN (Deep Neural Network) module].
 
The goal now is to port the current digiKam core face recognition DNN extension to the new OpenCV API and to write all unit tests needed to validate the algorithm's usability, efficiency, and performance while learning and recognizing faces automatically.
The outcome shall be used instead of the other face recognition (and possibly detection) approaches mentioned further above.
 
'''Update:''' <br>
As OpenCV provides an integrated workflow of face detection and recognition, the scope is extended to face detection as well.
There is an excellent GSoC proposal by Thanh-Trung Dinh; it will be published as soon as the proposal submission period closes.
 
=Project tasks=
All relevant bug reports can be found in
<ul>
<li>  [https://bugs.kde.org/buglist.cgi?bug_status=__open__&component=Faces-Recognition&list_id=1583144&product=digikam digikam Bug List - Component: Faces-Recognition Status: REPORTED, CONFIRMED, ASSIGNED, REOPENED]  <br>
but the workflow entries shall also be presented to the student:
<li> [https://bugs.kde.org/buglist.cgi?bug_status=UNCONFIRMED&bug_status=CONFIRMED&bug_status=ASSIGNED&bug_status=REOPENED&component=Faces-Workflow&list_id=1595307&product=digikam  digikam Bug List - Component: Faces-Workflow Status: REPORTED, CONFIRMED, ASSIGNED, REOPENED]
</ul>
 
=requirements on the student(s)=
This is a break-down of the description of how to [https://community.kde.org/GSoC participate in the Summer of Code program with KDE]. <br>
Typically, the student must review all related Bugzilla entries given in the corresponding Bugzilla section of the project. If this project page or Bugzilla does not provide enough guidance, the student(s) must identify the top-level entries to engage with, with help from the listed mentors.
The student is expected to work technically autonomously, i.e. to find answers to challenges independently of the maintainers' support. This does not mean that the maintainers cannot be reached by the student: guidance will be given at any time in any case, but it shall be limited to occasional situations to allow the maintainers to follow up on their own work. <br>
Regardless of the above-mentioned channel of communication, the maintainers review and validate the code in their development branch before merging it to the master branch.
 
Besides coding, it is required to submit a technical proposal, which must list:
* the problem statement,
* the code outline to be merged into the master branch,
* the tests,
* the overall project plan for this summer,
* the documentation to write (mostly in code), etc.

Latest revision as of 19:38, 28 March 2019
