Digikam/GSoC2019/FacesManagementWorkflowImprovements

From KDE Community Wiki
= Introduction =
Hello reader, <br>
We begin with a little story, explaining how all the digiKam face recognition related features became a GSoC project. <br>
It all began in early 2018, when the thread [http://digikam.1695700.n4.nabble.com/digiKam-users-either-face-recognition-screen-is-buggy-or-I-still-don-t-understand-it-at-least-I-can-8-td4705248.html#a4705293 either face recognition screen is buggy or I still don't understand it - at least I can say that more convenient bulk change of face tags (no auto refresh/set faces via context menu) is neccessary] took off. Eventually, it ran its course in early 2019, which convinced the maintainer of digiKam to refurbish these features earlier than originally planned.
The post that triggered this change was written on the [http://digikam.1695700.n4.nabble.com/digiKam-users-either-face-recognition-screen-is-buggy-or-I-still-don-t-understand-it-at-least-I-can-8-tp4705248p4707745.html 01.Feb.2019] and describes quite well what has to be polished and redesigned, respectively. If you read the post, you will notice that its content goes beyond the pure face management workflow.




= The overall face detection, recognition and management workflow =


{{Construction}}
Before this article goes into the details, an overall description of all involved parts is given in corresponding order.
<ol>
<li> the faces detection <br>
This is a group of algorithms that analyse the content of images and identify the distinctive regions such as eyes, nose, mouth, etc. Most of them are OpenCV based and work mostly fine in the background (except for some technical issues with the OpenGL card acceleration used by OpenCV, which introduces instability, but that is another challenge).
These algorithms generate a region where a face can be found, typically a rectangle. These areas are written as digiKam-internal information into digiKam's core database. This information is not added to the metadata of the images yet, as that happens during the face recognition workflow, which is explained further down.
<br> <br>
</li>
<li> the faces recognition <br>
This introduces four different methods based on different algorithms, more or less functional. The goal is to automatically recognize a non-tagged face in images, using the previous face tags registered in the database. The algorithms are complex, but they are explained in more detail in the wiki page for the GSoC faces recognition project.
The four different methods are explained here in brief only; a more detailed description can be found in [[Digikam/GSoC2019/AIFaceRecognition]].
<br> <br>
<ol>
<li> Deep Neural Network (DNN) [http://dlib.net/ml.html Dlib C++ Library] <br>
DigiKam already has an experimental implementation of a neural network to perform face recognition, which is rather a proof of concept than a production-ready function.
This DNN is based on the Dlib implementation in the [http://blog.dlib.net/2014/02/dlib-186-released-make-your-own-object.html OpenFace project].
<br> <br>
</li>
<li> [https://docs.opencv.org/2.4/modules/contrib/doc/facerec/facerec_tutorial.html#local-binary-patterns-histograms OpenCV] - [https://en.wikipedia.org/wiki/Local_binary_patterns Local Binary Patterns Histograms] ([http://www.scholarpedia.org/article/Local_Binary_Patterns LBPH]) <br>
This is the most complete implementation of a face recognition algorithm. Moreover, it is the oldest implementation of such an algorithm in digiKam. It is not perfect and requires at least six faces already tagged manually by the user to identify the same faces in non-tagged images.
<br> <br>
</li>
<li> [https://docs.opencv.org/2.4/modules/contrib/doc/facerec/facerec_tutorial.html#eigenfaces OpenCV] - [https://en.wikipedia.org/wiki/Eigenface Eigen Faces] <br>
An alternative algorithm that uses the OpenCV backend. It was introduced to have a different source of results for face recognition, making it possible to cross-check the DNN approaches.
<br> <br>
</li>
<li> [https://docs.opencv.org/2.4/modules/contrib/doc/facerec/facerec_tutorial.html#fisherfaces OpenCV] - [http://www.scholarpedia.org/article/Fisherfaces Fisher Face] <br>
Another algorithm that uses the OpenCV backend. It was introduced for the same purposes as Eigen Faces. <br>
According to rumours, this one is not finalized; it is said that not all methods are implemented.
<br> <br>
</li>
</ol>
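The LBPH idea mentioned above can be illustrated with a toy sketch. This is not digiKam's implementation (digiKam uses the OpenCV backend); it only shows the core concept: every pixel is encoded by comparing it with its eight neighbours, and face crops are then compared via histograms of these codes.

```python
# Toy illustration of the Local Binary Patterns idea behind LBPH.
# The real backend additionally splits the crop into a grid, builds one
# histogram per cell, and classifies by nearest-neighbour distance.

def lbp_code(img, x, y):
    """8-bit LBP code of the pixel at (x, y): each neighbour that is
    at least as bright as the centre contributes one bit."""
    center = img[y][x]
    neighbours = [(-1, -1), (0, -1), (1, -1), (1, 0),
                  (1, 1), (0, 1), (-1, 1), (-1, 0)]
    code = 0
    for bit, (dx, dy) in enumerate(neighbours):
        if img[y + dy][x + dx] >= center:
            code |= 1 << bit
    return code

def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior pixels; two
    face crops are compared by the distance between such histograms."""
    hist = [0] * 256
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            hist[lbp_code(img, x, y)] += 1
    return hist

# a tiny 3x3 "image" with a single interior pixel
img = [
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
]
print(lbp_code(img, 1, 1))  # prints 120
```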


Why four kinds of recognition algorithms? To compare them and to choose the best one. It was concluded that the DNN is the best method to recognize faces with as few errors as possible. Unfortunately, the current implementation requires too much training and is disturbingly slow.
Therefore the GSoC project was born. <br>
Which kind of information is stored in the database? This depends on the recognition algorithm used: histograms, vectors, binary data, each tied to the respective algorithm's computation, and of course mutually incompatible. Typically, when you change the recognition algorithm in the Face Scan dialog, the database must be cleared as well.
<br> <br>
</li>
<li> The faces workflow <br>
This is the actual subject of this article, for which the search for a student (or students) for GSoC 2019 is ongoing. There are no complex algorithms involved here. <br>
That is where we switch from the backend, the digiKam core, to the frontend, the GUI. There are numberless posts about what could be improved or what is missing. The goal is to address all of them to make the entire workflow flawless, allowing it to be widely accepted and enjoyed by the users. This requires some significant effort to assist and guide the student(s) to achieve the desired outcome. As this is related to coding, the maintainers would like the community to take over here, to take workload off them and to enable us users to steer the process from a user perspective. The maintainers would only ensure the quality of the code. <br>
The overall face workflow will not change that much; the changes are mainly under the hood, as mentioned in the chapter above. The process is
<ol>
<li> detect </li>
<li> suggest faces </li>
<li> user confirms / corrects </li>
</ol>
but there are many ways to achieve this. That is the place where the hard work begins. The following section tries to give guidance for the entire retrofit process, aiming at collecting, outlining and streamlining all suggestions to ensure a consistent and intuitive face workflow. <br> <br>
As far as I know, the content of the person-related metadata fields is not taken into account when you search or filter a collection by certain keywords. Thus, in order to make the names findable by digiKam, the name has to be added to the keyword-related metadata fields to make the magic happen.
</li>
</ol>
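The three-step process above (detect, suggest, confirm/correct) can be sketched as a tiny state machine. All names here (`FaceState`, `FaceRegion`) are illustrative only, not digiKam's actual classes:

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional

class FaceState(Enum):
    DETECTED = auto()    # region found by the detector, no name yet
    SUGGESTED = auto()   # the recognizer proposed a name
    CONFIRMED = auto()   # the user accepted or corrected a name

@dataclass
class FaceRegion:
    rect: tuple                            # (left, top, width, height) in pixels
    state: FaceState = FaceState.DETECTED
    name: Optional[str] = None

    def suggest(self, name: str) -> None:
        # the recognizer proposes a name for a freshly detected region
        if self.state is FaceState.DETECTED:
            self.name = name
            self.state = FaceState.SUGGESTED

    def confirm(self, name: Optional[str] = None) -> None:
        # the user accepts the suggestion, or corrects it with another name
        if name is not None:
            self.name = name
        self.state = FaceState.CONFIRMED

face = FaceRegion(rect=(120, 80, 64, 64))
face.suggest("Alice")
face.confirm("Bob")                  # the user corrects a wrong suggestion
print(face.state.name, face.name)    # prints: CONFIRMED Bob
```

Only confirmed regions would then be written out to the image metadata, as described in the detection section above.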


= Participation =
This is a break-down of the description of how to [https://community.kde.org/GSoC participate in the Summer of Code program with KDE]. <br>
<ol>
<li> AS A MAINTAINER <br>
As a maintainer, you are responsible for knowing the digiKam source code and for checking the pull requests of the student(s) before they are merged into the master branch.
In addition, you are the contact person for questions of the student(s) regarding the code, and you ensure that the documentation written by the student(s) is satisfactory.
<br> <br>
<li> AS A MENTORING USER <br>
If you wish, you are more than welcome to contact any of the current users, get yourself an account on the KDE wiki, join the discussion here and on the mailing list, and begin contributing to this article. <br>
Volunteered users are: <br>
[mailto:stefan.mueller.83@gmail.com Stefan Müller (user coordinator)]
<br> <br>
<li> AS A STUDENT <br>
Typically, the student must review all related Bugzilla entries given in the corresponding Bugzilla section of the project. If this project page or Bugzilla does not provide enough guidance, the student(s) must identify the top-level entries to engage with, with help from the listed mentors.
The student is expected to work technically autonomously, which means searching for the answers to challenges independently before relying on the support of the maintainers. This does not mean that the maintainers cannot be reached by the student: guidance will be given at any time in any case, but it shall be limited to occasional situations to allow the maintainers to follow up on their own work. <br>
Regardless of the above-mentioned channel of communication, the maintainers review and validate the code in its development branch before merging it into the master branch.

Besides coding, the student is required to submit a technical proposal, which lists:
* the problematic,
* the code outline to be merged into the master branch,
* the tests,
* the overall project plan for this summer,
* the documentation to write (mostly in code), etc.
</ol>


= Project tasks =
All relevant bug reports can be found in
<ul>
<li> [https://bugs.kde.org/buglist.cgi?bug_status=UNCONFIRMED&bug_status=CONFIRMED&bug_status=ASSIGNED&bug_status=REOPENED&component=Faces-Workflow&list_id=1595307&product=digikam digikam Bug List - Component: Faces-Workflow, Status: REPORTED, CONFIRMED, ASSIGNED, REOPENED] <br><br>
but the recognition entries shall also be presented to the student(s):
<li> [https://bugs.kde.org/buglist.cgi?bug_status=__open__&component=Faces-Recognition&list_id=1583144&product=digikam digikam Bug List - Component: Faces-Recognition, Status: REPORTED, CONFIRMED, ASSIGNED, REOPENED]
</ul>
Some of these entries were widely discussed on the mailing list. In the following, an attempt is made to group them into major tasks, to give the student(s) detailed guidance on how to close the bug reports.
{{construction}}
'''latest email conversation not reflected yet'''
<ol>
<li> SEPARATION BETWEEN TAGS AND FACES (by [mailto:[email protected] Stefan Müller]) <br>
Many players in the media business, such as Adobe, use the expression ''tag'' for anything related to metadata, while others separate between the different types of metadata. <br>
All metadata records are stored in fields (see e.g. photometadata.org), which are also often called tags (of the metadata), so a tag is anything that is used in digiKam to filter or search for images, e.g. keywords, colour label, star rating, etc. Thus there is too much room for interpretation, which leads to all these questions and irritations caused by the use of the word tag.
In order to lower the entry hurdle into the world of tagging, I would suggest being consistent with the official wording, so that new users won't be confused. That means that the text for the tag will be named keyword: the source selection pane on the left will say Keywords, and the filter pane on the right will say Keywords Filter. The description shall rather say something close to: digiKam deals with metadata, grouped in (tags of) keywords, label, date and location.
<br><br>
<li> ENSURE THAT ALL RELEVANT METADATA FIELDS ARE FILLED (by [mailto:[email protected] Stefan Müller]) <br>
In the end, as soon as a name is confirmed, digiKam writes the data to the MP and MWG namespaces of the XMP records; it sets a name and an area.<br>
More details about those namespaces can be found here:
* [https://www.sno.phy.queensu.ca/~phil/exiftool/TagNames/Microsoft.html#MP Microsoft MP Tags (ExifTool)]
* [http://www.exiv2.org/tags-xmp-MP.html Microsoft MP Tags (Exiv2.org)]
* [https://www.sno.phy.queensu.ca/~phil/exiftool/TagNames/MWG.html#Regions MWG Regions Tags (ExifTool)]
* [http://www.exiv2.org/tags-xmp-mwg-rs.html MWG Regions Tags (Exiv2.org)]
As Apple and Adobe write their information in the MWG namespace, I would say that MWG is the leading namespace, but inconsistencies may lead to unexpected behaviour of the applications that read them.

In my understanding, this information should also be written to the [https://iptc.org/std/photometadata/specification/IPTC-PhotoMetadata#person-structure IPTC Person structure] as mentioned in the [https://www.iptc.org/std/photometadata/documentation/userguide/index.htm#!Documents/personsdepictedintheimage.htm IPTC Photo Metadata User Guide (Persons Depicted in the Image)], but it is not.
It needs to be clarified and documented why that does not happen, and it may be corrected.

Link face region with face name properly:
In order to make images findable by a person's name, the name shall also be written to the keywords field of multiple namespaces; the [https://www.iptc.org/std/photometadata/documentation/userguide/index.htm#!Documents/personsdepictedintheimage.htm IPTC Photo Metadata User Guide (Persons Depicted in the Image)] recommends caption and keywords. I cannot tell all relevant fields/namespaces; my research tells me it should be at least these:
* IPTC ⇒ Keywords, see: [https://iptc.org/std/photometadata/specification/IPTC-PhotoMetadata#keywords iptc.org]
* XMP acdsee ([https://www.acdsystems.com/ ACD Systems]) Tags ⇒ Categories, see: [https://www.sno.phy.queensu.ca/~phil/exiftool/TagNames/XMP.html#acdsee ExifTool] / [http://www.exiv2.org/tags-xmp-acdsee.html Exiv2.org]
* XMP digiKam Tags ⇒ Tag List, see: [https://www.sno.phy.queensu.ca/~phil/exiftool/TagNames/XMP.html#digiKam ExifTool] / [http://www.exiv2.org/tags-xmp-digiKam.html Exiv2.org]
* XMP dc ([http://dublincore.org/ Dublin Core]) Tags ⇒ Subject, see: [https://www.sno.phy.queensu.ca/~phil/exiftool/TagNames/XMP.html#dc ExifTool] / [http://www.exiv2.org/tags-xmp-dc.html Exiv2.org]
* XMP lr (Lightroom) Tags ⇒ hierarchicalSubject, see: [https://www.sno.phy.queensu.ca/~phil/exiftool/TagNames/XMP.html#Lightroom ExifTool] / [http://www.exiv2.org/tags-xmp-lr.html Exiv2.org]
* XMP [https://www.captureone.com/en/ MediaPro] Tags ⇒ Catalog Sets, see: [https://www.sno.phy.queensu.ca/~phil/exiftool/TagNames/XMP.html#MediaPro ExifTool] / [http://www.exiv2.org/tags-xmp-mediapro.html Exiv2.org]
* [https://docs.microsoft.com/en-us/windows/desktop/wic/-wic-codec-metadatahandlers Microsoft XMP] Tags ⇒ LastKeywordXMP, see: [https://www.sno.phy.queensu.ca/~phil/exiftool/TagNames/Microsoft.html#XMP ExifTool] / [http://www.exiv2.org/tags-xmp-MP.html Exiv2.org]
* MWG Keywords Tags, see: [https://www.sno.phy.queensu.ca/~phil/exiftool/TagNames/MWG.html#Keywords ExifTool] / [http://www.exiv2.org/tags-xmp-mwg-kw.html Exiv2.org]

The following has a field as well, but it is ignored by digiKam, why?
* XMP [http://www.prismstandard.org/ prism] Tags ⇒ Keyword, see: [https://www.sno.phy.queensu.ca/~phil/exiftool/TagNames/XMP.html#prism ExifTool]
To be excluded shall be:
* XMP [https://www.adobe.com/devnet/xmp.html xmp] Tags, as it says ''non-standard'': [https://www.sno.phy.queensu.ca/~phil/exiftool/TagNames/XMP.html#xmp ExifTool]
* XMP xmpMM Tags, as it says ''undocumented'': [https://www.sno.phy.queensu.ca/~phil/exiftool/TagNames/XMP.html#xmpMM ExifTool]
* XMP pdf Tags: [https://www.sno.phy.queensu.ca/~phil/exiftool/TagNames/XMP.html#pdf ExifTool] -> only for Adobe PDF

I reckon there isn't any leading field, as a mismatch could lead to inconsistent search results, depending heavily on the application being used.
It needs to be clarified and documented why that does not happen, and it may be corrected.
</ol>
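To illustrate what "it sets a name and an area" means for the MWG namespace: the mwg-rs region area stores the centre point plus width/height of the face rectangle, normalized against the image dimensions. A minimal sketch of that conversion (the function name and dict layout are illustrative, not digiKam's code):

```python
def to_mwg_area(rect, img_w, img_h):
    """Convert a face rectangle in pixels (left, top, width, height)
    into a normalized MWG-style region area: the centre point plus the
    width/height, all relative to the image dimensions (0.0 .. 1.0)."""
    left, top, w, h = rect
    return {
        "x": (left + w / 2) / img_w,   # centre x, normalized
        "y": (top + h / 2) / img_h,    # centre y, normalized
        "w": w / img_w,
        "h": h / img_h,
        "unit": "normalized",
    }

# a 64x64 face at pixel position (120, 80) in a 640x480 image
area = to_mwg_area((120, 80, 64, 64), img_w=640, img_h=480)
print(area["x"], area["w"])   # prints: 0.2375 0.1
```

Because the area is relative, the stored region survives a resize of the image; a reader that expects absolute pixel coordinates (as e.g. the Microsoft MP schema allows) has to convert, which is one source of the inconsistencies mentioned above.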




2. Second, digiKam reads the information given in the person-related fields of the metadata of each image in this particular case. Afterwards, it uses this data to populate the person pane. That would put quite a workload on the CPU and isn't very likely.
3. Third, it stores the information given in the person-related fields of the metadata of each image in the database recognition.db. Based on the information stored there, digiKam knows which images are to be shown. In this case, are the face thumbnails stored in this database as well, or are they derived from each image, based on the region information?
In addition, I would like to see some changes in regard to the unknown faces thumbnails.
Those wishes are most likely discussed in other bug reports. For convenience, I have listed those created by woenx and mine again.
As you can see, most wishes are still unresolved, and although mine will mostly be duplicates of present ones, I'll list them anyway in order to highlight their necessity.
I would like to be able to
# stop the auto refresh of the thumbnails, to avoid confirming a wrong face accidentally. It is a pain in the arse to undo such accidents.
# sort them at least by guessed face. It would be preferred if sorting in any view were possible by any property that can be used to filter items.
# drag and drop selected faces onto a Person Name.
# assign a Person Name via the right-click menu, as is possible for tags.
# group similar faces in "Unknown" faces.
<ol>
<li> Bugs
  <ol>
  <li> [https://bugs.kde.org/show_bug.cgi?id=392013  <del>Bug 392013</del>]: Metadata explorer does not show XMP face rectangles
  <li> [https://bugs.kde.org/show_bug.cgi?id=392017  <del>Bug 392017</del>]: Merging, renaming and removing face tags
  <li> [https://bugs.kde.org/show_bug.cgi?id=392009  <del>Bug 392009</del>]: Weird automatic subtag within "Unknown people" called "da"
  <li> [https://bugs.kde.org/show_bug.cgi?id=392008 Bug 392008]: Inconsistent behaviour of "People" Tag
  </ol>
<li> Wishes
  <ol>
  <li> [https://bugs.kde.org/show_bug.cgi?id=275671 <del>Wish 275671</del>]: Scan single image for faces
  <li> [https://bugs.kde.org/show_bug.cgi?id=392015 Wish 392015]: Show "Unknown" faces in a more visible and preeminent place in the "People" list
  <li> [https://bugs.kde.org/show_bug.cgi?id=392007 Wish 392007]: Face tags and regular tags are mixed together and cannot be told apart
  <li> [https://bugs.kde.org/show_bug.cgi?id=392016 Wish 392016]: Confirmed and unconfirmed faces look the same in a person's face list
  <li> [https://bugs.kde.org/show_bug.cgi?id=392020 Wish 392020]: No possible way of knowing which pictures within a regular tag have been face-tagged
  <li> [https://bugs.kde.org/show_bug.cgi?id=392022 Wish 392022]: Position of a face tag appears on top or bottom of the list, instead of being sorted alphabetically
  <li> [https://bugs.kde.org/show_bug.cgi?id=392023 Wish 392023]: Feature request: add "Ignored" group of faces:
  <li> [https://bugs.kde.org/show_bug.cgi?id=392024 Wish 392024]: Feature request: group similar faces in "Unknown" faces  <br>
                      ⇒ [https://bugs.kde.org/show_bug.cgi?id=384396 Wish 384396]: Wish: display faces sorted by similarity (pre-grouped) instead of album/time/..
  <li> [https://bugs.kde.org/show_bug.cgi?id=386291 Wish 386291]: only refresh found face list/pane upon user request
  <li> [https://bugs.kde.org/show_bug.cgi?id=254099 Wish 254099]: SCAN : refresh collection with a script in commandline
    </ol>
</ol>

Latest revision as of 14:44, 10 March 2019

Introduction

Hello reader,
We begin with a little story, explaining how all the digiKam face recognition related features became a GSoC project.
All began in early 2018 as the thread either face recognition screen is buggy or I still don't understand it - at least I can say that more convenient bulk change of face tags (no auto refresh/set faces via context menu) is neccessary took off. Eventually, it found its course in early 2019 what convinced the maintainer of digiKam to refurbish these features earlier than originally considered. The post what made this change was written on the 01.Feb.2019 and describes quite well what has to be polished and redesigned, respectively. If you read the post, you will notice that it content goes beyond the pure face management workflow.


the overall face detection, recognition and management workflow

Before this article goes into the details, an overall description of all involved parts is given in corresponding order.

  1. the faces detection
    It is a group of algorithms to analyse the content of images, identify the distinctive regions such as eyes, nose, mouth, etc. Most of them are OpenCV based, and work mostly fine in the background (excepted some technical issues with OpenGL cards acceleration used by OpenCV which introduce instability, but it's another challenge). These algorithms generate region where a face can be found, typically a rectangle. These areas are written as digikam internal information in digiKams core database. That information will not be added to the metadata of the images yet as this happens during the face recognition workflow, what is explained further down.

  2. the faces detection
    This introduces the four different methods based on different algorithms, more and less functional. The goal is to be able to recognize automatically a non-tagged face from images, using previous face tags registered in the database. The algorithms are complex but explained in more details in the wiki page for the GSoC faces recognition project. The 4 different methods are explained here in brief only, a more detailed description can be found in Digikam/GSoC2019/AIFaceRecognition

    1. Deep Neural Network (DNN) Dlib C++ Library
      DigiKam has already an experimental implementation of Neural Network to perform faces recognition what is rather proof of concept than a production-ready function. This DNN is based on the Dlib implementation in OpenFace project.

    2. OpenCV - Local Binary Patterns Histograms (LBPH)
      This is the most complete implementation of a face detection algorithm. Moreover, it is the oldest implementation of such an algorithm in digiKam. It's not perfect and requires at least six faces already tagged manually by the user to identify the same faces in non-tagged images.

    3. OpenCV - Eigen Faces
      An alternative algorithm what uses the OpenCV backend. It was introduced to have a different source of results for face detection, enabling to proof the DNN approaches.

    4. OpenCV - Fisher Face
      Another algorithm that uses the OpenCV backend. It was introduced for the same purposes as Eigen Faces.
      Reportedly, this one is not finalized; it is said that not all methods are implemented.
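    As an illustration of the core idea behind the LBPH method above, here is a small self-contained sketch (not digiKam's or OpenCV's actual code): each pixel is replaced by an 8-bit Local Binary Pattern code obtained by thresholding its eight neighbours against the centre value; histograms of these codes over image cells then form the face descriptor that is compared between faces.

```python
# Illustrative sketch of a Local Binary Pattern code, the building block
# of LBPH face recognition. Not taken from digiKam or OpenCV sources.

def lbp_code(patch):
    """Compute the LBP code of the centre pixel of a 3x3 patch.

    Neighbours are visited clockwise from the top-left; a neighbour
    greater than or equal to the centre contributes a 1 bit.
    """
    c = patch[1][1]
    neighbours = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                  patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    code = 0
    for bit, n in enumerate(neighbours):
        if n >= c:
            code |= 1 << bit
    return code

patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
print(lbp_code(patch))  # -> 241
```

    Because the code depends only on whether neighbours are brighter or darker than the centre, it is robust against uniform lighting changes, which is one reason LBPH works reasonably well on unconstrained photos.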

  3. The faces workflow
    This is the actual subject of this article, for which student(s) for GSoC 2019 are currently being sought. No complex algorithms are involved here.
    This is where we switch from the backend, the digiKam core, to the frontend, the GUI. There are countless posts about what could be improved or what is missing. The goal is to address all of them and make the entire workflow flawless, so that it is widely accepted and enjoyed by the users. This requires significant effort to assist and guide the student(s) towards the desired outcome. As this is related to coding, the maintainers would like the community to take over here, to take workload off them and to enable us users to steer the process from a user perspective. The maintainers would only ensure the quality of the code.
    The overall face workflow will not change much; the changes are mainly under the hood, as mentioned in the chapter above. The process is
    1. detect faces
    2. suggest faces
    3. user confirms / corrects

    but there are many ways to achieve this. That is where the hard work begins. The following section tries to give guidance for the entire retrofit process, aiming at collecting, outlining and streamlining all suggestions to ensure a consistent and intuitive face workflow.

    I mention this because, as far as I know, the content of person-related metadata fields is not taken into account when you search or filter a collection by certain keywords. Thus, in order to make names findable by digiKam, the name also has to be added to the keyword-related metadata fields to make the magic happen.

Participation

This is a break-down of the description of how to participate in the Summer of Code program with KDE.

  1. AS A MAINTAINER
    As a maintainer, you are responsible for knowing the digiKam source code and for checking pull requests of the student(s) before they are merged into master. In addition, you are the contact person for the student(s)' questions regarding the code, and you ensure that the student(s)' documentation is satisfactory.

  2. AS A MENTORING USER
    If you wish, you are more than welcome to contact any of the current users, get yourself a KDE account, join the discussion here and on the mailing list, and begin contributing to this article.
    Volunteered users are:
    Stefan Müller (user coordinator)

  3. AS A STUDENT
    Typically, the student must review all related Bugzilla entries given in the corresponding Bugzilla section of the project. If this project page or Bugzilla does not provide enough guidance, the student(s) must identify the top-level entries to engage with, with help from the listed mentors. The student is expected to work autonomously on the technical side, finding the answers to challenges largely without the support of the maintainers. This does not mean that the maintainers cannot be reached by the student: guidance will be given at any time in any case, but it should be limited to occasional situations, to allow the maintainers to follow up on their own work.
    Regardless of the above-mentioned channels of communication, the maintainers review and validate the code in their development branch before merging it into the master branch. Besides coding, the student is required to submit a technical proposal, which should list:
    • the problem statement,
    • an outline of the code to be merged into the master branch,
    • the tests
    • the overall project plan for this summer,
    • documentation to write (mostly in code), etc.

Project tasks

All relevant bug reports can be found in

In the following, we try to group them into major tasks, to give the students detailed guidance on how to close the bug reports.

 
Under Construction
This is a new page, currently under construction!

Latest email conversation not reflected yet.

  1. SEPARATION BETWEEN TAGS AND FACES (by Stefan Müller)
    Many players in the media business, such as Adobe, use the expression 'tag' for anything related to metadata, while others distinguish between the different types of metadata.
    All metadata records are stored in fields (see e.g. photometadata.org), which are also often called tags (of the metadata). In digiKam, a tag is anything that is used to filter or search for images, e.g. keywords, colour labels, star ratings etc. Thus there is too much room for interpretation, which leads to all these questions arising from the confusion caused by the word 'tag'. In order to lower the entry hurdle into the world of tagging, I would suggest being consistent with the official wording, so that new users are not confused by this. That means that the text for the tag would be named keyword: the source selection pane on the left would say Keywords, and the filter pane on the right would say Keywords Filter. The description should rather stay close to how digiKam deals with metadata, grouped in (tags of): keywords, label, date and location.

  2. Ensure that all relevant metadata fields are filled (by Stefan Müller)
    In the end, as soon as a name is confirmed, digiKam writes the data to the MP and MWG namespaces of the XMP records; it sets a name and an area.
    More details about those namespaces can be found here: as Apple and Adobe write their information in the MWG namespace, I would say that MWG is the leading namespace, but inconsistency may lead to unexpected behaviour in the applications that read them. In my understanding, this information should also be written to the IPTC person structure, as mentioned in the IPTC Photo Metadata User Guide (Persons Depicted in the Image), but it is not. It needs to be clarified and documented why that does not happen, and it may be corrected. The face region should be linked properly with the face name. In order to make images findable by a person's name, the name should also be written to the keywords fields of multiple namespaces; the IPTC Photo Metadata User Guide (Persons Depicted in the Image) recommends caption and keywords. I cannot name all relevant fields/namespaces; my research tells me it should be at least those. The following have a field but are ignored by digiKam (why?), and shall be excluded:
    • XMP xmp Tags, as it says non-standard: ExifTool
    • XMP xmpMM Tags as it says undocumented: ExifTool
    • XMP pdf Tags: ExifTool -> only for Adobe PDF
    I reckon there isn't any leading field, as a mismatch could lead to inconsistent search results, depending heavily on the application being used. It needs to be clarified and documented why that does not happen, and it may be corrected.
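    For orientation, this is roughly what an MWG face region looks like in the XMP packet, following the MWG Guidelines (the name, dimensions and coordinate values here are made-up examples; note that in MWG the stArea x/y values denote the centre of the region, normalized to the image dimensions, and the exact serialization may differ between writers):

```xml
<rdf:Description xmlns:mwg-rs="http://www.metadataworkinggroup.com/schemas/regions/"
                 xmlns:stArea="http://ns.adobe.com/xmp/sType/Area#"
                 xmlns:stDim="http://ns.adobe.com/xap/1.0/sType/Dimensions#">
 <mwg-rs:Regions rdf:parseType="Resource">
  <mwg-rs:AppliedToDimensions rdf:parseType="Resource">
   <stDim:w>1600</stDim:w>
   <stDim:h>1200</stDim:h>
   <stDim:unit>pixel</stDim:unit>
  </mwg-rs:AppliedToDimensions>
  <mwg-rs:RegionList>
   <rdf:Bag>
    <rdf:li rdf:parseType="Resource">
     <mwg-rs:Type>Face</mwg-rs:Type>
     <mwg-rs:Name>Jane Doe</mwg-rs:Name>
     <mwg-rs:Area rdf:parseType="Resource">
      <stArea:x>0.3125</stArea:x>
      <stArea:y>0.2917</stArea:y>
      <stArea:w>0.125</stArea:w>
      <stArea:h>0.0833</stArea:h>
      <stArea:unit>normalized</stArea:unit>
     </mwg-rs:Area>
    </rdf:li>
   </rdf:Bag>
  </mwg-rs:RegionList>
 </mwg-rs:Regions>
</rdf:Description>
```

    Any application that only updates one of the namespaces (MWG, MP, or an IPTC person structure) without the others can produce exactly the kind of inconsistency described above.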


If I'm correct, what is the source of the list in the people pane on the left? In my opinion, there are three options.
  1. These are the keywords listed below the hierarchy level 'persons' in the keyword list. If the user selects a name, digiKam filters images based on the keywords and shows the face area as described in the person-related metadata field.
  2. digiKam reads the information given in the person-related metadata fields of each image and uses this data to populate the people pane. That would put quite a workload on the CPU and isn't very likely.
  3. digiKam stores the information given in the person-related metadata fields of each image in the database recognition.db. Based on the information stored there, digiKam knows which images are to be shown. In this case, are the face thumbnails stored in this database as well, or are they derived from each image, based on the region information?


In addition, I would like to see some changes regarding the unknown-faces thumbnails. These wishes are most likely discussed in other bug reports; for convenience, I have listed those created by woenx and mine again. As you can see, most wishes are still unresolved, and although mine are mostly duplicates of present ones, I list them anyway in order to highlight their necessity. I would like to be able to:

  1. stop the auto-refresh of the thumbnails, to avoid confirming a wrong face accidentally. It is a pain to undo such accidents
  2. sort them at least by guessed face
  3. preferably, sort any view by any property that can be used to filter items
  4. drag and drop selected faces onto a person name
  5. assign a person name via the right-click menu, as is possible for tags
  6. group similar faces in "Unknown" faces


  1. Bugs
    1. Bug 392013: Metadata explorer does not show XMP face rectangles
    2. Bug 392017: Merging, renaming and removing face tags
    3. Bug 392009: Weird automatic subtag within "Unknown people" called "da"
    4. Bug 392008: Inconsistent behaviour of "People" Tag
  2. Wishes
    1. Wish 275671: Scan single image for faces
    2. Wish 392015: Show "Unknown" faces in a more visible and preeminent place in the "People" list
    3. Wish 392007: Face tags and regular tags are mixed together and cannot be told apart
    4. Wish 392016: Confirmed and unconfirmed faces look the same in a person's face list
    5. Wish 392020: No possible way of knowing which pictures within a regular tag have been face-tagged
    6. Wish 392022: Position of a face tag appears on top or bottom of the list, instead of being sorted alphabetically
    7. Wish 392023: Feature request: add "Ignored" group of faces
    8. Wish 392024: Feature request: group similar faces in "Unknown" faces
      Wish 384396: Wish: display faces sorted by similarity (pre-grouped) instead of album/time/..
    9. Wish 386291: only refresh found face list/pane upon user request
    10. Wish 254099: SCAN : refresh collection with a script in commandline