Digikam/GSoC2019/FacesManagementWorkflowImprovements: Difference between revisions

From KDE Community Wiki


Hello
1
By when do you need, let's say, a detailed specification of how the job shall be done?


2
It took me quite a few hours to get that email done, spread across two or three weeks, whenever I could spare some time.
So I would suggest that I email the others who got involved in this matter this year and last, and that we try to get the main goals on paper, properly, as a group on pixl.us (as I don't know a better tool than the one that has recently been discussed).


3
I would volunteer to update the current documentation about the database and face management, but I need to know what you think about my wording suggestions, as I would have difficulties sticking to the word tag (I know, it is a little attempt at blackmail 😋).


4
You mentioned in the GSoC 2019 task description that it shall focus on face management, not face detection. Can these really be separated that well? I am afraid that there will be some overlap; how do we deal with that?


FEB 4
Gilles Caulier sent the following messages at 9:12 AM
Gilles Caulier  9:12 AM
I will not respond in your order:


4/ The DK face management is separated into these parts:


- The faces detection: This is a group of algorithms that analyze image contents and identify the regions of interest (eyes, nose, mouth, etc.). Most of them are OpenCV based and work mostly fine in the background (except for some technical issues with the OpenGL card acceleration used by OpenCV, which introduce instability, but that is another problem). These algorithms generate a region where a face can be found, typically a rectangle. This area is linked later in the database as a "face tag".
- The faces database: this is a separate engine dedicated to storing face information: areas, names, and histograms. There is more information, but these are the most important items. This information is stored in separate tables/files.
- The faces recognition: This introduces 4 different methods based on different algorithms, which are more or less functional. The goal is to be able to automatically recognize a non-tagged face in images, using previous face tags registered in the database. The algorithms are complex, and one is based on a neural network, but it is still experimental and not optimized. We have a GSoC project for this summer about this topic, but here again that is another matter.
- The faces workflow: this is the subject for which we want to find a student for this summer, and I want to share the mentoring with somebody who knows DK well in user space. This groups all the actions, widgets and methods given to the user to manage face tags in a non-automated way: rename, move, delete, group, register a face in the database, etc. There are no algorithms here, only the GUI.


Gilles Caulier  9:18 AM
4/ (again): the idea for mentoring a student on the face workflow is to separate 2 topics: the coding space, and the GUI/test/review. I propose to manage the first and to delegate the second.
Typically, the student must review all Bugzilla entries that I separated into a subsection. You know some of these (-:=))...
He must identify the top-level entries to fix, with tips from the mentors of course. He must write a technical proposal, in which he must list: the problem statement, the code to patch, the coding tasks, the tests, the plan for this summer and, of course, the documentation to write (mostly in code), etc.


Gilles Caulier  9:23 AM
4/ (again 2): during this summer, the student must analyze the code in detail, identify the problems, and start to patch implementations. During this stage, he will ask questions about coding and about functionality. We must respond to both, and I will respond to code questions with priority...
The student must be technically autonomous. He must search for answers by himself; sometimes he can get blocked, and we must guide him in the right direction. In all cases, I review and validate the code.
Important: students work in a separate development branch in git. It is safe.


Gilles Caulier  9:24 AM
1/ => see the response given for 4/, more or less...


Gilles Caulier  9:29 AM
2/ if you are talking about the process of working together with a student, there are plenty of tools:
- a wiki page that the student and we will use to list all technical points for the project.
- the [email protected] mailing list, where students can ask questions and where the team responds as a community. We can use this channel only once students are selected, not before, as students are in competition.
- Private mail works well, but sometimes the thread can be broken if somebody forgets to reply to everyone. So the mailing list is better, but we must use private mail before the final student selection.


Gilles Caulier  9:31 AM
3/ sure, the documentation about the database already exists and needs to be updated, but I don't understand the "word tag" problem. Did I forget something?


Stefan Müller sent the following messages at 9:17 PM
Stefan Müller  9:17 PM
👍


Face workflow is to separate 2 topics : the coding space, and the GUI/test/review. I propose to manage the first and to delegate the second.👍
 

Revision as of 20:52, 3 March 2019

Hello reader, we begin with a little story explaining how all the digiKam face recognition related features became a GSoC project. It all began in early 2018, when the thread "either face recognition screen is buggy or I still don't understand it - at least I can say that more convenient bulk change of face tags (no auto refresh/set faces via context menu) is neccessary" took off. Eventually it found its course in early 2019, which convinced the maintainer of digiKam to refurbish these features earlier than originally planned. The post that brought about this change was written on 01 Feb 2019 and describes quite well what has to be polished and redesigned, respectively. If you read the post, you will notice that its content goes beyond the pure face management workflow.


 
Under Construction
This is a new page, currently under construction!


Before this article goes into the details, an overall description of all parts involved is given, in order of precedence.

1 The faces detection:
This is a group of algorithms that analyse the content of images and identify distinctive regions such as eyes, nose, mouth, etc. Most of them are OpenCV based and work mostly fine in the background (except for some technical issues with the OpenGL card acceleration used by OpenCV, which introduce instability, but that is another challenge).
These algorithms generate a region where a face can be found, typically a rectangle. These areas are written as digiKam-internal information to digiKam's core database. That information is not yet added to the metadata of the images, as this happens during the face recognition workflow, which is explained further down.

2 The faces recognition:
This introduces 4 different methods based on different algorithms, which are more or less functional. The goal is to be able to automatically recognize a non-tagged face in images, using previous face tags registered in the database. The algorithms are complex but are explained in more detail on the wiki page for the GSoC faces recognition project. The 4 methods are only described briefly here.

2a Deep Neural Network DLib:
digiKam already has an experimental implementation of a neural network to perform face recognition, which is a proof of concept rather than a production-ready function. This DNN is based on DLib code, a low-level library used by the OpenFace project.

2b LBPH (Local Binary Patterns Histograms):
This is the most complete implementation and the oldest one implemented in digiKam. It is not perfect and requires at least 6 faces already tagged manually by the end user in order to identify the same person.

2c Eigen Faces:
Another algorithm that uses the OpenCV backend. It was used for comparison with the DNN approaches.

2d Fisher Faces:
Another algorithm that uses the OpenCV backend. It was used for comparison with the DNN approaches.
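The six-face requirement mentioned for LBPH can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `FaceRegion` layout, the `recognizable_people` helper and the way names are attached to regions are hypothetical and do not mirror digiKam's actual database schema or API.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional, Set

# Threshold quoted in the text for the LBPH recognizer.
MIN_TAGGED_FACES = 6


@dataclass(frozen=True)
class FaceRegion:
    """Rectangle produced by face detection, in pixels (hypothetical layout)."""
    image_id: int
    x: int
    y: int
    width: int
    height: int
    person: Optional[str] = None  # None until a user has tagged the face


def recognizable_people(regions: Iterable[FaceRegion]) -> Set[str]:
    """Return the names that have enough manually tagged faces for training."""
    counts = Counter(r.person for r in regions if r.person is not None)
    return {name for name, n in counts.items() if n >= MIN_TAGGED_FACES}
```

With six regions tagged "Alice" and five tagged "Bob", only "Alice" would be returned, so Bob's faces would still have to be tagged by hand.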

Why 4 kinds of recognition algorithms?
To compare them and choose the best one. It was concluded that the DNN is the best method for recognizing faces with as few errors as possible. Unfortunately, the current implementation requires too much training and is disturbingly slow. Therefore the GSoC project was born.

Which kind of information is stored in the database?
This depends on the recognition algorithm used: histograms, vectors, binary data, all of which are needed for the algorithm's computation and, of course, none of which are compatible with each other. Typically, when you change the recognition algorithm in the Face Scan dialog, the database must be cleared as well.
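The rule that switching the algorithm invalidates the stored data can be sketched as follows. This is a minimal illustration under stated assumptions: `RecognitionStore` and its methods are hypothetical names, and the training data is modelled as opaque byte blobs rather than digiKam's real tables.

```python
from typing import Dict, List


class RecognitionStore:
    """Holds per-person training data for exactly one recognition algorithm."""

    def __init__(self, algorithm: str) -> None:
        self.algorithm = algorithm
        # person name -> opaque blobs (histograms, vectors, binary data, ...)
        self._training_data: Dict[str, List[bytes]] = {}

    def add_sample(self, person: str, blob: bytes) -> None:
        self._training_data.setdefault(person, []).append(blob)

    def sample_count(self) -> int:
        return sum(len(blobs) for blobs in self._training_data.values())

    def switch_algorithm(self, algorithm: str) -> None:
        """Change the algorithm; data from another algorithm is not reusable."""
        if algorithm != self.algorithm:
            self._training_data.clear()
            self.algorithm = algorithm
```

Selecting the same algorithm again keeps the data, while selecting a different one leaves the store empty, so training has to start over.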

3 The faces workflow:
This is the actual subject of this article, for which the search for one or more students for GSoC 2019 is ongoing. There are no complex algorithms involved here. This is where we switch from the backend, the digiKam core, to the frontend, the GUI.
There are countless posts about what could be improved or what is missing. The goal is to address all of them to make the entire workflow flawless, so that it is widely accepted and enjoyed by the users. This requires significant effort to assist and guide the student(s) to achieve the desired outcome. The maintainers would like to see the community take over the part that is not related to coding, to take workload off them and to enable us users to steer the process from a user perspective. The maintainers would only ensure the quality of the code.
The overall face workflow will not change that much; the changes are mainly under the hood, as mentioned in the chapter above. The process is:
Detect,
suggest faces,
user confirms / corrects,
but there are many ways to achieve this. This is where the hard work begins. The following section tries to give guidance for the entire retrofit process, aiming at collecting, outlining and streamlining all suggestions to ensure a consistent and intuitive face workflow.
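The three-step process described above (detect, suggest, user confirms or corrects) can be read as a small state machine. The sketch below is only one possible reading: the state names, the `Face` class and the rule that a suggestion never overrides a user decision are assumptions for illustration, not digiKam behaviour taken from the source.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FaceState(Enum):
    DETECTED = "detected"    # region found, no name yet
    SUGGESTED = "suggested"  # recognition proposed a name
    CONFIRMED = "confirmed"  # user accepted or corrected the name


@dataclass
class Face:
    state: FaceState = FaceState.DETECTED
    name: Optional[str] = None


def suggest(face: Face, name: str) -> Face:
    """Attach a proposed name; assumed never to override a user decision."""
    if face.state is not FaceState.CONFIRMED:
        face.state, face.name = FaceState.SUGGESTED, name
    return face


def confirm(face: Face) -> Face:
    """User accepts the current suggestion."""
    if face.state is not FaceState.SUGGESTED:
        raise ValueError("only a suggested face can be confirmed")
    face.state = FaceState.CONFIRMED
    return face


def correct(face: Face, name: str) -> Face:
    """User overrides whatever was there with the right name."""
    face.state, face.name = FaceState.CONFIRMED, name
    return face
```

Keeping the transitions in one place like this makes it easier to discuss which GUI actions (context menu, bulk change, and so on) map to which step.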

I mentioned that because, as far as I know, the content of person-related metadata fields is not taken into account when you search or filter a collection by certain keywords. Thus, in order to make the names findable by digiKam, the name has to be added to the keyword-related metadata fields to make the magic happen.

Participation

As a maintainer
As a maintainer, you are responsible for knowing the digiKam source code and for checking the pull requests of the student(s) before they are merged into master. In addition, you are the contact person for the questions of the student(s) regarding the code, and you ensure that the documentation of the student(s) is satisfactory.

As Student
Typically, the student must review all Bugzilla entries that are listed below and in the separate subsection created by the maintainers; some of them were widely discussed on the mailing list. If this page does not provide enough guidance, the student(s) must identify the top-level entries to tackle, with help from the listed mentors.
Besides coding, a technical proposal is required, which lists:
the problem statement,
the code to patch,
the coding tasks, the tests, the plan for this summer and, of course, the documentation to write (mostly in code), etc.
During this summer, the student must analyze the code in detail, identify the problems, and start to patch implementations. During this stage, he will ask questions about coding and about functionality. We must respond to both, and I will respond to code questions with priority... The student must be technically autonomous. He must search for answers by himself; sometimes he can get blocked, and we must guide him in the right direction. In all cases, I review and validate the code. Important: students work in a separate development branch in git. It is safe.

As mentoring User
If you wish, you are more than welcome to contact any of the current users, get yourself an account on KDE, join the discussion here and on the mailing list, and begin contributing to this article.
Volunteering users are:
Stefan Müller (user coordinator)

Project task.
In the following, the relevant bug reports are grouped into major tasks, to give the students detailed guidance on how to close them.

- Separation between tags and faces
Many players in the media business, such as Adobe, use the expression tag for anything related to metadata, while others distinguish between the different types of metadata. All metadata records are stored in fields (see e.g. photometadata.org), which are also often called tags (of the metadata), so a tag is anything that is used in digiKam to filter or search for images, e.g. keywords, colour label, star rating etc. Thus there is too much room for interpretation, which leads to all these questions, caused by the confusion around the use of the word tag.
In order to lower the entry hurdle into the world of tagging, I would suggest being consistent with the official wording, so that new users won't be confused. That means that the text for the tag will be named keyword, so the source selection pane on the left will say Keywords and the filter pane on the right will say Keywords Filter. The description shall rather say something close to: digiKam deals with metadata, grouped in (tags of): keywords, label, date and location.

- Ensure that all relevant metadata fields are filled (by Stefan Müller)
In the end, as soon as a name is confirmed, digiKam writes the data to the MP and MWG namespaces of the XMP records; it sets a name and an area. More details about those namespaces can be found here:
• Microsoft MP Tags (ExifTool)
• Microsoft MP Tags (Exif2.org)
• MWG Regions Tags (ExifTool)
• MWG Regions Tags (Exif2.org)
As Apple and Adobe write their information to the MWG namespace, I would say that MWG is the leading namespace, but inconsistencies may lead to unexpected behaviour of the applications that read them. In my understanding, this information should also be written to the IPTC Person structure, as mentioned in the IPTC Photo Metadata User Guide (Persons Depicted in the Image), but it is not. It needs to be clarified and documented why that does not happen, and it may need to be corrected.

Link face region with face name properly
In order to make images findable by a person's name, the name shall also be written to the keywords field of multiple namespaces; the IPTC Photo Metadata User Guide (Persons Depicted in the Image) recommends caption and keywords. I cannot name all relevant fields/namespaces. My research tells me that it should be at least these:
• IPTC ⇒ Keywords, see: iptc.org
• XMP acdsee (ACD Systems) Tags ⇒ categories, see: ExifTool / Exif2.org
• XMP digiKam Tags ⇒ Tag List, see: ExifTool / Exif2.org
• XMP dc (Dublin Core) Tags ⇒ Subject, see: ExifTool / Exif2.org
• XMP lr (Lightroom) Tags ⇒ hierarchicalSubject, see: ExifTool / Exif2.org
• XMP MediaPro Tags ⇒ Catalog Sets, see: ExifTool / Exif2.org
• Microsoft XMP Tags ⇒ Last Keyword XMP, see: ExifTool / Exif2.org
• MWG Keywords Tags: ExifTool / Exif2.org

has a field, but it is ignored by digiKam. Why?

• XMP prism Tags ⇒ Keyword: ExifTool
but to be excluded are:
• XMP xmp Tags, as it says non-standard: ExifTool
• XMP xmpMM Tags, as it says undocumented: ExifTool
• XMP pdf Tags: ExifTool -> only for Adobe PDF
I reckon there isn't any leading field, so a mismatch could lead to inconsistent search results, depending heavily on the application used. It needs to be clarified and documented why that does not happen, and it may need to be corrected.
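To make the idea of writing a confirmed name to several keyword fields more concrete, here is a minimal sketch. It works on a plain dictionary rather than on real image files, and the `(namespace, field)` labels are simply copied from the list in the text; they are assumptions for illustration and have not been verified as the exact identifiers used by ExifTool, Exiv2 or digiKam. The helper name `add_person_keyword` is hypothetical.

```python
from typing import Dict, List, Tuple

Field = Tuple[str, str]  # (namespace label, field label), taken from the text

# Candidate keyword fields named in the text; labels are illustrative only.
KEYWORD_FIELDS: List[Field] = [
    ("IPTC", "Keywords"),
    ("XMP acdsee", "categories"),
    ("XMP digiKam", "Tag List"),
    ("XMP dc", "Subject"),
    ("XMP lr", "hierarchicalSubject"),
    ("XMP MediaPro", "Catalog Sets"),
    ("Microsoft XMP", "Last Keyword XMP"),
]


def add_person_keyword(metadata: Dict[Field, List[str]],
                       person: str) -> Dict[Field, List[str]]:
    """Return a copy of metadata in which every keyword field lists the person."""
    updated = {field: list(values) for field, values in metadata.items()}
    for field in KEYWORD_FIELDS:
        values = updated.setdefault(field, [])
        if person not in values:  # keep the operation idempotent
            values.append(person)
    return updated
```

Writing the same value to every field in one step is one way to avoid the mismatch between applications that the text warns about; whether all of these fields should really be written is exactly the open question raised here.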




If I'm correct, what is the source of the list in the people pane on the left? In my opinion there are three options.
1. First, these are the keywords listed below the hierarchy level persons in the keyword list. If the user selects a name, it filters images based on the keywords and shows the face area as described in the person-related metadata field.
2. Second, digiKam reads the information given in the person-related fields of the metadata of each image in this particular case. Afterwards it uses this data to populate the person pane. That would put quite a workload on the CPU and isn't very likely.
3. Third, it stores the information given in the person-related fields of the metadata of each image in the database recognition.db. Based on the information stored there, digiKam knows which images are to be shown. In this case, are the face thumbnails stored in this database as well, or are they derived from each image based on the region information?
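If the thumbnails are derived from each image based on the region information, as the last option asks, the core step would be turning a stored region into a pixel crop box. The sketch below assumes the region is stored in the MWG style, i.e. as a normalized centre point plus width and height in the range 0 to 1; the function name and the clamping to the image borders are illustrative choices, not digiKam code.

```python
from typing import Tuple


def region_to_crop_box(cx: float, cy: float, w: float, h: float,
                       image_width: int, image_height: int
                       ) -> Tuple[int, int, int, int]:
    """Convert a normalized centre-based region to (left, top, right, bottom)."""
    left = round((cx - w / 2) * image_width)
    top = round((cy - h / 2) * image_height)
    right = round((cx + w / 2) * image_width)
    bottom = round((cy + h / 2) * image_height)
    # Clamp to the image so a region touching the border stays valid.
    left = max(0, min(left, image_width))
    right = max(0, min(right, image_width))
    top = max(0, min(top, image_height))
    bottom = max(0, min(bottom, image_height))
    return left, top, right, bottom
```

A box in this form can be handed to most imaging libraries to cut out the face, so no thumbnail would need to be stored in the database itself.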