We begin with a little story that explains how the digiKam face recognition features became a GSoC project.
It all began in early 2018, when the thread "either face recognition screen is buggy or I still don't understand it - at least I can say that more convenient bulk change of face tags (no auto refresh/set faces via context menu) is neccessary" took off. Eventually, the discussion found its direction in early 2019, which convinced the maintainer of digiKam to refurbish these features earlier than originally planned. The post that triggered this change was written on 01 Feb 2019 and describes quite well what has to be polished and what has to be redesigned. If you read the post, you will notice that its content goes beyond the pure face management workflow.
The overall face detection, recognition and management workflow
Before this article goes into detail, all parts involved are described in the order in which they occur.
- Face detection
This is a group of algorithms that analyse the content of images and identify distinctive regions such as eyes, nose and mouth. Most of them are based on OpenCV and work mostly fine in the background (except for some technical issues with the OpenGL card acceleration used by OpenCV, which introduces instability, but that is another challenge). These algorithms generate the region where a face can be found, typically a rectangle. These regions are written as digiKam-internal information to digiKam's core database. The information is not yet added to the metadata of the images; that happens during the face recognition workflow, which is explained further down.
- Face recognition
This covers four different methods based on different algorithms, some more functional than others. The goal is to automatically recognize a non-tagged face in images, using the face tags previously registered in the database. The algorithms are complex and are explained in more detail on the wiki page of the GSoC face recognition project. The four methods are described here only briefly; a more detailed description can be found in Digikam/GSoC2019/AIFaceRecognition.
- Deep Neural Network (DNN) Dlib C++ Library
digiKam already has an experimental neural network implementation for face recognition, which is a proof of concept rather than a production-ready function. This DNN is based on the Dlib implementation in the OpenFace project.
- OpenCV - Local Binary Patterns Histograms (LBPH)
This is the most complete implementation of a face recognition algorithm. Moreover, it is the oldest implementation of such an algorithm in digiKam. It is not perfect and requires at least six faces already tagged manually by the user to identify the same faces in non-tagged images.
- OpenCV - Eigen Faces
An alternative algorithm that uses the OpenCV backend. It was introduced to have a different source of results for face recognition, making it possible to cross-check the DNN approach.
- OpenCV - Fisher Face
Another algorithm that uses the OpenCV backend. It was introduced for the same purpose as Eigen Faces.
According to rumours, this one is not finalized; it is said that not all methods are implemented.
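To give a feel for what the LBPH method computes, here is a minimal pure-Python sketch. It is not digiKam's or OpenCV's code. As simplifying assumptions, it builds one histogram for the whole image instead of one per grid cell, uses the chi-square distance as one common way to compare histograms, and all function names (`lbp_code`, `lbp_histogram`, `chi_square`, `recognise`) are made up for illustration.

```python
from typing import Dict, List, Optional, Sequence

Image = Sequence[Sequence[int]]  # grayscale image as rows of intensity values

# Neighbour offsets (row, col), clockwise, starting at the top-left pixel.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img: Image, r: int, c: int) -> int:
    """8-bit local binary pattern of the interior pixel at (r, c)."""
    centre = img[r][c]
    code = 0
    for dr, dc in NEIGHBOURS:
        # Each neighbour contributes one bit: 1 if it is at least as bright.
        code = (code << 1) | (1 if img[r + dr][c + dc] >= centre else 0)
    return code


def lbp_histogram(img: Image) -> List[int]:
    """256-bin histogram of the LBP codes of all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist


def chi_square(a: Sequence[int], b: Sequence[int]) -> float:
    """Chi-square distance between two histograms (0.0 means identical)."""
    return sum((x - y) ** 2 / (x + y) for x, y in zip(a, b) if x + y > 0)


def recognise(face: Image, tagged: Dict[str, List[Image]]) -> Optional[str]:
    """Return the name whose already tagged faces are closest to `face`."""
    probe = lbp_histogram(face)
    best_name, best_dist = None, float("inf")
    for name, faces in tagged.items():
        for known in faces:
            dist = chi_square(probe, lbp_histogram(known))
            if dist < best_dist:
                best_name, best_dist = name, dist
    return best_name
```

The `tagged` dictionary stands in for the faces the user has already tagged manually; the more examples per person it holds, the more likely the nearest histogram belongs to the right name.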
- The faces workflow
This is the actual subject of this article, for which the search for one or more GSoC 2019 students is ongoing. No complex algorithms are involved here.
This is where we switch from the backend, the digiKam core, to the frontend, the GUI. There are countless posts about what could be improved or what is missing. The goal is to address all of them to make the entire workflow flawless, so that it is widely accepted and enjoyed by the users. This requires significant effort to assist and guide the student(s) towards the desired outcome. As that is related to coding, the maintainers would like to see the community take over here, to take workload off them and to enable us users to steer the process from a user perspective. The maintainer would only ensure the quality of the code.
The overall face workflow will not change that much; the changes are mainly under the hood, as mentioned in the chapter above. The process is
- suggest faces
- user confirms / corrects
but there are many ways to achieve this. This is where the hard work begins. The following section tries to give guidance for the entire retrofit process, aiming at collecting, outlining and streamlining all suggestions to ensure a consistent and intuitive face workflow.
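The two-step process above can be pictured as a tiny state model. The following is only a sketch under stated assumptions: the class and state names (`Face`, `FaceState`, `suggest`, `confirm`) are hypothetical and do not mirror digiKam's actual classes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class FaceState(Enum):
    UNKNOWN = "unknown"      # region detected, no name yet
    SUGGESTED = "suggested"  # recognition proposed a name
    CONFIRMED = "confirmed"  # user accepted or corrected the name


@dataclass
class Face:
    region: Tuple[int, int, int, int]  # (x, y, width, height) in pixels
    name: Optional[str] = None
    state: FaceState = FaceState.UNKNOWN

    def suggest(self, name: str) -> None:
        """Step 1: recognition proposes a name; confirmed faces stay untouched."""
        if self.state is not FaceState.CONFIRMED:
            self.name = name
            self.state = FaceState.SUGGESTED

    def confirm(self, corrected_name: Optional[str] = None) -> None:
        """Step 2: the user accepts the suggestion or supplies a corrected name."""
        if corrected_name is not None:
            self.name = corrected_name
        if self.name is None:
            raise ValueError("nothing to confirm: no name suggested or given")
        self.state = FaceState.CONFIRMED
```

One design choice worth noting in this sketch: a later suggestion never overwrites a name the user has already confirmed, so a re-run of the recognition cannot undo manual work.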
I mention this because, as far as I know, the content of person-related metadata fields is not taken into account when you search or filter a collection by certain keywords. Thus, to make the names findable by digiKam, the name has to be added to the keyword-related metadata fields to make the magic happen.
- As a maintainer
As a maintainer, you are responsible for knowing the digiKam source code and for checking the student(s)' pull requests before they are merged into master. In addition, you are the contact person for the student(s)' questions regarding the code, and you ensure that the student(s)' documentation is satisfactory.
- As a student
Typically, the student must review all Bugzilla entries that are listed below and in the separate subsections created by the maintainers; some of them were widely discussed on the mailing list. If this page does not provide enough guidance, the student(s) must identify the top-level entries to work on, with help from the listed mentors. Besides coding, a technical proposal is required that lists the problems, the code to patch, the coding tasks, the tests, the plan for this summer and, of course, the documentation to write (mostly in code). During this summer, the student must analyze the code in detail, identify the problems and start to patch the implementations. During this stage, the student will ask questions about coding and about functionality. We must respond to both, and I will give priority to the code questions. The student must be technically autonomous and must search for answers independently; sometimes the student will be blocked, and we must point in the right direction. In all cases, I review and validate the code. Important: students work in a separate development branch in git, so this is safe.
- As a mentoring user
If you wish, you are more than welcome to contact any of the current users, get an account on KDE, join the discussion here and on the mailing list, and begin contributing to this article. Volunteer users are: Stefan Müller (user coordinator)
The relevant bug reports are grouped below into major tasks, to give the students detailed guidance on how to close them.
- Separation between tags and faces
Many players in the media business, such as Adobe, use the expression tag for anything related to metadata, while others separate between the different types of metadata. All metadata records are stored in fields (see e.g. photometadata.org), which are also often called tags (of the metadata), so a tag is anything that is used in digiKam to filter or search for images, e.g. keywords, colour label, star rating etc. Thus there is too much room for interpretation, which leads to all these questions and to the irritation caused by the use of the word tag. To lower the entry hurdle into the world of tagging, I would suggest being consistent with the official wording, so that new users are not confused. That means that the text for the tag will be named keyword: the source selection pane on the left will say Keywords and the filter pane on the right will say Keywords Filter. The description should read something close to: digiKam deals with metadata, grouped in (tags of) keywords, label, date and location.
- Ensure that all relevant metadata fields are filled (by Stefan Müller)
As soon as a name is confirmed, digiKam writes the data to the MP and MWG namespaces of the XMP records; it sets a name and an area. More details about those namespaces can be found here:
• Microsoft MP Tags (ExifTool)
• Microsoft MP Tags (Exif2.org)
• MWG Regions Tags (ExifTool)
• MWG Regions Tags (Exif2.org)
As Apple and Adobe write their information to the MWG namespace, I would say that MWG is the leading namespace, but inconsistencies may lead to unexpected behaviour of the applications that read them. In my understanding, this information should also be written to the IPTC Person structure as mentioned in the IPTC Photo Metadata User Guide (Persons Depicted in the Image), but it is not. It needs to be clarified and documented why that does not happen, and it may need to be corrected.
- Link face region with face name properly
In order to make images findable by a person's name, the name shall also be written to the keywords fields of multiple namespaces; the IPTC Photo Metadata User Guide (Persons Depicted in the Image) recommends caption and keywords. I cannot name all relevant fields/namespaces. My research tells me that it should be at least these:
• IPTC ⇒ Keywords, see: iptc.org
• XMP acdsee (ACD Systems) Tags ⇒ Categories, see: ExifTool / Exif2.org
• XMP digiKam Tags ⇒ Tag List, see: ExifTool / Exif2.org
• XMP dc (Dublin Core) Tags ⇒ Subject, see: ExifTool / Exif2.org
• XMP lr (Lightroom) Tags ⇒ hierarchicalSubject, see: ExifTool / Exif2.org
• XMP MediaPro Tags ⇒ Catalog Sets, see: ExifTool / Exif2.org
• Microsoft XMP Tags ⇒ Last Keyword XMP, see: ExifTool / Exif2.org
• MWG Keywords Tags: ExifTool / Exif2.org
has a field but is ignored by digiKam, why?
• XMP prism Tags ⇒ Keyword: ExifTool
To be excluded are:
• XMP xmp Tags, as it says non-standard: ExifTool
• XMP xmpMM Tags, as it says undocumented: ExifTool
• XMP pdf Tags: ExifTool -> only for Adobe PDF
I reckon there isn't any leading field, as a mismatch could lead to inconsistent search results, depending heavily on the application used. It needs to be clarified and documented why that does not happen, and it may need to be corrected.
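To make the two requirements above more concrete, here is a minimal sketch of what "fill all relevant fields" could mean for one confirmed face. It rests on several assumptions that need to be checked against the specifications: the field identifiers follow ExifTool's naming for a subset of the keyword fields listed above, every field is treated as a flat list of names (some of them are hierarchical or structured in reality), MWG regions are taken to store a normalized centre point plus size, and Microsoft MP regions a normalized top-left corner plus size. The function names are hypothetical and nothing here is digiKam code.

```python
from typing import Dict, List, Tuple

# ExifTool-style names for a subset of the keyword fields listed above.
KEYWORD_FIELDS: List[str] = [
    "IPTC:Keywords",
    "XMP-acdsee:Categories",
    "XMP-digiKam:TagsList",
    "XMP-dc:Subject",
    "XMP-lr:HierarchicalSubject",
    "XMP-mediapro:CatalogSets",
    "XMP-microsoft:LastKeywordXMP",
]


def keyword_updates(name: str,
                    existing: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return every keyword field with `name` added if it was missing."""
    updated: Dict[str, List[str]] = {}
    for field in KEYWORD_FIELDS:
        values = list(existing.get(field, []))  # copy, keep input untouched
        if name not in values:
            values.append(name)
        updated[field] = values
    return updated


def mwg_area(x: float, y: float, w: float, h: float,
             img_w: float, img_h: float) -> Dict[str, float]:
    """Pixel rectangle -> MWG-style area (assumed: normalized centre + size)."""
    return {
        "x": (x + w / 2) / img_w,
        "y": (y + h / 2) / img_h,
        "w": w / img_w,
        "h": h / img_h,
    }


def mp_rectangle(x: float, y: float, w: float, h: float,
                 img_w: float, img_h: float) -> Tuple[float, float, float, float]:
    """Pixel rectangle -> MP-style rectangle (assumed: normalized top-left + size)."""
    return (x / img_w, y / img_h, w / img_w, h / img_h)
```

For a face at pixel rectangle (100, 50, 200, 100) in a 1000 x 500 image, `keyword_updates("Alice", {...})` would list "Alice" in all seven fields, while the same rectangle yields a different origin in the two region formats. That difference is one place where an inconsistent writer could confuse applications that read only one namespace.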
If I'm correct, what is the source of the list in the people pane on the left? In my opinion there are three options.
1. First, these are the keywords listed below the hierarchy level persons in the keyword list. If the user selects a name, digiKam filters images based on the keywords and shows the face area as described in the person-related metadata field.
2. Second, digiKam reads the information given in the person-related metadata fields of each image. Afterwards it uses this data to populate the people pane. That would put quite a workload on the CPU and isn't very likely.
3. Third, digiKam stores the information given in the person-related metadata fields of each image in the database recognition.db. Based on the information stored there, digiKam knows which images are to be shown. In this case, are the face thumbnails stored in this database as well, or are they derived from each image, based on the region information?