GSoc/2023/StatusReports/QuocHungTran

From KDE Community Wiki

Add Automatic Tags Assignment Tools and Improve Face Recognition Engine for digiKam

digiKam is an advanced open-source digital photo management application that runs on Linux, Windows, and macOS. The application provides a comprehensive set of tools for importing, managing, editing, and sharing photos and raw files.

The goal of this project is to develop a deep learning model that can recognize various categories of objects, scenes, and events in digital photos, and generate corresponding keywords that can be stored in digiKam's database and assigned to each photo automatically. The model should be able to recognize objects such as animals, plants, and vehicles, and scenes such as beaches, mountains, and cities. It should also be able to handle photos taken in various lighting conditions and from different angles.

Mentors: Gilles Caulier, Maik Qualmann, Thanh Trung Dinh

Project Proposal

Automatic Tags Assignment Tools and Improve Face Recognition Engine for digiKam Proposal

GitLab development branch

gsoc23-autotags-assignment

Contacts

Email: [email protected]

Github: quochungtran

Invent KDE: quochungtran

LinkedIn: https://www.linkedin.com/in/tran-quoc-hung-6362821b3/

Project goals

Links to Blogs and other writing

Main merge request

KDE repository for object detection and face recognition researching

Issue tracker

My blog for GSoC

My entire blog :

May 29 to June 11 (Week 1 - 2) - Experimentation on COCO dataset

In this phase, I focus mainly on offline analysis, which aims to create a deep learning pipeline for an object detection model.

DONE

  • Constructing datasets (training, validation, and testing), starting with some common kinds of objects such as person, bicycle, and car.
  • Preprocessing data and studying the structure of the COCO dataset, which is used for the training and validation datasets.
  • Researching and creating a model pipeline for all YOLO versions in Python.
  • Evaluating the performance of the YOLO method using several evaluation metrics.

TODO

Structure of the COCO dataset format

Common Objects in Context (COCO) is one of the most popular large-scale labeled image datasets available for public use. It represents a handful of objects we encounter on a daily basis and contains image annotations in 80 categories, with over 1.5 million object instances. You can explore the COCO dataset by visiting SuperAnnotate’s respective dataset section.

COCO stores data in a JSON file organized into info, licenses, categories, images, and annotations sections. For my COCO download, I used the instances_train2017.json and instances_val2017.json files.

```

   "info": {
       "year": "2021",
       "version": "1.0",
       "description": "Exported from FiftyOne",
       "contributor": "Voxel51",
       "url": "https://fiftyone.ai",
       "date_created": "2021-01-19T09:48:27"
   },
   "licenses": [
       {
         "url": "http://creativecommons.org/licenses/by-nc-sa/2.0/",
         "id": 1,
         "name": "Attribution-NonCommercial-ShareAlike License"
       },
       ...   
   ],
   "categories": [
       ...
       {
           "id": 2,
           "name": "cat",
           "supercategory": "animal"
       },
       ...
   ],
   "images": [
       {
           "id": 0,
           "license": 1,
           "file_name": "<filename0>.<ext>",
           "height": 480,
           "width": 640,
           "date_captured": null
       },
       ...
   ],
   "annotations": [
       {
           "id": 0,
           "image_id": 0,
           "category_id": 2,
           "bbox": [260, 177, 231, 199],
           "segmentation": [...],
           "area": 45969,
           "iscrowd": 0
       },
       ...
   ]

```
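The three main lists are linked by ids: each annotation points to an image through image_id and to a category through category_id. A minimal Python sketch of that linkage follows, using only the standard library. It runs on a tiny hand-written sample rather than the real instances_train2017.json; the file name a.jpg, the "person" category entry, and the second annotation are made up for illustration.

```python
import json
from collections import defaultdict

# Tiny COCO-style document; values are illustrative, not taken from COCO itself.
SAMPLE = """
{
  "categories": [{"id": 1, "name": "person", "supercategory": "person"},
                 {"id": 2, "name": "cat", "supercategory": "animal"}],
  "images": [{"id": 0, "file_name": "a.jpg", "height": 480, "width": 640}],
  "annotations": [
    {"id": 0, "image_id": 0, "category_id": 2, "bbox": [260, 177, 231, 199], "iscrowd": 0},
    {"id": 1, "image_id": 0, "category_id": 1, "bbox": [10, 20, 50, 100], "iscrowd": 0}
  ]
}
"""


def objects_per_image(coco):
    """Map each image file name to a list of (category name, bbox) pairs."""
    category_names = {c["id"]: c["name"] for c in coco["categories"]}
    file_names = {img["id"]: img["file_name"] for img in coco["images"]}
    result = defaultdict(list)
    for ann in coco["annotations"]:
        name = category_names[ann["category_id"]]
        result[file_names[ann["image_id"]]].append((name, ann["bbox"]))
    return dict(result)


coco = json.loads(SAMPLE)
print(objects_per_image(coco))
```

With a real annotation file, json.load on the opened file would replace json.loads(SAMPLE); the rest of the sketch would stay the same.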

To extract the necessary information, I used the COCO API, which assists in loading, parsing, and visualizing annotations in COCO. The API supports multiple annotation formats.

APIs and their descriptions:

  • getImgIds: Get image ids that satisfy the given filter conditions.
  • getCatIds: Get category ids that satisfy the given filter conditions.
  • getAnnIds: Get annotation ids that satisfy the given filter conditions.
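To make these three lookups concrete, here is a rough sketch in plain Python over a COCO-style dictionary. The functions get_cat_ids, get_img_ids, and get_ann_ids are hypothetical stand-ins written for illustration, not the library code; in pycocotools the corresponding calls are methods of a COCO object. Requiring an image to contain every requested category is an assumption about how the library combines several category ids. The miniature dataset is invented.

```python
def get_cat_ids(coco, cat_names):
    """Ids of the categories whose name is in cat_names."""
    return [c["id"] for c in coco["categories"] if c["name"] in cat_names]


def get_img_ids(coco, cat_ids):
    """Ids of the images that contain every category in cat_ids."""
    per_image = {}
    for ann in coco["annotations"]:
        per_image.setdefault(ann["image_id"], set()).add(ann["category_id"])
    wanted = set(cat_ids)
    return sorted(i for i, cats in per_image.items() if wanted <= cats)


def get_ann_ids(coco, img_ids, cat_ids):
    """Ids of the annotations on the given images with one of the given categories."""
    imgs, cats = set(img_ids), set(cat_ids)
    return [a["id"] for a in coco["annotations"]
            if a["image_id"] in imgs and a["category_id"] in cats]


# Hypothetical miniature dataset: image 100 has no bicycle, image 101 has all three.
coco = {
    "categories": [{"id": 1, "name": "person"},
                   {"id": 2, "name": "bicycle"},
                   {"id": 3, "name": "car"}],
    "annotations": [
        {"id": 10, "image_id": 100, "category_id": 1},
        {"id": 11, "image_id": 100, "category_id": 3},
        {"id": 12, "image_id": 101, "category_id": 1},
        {"id": 13, "image_id": 101, "category_id": 2},
        {"id": 14, "image_id": 101, "category_id": 3},
    ],
}

cat_ids = get_cat_ids(coco, ["person", "bicycle", "car"])
img_ids = get_img_ids(coco, cat_ids)
ann_ids = get_ann_ids(coco, img_ids, cat_ids)
print(cat_ids, img_ids, ann_ids)  # [1, 2, 3] [101] [12, 13, 14]
```

Only image 101 is selected because it is the only one annotated with all three categories.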

First, I focus on a few common object categories to use for benchmarking the model: person, bicycle, and car. For these categories, there are currently 1101 training images and 45 validation images.

For the testing dataset, I would like to manually label a custom dataset provided by users, since this reflects the real use case.

You can find some samples from the training dataset below. Each image contains many object annotations in the form of a bounding box x, y, w, h, where (x, y) is the coordinate of the top-left corner of the box and w, h are the width and height of the box.
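Since detection tools do not all use the same box convention, it can help to convert the COCO box explicitly. The sketch below turns a COCO box into corner coordinates, and into a center-based box normalized by the image size, which is the layout commonly used for YOLO training labels. The function names are hypothetical; the input values are the bbox and image size from the JSON sample above.

```python
def coco_to_corners(bbox):
    """[x, y, w, h] with (x, y) the top-left corner -> (x1, y1, x2, y2)."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)


def coco_to_normalized_center(bbox, img_w, img_h):
    """[x, y, w, h] in pixels -> (cx, cy, w, h) as fractions of the image size."""
    x, y, w, h = bbox
    return ((x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h)


bbox = [260, 177, 231, 199]  # sample annotation, image of 640 x 480 pixels
corners = coco_to_corners(bbox)
center_box = coco_to_normalized_center(bbox, 640, 480)
print(corners)  # (260, 177, 491, 376)
print(center_box)
```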

YOLO model pipeline

Load the YOLO network

YOLO (You Only Look Once) is an extremely fast multi-object detection algorithm that uses a convolutional neural network (CNN) to detect and identify objects.

I would like to build a pipeline for YOLO detection in Python first. To load the model, I have to download the pre-trained YOLO weight file and the YOLO configuration file. Here I use version v3 as a first step:

  import cv2 as cv

  net = cv.dnn.readNetFromDarknet('yolov3.cfg', 'yolov3.weights')

This YOLO neural network has 254 elements, consisting of convolutional layers (conv), rectified linear units (relu), etc.

Create a blob

The input to the network is a so-called blob object. The function cv.dnn.blobFromImage(img, scale, size, mean) transforms the image into a blob. This step is data preprocessing: to obtain correct predictions from deep neural networks, we first need to preprocess our data.

This function performs:

  • Resizing: It resizes the input image to a specific size required by the model. Deep learning models often have fixed input sizes, and the blobFromImage function ensures that the image is resized to match these requirements.
  • Normalizing the image: Dividing the image by 255 scales the pixel values to between 0 and 1, which can help keep gradients during backpropagation within a reasonable range. This can aid more stable and efficient convergence during training.
  • Mean Subtraction: It subtracts the mean values from the image. Mean subtraction helps in normalizing the pixel values and removing the average color intensity. The mean values used for subtraction are usually pre-defined based on the dataset used to train the model.
  • Channel Swapping: It reorders the color channels of the image. Deep learning models often expect images in a specific channel order, such as RGB (Red, Green, Blue). If the input image has a different channel order (OpenCV loads images as BGR), the blobFromImage function can swap the first and last channels through its swapRB flag.

Here I set the scale factor to 1/255. This factor maps the pixel values to the range 0 to 1 while keeping the lightness of the image the same as in the original.
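To show what these steps do to the numbers, here is a pure-Python sketch of the mean subtraction, scaling, channel swap, and layout change on a nested-list image. It is an illustration only, not OpenCV's implementation. The helper to_blob and its swap_rb parameter are hypothetical names that mimic the swapRB flag, and resizing is left out by assuming the image already has the network's input size.

```python
def to_blob(image, scale=1 / 255, mean=(0.0, 0.0, 0.0), swap_rb=True):
    """Turn an H x W x 3 nested list of BGR pixels into a 1 x 3 x H x W nested list.

    Resizing is omitted: the image is assumed to already match the input size.
    The mean is given in the order of the output channels.
    """
    order = (2, 1, 0) if swap_rb else (0, 1, 2)  # BGR -> RGB when swap_rb is set
    planes = []
    for out_channel, src_channel in enumerate(order):
        # Subtract the channel mean first, then apply the scale factor.
        plane = [[(pixel[src_channel] - mean[out_channel]) * scale for pixel in row]
                 for row in image]
        planes.append(plane)
    return [planes]  # leading batch dimension of size 1


image = [[[0, 128, 255], [255, 0, 51]]]  # 1 x 2 image, pixels in BGR order
blob = to_blob(image)
print(len(blob), len(blob[0]), len(blob[0][0]), len(blob[0][0][0]))  # 1 3 1 2
print(blob[0][0])  # red plane, values close to 1.0 and 0.2
```

The result has one plane per color channel instead of one triple per pixel, which is the batch, channel, height, width layout the network expects.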

(Week 3 - 4)
(Week 5 - 6)
(Week 7 - 8)
(Week 9 - 10)
(Week 11 - 12)