Digikam GSoC 2010 Database Schema Changes
This page is meant to gather ideas and information and the schema changes needed for the three Digikam GSoC projects.
Requirements and Considerations
Whenever possible I would like to preserve upwards compatibility, that is digikam 0.10 - 1.3 should just work with the schema-updated database, even if not using new features.
We will store the version history of a file in an XML format in its metadata. The same information need also be stored in the database.
One approach is to take the history, break it into individual steps, groups steps to changesets, and assign a changeset to two images (describing the development between them). This is a relatively complex design requiring at least three tables. Another problem appears when images in the line of history are deleted; then changesets need to be merged. We need a more simple approach that does not break when images in the line of history disappear.
The simplest approach is to store the history XML, starting from the original, for each picture. The XML could be the same as stored in the metadata, containing hints at intermediate versions. But one advantage we have in the database over metadata is keeping track of file identities over changes of filename, file size or unique hash (the database is simply updated in these cases). We must make use of and preserve this advantage for storing changes as well.
Assigning unique ids (UUIDs) to edited images, as already described, can help us here. We can keep a table of imageid, UUID, history XML where the XML only contains UUID hints at files. Whenever filename/size/uniqueHash change, the database is kept up to date.
For fast searching, I propose additionally a thin table about image relations. For each image, all images it relates to, as original or intermediate, are then available fast and this information can be used for search queries.
For face recognition, we need to assign extra information to a tag and to a tag assignment:
For a tag, we want to add information which person is represented, link with the identifier in the face recognition data collected with libface, and ideally refer to this person in the Nepomuk storage (which will, sooner or later but invariably, give full integration with Akonadi, thus addressbook, email, appointments etc.). I suggest a TagProperties table which can contain arbitrary information about a tag.
For a tag assignment, we want now to tag not the picture as a whole but only a certain region of the picture. We cannot yet know if this part will always be a rectangle, so I suggest not to limit ourselves in the schema in this regard. Instead, we can use SVG-style XML or other text to describe the region(s). Similarly to above, I suggest an ImageTagProperties table which can contain arbitrary information about a tag assignment.
This project can also make use of extra information on a tag, "giving tags a meaning" as it is termed in a long-standing bug report. This includes describing a tag named "Paris" as a City or linking it to the Nepomuk storage (where the Pimo ontology should allow it to be described as a city). A more straightforward option would be to store coordinates along with the tag, and possibly a radius.
Miscellaneous bug reports
Miscellaneous bug reports that request feature which would require changes to the database:
- 149372 Comments on tags
- 181708 Fast tags, which are not written to the database
- 190935 Translations for tags. This would be nice to have, as we could easily make the reverse geocoding look up place names in different languages.
- 192447 Allow tags to have a meaning.
- Another idea on nepomuk integration: Maybe the identifier (URI) of the tag used in Nepomuk could be stored in the image as well? This could be used to rename a tag in the database without updating all the images. Since the tags are then also identified by the identifier, displayed and stored tag name can be different. Quite useful when someone marries and changes his/her name (this was requested on the mailing list once).
CREATE TABLE ImageHistory
(imageid INTEGER PRIMARY KEY,
CREATE INDEX uuid_index ON ImageHistory (uuid);
CREATE TABLE ImageRelations
CREATE INDEX subject_relations_index ON ImageRelations (subject);
CREATE INDEX subject_relations_index ON ImageRelations (object);
I am pondering whether to use INTEGER or TEXT for the type attribute.
If we want a small and concise table, INTEGER seems better currently.
CREATE TABLE TagProperties
CREATE INDEX tagproperties_index ON TagProperties (tagid);
CREATE TABLE ImageTagProperties
CREATE INDEX imagetagproperties_index ON ImageTagProperties (imageid, tagid);
There is deliberately no UNIQUE constraint on (tagid), and (imageid, tagid) respectively, to allow multiple entries per tag or assignment.
|Title in different languages
|Long description of the tag
|Center coordinate for the tag
|Polygon outlining the shape of the tag
|Libkface id, if necessary
|"Database-only" tag which should not be written to images
|Specify the level of privacy under which images with this tag should be shown
|URL associated with a tag
|Date associated with a tag
|Old title of this tag, to be able to correctly recognize this tag in images where the tag was written before it was renamed.
|Link to nepomuk id of the tag
|IDs of this tag given by other digikam instances, to make it possible to have the tag recognized when other people create tag ids on their computers, which may have different titles, but refer to the same "thing"
|Outline of an area
- How do we store region annotation which are not associated with a tag? For example, if the user describes something other than a person, for which he just wants some sort of description, but no tag?