SingleFileResource Refactoring

Problem Statement

The current (KDE 4.[3,4]) design of the SingleFileResource (SFR) is heavily based on the ical/vcard resources. Those two read the whole file at once and keep the entries in memory for as long as they exist. However, this is not really in line with the Akonadi design: Akonadi itself also caches items and even provides advanced cache policies. Other resources, like the mbox resource, do not keep the whole file in memory but only store pointers to the beginning of each entry. These two concepts conflict with each other as soon as Akonadi detects that the file on disk has changed. Currently the SFR makes a backup of the file by calling writeFile() with a different file name. But this can only succeed if the resource has all data in memory. This is not the case for the mbox resource (and it should not be the case for the other SFR based resources in the future), which results in a backup file that partly consists of new data and partly of old data (if it succeeds at all).

Suggested Changes

General

  • The file resources should never store the complete data in memory
  • When items are requested from the file resource, it should always read from the file it is initialized for

Rationale: Data caching is the responsibility of the Akonadi server, controlled by cache policies. Permanently keeping data in resource memory wastes memory and sets a bad example for developers interested in creating resources who might be looking for example code.

  • The SFR only reads the file in retrieveItems()

Rationale: Akonadi calls this method when it needs items; if it does not call it, it does not need them. The need could be a client explicitly fetching the items or a cache-policy interval check.

If it is deemed necessary to have all items loaded once a resource is configured, we can always just call synchronize() instead of synchronizeCollectionTree(), thus getting the retrieveItems() call.
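As an illustration, a minimal sketch of that choice, assuming a SingleFileResource subclass of Akonadi::ResourceBase; the configuration slot and the option are hypothetical:

  // Hypothetical configuration slot of the refactored SingleFileResource.
  // synchronizeCollectionTree() only refreshes the collection structure;
  // synchronize() additionally triggers retrieveItems(), so the file is
  // parsed once right after the resource has been (re)configured.
  void SingleFileResource::applyConfiguration()
  {
      // ... re-read file name and options from the settings ...
      if (mLoadAllItemsOnConfigure) {       // hypothetical option
          synchronize();                    // full sync, retrieveItems() will be called
      } else {
          synchronizeCollectionTree();      // collection structure only
      }
  }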

On item retrieval

  • Make loading always download the file, independent of whether it is remote or local

Rationale: a copy of the data file is needed to be able to detect and extract changes at item level when a file change notification happens. As a bonus, handling of remote files no longer needs separate code paths.

  • Create items from the downloaded file
  • Compare them with the items from the current working copy
  • Call itemsRetrieved()
  • Make the downloaded file the new working copy (see the sketch below)
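A minimal sketch of this sequence, assuming the Akonadi::ResourceBase API; downloadFile(), parseFile() and compareWithWorkingCopy() are hypothetical helpers, not existing code:

  // Sketch of retrieveItems() under the proposed scheme.
  void SingleFileResource::retrieveItems(const Akonadi::Collection &collection)
  {
      Q_UNUSED(collection)

      // Always fetch a fresh copy, no matter whether the source is local or remote.
      const QString downloaded = downloadFile(mSettings.path());     // hypothetical

      // Create items from the downloaded file.
      const Akonadi::Item::List items = parseFile(downloaded);       // hypothetical

      // Compare with the current working copy, e.g. to detect conflicts with
      // changes that have not been written back to the file yet.
      compareWithWorkingCopy(items);                                 // hypothetical

      // Hand the items over to the Akonadi server.
      itemsRetrieved(items);

      // The downloaded file becomes the new working copy.
      QFile::remove(mWorkingCopyPath);
      QFile::rename(downloaded, mWorkingCopyPath);
  }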

On file change

  • A second copy of the file is created.
  • Read both copies and compare them item by item, using any dirty item in favor of the respective one from copy 1.
  • Do not call synchronize(); instead modify Akonadi like any resource with item-level change notifications.

Rationale: this does not require ItemSync, and usually there are only very few changes, e.g. a non-Akonadi application modifying the file (a sketch of this variant follows at the end of this section).

  • Finally, remove copy 1 and make copy 2 the new copy 1.

Or

  • Call clearCache()

Rationale: if no load has happened yet, we don't have to reload. If items have been loaded before, clearing the cache means the server will request them again when necessary.
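A hedged sketch of the first variant, assuming a fileChanged() slot on the resource; makeSnapshot(), parseFileToHash(), isDirty() and the applyExternal*() notification helpers are hypothetical, and how the item-level change is pushed to the server is intentionally left abstract:

  // Sketch only: compares the working copy (copy 1) with a snapshot of the
  // changed file (copy 2) and reports item-level changes to Akonadi.
  void SingleFileResource::fileChanged(const QString &path)
  {
      const QString copy2 = makeSnapshot(path);                              // hypothetical

      // remoteId -> item hashes for both copies.
      QHash<QString, Akonadi::Item> oldItems = parseFileToHash(mWorkingCopyPath); // copy 1
      const QHash<QString, Akonadi::Item> newItems = parseFileToHash(copy2);      // copy 2

      for (const Akonadi::Item &newItem : newItems) {
          if (!oldItems.contains(newItem.remoteId())) {
              applyExternalAdd(newItem);                 // hypothetical: new in the file
              continue;
          }
          const Akonadi::Item oldItem = oldItems.take(newItem.remoteId());
          if (isDirty(oldItem)) {
              // A change that has not been written back yet wins over copy 2.
          } else if (oldItem.payloadData() != newItem.payloadData()) {
              applyExternalChange(newItem);              // hypothetical: modified in the file
          }
      }
      // Whatever is left in the copy-1 hash no longer exists in the file.
      for (const Akonadi::Item &removedItem : oldItems) {
          applyExternalRemove(removedItem);              // hypothetical
      }

      // Copy 2 becomes the new copy 1.
      QFile::remove(mWorkingCopyPath);
      QFile::rename(copy2, mWorkingCopyPath);
  }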

Drawbacks

  • Potentially necessary to parse two files
  • Slower retrieveItems(); difficult to implement retrieveItem() if not aided by offset information or similar

Advantages

  • Reduced memory usage
  • Changes which have not been written to the file yet only result in a conflict if the respective item is either changed between the working copy and the downloaded file or no longer present in the downloaded file
  • Better data consistency in case files are changed by non-Akonadi programs

Implementation Ideas

Asynchronous Parsing

Make file reading/parsing ItemFetchJob-like, i.e. create a type-specific file parse job and let it emit items.

Simple implementation: emit items with full payload.
Advanced implementation: allow emitting of basic items, e.g. when processing the MBOX working copy, which has offsets into the file and can read messages on demand.
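A hedged sketch of such a job; the class, its signal and the parsing helper are hypothetical, modelled on the KJob pattern:

  #include <KJob>
  #include <QMetaObject>
  #include <QString>
  #include <Akonadi/Item>   // header location varies between Akonadi versions

  // Hypothetical type-specific parse job that emits items in batches instead
  // of loading the whole file into memory at once.
  class FileParseJob : public KJob
  {
      Q_OBJECT
  public:
      explicit FileParseJob(const QString &path, QObject *parent = nullptr)
          : KJob(parent), mPath(path) {}

      void start() override
      {
          // Parse incrementally, entry by entry, from the event loop.
          QMetaObject::invokeMethod(this, "parseNextBatch", Qt::QueuedConnection);
      }

  Q_SIGNALS:
      void itemsParsed(const Akonadi::Item::List &items);

  private Q_SLOTS:
      void parseNextBatch()
      {
          const Akonadi::Item::List batch = parseSomeEntries(mPath);
          if (!batch.isEmpty()) {
              Q_EMIT itemsParsed(batch);   // hand a batch to the resource
              QMetaObject::invokeMethod(this, "parseNextBatch", Qt::QueuedConnection);
          } else {
              emitResult();                // whole file processed
          }
      }

  private:
      Akonadi::Item::List parseSomeEntries(const QString &path); // hypothetical, type specific
      QString mPath;
  };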

Parallel Parsing

Process the working copy and the downloaded file in parallel. Keep two hashes remoteId -> item, one for each file.

For each item:

  • Check for respective item in other hash
    • On miss, put into own hash
    • On match check for change
      • Report to Akonadi
      • Remove from other hash

When both parsers are finished, re-process the new item hash, this time also removing items from its own hash if they are found in the other. All items remaining in the working copy hash have been removed from the new file; all items remaining in the new item hash have been added (see the sketch below).
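A hedged sketch of the per-item step, assuming both parse jobs feed into one comparator object; the ParallelComparator class and the reporting helper are hypothetical:

  // Called whenever one of the two parsers has produced an item. ownHash and
  // otherHash belong to the emitting parser's file and to the other file.
  void ParallelComparator::processItem(const Akonadi::Item &item,
                                       QHash<QString, Akonadi::Item> &ownHash,
                                       QHash<QString, Akonadi::Item> &otherHash,
                                       bool itemComesFromNewFile)
  {
      const auto it = otherHash.find(item.remoteId());
      if (it == otherHash.end()) {
          // Miss: remember the item, its counterpart may still arrive.
          ownHash.insert(item.remoteId(), item);
          return;
      }
      // Match: check for a change and report it.
      if (it.value().payloadData() != item.payloadData()) {
          const Akonadi::Item &newer = itemComesFromNewFile ? item : it.value();
          reportChangeToAkonadi(newer);                  // hypothetical
      }
      otherHash.erase(it);                               // remove from the other hash
  }

  // After both parsers have finished:
  //  * items left in the working-copy hash were removed from the new file,
  //  * items left in the new-file hash were added.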